It is easy to generate geolocation charts of website accesses with a shell script that reads the web server's access_log. This document shows one way to generate a data file containing the number of hits for each ccTLD that existed at the time of writing.
#!/bin/bash # Copyright (C) Wizardry and Steamworks. # # Licensed to Wizardry and Steamworks under # the GPLv3 GNU License which can be found at: # http://www.gnu.org/licenses/gpl.html # TLDS=(ac ad ae af ag ai al am an ao aq ar as at au aw ax az ba bb bd be bf bg bh bi bj bm bn bo br bs bt bv bw by bz ca cc cd cf cg ch ci ck cl cm cn co cr cs cu cv cx cy cz dd de dj dk dm do dz ec ee eg eh er es et eu fi fj fk fm fo fr ga gb gd ge gf gg gh gi gl gm gn gp gq gr gs gt gu gw gy hk hm hn hr ht hu id ie il im in io iq ir is it je jm jo jp ke kg kh ki km kn kp kr kw ky kz la lb lc li lk lr ls lt lu lv ly ma mc md me mg mh mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nc ne nf ng ni nl no np nr nu nz om pa pe pf pg ph pk pl pm pn pr ps pt pw py qa re ro rs ru rw sa sb sc sd se sg sh si sj sk sl sm sn so sr ss st su sv sy sz tc td tf tg th tj tk tl tm tn to tp tr tt tv tw tz ua ug uk us uy uz va vc ve vg vi vn vu wf ws ye yt yu za zm zw) SCORE=(0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0) for ip in `cat access_log | awk '{ print $1 }'`; do XX=`dig -x $ip +noall +answer +short | awk 'BEGIN { FS="." } { if ( NF > 0 ) print $(NF-1) }'` for TLD in ${!TLDS[*]}; do CC=${TLDS[$TLD]} if [ "$XX" == "$CC" ]; then (( SCORE[$TLD]++ )) # bash increment fi done done for TLD in ${!TLDS[*]}; do printf "%s = %d\n" ${TLDS[$TLD]} ${SCORE[$TLD]} >> stats.dat done
The script goes through the access_log of the web server and extracts the IP address of every page hit. It then performs a reverse DNS lookup on each IP address and takes the TLD of the resulting hostname as the country of origin. While doing so, it keeps a score for each ccTLD, incrementing the count every time a lookup matches that ccTLD. Once the whole access_log has been examined, the script appends all the data to a file called stats.dat, which you can then use to generate charts.
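The counting step described above can also be sketched with a bash associative array. This is only an illustration, not the script's own method: the helper names tld_of and count_tlds are made up for this example, bash 4 or later is assumed, and unlike the script above it counts every TLD it sees (including generic ones such as com) rather than only those in the TLDS list.

```bash
#!/bin/bash
# Sketch only: tld_of and count_tlds are hypothetical helper names,
# not part of the original script. Assumes bash 4+ (associative arrays).

# Print the last label of a PTR hostname such as "host.example.de."
tld_of() {
    local host=${1%.}          # drop the trailing dot that dig prints
    printf '%s\n' "${host##*.}"
}

# Read hostnames on stdin, one per line, and print sorted "tld = count" lines.
count_tlds() {
    local -A score=()
    local host tld
    while IFS= read -r host; do
        [ -n "$host" ] || continue      # skip failed lookups (empty lines)
        tld=$(tld_of "$host")
        [ -n "$tld" ] || continue       # skip names with no usable label
        score[$tld]=$(( ${score[$tld]:-0} + 1 ))
    done
    for tld in "${!score[@]}"; do
        printf '%s = %d\n' "$tld" "${score[$tld]}"
    done | sort
}

# Possible usage (needs network access, so it is left commented out):
# awk '{ print $1 }' access_log | while read -r ip; do
#     dig -x "$ip" +short
# done | count_tlds
```

Keeping the counts in an associative array avoids the inner loop over all ccTLDs for every hit, at the cost of requiring a newer bash.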
The output format is given by the following line:
printf "%s = %d\n" ${TLDS[$TLD]} ${SCORE[$TLD]} >> stats.dat
which can be changed, for example to obtain a CSV datafile.
With the format above, the output looks like this sample:
...
wf = 0
ws = 0
ye = 0
yt = 0
...
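As mentioned, the output line can be changed to obtain CSV, for instance by using "%s,%d\n" as the printf format. If stats.dat already exists, a small filter along the following lines could convert it instead. This is a sketch under assumptions: dat_to_csv is a hypothetical helper name, and the header names tld and hits are arbitrary choices.

```bash
#!/bin/bash
# Sketch only: dat_to_csv is a hypothetical helper name.
# Convert "tld = count" lines on stdin into CSV with a header row.
dat_to_csv() {
    printf 'tld,hits\n'
    # Split each line on " = " and skip anything that is not a tld/count pair.
    awk -F' = ' 'NF == 2 { printf "%s,%d\n", $1, $2 }'
}

# Possible usage:
# dat_to_csv < stats.dat > stats.csv
```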