Website analytics reveal usage patterns that can then be leveraged to improve the website or its services. Typically, analytics run in the web browser by injecting JavaScript code, images or cookies that track the user on the website, and sometimes beyond the domain being tracked.
Most of the time, injecting code through invasive analytics is not needed, and the log files provided by the web server of choice are sufficient to build a good overview of website usage. The server log files contain more or less the same data that invasive analytics would collect, such that the task becomes a matter of aggregating the data and making sense of it for a given purpose.
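To illustrate that the raw logs already carry this information, the following is a minimal sketch that counts requests per path in an access log. It assumes the Apache common log format, where the request path is the seventh whitespace-separated field; the function name top_paths is purely illustrative and not part of any tool.

```shell
# Print the most requested paths of a common-log-format access log.
# $1: path to the log file, $2: number of entries to show (default 10).
top_paths() {
    log_file="$1"
    limit="${2:-10}"
    # Field 7 is the request path in lines such as:
    # 192.0.2.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326
    awk '{ hits[$7]++ } END { for (p in hits) print hits[p], p }' "$log_file" |
        sort -k1,1nr -k2,2 |
        head -n "$limit"
}
```

Calling top_paths with a log file and a limit of 5 would print the five most requested paths, one per line, each prefixed by its hit count. A dedicated tool does the same kind of aggregation across many more dimensions.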
The obvious advantage of server-side analytics is that user browsers are not overloaded with various invasive trackers, which are blocked most of the time anyway because people use ad-blockers.
The setup is going to use Apache as the web server (the same procedure applies more or less to NGINX), a tool called goaccess (a package that is already provided by the most recent Debian release), and a systemd service file that will be responsible for starting and restarting goaccess.
goaccess is a tool that processes web-server log files and can display a real-time ncurses overview of website accesses on the terminal. It can also generate an HTML file with an embedded WebSocket connection to goaccess, which a browser can then load in order to display website access metrics in real time.
The systemd service file is created at /etc/systemd/system/goaccess.service (systemd requires the .service suffix for service units) and has the following contents:
[Unit]
Description=goaccess
After=network.target

[Service]
ExecStart=/bin/sh -c '/usr/bin/goaccess /var/log/apache2/*access_log* --log-format=COMMON --real-time-html -o /var/www/goaccess/index.html --geoip-database /usr/share/GeoIP/GeoLite2-Country.mmdb --persist --db-path /var/www/goaccess/'
Restart=always
RestartSec=10
StandardOutput=journal
StandardError=journal
SyslogIdentifier=goaccess
User=root
Group=root
Environment=PATH=/usr/bin/:/usr/local/bin/
WorkingDirectory=/var/www/goaccess

[Install]
WantedBy=multi-user.target
where the main attraction is the command that is executed to start goaccess:
/usr/bin/goaccess /var/log/apache2/*access_log* --log-format=COMMON --real-time-html -o /var/www/goaccess/index.html --geoip-database /usr/share/GeoIP/GeoLite2-Country.mmdb --persist --db-path /var/www/goaccess/
where:
/var/log/apache2/*access_log* is a glob path that makes goaccess read all the access logs under /var/log/apache2 (incidentally, the wildcard is expanded by the shell, which is why goaccess is started via /bin/sh),
--log-format=COMMON specifies that the access logs are in the Apache "common" log format,
--real-time-html makes goaccess generate an HTML file that additionally contains a WebSocket connector to goaccess, which keeps running as a daemon and updates the statistics in real time,
-o /var/www/goaccess/index.html tells goaccess to create the HTML file at /var/www/goaccess/index.html,
--geoip-database /usr/share/GeoIP/GeoLite2-Country.mmdb makes goaccess use the MaxMind GeoIP database in order to map IP addresses to countries,
--persist makes goaccess persist its data, storing it in databases placed in /var/www/goaccess via the option --db-path /var/www/goaccess/.
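Since the wildcard is expanded by the shell and not by goaccess itself, it can help to preview which files the glob actually matches before starting the service. The following is a small sketch under that assumption; the function name list_access_logs is hypothetical, and the flagging of .gz files reflects the assumption that compressed rotations need to be decompressed before goaccess can read them, which is worth checking against the goaccess manual.

```shell
# List the files that the access-log glob would hand to goaccess.
# $1: log directory (defaults to the directory used in the service file).
list_access_logs() {
    dir="${1:-/var/log/apache2}"
    for f in "$dir"/*access_log*; do
        # An unmatched glob stays literal in POSIX sh, so skip missing names.
        [ -e "$f" ] || continue
        case "$f" in
            *.gz) printf 'compressed %s\n' "$f" ;;
            *)    printf 'plain %s\n' "$f" ;;
        esac
    done
}
```

Running list_access_logs without arguments inspects /var/log/apache2; an empty result means the glob matches nothing and goaccess would be started without any usable input file.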
With the systemd service file in place, goaccess can be enabled and started via systemctl:
systemctl enable goaccess
systemctl start goaccess
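Once started, the state of the service can be checked, for instance with a small helper along the following lines. This is only a sketch: the function name goaccess_status is made up for illustration, and the journal lookup relies on the SyslogIdentifier=goaccess setting from the service file.

```shell
# Report whether the goaccess unit is active; on failure, show recent log lines.
goaccess_status() {
    if systemctl is-active --quiet goaccess 2>/dev/null; then
        echo "running"
    else
        echo "not running"
        # Entries are tagged "goaccess" through SyslogIdentifier in the unit.
        journalctl -t goaccess -n 20 --no-pager 2>/dev/null || true
    fi
}
```

When the service keeps restarting, the journal lines printed here usually point at the cause, such as an unreadable log file or a missing GeoIP database.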
Note that the canonical way of retrieving MaxMind GeoIP databases is to use their geoipupdate tool (also available as a package for distributions such as Debian), which takes care of downloading and updating the MaxMind GeoIP databases automatically.
goaccess does not contain a built-in web server, so the generated HTML file has to be served by a web server of choice. In other words, the file defined as /var/www/goaccess/index.html in the systemd service file should be made available through that web server.
While serving /var/www/goaccess/index.html is trivial, some access control should be added in order to keep the file private. In this case, an entire subdomain was dedicated to goaccess: /var/www/goaccess was configured as a virtual-host root and, in order to protect the virtual host, IP restrictions were put in place (although adding authentication would also have been possible).
Opening the browser inspector and finding console errors such as could not connect to wss:// indicates that the WebSocket connection to the goaccess instance cannot be established. In most cases, the error occurs because the TLS/SSL certificates have not been specified in the goaccess configuration.
Editing /etc/goaccess/goaccess.conf allows TLS/SSL certificates to be specified:
# Path to TLS/SSL certificate.
# Note that ssl-cert and ssl-key need to be used to enable TLS/SSL.
#
ssl-cert /etc/letsencrypt/live/website/cert.pem

# Path to TLS/SSL private key.
# Note that ssl-cert and ssl-key need to be used to enable TLS/SSL.
#
ssl-key /etc/letsencrypt/live/website/privkey.pem
In this case, Let's Encrypt certificates have been used, but the snakeoil self-signed certificates should be enough to make the secure WebSocket URL scheme (wss://) work.