About

As is typical for websites, analytics can reveal usage patterns that can then be leveraged to improve the website or its services. Just as typically, most analytics run in the web browser by injecting JavaScript code, images or cookies that track the user on the website, and sometimes beyond the tracked domain.

Most of the time, injecting code for invasive analytics is not needed, and the log files provided by the web server of choice are sufficient to build a good overview of website usage. The server log files contain more or less the same data that invasive analytics would collect, such that it becomes a matter of aggregating the data and making sense of it for a given purpose.
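As a rough illustration of what "aggregating the data" can mean, the following sketch counts requests per path and per status code in a log written in the Common Log Format. The helper names top_paths and status_counts, as well as the sample log lines, are made up for the example and are not part of goaccess, which does this kind of aggregation in far more detail.

```shell
#!/bin/sh
# Hypothetical helper: count requests per path in a Common Log Format file.
# In that format the requested path is the 7th whitespace-separated field.
top_paths() {
    awk '{ count[$7]++ } END { for (p in count) print count[p], p }' "$1" \
        | sort -k1,1nr -k2,2
}

# Hypothetical helper: count responses per HTTP status code (9th field).
status_counts() {
    awk '{ count[$9]++ } END { for (s in count) print count[s], s }' "$1" \
        | sort -k1,1nr -k2,2
}

# Small made-up sample so that the sketch runs without a real server log.
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
192.0.2.1 - - [03/Oct/2024:08:14:00 +0000] "GET /index.html HTTP/1.1" 200 1024
192.0.2.2 - - [03/Oct/2024:08:14:05 +0000] "GET /about.html HTTP/1.1" 200 512
192.0.2.1 - - [03/Oct/2024:08:14:09 +0000] "GET /index.html HTTP/1.1" 200 1024
192.0.2.3 - - [03/Oct/2024:08:15:00 +0000] "GET /missing HTTP/1.1" 404 196
EOF

top_paths "$sample_log"
status_counts "$sample_log"
rm -f "$sample_log"
```

For the sample above, /index.html comes out on top with two requests. On a real server, the same functions could be pointed at an actual access log instead of the temporary file.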

The obvious advantage of server-side analytics is that user browsers are not overloaded with invasive trackers, which are blocked most of the time anyway because people use ad-blockers.

Setup

The setup is going to use Apache as the web-server (but the same procedure applies more or less to NGINX), some software called goaccess (a package that is already provided on the most recent Debian release), and a systemd service file that will be responsible for restarting goaccess.

goaccess is a tool that processes web-server log files and can display a real-time ncurses overview of website accesses on the terminal. It can also generate an HTML file with an embedded WebSockets connection back to goaccess, which a browser can then load in order to display website access metrics in real time.

SystemD Service File

The systemd service file is created at /etc/systemd/system/goaccess.service and has the following contents:

[Unit]
Description=goaccess
After=network.target
 
[Service]
ExecStart=/bin/sh -c '/usr/bin/goaccess /var/log/apache2/*access_log* --log-format=COMMON --real-time-html -o /var/www/goaccess/index.html --geoip-database /usr/share/GeoIP/GeoLite2-Country.mmdb --persist --db-path /var/www/goaccess/'
Restart=always
RestartSec=10
StandardOutput=journal
StandardError=journal
SyslogIdentifier=goaccess
User=root
Group=root
Environment=PATH=/usr/bin/:/usr/local/bin/
WorkingDirectory=/var/www/goaccess
 
[Install]
WantedBy=multi-user.target

where the main attraction is the command that is executed to start goaccess:

/usr/bin/goaccess /var/log/apache2/*access_log* --log-format=COMMON --real-time-html -o /var/www/goaccess/index.html --geoip-database /usr/share/GeoIP/GeoLite2-Country.mmdb --persist --db-path /var/www/goaccess/

where:

  • /var/log/apache2/*access_log* is a glob-path that will make goaccess read all the access logs under /var/log/apache2 (incidentally, the usage of the wildcard makes it necessary to start goaccess using /bin/sh),
  • --log-format=COMMON specifies that the access logs read are in the "common" Apache2 format,
  • --real-time-html will make goaccess generate an HTML file that additionally contains a WebSockets connector to goaccess, which will be running as a daemon and updating the statistics in real time,
  • -o /var/www/goaccess/index.html tells goaccess to create the HTML file at /var/www/goaccess/index.html,
  • --geoip-database /usr/share/GeoIP/GeoLite2-Country.mmdb will make goaccess use the MaxMind GeoIP database in order to map IP addresses to countries,
  • --persist will make goaccess persist its data to disk, with the option --db-path /var/www/goaccess/ selecting the directory where the database files are stored.
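The remark about /bin/sh can be illustrated with a small sketch: systemd does not expand wildcards in ExecStart itself, so without a shell the pattern would reach goaccess as one literal argument. The helper count_args and the file names in the scratch directory are made up for the example and only stand in for the real logs under /var/log/apache2.

```shell
#!/bin/sh
# Hypothetical helper: print how many arguments it received.
count_args() {
    echo "$#"
}

# Scratch directory with made-up file names standing in for /var/log/apache2.
logdir=$(mktemp -d)
touch "$logdir/access_log" "$logdir/access_log.1" "$logdir/ssl_access_log"

# Quoted pattern: passed through literally as a single argument, which is
# what goaccess would receive if it were started without a shell.
count_args "$logdir/*access_log*"

# Unquoted pattern: the shell expands it to one argument per matching file.
count_args "$logdir"/*access_log*

rm -rf "$logdir"
```

The first call reports a single argument (the unexpanded pattern), whereas the second reports one argument per matching file.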

With the systemd service file in place, goaccess can be enabled and started via systemctl:

systemctl enable goaccess
systemctl start goaccess
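After creating or changing a unit file, it is usually also necessary to make systemd re-read its configuration, and it can help to check the service state and its journal afterwards. The following is a minimal sketch of such a sequence. The run wrapper is a hypothetical helper added for the example: it only prints the commands by default and executes them when DRY_RUN=0 is set, which should be done as root.

```shell
#!/bin/sh
# Hypothetical wrapper: print each command by default, and only execute it
# when DRY_RUN=0 is set in the environment.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

run systemctl daemon-reload                  # make systemd re-read unit files
run systemctl enable --now goaccess          # enable at boot and start right away
run systemctl status goaccess --no-pager     # check whether the service is active
run journalctl -u goaccess -n 20 --no-pager  # show the last 20 log lines of the unit
```

Since the unit sets StandardOutput and StandardError to journal, anything goaccess prints (for example a complaint about the log format) should show up in the journalctl output.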

Note that the canonical way of retrieving MaxMind GeoIP databases is to use the geoipupdate tool provided by MaxMind (also available as a package for distributions such as Debian), which takes care of downloading and updating the databases automatically.

Serving the HTML File

goaccess does not contain a built-in web server, so the generated HTML file should be served by a web server of choice. In other words, the file defined as /var/www/goaccess/index.html in the systemd service file should be made available.

While serving /var/www/goaccess/index.html is trivial, note that some access control should be added in order to keep the file private. In this case, an entire subdomain was dedicated to goaccess, with /var/www/goaccess configured as the virtual-host root, and the virtual host was protected with IP restrictions (although adding authentication would also have been possible).

Real-Time Display

If the console of the browser inspector tool shows errors such as could not connect to wss://, then the WebSockets connection to the goaccess instance cannot be established. In most cases, the error is due to TLS/SSL certificates not having been specified in the goaccess configuration.

Editing /etc/goaccess/goaccess.conf allows TLS/SSL certificates to be specified:

# Path to TLS/SSL certificate.
# Note that ssl-cert and ssl-key need to be used to enable TLS/SSL.
#
ssl-cert /etc/letsencrypt/live/website/cert.pem

# Path to TLS/SSL private key.
# Note that ssl-cert and ssl-key need to be used to enable TLS/SSL.
#
ssl-key /etc/letsencrypt/live/website/privkey.pem

In this case, Let's Encrypt certificates have been used, but the snakeoil self-signed certificates should be enough to make the secure WebSockets URL scheme (wss://) work.


web/server-side_analytics_using_goaccess.txt · Last modified: 2024/10/03 08:14 by office
