About

One problem with running software inside isolated containers, particularly given how easily programs can be deployed, is that it is difficult to track the individual errors that occur within a container and that might render the running program defunct. To counter that, Docker allows a healthcheck to be implemented that tests whether the program within the container is running properly. However, the healthcheck is often not rigorous enough: in practice it can only report whether a service port is open or whether some part of the service is functional, and it cannot detect a partial failure of the service.

This section uses Graylog, Elasticsearch and MongoDB to centralize the logs of all containers running within a swarm, such that the logs can be inspected, or grok patterns built, in order to act upon the errors reported by the containers.

Graylog

Graylog is software meant to centralize logs from various sources via different inputs. One such input is the GELF input, which will be used to pass the logs from Docker containers to Graylog. Compared to Logstash, Graylog also has a built-in web interface along with embedded functionality, such as triggering actions depending on log matches, all of which can be configured with just a browser, which makes Graylog preferable to, and more self-contained than, Logstash. Setting up Graylog varies depending on the desired environment: distributions typically package all the necessary requirements and the Graylog developers themselves provide pre-made binary packages for various distributions.

Since the guide will focus on monitoring the containers within a docker swarm, Graylog will be made to run within the swarm itself as well, in order to keep everything together without too many external dependencies.

The following is a Graylog compose file tailored for a single monolithic deployment without data nodes, which should be suitable for most small to medium swarms.

version: '3.8'
services:
  graylog:
    image: graylog/graylog:6.1.2
    user: root
    ports:
      - "5044:5044/tcp"   # Beats
      - "5140:5140/udp"   # Syslog
      - "5140:5140/tcp"   # Syslog
      - "5555:5555/tcp"   # RAW TCP
      - "5555:5555/udp"   # RAW UDP
      - "9000:9000/tcp"   # Server API
      - "12201:12201/tcp" # GELF TCP
      - "12201:12201/udp" # GELF UDP
      #- "10000:10000/tcp" # Custom TCP port
      #- "10000:10000/udp" # Custom UDP port
      - "13301:13301/tcp" # Forwarder data
      - "13302:13302/tcp" # Forwarder config
    volumes:
      - /mnt/docker/data/graylog/data:/usr/share/graylog/data/data
      - /mnt/docker/data/graylog/journal:/usr/share/graylog/data/journal
    environment:
      GRAYLOG_PASSWORD_SECRET: "..."
      GRAYLOG_ROOT_PASSWORD_SHA2: "..."
      GRAYLOG_NODE_ID_FILE: "/usr/share/graylog/data/data/node-id"
      GRAYLOG_HTTP_BIND_ADDRESS: "0.0.0.0:9000"
      GRAYLOG_HTTP_EXTERNAL_URI: "http://localhost:9000/"
      GRAYLOG_MONGODB_URI: "mongodb://...:...@docker.tld/graylog"
      GRAYLOG_ELASTICSEARCH_HOSTS: "http://docker.tld:9200"
      GRAYLOG_ELASTICSEARCH_INDEX_PREFIX: "graylog"
    deploy:
      replicas: 1
      placement:
        max_replicas_per_node: 1

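The following changes have to be made before deploying: the placeholder values in the environment section (GRAYLOG_PASSWORD_SECRET, GRAYLOG_ROOT_PASSWORD_SHA2 and the credentials in GRAYLOG_MONGODB_URI) must be replaced with real values, and the docker.tld hostnames and the /mnt/docker/data/graylog paths must be adjusted to match the local environment.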
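
The two secret placeholders in the Graylog compose file above can be produced in several ways. The following is a minimal sketch in Python, assuming that GRAYLOG_PASSWORD_SECRET is a long random string and that GRAYLOG_ROOT_PASSWORD_SHA2 is the hexadecimal SHA-256 digest of the administrator password. The function names, the length of 96 characters and the "changeme" password are illustrative choices and do not come from this guide.

```python
import hashlib
import secrets
import string


def make_password_secret(length: int = 96) -> str:
    """Return a random alphanumeric string for GRAYLOG_PASSWORD_SECRET."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


def root_password_sha2(password: str) -> str:
    """Return the hex SHA-256 digest for GRAYLOG_ROOT_PASSWORD_SHA2."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print("GRAYLOG_PASSWORD_SECRET:", make_password_secret())
    # "changeme" is a placeholder; substitute the real admin password.
    print("GRAYLOG_ROOT_PASSWORD_SHA2:", root_password_sha2("changeme"))
```

The printed values can then be pasted into the environment section of the compose file in place of the dots.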

MongoDB

MongoDB is a NoSQL database system that Graylog requires in order to run. Unfortunately, since version 5, MongoDB requires the platform it runs on to have a CPU with AVX extensions. That is a surprising requirement for a database, such that an image with MongoDB compiled without AVX is preferable. There are several available, but for this tutorial the l33tlamer/mongodb-without-avx image was chosen at the latest version.

The following is an example compose file for Docker that launches a MongoDB variant with the AVX requirement compiled out.

version: '3.9'
services:
  mongo:
    image: l33tlamer/mongodb-without-avx:6.2.1
    healthcheck:
      test: echo 'db.stats().ok' | mongo localhost:27017/test --quiet
      interval: 10s
      timeout: 10s
      retries: 5
    user: root
    volumes:
      - /mnt/docker/data/mongo/db:/data/db
      - /mnt/docker/data/mongo/configdb:/data/configdb
      - /mnt/docker/data/mongo/init:/docker-entrypoint-initdb.d/:ro
    ports:
      - 27017:27017
    environment:
      - MONGO_INITDB_ROOT_USERNAME=myuser
      - MONGO_INITDB_ROOT_PASSWORD=mypassword
    deploy:
      replicas: 1
      placement:
        max_replicas_per_node: 1

Note that the path /mnt/docker/data/mongo/ must be adjusted to point to some storage space where docker can save the mongo files and that MONGO_INITDB_ROOT_USERNAME and MONGO_INITDB_ROOT_PASSWORD must be adjusted in order to set the root username and password.

In order to create a database to be used with Graylog, the instructions on the MongoDB FUSS page can be followed. Graylog then accesses MongoDB once GRAYLOG_MONGODB_URI in the Graylog compose file is set to a URI built from the username, the password and the database that were created.

mongodb://user:pass@docker.tld/graylog
           ^     ^     ^         ^
           |     |     |         |
           +     |     +         |
       username  |   docker      +
                 |  hostname   database
                 +
             password
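
Usernames and passwords that contain characters such as ":", "/" or "@" would break the layout of the URI shown above, so they have to be percent-encoded first. The following is a small sketch of assembling the URI in Python, assuming the same user, host and database layout as in the diagram. The function name and the sample password are made up for illustration.

```python
from urllib.parse import quote_plus


def graylog_mongodb_uri(user: str, password: str, host: str,
                        database: str = "graylog") -> str:
    """Assemble a mongodb:// URI, percent-encoding the credentials."""
    return (
        f"mongodb://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}/{database}"
    )


if __name__ == "__main__":
    # Sample credentials only; the password contains reserved characters.
    print(graylog_mongodb_uri("user", "p@ss:word/1", "docker.tld"))
```

The resulting string is what goes into GRAYLOG_MONGODB_URI in the Graylog compose file.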

Setting up GELF Input and Modifying Swarm Services

After starting Graylog, a GELF input has to be set up by following the menu System → Inputs and adding a new GELF UDP input. Choosing UDP over TCP avoids connection setup and acknowledgement overhead, at the expense of delivery guarantees: datagrams may be lost without notice. However, in this setup it is more important to pass data quickly than to ensure that every log entry produced is received.
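
Once the GELF UDP input is running, it can be checked independently of Docker by sending a hand-built message to it. The following is a minimal sketch in Python, assuming the input listens on docker.tld port 12201 as in the compose file and accepts uncompressed, unchunked GELF 1.1 JSON. The function names, the test host name and the extra field are illustrative.

```python
import json
import socket
import time


def build_gelf_message(short_message: str, host: str,
                       level: int = 6, **extra) -> bytes:
    """Serialize a GELF 1.1 message; extra fields get a '_' prefix."""
    payload = {
        "version": "1.1",
        "host": host,
        "short_message": short_message,
        "timestamp": time.time(),
        "level": level,  # syslog severity, 6 = informational
    }
    for key, value in extra.items():
        # GELF requires additional fields to start with an underscore.
        payload["_" + key] = value
    return json.dumps(payload).encode("utf-8")


def send_gelf_udp(message: bytes, address: str = "docker.tld",
                  port: int = 12201) -> None:
    """Send one uncompressed, unchunked GELF datagram."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message, (address, port))


# Example (needs a reachable Graylog GELF UDP input):
# send_gelf_udp(build_gelf_message("GELF input test", "test-host",
#                                  tag="manual-test"))
```

If the message shows up in the Graylog search, the input works and any missing container logs are more likely caused by the logging configuration of the services.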

The next step is to edit the compose files of the services so that logging uses the GELF driver and the log output is passed to the Graylog instance. The following is an example of how to accomplish this:

    logging:
      driver: gelf
      options:
        gelf-address: "udp://docker.tld:12201"
        tag: ...

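where docker.tld:12201 is the address of the GELF UDP input published by the Graylog service, and the tag, filled in here with dots, can be pretty much anything. The tag helps identify the service on Graylog, where logs can be observed and acted upon depending on various properties and pattern matching.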

Conclusion

Managing large clusters can be difficult without being able to centrally audit the state of the individual machines and the software running on them. Centralizing logs with Graylog is not a complete solution, because Graylog does not collect any performance statistics about the swarm nodes or the containers themselves. However, Graylog is useful for checking the consistency of the programs running within the containers, which fits the Docker paradigm well: Docker shifts the focus from technology to data, and in this case it is the programs themselves that generate the data.