Docker allocates its internal networks from a set of default private ranges (for instance, 172.17.0.0/16 for the default bridge network and subnets of 10.0.0.0/8 for swarm overlay networks) such that if any such network already exists, just installing Docker and starting a container will interfere with the local networks.
The Watchtower container can be run alongside the other containers; it will automatically monitor the various Docker containers and will then automatically stop the containers, update the image and restart the containers.
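As a minimal sketch, Watchtower (published as the containrrr/watchtower image by the Watchtower project) can be started with the Docker socket mounted so that it is able to manage the other containers:
docker run -d \
    --name watchtower \
    -v /var/run/docker.sock:/var/run/docker.sock \
    containrrr/watchtower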
Similarly, for swarms, Shepherd can be deployed within the swarm in order to update containers; a Wizardry and Steamworks guide on deploying Shepherd exists.
The following command:
docker service ls -q | xargs -n1 docker service update --detach=false --force
will list all the services within the swarm and then force the services to be rebalanced and redistributed between the nodes of the swarm.
After executing the command, all nodes can be checked with:
docker ps
to see which service got redistributed to which node.
One idea is to run the rebalancing command using crontab in order to periodically rebalance the swarm.
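As a sketch, a root crontab entry on a manager node could run the command once per day; the schedule (4am) is an assumption:
0 4 * * * docker service ls -q | xargs -n1 docker service update --detach=false --force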
Typically, to open a console within a running container, the user would write:
docker exec -it CONTAINER bash
where:
CONTAINER
is the name or hash of the container to start "bash" within.
However, given that containers are distributed in a swarm, one should first locate the node on which the container is running by issuing:
docker service ps SERVICE
where:
SERVICE
is the name of a service running in the swarm.
The output will display in one of the columns the current node that the container is executing on. Knowing the node, the shell of the node has to be accessed and then the command:
docker ps
can be used to retrieve the container ID (first column).
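When scripting this step, the container ID can also be retrieved directly by filtering on the container name; a small sketch, with CONTAINER as a placeholder:
docker ps --filter name=CONTAINER --format "{{.ID}}"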
Finally, the console can be started within the distributed container by issuing:
docker exec -it CONTAINER_ID sh
where:
CONTAINER_ID
is the ID of the container running in the swarm on the local node.
To push an image to a private registry, the syntax is as follows:
docker login <REGISTRY_HOST>:<REGISTRY_PORT>
docker tag <IMAGE_ID> <REGISTRY_HOST>:<REGISTRY_PORT>/<APPNAME>:<APPVERSION>
docker push <REGISTRY_HOST>:<REGISTRY_PORT>/<APPNAME>:<APPVERSION>
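As an illustration, with a hypothetical registry at registry.example.com:5000, an image ID of 0e5574283393 and an application named myapp at version 1.0, the sequence would be:
docker login registry.example.com:5000
docker tag 0e5574283393 registry.example.com:5000/myapp:1.0
docker push registry.example.com:5000/myapp:1.0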
If a worker cannot find the swarm manager when it starts up then, at the time of writing, Docker is made to terminate. This is problematic because the manager might come online after a while, such that the workers should just wait and retry the connection.
On some Linux distributions, such as Debian, Docker is started via a service file located at /lib/systemd/system/docker.service and it can be copied to /etc/systemd/system with some modifications in order to make SystemD restart Docker if it terminates. On Debian, the service file is missing the RestartSec configuration line, such that it should be added to /etc/systemd/system/docker.service after being copied. Here is the full service file with the added line:
[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
After=network-online.target docker.socket firewalld.service containerd.service
Wants=network-online.target containerd.service
Requires=docker.socket

[Service]
Type=notify
# the default is not to use systemd for cgroups because the delegate issues still
# exists and systemd currently does not support the cgroup feature set required
# for containers run by docker
EnvironmentFile=-/etc/default/docker
ExecStart=/usr/sbin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock $DOCKER_OPTS
ExecReload=/bin/kill -s HUP $MAINPID
LimitNOFILE=1048576
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNPROC=infinity
LimitCORE=infinity
# Uncomment TasksMax if your systemd version supports it.
# Only systemd 226 and above support this version.
TasksMax=infinity
TimeoutStartSec=0
# set delegate yes so that systemd does not reset the cgroups of docker containers
Delegate=yes
# kill only the docker process, not all processes in the cgroup
KillMode=process
# restart the docker process if it exits prematurely
Restart=on-failure
StartLimitBurst=3
StartLimitInterval=60s
RestartSec=10s

[Install]
WantedBy=multi-user.target
With this change, SystemD will try to bring Docker back up every 10 seconds after it has failed. Unfortunately, this fix has to be applied on all nodes in a Docker swarm.
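As a sketch, the procedure on Debian amounts to the following steps (using the paths described above):
cp /lib/systemd/system/docker.service /etc/systemd/system/docker.service
# edit /etc/systemd/system/docker.service and add RestartSec=10s to the [Service] section
systemctl daemon-reload
systemctl restart docker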
Unfortunately, services do not spread evenly across the swarm, such that rebalancing is necessary. In fact, the strategy of distributing services across the swarm is surprisingly bad, with the manager node taking most of the services upon itself and very few being left over for the last node of the swarm. It seems Docker spreads services with a bucket-fill strategy, where services are only spread out if the current node is deemed somehow full.
Irrespective of the lack of a strategy, here is one constructed command:
docker service ls | \
    awk '{ print $1 }' | \
    tail -n +2 | \
    xargs docker service ps --format "{{.Node}}" --filter "desired-state=running" | \
    awk ' { node[$0]++ } END { for (i in node) print node[i] } ' | \
    awk '{ x += $0; y += $0 ^ 2 } END { print int(sqrt( y/NR - (x/NR) ^ 2)) }' | \
    xargs -I{} test {} -gt 2 && \
    docker service ls -q | xargs -n1 docker service update --detach=false --force
that performs the following operations in order:
1. lists all the services within the swarm,
2. extracts the service ID (the first column) and strips the header line,
3. prints, for every running task of every service, the node it is placed on,
4. counts the number of running tasks per node,
5. computes the integer standard deviation of the task counts across nodes,
6. checks whether the standard deviation exceeds 2 and, only if it does, forces every service to update, thereby redistributing the tasks across the nodes.
In other words, the distribution strategy imposed on the cluster is to place an equal share of services on each available node.
Intuitively, the command can be placed in a cron script and, compared to just calling the swarm redistribution command, the script should have no effect when the services are distributed evenly across the nodes due to the standard deviation falling well under 2 (with 0 being the theoretical value of the standard deviation when the services are evenly spread out).
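A minimal sketch of such a cron script, using the same threshold of 2 as the command above (the path /etc/cron.hourly/swarm-rebalance is an assumption):
#!/usr/bin/env bash
# hypothetical /etc/cron.hourly/swarm-rebalance
# compute the integer standard deviation of running tasks per node
DEVIATION=$(docker service ls -q | \
    xargs docker service ps --format "{{.Node}}" --filter "desired-state=running" | \
    awk '{ node[$0]++ } END { for (i in node) print node[i] }' | \
    awk '{ x += $0; y += $0 ^ 2 } END { print int(sqrt(y/NR - (x/NR) ^ 2)) }')
# only rebalance when the services are spread unevenly
if [ "${DEVIATION:-0}" -gt 2 ]; then
    docker service ls -q | xargs -n1 docker service update --detach=false --force
fi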
Some packages have to be compiled manually, such that it is beneficial to use a distributed compiler in order to spread the compilation workload across multiple computers. However, the setup should be flexible enough to also cover the edge case when a distributed compiler is not available.
To that end, here is a Dockerfile that is meant to define some variables such that "distcc" will be used to distribute the compilation across a range of computers:
FROM debian:latest AS builder

# define compilation variables
ARG DISTCC_HOSTS=""
ARG CC=gcc
ARG CXX=g++

# install required packages
RUN apt-get --assume-yes update && apt-get --assume-yes upgrade && \
    apt-get --assume-yes install \
        build-essential \
        gcc \
        g++ \
        automake \
        distcc

# ... compile ...
RUN DISTCC_HOSTS="${DISTCC_HOSTS}" CC=${CC} CXX=${CXX} make
and the invocation will be as follows:
docker build \
    -t TAG \
    --build-arg DISTCC_HOSTS="a:35001 b:35002" \
    --build-arg CC=distcc \
    --build-arg CXX=distcc \
    .
where:
TAG
is a tag to use for the build (can be used to upload to a registry),
DISTCC_HOSTS, CC and CXX
are the build arguments setting the compiler to distcc and the hosts to be used to compile (in this case, computers a and b, listening on ports 35001 and 35002 respectively).
If you would like a ready-made container for distcc, you can use the Wizardry and Steamworks build.
Even though multiple replicas of a container can exist on the same system or be spread out through a swarm, due to the nature of TCP/IP a single port can only be bound by one process at a time, such that when starting a series of clones of a program, there must exist a way to specify a port range or a series of ports for each instance of the program being launched.
The syntax is as follows:
START_PORT-END_PORT:CONTAINER_PORT
where:
START_PORT and END_PORT
delimit a range from a starting port to an ending port out of which the clones of the program will select their listening host port and,
CONTAINER_PORT
represents the port of the program running within the container that will be exposed.
Interestingly, this feature does not work as expected: whilst the ports will be used for all nodes within the swarm for all replicas of the service, all ports will be replicated by all nodes, such that accessing one port within the port range successively will lead to a service on a different node within the Docker swarm. If stickiness is desired, the current solution at the time of writing is to either use jwilder/nginx-proxy or to just declare multiple services of the same image with the constraints set appropriately for each node in the swarm.
Depending on the application, in some rare cases containers must be restarted periodically. For example, the Invidious documentation states that Invidious should be restarted at least once per day or Invidious will stop working. There are multiple ways to accomplish that, for instance by using the system scheduler, such as cron on Linux, but the most compact seems to be to use the docker:cli image and trigger a restart of the service. For example, the following additional service can be added to the Invidious compose file in order to restart Invidious at 8pm:
invidious-restarter:
  image: docker:cli
  restart: unless-stopped
  volumes: ["/var/run/docker.sock:/var/run/docker.sock"]
  entrypoint: ["/bin/sh","-c"]
  command:
    - |
      while true; do
        if [ "$$(date +'%H:%M')" = '20:00' ]; then
          docker restart invidious
        fi
        sleep 60
      done
When running under a swarm, it gets a little more complicated because services can only be controlled from manager nodes, such that the supplementary service has to be deployed only on manager nodes in order to restart the service. Here is the modified snippet:
invidious-restarter:
  image: docker:cli
  restart: unless-stopped
  volumes: ["/var/run/docker.sock:/var/run/docker.sock"]
  entrypoint: ["/bin/sh","-c"]
  deploy:
    replicas: 1
    placement:
      constraints:
        - node.role == manager
  command:
    - |
      while true; do
        if [ "$$(date +'%H:%M')" = '20:00' ]; then
          docker service ls --filter name=general_invidious --format "{{.ID}}" | \
            head -n 1 | \
            xargs -I{} docker service update --force --with-registry-auth "{}"
        fi
        sleep 60
      done
that will make sure that the general_invidious service will be restarted every day at 8pm.
Here are some useful changes to mitigate various issues with running a Docker swarm.
Docker does not take the amount of RAM available per machine into account, such that any process distributed to a machine that ends up consuming more RAM than the machine has available will simply use up all the RAM on that machine.
The typical Linux mitigation is the "Out of Memory (OOM) killer", a kernel process that monitors processes and kills off a process as a last resort in order to prevent the machine from crashing. Unfortunately, the Linux OOM killer has a bad reputation, either by firing too late, when the machine is already too hosed to even kill a process, or by "really being the last resort", meaning that the OOM killer is not efficient at picking the right process to kill and waits too long while heavy processes (desktop environment, etc.) are already running.
The following packages can be used to add an additional OOM killer to systems within a Docker swarm, all of these being userspace daemons: systemd-oomd, oomd or earlyoom.
Furthermore, the following sysctl parameter:
vm.oom_kill_allocating_task=1
when added to the system sysctl configuration, will make Linux kill the process allocating the RAM that would overcommit, instead of using heuristics and picking some other process to kill, which makes the most sense for the described Docker swarm scenario.
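A sketch of applying the parameter, both immediately and persistently across reboots (the file name under /etc/sysctl.d is an assumption):
# apply immediately
sysctl -w vm.oom_kill_allocating_task=1
# persist across reboots
echo 'vm.oom_kill_allocating_task=1' > /etc/sysctl.d/99-oom-allocating-task.conf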
The following script was written in order to query the currently running containers on a machine running Docker, then create a directory and, within that directory, write PID files containing the PIDs of the services being run within the Docker containers.
The script was used for monitoring services on multiple machines in a Docker swarm, where it was found necessary to retrieve the PIDs of the services within a Docker container without breaking container isolation.
#!/usr/bin/env bash
###########################################################################
##  Copyright (C) Wizardry and Steamworks 2024 - License: MIT           ##
###########################################################################

# path to the swarm state directory where PID files will be stored
STATE_DIRECTORY=/run/swarm

if [ ! -d $STATE_DIRECTORY ]; then
    mkdir -p $STATE_DIRECTORY
fi

DOCKER_SWARM_SERVICES=$(docker container ls --format "{{.ID}}" | \
    xargs docker inspect -f '{{.State.Pid}} {{(index .Config.Labels "com.docker.stack.namespace")}} {{(index .Config.Labels "com.docker.swarm.service.name")}}')

while IFS= read -r LINE; do
    read -r PID NAMESPACE FULLNAME <<< "$LINE"
    IFS='_' read -r NAMESPACE NAME <<< "$FULLNAME"
    PIDFILE="$STATE_DIRECTORY/$NAME"".pid"
    # create the PID file if it does not exist yet
    if [ ! -f "$PIDFILE" ]; then
        echo $PID >"$PIDFILE"
        continue
    fi
    # update the PID file only when the PID has changed
    test $(cat "$PIDFILE") -eq $PID || \
        echo $PID >"$PIDFILE"
done <<< "$DOCKER_SWARM_SERVICES"
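The script could then be run periodically, for instance from the root crontab on every machine; the installation path below is an assumption:
* * * * * /usr/local/bin/swarm-pids.sh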
It seems that the Docker logs command will print out the logs on stderr, such that piping the output to grep or other tools will not work properly. In order to make piping work, stderr has to be redirected to stdout and then piped to whatever tool needs to be used:
docker service logs --follow general_mosquitto 2>&1 | grep PING
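Equivalently, recent versions of bash provide the |& shorthand that pipes both stdout and stderr to the next command:
docker service logs --follow general_mosquitto |& grep PING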