This tutorial demonstrates how to use Linux load-balancing and high-availability to distribute services amongst Docker containers in a swarm, allowing more fine-grained control. The LVS topology chosen is direct routing (DR), with the swarm nodes answering requests themselves without needing to go back through the LVS director.
Here is an overview of a typical layout where caddy stands in front of a Docker swarm with services defined on multiple machines:
              | connection
              v
        +-----+-----+
        |   caddy   |
        +-----+-----+
              |
    +---------+---------+
    |         |         |
......................................
    |         |         |
+---+---+ +---+---+ +---+---+
|  s_1  | |  s_2  | |  s_3  |  ... Docker swarm
+-------+ +-------+ +-------+
......................................
The corresponding caddy configuration is something along the lines of:
*.domain.duckdns.org, domain.duckdns.org {
    ...
    handle @service_1 {
        reverse_proxy docker.lan:8000
    }
    handle @service_2 {
        reverse_proxy docker.lan:8001
    }
    ...
}
where services such as service_1 are reverse-proxied into the swarm at docker.lan on the different ports that they listen on.
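The @service_1 and @service_2 matchers are elided in the snippet above; as an assumption about how the services are distinguished, they could be host matchers defined inside the same site block, one subdomain per service:

@service_1 host service_1.domain.duckdns.org
@service_2 host service_2.domain.duckdns.org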
Now, the docker address docker.lan can be defined with some internal DNS server using A records for each node in the swarm:
docker.lan.    A    192.168.1.81
docker.lan.    A    192.168.1.82
docker.lan.    A    192.168.1.83
such that upon every DNS lookup of the docker.lan host, one of the three IP addresses is returned in round-robin fashion.
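The round-robin behaviour can be checked from any machine that uses the internal DNS server, for instance with dig:

# successive queries should return the three A records in rotating order
dig +short docker.lan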
One of the problems with this setup is that DNS responses end up cached, such that accesses into the Docker swarm occur predominantly over one of the IP addresses instead of being spread out across the entire set of IP addresses.
If one changes the caddy configuration to:
*.domain.duckdns.org, domain.duckdns.org {
    ...
    handle @service_1 {
        reverse_proxy 192.168.1.81:8000
    }
    handle @service_2 {
        reverse_proxy 192.168.1.81:8001
    }
    ...
}
where all accesses go to 192.168.1.81, the problem remains the same, even if Docker ensures that the request will internally be distributed to the appropriate node within the Docker swarm.
A different topology will be the following:
              | connection
              v
        +-----+-----+
        |   caddy   |
        +-----+-----+
              |
        +-----+-----+
        |   IPVS    |
        +-----+-----+
              |
    +---------+---------+
    |         |         |
......................................
    |         |         |
+---+---+ +---+---+ +---+---+
|  s_1  | |  s_2  | |  s_3  |  ... Docker swarm
+-------+ +-------+ +-------+
......................................
where Linux IPVS functions as a load-balancer meant to spread out traffic among the nodes s_1 through to s_3.
The following configuration will be implemented:
                                | connection
                                v
                          +-----+-----+
                          |   caddy   |
                          +-----+-----+
                                |
                          +-----+-----+
                          |   IPVS    | VIP: 192.168.1.100
                          +-----+-----+
                                |
    +---------------------------+--------------------------+
    |                           |                          |
...............................................................................
    |                           |                          |
    | RIP: 192.168.1.101        | RIP: 192.168.1.102       | RIP: 192.168.1.103
    | VIP (lo:0): 192.168.1.100 | VIP (lo:0): 192.168.1.100| VIP (lo:0): 192.168.1.100
    |                           |                          |
+---+---+                   +---+---+                  +---+---+
|  s_1  |                   |  s_2  |                  |  s_3  |  ...
+-------+                   +-------+                  +-------+
...............................................................................
where:

- 192.168.1.100 is a virtual IP created on the director,
- 192.168.1.101, 192.168.1.102 and 192.168.1.103 are the real IP addresses of the nodes within the Docker swarm,
- 192.168.1.100 is additionally set on every swarm node using an interface alias (lo:0 in the diagram above).
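The virtual IP on the director itself is not persisted for this test either; it can be created as an address on the director's LAN interface, where eth0 below is only an assumed interface name:

# add the VIP on the director; replace eth0 with the director's actual LAN interface
ip addr add 192.168.1.100/32 dev eth0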
Just for testing, the setup will be created using command line tools without any sort of persistence.
ipvsadm needs to be installed first; on Debian, the command would be:
apt-get install ipvsadm
following which, the next commands:
ipvsadm -A -f 80 -s lc
ipvsadm -a -f 80 -r 192.168.1.101 -g
ipvsadm -a -f 80 -r 192.168.1.102 -g
ipvsadm -a -f 80 -r 192.168.1.103 -g
will:

- create a virtual service identified by firewall mark 80, scheduled with the least-connection algorithm (-s lc),
- add the three swarm nodes as real servers for that service, using direct routing (-g) so that the nodes answer clients directly instead of replying back through the director.
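Because the virtual service is keyed on firewall mark 80 rather than on an address and port, packets destined for the VIP also have to be marked before IPVS sees them. A minimal sketch of the marking rule, assuming the mark is set in the mangle table, would be:

# mark all traffic addressed to the VIP with 80 so that IPVS picks it up
iptables -t mangle -A PREROUTING -d 192.168.1.100 -j MARK --set-mark 80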
On the director, the iptables marking line can be made permanent, such that the rule is restored on reboot, by installing the iptables-persistent package (which pulls in netfilter-persistent). Following the example, the line that must be added to /etc/iptables/rules.v4 is the following:
-A PREROUTING -d 192.168.1.100 -j MARK --set-mark 80
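Note that rules.v4 is in iptables-save format, so the line has to sit inside the block of the table it belongs to; a minimal sketch of the relevant section, assuming the mark is set in the mangle table as above, would look like:

*mangle
:PREROUTING ACCEPT [0:0]
-A PREROUTING -d 192.168.1.100 -j MARK --set-mark 80
COMMIT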
Finally, the IPVS configuration can be made persistent by using ipvsadm-save to dump the rules and ipvsadm-restore to restore them.
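For instance, the running table can be dumped to a file and loaded back after a reboot; the path below is the one the Debian init script is commonly configured to read, which is a packaging assumption worth verifying locally:

# dump the current IPVS table in numeric form and restore it later
ipvsadm-save -n > /etc/ipvsadm.rules
ipvsadm-restore < /etc/ipvsadm.rules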
On each node in the swarm, the virtual IP has to be assigned to a virtual interface to make sure that the node will accept and answer requests addressed to the virtual IP. The loopback interface can be used:
ifconfig lo:0 192.168.1.100 netmask 255.255.255.255 -arp up
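Since Linux will by default answer ARP requests for any local address on any interface, LVS-DR setups commonly also restrict ARP handling so that the swarm nodes do not advertise the VIP themselves and all VIP traffic keeps flowing through the director:

# typical LVS-DR companion settings: ignore ARP requests for addresses
# held on lo and avoid announcing them from other interfaces
sysctl -w net.ipv4.conf.all.arp_ignore=1
sysctl -w net.ipv4.conf.all.arp_announce=2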
In order to check that the different nodes are being accessed, the ipvsadm command can be issued, perhaps with the -lcn flags set in order to list established and pending connections.
The command:
ipvsadm -lcn
should show something along the lines of:
IPVS connection entries
pro expire state       source            virtual             destination
TCP 00:22  SYN_RECV    x.x.x.x:31847     192.168.1.100:1883  192.168.1.101:1883
TCP 14:48  ESTABLISHED y.y.y.y:63330     192.168.1.100:1883  192.168.1.101:1883
TCP 00:53  SYN_RECV    z.z.z.z:19167     192.168.1.100:1883  192.168.1.102:1883
where you can see various source connections routed through the virtual IP address 192.168.1.100 to the Docker swarm destination nodes 192.168.1.101 and 192.168.1.102.
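Besides the connection entries, the service table itself can be listed in order to confirm that all three real servers are registered and to watch the per-server counters:

# list the virtual services and their real servers numerically
ipvsadm -ln
# the same listing together with packet and byte statistics
ipvsadm -ln --stats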
On Debian, the interface configurations can be created in /etc/network/interfaces or defined in a separate file in /etc/network/interfaces.d.
The interface configuration defining the virtual IP address on the loopback interface, which must be present on all Docker swarm nodes, is the following:
auto lo:0
iface lo:0 inet static
    address 192.168.1.100
    netmask 255.255.255.255
keepalived can be used to achieve high-availability by checking that the Docker swarm nodes are still reachable and, when one of them is not, redirecting traffic to the other Docker swarm nodes.
A very simple configuration based on the firewall marking example provided by keepalived can be seen here:
global_defs {
    router_id io
}

virtual_server fwmark 80 {
    delay_loop 6
    lb_algo lc
    lb_kind DR

    real_server 192.168.1.101 0 {
        weight 1
        MISC_CHECK {
            misc_path '"/usr/bin/ping -c 3 192.168.1.101"'
            misc_timeout 5
            warmup 5
        }
    }

    real_server 192.168.1.102 0 {
        weight 1
        MISC_CHECK {
            misc_path '"/usr/bin/ping -c 3 192.168.1.102"'
            misc_timeout 5
            warmup 5
        }
    }

    real_server 192.168.1.103 0 {
        weight 1
        MISC_CHECK {
            misc_path '"/usr/bin/ping -c 3 192.168.1.103"'
            misc_timeout 5
            warmup 5
        }
    }
}
The configuration checks each real server every six seconds (delay_loop 6) by sending three ICMP echo requests; if a node does not respond, it is taken out of the pool and subsequent connections are redirected to the remaining nodes that are still alive.
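As a rough sketch of how this is wired up on the director, the configuration goes into keepalived's default location and the daemon is restarted; the real servers that pass the check should then show up in the IPVS table:

# install the configuration and restart the daemon
cp keepalived.conf /etc/keepalived/keepalived.conf
systemctl restart keepalived
# healthy real servers should appear under the fwmark 80 virtual service
ipvsadm -ln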