Recovering from a Split-Brain Condition

Here is an overview of a split-brain situation in which the storage ended up desynchronized between two nodes that were both primary.

On one node the contents of /proc/drbd are:

version: 8.4.11 (api:1/proto:86-101)
srcversion: 96ED19D4C144624490A9AB1
 0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:71590684

and on the other node, the contents of /proc/drbd are:

version: 8.4.11 (api:1/proto:86-101)
srcversion: 96ED19D4C144624490A9AB1
 0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r-----
    ns:0 nr:0 dw:98114 dr:537665 al:266 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:33818816

indicating that both nodes are disconnected (cs:StandAlone), each considers itself Primary with an up-to-date disk while knowing nothing about its peer (ro:Primary/Unknown, ds:UpToDate/DUnknown), and each reports a large out-of-sync count (oos), so the two replicas have diverged.

Split-brain conditions are also seen in the kernel log:

[  141.442715] block drbd0: helper command: /sbin/drbdadm split-brain minor-0
[  141.452923] block drbd0: helper command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)
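
To look for these messages after the fact, the kernel ring buffer or the systemd journal can be searched (which one is available depends on the distribution):

dmesg | grep -i split-brain
journalctl -k | grep -i split-brain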

You can picture this as a timeline that forks (similar to git branches): at some point each replica started accumulating writes that were never synchronized to the other, so the only way to make the data coherent again is to return to a state before the fork. With DRBD this means choosing a split-brain victim: one node is forced to secondary and told to discard its divergent data, while the other node is made primary and reconnected so that it can resynchronize the victim.

First, start DRBD on both nodes at the same time by issuing:

systemctl restart drbd
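
After the restart, it is worth confirming that both resources are still standing alone before demoting anything; for example, the connection state and the current role can be queried with:

drbdadm cstate all
drbdadm role all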

after that, on the node whose data is to be discarded (the split-brain victim), execute:

drbdadm secondary all
drbdadm disconnect all
drbdadm -- --discard-my-data connect all

in order to demote the node to secondary and make it discard its own data in favor of the peer it connects to next.
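
The drbd-overview helper shipped with drbd-utils gives a compact one-line summary per resource and, assuming it is installed, is a convenient way to confirm that the node is now Secondary and waiting for its peer:

drbd-overview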

On the other node, issue:

drbdadm primary all
drbdadm disconnect all
drbdadm connect all

in order to make the node primary and reconnect it to the peer that was demoted and told to discard its data.
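
Once both nodes have run their respective commands, the connection should be re-established and resynchronization should start automatically, with this node acting as the source. The per-side disk states can also be queried directly instead of parsing /proc/drbd:

drbdadm dstate all

During the resync, the source reports UpToDate/Inconsistent and the target Inconsistent/UpToDate, matching the outputs below.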

The resynchronization process can be observed by looking at /proc/drbd on both nodes.

On one node:

version: 8.4.11 (api:1/proto:86-101)
srcversion: 96ED19D4C144624490A9AB1
 0: cs:SyncTarget ro:Primary/Primary ds:Inconsistent/UpToDate C r-----
    ns:1430150 nr:2067540 dw:2703946 dr:361936 al:355 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:75028412
        [>....................] sync'ed:  2.1% (73268/74812)M
        finish: 42:27:15 speed: 476 (1,880) want: 680 K/sec

and then on the other:

version: 8.4.11 (api:1/proto:86-101)
srcversion: 96ED19D4C144624490A9AB1
 0: cs:SyncSource ro:Primary/Primary ds:UpToDate/Inconsistent C r-----
    ns:2075474 nr:1430511 dw:1898629 dr:1820768 al:137 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:75025100
        [>....................] sync'ed:  2.1% (73264/74812)M
        finish: 35:55:53 speed: 568 (1,872) K/sec

it is also possible to watch the progress at 1-second intervals by issuing watch -n 1 cat /proc/drbd on both nodes.
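
For unattended monitoring, a minimal shell loop along the following lines can poll the progress line until it disappears; the pattern assumes the 8.4 /proc/drbd format shown above:

# Print the progress line once a minute until the resync completes,
# at which point the line disappears from /proc/drbd.
while grep -q "sync'ed" /proc/drbd; do
    grep "sync'ed" /proc/drbd
    sleep 60
done
echo "resynchronization complete"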

If the setup is dual-primary, it is recommended to wait for the resynchronization process to complete before making both nodes primary again.
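
Once /proc/drbd reports UpToDate/UpToDate on both sides, the demoted node can be promoted again; this assumes the resource configuration already permits two primaries (the allow-two-primaries option in the resource's net section). On the previously demoted node:

drbdadm primary all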