Even though libvirt offers many ways to perform virtual machine backups, the main problem with libvirt backups, and with backups in general, is that a backup does not conceptually imply fast restore and deployment. In fact, a "good backup device" would be a slow, tape-drive-like machine that performs incremental backups and tombstones them in case they are needed in the far future. Where libvirt is concerned, backups made by copying files, either via the traditional disk dumper tool (dd) or by transferring QCOW images, take a very long time to create and roughly the same amount of time to restore.
Restoring a backup implies downtime during which services are unavailable until the backup is fully restored and the virtual machine is booted up again. One alternative is to run servers in parallel for redundancy, so that when one machine fails the other can take over without any perceptible downtime, while a backup always exists and is ready to be deployed. Taking snapshots and restoring them on the same machine is trivial; the extra difficulty is to transparently maintain a parallel server across the network that will be ready to go in case a virtual machine fails.
One solution is to use "Distributed Replicated Block Device" (DRBD) on Linux that acts on the block layer and can make sure that block-level mirrors of the current hard-drive are maintained on a network. There are several drawbacks using DRBD that are implied:
Another solution is to leverage libvirt's ability to create external disk-only snapshots, such that writes are redirected to an overlay file while the base image stays quiescent, and then use the Network Block Device (NBD) to remotely attach to the virtual machine's block device and clone it. This can be performed periodically; in between, the machines that copy over the virtual machine can hibernate.
A diagram of the setup is as follows:
The host runs multiple virtual machines labeled dom_1, dom_2, etc. and exports the virtual machine block devices (or QCOW file-based virtual machines) over the network via an NBD server. A client clone_1 connects via an NBD client to the block device of dom_1, at which point clone_1 copies over the entire block device using the disk dumper tool (dd) or similar.
For this process to be transparent, the virtual machines must not be shut down; instead, the block commit feature of libvirt will be used. It is assumed that clone_1 will shut down its own operations on the block device or file that it uses to run the virtual machine.
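For reference, the snapshot and block commit cycle that will later be automated by the NBD server scripts looks roughly as follows when run by hand (the domain name mydomain, the disk hda and the snapshot path are just examples matching the configuration further below):

# redirect writes of disk hda to an external overlay file (disk-only snapshot)
virsh snapshot-create-as --domain mydomain mydomain-nbd-snap \
    --diskspec hda,file=/var/lib/libvirt/snapshots/mydomain-nbd-snap \
    --disk-only --atomic --no-metadata
# ... the base image can now be copied safely while the domain keeps running ...
# merge the overlay back into the base image and pivot the domain back to it
virsh blockcommit mydomain hda --active --pivot
# remove the now-unused overlay file
rm /var/lib/libvirt/snapshots/mydomain-nbd-snap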
A protocol diagram would be as follows:
where the following operations will take place in order:
* the client connects to the NBD server and requests the export corresponding to a virtual machine,
* before granting access, the NBD server creates a disk-only snapshot of the virtual machine via the prerun script, such that writes are redirected to an overlay file,
* the client copies the now-quiescent block device over the network,
* the client disconnects, at which point the NBD server commits the overlay back into the base image via the postrun script and removes the snapshot file,
with the following remarks:
* in case the snapshot cannot be created by the prerun script, the NBD server does not allow the connection, in order to prevent copying a block device that is currently in use.

The setup is remarkably simple for what it does. The only requirements are that the server runs an NBD server and that the clients install the NBD client.
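On a Debian-based system (an assumption - package names may differ on other distributions), this amounts to installing two packages:

# on the host running the virtual machines
apt-get install nbd-server
# on every machine that will clone block devices over the network
apt-get install nbd-client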
NBD relies on the nbd kernel module, which is not loaded automatically, so it has to be added to the list of modules loaded at boot. In order to do that, edit /etc/modules and add nbd on a line at the end of the file. Next, load the NBD module manually via modprobe nbd.
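A minimal sketch of the commands, assuming a Debian-like system where /etc/modules is read at boot:

# load the nbd kernel module immediately
modprobe nbd
# make sure the module is loaded again after a reboot
echo nbd >> /etc/modules
# verify that the module is present
lsmod | grep nbd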
Next, set up the server by editing /etc/nbd-server/config and adjusting the configuration to export a libvirt block device (this can also be a file):
[generic]
# If you want to run everything as root rather than the nbd user, you
# may either say "root" in the two following lines, or remove them
# altogether. Do not remove the [generic] section, however.
# user = nbd
# group = nbd
includedir = /etc/nbd-server/conf.d

# What follows are export definitions. You may create as much of them as
# you want, but the section header has to be unique.

[mydomain]
exportname = /dev/mapper/vms-mydomain
readonly = true
prerun = /usr/bin/virsh snapshot-create-as --domain "mydomain" "mydomain-nbd-snap" --diskspec "hda",file="/var/lib/libvirt/snapshots/mydomain-nbd-snap" --disk-only --atomic --no-metadata --quiesce
postrun = /bin/sh -c '/usr/bin/virsh blockcommit "mydomain" "hda" --active --pivot && rm /var/lib/libvirt/snapshots/mydomain-nbd-snap'
where:
* the user and group lines referencing nbd have been commented out such that nbd-server may run with root permissions in order to access the block devices,
* mydomain is a libvirt domain,
* [mydomain] is the export section, where mydomain is the name of the export,
* exportname = /dev/mapper/vms-mydomain is the path to the libvirt virtual machine block device - this can be an LVM LV,
* mydomain-nbd-snap is a snapshot file that will be created under /var/lib/libvirt/snapshots and will be used by the virtual machine to store changes whilst the block device is copied over the network,
* hda is the device name, local to the virtual machine, of the disk to be snapshotted,
* the prerun script will be executed whenever an NBD client connects, but before yielding the block device to the client; in case the exit status of the command passed to prerun is non-zero, the NBD server will not grant access to the client - in this case, if libvirt fails to create a snapshot, then the whole process is aborted,
* the postrun script is executed after a client issues a disconnect from the NBD server; in this case, two operations are performed sequentially, iff. the operations also succeed sequentially:
  * the changes accumulated in the snapshot file are block-committed back into the base image and the domain pivots back to its original block device,
  * the snapshot file is removed.

An additional remark is that after the client disconnects, iff. the snapshot file has not been removed (either because the block commit failed or because removing the file failed), then when a client connects again the snapshot file will not be overwritten and the prerun command will fail, such that the client will not be granted access.
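Should that happen, the overlay has to be merged back and the file removed by hand before the export becomes usable again; one way to recover manually mirrors what the postrun script does, using the names from the example configuration above:

# merge the overlay back into the base image and pivot the domain back to it
virsh blockcommit mydomain hda --active --pivot
# check that the domain points at its original block device again
virsh domblklist mydomain
# remove the stale snapshot overlay
rm /var/lib/libvirt/snapshots/mydomain-nbd-snap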
In order to clone the remote machine's block device, the client may perform the following operations in order:
# Connect to the server and request to map the mydomain export to /dev/nbd0.
# The server's prerun script will be executed at this point on the server.
nbd-client server.tld -N mydomain /dev/nbd0

# Transfer over the block device.
dd if=/dev/nbd0 of=...

# Disconnect from the NBD server.
# The server's postrun script will be executed at this point on the server.
nbd-client -d /dev/nbd0
where:
* server.tld is the machine hosting the virtual machines and the NBD server,
* mydomain is an NBD export,
* /dev/nbd0 is the local NBD device to which the remote block device will be mapped.

The operations can be performed whenever the client feels like updating its block device from the server, such that the commands may be run periodically via cron at established times.
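As an illustration of running the copy periodically, the commands above can be wrapped in a small script and triggered from cron; the following is only a sketch, where the script path, the destination device and the schedule are assumptions:

#!/bin/sh
# /usr/local/sbin/clone-mydomain.sh (hypothetical path)
# map the remote export locally; the server's prerun script creates the snapshot
nbd-client server.tld -N mydomain /dev/nbd0
# copy the block device onto a local volume (assumed destination)
dd if=/dev/nbd0 of=/dev/mapper/vms-mydomain bs=1M
# disconnect; the server's postrun script commits and removes the snapshot
nbd-client -d /dev/nbd0

which could then be scheduled via a crontab entry such as 0 3 * * * /usr/local/sbin/clone-mydomain.sh in order to refresh the local copy every night at 03:00.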