About

Even though libvirt provides many ways to perform virtual machine backups, the main problem with libvirt backups, and with backups in general, is that a backup does not conceptually imply fast restore and deployment. In fact, a "good backup device" would be a slow, tape-drive-like machine that performs incremental backups and tombstones them in case they are ever needed in the far future. As far as libvirt is concerned, backups made by copying files, whether via the traditional disk dumper tool (dd) or by transferring QCOW images, take a long time to create and roughly the same amount of time to restore.

Restoring a backup implies downtime during which services are unavailable until the backup is fully restored and the virtual machine is booted up again. One alternative is to run servers in parallel for redundancy, so that when one machine fails the other can take over with no perceived downtime, while a backup always exists and is ready to be deployed. Taking snapshots and restoring them on the same machine is trivial; the extra difficulty is to transparently maintain a parallel server across the network that will be ready to go in case a virtual machine fails.

Alternatives

One solution is to use the "Distributed Replicated Block Device" (DRBD) on Linux, which acts on the block layer and can make sure that block-level mirrors of the current hard drive are maintained over the network. However, there are several drawbacks implied by using DRBD:

Solution

One solution is to leverage libvirt's external disk-only snapshots, which redirect all writes to an overlay file while the base image stays untouched, and then use the Network Block Device (NBD) feature to expose the virtual machine's block device remotely so it can be cloned. This can be performed periodically; in between runs, the machines that copy over the virtual machine can hibernate.
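Reduced to its essence, the cycle on the host looks roughly as follows; this is only a condensed sketch of what the Setup section below automates through the NBD server's prerun and postrun hooks, and the domain name, disk target and snapshot path simply mirror the example configuration used further down:

# Take a disk-only external snapshot: the base image stops receiving
# writes and all new writes go to the overlay file instead (--quiesce
# requires the QEMU guest agent inside the guest and can be omitted).
virsh snapshot-create-as --domain "mydomain" "mydomain-nbd-snap" \
    --diskspec "hda",file="/var/lib/libvirt/snapshots/mydomain-nbd-snap" \
    --disk-only --atomic --no-metadata --quiesce

# ... the now read-only base image is exported over NBD and copied ...

# Merge the writes accumulated in the overlay back into the base image,
# switch the domain back to it, and drop the overlay file.
virsh blockcommit "mydomain" "hda" --active --pivot
rm /var/lib/libvirt/snapshots/mydomain-nbd-snap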

Diagram

A diagram of the setup is as follows:

The host runs multiple virtual machines labeled dom_1, dom_2, etc. and exports the virtual machine block devices (or the QCOW files backing the virtual machines) over the network via an NBD server. A client, clone_1, connects via an NBD client to the block device of dom_1, at which point clone_1 copies over the entire block device using the disk dumper tool (dd) or similar.

For this process to be transparent, the virtual machines must not be shut down, which is why libvirt's snapshot and block commit features will be used. It is also assumed that clone_1 shuts down its own operations on the block device or file that it uses to run the virtual machine before the copy takes place.
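As a hedged illustration, that clone-side step of stopping whatever uses the local block device could look like the following, where the domain name mydomain-clone is purely a placeholder:

# On clone_1: stop the domain that runs off the local block device so
# that it can be safely overwritten ("mydomain-clone" is hypothetical).
virsh shutdown mydomain-clone
# or, for an immediate, ungraceful stop:
# virsh destroy mydomain-clone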

A protocol diagram would be as follows:

where the following operations will take place in order:

  1. the client initiates a connection to the NBD server,
  2. the server, before allowing the connection, creates a snapshot of the VM such that the block device will not be written to but the snapshot file will handle all the writes,
  3. in case the snapshot succeeded, the client is allowed to connect,
  4. the client copies over the read-only block device to its own block device,
  5. the client disconnects from the NBD server,
  6. the NBD server merges all the write operations that took place during the block copy back into the block device.

with the following remarks:

Setup

The setup is remarkably simple for what it does. The only requirements are that the host runs an NBD server and that the clients have the NBD client installed.
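On Debian-like systems (an assumption; package names may differ between distributions), this amounts to something like:

# On the host exporting the virtual machine block devices:
apt-get install nbd-server

# On the machines that will pull the copies:
apt-get install nbd-client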

NBD relies on the nbd kernel module, which is not loaded automatically, so it has to be added to the list of modules to be loaded at boot. In order to do that, edit /etc/modules and add nbd on a line at the end of the file. Next, load the NBD module manually via modprobe nbd.
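Concretely, assuming a Debian-like layout where /etc/modules is read at boot:

# Make the nbd module load automatically at boot.
echo nbd >> /etc/modules

# Load the module right away, without rebooting.
modprobe nbd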

Server

Next, set up the server by editing /etc/nbd-server/config and adjusting the configuration to export a libvirt block device (this can also be a file):

[generic]
# If you want to run everything as root rather than the nbd user, you
# may either say "root" in the two following lines, or remove them
# altogether. Do not remove the [generic] section, however.
#       user = nbd
#       group = nbd
        includedir = /etc/nbd-server/conf.d

# What follows are export definitions. You may create as much of them as
# you want, but the section header has to be unique.
[mydomain]
    exportname = /dev/mapper/vms-mydomain
    readonly = true
    prerun = /usr/bin/virsh snapshot-create-as --domain "mydomain" "mydomain-nbd-snap" --diskspec "hda",file="/var/lib/libvirt/snapshots/mydomain-nbd-snap" --disk-only --atomic --no-metadata --quiesce
    postrun = /bin/sh -c '/usr/bin/virsh blockcommit "mydomain" "hda" --active --pivot && rm /var/lib/libvirt/snapshots/mydomain-nbd-snap'

where:

An additional remark is that if, after the client disconnects, the snapshot file has not been removed, for either of the following reasons:

then when a client connects again, the snapshot file will not be overwritten, the prerun snapshot command will fail and the client will not be granted access.
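To check that the export is reachable after (re)starting the server, a quick test from any machine with the NBD client installed could be the following (the systemd unit name is an assumption and may differ per distribution):

# On the server: pick up the new configuration.
systemctl restart nbd-server

# From a client: list the exports the server advertises; "mydomain"
# should appear in the output.
nbd-client -l server.tld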

Client

In order to clone the remote machine's block device, the client may perform the following operations in order:

# Connect to the server and request to map the mydomain export to /dev/nbd0
# The server's prerun script will be executed at this point on the server.
nbd-client server.tld -N mydomain /dev/nbd0
 
# Transfer over the block device.
dd if=/dev/nbd0 of=... 
 
# Disconnect from the NBD server.
# The server's postrun script will be executed at this point on the server.
nbd-client -d /dev/nbd0

where:

The operations can be performed whenever the client wants to update its local copy of the block device from the server, so the commands may be run periodically via cron at established times.
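As a sketch, the three commands could be wrapped into a small script and scheduled; the script path, the destination device and the schedule below are placeholders rather than part of the setup above:

#!/bin/sh
# /usr/local/bin/pull-mydomain (hypothetical): refresh the local copy.
set -e
# Connect to the export; the server snapshots the domain via prerun.
nbd-client server.tld -N mydomain /dev/nbd0
# Copy the read-only image over the local device (placeholder path).
dd if=/dev/nbd0 of=/dev/mapper/vms-mydomain bs=4M
# Disconnect; the server merges the snapshot back via postrun.
nbd-client -d /dev/nbd0

A crontab entry such as 0 3 * * * /usr/local/bin/pull-mydomain would then refresh the local copy nightly at 03:00.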