Automated Distributed Backup

This document describes the setup I made to automate the backup task for all laptops here in the house.
My servers use the same backup server and infrastructure, but right now they don't have the checks and scripts, because they are online 24/7 and my backup server triggers the backup process for them.
This does not work for the laptops. They can be in different places, powered down, suspended, asleep, etc., so I needed a different approach for them.
On top of that, all laptops get their IP address from DHCP, so triggering the backup from my server was not an option: the clients' IPs are unknown.

The idea I got was:
All laptops should trigger the backup process themselves. This has to happen automagically, because if it doesn't, users forget to back up. (I know I forget.) What better place than the boot process to handle this?
The laptops should check that the backup server is there, and that it's actually our backup server and not a server with the same IP on a different network.
The server should have some form of message system to tell the laptops that backups cannot be made when something is wrong. Think of full disks, services not running, backup space not mounted, etc.

Required Software

Server:
* rdiff-backup
* ssh server
* dataq

Clients:
* rdiff-backup
* ssh client
* netcat
* ping
* arp
* bash

If you happen to run Debian (which is what I use for every system except firewalls, which run OpenBSD), all the software except DataQ can be apt-get install-ed. Debian sarge has those packages available at no charge. DataQ is not in Debian, so it has to be installed by hand.

Server Setup

The clients will use the root account to log in. To make this as secure as possible, tell ssh that root can only start the rdiff-backup --server program. This is done in the .ssh/authorized_keys file: prepend the ssh-[d|r]sa line for each client with: command="rdiff-backup --server"
Create a directory that will hold the backups, and create a dir for every system that should be backed up.
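
As a rough illustration of these two steps, here is a small sketch. The function names, the key text and the /srv/backup path in the comments are made up for the example; they are not taken from my actual setup.

```shell
#!/bin/bash
# Sketch only: helper names and paths are hypothetical.

# Prepend the forced-command option to a public key line, so that key
# can only start rdiff-backup in server mode.
restrict_key() {
    local pubkey_line="$1"
    printf 'command="rdiff-backup --server" %s\n' "$pubkey_line"
}

# Create one directory per client under the backup root.
make_backup_dirs() {
    local backup_root="$1"
    shift
    local host
    for host in "$@"; do
        mkdir -p "$backup_root/$host"
    done
}

# Example use on the server (paths and host names are assumptions):
#   restrict_key "$(cat laptop1.pub)" >> /root/.ssh/authorized_keys
#   make_backup_dirs /srv/backup laptop1 laptop2
```

The forced command applies per key, so each laptop's key gets its own prefixed line in authorized_keys.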

The following step is optional. I use it to tell my laptops if the server is in good shape and ready to receive the backups. You can skip this if you like.
DataQ
DataQ is a small message and data queueing server, written in Python, featuring a very simple text-based protocol which makes it very easy to implement clients for it. It features FILO and FIFO queues and various queue restrictions.
The basic idea behind DataQ is to make it easy to have multiple clients at various locations to report to a single target which, in turn, can be queried from a single or multiple sources.
For more information, see its website.

Install and setup DataQ for our backup scripts.
Installation is simple.
Untar the download.
Copy config/dataq.xml.example to /etc/dataq.xml
Copy src/dataq.py to /usr/local/sbin
Modify /etc/dataq.xml
The file is documented so I'm not gonna paste my whole file here.
You should have the following line in it though:

Now start DataQ:
dataq -c /etc/dataq.xml

Now you can store a message in the queue:
To allow backups:
echo "PUSH username:password@backup allowed" | nc server_ip 50000
To disallow backups:
echo "PUSH username:password@backup disallowed" | nc server_ip 50000

My backup server runs several scripts every minute to check disk space, mount points, connectivity, system load and some more, and updates the backup queue information as needed. Backups are allowed only if everything is in good shape.
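
To give an idea of what such a check can look like, here is a minimal sketch. It is not one of my real scripts: the function names, the threshold and the /srv/backup path are assumptions, and it only looks at the mount state and free disk space. The PUSH message has the same form as the commands above.

```shell
#!/bin/bash
# Sketch only: names, threshold and paths are hypothetical.

# Print "allowed" only when the backup space is mounted and has room.
# $1: "yes" if the backup filesystem is mounted
# $2: free space in KiB
# $3: minimum free space in KiB
backup_state() {
    local mounted="$1" free_kb="$2" min_free_kb="$3"
    if [ "$mounted" = "yes" ] && [ "$free_kb" -ge "$min_free_kb" ]; then
        echo "allowed"
    else
        echo "disallowed"
    fi
}

# Free space in KiB on the filesystem that holds the given path.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Build the queue message in the form used earlier in this article.
queue_message() {
    local user="$1" pass="$2" state="$3"
    echo "PUSH ${user}:${pass}@backup ${state}"
}

# Example use from cron (values are assumptions):
#   if mountpoint -q /srv/backup; then mounted=yes; else mounted=no; fi
#   state=$(backup_state "$mounted" "$(free_kb /srv/backup)" 1048576)
#   queue_message username password "$state" | nc server_ip 50000
```

Keeping the decision in one small function makes it easy to add more checks later: anything that fails simply turns the state into "disallowed".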

Client Setup

The client needs some more work. Lucky you, I did all the work for you ;)
I made a script for the clients that does some checking and then the backup itself.
It prompts the user to press 'Q' to abort. If there is no input within 10 seconds, the backup continues.
The script then sends 1 ping to the IP address of the backup server. Ferry Boender told me this is sometimes needed because the client's arp table has no entry for the server if it has not talked to it before. Since this script runs at boot, and the backup server is most probably not your DHCP server or router, no arp entry will be there yet.
It then checks the arp table to see whether the hardware address (MAC) for that IP is the one we know.
If the MAC matches, it asks DataQ whether we are allowed to make backups. If the server agrees, rdiff-backup starts sending the changes. Once all changes are sent, the script cleans up history older than 30 days.
Since the client triggers all this, you can even restore a machine that was dead for years.
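
The steps above can be sketched roughly as follows. This is not the actual backup-to-server.sh script; the function names and the example addresses are made up, the arp parsing assumes the column layout of Linux `arp -n`, and the DataQ query is left out because the command for reading the queue is not shown in this article.

```shell
#!/bin/bash
# Sketch only: function names and example values are hypothetical.

# Give the user 10 seconds to press Q; return 1 if they abort.
confirm_or_abort() {
    local key=""
    read -r -t 10 -n 1 -p "Backup starts in 10 seconds, press Q to abort: " key || true
    echo
    [ "$key" != "Q" ] && [ "$key" != "q" ]
}

# Read `arp -n` style output on stdin and print the MAC for one IP.
# Assumes the Linux layout: address, hardware type, hardware address.
mac_for_ip() {
    local ip="$1"
    awk -v ip="$ip" '$1 == ip { print tolower($3) }'
}

# Succeed only if the host at $1 has the MAC address we expect ($2).
server_is_ours() {
    local ip="$1" expected_mac="$2" seen_mac
    # One ping, so the arp table has an entry for the server.
    ping -c 1 "$ip" >/dev/null 2>&1
    seen_mac=$(arp -n "$ip" | mac_for_ip "$ip")
    [ -n "$seen_mac" ] &&
        [ "$seen_mac" = "$(echo "$expected_mac" | tr 'A-Z' 'a-z')" ]
}

# Send the changes, then drop history older than 30 days.
# $2 looks like root@server_ip::/path/to/backups/hostname
run_backup() {
    local src="$1" dest="$2"
    rdiff-backup "$src" "$dest" &&
        rdiff-backup --remove-older-than 30D "$dest"
}

# Example flow (addresses and paths are assumptions):
#   confirm_or_abort || exit 0
#   server_is_ours 192.168.1.10 00:11:22:aa:bb:cc || exit 1
#   run_backup /home root@192.168.1.10::/srv/backup/laptop1
```

If a laptop has been away long enough that more than one old increment needs to go, rdiff-backup asks for --force before it removes them all at once.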

I can post script snippets now, but you can get the whole example backup script here, so that should be easier.
If you also want the boot setup download the whole package here. I know there's no documentation in there, that's why I wrote this webarticle.
You should open the backup-to-server.sh script with your favourite editor and change the setting variables at the start of the file. If you did set up DataQ, set use_data="YES" (it's off by default).

One last tip: Back up the server's backup space to DVD or CD at regular intervals. The server runs on hard disks too, and they WILL fail on you sometime.

Credits and Thanks

Ferry Boender for:
- DataQ
- Tips on 'read' and 'ping'

Leonieke Aalders for:
- Reading this article and giving me positive feedback

Nancy van Baak for:
- listening to my techy talk
- waking me up on sunday morning so I could do this during the weekend

Steve Kemp for:
- Pointing out the lack of explanation about DataQ. It's not in Debian, so you need to do some work by hand.



Copyright (c) 2006-2008 Michiel van Baak.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.2
or any later version published by the Free Software Foundation;
with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
A copy of the license is included in the section entitled "GNU Free Documentation License".