"Better safe then sorry". That's my motto. Although I updated my new server with a software RAID1 and using rsync to backup my data to an external hard-disk. One thing I didn't protect it from is hardware failures. So I used an other computer that would be my new backup server. Now I could have installed Debian again and doing some rsyncing but I don't know if that's good enough. What I want is a 1 on 1 clone. So you would probably thinking of:
$ dd if=/dev/sda of=/dev/sdc
To keep things short: that was not an option for me. I needed to do it over the network. I soon found netcat. You can probably guess what it does from its name: it's cat over the network. Just what I needed! After reading some posts on the internet I figured out how to do the cloning. Here is my result.
I'm using software RAID1 on my Debian server. I will take one hard drive out of the array and clone that one over the network. Note that my server can still be used because the other hard drive keeps running. So no downtime! First I need to know something about my RAID1, so I do the following:
$ cat /proc/mdstat
Personalities : [raid1]
md6 : active raid1 sdb8[1] sda8[0]
      441016256 blocks [2/2] [UU]
md5 : active raid1 sdb7[1] sda7[0]
      2931712 blocks [2/2] [UU]
md4 : active raid1 sdb6[1] sda6[0]
      4883648 blocks [2/2] [UU]
md3 : active raid1 sdb5[1] sda5[0]
      19534912 blocks [2/2] [UU]
md2 : active raid1 sdb3[1] sda3[0]
      19534976 blocks [2/2] [UU]
md1 : active raid1 sdb2[1] sda2[0]
      289088 blocks [2/2] [UU]
md0 : active raid1 sdb1[1] sda1[0]
      192640 blocks [2/2] [UU]
unused devices:
Note that your mdstat will probably look different. I want to disable the second SATA hard drive (sdb). First you mark each of its partitions as faulty, and then you can remove them from their arrays. Going by my mdstat output, I use the following commands:
$ mdadm --manage /dev/md0 --fail /dev/sdb1
$ mdadm --manage /dev/md1 --fail /dev/sdb2
$ mdadm --manage /dev/md2 --fail /dev/sdb3
$ mdadm --manage /dev/md3 --fail /dev/sdb5
$ mdadm --manage /dev/md4 --fail /dev/sdb6
$ mdadm --manage /dev/md5 --fail /dev/sdb7
$ mdadm --manage /dev/md6 --fail /dev/sdb8
$ mdadm --manage /dev/md0 --remove /dev/sdb1
$ mdadm --manage /dev/md1 --remove /dev/sdb2
$ mdadm --manage /dev/md2 --remove /dev/sdb3
$ mdadm --manage /dev/md3 --remove /dev/sdb5
$ mdadm --manage /dev/md4 --remove /dev/sdb6
$ mdadm --manage /dev/md5 --remove /dev/sdb7
$ mdadm --manage /dev/md6 --remove /dev/sdb8
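Typing fourteen commands by hand invites typos. As a rough sketch, the array/partition pairs can be read from /proc/mdstat and the commands printed for review first. The helper name print_detach_cmds and the print-only approach are my own assumptions, not part of the procedure I actually ran; the function runs nothing itself.

```shell
# print_detach_cmds DISK [MDSTAT_FILE]
# Hypothetical helper: prints (does not run) the --fail and --remove commands
# for every partition of DISK that shows up in mdstat-style input.
print_detach_cmds() {
    disk="$1"
    mdstat="${2:-/proc/mdstat}"
    # An array line looks like: "md0 : active raid1 sdb1[1] sda1[0]"
    awk -v disk="$disk" '
        $1 ~ /^md[0-9]+$/ && $2 == ":" {
            for (i = 3; i <= NF; i++) {
                part = $i
                sub(/\[.*/, "", part)      # strip "[1]" and anything after it
                if (index(part, disk) == 1) {
                    n++
                    md[n] = $1
                    dev[n] = part
                }
            }
        }
        END {
            # All --fail commands first, then all --remove commands.
            for (k = 1; k <= n; k++)
                printf "mdadm --manage /dev/%s --fail /dev/%s\n", md[k], dev[k]
            for (k = 1; k <= n; k++)
                printf "mdadm --manage /dev/%s --remove /dev/%s\n", md[k], dev[k]
        }
    ' "$mdstat"
}
```

Run print_detach_cmds sdb to see the list. Only when the output matches what you expect, pipe it to a root shell.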
Now I'm ready to clone this drive over the network. For that you need a listening server on the backup machine. So at this point boot your backup server from a LiveCD like SysRescCD. In SysRescCD you need to bring up your ethernet card. This worked for me:
$ dhcpcd eth0
Write down the IP address you get. Now start the listening server with netcat. You're going to pipe the output from netcat to dd, which in turn writes the bits and pieces to the backup hard drive (sda).
$ nc -l -p 1983 | dd of=/dev/sda
Now your backup server waits for the production server to send some data. So let's do that! Go back to your production server's terminal and send the data!
$ dd if=/dev/sdb | nc 192.168.0.110 1983
Note that 192.168.0.110 is the IP of the backup server you wrote down, and 1983 is the port number. You can change the port if you like, as long as both sides use the same one. Now wait a very long time. For me, cloning the 500 GB hard drive took a whopping 20 hours! So have a coffee, do some work, go to sleep or whatever. One note: you don't see any progress. When all has gone well, the production server reports how many bytes it has sent. At this point you need to press CTRL-C in the terminal on the production server. I don't know why, but this closes the connection between the two.
OK! First, let's restore the RAID1 array on the production server. The following commands will do:
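For what it's worth, 500 GB in 20 hours is a bit under 7 MB/s on average. I never tested this, but dd's small default block size of 512 bytes may be part of the reason, and newer versions of GNU dd can print progress. The sketch below does the arithmetic and defines a sender variant; the function name send_disk is made up for illustration, and both bs=1M and status=progress assume GNU dd (status=progress needs coreutils 8.24 or later, so an older system may not know it).

```shell
# Average throughput of the clone: 500 GB (taken as 500 * 10^9 bytes) in 20 hours.
bytes=$((500 * 1000 * 1000 * 1000))
seconds=$((20 * 3600))
rate=$((bytes / seconds))
echo "average: $((rate / 1000)) kB/s"

# send_disk DEVICE HOST PORT
# Hypothetical wrapper around the sending command (defined here, not called).
# Assumes GNU dd from coreutils 8.24 or later:
#   bs=1M            read in 1 MiB blocks instead of dd's default 512 bytes
#   status=progress  print a running byte count on stderr
send_disk() {
    dd if="$1" bs=1M status=progress | nc "$2" "$3"
}
```

With that in place, the sending step would become send_disk /dev/sdb 192.168.0.110 1983. The progress output goes to stderr, so it doesn't end up in the data sent over the network.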
$ mdadm --manage /dev/md0 --add /dev/sdb1
$ mdadm --manage /dev/md1 --add /dev/sdb2
$ mdadm --manage /dev/md2 --add /dev/sdb3
$ mdadm --manage /dev/md3 --add /dev/sdb5
$ mdadm --manage /dev/md4 --add /dev/sdb6
$ mdadm --manage /dev/md5 --add /dev/sdb7
$ mdadm --manage /dev/md6 --add /dev/sdb8
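After re-adding the partitions, the arrays resync in the background, and until that is finished /proc/mdstat shows a status like [U_] instead of [UU]. A small check for that could look like this. The helper name arrays_degraded is my own invention, and the pattern only looks at the [UU]-style status field.

```shell
# arrays_degraded [MDSTAT_FILE]
# Hypothetical helper: succeeds (exit 0) if any array's status field,
# e.g. [U_], still shows a missing or not-yet-synced member; fails (exit 1)
# when every status field contains only U's.
arrays_degraded() {
    grep -Eq '\[[U_]*_[U_]*\]' "${1:-/proc/mdstat}"
}
```

For example, "while arrays_degraded; do sleep 60; done" waits until every array is back to [UU].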
From this point your production server is back on both drives again. You're not far off now. I ran into two problems when restarting my backup server. 1: I didn't get any GRUB screen. And 2: my RAID array was messed up. Luckily these are easy to solve. Maybe you won't have these problems. First restore GRUB: start your LiveCD again. You need to mount the partition that holds the /boot directory. For me it was /dev/md0.
$ mount /dev/md0 /mnt/fred
$ /mnt/fred/sbin/grub
grub> root (hd0,0)
grub> setup (hd0)
grub> quit
Then the RAID array. Stop all the arrays and create them again with only the one drive:
$ mdadm --stop /dev/md0
$ mdadm --stop /dev/md1
$ mdadm --stop /dev/md2
$ mdadm --stop /dev/md3
$ mdadm --stop /dev/md4
$ mdadm --stop /dev/md5
$ mdadm --stop /dev/md6
$ mdadm --create /dev/md0 --verbose --level=raid1 --raid-devices=2 /dev/sda1 missing
$ mdadm --create /dev/md1 --verbose --level=raid1 --raid-devices=2 /dev/sda2 missing
$ mdadm --create /dev/md2 --verbose --level=raid1 --raid-devices=2 /dev/sda3 missing
$ mdadm --create /dev/md3 --verbose --level=raid1 --raid-devices=2 /dev/sda5 missing
$ mdadm --create /dev/md4 --verbose --level=raid1 --raid-devices=2 /dev/sda6 missing
$ mdadm --create /dev/md5 --verbose --level=raid1 --raid-devices=2 /dev/sda7 missing
$ mdadm --create /dev/md6 --verbose --level=raid1 --raid-devices=2 /dev/sda8 missing
The 'missing' parameter is there because I only have one hard drive so far. It keeps a spot open in case I add a second hard drive later. Which would be natural, because that's what RAID is for! The last thing you probably need to do is change the network configuration, in case the server has a static IP.
Now you're done! Good luck!
Sources:
http://www.sysresccd.org/
http://linux.die.net/man/8/mdadm
http://arstechnica.com/civis/viewtopic.php?f=16&t=121767
http://kevin.deldycke.com/2007/03/how-to-recover-a-raid-array-after-having-zero-ized-superblocks/
http://www.devil-linux.org/documentation/1.0.x/ch01s05.html
http://sattia.blogspot.com/2010/02/linux-cloning-over-network-using-netcat.html