Software RAID1, Debian squeeze fresh installation, GRUB2


Absolutely no warranty, use entirely at your own risk, the author accepts no responsibility. It is quite likely some statements in this document are inaccurate. I created this document solely for my own personal use.

Oh joy, I have a need to build an SMTP/IMAP server, and because all of our company's email will be stored on that server, I must not lose that data. I am going to use software RAID1 to mirror the hard drive and also use rdiff-backup to make a backup once each day. I experimented for days using bits and pieces of the many HOWTOs I found on the subject of RAID1 mirroring on Linux (and specifically Debian), and I found a lot of material that did not work for me or is ancient history (including some of my own documents). I am not an expert on the subject by any means.

The easiest way to get RAID1 functionality is to configure it during Debian installation, which is what this document describes. Of course, this is only useful if you are building a new system. If you already have a system up and running with a single drive, and you wish to add a second drive and configure the two as a RAID1 array, this document is not for you. See http://verchick.com/mecham/public_html/raid/raid-index.html for other choices. Currently, however, I do not have a document that describes the 'degraded' process for Debian squeeze using GRUB2.

This document in itself is not designed to get RAID1 functional on your computer. It is designed to get you comfortable with doing so in a test environment. I am using Windows 7 with VirtualBox to simulate configuring RAID1 on a Debian squeeze server. Some of the steps we perform should not be performed on a production system with data on it. We do these things to educate ourselves. Once you are comfortable in your understanding of the process, my job is done.

My VirtualBox VM setup is a 32-bit based installation of Debian squeeze. I am using the Ext3 file system and the GRUB2 boot loader, which is what the squeeze installation typically would use. I have not tested with any other combination.

I am going to talk about both SATA and EIDE hard drives because I have tested this with both, but the examples will be SATA. There are not a lot of differences. Simply substitute 'sda' with 'hda', 'sdb' with 'hdc' and so on. You will need two identical hard drives to complete this project. They can actually be different (within reason), but if they differ in size, the smaller drive will dictate the size of the partitions. Identical is better. In a production environment it is also a good idea to have a spare drive available in case one of the other two drives fails. That is the point, isn't it?

Name your hard drives so it is easier for me to refer to them. Actually label these names on them. Name one of them apple, and the other pie. If the drives differ in size, label the smaller one apple. Since I'm using VirtualBox, when I create a Virtual Disk, I create it as VHD (Virtual Hard Disk) of Fixed size. When asked for the location, this is where I enter a name for the drive (apple and pie). For an EIDE system, one drive must be installed on the primary master drive connector, and the other on the secondary master drive connector. I am going to refer to the EIDE drive that is connected to the primary master connector as being in the primary position. Linux should recognize this drive as /dev/hda. I am going to refer to the EIDE drive that is connected to the secondary master connector as being in the secondary position. Linux should recognize this drive as /dev/hdc.
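If you prefer to script the VirtualBox side, VBoxManage can create the two fixed-size VHDs as well. This is only a sketch: the 8192 MB size is an arbitrary choice for this test, and the exact syntax may vary slightly between VirtualBox versions:
VBoxManage createhd --filename apple.vhd --size 8192 --format VHD --variant Fixed
VBoxManage createhd --filename pie.vhd --size 8192 --format VHD --variant Fixed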

For a SCSI system with both drives on one SCSI adapter, one drive should be configured as SCSI device id 0 (on my system removing all ID jumpers on the drive sets it to 0), and the other drive is typically configured as SCSI device id 1 (I add a jumper to the drive to set this). I am going to refer to the SCSI drive that is configured as SCSI device id 0 as being in the primary position. Linux should recognize this drive as /dev/sda. I am going to refer to the SCSI drive that is configured as SCSI device id 1 as being in the secondary position. Linux should recognize this drive as /dev/sdb. These statements assume both drives are installed. If only one drive is installed, it will be in the primary position regardless of the jumper setting.

For a SCSI system with each drive on separate SCSI adapters, both drives are typically configured as SCSI device id 0. You will need to determine which adapter is recognized first, and which is recognized second. If the adapters are the same model by the same manufacturer this is a more difficult task. You may have to temporarily remove one of the drives to see which adapter is recognized first. Then you may want to label them. I am going to refer to the SCSI drive that is on the adapter that is recognized first as being in the primary position. Linux usually recognizes this drive as /dev/sda. I am going to refer to the SCSI drive that is on the adapter that is recognized second as being in the secondary position. Linux should recognize this drive as /dev/sdb. These statements assume both drives are installed. If only one drive is installed, it will be in the primary position regardless of which adapter it is installed on.

Like SCSI drives, SATA drives are also given the names /dev/sda, /dev/sdb and so on, so they are very similar to work with (other than they plug into SATA ports as opposed to SCSI adapters).

All the data on the drives we use will be erased. Any drive that may be used in the future to replace a drive that has failed must also be clean. Let me explain why. Let's pretend for a moment that we used apple in our array, then unplugged it and replaced it, then put it away for later use as an emergency replacement. A year from now one of our drives goes bad, so we shut the machine down and place apple in its place. Then we boot up, and to our horror, the remaining good drive has synced itself to the stale data stored on apple and not the other way around. Ideally the two drives we start with should be cleaned.

To clean an actual physical drive (as opposed to a virtual one), I install it in a computer by itself, boot up using a DBAN disk and change the Method to Quick. This will replace every bit on the drive with a zero (a quicker command-line alternative is sketched after this paragraph). To clean a VirtualBox drive, I detach and delete the Virtual Hard Disk, being very careful to delete the correct file on my PC. I use File, Virtual Media Manager to remove the VHD before creating a new one of the same name. Note that the UUID of the new drive is likely to be different than the old one. I also verify that the drives are connected to the correct SATA ports. Here is my VirtualBox setup:

[screenshot of the VirtualBox setup]
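Incidentally, if all you need to strip from a used drive are the RAID superblocks and the partition table (rather than wiping every sector), something like the following should also work, run from a live CD with only that drive attached. This is a sketch: /dev/sdX stands in for the drive being cleaned, and the commands are destructive, so be absolutely certain of the device name:
mdadm --zero-superblock /dev/sdX1   # erase the md metadata from each former member partition
mdadm --zero-superblock /dev/sdX2
dd if=/dev/zero of=/dev/sdX bs=512 count=1   # then zero the MBR, wiping the partition table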



Install apple in the primary position and pie in the secondary position. On a real PC, a good place for the CDROM drive is the secondary slave EIDE interface, but I should mention that having a CDROM on the same cable with an EIDE drive will slow the EIDE drive down. If you are using EIDE drives, once Debian is installed it is best to unplug the CDROM drive if possible. Boot up using the appropriate Debian installer media. I use the stable netinst CD. I have compiled a document that walks you through installing Debian squeeze and configuring RAID1. Since I'm setting up a server, this document does not have you install a GUI, but of course you can if you desire:
http://verchick.com/mecham/public_html/raid-squeeze/Debian-server1a.html

After squeeze is installed per the instructions in the walkthrough, log in as root (or, if using a GUI, log in as a regular user, then open a terminal and su to root). Personally I install ssh and then log in remotely using PuTTY (I have the network set to "Bridged Adapter" in VirtualBox). Once logged in, use 'cat /etc/fstab' and make sure all your md devices are shown. If they are not, something went wrong and you might consider starting over from the beginning:
cat /etc/fstab
# <file system> <mount point>   <type>  <options>       <dump>  <pass>
proc            /proc           proc    defaults        0       0
# / was on /dev/md0 during installation
UUID=e607eab9-15fb-4a47-8364-0763a221f81b /               ext3    errors=remount-ro 0       1
# swap was on /dev/md1 during installation
UUID=f3286a7b-e152-499e-abcc-ccd7e8ed5d6b none            swap    sw              0       0
/dev/scd0       /media/cdrom0   udf,iso9660 user,noauto     0       0
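Notice that fstab refers to the arrays by filesystem UUID rather than as /dev/md0 and /dev/md1. To confirm which UUID belongs to which array, blkid should tell you (your UUIDs will of course differ from mine):
blkid /dev/md0 /dev/md1   # prints the filesystem UUID and type for each array; compare with /etc/fstab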
Use 'cat /proc/mdstat' to see what is going on.
cat /proc/mdstat

Output on my machine:
Personalities : [raid1]
md1 : active (auto-read-only) raid1 sda2[0] sdb2[1]
      770036 blocks super 1.2 [2/2] [UU]

md0 : active raid1 sda1[0] sdb1[1]
      7615476 blocks super 1.2 [2/2] [UU]

unused devices: <none>
Looks like these are the devices GRUB2 is interested in:
cat /boot/grub/device.map
(hd0)   /dev/disk/by-id/ata-VBOX_HARDDISK_VB202fe7ae-eb781f1b
(hd1)   /dev/disk/by-id/ata-VBOX_HARDDISK_VB5c2b8277-48a967c9
Another command of potential interest:
ls -l /dev/disk/by-uuid
total 0
lrwxrwxrwx 1 root root 9 Dec 23 14:58 e607eab9-15fb-4a47-8364-0763a221f81b -> ../../md0
lrwxrwxrwx 1 root root 9 Dec 23 14:58 f3286a7b-e152-499e-abcc-ccd7e8ed5d6b -> ../../md1
You may also find  ls -l /dev/disk/by-id  interesting:
total 0
lrwxrwxrwx 1 root root  9 Dec 23 14:58 ata-VBOX_HARDDISK_VB202fe7ae-eb781f1b -> ../../sda
lrwxrwxrwx 1 root root 10 Dec 23 14:58 ata-VBOX_HARDDISK_VB202fe7ae-eb781f1b-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Dec 23 14:58 ata-VBOX_HARDDISK_VB202fe7ae-eb781f1b-part2 -> ../../sda2
lrwxrwxrwx 1 root root  9 Dec 23 14:58 ata-VBOX_HARDDISK_VB5c2b8277-48a967c9 -> ../../sdb
lrwxrwxrwx 1 root root 10 Dec 23 14:58 ata-VBOX_HARDDISK_VB5c2b8277-48a967c9-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 Dec 23 14:58 ata-VBOX_HARDDISK_VB5c2b8277-48a967c9-part2 -> ../../sdb2
lrwxrwxrwx 1 root root  9 Dec 23 14:58 md-name-sfa:0 -> ../../md0
lrwxrwxrwx 1 root root  9 Dec 23 14:58 md-name-sfa:1 -> ../../md1
lrwxrwxrwx 1 root root  9 Dec 23 14:58 md-uuid-4d93a302:e6d1aba5:eeef1168:26535a7d -> ../../md1
lrwxrwxrwx 1 root root  9 Dec 23 14:58 md-uuid-b75cef10:d76deb2a:e91bcc2a:42e55211 -> ../../md0
lrwxrwxrwx 1 root root  9 Dec 23 14:58 scsi-SATA_VBOX_HARDDISK_VB202fe7ae-eb781f1b -> ../../sda
lrwxrwxrwx 1 root root 10 Dec 23 14:58 scsi-SATA_VBOX_HARDDISK_VB202fe7ae-eb781f1b-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Dec 23 14:58 scsi-SATA_VBOX_HARDDISK_VB202fe7ae-eb781f1b-part2 -> ../../sda2
lrwxrwxrwx 1 root root  9 Dec 23 14:58 scsi-SATA_VBOX_HARDDISK_VB5c2b8277-48a967c9 -> ../../sdb
lrwxrwxrwx 1 root root 10 Dec 23 14:58 scsi-SATA_VBOX_HARDDISK_VB5c2b8277-48a967c9-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 Dec 23 14:58 scsi-SATA_VBOX_HARDDISK_VB5c2b8277-48a967c9-part2 -> ../../sdb2
I suggest running the commands above and keeping a copy of the output for future reference. At this time, if our primary drive (apple) were to fail, the system would not boot. I simulated this situation by removing apple from the system. The reason it will not boot is that the installer only put GRUB2's boot code on the first drive; the second drive has nothing in its MBR to boot from. All is not lost if you get in this situation. You could boot up the installation CD/DVD and go into rescue mode (Advanced options, Rescue mode). Just walk through the installer: Assemble RAID array (automatic), the device to use as the root file system in my case is md0, then Reinstall GRUB boot loader on /dev/sda (a rough sketch of the equivalent manual steps follows below). Remember that with apple removed, pie is now recognized as the primary device (/dev/sda).

In the real world, if the disk in the primary position fails, to avoid confusion, I suggest powering down and physically removing it from the system. Place the good drive in the primary position, a clean drive in the secondary position, and power the system back up. Don't re-use apple unless you clean it first, for the reasons I described above.
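For reference, what Rescue mode does for you in that last step looks roughly like this from a rescue shell. This is only a sketch, assuming the root array assembles as /dev/md0 and the surviving drive shows up as /dev/sda:
mdadm --assemble --scan             # assemble the (now degraded) arrays
mount /dev/md0 /mnt                 # mount the root file system
mount --bind /dev /mnt/dev          # give the chroot access to device nodes
mount --bind /proc /mnt/proc
mount --bind /sys /mnt/sys
chroot /mnt grub-install /dev/sda   # reinstall GRUB2 on the surviving drive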

HOWEVER, it would be best to avoid the need for Rescue mode in the first place. You can do so by manually installing GRUB on the secondary drive in order to make it bootable as well. On this test system, this is accomplished by running the command:
grub-install /dev/sdb
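If you want some reassurance that boot code actually landed on the second drive, the first sector should now contain the string GRUB. A rough check, assuming the strings utility (from the binutils package) is installed and the second drive is /dev/sdb:
dd if=/dev/sdb bs=512 count=1 2>/dev/null | strings | grep GRUB   # prints a line containing GRUB if boot code is present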

Now, if the system should need to, it can boot from either drive. If we were to remove apple, the system should boot with the drive missing. I will simulate this. I have disconnected apple (due to a hypothetical drive failure) and created a new drive called apple2. I have placed pie on SATA Port 0 and apple2 on SATA Port 1. So pie will become /dev/sda and apple2 will become /dev/sdb due to the position of the drives. Note that apple2 will have a new UUID and this will have to be dealt with later, or you could have an unbootable system.
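For what it's worth, this drive shuffling can also be scripted with VBoxManage instead of the GUI. A sketch, assuming the VM is named 'debian' and the controller carries VirtualBox's default name of "SATA Controller" (your VM and controller names may differ):
VBoxManage storageattach debian --storagectl "SATA Controller" --port 0 --device 0 --type hdd --medium pie.vhd
VBoxManage storageattach debian --storagectl "SATA Controller" --port 1 --device 0 --type hdd --medium apple2.vhd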



Now if we run  cat /proc/mdstat  we get:
Personalities : [raid1]
md1 : active (auto-read-only) raid1 sda2[1]
      770036 blocks super 1.2 [2/1] [_U]

md0 : active raid1 sda1[1]
      7615476 blocks super 1.2 [2/1] [_U]

unused devices: <none>
We see that now only /dev/sda1 and /dev/sda2 are mapped to md0 and md1 respectively. /dev/sdb1 and /dev/sdb2 are missing, so only one physical device is up for each md device. It's important to run this command in order to determine which drive is up and which is not. To add the new (clean and empty) /dev/sdb drive to the RAID1 arrays, we first have to copy the disk structure from the working drive to the clean/empty drive. It's important that you get this right, or you could potentially wipe out the wrong drive and end up losing all your data. I'm confident that I want to copy the structure from /dev/sda to /dev/sdb, so I issue the command:
sfdisk -d /dev/sda | sfdisk /dev/sdb

sfdisk will complain:
Checking that no-one is using this disk right now ...
OK

Disk /dev/sdb: 1044 cylinders, 255 heads, 63 sectors/track

sfdisk: ERROR: sector 0 does not have an msdos signature
 /dev/sdb: unrecognized partition table type
Old situation:
No partitions found
New situation:
Units = sectors of 512 bytes, counting from 0

   Device Boot    Start       End   #sectors  Id  System
/dev/sdb1   *      2048  15235071   15233024  fd  Linux raid autodetect
/dev/sdb2      15235072  16775167    1540096  fd  Linux raid autodetect
/dev/sdb3             0         -          0   0  Empty
/dev/sdb4             0         -          0   0  Empty
Warning: partition 1 does not end at a cylinder boundary

sfdisk: I don't like these partitions - nothing changed.
(If you really want this, use the --force option.)
So I will use --force since everything looks valid to me:
sfdisk -d /dev/sda | sfdisk /dev/sdb --force

I'm not concerned with this ERROR or this Warning; this is the desired result:
Checking that no-one is using this disk right now ...
OK

Disk /dev/sdb: 1044 cylinders, 255 heads, 63 sectors/track

sfdisk: ERROR: sector 0 does not have an msdos signature
 /dev/sdb: unrecognized partition table type
Old situation:
No partitions found
New situation:
Units = sectors of 512 bytes, counting from 0

   Device Boot    Start       End   #sectors  Id  System
/dev/sdb1   *      2048  15235071   15233024  fd  Linux raid autodetect
/dev/sdb2      15235072  16775167    1540096  fd  Linux raid autodetect
/dev/sdb3             0         -          0   0  Empty
/dev/sdb4             0         -          0   0  Empty
Warning: partition 1 does not end at a cylinder boundary
Successfully wrote the new partition table

Re-reading the partition table ...

If you created or changed a DOS partition, /dev/foo7, say, then use dd(1)
to zero the first 512 bytes:  dd if=/dev/zero of=/dev/foo7 bs=512 count=1
(See fdisk(8).)
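Before adding the new partitions to the arrays, it doesn't hurt to confirm that the two partition tables now match:
sfdisk -l /dev/sda
sfdisk -l /dev/sdb
Both should list the same start and end sectors and the 'fd' (Linux raid autodetect) partition Id.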
Looks good to me. Now I have to add /dev/sdb1 back into md0, and add /dev/sdb2 back into md1:
mdadm --add /dev/md0 /dev/sdb1

mdadm --add /dev/md1 /dev/sdb2


You can monitor the synchronization by running
watch -n 6 cat /proc/mdstat  (Ctrl+c to exit)
Personalities : [raid1]
md1 : active raid1 sdb2[2] sda2[1]
      770036 blocks super 1.2 [2/1] [_U]
        resync=DELAYED

md0 : active raid1 sdb1[2] sda1[1]
      7615476 blocks super 1.2 [2/1] [_U]
      [>....................]  recovery =  4.7% (362048/7615476) finish=3.3min speed=36204K/sec

unused devices: <none>
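Alternatively, if you simply want to wait until the rebuild is finished (in a script, for example), I believe mdadm can block for you:
mdadm --wait /dev/md0 /dev/md1   # returns once any resync/recovery on these arrays has completed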
In the end, all appears well:
Personalities : [raid1]
md1 : active raid1 sdb2[2] sda2[1]
      770036 blocks super 1.2 [2/2] [UU]

md0 : active raid1 sdb1[2] sda1[1]
      7615476 blocks super 1.2 [2/2] [UU]

unused devices: <none>
Compare this to the first time we ran  cat /proc/mdstat:
Personalities : [raid1]
md1 : active (auto-read-only) raid1 sda2[0] sdb2[1]
      770036 blocks super 1.2 [2/2] [UU]

md0 : active raid1 sda1[0] sdb1[1]
      7615476 blocks super 1.2 [2/2] [UU]
	  
unused devices: <none>
The thing I noticed was how sda1[0] is now sda1[1], sdb1[1] is now sdb1[2], and so forth. We once again need to make the drive in the secondary position (apple2) bootable. Have GRUB2 build a new device map (needed because the device IDs changed) and then reinstall GRUB2 on both drives:
mv /boot/grub/device.map /boot/grub/device.map.old

grub-mkdevicemap

update-grub2

grub-install /dev/sda

grub-install /dev/sdb


Compare GRUB's new device map to the previous one (when apple was in the primary position, and pie was in the secondary position):
cat /boot/grub/device.map
(hd0)   /dev/disk/by-id/ata-VBOX_HARDDISK_VB5c2b8277-48a967c9
(hd1)   /dev/disk/by-id/ata-VBOX_HARDDISK_VB8377e783-28b8cef1
Previously it was:
(hd0)   /dev/disk/by-id/ata-VBOX_HARDDISK_VB202fe7ae-eb781f1b
(hd1)   /dev/disk/by-id/ata-VBOX_HARDDISK_VB5c2b8277-48a967c9
You can see the old hd1 is now hd0, and hd1 has a new ID. Also, compare this new data:
ls -l /dev/disk/by-uuid
total 0
lrwxrwxrwx 1 root root 9 Dec 23 15:18 e607eab9-15fb-4a47-8364-0763a221f81b -> ../../md0
lrwxrwxrwx 1 root root 9 Dec 23 15:18 f3286a7b-e152-499e-abcc-ccd7e8ed5d6b -> ../../md1
To the old (which this time around has not changed):
total 0
lrwxrwxrwx 1 root root 9 Dec 23 14:58 e607eab9-15fb-4a47-8364-0763a221f81b -> ../../md0
lrwxrwxrwx 1 root root 9 Dec 23 14:58 f3286a7b-e152-499e-abcc-ccd7e8ed5d6b -> ../../md1
And compare the new:
ls -l /dev/disk/by-id
total 0
lrwxrwxrwx 1 root root  9 Dec 23 15:32 ata-VBOX_HARDDISK_VB5c2b8277-48a967c9 -> ../../sda
lrwxrwxrwx 1 root root 10 Dec 23 15:18 ata-VBOX_HARDDISK_VB5c2b8277-48a967c9-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Dec 23 15:18 ata-VBOX_HARDDISK_VB5c2b8277-48a967c9-part2 -> ../../sda2
lrwxrwxrwx 1 root root  9 Dec 23 15:32 ata-VBOX_HARDDISK_VB8377e783-28b8cef1 -> ../../sdb
lrwxrwxrwx 1 root root 10 Dec 23 15:27 ata-VBOX_HARDDISK_VB8377e783-28b8cef1-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 Dec 23 15:27 ata-VBOX_HARDDISK_VB8377e783-28b8cef1-part2 -> ../../sdb2
lrwxrwxrwx 1 root root  9 Dec 23 15:18 md-name-sfa:0 -> ../../md0
lrwxrwxrwx 1 root root  9 Dec 23 15:18 md-name-sfa:1 -> ../../md1
lrwxrwxrwx 1 root root  9 Dec 23 15:18 md-uuid-4d93a302:e6d1aba5:eeef1168:26535a7d -> ../../md1
lrwxrwxrwx 1 root root  9 Dec 23 15:18 md-uuid-b75cef10:d76deb2a:e91bcc2a:42e55211 -> ../../md0
lrwxrwxrwx 1 root root  9 Dec 23 15:32 scsi-SATA_VBOX_HARDDISK_VB5c2b8277-48a967c9 -> ../../sda
lrwxrwxrwx 1 root root 10 Dec 23 15:18 scsi-SATA_VBOX_HARDDISK_VB5c2b8277-48a967c9-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Dec 23 15:18 scsi-SATA_VBOX_HARDDISK_VB5c2b8277-48a967c9-part2 -> ../../sda2
lrwxrwxrwx 1 root root  9 Dec 23 15:32 scsi-SATA_VBOX_HARDDISK_VB8377e783-28b8cef1 -> ../../sdb
lrwxrwxrwx 1 root root 10 Dec 23 15:27 scsi-SATA_VBOX_HARDDISK_VB8377e783-28b8cef1-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 Dec 23 15:27 scsi-SATA_VBOX_HARDDISK_VB8377e783-28b8cef1-part2 -> ../../sdb2
To the old:
total 0
lrwxrwxrwx 1 root root  9 Dec 23 14:58 ata-VBOX_HARDDISK_VB202fe7ae-eb781f1b -> ../../sda
lrwxrwxrwx 1 root root 10 Dec 23 14:58 ata-VBOX_HARDDISK_VB202fe7ae-eb781f1b-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Dec 23 14:58 ata-VBOX_HARDDISK_VB202fe7ae-eb781f1b-part2 -> ../../sda2
lrwxrwxrwx 1 root root  9 Dec 23 14:58 ata-VBOX_HARDDISK_VB5c2b8277-48a967c9 -> ../../sdb
lrwxrwxrwx 1 root root 10 Dec 23 14:58 ata-VBOX_HARDDISK_VB5c2b8277-48a967c9-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 Dec 23 14:58 ata-VBOX_HARDDISK_VB5c2b8277-48a967c9-part2 -> ../../sdb2
lrwxrwxrwx 1 root root  9 Dec 23 14:58 md-name-sfa:0 -> ../../md0
lrwxrwxrwx 1 root root  9 Dec 23 14:58 md-name-sfa:1 -> ../../md1
lrwxrwxrwx 1 root root  9 Dec 23 14:58 md-uuid-4d93a302:e6d1aba5:eeef1168:26535a7d -> ../../md1
lrwxrwxrwx 1 root root  9 Dec 23 14:58 md-uuid-b75cef10:d76deb2a:e91bcc2a:42e55211 -> ../../md0
lrwxrwxrwx 1 root root  9 Dec 23 14:58 scsi-SATA_VBOX_HARDDISK_VB202fe7ae-eb781f1b -> ../../sda
lrwxrwxrwx 1 root root 10 Dec 23 14:58 scsi-SATA_VBOX_HARDDISK_VB202fe7ae-eb781f1b-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Dec 23 14:58 scsi-SATA_VBOX_HARDDISK_VB202fe7ae-eb781f1b-part2 -> ../../sda2
lrwxrwxrwx 1 root root  9 Dec 23 14:58 scsi-SATA_VBOX_HARDDISK_VB5c2b8277-48a967c9 -> ../../sdb
lrwxrwxrwx 1 root root 10 Dec 23 14:58 scsi-SATA_VBOX_HARDDISK_VB5c2b8277-48a967c9-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 Dec 23 14:58 scsi-SATA_VBOX_HARDDISK_VB5c2b8277-48a967c9-part2 -> ../../sdb2
This simply illustrates the changes in device IDs. So, to summarize: if a drive fails, move the good drive into the primary position and a new, clean drive into the secondary position; use sfdisk to copy the disk structure from the working drive to the new drive; use mdadm to add the new disk partitions to the RAID arrays; and finally run the GRUB commands above to make both drives bootable. Now we are prepared for the next potential drive failure.
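Condensed into the bare commands used above (with the good drive in the primary position as /dev/sda and the clean replacement in the secondary position as /dev/sdb):
sfdisk -d /dev/sda | sfdisk /dev/sdb --force
mdadm --add /dev/md0 /dev/sdb1
mdadm --add /dev/md1 /dev/sdb2
mv /boot/grub/device.map /boot/grub/device.map.old
grub-mkdevicemap
update-grub2
grub-install /dev/sda
grub-install /dev/sdb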

mr88talent at yahoo dot com
23 DEC 2011