Software RAID1 (starts degraded), Debian Etch 4.0, kernel 2.6.18, GRUB


Absolutely no warranty, use entirely at your own risk, the author accepts no responsibility. It is quite likely some statements in this document are inaccurate.

Oh joy, I have a need to create a RAID1 array on an SMTP/IMAP server, and because all of our company's email is stored on that server, I must not lose the data. I am going to use software RAID1 to mirror the hard drive and make a backup once each day via http://www.howtoforge.com/linux_rdiff_backup. RAID only prevents data loss in the case of hard drive failure. RAID is not a substitute for a good backup strategy. I experimented (on a test system) for days using bits and pieces of the many HOWTOs I found on the subject of RAID1 mirroring on Linux (and specifically Debian), and I found that a lot of the material either did not work for me or is ancient history. I am not an expert on the subject by any means, but I have learned a few things:

If you use this document on anything but Debian Etch 4.0, you might lose all your data.
If you don't understand the reason you are performing a step, you might lose all your data.
If you don't know exactly what will happen when you perform a step, you might lose all your data.
If you blindly copy and paste commands in this document to your shell, you might lose all your data.
If you are not paying attention and are not free from distraction, you might lose all your data.
If you have not first practiced on a test system, you might lose all your data.
If you make typos, you might lose all your data.
If you do not first back up your data, you might lose all your data.
If you don't perform the steps in the proper order, you might lose all your data.
If you become impatient, you might lose all your data.
If you try this using SATA disks, you might lose all your data.
If you don't document how to repair your system in the event of a hard drive failure, you might lose all your data.
Other than that, it's really pretty simple.

The easiest way to get RAID1 functionality is to configure it using the 'partman' partition manager and 'mdcfg' presented to you when you first install Debian. Of course this is only useful if you are building a new system. This document however is for a system that is currently up and running on a single drive and you wish to add a second drive to mirror the first. If you are building a new system and wish to configure your drives as a RAID1 array or are using the LILO bootloader, this document may not be for you. See http://verchick.com/mecham/public_html/raid/raid-index.html for other choices.

This document in itself is not designed to get RAID1 functional on your production computer. It is designed to get you comfortable with doing so in a test environment. The test environment you create should be as close as possible to the system you will eventually configure RAID on. A few of the steps I perform definitely should not be performed on a production system with data on it. We may do things simply to illustrate or prove a point, to educate ourselves. When we finish training ourselves on a test system we should be confident enough to continue on to our production box.

My setup is on an i386 based machine. I am using the 2.6.18 kernel, the ext3 file system and the GRUB boot loader. I have not tried this with LVM.

There is a major problem with getting RAID1 to function. The software modules that are needed to read the data from the devices in the array need to be loaded at boot time, or the devices cannot be read. The problem is, these modules are not normally included in the boot ramdisk image (/boot/initrd.img-x.x.x). For our purposes we need two modules, 'md' (multi-disk) and 'raid1' (redundant array of inexpensive/independent disks, level 1). This is very similar to the problem Windows administrators face when dealing with device drivers for hard disk controllers that are not included with Windows. You have to install the device driver on the hard drive before you install the controller, or you cannot read your hard drive. While there is evidence it may be possible, I know of no straightforward way to include the needed modules in a series of boot floppies. The bottom line is, if you cannot get the modules loaded into the ramdisk, you may not be able to boot your machine from your hard drives. Also, it is not enough to get the modules into the boot ramdisk image. Doing so will get one or two RAID devices running but the remainder depend on additional RAID software that loads later on in the boot process. In addition, you must get your system configured in such a way that BOTH hard drives are bootable so you can boot your system from either drive if the other drive has failed, or is removed. I have to admit that I don't understand (whether Windows or Linux) how the disk is read if the software needed to read the disk is on the disk! It must be some special low-level boot magic.
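
If you ever want to verify that the RAID modules actually made it into a ramdisk image, you can list the contents of the image. This is just a sanity check I find handy, and it assumes the image is a gzip compressed cpio archive (the Debian Etch default):

zcat /boot/initrd.img-2.6.18-4-686 | cpio -it | grep raid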

I suggest reading:  
man initrd

I am going to talk about both SCSI and EIDE hard drives because I have tested this with both, but the examples will be EIDE. There are not a lot of differences. Simply substitute 'hda' with 'sda', 'hdc' with 'sdb' and such. Sorry, but I have heard this will not work with SATA drives. You will need two identical hard drives to complete this project (in addition to the one currently installed in your system). They can actually be different (within reason) but if they differ in size, the smallest drive will dictate the size of the partitions, so the drives must be equal in size or larger than your current hard disk. Identical is better. Why two additional drives you say? I thought RAID1 only used two drives. True, but once you install one of the drives in your production system we want to have a spare drive available if one of the other two drives fails. That is the point isn't it? Besides, we are going to use the two spare drives as test drives prior to installing one of them in the production machine.

Name your hard drives so it is easier for me to refer to them. Actually label these names on the drives themselves. Name one of them apple, and one of them pie. If one of the drives is smaller than the other, label the smaller of the two apple. For an EIDE system, one drive must be installed on the primary master drive connector and the other on the secondary master drive connector. I am going to refer to the EIDE drive that is connected to the primary master connector as being in the primary position. Linux should recognize this drive as /dev/hda. I am going to refer to the EIDE drive that is connected to the secondary master connector as being in the secondary position. Linux should recognize this drive as /dev/hdc.

For a SCSI system with both drives on one SCSI adapter, one drive should be configured as SCSI device id 0 (on my system removing all ID jumpers on the drive sets it to 0), and the other drive is typically configured as SCSI device id 1 (I add a jumper to the drive to set this). I am going to refer to the SCSI drive that is configured as SCSI device id 0 as being in the primary position. Linux should recognize this drive as /dev/sda. I am going to refer to the SCSI drive that is configured as SCSI device id 1 as being in the secondary position. Linux should recognize this drive as /dev/sdb. These statements assume both drives are installed. If only one drive is installed, it will be in the primary position and be recognized as /dev/sda regardless of the jumper setting.

For a SCSI system with each drive on separate SCSI adapters, both drives are typically configured as SCSI device id 0 (but I prefer to set the one on the second adapter as SCSI device id 1). You may need to determine which adapter is recognized first and which is recognized second. If the adapters are the same model by the same manufacturer this is a more difficult task. You may have to temporarily remove one of the drives to see which adapter is recognized first. Then you may want to label them. I am going to refer to the SCSI drive that is on the adapter that is recognized first as being in the primary position. Linux usually recognizes this drive as /dev/sda. I am going to refer to the SCSI drive that is on the adapter that is recognized second as being in the secondary position. Linux should recognize this drive as /dev/sdb. These statements assume both drives are installed. If only one drive is installed, it will be in the primary position and be recognized as /dev/sda regardless of which adapter it is installed on.

All the data on the two drives we use for testing will be erased. Any drive that may be used in the future to replace a drive that has failed MUST be clean. Any drive that has been used in a RAID array at any time in the past must also be cleaned. Let me explain why. Let's pretend for a moment that we used apple in a RAID array, then unplugged it and replaced it, then put it away for later use as an emergency replacement. A year from now one of our drives goes bad, so we shut the machine down and place apple in its place. Then we boot up and, to our horror, the good drive has synced itself to the data stored on apple and not the other way around. To clean a drive, install it in a system by itself, and boot up using a DBAN disk. You can change the Method to Quick erase. This will write zeros across the entire disk. You should also have a rescue disk (like the Etch CD) available.
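
If you don't happen to have a DBAN disk handy, I believe something like the following, run from a rescue or live CD, accomplishes roughly the same thing. I am assuming here that the drive you want to wipe shows up as /dev/hdc; substitute the correct device and triple check it, because this is unrecoverable:

dd if=/dev/zero of=/dev/hdc bs=1M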

Install a cleaned apple in the primary position and leave pie out of the computer. A good place for the CDROM drive is the secondary slave EIDE interface. Boot up using the appropriate Debian installer media. At this time Etch is still 'testing', so look here for a CD: http://www.debian.org/devel/debian-installer/. Once Etch becomes stable, look here: http://www.debian.org/distrib/netinst.

This illustration shows what the end product of my test machine will look like. Some people use a separate /boot partition and I have only tested this setup with one in place. When installing Debian on your system you should set the partitions up in the same manner as your production box so you can gain experience with something closer to your setup. I will not detail installing Debian, but I will say that since this is just a test system you will only need to install the absolute minimum number of software packages. When using the partition manager you do not want to configure RAID in any way. You should install the GRUB boot loader. I suggest printing (or otherwise capturing) the contents of /etc/fstab. Then, on the printed (captured) /etc/fstab, next to each current partition (/dev/hda1, /dev/sda2 etc.), add the new md partition that will replace it (/dev/md0, /dev/md1 etc.). The numbering of disk devices and md devices does not correspond one to one, especially if you have extended partitions or another OS, so having that mapped out in black and white will prove essential, e.g.:
device     mount  md-device temp-mount boot partition-type
/dev/hdc1  /boot  /dev/md0  /mnt/md0   *    primary (100MB)
/dev/hdc5  swap   /dev/md1                  logical (1GB) 
/dev/hdc6  /      /dev/md2  /mnt/md2        logical (remainder of disk)


Continue on with the Debian installer until you get to the point you can log in as root.

Now you can remove apple, install pie in its place and clean it per the instructions mentioned earlier (unless you have already done so). Then place apple back in the primary position and place pie in the secondary position and start up the computer.

Back up a few files. Your initrd.img may be a different version. If so, I suggest saving this document to your computer and doing a search and replace of the kernel version number:

cp /etc/fstab /etc/fstab-backup
cp /etc/mtab /etc/mtab-backup
cp /etc/initramfs-tools/modules /etc/initramfs-tools/modules-backup
cp /etc/modules /etc/modules-backup
cp /etc/initramfs-tools/initramfs.conf /etc/initramfs-tools/initramfs.conf-backup
cp /boot/grub/menu.lst /boot/grub/menu.lst-backup
cp /boot/initrd.img-2.6.18-4-686 /boot/initrd.img-2.6.18-4-686-backup


I personally need my vim, and we need to install our main program, mdadm (multi-disk administrator). Obviously you may choose to use a different editor:

apt-get update
apt-get install vim
apt-get install initramfs-tools mdadm


Accept the default answer (possibly 'all') when prompted.

cat /etc/default/mdadm

The output should show these entries configured in a manner similar to this:
INITRDSTART='all'
AUTOSTART=true
AUTOCHECK=true
START_DAEMON=true
DAEMON_OPTIONS="--syslog"
VERBOSE=false
USE_DEPRECATED_MDRUN=false


Now we will include the needed modules in the ramdisk image. We start with /etc/modules, the list of modules loaded at boot time (in the next step we will copy the relevant entries into the file the ramdisk builder actually reads):

vim /etc/modules
and insert at the end of the list of modules:
md
raid1
raid5
raid0


Save and exit the file. This part is important to get right or our system will not boot up into the md devices. You need to copy all the modules listed in /etc/modules that deal with our hard disk drives, motherboard chipset and RAID to /etc/initramfs-tools/modules (in the same order they are listed in /etc/modules). You would not need to include drivers (modules) that obviously deal with things like the CDROM drive or mouse. If you are not certain, then it is better to include it. If our hard drives are not recognized prior to our md devices, our system will not boot. You might see modules like 'ide-detect', 'ide-disk', 'ide-scsi' and others. Essentially, copy all the modules to /etc/initramfs-tools/modules, then remove any that do not pertain to our hard drives (psmouse, ide-cd):
grep -vE '^$|^#' /etc/modules >> /etc/initramfs-tools/modules
vim /etc/initramfs-tools/modules


Once you have added (and possibly removed duplicate) modules there, save and exit the file. Now we make the new initrd.img. We actually end up doing this three or four different times during this setup (because our system will be going through changes):
update-initramfs -u

If the above command fails with an error message something like:
/boot/initrd.img-2.6.18-4-686 has been altered. Cannot update.
then try to fix the situation with this command:
update-initramfs -k `uname -r` -t -u

In order to load the new image into memory, we must reboot:
reboot

When the system comes back up run 'cat /proc/mdstat' to see if we now have a system capable of using a RAID array:
cat /proc/mdstat

The resulting output on my machine:
Personalities : [raid1] [raid6] [raid5] [raid4] [raid0] [linear] [multipath] [raid10]
unused devices: <none>
If [raid1] is not shown here then you are not loading the needed modules. You cannot continue past this point until you are.
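
As a quick test you can also try loading the module by hand and then look at /proc/mdstat again; if that works but a fresh boot does not, the module is simply not making it into your initrd.img or /etc/modules ('raid1' is the module name the 2.6.18 kernel uses):

modprobe raid1
cat /proc/mdstat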

Now we will copy the partition structure from apple to pie. This is one of those things you must be careful doing because this will destroy all data on the target disk. Since we have already cleaned the target disk we should not have to --force this to work (but if required, add --force to the end of the command). The target disk must be of equal or greater size than the source disk. Make sure the command reflects what you want to accomplish:

Run 'df' to first make sure we are currently using the disk you think we are:
df

Here we copy the structure from /dev/hda to /dev/hdc:
sfdisk -d /dev/hda | sfdisk /dev/hdc
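
To double check that the copy worked, you can list both partition tables and compare them by eye (apart from the device names they should be identical):

sfdisk -l /dev/hda
sfdisk -l /dev/hdc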

Now we will use cfdisk to edit the partition table on pie (in the secondary position) and change the partition types to "Linux raid autodetect". This may also destroy all data on a disk so be careful you are editing the correct disk. To change the partition type, first use up and down arrows to select a partition, then left and right arrows to select [Type] from the menu. Press [Enter] to change the type. The type you want is 'FD' (lower case is fine). Repeat for all partitions, then [Write] the changes, then [Quit]. Your original drive should have had a partition flagged as bootable that was copied to this target drive. Make sure you don't accidentally toggle this off.

cfdisk /dev/hdc

My finished product:
                                  cfdisk 2.12r

                              Disk Drive: /dev/hdc
                        Size: 10005037056 bytes, 10.0 GB
              Heads: 16   Sectors per Track: 63   Cylinders: 19386

    Name        Flags      Part Type  FS Type          [Label]        Size (MB)
 ------------------------------------------------------------------------------
    hdc1        Boot        Primary   Linux raid autodetect               98.71*
    hdc5                    Logical   Linux raid autodetect             1003.49*
    hdc6                    Logical   Linux raid autodetect             8899.76*
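
If you prefer a non-interactive method, I believe the version of sfdisk shipped with Etch can also change the partition type directly. I have not tested this as thoroughly as the cfdisk method, so treat it as a sketch and verify the result with 'fdisk -l /dev/hdc' afterwards (and make sure the boot flag survived):

sfdisk --change-id /dev/hdc 1 fd
sfdisk --change-id /dev/hdc 5 fd
sfdisk --change-id /dev/hdc 6 fd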
	
At this point we will reboot again so our system properly recognizes the changes made to this disk:
reboot

Now we can start the process of creating the degraded array. We start by doing some additional cleaning of our drive in the secondary position (pie). This is to ensure there are no remnants from prior RAID installations (these commands may fail if there are no superblocks to zero out - in this case failure is expected). Zero the superblock for each of the partitions we configured as type "Linux raid autodetect":

mdadm --zero-superblock /dev/hdc1
mdadm --zero-superblock /dev/hdc5
mdadm --zero-superblock /dev/hdc6


Now we create md devices for each of our partitions, with one partition present on each md device and one partition missing. The ones that are missing are on our primary drive (apple). We can't add these to our array at this time because those partitions are currently in use and they are not of the partition type we want. The general plan is to create the RAID structure on the first RAID disk (pie), copy all the data from the original disk (apple) to that RAID disk, reboot to that degraded RAID disk, then reformat the original disk and add it to our RAID array (at which time the two disks will begin to synchronize). There are obvious risks in doing this and the process is prone to error. One thing that can be difficult to keep track of: a number of RAID-related files must of course end up on the RAID drive. When we boot to the RAID drive, it must be configured as a RAID drive. Some people first copy all the data from the original drive to the RAID drive, then modify the RAID-related files on the RAID drive prior to rebooting into it. Then if they have problems and need to make changes to the system they often make the mistake of trying to fix the RAID-related files by editing the files on the original drive. They get confused. I prefer to configure everything on the original drive and then copy the data over at the very last moment. If things get really ugly we can boot up with the rescue disk and make a few changes to the original disk to enable us to boot up into it (provided we have not reformatted it yet). Then we can make the necessary changes and copy the data over once again. Anyway, let's create the needed md devices. Edit as required and then run these one at a time:

mdadm --create /dev/md0 --level=1 --raid-disks=2 missing /dev/hdc1
mdadm --create /dev/md1 --level=1 --raid-disks=2 missing /dev/hdc5
mdadm --create /dev/md2 --level=1 --raid-disks=2 missing /dev/hdc6


Once again, run 'cat /proc/mdstat':
cat /proc/mdstat

You should get something similar to this, which shows that one out of two disk devices is up [_U] for each of our md devices (and the other is missing). This is called 'degraded':
Personalities : [raid1] [raid6] [raid5] [raid4] [raid0] [linear] [multipath] [raid10]
md2 : active raid1 hdc6[1]
      8691008 blocks [2/1] [_U]

md1 : active raid1 hdc5[1]
      979840 blocks [2/1] [_U]

md0 : active raid1 hdc1[1]
      96256 blocks [2/1] [_U]
If your system does not show something with a similar structure then you must fix it before continuing. Now we create file systems on our md devices that match the file systems currently in use on our original devices. This also erases data on the target devices. I use ext3 and of course a swap partition:

mkfs.ext3 /dev/md0
mkswap /dev/md1
mkfs.ext3 /dev/md2


When the system boots up into our RAID system, it should automatically assemble at least one of the md devices we created (so we can start the boot process) but it may not assemble the rest. This could result in a failure to complete the boot process. This task of reassembling the remaining devices is handled by /etc/init.d/mdadm-raid. This init script uses the command 'mdadm -A -s -a' which means: "automatically assemble all of our md devices using the information stored in /etc/mdadm/mdadm.conf". Well, we must update the information in mdadm.conf so it correctly reflects our current state (as shown by /proc/mdstat). To do so we start by making a copy of the original mdadm.conf that was created when we installed mdadm. We will use the copy as a basis for any new mdadm.conf we create. The original file (with comments removed) looks like this on my system:
DEVICE partitions
CREATE owner=root group=disk mode=0660 auto=yes
HOMEHOST <system>
MAILADDR root

cp /etc/mdadm/mdadm.conf /etc/mdadm/mdadm.conf-original

Now we will populate mdadm.conf with information about our existing arrays:
mdadm --examine --scan >> /etc/mdadm/mdadm.conf

Now let's display the file this created (comments removed):
cat /etc/mdadm/mdadm.conf

With comments removed it should show something like this (do not use these exact UUID numbers in your own file!):
DEVICE partitions
CREATE owner=root group=disk mode=0660 auto=yes
HOMEHOST <system>
MAILADDR root
ARRAY /dev/md0 level=raid1 num-devices=2 UUID=2307ad50:dce81757:e5540b2b:ae2626a5
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=0cdcc3a4:d8f14699:e5540b2b:ae2626a5
ARRAY /dev/md2 level=raid1 num-devices=2 UUID=60b4594a:14050c2f:e5540b2b:ae2626a5
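
If you want to cross-check that the UUIDs written to mdadm.conf really do match your running arrays, mdadm can report the details (including the UUID) of each device:

mdadm --detail /dev/md0
mdadm --detail /dev/md1
mdadm --detail /dev/md2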


Our RAID system has changed, so once again we are going to make a new initrd.img (you may notice you no longer get mdadm error messages):
update-initramfs -u

Before we attempt booting up into our md devices we are first going to do a test to ensure our md devices are assembled after a reboot and that they are mountable during the boot process. First we make a mount point for each of our devices (all except the swap partition):

mkdir /mnt/md0
mkdir /mnt/md2


Run 'free' and make a note of how much total swap space you have:
free

Then edit /etc/fstab and make some changes. At the bottom of the file place directives to mount each of our md devices to the mount points we created (or swap). Here is a sample:

vi /etc/fstab
and insert (edit as needed to reflect your system):
/dev/md0        /mnt/md0        ext3    defaults        0       0
/dev/md1        none            swap    sw              0       0
/dev/md2        /mnt/md2        ext3    defaults        0       0
With these lines added my /etc/fstab now looks like this:
proc            /proc           proc    defaults        0       0
/dev/hda6       /               ext3    defaults,errors=remount-ro 0       1
/dev/hda1       /boot           ext3    defaults        0       2
/dev/hda5       none            swap    sw              0       0
/dev/hdd        /media/cdrom0   udf,iso9660 user,noauto     0       0
/dev/fd0        /media/floppy0  auto    rw,user,noauto  0       0
/dev/md0        /mnt/md0        ext3    defaults        0       0
/dev/md1        none            swap    sw              0       0
/dev/md2        /mnt/md2        ext3    defaults        0       0
Save and exit the file, then let's reboot and see if this works:
reboot

When the system comes up, run 'mount' to see if the devices were mounted.
There is no point continuing past this point unless they were:

mount

If you run 'free' again, it should show the total swap space is twice the size it was before.
Assuming you have a swap partition, you must get this working before you continue on:

free
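
If you want to see exactly which devices are providing that swap (rather than just the total), you can list them:

swapon -s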

Then run 'cat /proc/mdstat' again and verify all the md devices that used to be there are still there.
There is no point continuing past this point unless they are:

cat /proc/mdstat

OK. If everything is working (it must be working before you continue), now comes the scary part. Don't reboot until I tell you to. We are going to continue to configure RAID-related files on our original drive, then we are going to copy all our data from the original devices to the md devices, then create a boot record on the secondary drive, then boot up using the md devices on the secondary drive instead of our original devices on the primary drive. There are a few things to think about as far as the copy process goes. The machine should not be in the middle of processing stuff, so you should drop into single user mode and possibly disconnect the ethernet cable. You should also consider stopping processes that may be writing to the disk (like your MTA). Here is a note on what I stopped: http://verchick.com/mecham/public_html/raid/stop.txt. Because you will be in single user mode you will run the actual copy commands at the console (not remotely). You should not delay once the data is transferred and it comes time to reboot. If you successfully boot up into the md devices, be aware that the data on the original drive will soon become stale, so if you ever need to boot back into the original drive you may lose data. Hopefully you will have no need to do that and your new RAID devices will hold current data from now on. If you are able to boot up using your md devices, the scary part is over (but opportunities to destroy your system still remain). We are going to configure /etc/fstab and /etc/mtab to boot up into the md devices; we are going to create another initrd.img that knows about our md devices and we are going to tell grub to boot into our md devices. We will also configure grub to boot from our secondary drive. We will start by editing /etc/fstab again. We must remove (or comment out) the lines we added previously (they were just a test), then change the corresponding /dev/hda devices in /etc/fstab to /dev/md devices:
vi /etc/fstab

and modify it in a similar manner to this sample. Of course the mount points must correctly correspond to appropriate md devices. Refer to the notes you should have made. My finished /etc/fstab file will look like this:
proc            /proc           proc    defaults        0       0
/dev/md2        /               ext3    defaults,errors=remount-ro 0       1
/dev/md0        /boot           ext3    defaults        0       2
/dev/md1        none            swap    sw              0       0
/dev/hdd        /media/cdrom0   iso9660 ro,user,noauto  0       0
/dev/fd0        /media/floppy0  auto    rw,user,noauto  0       0
Save and exit the file. I also find it useful to edit /etc/mtab to reflect what our new system will look like:
vi /etc/mtab

Here is mine before the changes:
/dev/hda6 / ext3 rw,errors=remount-ro 0 0
[...other stuff...]
/dev/hda1 /boot ext3 rw 0 0
/dev/md0 /mnt/md0 ext3 rw 0 0
/dev/md2 /mnt/md2 ext3 rw 0 0


and after I edited it in the same manner I edited /etc/fstab (note that I removed /dev/md0 and /dev/md2) it shows:
/dev/md2 / ext3 rw,errors=remount-ro 0 0
[...other stuff...]
/dev/md0 /boot ext3 rw 0 0


Make sure there are no mistakes in /etc/fstab or /etc/mtab. Once again we would update our initrd.img:
update-initramfs -u

Now we are going to update the GRUB menu. Edit grub's menu.lst:
vi /boot/grub/menu.lst

We are going to add a new menu item that tells grub to boot from our secondary drive (grub refers to it as hd1). We will also add a fallback entry that (hopefully) will automatically choose the next item in the menu if the first item fails. So, just below "default 0", add this entry:

fallback 1

Make a duplicate of your existing top menu stanza, place the duplicate above the existing stanza and modify it in the same manner I have. I changed hd0 to hd1 and /dev/hda6 to /dev/md2. This example shows partition 0 is the partition flagged as bootable on my system. You can run something like 'fdisk -l /dev/hda' to determine which partition is bootable on your system but your original stanza will be correct. Remember that grub starts counting from zero:
title           Debian GNU/Linux, kernel 2.6.18-4-686 RAID (hd1)
root            (hd1,0)
kernel          /vmlinuz-2.6.18-4-686 root=/dev/md2 ro
initrd          /initrd.img-2.6.18-4-686
savedefault

title           Debian GNU/Linux, kernel 2.6.18-4-686
root            (hd0,0)
kernel          /vmlinuz-2.6.18-4-686 root=/dev/hda6 ro
initrd          /initrd.img-2.6.18-4-686
savedefault
Just a note: because we mount a /boot partition, you will not see the above entries in the form "/boot/initrd.img-2.6.18-4-686". If you do not mount a /boot partition, you will see the entries in that form.

If you have been following this HOWTO correctly, our md devices will still be mounted to the mount points we had in /etc/fstab when we booted up. Note however that if you run the 'mount' command at this time, it will lie to you because we have edited /etc/mtab, so don't let this mislead you. If they are not mounted for some reason (shame on you, I told you not to reboot), you will need to remount them. For example: "mount /dev/md2 /mnt/md2", "mount /dev/md0 /mnt/md0". Now we are going to copy our data. In my case I want to copy all the data in the root partition to /mnt/md2, and all the data in the /boot partition to /mnt/md0. The copy from root to the md mount point is straightforward but other mount points such as /boot are not as straightforward. For those I first change to that directory, then use the period (.) to signify "here". In other words "copy from here to there" as opposed to "copy this to that". This prevents me from copying /boot to /mnt/md0 and ending up with a /mnt/md0/boot directory instead of a /mnt/md0 directory containing all the files in the /boot directory.
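
Since /etc/mtab has been hand edited, the 'mount' command is no longer trustworthy, but the kernel's own view in /proc/mounts is. A quick way to confirm the md devices really are mounted where you expect:

grep md /proc/mounts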

At the console get into single user mode:
init 1

then work on the copy process. All files on the disk need to get copied so use your head:
cp -dpRx / /mnt/md2

cd /boot
cp -dpRx . /mnt/md0

Run some tests and make sure the source and destination match for each mount point. Fix it if they don't:
ls -al /
ls -al /mnt/md2

ls -al /boot
ls -al /mnt/md0
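
As a rough cross-check you can also compare disk usage for each pair (the -x option keeps du on one file system, so it will not wander into /proc or the md mount points). The numbers will not match exactly, but they should be close:

du -sx / /mnt/md2
du -sx /boot /mnt/md0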


On my system, grub was booting off of hard disk 0, partition 0, and it was told root was mounted on /dev/hda6. Now I have instructed it to boot off of hard disk 1, partition 0, and I told it root is mounted on /dev/md2. Now I must install grub on hard disk 1, partition 0 (the secondary drive). Start the grub shell prompt:

grub

at the grub> prompt enter these commands to install grub on both drives (edit partition number if needed):

root (hd0,0)
setup (hd0)
root (hd1,0)
setup (hd1)
quit


Hopefully those commands succeeded. If there was a failure, I'm not sure you should continue. OK. Now come the butterflies in your stomach.
Knock on wood, throw salt over your shoulder, rub your lucky rabbit's foot, cross your fingers.

reboot

If it crashes, don't freak out just yet. Read this. If it reboots, run 'df' and check that it is in fact our md devices we are using. Run 'cat /proc/mdstat' again and ensure all md devices are shown there. If all is well, we are no longer using the original drive. If all is not well, it must be fixed before we continue:

df
cat /proc/mdstat


My df looked like this:
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/md2               8554524    624688   7495288   8% /
tmpfs                   258344         0    258344   0% /lib/init/rw
udev                     10240        88     10152   1% /dev
tmpfs                   258344         0    258344   0% /dev/shm
/dev/md0                 93207     26191     62204  30% /boot
free

This shows that swap has returned to its normal size once again. OK. Now we will reformat the original drive (apple /dev/hda) and then add it to our array. I hope everything is working great so far and all our files were successfully copied, because we now must destroy all data on the original drive. Run cfdisk on the original drive and (just as we did for our secondary drive) change each partition to type "FD" (lower case is fine). This is the part where (if you are working on a production box) you should have a good backup of the drive because this will destroy all the data on the original disk:

cfdisk /dev/hda

Change all the partition types, then write and quit. Make sure you have not toggled off the boot flag.

Now we can add the partitions on /dev/hda to our RAID array. Edit this if necessary to suit your system. Do this one at a time:
mdadm --add /dev/md2 /dev/hda6
mdadm --add /dev/md0 /dev/hda1
mdadm --add /dev/md1 /dev/hda5


Now you will just have to WAIT until the disks synchronize. NEVER REBOOT while disks are synchronizing.
You can monitor the progress with:

watch -n 6 cat /proc/mdstat

Mine looks like this after a while. Notice we now are using both drives and md2 has fully synced:
Personalities : [raid1] [raid6] [raid5] [raid4] [raid0]
md2 : active raid1 hda6[0] hdc6[1]
      8691008 blocks [2/2] [UU]

md1 : active raid1 hda5[2] hdc5[1]
      979840 blocks [2/1] [_U]
      [=>...................]  recovery =  8.3% (82368/979840) finish=0.9min speed=16473K/sec

md0 : active raid1 hda1[2] hdc1[1]
      96256 blocks [2/1] [_U]
        resync=DELAYED
			  
unused devices: <none>
Of course, it's [Ctrl]+c to cancel 'watch'. Once the sync has completed (and not until then), we need to tell mdadm.conf about our new drives and make another initrd.img (for the last time):
cp /etc/mdadm/mdadm.conf-original /etc/mdadm/mdadm.conf
mdadm --examine --scan >> /etc/mdadm/mdadm.conf


Note that there may be an alternate way (a nice script) to create a new mdadm.conf:
/usr/share/mdadm/mkconf force-generate

Either way, this should still show all our arrays are present:
cat /etc/mdadm/mdadm.conf

For the last time:
update-initramfs -u

We need to edit grub's menu.lst one last time. We are booting off of the secondary drive (and will continue to do so) but now if that should fail we want it to boot off the primary drive (now also configured as a RAID device). Make a copy of the first menu choice stanza, place it in the second position, and modify it in a manner similar to the provided sample:

vi /boot/grub/menu.lst
title           Debian GNU/Linux, kernel 2.6.18-4-686 RAID (hd1)
root            (hd1,0)
kernel          /vmlinuz-2.6.18-4-686 root=/dev/md2 ro
initrd          /initrd.img-2.6.18-4-686
savedefault

title           Debian GNU/Linux, kernel 2.6.18-4-686 RAID (hd0)
root            (hd0,0)
kernel          /vmlinuz-2.6.18-4-686 root=/dev/md2 ro
initrd          /initrd.img-2.6.18-4-686
savedefault
While you are at it, modify the '# kopt=root=' line to reflect our current situation (I changed /dev/hda6 to /dev/md2).
Don't remove the # in front of it; it has meaning (a double ## marks a comment in this special AUTOMAGIC section):

# kopt=root=/dev/md2 ro

These options are read by the update-grub script (which you should not run - it will overwrite our stanzas). If you upgrade the kernel, it's possible update-grub may run. If this is the case then only one of the two stanzas may remain (booting to hd0,0). This means your system will only want to boot from hd0,0. I suggest manually adding the stanza that boots from hd1,0 back in, otherwise you may have to move the secondary disk into the primary position should the primary disk fail (which is something I do anyway). If everything is working properly you should remove the menu stanza that boots to a non-raid partition. You would corrupt your system if you were to boot up to something like /dev/hda6 and edit files on that drive.

Your system is complete. You need to reboot one more time to make sure it comes up, and because we changed the partition table on hda.

OK, now I'm going to simulate a failed drive. I don't recommend you try this (your system may explode), but at least you can learn from my system. I am carefully going to remove the power cable from the primary drive, apple. Once I do this, it will be "dirty" and should not be used again in this system without first being cleaned. This is what mdstat shows as a result: hda1 and hda5 still show they are up because we have not had any read/write operations on them recently, while hda6 shows it has failed (Faulty).
md0 : active raid1 hda1[0] hdc1[1]
      96256 blocks [2/2] [UU]

md1 : active raid1 hda5[0] hdc5[1]
      979840 blocks [2/2] [UU]

md2 : active raid1 hda6[2](F) hdc6[1]
      8691008 blocks [2/1] [_U]
If your hardware supports hot swappable drives I think you should mark the remaining two devices faulty (since they actually are on a failed drive), then use mdadm to remove all three faulty devices from our array before inserting the new drive. You cannot use "mdadm --remove" on devices that are in use, so they need to be set as faulty first. You do not need to do this if you are going to power down the system and replace the drive with a clean drive. Make doubly sure you are failing the partitions on the drive that has failed!

Only needed if using hot-swap drives and you are not going to power down:

mdadm --set-faulty /dev/md0 /dev/hda1
mdadm --set-faulty /dev/md1 /dev/hda5
mdadm --remove /dev/md0 /dev/hda1
mdadm --remove /dev/md1 /dev/hda5
mdadm --remove /dev/md2 /dev/hda6

Shut it down:
shutdown -h now

For consistency (and to keep my sanity) I always move the good drive to the primary position (if it is not already there) and place the new clean drive in the secondary position. We have shut down, so disconnect the good drive, clean apple, move pie (the good drive) into the primary position, place the cleaned apple in the secondary position and bring the system back up. When using SCSI drives all I have to do to swap the two SCSI drives is move the jumper from one drive to the other. OK, my system did boot up.
First we see what's going on (cat /proc/mdstat). As you can see, hdc1, hdc5 and hdc6 are missing:
md0 : active raid1 hda1[1]
      96256 blocks [2/1] [_U]

md1 : active raid1 hda5[1]
      979840 blocks [2/1] [_U]

md2 : active raid1 hda6[1]
      8691008 blocks [2/1] [_U]
We start by copying the partition structure from /dev/hda to /dev/hdc. We do this for what should now be an obvious reason: the secondary drive is empty, but it needs to have the same structure as the primary drive. If the disk was first cleaned, and is large enough, you should have no errors (but you may still have to --force it):

sfdisk -d /dev/hda | sfdisk /dev/hdc

We make sure the superblocks are zeroed out on the new drive (as always, be careful you do this to the correct drive). Edit as needed:

mdadm --zero-superblock /dev/hdc1
mdadm --zero-superblock /dev/hdc5
mdadm --zero-superblock /dev/hdc6


Now we add our three hdc partitions to the corresponding md's.
Understand what you are doing here before you do it, edit as needed:


mdadm --add /dev/md0 /dev/hdc1
mdadm --add /dev/md1 /dev/hdc5
mdadm --add /dev/md2 /dev/hdc6


Watch them sync:
watch -n 6 cat /proc/mdstat

Once the recovery is complete (and not until then), create new boot records on both drives:
grub

From the grub> prompt (edit partition number if needed):
root (hd0,0)
setup (hd0)
root (hd1,0)
setup (hd1)
quit


We are working again.

You might want to reboot from the console to make sure you actually boot from the secondary drive.

You should never experiment with this next step on a production system because it will trash your array and you could lose data. If you need to prove to yourself that each drive will boot up when it is the only drive in the system, you could boot up using each drive with the other one missing. As soon as a drive boots up, log in and run 'shutdown -h now' to shut it back down. Then try the other drive. Then if you care in the least about the integrity of the data on the system you should clean one of the drives and install it just as you would if you were replacing a failed drive. It's not a good idea to fire up the system using both drives if each drive has been started independently. Also, in a RAID system it is a good idea to avoid kernel version upgrades (security upgrades should be performed of course).
References (alphabetical order). Not all of these are good, but all were interesting to me in one way or another. Trust me, there are a lot more documents similar to these out there:
http://alioth.debian.org/project/showfiles.php?group_id=30283&release_id=288
http://deb.riseup.net/storage/software-raid/
http://forums.whirlpool.net.au/forum-replies-archive.cfm/471585.html
http://nepotismia.com/debian/raidinstall/
http://nst.sourceforge.net/nst/docs/user/ch14.html
http://piirakka.com/misc_help/Linux/raid_starts_degraded.txt
http://thegoldenear.org/toolbox/unices/server-setup-debian.html
http://togami.com/~warren/guides/remoteraidcrazies/
http://www.debian-administration.org/articles/238
http://www.debian-administration.org/users/philcore/weblog/4
http://www.doorbot.com/guides/linux/x86/grubraid/
http://www.epimetrics.com/topics/one-page?page_id=421&topic=Bit-head%20Stuff&page_topic_id=120
http://www.james.rcpt.to/programs/debian/raid1/
http://www.linuxjournal.com/article/5898
http://www.linuxsa.org.au/mailing-list/2003-07/1270.html
http://www.linux-sxs.org/storage/raid_setup.html
http://www.parisc-linux.org/faq/raidboot-howto.html
http://www.tldp.org/HOWTO/Software-RAID-HOWTO.html
http://trinityhome.org/misc/bootable-raid1.html
http://www.vermittlungsprovision.net/367.html
http://xtronics.com/reference/SATA-RAID-debian-for-2.6.html

Gary V
mr88talent at yahoo dot com
Last edited 21 MAR 2008