Linux Mirroring Recovery

A hard drive failure made Linux software RAID uncooperative. Physically removing the drive made it difficult to remove from the RAID array. However, no data was lost once I learned the appropriate RAID tool commands. Detailed below are the steps and commands to get the system back to a redundant state.

What Failed?

My computer had 5 drives in it: a 128GB solid state drive, 2 x 1TB hard disks, and 2 x 500GB hard disks. The solid state drive and a partition on the failed 1TB disk were mirrored using software RAID level 1. The hard disk partition was set to write-mostly, so that most reads would go to the solid state drive. Additionally, all four hard disks held LVM volumes.
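
For context, a mirror like this can be created with mdadm's --write-mostly flag, which marks the slower member so reads prefer the fast one. A sketch, with illustrative device names rather than my actual layout:

$ mdadm --create /dev/md/deb64 \
        --level=1 \
        --raid-devices=2 \
        /dev/sda1 --write-mostly /dev/sdb1

The --write-mostly flag applies to the devices listed after it, here the hard disk partition /dev/sdb1.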

After getting an email from the system about the degraded RAID array, I proceeded to remove the drive. After booting, however, the LVM and RAID volumes still referenced the failed drive. And this is where I had problems: LVM had no issues removing the drive, but the RAID setup just would not cooperate.

Cleaning up LVM

To inspect the current state of the LVM volumes, I ran the following:

$ lvs -a -o +devices

The output shows each logical volume, its mirrors, and the physical volumes those mirrors are on.
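
On a healthy system the output looks something like this (the volume names and sizes here are illustrative, not my exact layout); a volume whose drive is missing will instead show unknown or missing devices:

    LV   VG     Attr       LSize  Devices
    home debian -wi-ao---- 1.00t  /dev/sde3(0)
    root debian -wi-ao---- 50.00g /dev/sda2(0)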

LVM has a nice command to drop all missing PVs (physical volumes). This will drop any mirrors, and you'll be running in a degraded state until a new drive is put in.

$ vgreduce --removemissing debian

My volume group is called debian; you will want to use your own volume group's name. However, the command complains because it wants to be safe. The --force flag will really remove the physical volume, so be careful.

$ vgreduce --removemissing --force debian

Lastly, I inspect the LVM state again using the same command as before:

$ lvs -a -o +devices

Cleaning up RAID

First I inspected the RAID state with the following:

$ mdadm --detail /dev/md/deb64

Which gives a summary at the bottom similar to the following:

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       0        0        1      removed

Ok, that's great! The RAID system knows the drive is not there; I just need to remove the mirror. So I run the command to remove missing devices.

$ mdadm --manage /dev/md/deb64 --remove detached

There is no output, so I check the detail command to confirm it worked:

$ mdadm --detail /dev/md/deb64

It didn't work!

Trying again with slightly different flags did not work either. A post, "Repair Degraded RAID 5", on the Ubuntu forums provides a sort of solution: recreate the RAID device. Since my RAID device was the root filesystem, I had to boot into a live USB install and continue the repairs from there.

First, I unmount and stop the RAID device:

$ umount /deb64
$ mdadm --detail /dev/md/deb64
$ mdadm --stop /dev/md/deb64

I record the UUID of the RAID device at this point. You will need this UUID to boot properly.
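
If you are unsure where to find it, the UUID is printed by the detail command, and mdadm can also scan the member devices directly; both of these are standard mdadm invocations:

$ mdadm --detail /dev/md/deb64 | grep UUID
$ mdadm --examine --scan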

Second, I clear the RAID metadata so that the re-created RAID device will not attempt to find the missing device. I make sure that I have the UUID written down, safe and sound.

$ mdadm --zero-superblock /dev/sda1

It is important to specify the UUID when re-creating, especially if the array is mounted as / (root). Otherwise, GRUB may not be able to find it.

$ mdadm --create \
        --level=1 \
        --raid-devices=1 \
        --force \
        --uuid=be6bdc2d:ce6c1eef:3de0b51b:1d609063 \
        /dev/md/deb64 /dev/sda1

The --force option is necessary because a RAID1 with 1 drive is not accepted by default.
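
One assumption worth checking after re-creating the array: on Debian and Ubuntu systems the array definitions may also live in /etc/mdadm/mdadm.conf, and the initramfs may need rebuilding so the array assembles at boot. A sketch of that cleanup:

$ mdadm --examine --scan >> /etc/mdadm/mdadm.conf
$ update-initramfs -u

Check mdadm.conf afterwards and remove any stale ARRAY line left over from the old array.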

Adding a New Drive to LVM

Adding the new drive in LVM is fairly straightforward if you are familiar with LVM. However, I want to make the mirror log more robust with the --mirrorlog mirrored option.

I create the partition using GParted, format it as an LVM PV, and add it to my volume group:

$ pvcreate /dev/sdd1
$ vgextend debian /dev/sdd1

Now, I can convert each logical volume to mirror onto the newly added device. This first example converts an unmirrored volume on /dev/sde3 to a mirrored volume on /dev/sde3 and /dev/sdd1. Use --alloc anywhere to allow the mirrored mirror logs to be created on the same drives that the volumes reside on.

$ lvconvert -m 1 \
            --mirrorlog mirrored \
            --alloc anywhere \
            /dev/debian/home /dev/sde3 /dev/sdd1
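
To watch the new mirror synchronize, lvs can report the copy progress; copy_percent is a standard lvs field:

$ lvs -a -o +devices,copy_percent debian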

Michael explains the mirroring in more detail in his "LVM mirroring: the right way" post.

Adding a New Drive to RAID

This was tricky again because the device had been removed in a non-standard way, so searches about re-adding a device turned up red herrings. Instead, simply adding it as a new device works. The trick is to "grow" the array so that it can accommodate more devices.

$ mdadm --grow /dev/md/deb64 --raid-devices=2

After growing the array, I add the new device.

$ mdadm --manage /dev/md/deb64 --add --write-mostly /dev/sdd2

I check the array and the new drive is syncing. However, it is listed as a spare. This is no problem: the drive will be marked as working once it has finished syncing.
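
The sync progress can also be monitored through /proc/mdstat, which shows a progress bar and an estimated finish time:

$ cat /proc/mdstat
$ watch -n 2 cat /proc/mdstat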

It Works!

With the LVM and RAID devices now fixed, I can rest assured that my data is safe again. This and previous drive failures have shown me that redundancy removes a lot of worry. When the next hard drive failure comes along, I'll know what to do, now that I've dealt with both LVM and Linux software RAID recovery.