mdadm


All the things I always forget about mdadm

mdadm cheatsheet

useful commands

storageDevices has the usual hard disk commands.

df
use df -Th to list partitions and types.

blkid
blkid shows you UUIDs, which is useful for adding things to fstab or identifying drives.

mdadm

mdadm --detail /dev/md0 gives you a nice overview of what's going on; this is usually the place to start.

mdadm --examine /dev/sd[a-z]1 | egrep 'Event|/dev/sd' is a rad command from this askubuntu question. If a device has been accidentally removed, comparing the event counts tells you how many events have occurred since it dropped out.
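A variation on the same idea (the awk script is my own sugar, not from the askubuntu question): pair each device name with its event count on one line, so the comparison is easier to eyeball.

```shell
# Print "device events" pairs from mdadm --examine output.
mdadm --examine /dev/sd[a-z]1 | awk '/\/dev\/sd/ {dev=$1} /Events/ {print dev, $3}'
```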

/proc/mdstat

cat /proc/mdstat will give you a terse summary. It's not all that informative if your array is clean but degraded, though the [UU_U]-style status does at least show which slot is missing.
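For what it's worth, on a clean-but-degraded array the output looks something like this (illustrative, matching the four-device raid5 shown further down):

```
$ cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sde1[4] sdc1[1] sdd1[0]
      5859839232 blocks super 1.2 level 5, 256k chunk, algorithm 2 [4/3] [UU_U]

unused devices: <none>
```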

devices or partitions

there's apparently no difference in performance; the main difference is that if you use partitions then you can replace a failed drive with any drive big enough to hold that partition. I use partitions.

adding a drive

see ask ubuntu question.

install the gdisk package then

sdY is the new disk. Don't mess up the -R command below: if you have a degraded array, getting the arguments backwards will obviously ruin the partition table of one of your remaining good drives. edit: actually, if you did this by mistake you could recreate the partition table you just destroyed by copying it back from another drive.

sgdisk -R /dev/sdY /dev/sdX
sgdisk -G /dev/sdY

The first command copies the partition table of sdX to sdY (be careful not to mix these up). The second command randomizes the GUIDs on the disk and all its partitions; this is only necessary if both disks will be used in the same machine.

now you can fdisk -l /dev/sdY to see the new partition table. Yes, it's exactly the same layout as the one you just copied. That's pretty rad.

now add it to your array with mdadm --add /dev/md0 /dev/sdY1
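Putting the steps together as a sketch (sdX is a surviving member, sdY the new disk; double-check both names before running anything):

```
sgdisk -R /dev/sdY /dev/sdX     # copy the partition table FROM sdX TO sdY
sgdisk -G /dev/sdY              # randomize GUIDs on the new disk
mdadm --add /dev/md0 /dev/sdY1  # add the new partition to the array
cat /proc/mdstat                # watch the rebuild progress
```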

device notation

I don't think /dev/sdX-style notation is ever stable. That is, a device might be /dev/sda this boot, but there's no guarantee it will be next time. Try to avoid relying on it.

fstab

use df -Th to get the filesystem type and blkid to get the UUID

Add the array to fstab with a line like this:

UUID="988fb9fe-0ef3-4a02-ab1b-0f9405867cbd" /srv ext4 defaults 0 2
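If you don't fancy picking the UUID out of blkid's full output by hand, blkid can print just the value:

```
blkid -s UUID -o value /dev/md0
```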

accidentally removed a device

http://askubuntu.com/questions/304672/how-to-re-add-accidentally-removed-hard-drive-in-raid5

You can just

mdadm --add /dev/md0 /dev/sdb1

or so ...

diagnostic flow

mdadm --detail /dev/md0

gives you something like:

root@hmsvr:~# mdadm --detail /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Fri Jan  9 17:48:35 2015
     Raid Level : raid5
     Array Size : 5859839232 (5588.38 GiB 6000.48 GB)
  Used Dev Size : 1953279744 (1862.79 GiB 2000.16 GB)
   Raid Devices : 4
  Total Devices : 3
    Persistence : Superblock is persistent

    Update Time : Tue Feb 23 12:48:49 2016
          State : clean, degraded
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 256K

           Name : hmsvr:0  (local to host hmsvr)
           UUID : 20b99061:7df99cfc:01b504b4:94401d9c
         Events : 696615

    Number   Major   Minor   RaidDevice State
       0       8       49        0      active sync   /dev/sdd1
       1       8       33        1      active sync   /dev/sdc1
       4       0        0        4      removed
       4       8       65        3      active sync   /dev/sde1

As you can see here, a disk has been removed, so it's either dead or just wasn't detected during boot or some such.

As an alternative, you can use lsblk to see what devices are connected:

root@hmsvr:~# lsblk
NAME    MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
sda       8:0    0  1.8T  0 disk
└─sda1    8:1    0  1.8T  0 part
  └─md0   9:0    0  7.3T  0 raid5 /srv
sdb       8:16   0  1.8T  0 disk
└─sdb1    8:17   0  1.8T  0 part
  └─md0   9:0    0  7.3T  0 raid5 /srv
sdc       8:32   0  1.8T  0 disk
└─sdc1    8:33   0  1.8T  0 part
  └─md0   9:0    0  7.3T  0 raid5 /srv
sdd       8:48   0  1.8T  0 disk
└─sdd1    8:49   0  1.8T  0 part
sde       8:64   0  1.8T  0 disk
└─sde1    8:65   0  1.8T  0 part
  └─md0   9:0    0  7.3T  0 raid5 /srv
sdg       8:96   1 57.9G  0 disk
├─sdg1    8:97   1  512M  0 part  /boot/efi
├─sdg2    8:98   1 49.5G  0 part  /
└─sdg3    8:99   1  7.9G  0 part  [SWAP]

A dead device won't appear in this list, but if it's only a messed up partition or array config, you'll still see the device. In the case of the output above, somehow sdd1 has become disconnected from the array. It's still in the list but "removed" from the mdadm output.

As an aside, I guess this might have happened one of the many times my array has become degraded due to a power outage or something, and I've just recreated it by searching for partitions.

If the device is still shown in lsblk, then it's probably worth just trying to add it back into the array with a fresh partition table.

If the device isn't listed by lsblk then it's dead. Figuring out which physical drive is which is a matter of getting the serial numbers of the devices that are still in the array with smartctl; then you remove the old one, add a new one, and you're good to go.

You can run udevadm info --query=all --name=/dev/sda | grep ID_SERIAL on each device to get the serials of the working devices.
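To do them all in one go, something like this loop works (adjust the glob to match your drives):

```
for d in /dev/sd?; do
  echo "$d $(udevadm info --query=all --name=$d | grep ID_SERIAL=)"
done
```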

degraded array won't boot

If one disk dies in a raid5 array, the array will still work, but of course you have no redundancy left. However, the array won't assemble if the event count doesn't match on all disks. This can happen if your host isn't shut down cleanly, such as in a power failure.

Weirdly, even though my OS doesn't reside on my array, this problem prevented my OS from booting and I was stuck with some weird recovery prompt.

If you wanted to, you could examine the event counts on the drives and fret over how big a problem this is. In my own case, I don't care very much about the data in the array, and I figured the data was either ok or it wasn't. Anyhow, you can just do something like:

mdadm --assemble --force /dev/md0 /dev/sd[a-z]1
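If the array is already half-assembled from the failed boot, it's worth stopping it first; a sketch (device names are examples, substitute your own members):

```
mdadm --stop /dev/md0                           # release the half-assembled array
mdadm --assemble --force /dev/md0 /dev/sd[a-e]1 # force-assemble from its members
```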

questions

notification
users receive local 'mail' when the array is degraded. If you see "you have new mail" when you log in, check it.
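Those mails come from mdadm's monitor mode (on Debian/Ubuntu a monitor daemon runs out of the box, I believe). The address is set in /etc/mdadm/mdadm.conf, and you can fire a test alert to confirm delivery actually works:

```
# /etc/mdadm/mdadm.conf
MAILADDR root
```

```
# send a test alert for each array found by --scan
mdadm --monitor --scan --oneshot --test
```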