mdadm

All the things I always forget about mdadm

mdadm cheatsheet

useful commands

storageDevices has the usual hard disk commands.

df
Use df -Th to list mounted filesystems and their types.

blkid
blkid shows you UUIDs, which is useful for adding things to fstab or identifying drives.

mdadm

mdadm --detail /dev/md0 gives you a nice overview of what's going on. This is usually the place to start.

mdadm --examine /dev/sd[a-z]1 | grep -E 'Event|/dev/sd' is a rad command from this askubuntu question. If a device has been accidentally removed, comparing its event count with the others tells you how many events have occurred since it was removed.
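To make the event counts easier to compare, the output can be boiled down to one line per device. This is a minimal sketch, assuming the usual mdadm --examine layout (a "/dev/...:" header line followed later by an "Events : N" line). event_counts is a made-up helper name, not part of mdadm.

```shell
# event_counts: hypothetical helper, not an mdadm feature.
# Reads `mdadm --examine` output on stdin and prints "device events" pairs.
event_counts() {
    awk '
        /^\/dev\//      { dev = $1; sub(/:$/, "", dev) }   # header line, e.g. "/dev/sdb1:"
        $1 == "Events"  { print dev, $3 }                  # e.g. "Events : 696615"
    '
}

# Usage (needs root):
#   mdadm --examine /dev/sd[a-z]1 | event_counts
```

A device whose number is lower than the rest is the one that fell behind.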

/proc/mdstat

cat /proc/mdstat gives you a terse status summary per array. It's not much use when your array is clean but degraded: it shows that a member is missing, but mdadm --detail tells you a lot more.
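For what it's worth, the status field at the end of each array's blocks line in /proc/mdstat has one letter per member: U for up and _ for missing, so [UU_U] means one of four members is gone. Here is a rough sketch that lists arrays with a missing member. degraded_arrays is a hypothetical helper, and the optional file argument is only there so it can be tried on saved output.

```shell
# degraded_arrays: hypothetical helper.
# Prints the name of every md array whose status field (e.g. [UU_U])
# contains "_", meaning a member is missing.
# Reads /proc/mdstat by default; pass a file to run it on saved output.
degraded_arrays() {
    awk '
        /^md/                { name = $1 }    # e.g. "md0 : active raid5 ..."
        /\[[U_]*_[U_]*\]/    { print name }   # status field with a missing member
    ' "${1:-/proc/mdstat}"
}

# Usage:
#   degraded_arrays
```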

devices or partitions

There's apparently no difference in performance. The main difference is that if you use partitions, you can use any drive which is big enough to hold that partition. I use partitions.

adding a drive

see ask ubuntu question.

Install the gdisk package, then run the two commands below.

sdY is the new disk and sdX is a disk already in the array. Don't mix them up in the -R command: on a degraded array, getting it backwards overwrites the partition table on one of your good drives. Edit: if you did this by mistake, you could recreate the partition table you just destroyed by copying it from another drive.

sgdisk -R /dev/sdY /dev/sdX
sgdisk -G /dev/sdY
    

The first command copies the partition table of sdX to sdY (be careful not to mix these up). The second command randomizes the GUIDs of the disk and all its partitions, which is only necessary if both disks are going to be used in the same machine.

Now you can run fdisk -l /dev/sdY to see the new partition table. Yes, the partitions are exactly the same size as the ones you just copied. That's pretty rad.

Now add it to your array with mdadm --add /dev/md0 /dev/sdY1. A degraded array starts rebuilding onto the new partition straight away.
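Since mixing up sdX and sdY is the dangerous part, one option is to print the commands first and read them before running anything. This is a sketch under that assumption: plan_add_drive is a hypothetical helper that only echoes the three commands from this section and runs nothing. It assumes the member partition is partition 1 with an sdX-style name, as in these notes.

```shell
# plan_add_drive: hypothetical dry-run helper. It prints the commands and
# does not touch any disk.
# Arguments: healthy member disk, new disk, md array.
plan_add_drive() {
    src=$1 new=$2 array=$3
    if [ "$src" = "$new" ]; then
        echo "source and new disk must differ" >&2
        return 1
    fi
    echo "sgdisk -R $new $src"        # copy the partition table of $src onto $new
    echo "sgdisk -G $new"             # randomize GUIDs on the new disk
    echo "mdadm --add $array ${new}1" # assumes partition 1 is the array member
}

# Usage:
#   plan_add_drive /dev/sdX /dev/sdY /dev/md0
```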

device notation

/dev/sdX type notation is not stable. That is, on this boot a device might be /dev/sda, but there's no guarantee that it will be next time. Try to avoid paying much attention to it, and identify drives by UUID or serial number instead.
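udev keeps stable aliases for each disk under /dev/disk/by-id (named after model and serial number) and /dev/disk/by-uuid. Here is a small sketch that shows which device each alias currently points at. list_stable_names is a hypothetical helper, and the directory argument exists only so it can be pointed somewhere else.

```shell
# list_stable_names: hypothetical helper.
# Prints "alias -> current device" for every symlink in a directory
# (default: /dev/disk/by-id).
list_stable_names() {
    dir=${1:-/dev/disk/by-id}
    for link in "$dir"/*; do
        [ -L "$link" ] || continue    # skip anything that is not a symlink
        printf '%s -> %s\n' "${link##*/}" "$(readlink -f "$link")"
    done
}

# Usage:
#   list_stable_names
#   list_stable_names /dev/disk/by-uuid
```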

fstab

Use df -Th to get the filesystem type and blkid to get the UUID.

Add the array to fstab with a line like this:

UUID="988fb9fe-0ef3-4a02-ab1b-0f9405867cbd" /srv ext4 defaults 0 2
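The line can also be generated rather than typed, which avoids copy-paste mistakes in the UUID. This is a minimal sketch: fstab_line is a hypothetical helper, and /dev/md0, /srv and ext4 in the usage comment are just the values from these notes.

```shell
# fstab_line: hypothetical helper that formats an fstab entry.
# Arguments: UUID, mount point, filesystem type, mount options (default "defaults").
fstab_line() {
    printf 'UUID=%s %s %s %s 0 2\n' "$1" "$2" "$3" "${4:-defaults}"
}

# Usage (needs root); `blkid -s UUID -o value` prints just the UUID:
#   fstab_line "$(blkid -s UUID -o value /dev/md0)" /srv ext4
```

Passing defaults,nofail as the fourth argument may be worth considering: nofail is a standard mount option that tells the system not to treat a missing device as an error at boot.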
    

accidentally removed a device

http://askubuntu.com/questions/304672/how-to-re-add-accidentally-removed-hard-drive-in-raid5

You can just

mdadm --add /dev/md0 /dev/sdb1
    

or so, substituting your own array and partition names. The array then resyncs onto the re-added device.

diagnostic flow

mdadm --detail /dev/md0
    

gives you something like:

root@hmsvr:~# mdadm --detail /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Fri Jan  9 17:48:35 2015
     Raid Level : raid5
     Array Size : 5859839232 (5588.38 GiB 6000.48 GB)
  Used Dev Size : 1953279744 (1862.79 GiB 2000.16 GB)
   Raid Devices : 4
  Total Devices : 3
    Persistence : Superblock is persistent

    Update Time : Tue Feb 23 12:48:49 2016
          State : clean, degraded
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 256K

           Name : hmsvr:0  (local to host hmsvr)
           UUID : 20b99061:7df99cfc:01b504b4:94401d9c
         Events : 696615

    Number   Major   Minor   RaidDevice State
       0       8       49        0      active sync   /dev/sdd1
       1       8       33        1      active sync   /dev/sdc1
       4       0        0        4      removed
       4       8       65        3      active sync   /dev/sde1
    

As you can see, a disk has been removed (Raid Devices is 4 but Active Devices is only 3), so it's either dead or wasn't detected during boot.
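The two things worth pulling out of that output are the State line and any rows marked removed. This sketch does just that, assuming the layout shown above. detail_summary is a hypothetical helper that only parses text on stdin.

```shell
# detail_summary: hypothetical helper.
# Reads `mdadm --detail` output on stdin and prints the State line plus the
# RaidDevice column of every row whose state is "removed".
detail_summary() {
    awk -F' : ' '
        $1 ~ /^ *State$/ { print "state: " $2 }
        $0 ~ /^ *[0-9]+ +[0-9]+ +[0-9]+ +[0-9]+ +removed *$/ {
            split($0, f, " ")              # columns: Number Major Minor RaidDevice State
            print "removed slot: " f[4]
        }
    '
}

# Usage (needs root):
#   mdadm --detail /dev/md0 | detail_summary
```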

To figure out which physical drive is the missing one, use smartctl to get the serial numbers of the devices which are still in the array. The drive whose serial number isn't among them is the one to pull. Next you remove the old one, add a new one, and you're good to go.
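The serial number lookup can be scripted so the list of drives to keep is in front of you before opening the case. This is a sketch under a few assumptions: both helpers are hypothetical, smartctl prints an ATA-style "Serial Number:" line, and members are sdXN partitions so that stripping the trailing digits gives the disk.

```shell
# Both helpers are hypothetical; they only parse text read on stdin.

# active_members: print the device path of every "active sync" row
# in `mdadm --detail` output.
active_members() {
    awk '/active sync/ { print $NF }'
}

# serial_of: print the serial number from `smartctl -i` output
# (assumes the ATA-style "Serial Number:" label).
serial_of() {
    awk -F': *' '/^Serial Number:/ { print $2 }'
}

# Usage (needs root and smartmontools):
#   for dev in $(mdadm --detail /dev/md0 | active_members); do
#       disk=${dev%%[0-9]*}    # /dev/sdd1 -> /dev/sdd (assumes sdXN naming)
#       printf '%s %s\n' "$dev" "$(smartctl -i "$disk" | serial_of)"
#   done
```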

degraded array won't boot

If one disk dies in a raid5 array, the array will still work, but of course you have no redundancy left. However, the array won't assemble if the event count doesn't match on all the remaining disks. This can happen if your host isn't shut down cleanly, for example in a power failure.

Weirdly, even though my OS doesn't reside on my array, this problem prevented my OS from booting and I was stuck with some weird recovery prompt.

If you wanted to, you could examine the event counts on the drives and fret over how big a problem this is. In my own case, I don't care very much about the data in the array, and I figured the data was either ok or it wasn't. Anyhow, you can just do something like:

mdadm --assemble --force /dev/md0 /dev/sd[a-z]1
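If you do want a number to fret over before forcing anything, the gap between the highest and lowest event count is a reasonable one. This is a minimal sketch, assuming the "Events : N" lines that mdadm --examine prints. event_spread is a hypothetical helper.

```shell
# event_spread: hypothetical helper.
# Reads `mdadm --examine` output on stdin and prints the gap between the
# highest and lowest Events value (0 means every member agrees).
event_spread() {
    awk '
        $1 == "Events" {
            if (n == 0 || $3 > max) max = $3
            if (n == 0 || $3 < min) min = $3
            n++
        }
        END { if (n) print max - min }
    '
}

# Usage (needs root):
#   mdadm --examine /dev/sd[a-z]1 | event_spread
```

The bigger the number, the further the members have drifted apart.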

questions

notification
Users receive local mail when the array is degraded. If you see "you have new mail" when you log in, check it.
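Who gets that mail is set by the MAILADDR line in mdadm's config file. Here is a small sketch for checking it. mailaddr_of is a hypothetical helper, and the default path is the Debian/Ubuntu location, which is an assumption about this host (other distros tend to use /etc/mdadm.conf).

```shell
# mailaddr_of: hypothetical helper.
# Prints the MAILADDR value(s) from an mdadm.conf-style file.
# Default path is the Debian/Ubuntu one; pass another file if yours differs.
mailaddr_of() {
    awk '$1 == "MAILADDR" { print $2 }' "${1:-/etc/mdadm/mdadm.conf}"
}

# Usage:
#   mailaddr_of
```

To confirm mail actually arrives, mdadm --monitor --scan --oneshot --test should send a test message for each array.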