mdadm cheatsheet
useful commands
df
use df -Th to list partitions and types.
blkid
blkid shows you UUIDs, useful for adding things to fstab or identifying drives.
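If you just want the UUID of one device (assuming the array is /dev/md0, as elsewhere in this page), something like this should print only the value:
blkid -s UUID -o value /dev/md0   # -s picks the tag, -o value drops the NAME="..." wrapper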
mdadm
mdadm --detail /dev/md0
gives you a nice overview of what's going on.. this is usually the place to start.
mdadm --examine /dev/sd[a-z]1 | egrep 'Event|/dev/sd'
a rad command from this askubuntu question. If a device has been accidentally removed, this will tell you how many events have occurred since it was removed.
/proc/mdstat
cat /proc/mdstat
will give you some fairly useless information.. or at least it's not very useful when your array is in the "clean, degraded" state.
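The one time it is genuinely handy is watching a rebuild tick along, e.g.:
watch -n 5 cat /proc/mdstat   # refresh every 5 seconds; the recovery line should show percent done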
devices or partitions
there's apparently no difference in performance; the main difference is that if you use partitions, you can use any drive which is big enough to hold that partition. I use partitions.
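For what it's worth, if you're partitioning a blank disk for raid by hand (rather than copying a table as in the next section), a sketch with sgdisk might look like this. sdY is just a placeholder for the new disk, and fd00 is sgdisk's "Linux RAID" type code:
sgdisk -n 1:0:0 -t 1:fd00 /dev/sdY   # one partition spanning the whole disk, typed as Linux RAID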
adding a drive
see the Ask Ubuntu question.
install the gdisk package, then:
sdY is the new disk... don't mess up the -R command below.. if you have a degraded array, messing it up will obviously ruin one of your good drives. Edit: actually.. if you did this by mistake you could recreate the partition table you just destroyed by copying it from another drive.
sgdisk -R /dev/sdY /dev/sdX
sgdisk -G /dev/sdY
The first command copies the partition table of sdX to sdY (be careful not to mix these up). The second command randomizes the GUID on the disk and all the partitions; that step is only necessary if the disks are to be used in the same machine, otherwise you can skip it.
now you can fdisk -l /dev/sdY to see the new partition table. yes, it's exactly the same size as the one you just copied. that's pretty rad.
now add it to your array with mdadm --add /dev/md0 /dev/sdY1
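The rebuild takes a while; you can keep an eye on it, or block until it's done, with something like:
cat /proc/mdstat        # shows recovery progress
mdadm --wait /dev/md0   # returns once any resync/recovery finishes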
device notation
I don't think /dev/sdX type notation is ever stable. That is, this boot a device might be /dev/sda, but there's no guarantee that it will be next time. Try to avoid paying much attention to it.
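If you need a name that does stay put, the /dev/disk/ symlinks are built from things like model and serial number, so they survive reboots:
ls -l /dev/disk/by-id/     # stable ids -> whatever sdX they currently map to
ls -l /dev/disk/by-uuid/   # filesystem UUIDs -> current partitions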
fstab
use df -Th to get the partition type and blkid to get the UUID.
Add the array to fstab with a line like this:
UUID="988fb9fe-0ef3-4a02-ab1b-0f9405867cbd" /srv ext4 defaults 0 2
accidentally removed a device
http://askubuntu.com/questions/304672/how-to-re-add-accidentally-removed-hard-drive-in-raid5
You can just
mdadm --add /dev/md0 /dev/sdb1
or so ...
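If the member's superblock is still intact, mdadm also has a --re-add, which (if you have a write-intent bitmap) can skip most of the rebuild. Roughly:
mdadm --re-add /dev/md0 /dev/sdb1   # if it refuses, fall back to a plain --add and a full rebuild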
diagnostic flow
mdadm --detail /dev/md0
gives you something like:
root@hmsvr:~# mdadm --detail /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Fri Jan 9 17:48:35 2015
     Raid Level : raid5
     Array Size : 5859839232 (5588.38 GiB 6000.48 GB)
  Used Dev Size : 1953279744 (1862.79 GiB 2000.16 GB)
   Raid Devices : 4
  Total Devices : 3
    Persistence : Superblock is persistent

    Update Time : Tue Feb 23 12:48:49 2016
          State : clean, degraded
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 256K

           Name : hmsvr:0 (local to host hmsvr)
           UUID : 20b99061:7df99cfc:01b504b4:94401d9c
         Events : 696615

    Number   Major   Minor   RaidDevice State
       0       8       49        0      active sync   /dev/sdd1
       1       8       33        1      active sync   /dev/sdc1
       4       0        0        4      removed
       4       8       65        3      active sync   /dev/sde1
As you can see here.. a disk has been removed, so it's either dead or just not detected during boot or some such.
As an alternative, you can use lsblk to see what devices are connected:
root@hmsvr:~# lsblk
NAME      MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
sda         8:0    0  1.8T  0 disk
└─sda1      8:1    0  1.8T  0 part
  └─md0     9:0    0  7.3T  0 raid5 /srv
sdb         8:16   0  1.8T  0 disk
└─sdb1      8:17   0  1.8T  0 part
  └─md0     9:0    0  7.3T  0 raid5 /srv
sdc         8:32   0  1.8T  0 disk
└─sdc1      8:33   0  1.8T  0 part
  └─md0     9:0    0  7.3T  0 raid5 /srv
sdd         8:48   0  1.8T  0 disk
└─sdd1      8:49   0  1.8T  0 part
sde         8:64   0  1.8T  0 disk
└─sde1      8:65   0  1.8T  0 part
  └─md0     9:0    0  7.3T  0 raid5 /srv
sdg         8:96   1 57.9G  0 disk
├─sdg1      8:97   1  512M  0 part  /boot/efi
├─sdg2      8:98   1 49.5G  0 part  /
└─sdg3      8:99   1  7.9G  0 part  [SWAP]
A dead device won't appear in this list, but if it's only a messed up partition or array config, you'll still see the device. In the case of the output above, somehow sdd1 has become disconnected from the array. It's still in the list but "removed" from the mdadm output.
As an aside, I guess this might have happened during one of the many times my array has become degraded due to a power outage or something, and I've just recreated it by searching for partitions.
If the device is still shown in lsblk, then it's probably worth just trying to add it back into the array with a fresh partition table.
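In that case it's basically the same recipe as "adding a drive" above. A sketch using the device names from this particular output (sdd is the disconnected disk, sdc is a healthy member; yours will differ):
sgdisk -R /dev/sdd /dev/sdc   # copy the partition table from the healthy sdc onto sdd
sgdisk -G /dev/sdd            # fresh GUIDs so the two disks don't clash
mdadm --add /dev/md0 /dev/sdd1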
If the device isn't listed by lsblk then it's 'dead'. Figuring out which drive is which is a process of getting the serial numbers of the devices which are still in the array with smartctl.
.. next you remove the old one, add a new one, and you're good to go.
You can run udevadm info --query=all --name=/dev/sda | grep ID_SERIAL against each device to get the serials of the working devices.
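A rough loop to collect them all in one go (the device list here is just this box's disks; adjust to taste):
for d in /dev/sd[a-e]; do
  echo -n "$d  "
  udevadm info --query=all --name="$d" | grep ID_SERIAL=
done
# or: smartctl -i /dev/sda | grep -i serial   (needs the smartmontools package)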
degraded array won't boot
If one disk dies in a raid5 array, the array will still work, but of course you don't have any redundancy left. However, the array won't assemble (and so won't mount) if the event count doesn't match on all disks. This can happen if your host isn't shut down correctly, e.g. in a power failure.
Weirdly, even though my OS doesn't reside on my array, this problem prevented my OS from booting and I was stuck with some weird recovery prompt.
If you wanted to, you could examine the event counts on the drives and fret over how big a problem this is. In my own case, I don't care very much about the data in the array, and I figured the data was either ok or it wasn't. Anyhow, you can just do something like:
mdadm --assemble --force /dev/md0 /dev/sd[a-e]1
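If the array is sitting there half-assembled (inactive in /proc/mdstat), you may need to stop it first. A fuller sketch, assuming the array is md0 and the members are the sd[a-e]1 partitions from the output above:
mdadm --stop /dev/md0                             # only if it's already partially assembled
mdadm --assemble --force /dev/md0 /dev/sd[a-e]1
cat /proc/mdstat                                  # check it came up and watch any resync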
questions
notification: users receive 'mail' when the array is degraded. if you see "you have new mail" when you log in.. check it.
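That mail comes from mdadm's monitor mode, which picks up MAILADDR from mdadm.conf. A quick way to check it's wired up (a sketch, assuming the Debian/Ubuntu config path):
grep MAILADDR /etc/mdadm/mdadm.conf       # e.g. "MAILADDR root"
mdadm --monitor --scan --test --oneshot   # sends a TestMessage alert for each array, then exits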