After upgrading to Ubuntu 11.04 on my computer at work, I noticed that one of the SSD drives in my RAID1 (mirror) was dead. This is how I fixed the problem, and how I added some monitoring so that I’ll be notified the next time something breaks.
I managed to get the drive working again simply by physically unplugging it and plugging it back in, but my RAID was obviously still out of sync, so I had to resync the drives.
First, I checked the current status, which can be done in two ways.
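The exact commands aren’t shown here, but two common ways to check the state of an md array are reading /proc/mdstat and asking mdadm for a detailed report. The sketch below wraps both in a hypothetical helper, `raid_status`, assuming the array is /dev/md0 as in the commands that follow:

```shell
# Hypothetical helper: show RAID status in two common ways.
# Assumption: the array is /dev/md0 unless another device is passed as $1.
raid_status() {
    array="${1:-/dev/md0}"
    # 1) The kernel's summary of every md array, including the [UU]/[U_] state.
    cat /proc/mdstat
    # 2) mdadm's detailed report for one array (run as root); look at the
    #    "State :" line and the device table at the bottom.
    mdadm --detail "$array"
}
```

Run `raid_status` (or `raid_status /dev/md1`) as root after sourcing it.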
As the output showed, my second drive (/dev/sdb2) was not in sync.
To fix it, I first removed it from the RAID altogether:
mdadm /dev/md0 --remove /dev/sdb2
Then I re-added it:
mdadm /dev/md0 --re-add /dev/sdb2
You’ll see in syslog that the RAID is being rebuilt, but I’d rather do this to get more information.
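The command used here isn’t preserved, so this is only a sketch of one way to follow the rebuild: a hypothetical `wait_for_rebuild` helper that re-prints /proc/mdstat while the kernel reports a resync or recovery. The default file and the 10-second interval are assumptions; `watch -n 1 cat /proc/mdstat` is the interactive equivalent, and `mdadm --wait /dev/md0` simply blocks until the rebuild is done.

```shell
# Hypothetical helper: re-print an mdstat-format file every few seconds
# while a resync or recovery is reported, then print it one last time.
# Assumptions: default file /proc/mdstat, default interval 10 seconds.
wait_for_rebuild() {
    mdstat="${1:-/proc/mdstat}"
    interval="${2:-10}"
    # During a rebuild a progress line such as
    # "[=>......]  recovery =  8.5% ..." appears under the array.
    while grep -Eq 'resync|recovery' "$mdstat"; do
        cat "$mdstat"
        sleep "$interval"
    done
    echo "no resync/recovery in progress"
    cat "$mdstat"
}
```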
You’ll see [U_] if one of the disks is faulty or missing, and [UU] if both are up.
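Since a degraded mirror shows up as an underscore inside that bracketed status, it’s easy to check for from a script. Here is a small sketch with a hypothetical `raid_degraded` function that succeeds when any array has a missing member; it reads /proc/mdstat by default, but takes another file so it can be tried on saved output:

```shell
# Hypothetical helper: succeed (exit 0) if any array in an mdstat-format
# file has a missing member, i.e. an "_" inside the [UU]-style status.
# Assumption: default file is /proc/mdstat.
raid_degraded() {
    mdstat="${1:-/proc/mdstat}"
    # Matches [U_], [_U], [U_U], ... but not [UU] or [raid1].
    grep -q '\[U*_[U_]*]' "$mdstat"
}
```

For example, `if raid_degraded; then echo "RAID degraded"; fi`.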
Now to the next step.
I only noticed this because I rebooted the computer. With an uptime of more than 100 days, the array could have been degraded for ages, which is quite dangerous: if the remaining disk had failed too, the data would have been lost.
So I used mdadm’s built-in monitoring to send me an e-mail as soon as something goes wrong.
I run Postfix locally, and this is how I start the monitor:
/sbin/mdadm --monitor --pid-file /var/run/mdadm/monitor.pid --daemonise --scan --syslog --mail=your@email.here
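Instead of passing --mail on the command line, mdadm in monitor mode with --scan can also take the address from a `MAILADDR your@email.here` line in its config file (on Ubuntu, typically /etc/mdadm/mdadm.conf, which the packaged monitor daemon reads). As a sketch, this hypothetical helper prints whichever address is configured, so you can check it’s the one you expect:

```shell
# Hypothetical helper: print the MAILADDR set in an mdadm.conf-style file.
# Assumption: default path is /etc/mdadm/mdadm.conf (Debian/Ubuntu layout).
mdadm_mailaddr() {
    conf="${1:-/etc/mdadm/mdadm.conf}"
    # Only uncommented lines whose first field is MAILADDR are matched.
    awk '$1 == "MAILADDR" { print $2 }' "$conf"
}
```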
You can try this out by deliberately breaking your RAID with these commands, but don’t forget to rebuild it as explained above!
mdadm /dev/md0 --fail /dev/sdb2
mdadm /dev/md0 --remove /dev/sdb2
This will send you an e-mail with the output from /proc/mdstat as well as a short explanation of what mdadm thinks is wrong.
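If you’d rather not degrade the array just to test the e-mail, mdadm’s monitor mode also has a --test option that generates a TestMessage alert for every array it finds, and --oneshot makes it check once and exit. A sketch, wrapped in a hypothetical `raid_test_alert` function (run it as root; the address is a placeholder):

```shell
# Hypothetical helper: send one TestMessage alert per array to check that
# mail delivery works, without failing a disk.
# Assumptions: run as root; your@email.here is a placeholder address.
raid_test_alert() {
    mdadm --monitor --scan --oneshot --test --mail="${1:-your@email.here}"
}
```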