Goodbye Big Data — mdadm: No recognizable superblock

Short version for googlers: if your md devices disappear and mdadm claims to find no superblock, try this:

sudo mdadm --examine --scan --config=mdadm.conf >> /etc/mdadm/mdadm.conf

As a bootstrapped Big Data startup, we have had to get pretty creative in our approach to our crawler infrastructure. Hence, lots of cheap disks and some software RAID via mdadm. We have also been creative in managing our infrastructure, using my rusty old SA skills to get the job done some of the time. That’s creative because I’m the CEO and was never really an SA in the first place. It’s been ages since I worked on this stuff, and these days my mind mostly spends its cycles on recruiting, sales, marketing, and product development. So today, when our arrays would not mount, our md0 and md1 devices had disappeared from /dev, and mdadm was saying fun things like “mdadm: no recogniseable superblock”, it was a facepalm moment for me.

The excitement happened after a rare and innocent reboot on one of our crawler-data machines today. On boot, the machine reported that the MD disks had no superblocks. The md devices were not showing up in /dev at all. This was happening on multiple arrays with multiple disks each, even though the machine had been shut down cleanly. Manually attempting to mount the arrays failed, as did re-assembling the arrays with mdadm --assemble. We also tried fsck and several other tools.

I won’t bore you with all of the steps I took, but many tools reported a missing or corrupt superblock on the disks or arrays. After an hour of messing around, Trevor was getting worried. There was a lot of data on those disks. Many of the forum threads were either inconclusive or talking about data recovery… not a great sign. One post said just to keep rebooting and it would fix itself. Then I found this post, the gem we all look for: one helpful command near the end of the thread, followed by joyful confirmation posts from the lucky souls its author had helped. The command that fixed things was:

sudo mdadm --examine --scan --config=mdadm.conf >> /etc/mdadm/mdadm.conf

The output of this command contains the UUIDs of your arrays, and the >> redirect appends it to your mdadm.conf (assuming it’s in that location). Trevor suspected early on that the problem might be related to UUIDs (or the lack of them). After appending this output to the end of mdadm.conf, /dev/md0 and /dev/md1 came back and we were able to fsck and mount with no issue, without a reboot.
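If you want to be a bit more careful than a blind append, here is a minimal sketch of the same idea. It is not what we ran that day. It assumes the scan output consists of lines starting with ARRAY (which is what mdadm --examine --scan prints), and the helper name append_array_lines is made up for illustration. It only appends ARRAY lines that are not already in the config, so running it twice does not leave duplicates. Also worth knowing: with sudo in front of a command, the >> redirect is still performed by your own shell, so the one-liner needs a root shell (or something like sudo tee -a) to be able to write to /etc/mdadm/mdadm.conf.

```shell
#!/bin/sh
# Hypothetical helper (not part of mdadm): read `mdadm --examine --scan`
# style output on stdin and append to the given config file only those
# ARRAY lines that are not already present in it.
append_array_lines() {
    conf="$1"
    while IFS= read -r line || [ -n "$line" ]; do
        case "$line" in
            ARRAY*) ;;          # keep array definitions
            *) continue ;;      # skip anything else
        esac
        # -x: match the whole line, -F: treat the line as a fixed string
        grep -qxF -- "$line" "$conf" 2>/dev/null || printf '%s\n' "$line" >> "$conf"
    done
}

# Possible usage from a root shell. The config path is an assumption and
# differs between distributions (for example /etc/mdadm.conf on some):
#   cp /etc/mdadm/mdadm.conf /etc/mdadm/mdadm.conf.bak
#   mdadm --examine --scan | append_array_lines /etc/mdadm/mdadm.conf
#   mdadm --assemble --scan   # assumption: may be needed to bring the arrays up
```

Taking a backup of mdadm.conf first is cheap insurance, since everything here edits a file the boot process depends on.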

Thanks to the OP for saving the day. Also, thanks to the hard work of Matt Ekstrom and the HiringSolved sales team, we’re growing fast enough to build a much better crawler infrastructure and hire a real SA! 🙂

Now… back to accounting…