Recently I was confronted with a Buffalo LinkStation which had a failed RAID0. The data on it was important, and the customer did not have any current backups. There were a lot of huge red flags that suggested a disk was going bad, but the guy at the customer's location who doubled as IT happily ignored those. Buffalo's willingness to support this problem extended only to replacing bad hard disks, since they consider all data on a failed RAID0 irrecoverably lost. That was not entirely unexpected to hear from a tier one support worker though.
I did a little digging and found out that those LinkStations use some fairly common tools under the hood. So I agreed to have a look, but made it clear that I might not be able to get anything back. The customer wanted me to try anyway and I got a good amount of the data back. This article describes how.
Things that should be considered before you attempt to restore the data yourself
- Make sure that there is nothing that writes to the disk from which you want to recover data. If you are recovering data from RAID volumes, make sure that there are no writes to any of the disks in the volume. Even if your data recovery is just about a simple deleted file, you will want to avoid writes to the disk. This is because any write could potentially overwrite recoverable data. Since some file systems write to the disk even when files are only read, you should avoid mounting the disk directly, or mount it read-only.
If the hard drive is still in the Computer/NAS it is usually in, shut that system down or do not boot it the normal way. There may be programs or services that write to the affected drive. Starting that computer from a Live CD is usually OK though.
- If it is clear that there is a serious mechanical or electronic defect, often identified by problem descriptions such as “sounds funny” or “smells funny”, ask the user to consider a data recovery lab before you start working on it yourself. Anything you do in such a situation could negatively affect the results of the lab.
- Always image or clone the disk you are working on and only use the clone going forward. If you suspect that you won't be able to read the original disk a second time, only work with copies of your image/clone. Once your recovery gets somewhat more complex, it is easy to do something irreversible that will negatively affect your chances of getting the data back.
Even if creating the images takes a lot of time, just let it run and do something else meanwhile. Having a means to go back to the original state is just basic cover your ass.
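The points about avoiding writes and working only on copies can be sketched roughly as follows. This is a minimal sketch and not the exact procedure I used: the function name protect_disk, the device name in the example comment and the image file names are made up for illustration, and a small dummy file stands in for a real disk image so that the sketch is harmless to run.

```shell
# Ask the kernel to reject writes to a block device, then print the flag
# (blockdev --getro prints 1 for read-only). Needs root on a real device.
protect_disk() {
    blockdev --setro "$1" && blockdev --getro "$1"
}
# Example (hypothetical device name): protect_disk /dev/sdb

# A small dummy file stands in for the image taken from the original disk.
workdir=$(mktemp -d)
master="$workdir/disk1.img"
working="$workdir/disk1.work.img"
dd if=/dev/urandom of="$master" bs=1024 count=1024 2>/dev/null

# Write-protect the master image so it is harder to change by accident.
chmod a-w "$master"

# All further experiments happen on a copy of the master.
cp "$master" "$working"
chmod u+w "$working"

# Confirm the copy is identical before starting to work on it.
if cmp -s "$master" "$working"; then
    echo "working copy matches master"
fi
```

If an experiment goes wrong, the working copy can simply be thrown away and recreated from the master without touching the original disk again.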
I got the entire NAS box delivered. It was a Buffalo LinkStation DUO. It still booted up OK (in retrospect I should have cloned the disks before trying to boot) and the RAID was shown as normal. All the shares were listed when I accessed it over SMB, but I could not open any of them. The web interface showed the RAID to be OK, but I did not see any values for total storage and free space, which these LinkStations usually show prominently in the web GUI. The disk settings showed both disks in the list, but the only information shown about them was their product number. This was very odd.
The disks were from Seagate, so I pulled them and put them in the USB dock on my desktop. I ran SeaTools and CrystalDiskInfo to display the SMART values. One of the disks was fine; the other had almost 4000 reallocated sectors and over 700 sectors that were defective but could not be reallocated. The problem was fairly obvious.
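I used Windows tools for this, but the same values can usually be read on Linux with smartctl from the smartmontools package. The following is a minimal sketch under that assumption; the helper name smart_raw_value is made up, and the attribute names in the example comments are the ones smartctl commonly reports, which probably correspond to the counters described above but may differ per drive.

```shell
# Read "smartctl -A" output on stdin and print the raw value (10th column)
# of the attribute whose name is given as the first argument.
smart_raw_value() {
    awk -v name="$1" '$2 == name { print $10 }'
}

# Examples on a real disk (needs root; /dev/sdb is only an example name,
# and some USB docks additionally need the "-d sat" option):
#   smartctl -A /dev/sdb | smart_raw_value Reallocated_Sector_Ct
#   smartctl -A /dev/sdb | smart_raw_value Current_Pending_Sector
#   smartctl -A /dev/sdb | smart_raw_value Offline_Uncorrectable
```

Anything well above zero in these counters is a good reason to clone the disk first and to ask questions later.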
Cloning the disks with dd
Next I put those disks (one at a time) and two larger disks (also one at a time) into a Linux machine and cloned each original disk to the disk I had put in with it. The Linux tool dd does a great job of copying hard drives at block level, and it is my go-to tool if I need a block-for-block image of a disk, especially if I need to work with that image.
The dd command is fairly simple and short, but it will run for quite some time (especially when there are bad sectors on one of the hard disks), and while it runs it only produces output if it encounters an error. The following example assumes that your source disk (the one from the NAS) is /dev/sdb and your target disk (the one you put in with it) is /dev/sdd:
Be absolutely certain that you selected the correct output before sending the command!
dd if=/dev/sdb of=/dev/sdd conv=noerror,sync bs=512
The options have the following meaning:
- if: the input for dd. In this case the entire disk sdb.
- of: the output for dd, this can be a block device or a file. Here the target is sdd. Be absolutely sure that you selected the correct output!
- noerror: tells dd to continue even if errors are encountered
- sync: tells dd to pad a block that could not be read completely with zeros, so that the size and layout of the original are kept. Otherwise a block that could not be read would be skipped in the output and the next readable block would be written directly after the last readable block, shifting all data that follows. This would mess with file allocation tables.
- bs: specifies the block size. If the disk has bad sectors, try to make this match the size of the disk's sectors (in this case 512 bytes); otherwise pick a larger block size (64k is a good value) to speed up the process. If you pick a larger block size on a faulty drive, the entire block will be written as zeros when a single sector in it can’t be read.
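The padding behaviour of sync is easy to try out without touching a disk. The following small sketch feeds dd a 3-byte input with a block size of 8 bytes and counts the bytes that come out, once with and once without conv=sync. The variable names are made up for illustration.

```shell
# With conv=sync the short block is padded with zero bytes up to the
# full block size, so 8 bytes come out instead of 3.
padded=$(printf 'abc' | dd bs=8 conv=sync 2>/dev/null | wc -c)

# Without conv=sync the short block is written as it was read.
plain=$(printf 'abc' | dd bs=8 2>/dev/null | wc -c)

echo "with sync: $padded bytes, without sync: $plain bytes"

# The logical sector size of a real disk can be queried like this
# (needs root; /dev/sdb is only an example name):
#   blockdev --getss /dev/sdb
```

On a failing disk the same mechanism is what keeps every later sector at its original offset in the clone.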
Since the cloning process can take a lot of time and only gives you output when it encounters an error, you might want to get a status from time to time. To do this, open a second terminal and find the process ID of the running dd process with the following command:
pgrep -l '^dd'
Once you know the process ID, you can run the following command to make dd print its current status as an “ERROR” message in the first terminal (substitute the process ID you got in the previous step for 38862):
kill -USR1 38862
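Newer versions of GNU dd can also report progress on their own through the status=progress option, which saves the second terminal. I did not use this at the time, so take it as a sketch. Two small temporary files stand in for /dev/sdb and /dev/sdd here so that the example is safe to run.

```shell
# Dummy source and target files instead of real disks.
src=$(mktemp)
dst=$(mktemp)
dd if=/dev/zero of="$src" bs=512 count=2048 2>/dev/null

# Same options as in the clone command, plus a live transfer counter
# that dd prints to the terminal while it runs.
dd if="$src" of="$dst" conv=noerror,sync bs=512 status=progress
```

If your dd does not know status=progress, the kill -USR1 method still works.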
Getting the RAID back online
After the cloning was done, I stored the source disks safely and made sure both clones were in my Linux computer. The disks were using GPT and had 6 partitions each. The last partition was clearly the one with the data stored. The other 5 partitions seemed to be system and swap partitions for the NAS. The system partitions were set up as a RAID1. That seems sensible, and I still have no idea why RAID0 is the default for the data partition.
After looking at the general structure of the disks and partitions, I had a closer look at the data partitions. I first tried to simply assemble the RAID from both partitions. mdadm will recognize existing RAID superblocks on the partitions and reassemble the already existing RAID:
mdadm -A /dev/md123 /dev/sdb6 /dev/sdd6
If this works for you, you can skip ahead to the part where the RAID is mounted read-only.
But unfortunately there was no mdadm superblock found on the clone of the defective drive. I tried a bunch of different tools to see if I could find the superblock, but I failed horribly.
So I decided to continue with more drastic measures (read this as potentially destructive) and recreate the RAID by force. But first I needed more information on what settings the RAID was created with. For that I simply read out the RAID superblock on the data partition of the good disk:
mdadm --examine /dev/sdb6
The output looked similar to this (I got this output when I ran the examine on one of the cloned drives in a USB dock a few weeks later):
/dev/sdf6:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 73bfe225:b30b764b:faf5b20a:9ac09813 (local to host skelsrv)
  Creation Time : Thu Mar 20 23:54:11 2014
     Raid Level : raid0
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 2
    Update Time : Thu Mar 20 23:54:11 2014
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : c04d5a9 - correct
         Events : 1
     Chunk Size : 64K
      Number   Major   Minor   RaidDevice State
this     1       8       22        1      active sync
   0     0       8       54        0      active sync
   1     1       8       22        1      active sync
Even though it's not exactly the output I had originally, it still shows all the relevant information:
- Version: the metadata version of the RAID. It is important that this matches if you recreate the RAID.
- Raid Level: in case you don’t already know what RAID level was used.
- Total Devices: in this case it was clear from the beginning, but with 4-bay NAS boxes the RAID might not use all the devices.
- Chunk Size: this one is also very important to get right.
- The order of the disks: you should specify the disks in the correct order in the create statement. In the output above, the line starting with "this" shows RaidDevice 1, so the examined disk is the second drive in the array.
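If you prefer not to pick these values out of the output by eye, a small helper can extract single fields. This is only a sketch: the function name md_field is made up, and it assumes the "Key : value" layout shown in the output above.

```shell
# Read mdadm --examine output on stdin and print the value of the
# "Key : value" line whose key is given as the first argument.
md_field() {
    awk -F' : ' -v key="$1" '
        { k = $1; gsub(/^[ \t]+|[ \t]+$/, "", k) }   # trim padding around the key
        k == key { print $2; exit }
    '
}

# Examples on a real component device (needs root):
#   mdadm --examine /dev/sdb6 | md_field "Version"
#   mdadm --examine /dev/sdb6 | md_field "Raid Level"
#   mdadm --examine /dev/sdb6 | md_field "Chunk Size"
```

The disk order still has to be read from the table at the end of the output, since those lines do not follow the "Key : value" layout.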
Once you have all that information, you can create the RAID again:
mdadm --create /dev/md123 --assume-clean --level=0 --verbose --chunk=64 --raid-devices=2 --metadata=0.90 /dev/sdd6 /dev/sdb6
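Before trying to mount anything, it does not hurt to check that the kernel actually lists the new array as active. A minimal sketch, assuming the usual /proc/mdstat layout of "md123 : active raid0 ..."; the function name md_state is made up, and its optional second argument (a file to read instead of /proc/mdstat) only exists to make the helper easy to try out.

```shell
# Print state and RAID level of one md device as listed in /proc/mdstat,
# for example "active raid0". Prints nothing if the device is not listed.
md_state() {
    awk -v md="$1" '$1 == md && $2 == ":" { print $3, $4 }' "${2:-/proc/mdstat}"
}

# Example: md_state md123
# More detail is available with: mdadm --detail /dev/md123
```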
Once the RAID is recreated, you will have to mount the partition to see if you can get any data back. I strongly recommend mounting the partition read-only (for this to work you need XFS support; on Debian the XFS tools are provided by the xfsprogs package):
mkdir /mnt/md123
mount -o ro -t xfs /dev/md123 /mnt/md123
If this works, great. If your disaster is as bad as the one I was dealing with, you might get an error message along the lines of: the partition can’t be mounted because of log file corruption. I was not dissuaded and decided to ditch those pesky log files, and with them all the files that had not been completely written to the NAS at the time of the crash. After all, I was pretty far along already, and getting back some or even most files seemed better than getting none of them. But before you try this, be aware that this form of repair pretty much drops the existing log files and can cause further corruption to the file system:
xfs_repair -L /dev/md123
xfs_check /dev/md123
mount -o ro -t xfs /dev/md123 /mnt/md123
This last mount command worked for me and I could copy files off the newly mounted RAID. But as pretty much expected, some of the files were corrupt: you could copy them but not open them with their respective programs, or you could open them and the content was garbage. Lucky for me, I don’t have to figure out which files got corrupted, because that seems like a real pain in the ass.
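A less invasive thing to try before reaching for xfs_repair -L is the XFS mount option norecovery, which mounts the file system without replaying the log and is only accepted together with ro. I did not try this on the LinkStation disks, so treat the following as a sketch; the function name try_ro_mount is made up.

```shell
# Try a normal read-only mount first; if that fails, retry read-only
# without log replay. The file system may look inconsistent in that mode,
# but nothing is written to it.
try_ro_mount() {
    mount -o ro -t xfs "$1" "$2" ||
        mount -o ro,norecovery -t xfs "$1" "$2"
}

# Example (needs root): try_ro_mount /dev/md123 /mnt/md123
```

If the second mount succeeds, you can copy off whatever is readable and still decide afterwards whether a repair with -L is worth the risk.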
Anyway, once you have retrieved the data, make sure to stress the importance of backups before handing out the restored files.
If you do happen to have a convenient method to check over 700000 files of different formats (MS Office, PDF, single saved e-mail messages from Outlook, JPG files, AutoCAD files and a bunch of others), please share. But even if you don’t, I hope that this article could help you.
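As a very rough first pass at that question, one could at least check whether each file still starts with the signature its extension promises. This is only a sketch with made-up function names, it covers just a few formats, and it will miss files whose header survived but whose contents are damaged further in, so it can only shrink the pile that needs a closer look.

```shell
# Print the first N bytes of a file as lowercase hex without separators.
first_bytes_hex() {
    head -c "$2" "$1" | od -An -tx1 | tr -d ' \n'
}

# Return 0 if the file starts with the signature expected for its extension,
# 1 if it does not, and 2 if the extension is not covered by this sketch.
looks_intact() {
    case "${1##*.}" in
        pdf|PDF)           [ "$(first_bytes_hex "$1" 5)" = "255044462d" ] ;;  # "%PDF-"
        jpg|JPG|jpeg|JPEG) [ "$(first_bytes_hex "$1" 3)" = "ffd8ff" ] ;;      # JPEG start marker
        docx|xlsx|pptx)    [ "$(first_bytes_hex "$1" 2)" = "504b" ] ;;        # ZIP container, "PK"
        *)                 return 2 ;;
    esac
}

# Example (bash): list files under the mounted RAID that fail the check.
#   find /mnt/md123 -type f -print0 |
#   while IFS= read -r -d '' f; do
#       rc=0; looks_intact "$f" || rc=$?
#       [ "$rc" -eq 1 ] && printf '%s\n' "$f"
#   done > suspect-files.txt
```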