Thursday, December 04, 2008

Recover a failing disk using a Linux Live CD

A friend's disk failed the other day, and this is how I attempted to recover it.

First, the situation: a Windows XP machine was having trouble booting. I eventually got it to boot, but shortly afterwards the disk in the machine started giving random read/write errors. Here is how I tried to recover it.

Tools:
Fedora Live CD (I used Fedora 8 because that is what I had)
A Fedora Linux based file server with lots of disk space, or another hard drive.

My file server is set up as a Samba server, so a Windows server could be used in its place if you have one.

  • Boot the machine using the Live CD
  • Switch to running as root
  • Install dd_rescue and samba-client (on Fedora the command is 'yum install dd_rescue samba-client')
Note: if you have to reboot, you will have to install these tools every time, unless you are using a Live CD on a USB Flash drive.

  • Mount the file server using cifs (this is the Samba/Windows Server protocol)
Create the local mount point
mkdir /media/public

Mount the fileserver
mount -t cifs //[hostname or ip]/[share] [local path] -o username=[id]
e.g. mount -t cifs //192.168.0.210/public /media/public -o username=anyone
It will ask for a password, and you should give the password for the anyone account.

If you are using another disk you can just mount it. The device name will probably be /dev/sdb1, but check dmesg to be sure.

mount /dev/sdb1 /media/public
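
Before starting a copy that can run for days, it may be worth confirming that the mount actually succeeded. If it did not, the image would be written into the Live CD's own RAM-backed filesystem instead. A minimal sketch, assuming the mountpoint utility from util-linux is present on the Live CD; require_mount is a hypothetical helper name, not part of any tool:

```shell
# Hypothetical helper: succeed only if the given directory is an active mount point.
require_mount() {
    if mountpoint -q "$1"; then
        return 0
    fi
    echo "$1 is not mounted" >&2
    return 1
}

# Example use (not run here):
#   require_mount /media/public && echo "safe to write the image"
```
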

  • Use dd_rescue to copy the partition to a file on the file server
dd_rescue -A /dev/sda1 /media/public/disk.img
This may take several hours or even days and will take up a large amount of disk space. In my case it was 2 days and 40GB.
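
Since the image is as large as the partition itself, a quick check of the free space on the destination before starting can save a wasted run. A rough sketch, assuming a POSIX df and awk; has_room is a hypothetical helper name, and the free space reported for a cifs mount depends on what the server tells the client:

```shell
# Hypothetical helper: succeed if dest_dir has at least need_kb kilobytes free.
has_room() {
    need_kb=$1
    dest_dir=$2
    # df -P prints one line per filesystem; column 4 is the available space in KB.
    avail_kb=$(df -Pk "$dest_dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge "$need_kb" ]
}

# Example use as root (not run here); blockdev reports the partition size in bytes:
#   need_kb=$(( $(blockdev --getsize64 /dev/sda1) / 1024 ))
#   has_room "$need_kb" /media/public || echo "not enough space for the image"
```
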

  • Once the rescue is complete you can mount the disk on the server with a loop mount or copy it back to a new disk

Loop method (requires Linux)
Create the local mount point
mkdir /mnt/disk

Mount the image using a loop device
mount -o loop -t ntfs-3g [path to disk.img] [mount point]
e.g. mount -o loop -t ntfs-3g /data/public/disk.img /mnt/disk

Copy the needed files from the /mnt/disk directory.

If the disk image is not mounted, you might be able to boot the image using qemu with the following command:
qemu -m 512 -hda /data/public/disk.img

Copy method
(Reboot into the Live CD with the new disk installed and partitioned, install the tools, and mount the server/disk again.)
dd_rescue -A /media/public/disk.img /dev/sda1
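
After copying the image back, one way to check that the new partition holds the same bytes is to compare it against the image, but only over the length of the image, since the new partition may be larger. A minimal sketch, assuming GNU cmp (for the -n option); image_matches is a hypothetical helper name:

```shell
# Hypothetical helper: compare the image with the start of the target,
# limited to the size of the image (the target may be larger).
image_matches() {
    img=$1
    target=$2
    # Arithmetic expansion strips any padding that wc may print around the number.
    size=$(( $(wc -c < "$img") ))
    # cmp -n limits the comparison to the first "size" bytes; -s keeps it silent.
    cmp -s -n "$size" "$img" "$target"
}

# Example use as root (not run here):
#   image_matches /media/public/disk.img /dev/sda1 && echo "copy verified"
```

Note that this rereads both the image and the partition in full, so on a 40GB image it will take a while.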

If the new disk is larger than the old disk (it cannot be smaller), you may need to use parted or Partition Magic to resize the partition. You may also need to run 'fixboot' and 'fixmbr' from the Windows Install CD in rescue mode to get the new disk to boot. I would recommend doing a clean install and just copying the data files back, since the OS and applications may have been damaged by the disk errors.

Hopefully this is useful, but remember that depending on the level of disk failure it may not work.

1 comment:

Tony said...

If you are dealing with any IDE drive that is failing, there is a tool out there that is worth 10x the price: SpinRite. Steve Gibson (www.grc.com) wrote the best and most effective tool. I've been an IT professional for a very long time, and if SpinRite can't fix the drive, it's time for a recovery service. I don't work at GRC; I just use the product.

I've used the Live CD method as described in the post. However, without fixing the issues, the copy method may not get all the data on the drive. So I would suggest that if you are faced with this problem, you first try to fix the drive so that you can copy off the data through the method of your choice, such as the one in this post.

Tony