Server Rebuild and Disaster Recovery

This week’s topic is “What I learned during a recent Server Rebuild and Disaster Recovery” session, in which a RAID system failed catastrophically on a Windows SBS 2003 server. Anyone who knows about SBS knows that most people who have it have only 1 server, and so for this company the business was largely offline until the repair was complete. So what did we learn from the experience?

Let’s start with – don’t try to go on holiday, things will break.

Once over that fact, here’s what else I learned:

  • In Acronis Backup and Recovery, a Files backup does not include information about your disk partitions
  • Acronis does not seem too happy about restoring from a NAS share, and takes an age (even locally attached) to list all of the files in the archive before you can select which ones you don’t want to restore. Crazy huh?
  • What I SHOULD have done to recover using the files backup was what I’d call a traditional recovery: fit a new HDD, install Windows, install your backup software, boot into Directory Services Restore Mode, then recover your files and reboot.
  • What I DID was closer to how a disk/image backup is supposed to work: put in a new HDD, run recovery, and all disk partition information is restored. Actually I ended up installing Windows to get a good boot sector, then restoring to an alternate folder, then booting into a WinPE environment and moving the c:\alternate\Windows folder back to c:\Windows – reboot and hope for the best.
  • Despite the mailbox database starting OK, Exchange wasn’t too happy. I had to ‘reinstall’ it by going into Add/Remove Programs and pressing ‘change’ on Windows Small Business Server 2003. The Exchange component had a ‘reinstall’ option.
  • I then had to reload SP2, but only after running through this procedure first: http://support.microsoft.com/kb/935916
  • A couple of other things weren’t restored, notably WSUS, because they weren’t backed up. In this case the server failure prompted the customer to buy a new server, so as I’ll be fitting that in the coming weeks I didn’t bother to resolve these issues.
  • You can never have too many backups.

Now – why would I have only done a files backup in Acronis, the leading imaging software? BECAUSE of Exchange. As we all know, if your backup software isn’t Exchange-aware it doesn’t delete your log files, and lots of log files make for a full-up server disk. So the ‘answer’ that I found somewhere was to use a files backup. This had been working fine, and possibly would have been fine to restore, IF:

  • The backup was stored/accessible via USB, or whatever other network-storage-server setup Acronis have internally (requiring another Windows server I’m sure)
  • I’d realised the difficulty I would have had restoring just the files onto a blank drive
  • I’d considered the files backup to be more like a BKF backup than an image backup, requiring a working Windows for you to restore into.
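
On the log-file worry mentioned above: whichever backup type you settle on, it is cheap to keep an eye on how many log files are piling up and how much room the disk has left. What follows is only a minimal sketch in Python, not anything Acronis or Exchange provides. The function names, the "*.log" pattern and the 15% free-space threshold are all my own assumptions for illustration.

```python
import shutil
from pathlib import Path


def log_file_report(log_dir, pattern="*.log"):
    """Return (count, total_bytes) for files in log_dir matching pattern.

    The "*.log" default is an assumption; adjust it to whatever your
    mail server actually writes.
    """
    files = [p for p in Path(log_dir).glob(pattern) if p.is_file()]
    return len(files), sum(p.stat().st_size for p in files)


def disk_is_low(path, min_free_fraction=0.15):
    """True when free space on the volume holding path is below the threshold.

    The 15% default is an arbitrary illustrative value, not a recommendation.
    """
    usage = shutil.disk_usage(path)
    return usage.free / usage.total < min_free_fraction
```

Run from a scheduled task against the folder holding the logs, something like this would at least flag a filling disk before it becomes the next emergency. It does nothing about the logs themselves; clearing them safely is still the backup software's job.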

So what’s the ultimate lesson here?

Have a plan – and TEST IT.

I guess that goes for any emergency situation in life – you’ve got to know the implications and pitfalls of your plan before they become an issue. Had we known in advance that the restore was going to take 2.5 days, we could at least have weighed up the options against the risk and made a decision beforehand.
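
One small piece of "TEST IT" can be automated: restore the backup to a scratch folder and check that what came back matches the original. The sketch below assumes you can see both the source folder and the restored copy from the same machine, and that the data is not changing while you compare. The function names are made up for this example. It only proves the files came back intact; it says nothing about whether the server will boot or Exchange will mount, which still needs a real test restore.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1024 * 1024):
    """SHA-256 of a file, read in chunks so large files don't fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_trees(source_dir, restored_dir):
    """Compare every file under source_dir with its restored copy.

    Returns (missing, different): relative paths that were not restored,
    and relative paths whose contents differ from the source.
    """
    source_dir, restored_dir = Path(source_dir), Path(restored_dir)
    missing, different = [], []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        dst = restored_dir / rel
        if not dst.is_file():
            missing.append(rel.as_posix())
        elif file_digest(src) != file_digest(dst):
            different.append(rel.as_posix())
    return missing, different
```

Two empty lists back means every source file was restored with identical contents. Timing the test restore itself is just as useful, since that is the number you want to know before the day you need it.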

Live and learn.