Sunday, August 21, 2005

Redundancy is a Good Thing

Friday was a bit of a bad day. As I walked into the office at 8:30 that morning, the first words that greeted me were, ‘the file server is down.’ Sure enough, our aging main server was flourishing the Blue Screen of Death (which it has done before on occasion), but this time rebooting it didn’t make the problem go away. Nope; it got most of the way through loading Windows and then blue-screened again, complaining that it couldn’t find the disk where Windows was located. Go figure.

Fortunately, the problem was only with the system drive, not the array of disks that holds the data, and we had a new system unit in testing that was due to replace the failed one in a week or so. It wasn’t a big job to get it ready to go live a little early, and before too long we had a functioning file server again.

One of the projects I am working on at the moment is replacing most of our key server hardware, and this is a case in point. When the current systems were built some years ago, no redundancy was incorporated. Consequently, if a disk drive fails, the server fails. If a power supply fails, the server fails. One of my tasks since I joined the company has been to increase the resilience of core infrastructure like this, so all of the new servers have lots of redundancy built in.

In future, if a disk drive fails, rather than bringing down the server there will be a second disk drive which mirrors the contents of the first and carries on when the first drive fails. The IT team get an email telling them of the failure, pick a new drive from stock (another innovation), and replace the failed unit. All the while the server is up and running. When the failed drive is replaced, the server detects the new disk and, in the background, copies the contents of the working disk onto it, so once again there are two disks working in parallel in case one of them fails.
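
For the technically minded, the mirroring behaviour described above can be sketched as a toy model in Python. This is only an illustration of the idea, not how our servers or any real disk controller implement it. The class `MirroredPair`, its method names and the `alerts` list (standing in for the email to the IT team) are all made up for the example.

```python
class MirroredPair:
    """Toy model of a two-disk mirror. All names here are hypothetical."""

    def __init__(self):
        # Each "disk" is a dict of block number -> data; None marks a failed disk.
        self.disks = [{}, {}]
        # Stand-in for the notification email sent to the IT team.
        self.alerts = []

    def write(self, block, data):
        # Every write goes to all healthy disks so they stay identical.
        healthy = [d for d in self.disks if d is not None]
        if not healthy:
            raise IOError("both disks have failed")
        for disk in healthy:
            disk[block] = data

    def read(self, block):
        # Reads are served by the first healthy disk.
        for disk in self.disks:
            if disk is not None:
                return disk[block]
        raise IOError("both disks have failed")

    def fail(self, index):
        # Simulate a drive failure and record an alert.
        self.disks[index] = None
        self.alerts.append(f"disk {index} failed")

    def replace(self, index):
        # Fit a new drive and rebuild it by copying from the surviving disk.
        survivor = self.disks[1 - index]
        if survivor is None:
            raise IOError("no healthy disk to rebuild from")
        self.disks[index] = dict(survivor)


pair = MirroredPair()
pair.write(0, "payroll")
pair.fail(0)                  # first drive dies; an alert is recorded
pair.write(1, "invoices")     # the server carries on with the second drive
print(pair.read(0))           # data is still readable
pair.replace(0)               # new drive fitted and rebuilt from the survivor
print(pair.disks[0] == pair.disks[1])
```

The point of the sketch is that reads and writes keep working between `fail` and `replace`, which is why the users never notice anything.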

The users never even know there’s been a problem.

That's how we like it!
