If a drive fails in a RAID array that includes redundancy--meaning every level except RAID 0--it is desirable to get the drive replaced immediately so the array can be returned to normal operation. There are two reasons for this: fault tolerance and performance. While the array is running in degraded mode due to a drive failure, most RAID levels have no fault protection at all until the drive is replaced: a RAID 1 array is reduced to a single drive, and a RAID 3 or RAID 5 array becomes equivalent to a RAID 0 array in terms of fault tolerance. At the same time, the performance of the array will be reduced, sometimes substantially.
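To make the fault tolerance point concrete, here is a minimal Python sketch of the XOR parity idea used by RAID 5. It assumes a hypothetical three-drive stripe with made-up four-byte blocks; the name `xor_blocks` and the sample data are illustrative only, and a real controller does this across whole stripes in hardware or driver code.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical stripe: two data blocks plus one parity block (three drives).
data_a = b"RAID"
data_b = b"5arr"
parity = xor_blocks([data_a, data_b])

# The drive holding data_b fails. The array is degraded but still readable,
# because the missing block is the XOR of the surviving blocks.
rebuilt_b = xor_blocks([data_a, parity])
print(rebuilt_b == data_b)

# If a second drive fails before the first is replaced, only one block of the
# stripe is left and nothing can be reconstructed from it. That is the sense
# in which a degraded RAID 5 array has the fault tolerance of RAID 0.
```

The same reconstruction has to be done on every read that touches the missing drive, which is also why performance suffers while the array is degraded.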
An extremely useful RAID feature that helps alleviate this problem is hot swapping, which when properly implemented will let you replace the failed drive immediately without taking down the system. Another approach is through the use of hot spares. Additional drives are attached to the controller and left in a "standby" mode. If a failure occurs, the controller can use the spare drive as a replacement for the bad drive. This is a very simple concept, and the feature is supported by most RAID implementations, including many of the inexpensive hardware RAID cards and software RAID solutions. Typically, the only cost is "yet another" hard disk that you have to buy but can't use for storing data. :^)
You may ask though: if I have hot swap capability, why do I need hot spares anyway? I can just replace a drive when it fails, right? That's true, but the main advantage that hot sparing has over hot swapping is that with a controller that supports hot sparing, the rebuild will be automatic. The controller detects that a drive has gone belly up, it disables it, and immediately rebuilds the data onto the hot spare. This is a tremendous advantage for anyone managing many arrays, or for systems that run unattended--do you really want to have to go into the office at 4 am on a rainy Sunday to hot-swap a drive for the benefit of your overseas users?
As features, hot sparing and hot swapping are independent: you can have one, or the other, or both. They will work together, and often are used in that way. However, sparing is particularly important if you don't have hot swap (or warm swap) capability. The reason is that it will let you get the array back into normal operating mode quickly, so you can put off shutting down the system to replace the failed drive until a time of your choosing. You of course lose the hot sparing capability in the meantime; when the failed drive is replaced, the new drive becomes the new hot spare.
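The behavior described above, automatic rebuild onto the spare followed by the replacement drive taking over as the new spare, can be sketched as a toy model in Python. Everything here is hypothetical: the `RaidArray` class, its method names and the drive labels are invented for illustration and do not correspond to any real controller's interface. The rebuild is also treated as instantaneous, where a real one can take hours.

```python
from dataclasses import dataclass, field


@dataclass
class RaidArray:
    """Toy model of a redundant array and its controller (hypothetical)."""
    members: list                                   # drives holding data
    hot_spares: list = field(default_factory=list)  # idle standby drives
    degraded: bool = False
    log: list = field(default_factory=list)

    def drive_failed(self, drive):
        """Controller reaction when a member drive fails."""
        self.members.remove(drive)
        self.degraded = True
        self.log.append(f"{drive} failed and was disabled")
        if self.hot_spares:
            spare = self.hot_spares.pop(0)
            self.members.append(spare)  # rebuild modeled as instantaneous
            self.degraded = False
            self.log.append(f"rebuilt automatically onto hot spare {spare}")
        else:
            self.log.append("no hot spare: degraded until a drive is swapped in")

    def drive_added(self, drive):
        """A new drive is installed, by hot swap or after a shutdown."""
        if self.degraded:
            self.members.append(drive)
            self.degraded = False
            self.log.append(f"rebuilt onto replacement {drive}")
        else:
            self.hot_spares.append(drive)
            self.log.append(f"{drive} is the new hot spare")


array = RaidArray(members=["disk0", "disk1", "disk2"], hot_spares=["disk3"])
array.drive_failed("disk1")  # rebuild starts with nobody on site
array.drive_added("disk4")   # replacement installed later, at your convenience
print("\n".join(array.log))
```

Between the two calls the array is back at full redundancy but has no spare left, which is the window in which hot sparing capability is lost.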
Tip: Hot spares may sit dormant on a system for months at a time. It's a good idea to periodically test the spare drive to make sure it is still working properly. Some controllers offer a maintenance utility specifically for this purpose. Some may automatically test the spares on occasion.
If for whatever reason your RAID setup won't support hot sparing, you can still do the next best thing, which is what I call "cold sparing". :^) This is simple: when you buy the drives for your RAID array, buy one extra drive and keep it in a safe place near the system. If a drive ever fails in the array, you'll have to swap it out, but you won't have to wait for hours or days while you try to locate, order and have delivered a replacement drive. Another good reason to do this is that you will be sure that the replacement drive is exactly the same as the original ones in the array--some arrays don't like having a drive replaced with anything but another of the exact same type.