In the "good old days" of RAID, fault tolerance was provided through redundancy, but there was a problem when it came to availability: what do you do if a drive fails in a system that runs 24 hours a day, 7 days a week? Or even in a system that runs 12 hours a day but has a drive go bad first thing in the morning? The redundancy would let the array continue to function, but only in a degraded state. The hard disks were installed deep inside the server case, so the case had to be opened to reach and replace the failed drive. Furthermore, the other drives in the array, which had kept running despite the failure, would have to be powered off, interrupting all users of the system anyway. Surely there had to be a better way, and of course, there is.
An important feature that keeps availability high when hardware fails and must be replaced is drive swapping. Strictly speaking, the term "drive swapping" simply refers to exchanging one drive for another, and of course that can be done on any system (unless nobody can find a screwdriver! :^) ). What is usually meant by the term, though, is hot swapping, which means changing a hard disk without turning off the power or opening the system case. In a system that supports hot swap, you can remove a failed drive, insert a new one, and have the system immediately rebuild the failed drive's data onto the replacement. The users of the system may not even know that the change has occurred.
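The rebuild step can be illustrated with a short sketch. It assumes a single-parity layout (as used by RAID 3, 4 or 5), where each stripe holds some data blocks plus one parity block equal to their XOR; the helper names `xor_blocks`, `make_stripe` and `rebuild_member` are hypothetical, and a real controller does this work in firmware, stripe by stripe, typically in the background. Because XOR is its own inverse, XOR-ing every surviving block of a stripe regenerates whatever was on the missing drive, and that result is what gets written to the replacement. (A mirrored array would simply copy from the surviving mirror instead.)

```python
from functools import reduce
from typing import List

def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equal-length blocks together, byte position by byte position."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks in a stripe must be the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def make_stripe(data_blocks: List[bytes]) -> List[bytes]:
    """Hypothetical stripe: one block per data drive, followed by a parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]

def rebuild_member(stripe: List[bytes], failed_index: int) -> bytes:
    """Recompute the failed drive's block by XOR-ing every surviving block."""
    survivors = [blk for i, blk in enumerate(stripe) if i != failed_index]
    return xor_blocks(survivors)

stripe = make_stripe([b"DATA", b"MORE", b"BITS"])
# Pretend drive 1 failed and was hot-swapped: its block is regenerated from
# the other members and would then be written to the replacement drive.
print(rebuild_member(stripe, 1))  # b'MORE'
```

The same function also regenerates the parity block if it was the parity drive that failed, since the parity is just the XOR of the data blocks.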
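Unfortunately, "hot swap" is another one of those terms that many people use in a non-standard way, frequently leading to confusion. In fact, there is a hierarchy of swap "temperatures" that properly describes the state of the system at the time a drive is swapped:
Cold Swap: The system must be shut down and powered off before the drive is removed and replaced.
Warm Swap: Power to the system stays on, but all activity on the bus the drive is connected to must be stopped while the drive is changed.
Hot Swap: The drive is removed and replaced while the system, including the bus, continues to operate normally.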
It is common for a system to be described as capable of hot swapping when it can really only do warm swaps. True hot swapping requires support from every component involved: the RAID controller, the bus (usually SCSI), the enclosure (which must have open bays so the drives can be reached from the front of the case), and the drive interface. It also requires special connectors on the drives, designed to keep the ground connection between the drive and the bus in place at any time the device has power. When a device is removed, the power connection must be broken before the ground connection; when a device is inserted, the ground connection must be made before power is re-established. This is typically achieved by making the ground pins slightly longer than the other pins, so they are the first to make contact and the last to lose it. This is exactly the design used by SCSI SCA, the most common interface for hot-swappable RAID arrays. See the discussion of SCA elsewhere in this Guide for more, as well as the discussion of drive enclosures.
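The "ground on first, ground off last" rule can be modeled with a toy sketch. It assumes only that a longer pin touches its mating contact earlier during insertion and loses contact later during removal; the `Pin` class, the pin names and the lengths are hypothetical values chosen for illustration, not figures taken from the SCA specification.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Pin:
    name: str
    kind: str      # "ground", "power" or "signal"
    length: float  # hypothetical length in mm; longer pins touch first

def mating_sequence(pins: List[Pin]) -> List[Pin]:
    """Order in which pins make contact as the drive slides in: longest first."""
    # sorted() is stable, so pins of equal length keep their listed order.
    return sorted(pins, key=lambda p: -p.length)

def breaking_sequence(pins: List[Pin]) -> List[Pin]:
    """Order in which pins lose contact as the drive is pulled out: shortest first."""
    return list(reversed(mating_sequence(pins)))

def is_hot_plug_safe(pins: List[Pin]) -> bool:
    """True if every ground pin is strictly longer than every power pin, so ground
    is connected before power arrives and stays connected until power is gone."""
    grounds = [p.length for p in pins if p.kind == "ground"]
    powers = [p.length for p in pins if p.kind == "power"]
    return bool(grounds) and min(grounds) > max(powers, default=0.0)

# Hypothetical connector: one long ground pin, shorter power and signal pins.
connector = [
    Pin("+5V", "power", 6.0),
    Pin("GND", "ground", 7.0),
    Pin("+12V", "power", 6.0),
    Pin("DATA0", "signal", 5.5),
]

print([p.name for p in mating_sequence(connector)])    # ['GND', '+5V', '+12V', 'DATA0']
print([p.name for p in breaking_sequence(connector)])  # ['DATA0', '+12V', '+5V', 'GND']
print(is_hot_plug_safe(connector))                      # True
```

The safety check deliberately requires a strict inequality: pins of equal length would mate at essentially the same moment, which gives no guarantee that ground is present before power.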
As mentioned above, SCA on SCSI is the method most commonly used for hot-swappable arrays. In the IDE/ATA world, the best you can usually do is warm swapping using drive trays, which "convert" regular IDE/ATA drives into a form similar in concept to SCA, though not quite the same. This is still pretty good, but it is not true hot swapping: the system usually needs to be halted before you remove a drive.
A system that cannot do hot swapping, or even warm swapping, will benefit from the use of hot spares. If your system can only cold swap, you will at some point have to take it down to replace failed hardware. But if you have hot spares, the array can be restored to full functionality immediately, letting you delay the shutdown until a more convenient time, like 3:00 am (heh, I meant more convenient for the users, not for you, the lucky administrator. :^) ). In fact, hot sparing is a useful feature even if you have hot swap capability; read more about it in the discussion of hot spares.
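A toy model of how a hot spare lets an array recover at once while the physical repair is deferred to a scheduled cold swap might look like the sketch below. The `RaidArray` class, its methods and the drive IDs are hypothetical, and the rebuild onto the spare is treated as instantaneous, whereas a real controller rebuilds in the background over minutes or hours.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class RaidArray:
    """Toy model of a redundant array with hot spares (hypothetical, for illustration)."""
    members: List[Optional[str]]                      # None marks a missing member
    spares: List[str] = field(default_factory=list)   # idle drives already installed
    failed: List[str] = field(default_factory=list)   # dead drives still in the case

    @property
    def degraded(self) -> bool:
        return None in self.members

    def drive_failed(self, drive: str) -> None:
        slot = self.members.index(drive)
        self.failed.append(drive)
        if self.spares:
            # A hot spare takes over immediately; the array rebuilds onto it
            # (treated as instantaneous in this sketch).
            self.members[slot] = self.spares.pop(0)
        else:
            # No spare: the array keeps running, but degraded, until a swap.
            self.members[slot] = None

    def service_window(self, new_drives: List[str]) -> None:
        """Scheduled shutdown (a cold swap): pull dead drives, install new ones."""
        self.failed.clear()
        fresh = list(new_drives)
        for slot, drive in enumerate(self.members):
            if drive is None and fresh:
                self.members[slot] = fresh.pop(0)  # fill the hole, then rebuild
        self.spares.extend(fresh)                  # leftovers become hot spares

array = RaidArray(members=["d0", "d1", "d2"], spares=["s0"])
array.drive_failed("d1")
print(array.members, array.degraded)  # ['d0', 's0', 'd2'] False
array.service_window(["d3"])          # e.g. at 3:00 am
print(array.spares, array.failed)     # ['d3'] []
```

Note that the array is never degraded after the failure when a spare is available; the shutdown in `service_window` only replenishes the pool of spares.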