


If you ask an IT professional to list the reasons why he or she set up a RAID array, one of the answers likely to be mentioned is "increased reliability". They probably don't really mean it though. ;^) As I have implied in many other areas of the site's coverage of RAID, "reliability" is a vague word when it comes to redundant disk arrays. The answer of increased reliability is both true and not true at the same time.

The reliability of an individual component refers to how likely the component is to remain working without a failure being encountered, typically measured over some period of time. The reliability of a component is a combination of factors: general factors related to the design and manufacture of the particular make and model, and specific factors relevant to the way that particular component was built, shipped, installed and maintained.

The reliability of a system is a function of the reliability of its components. The more components you put into a system, the worse the reliability of the system as a whole. That's the reason why complex machines typically break down more frequently than simple ones. While oversimplified, the number used most often to express the reliability of many components, including hard disks, is mean time between failures (MTBF). If the MTBF values of the components in a system are designated as MTBF1, MTBF2, and so on up to MTBFN, the MTBF of the system can be calculated as follows:

System MTBF = 1 / ( 1/MTBF1 + 1/MTBF2 + ... + 1/MTBFN )

If the MTBF values of all the components are equal (i.e., MTBF1 = MTBF2 = ... = MTBFN) then the formula simplifies to:

System MTBF = Component MTBF  / N

The implications of this are clear. If you create a RAID array with four drives, each of which has an MTBF figure of 500,000 hours, the MTBF of the array is only 125,000 hours! In fact, it's usually worse than that, because if you are using hardware RAID, you must also include the MTBF of the controller, which wouldn't be needed without the RAID functionality. For the sake of illustration, let's say the MTBF of the controller card is 300,000 hours. The MTBF of the storage subsystem then would be:

System MTBF = 1 / ( 1/MTBF1 + 1/MTBF2 + ... + 1/MTBFN )
= 1 / ( 1/500000 + 1/500000 + 1/500000 + 1/500000 + 1/300000)
= 88,235 hours
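The arithmetic above can be checked with a short script. This is only a minimal sketch: the function name `system_mtbf` and the variable names are made up for illustration, and it uses the same simplification as the formula, namely that the failure of any one component counts as a failure of the whole subsystem.

```python
def system_mtbf(component_mtbfs):
    """Combine component MTBF values (in hours) into a system MTBF.

    Assumes the simplified model from the text: the system fails as soon
    as any one component fails, so the component failure rates add up.
    """
    if not component_mtbfs:
        raise ValueError("at least one component is required")
    total_failure_rate = sum(1.0 / mtbf for mtbf in component_mtbfs)
    return 1.0 / total_failure_rate


drive_mtbf = 500_000       # hours, per drive (figure from the text)
controller_mtbf = 300_000  # hours, illustrative figure from the text

drives_only = system_mtbf([drive_mtbf] * 4)
with_controller = system_mtbf([drive_mtbf] * 4 + [controller_mtbf])
reduction = 1 - with_controller / drive_mtbf

print(f"Four drives: {drives_only:,.0f} hours")
print(f"Four drives plus controller: {with_controller:,.0f} hours")
print(f"Reduction versus a single drive: {reduction:.0%}")
```

Changing the list passed to `system_mtbf` shows how quickly the figure drops as drives are added, and how much a single lower-MTBF component such as the controller drags the total down.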

So in creating our array, our "reliability" has actually decreased 82% compared to a single drive. Is that right? Why then do people bother with RAID at all? Well, that's the other side of the reliability coin. While the reliability of the array hardware goes down, when you include redundancy information through mirroring or parity, you provide fault tolerance, the ability to withstand and recover from a failure. This means that failures, though more likely to occur in the less reliable array, can happen without the array or its data being disrupted, and that's how RAID provides data protection. Fault tolerance is discussed separately. The reason that most people say RAID improves reliability is that when they use the term "reliability" they are including the fault tolerance of RAID; they are not really talking about the reliability of the hardware.

What happens if you don't include redundancy? Well, then you have a ticking time bomb, and that's exactly what striping without parity, RAID 0, is. A striped array without redundancy has substantially lower reliability than a single drive and no fault tolerance. That's why I do not recommend its use unless its performance is absolutely required, and it is supplemented with very thorough backup procedures.


The PC Guide (http://www.PCGuide.com)
Site Version: 2.2.0 - Version Date: April 17, 2001
Copyright 1997-2004 Charles M. Kozierok. All Rights Reserved.
