Mean Time Between Failures (MTBF)
The most common specification related to drive reliability is mean time between failures or MTBF. This value, usually measured in hours, is meant to represent the average amount of time that will pass between random failures on a drive of a given type. It is usually in the range of 300,000 to 1,200,000 hours for modern drives (with the range increasing every few years) and is specified for almost every drive.
This number is very often misinterpreted and misused. Usually, the "analysis" goes like this: "Gee, a year contains 8,766 hours. That means my 500,000 MTBF drive should last 57 years." (I have even seen this on the web site of a major hard disk manufacturer that shall remain nameless to spare them the embarrassment!) After concluding that the MTBF means the drive will last for decades, amusingly, one of two opposite things usually happens: either the person actually thinks the drive will last half a century or longer, or they realize this is crazy and so they write off the entire MTBF figure as "obvious exaggeration and therefore useless". The real answer of course is neither. (It is obviously impossible for any individual hard disk to be tested for anywhere near the amount of time required to demonstrate an MTBF figure of even 100,000 hours, never mind 500,000.)
To be interpreted properly, the MTBF figure is intended to be used in conjunction with the useful service life of the drive, the typical amount of time before the drive enters the period where failures due to component wear-out increase. MTBF only applies to the aggregate analysis of large numbers of drives; it says nothing about a particular unit. If the MTBF of a model is 500,000 hours and the service life is five years, this means that a drive of that type is supposed to last for five years, and that a large group of drives operating within this timeframe will, on average, accumulate 500,000 hours of total run time (amongst all the drives) before the first failure of any drive. Or, you can think of it this way: if you used one of these drives and replaced it every five years with another identical one, in theory it should last 57 years before failing, on average (though I somehow doubt we'll be using 10 to 100 GB spinning-platter hard disk drives in the year 2057. :^) )
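The arithmetic behind this interpretation can be sketched in a few lines of Python. This is only an illustration, not a manufacturer's method: the function names are made up for this example, the 1,000-drive fleet is hypothetical, and the last function assumes a constant (exponential) failure rate during the service life, which is a common simplification rather than something a drive specification guarantees.

```python
import math

HOURS_PER_YEAR = 8766  # 365.25 days * 24 hours, the figure used in the text


def mtbf_in_years(mtbf_hours):
    """Naive conversion of an MTBF figure to years (the misleading '57 years')."""
    return mtbf_hours / HOURS_PER_YEAR


def expected_failures_per_year(mtbf_hours, drive_count):
    """Expected failures per year in a fleet of drives running around the clock.

    Assumes every drive is within its service life, so only random
    failures (not wear-out) are counted.
    """
    fleet_hours = drive_count * HOURS_PER_YEAR
    return fleet_hours / mtbf_hours


def annual_failure_probability(mtbf_hours):
    """Chance that one drive fails within a year of continuous use.

    Assumption: constant failure rate (exponential model).
    """
    return 1 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


if __name__ == "__main__":
    mtbf = 500_000  # hours, the example from the text
    print(f"Naive 'lifetime': {mtbf_in_years(mtbf):.1f} years")
    print(f"Expected failures per year in 1,000 drives: "
          f"{expected_failures_per_year(mtbf, 1000):.1f}")
    print(f"Annual failure probability per drive: "
          f"{annual_failure_probability(mtbf):.2%}")
```

Under these assumptions, a 500,000 hour MTBF works out to roughly 17 or 18 failures per year in a group of 1,000 continuously running drives, or a bit under a 2% chance per drive per year. That is a far more useful way to read the number than "57 years".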
There are in fact two different types of MTBF figures. When a manufacturer is introducing a new drive to the market, it obviously has not been in use in the real world, so they have no data on how the drive will perform. Still, they can't just shrug and say "who knows?", because many customers want to know what the reliability of the drive is likely to be. To this end, the companies calculate what is called a theoretical MTBF figure. This number is based primarily upon the analysis of historical data; for example, the historical failure rate of other drives similar to the one being placed on the market, and the failure rate of the components used in the new model. It's important to realize that these MTBF figures are estimates based on a theoretical model of reality, and thus are limited by the constraints of that model. There are typically assumptions made for the MTBF figure to be valid: the drive must be properly installed, it must be operating within allowable environmental limits, and so on. Theoretical MTBF figures also cannot typically account for "random" or unusual conditions such as a temporary quality problem during the manufacture of a particular lot of a specific type of drive.
After a particular model of drive has been in the market for a while, say a year, the actual failures of the drive can be analyzed and a calculation made to determine the drive's operational MTBF. This figure is derived by analyzing field returns for a drive model and comparing them to the installed base for the model and how long the average drive in the field has been running. Operational MTBFs are typically lower than theoretical MTBFs because they include some "human element" and "unforeseeable" problems not accounted for in theoretical MTBF. Despite being arguably more accurate, operational MTBF is rarely discussed as a reliability specification because most manufacturers don't provide it as a specification, and because most people only look at the MTBFs of new drives--for which operational figures are not yet available.
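As a rough sketch of how an operational MTBF might be estimated from field data, the usual approach is to divide the total accumulated run time of the installed base by the number of failures seen. Manufacturers' actual methods are not described here and are likely more elaborate; the function name and the sample numbers below are purely hypothetical.

```python
def operational_mtbf(installed_base, avg_hours_per_drive, field_failures):
    """Estimate operational MTBF (in hours) from field data.

    installed_base      -- number of drives of this model in the field
    avg_hours_per_drive -- how long the average drive has been running
    field_failures      -- drives returned as failed (hypothetical count)
    """
    if field_failures <= 0:
        raise ValueError("at least one failure is needed for an estimate")
    total_run_hours = installed_base * avg_hours_per_drive
    return total_run_hours / field_failures


if __name__ == "__main__":
    # Hypothetical model: 100,000 drives in the field, averaging 4,000
    # hours of run time each, with 1,000 failed units returned.
    print(operational_mtbf(100_000, 4_000, 1_000))
```

With these made-up inputs the estimate is 400,000 hours. Note that the result is only as good as the return data: failed drives that are never sent back would inflate it.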
The key point to remember when looking at any MTBF figure is that it is meant to be an average, based on testing done on many hard disks over a shorter period of time. Despite the theoretical numbers sometimes seeming artificially high, they do have value when put in proper perspective; a drive with a much higher MTBF figure is probably going to be more reliable than one with a much lower figure. As with most specifications, small differences don't count for much; given that these are theoretical numbers anyway, 350,000 is not much different from 300,000.
Overall, MTBF is what I consider a "reasonably interesting" reliability statistic--not something totally useless, but definitely something to be taken with a grain of salt. I personally view the drive's warranty length and stated service life as more indicative of what the manufacturer really thinks of the drive. I would rather buy a hard disk with a stated service life of five years and a warranty of three years than one with a service life of three years and a warranty of two years, even if the former has an MTBF of 300,000 hours and the latter one of 500,000 hours.
In the real world, the actual amount of time between failures will depend on many factors, including the operating conditions of the drive and how it is used (component life is discussed separately in this guide). Ultimately, however, luck is also a factor, so keep those backups current.