Error Correcting Code (ECC) and Data Recovery

Most people know that compact disk audio is remarkably resilient to data loss: dust or dirt on the surface of the disk, or even small scratches, often will not affect playback on a CD player at all, and usually will not prevent a CD-ROM drive from reading the disk either. Several techniques work together to provide this robustness.

The first and most important is the use of error correcting code, or ECC. ECC is a data encoding scheme that combines redundant information with special data positioning (interleaving) so that missing or corrupted bits of data can be detected and recovered. Using this extra information, the CD player's decoder can reconstruct missing data and play right through most errors seamlessly. On audio CDs, this scheme is known as Cross-Interleaved Reed-Solomon Code (CIRC). ECC is described in more detail in the section on hard disks.
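The real CIRC scheme used on CDs is built on Reed-Solomon codes and is too involved to reproduce here. As a much simpler stand-in, the sketch below pairs a Hamming(7,4) code, which can correct any single flipped bit in a 7-bit codeword, with block interleaving, which spreads a burst of consecutive errors (like a scratch) across several codewords so that each one sees at most one error. The function names are hypothetical and chosen only for illustration; this is not how a CD player's decoder is actually implemented.

```python
def hamming_encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] as a 7-bit Hamming(7,4) codeword.

    Bit positions 1..7 are laid out as: p1 p2 d1 p3 d2 d3 d4.
    """
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming_decode(c):
    """Correct up to one flipped bit in a 7-bit codeword; return the 4 data bits."""
    c = list(c)
    # Re-check each parity group; together the results give the 1-based
    # position of the flipped bit (0 means no error was detected).
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


def interleave(codewords):
    """Write codewords as rows, then read the bits out column by column."""
    return [cw[i] for i in range(7) for cw in codewords]


def deinterleave(stream, n_codewords):
    """Undo interleave(): rebuild n_codewords rows of 7 bits each."""
    return [[stream[i * n_codewords + j] for i in range(7)] for j in range(n_codewords)]


data = [[1, 0, 1, 1], [0, 1, 1, 0], [1, 1, 1, 1], [0, 0, 0, 1]]
stream = interleave([hamming_encode(d) for d in data])

# Simulate a scratch: flip 4 consecutive bits of the stored stream.
for k in range(8, 12):
    stream[k] ^= 1

recovered = [hamming_decode(cw) for cw in deinterleave(stream, len(data))]
print(recovered == data)  # True
```

Without the interleaving step, the same 4-bit burst would land inside a single codeword, which a single-error-correcting code cannot repair. Spreading the data out is what turns one large error into several small, correctable ones.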

When ECC cannot recover the correct data, there is another tool available: interpolation, which is used to fill in missing values on audio CDs. Suppose that the following series of values is read from a CD audio track: 400, 525, 650, 825, 1100. Now suppose that, due to an error, the fourth value cannot be read, so we actually see 400, 525, 650, ___, 1100. Using linear interpolation, we can estimate the missing value as halfway between its neighbors, 650 and 1100, which gives 875. This isn't the correct value (825), but it's close enough for audio. Since it is only one of tens of thousands of samples played each second (an audio CD carries 44,100 samples per second per channel), a listener is very unlikely to detect the mistake. In fact, even considerably longer runs of interpolated values usually go unnoticed.
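The interpolation step in the example above can be sketched in a few lines of Python. This is a minimal illustration, assuming unreadable samples are marked as `None` and that the first and last samples were read correctly; `interpolate_missing` is a hypothetical name, and real players perform concealment in hardware, possibly with other strategies.

```python
def interpolate_missing(samples):
    """Replace None entries by linear interpolation between the nearest known samples.

    Results are rounded to integers, since CD audio samples are integers.
    Leading or trailing None entries (with no neighbor on one side) are left as None.
    """
    result = list(samples)
    known = [i for i, v in enumerate(result) if v is not None]
    for left, right in zip(known, known[1:]):
        gap = right - left
        for i in range(left + 1, right):
            # Fraction of the way from the left known sample to the right one.
            t = (i - left) / gap
            result[i] = round(result[left] + t * (result[right] - result[left]))
    return result


read = [400, 525, 650, None, 1100]
print(interpolate_missing(read))  # [400, 525, 650, 875, 1100]
```

The estimate of 875 differs from the true value of 825 by 50, an error that is inaudible for a single sample in a stream of tens of thousands per second.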

With computer data on a CD-ROM, however, interpolation is useless; there is no way to guess the correct value of a missing byte in a program. A single wrong bit in a 1 MB file can be the difference between an application that works properly and one that wipes out data on your hard drive. (Seriously. It would take really bad luck, but it is possible.) This was a big problem when the original CD-ROM formats were developed.

For this reason, data formats on CD add a second layer of ECC. In addition to the CIRC error correction already defined by the "Red Book" audio CD standard, data CDs devote over 10% of the theoretical capacity of the disk to additional error detection and correction codes at the byte level: for each 2,048 bytes of user data, 280 bytes are used for error detection and correction codes. This sacrifice in capacity gives CD-ROM media extremely high data reliability; errors far larger than a single bad byte can be recovered thanks to this redundancy.
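The 280-byte and "over 10%" figures can be checked against the byte layout of a CD-ROM Mode 1 sector as specified in ECMA-130: each 2,352-byte sector holds 2,048 bytes of user data, a 4-byte error detection code (EDC), 8 reserved bytes, and 172 + 104 bytes of Reed-Solomon parity ("P" and "Q"). The sketch below assumes Mode 1 sectors; other sector modes are laid out differently, and the dictionary keys are illustrative labels only.

```python
# Byte layout of one CD-ROM Mode 1 sector (2,352 bytes on the disc).
MODE1_SECTOR = {
    "sync": 12,
    "header": 4,
    "user_data": 2048,
    "edc": 4,             # error detection code (a 32-bit CRC)
    "reserved": 8,        # zero-filled
    "ecc_p_parity": 172,  # Reed-Solomon "P" parity bytes
    "ecc_q_parity": 104,  # Reed-Solomon "Q" parity bytes
}

raw = sum(MODE1_SECTOR.values())
protection = (MODE1_SECTOR["edc"]
              + MODE1_SECTOR["ecc_p_parity"]
              + MODE1_SECTOR["ecc_q_parity"])

print(raw)                                              # 2352
print(protection)                                       # 280
print(f"{protection / raw:.1%}")                        # 11.9%
print(f"{protection / MODE1_SECTOR['user_data']:.1%}")  # 13.7%
```

Measured against the raw sector size, the error detection and correction bytes take about 11.9% of the space; measured against the user data they protect, the overhead is about 13.7%.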

