I have been hoping to write a really geeky post for a while about the different audio formats available, because for the most part we tend to just stick with what we know without really thinking about why we use a particular format. This is especially true with something as widespread as iTunes, which already comes with a default audio encoder designed by Apple. The reminder to sit down and write it came when I found this article via lifehacker.
The formats can be split into two categories, lossless and lossy. CD quality audio (which is entirely uncompressed, and also known as PCM) is lossless, as it contains all the original data describing the audio. There are also methods of compression that result in a smaller file size without any loss at all to the original audio. In broad terms, these formats work by removing redundancy: repeated or predictable data is replaced with a shorter description that the audio player can use to rebuild the original exactly. Ultimately, all you are left with is the data which describes the changes in the original audio, and this is then packed as tightly as possible using a technique known as ‘Entropy Coding’. Incidentally, it is possible to apply a lossless encoding scheme to a piece of audio and end up with a larger file size than the original, so codecs have to be carefully engineered to avoid this.
Text compression (as in a .zip file, for example) is a good way to explain how lossless coding works, as it is crucial to restore the document to exactly the state it was in before compression was applied. Clearly, if you uncompress a text document and it has words missing, the compression is completely useless, as the meaning of the text may have changed and the original is irrecoverable.
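To make that concrete, here is a minimal sketch using Python's built-in zlib module (the same family of compression used in .zip files). The sample text and the seeded random "noise" are made up purely for illustration. It shows two things mentioned above: the data comes back exactly as it went in, and data with no repetition to exploit can end up slightly larger after "compression".

```python
import random
import zlib

# Repetitive text compresses well and must come back byte-for-byte identical.
text = ("the quick brown fox jumps over the lazy dog. " * 200).encode("utf-8")
packed = zlib.compress(text, 9)
restored = zlib.decompress(packed)

print(len(text), "->", len(packed), "bytes")
print("identical after round trip:", restored == text)

# Pseudo-random bytes (seeded so the run is repeatable) have no patterns
# for the compressor to exploit, so the format's own overhead makes the
# output a little larger than the input.
rng = random.Random(42)
noise = bytes(rng.getrandbits(8) for _ in range(4096))
packed_noise = zlib.compress(noise, 9)

print(len(noise), "->", len(packed_noise), "bytes")
print("grew after compression:", len(packed_noise) > len(noise))
```

Audio codecs use techniques tuned for sound rather than text, so treat this as an analogy for the lossless idea and not as how any particular audio format works.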
‘Lossy’ audio codecs do not allow the audio to be restored to its original condition. They work by removing frequencies that the human ear is unlikely to perceive because they are already masked by other frequencies. Imagine you hear a noisy bus drive past: you can appreciate how you might not hear a much quieter car immediately next to the bus, even if you know it’s there. This is because the loud rumbling of the bus covers up the softer noise of the car. The same is true for much more specific frequencies: a loud frequency can ‘mask’ a quieter frequency within a short period of time and a small frequency range. Interestingly, if the two frequencies are close enough together in pitch, one can mask the other even if they reach the ear at different times. The masker can come before or after the masked frequency. This is known as temporal masking, and it is dependent on the volume and pitch of both frequencies. More info on the oddities of auditory masking can be found here if you’re really that interested.
Lossy codecs take advantage of this effect: they predict which of the available frequencies at any one time you are unlikely to hear, and remove them from the audio.
A good encoder will do this without compromising the audible quality of the audio, so long as it is given a high enough bit-rate. Clearly, the lower the bit-rate, the more information the encoder has to remove to meet it, which results in more audible quality loss.
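The bus-and-car idea can be sketched as a toy in Python. This is only an illustration of the principle, not how any real encoder works: the function name drop_masked and both thresholds (200 Hz and 20 dB) are invented for the example, and real codecs use far more detailed models of hearing.

```python
# Toy illustration only: the thresholds below are made-up numbers,
# not values taken from any real codec or hearing model.
def drop_masked(components, max_gap_hz=200.0, min_margin_db=20.0):
    """Return the components that are NOT masked by a louder neighbour.

    components: list of (frequency_hz, level_db) tuples.
    A component counts as masked if some other component lies within
    max_gap_hz of it and is at least min_margin_db louder.
    """
    kept = []
    for freq, level in components:
        masked = any(
            abs(freq - other_freq) <= max_gap_hz
            and other_level - level >= min_margin_db
            for other_freq, other_level in components
        )
        if not masked:
            kept.append((freq, level))
    return kept


sounds = [
    (1000.0, 80.0),  # the loud "bus"
    (1100.0, 45.0),  # quiet tone close in pitch: masked by the bus
    (5000.0, 45.0),  # equally quiet tone far away in pitch: still audible
]

audible = drop_masked(sounds)
print(audible)
```

The quiet tone next to the loud one is thrown away, while an equally quiet tone elsewhere in the spectrum survives, which is the trade a lossy codec is making.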
Many independent listening tests have been done on how good various lossy encoders are, and at what bit-rate the quality of audio is audibly reduced.
As a rough guide, we recommend Apple’s AAC encoder (one of the best and most widely available) at 256kbps as your best bet. It is a reasonably low bit-rate, so it will leave you plenty of room on your iPod, and the quality is high enough that the average user would not notice any degradation from the original audio. If you’re a bit more of an audiophile, then go with a lossless encoder and save yourself the space required to keep uncompressed PCM audio on your PC.
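For a feel of what 256kbps means in terms of space, here is a quick back-of-the-envelope calculation in Python. It assumes standard CD audio (44,100 samples per second, 16 bits per sample, two channels) and ignores file headers and tags, so the figures are approximate; the helper name megabytes_per_minute is just for this example.

```python
# Standard CD audio parameters.
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2

# Bit-rate of uncompressed CD-quality PCM, in kilobits per second.
pcm_kbps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS / 1000


def megabytes_per_minute(kbps):
    """Convert a bit-rate in kbps to megabytes (10^6 bytes) per minute."""
    bytes_per_second = kbps * 1000 / 8
    return bytes_per_second * 60 / 1_000_000


for label, kbps in [("CD PCM", pcm_kbps), ("AAC at 256kbps", 256)]:
    print(f"{label}: {kbps:.1f} kbps, {megabytes_per_minute(kbps):.2f} MB per minute")

print(f"PCM is about {pcm_kbps / 256:.1f} times larger")
```

Lossless files land somewhere between the two, and how much they save depends on the music itself.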
[image via Michael Dales]