Explanation of Sample Rate in Digital Audio and Breakdown of Misconceptions
Fundamental Concepts of Sampling
Digital sampling is the process of measuring a continuous analog signal at discrete points in time so that it can be represented by digital bits. This process is carried out by an analog-to-digital converter (ADC) during recording, which takes analog audio input and generates data values for the amplitude at small, regular intervals. It is carried out in reverse by a digital-to-analog converter (DAC) when audio is played back. An ideal digital conversion would match the original continuous signal exactly at the sampled points in time. A signal cannot have an infinite number of digital samples because it would require an infinite amount of disk space. The sample rate represents how many digital samples are taken in a second. The sampled amplitude values, which are stored as integers, also cannot be infinitely precise. The bit depth represents how many bits are stored in each digital sample value, which determines how precise these values can be. Bit depth has no effect on the sample rate as they are two distinct measurements. The sample rate determines where each data point falls on the horizontal time scale, and the bit depth determines where it is drawn on the vertical amplitude scale.
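The two measurements can be illustrated with a short Python sketch. It is only an illustration, not how a real ADC is built: the function name sample_and_quantize, the 1,000 Hz test tone, and the simple rounding scheme are all assumptions made for the example.

```python
import math

def sample_and_quantize(freq_hz, sample_rate_hz, bit_depth, num_samples):
    """Sample a full-scale sine wave and round each sample to a signed integer."""
    # Largest positive integer code, e.g. 32767 for 16 bits (assumed signed format).
    max_code = 2 ** (bit_depth - 1) - 1
    samples = []
    for n in range(num_samples):
        t = n / sample_rate_hz                            # time of the n-th sample in seconds
        amplitude = math.sin(2 * math.pi * freq_hz * t)   # ideal value between -1 and 1
        samples.append(round(amplitude * max_code))       # rounding is the quantization step
    return samples

# Same sample rate, different bit depths: the number of samples is identical,
# only the precision of each value changes.
cd_quality = sample_and_quantize(1000, 44100, 16, 8)
coarse = sample_and_quantize(1000, 44100, 4, 8)
print(cd_quality)
print(coarse)
```

Changing the bit depth changes how finely each value is rounded, while changing the sample rate changes how many values are produced per second.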
The sample rate is chosen according to the Nyquist-Shannon sampling theorem, which states that a signal can be reproduced accurately only if the sample rate is more than twice the highest frequency the signal contains. For audio, this means sampling at slightly more than twice the highest frequency audible to humans. Thus, the sample rate is usually set to 44,100 Hz or 44.1 kHz (kilohertz), as it is slightly higher than double the absolute upper range of human hearing, which is around 20,000 Hz when we are born. Most people can only hear up to a frequency somewhere in the 15,000 to 18,000 Hz range when they reach adulthood, so a 44.1 kHz sample rate is already more than sufficient for meeting the Nyquist requirement. 44.1 kHz is the sample rate for standard Compact Disc Digital Audio based on the Red Book specifications. Sample rate directly determines the bandwidth of the signal, which is the range of frequencies that are preserved. Frequencies below 22.05 kHz are preserved at a 44.1 kHz sample rate, in comparison to an upper limit frequency of 11.025 kHz for a 22.05 kHz sample rate.
The number of samples or data points taken per wave cycle is equal to the sample rate divided by the frequency. The highest frequency able to be sampled is represented by 2 data points per cycle, ideally one at the crest and one at the trough. When a curve is drawn through both points and the equilibrium point, it will create a sine wave. This shows that half of the sample rate, known as the Nyquist frequency, is truly the highest representable frequency because the wave would have to contain more than two data points to draw more sine wave cycles in the same amount of time. Interpolation is the process of reconstructing the signal between the points by drawing a smooth curve from one sample to the next, recreating the acoustic vibration pattern as a continuous function. A wave at twice the Nyquist frequency, which is equal to the sample rate, will have one data point per cycle. Every sample then lands at the same point of the cycle, so the result is a constant value that cannot be interpolated into a sine wave pattern, and it would be converted as silence. If a frequency exists higher than the Nyquist frequency that is not a perfect multiple of the sample rate, it will be sampled at shifting points of the wave cycle, producing an effect called aliasing. When these samples are interpolated, they form lower-frequency waves that create non-linear distortion in the sampled frequency range. Non-linear distortion involves the creation of additional frequencies that do not have a linear relationship to the input frequencies.
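The arithmetic above can be checked with a small Python sketch. The helper names samples_per_cycle and alias_frequency are made up for this example, and the folding rule assumes an ideal sampler with no anti-aliasing filter in front of it.

```python
import math

def samples_per_cycle(freq_hz, sample_rate_hz):
    """Data points per wave cycle: sample rate divided by frequency."""
    return sample_rate_hz / freq_hz

def alias_frequency(freq_hz, sample_rate_hz):
    """Frequency between 0 and the Nyquist frequency that the samples appear to contain."""
    folded = freq_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)

fs = 44100
print(samples_per_cycle(22050, fs))   # 2.0, the Nyquist frequency
print(samples_per_cycle(44100, fs))   # 1.0, one point per cycle
print(alias_frequency(30000, fs))     # 14100, an unfiltered 30 kHz tone folds down
print(alias_frequency(44100, fs))     # 0, a tone at the sample rate becomes a constant

# The samples of a 30 kHz tone are the same numbers as those of an inverted
# 14.1 kHz tone, which is why the two cannot be told apart after sampling.
high = [math.sin(2 * math.pi * 30000 * n / fs) for n in range(32)]
low = [-math.sin(2 * math.pi * 14100 * n / fs) for n in range(32)]
max_difference = max(abs(a - b) for a, b in zip(high, low))
print(max_difference < 1e-9)
```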
The Challenging Sonic Aspects of Preventing Aliasing
It is important to filter out any frequencies, including noise, that exist above the Nyquist frequency before sampling. This is accomplished using an analog low-pass filter, which eliminates frequencies above a set point, called the cutoff frequency. This filter, referred to as an anti-aliasing filter, is usually applied around 20 kHz for audio at the standard 44.1 kHz sample rate.
The ideal filter would act as a wall, completely eliminating frequencies above the cutoff frequency and not affecting the frequencies below at all, but no such analog filter exists. Analog filters do not cut off at a straight edge, but rather they gradually roll off. The steepness of this roll-off, measured in dB per octave, is loosely referred to here as the quality factor, or ‘Q’ (strictly, the slope is set by the order of the filter, and Q describes how sharply the response bends around the cutoff). Higher Q filters have steeper roll-offs but are harder to design. Extremely steep filtering is practical only with digital filters since it is easier to process bits than to manipulate signals with an analog filter. Digital filters that have a very narrow transition band are known as brickwall filters.
The roll-off of the anti-aliasing filter can create audible effects depending on its quality factor. Higher frequencies that are still within the audible range may suffer phase shifts that smear transients, which is heard as ringing (with steep linear-phase digital filters, part of this ringing occurs before the transient and is called pre-ringing). These frequencies can also be attenuated because the gradual roll-off begins below the cutoff frequency.
These effects are usually subtle, but they are still an important consideration when it comes to deciding on an optimal sample rate for recording. The difficult aspect of anti-aliasing is designing a filter that will be transparent and work correctly at the same time. The filter must be effective enough at the Nyquist frequency to start pushing the amplitude below the level of hearing, so there is only a narrow space for the filter to roll off if the human hearing range goes all the way up to 20 kHz.
The Process of Oversampling
Oversampling, the process of sampling the input at a multiple of the target sample rate and then filtering and downsampling it digitally, avoids this problem in ADCs by pushing the Nyquist frequency far above the hearing range and using two filters: an analog filter with a low quality factor and a digital filter. The roll-off of the analog filter can be extended, making it wider and smoother, and the cutoff frequency can be placed high enough to be out of audible range. A digital filter can be cost-effective, and it doesn’t use as much power. It is also more precise than an analog filter and can be used to filter with a very high quality factor.
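A rough calculation shows how much room oversampling gives the analog filter. The sketch below is only an estimate under stated assumptions: the passband is taken to end at 20 kHz, the filter is asked to reach 96 dB of attenuation (roughly the range of 16-bit audio) by the Nyquist frequency, and the function name required_slope_db_per_octave is invented for the example.

```python
import math

def required_slope_db_per_octave(passband_edge_hz, nyquist_hz, attenuation_db):
    """Average slope needed to reach the target attenuation by the Nyquist frequency."""
    octaves_available = math.log2(nyquist_hz / passband_edge_hz)
    return attenuation_db / octaves_available

# Assumed figures: 20 kHz passband edge, 96 dB target attenuation.
standard = required_slope_db_per_octave(20000, 44100 / 2, 96)
oversampled_8x = required_slope_db_per_octave(20000, 8 * 44100 / 2, 96)

print(round(standard), "dB per octave at 44.1 kHz")
print(round(oversampled_8x), "dB per octave at 8x oversampling")
```

Under these assumptions the filter at the standard rate has only about a seventh of an octave to work with, while the 8x oversampled case has more than three octaves, so a far gentler analog filter is enough.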
In addition to allowing better filtering, this process also reduces the level of quantization noise, which is caused by rounding errors in sample values, within the audible band. The signal-to-noise ratio after digitization is higher than it would be at the standard sample rate because the same total noise is spread out across the entire, wider sampled range, so less of it falls within the audible band, and the digital filter removes the rest before downsampling. Without noise shaping, the noise power left in the audible band is inversely proportional to the oversampling ratio, which is a gain of about 3 dB for each doubling of the sample rate.
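The size of this improvement can be estimated with a one-line formula. The sketch assumes that quantization noise is spread evenly across the sampled range and that no noise shaping is used; the function name oversampling_snr_gain_db is hypothetical.

```python
import math

def oversampling_snr_gain_db(oversampling_ratio):
    """In-band SNR improvement when evenly spread noise is filtered and downsampled."""
    return 10 * math.log10(oversampling_ratio)

for ratio in (2, 4, 8, 64):
    print(ratio, "x oversampling:", round(oversampling_snr_gain_db(ratio), 2), "dB")
```

Each factor of 4 gives about 6 dB, which is roughly the equivalent of one extra bit of depth. Converters that use noise shaping can do considerably better than this simple estimate.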
A linear-phase digital anti-aliasing filter is applied to the oversampled input to remove frequencies above the standard Nyquist frequency, and the signal is then downsampled to the standard 44.1 kHz sample rate for the final product. This filter delays all frequencies by the same, unnoticeably short amount so that phase coherence, the consistency of the phase relationships between frequencies, is not affected by the filter.
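The filter-then-downsample step can be sketched in plain Python. This is a simplified illustration, not a converter-grade design: the 4x oversampling ratio, the 101 taps, the Hamming window, and the names lowpass_fir and decimate are all assumptions. Real converters use much longer and more carefully optimized filters.

```python
import math

def lowpass_fir(num_taps, cutoff_hz, sample_rate_hz):
    """Windowed-sinc low-pass FIR. Symmetric taps give linear phase."""
    center = (num_taps - 1) / 2
    fc = cutoff_hz / sample_rate_hz   # cutoff as a fraction of the sample rate
    taps = []
    for n in range(num_taps):
        x = n - center
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))  # Hamming window
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]  # normalize to unity gain at 0 Hz

def decimate(signal, taps, factor):
    """Apply the FIR filter, keeping only every factor-th output sample."""
    out = []
    for i in range(0, len(signal), factor):
        acc = 0.0
        for k, tap in enumerate(taps):
            if 0 <= i - k < len(signal):
                acc += tap * signal[i - k]
        out.append(acc)
    return out

oversampled_rate = 4 * 44100                      # assumed 4x oversampling
taps = lowpass_fir(101, 20000, oversampled_rate)

# A symmetric FIR delays every frequency by (num_taps - 1) / 2 samples.
delay_ms = 1000 * ((len(taps) - 1) / 2) / oversampled_rate
print(round(delay_ms, 3), "ms delay for all frequencies")

# An ultrasonic 40 kHz tone is strongly attenuated before downsampling to 44.1 kHz.
tone = [math.sin(2 * math.pi * 40000 * n / oversampled_rate) for n in range(400)]
downsampled = decimate(tone, taps, 4)
print(max(abs(s) for s in downsampled[30:]))
```

Because the taps are symmetric around the center, every frequency is delayed by the same number of samples, which is the property the paragraph above describes as preserving phase coherence.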
Filtering out Images
In the same way that an anti-aliasing filter cuts ultrasonic frequencies in an analog signal before it is sampled, an anti-imaging filter cuts the ultrasonic frequencies that appear when a digital signal is converted back to analog. If the stairstep waveform drawn out by the sample values were not smoothed before playback, it would contain strong high frequencies. This low-pass analog filter, also known as the reconstruction filter, recreates the analog signal by using smooth interpolation as described in the Whittaker-Shannon interpolation formula.
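The Whittaker-Shannon formula rebuilds the value between samples by adding up one sinc curve per sample. The sketch below is a truncated version of that sum, so it is only approximate; the exact formula uses infinitely many samples. The names sinc and reconstruct and the 1,000 Hz test tone are assumptions for the example.

```python
import math

def sinc(x):
    """Normalized sinc: 1 at zero, 0 at every other integer."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def reconstruct(samples, sample_rate_hz, t):
    """Truncated Whittaker-Shannon sum: x(t) is the sum of x[n] * sinc(t * fs - n)."""
    return sum(x * sinc(t * sample_rate_hz - n) for n, x in enumerate(samples))

fs = 44100
freq = 1000
samples = [math.sin(2 * math.pi * freq * n / fs) for n in range(1001)]

# Estimate the waveform halfway between sample 500 and sample 501.
t_mid = 500.5 / fs
estimate = reconstruct(samples, fs, t_mid)
exact = math.sin(2 * math.pi * freq * t_mid)
print(estimate, exact)
```

The estimate lands very close to the true value of the sine wave even though no sample was taken at that instant, which is what a reconstruction filter achieves in the analog domain.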
An oversampling process similar to the one described above for analog to digital conversion happens in reverse in most DACs. Just as oversampling allowed a wider anti-aliasing filter, it also permits a wider reconstruction filter. The digital signal is usually upsampled by a factor of 2, 4, or 8 and digitally filtered before it is converted to analog for playback.
The key aspect of oversampling is that the sample rate of the recorded file, and of the file being played back, is not any higher than the standard sample rate. The sample rate is only higher in the transition process when the analog audio is sampled or when the digital audio is being converted back into an analog signal. Recorded material is stored at the standard sample rate. Oversampling provides the benefits of clearer high frequencies and less noise in both input and output. The audible frequencies are left essentially untouched by both the anti-aliasing and anti-imaging filters, and the quantization noise is lowered.
Oversampling in Digital Plugin Processing
Oversampling can also provide benefits in plugins for digital audio workstation software. Digital plugins that use oversampling process the audio in the same way as discussed above without the analog interpolation step. The oversampling happens internally within the plugins, which sample the input signal at some multiple of its original sample rate, and the audio does not have to be oversampled in the entire DAW.
Some of the upper harmonics created by distortion plugins are above the Nyquist limit, so they are aliased when they are processed at the standard 44.1 kHz sample rate. This aliasing is not harmonically related to the dry signal, so it is noticeable as a sharp, harsh sound. With oversampling, these harmonics are digitally filtered out before downsampling.
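A quick calculation shows where the harmonics of a distorted tone end up. This sketch assumes an idealized distortion that adds exact integer harmonics of a 5 kHz tone, with no filtering before the harmonics are created; the names alias_frequency and harmonic_report are invented for the example.

```python
def alias_frequency(freq_hz, sample_rate_hz):
    """Frequency between 0 and the Nyquist frequency that the samples appear to contain."""
    folded = freq_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)

def harmonic_report(fundamental_hz, num_harmonics, sample_rate_hz):
    """List (harmonic, frequency it is heard at, whether it aliased) for each harmonic."""
    nyquist = sample_rate_hz / 2
    report = []
    for k in range(1, num_harmonics + 1):
        harmonic = k * fundamental_hz
        heard_at = alias_frequency(harmonic, sample_rate_hz)
        report.append((harmonic, heard_at, harmonic > nyquist))
    return report

# At 44.1 kHz the 5th, 6th, and 7th harmonics of 5 kHz fold back to
# 19,100 Hz, 14,100 Hz, and 9,100 Hz, none of which is a multiple of 5 kHz.
for row in harmonic_report(5000, 7, 44100):
    print(row)

# With 4x oversampling (176.4 kHz) the same harmonics stay where they belong,
# so the ultrasonic ones can be filtered out before downsampling.
for row in harmonic_report(5000, 7, 4 * 44100):
    print(row)
```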
Digital equalizers commonly map the infinite analog frequency range down to the finite frequency range within the Nyquist limit, so the filtering of frequencies near the top has a warped response. A low-pass filter set to 18 kHz with a 6 dB per octave slope will completely eliminate frequencies at the Nyquist frequency of 22.05 kHz when it should only be attenuating them by a few dB. Oversampling fixes this warping by providing a much bigger frequency map that easily covers the human range and extends well into the ultrasonic range.
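The warping can be put into numbers for one common design method. The sketch assumes a first-order (6 dB per octave) low-pass filter converted to digital form with the bilinear transform and a prewarped cutoff; actual plugins may use other methods. The function names are hypothetical.

```python
import math

def analog_lowpass_db(freq_hz, cutoff_hz):
    """Gain in dB of an ideal first-order analog low-pass filter."""
    ratio = freq_hz / cutoff_hz
    return -10 * math.log10(1 + ratio ** 2)

def bilinear_lowpass_db(freq_hz, cutoff_hz, sample_rate_hz):
    """Gain in dB of the same filter after a bilinear transform with prewarped cutoff."""
    warped = math.tan(math.pi * freq_hz / sample_rate_hz)
    warped_cutoff = math.tan(math.pi * cutoff_hz / sample_rate_hz)
    ratio = warped / warped_cutoff
    return -10 * math.log10(1 + ratio ** 2)

# Gain at 22 kHz for an 18 kHz low-pass filter.
analog = analog_lowpass_db(22000, 18000)
digital_standard = bilinear_lowpass_db(22000, 18000, 44100)
digital_8x = bilinear_lowpass_db(22000, 18000, 8 * 44100)

print(round(analog, 2), "dB for the analog prototype")
print(round(digital_standard, 2), "dB at 44.1 kHz")
print(round(digital_8x, 2), "dB at 8x oversampling")
```

Under these assumptions the analog filter attenuates 22 kHz by only about 4 dB, the 44.1 kHz digital version attenuates it by tens of dB, and the 8x oversampled version is nearly indistinguishable from the analog response.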
Sometimes the reconstructed analog waveform rises above the digital full-scale level in between samples, even though no individual sample exceeds it. These inter-sample peaks, which limiter plugins can leave behind, can be detected and reduced by using oversampling. Limiter plugins that use oversampling are sometimes called True Peak limiters.
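A classic worst-case example makes inter-sample peaks concrete. The sketch uses a sine wave at one quarter of the sample rate, offset by 45 degrees so that every sample misses the crests; the tone and phase are chosen purely for illustration, and sample_peak is a made-up helper.

```python
import math

def sample_peak(samples):
    """Largest absolute sample value, which is what a simple peak meter reads."""
    return max(abs(s) for s in samples)

fs = 44100
tone_hz = fs / 4          # exactly four samples per cycle
phase = math.pi / 4       # every sample lands 45 degrees away from a crest

samples = [math.sin(2 * math.pi * tone_hz * n / fs + phase) for n in range(64)]

peak = sample_peak(samples)                   # about 0.707
true_peak = 1.0                               # the continuous sine wave reaches 1.0
overshoot_db = 20 * math.log10(true_peak / peak)

print(round(peak, 4))
print(round(overshoot_db, 2), "dB above the highest sample")
```

If this signal were normalized so that its samples just touched full scale, the reconstructed waveform would exceed full scale by about 3 dB. Measuring at a higher internal sample rate places data points closer to the real crests, which is why oversampling helps.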
The problem with using oversampling is that it multiplies the amount of processing that the CPU has to perform for the same number of plugins, roughly in proportion to the oversampling factor. If the usage becomes too high, the buffer size, and with it the latency, will need to be increased to avoid clicks and pops. The sound quality depends on the accuracy of the upsampling and downsampling processes in oversampling. Also, if you want all of the plugins to use a higher sample rate, not just the ones that support oversampling, you will have to upsample the entire project. Running the project at a higher sample rate can cause a much higher usage of the CPU, and the accuracy of the upsampling determines whether it is worth it for mixing. Setting your project to use 88.2 or 96 kHz may make some processing sound better, but this is simply because the plugins are processing at a higher sample rate. This could be one cause of the myth that audio at a higher sample rate sounds better.
High Sample Rate Audio
High sample rate playback above 48 kHz does not seem to have any benefits. Files that are sampled at high sample rates are sometimes referred to as “high-definition audio.” This is different from oversampling because oversampling is an internal process designed to fix certain issues in converter devices and mixing software without changing the final sample rate.
High sample rate playback is based on the idea that reproducing ultrasonic frequencies will result in a better sound. Most equipment will not be able to play frequencies above 22 kHz, but equipment that can will play back ultrasonic frequencies if they exist in the high sample rate files, and these can create high amounts of intermodulation distortion. Intermodulation distortion creates additional frequencies at the sums and differences of two or more original frequencies. It is caused specifically by frequencies that are affected by the non-linear response of the system. Audio systems usually are slightly affected by non-linear response, but not enough for audible intermodulation. Non-linear response is much worse in the higher frequencies that are out of the audible spectrum. If ultrasonic content exists in the audio, it will create audible frequencies that are not harmonically related to the signal, and the distortion will be significantly stronger than the usual amount for audible frequencies. This distortion might not be obvious, but it affects the sound so that it is not as accurate as it would be if it were played at a sample rate that is more attuned to our hearing range. Intermodulation distortion is not a characteristic of the original audio, but rather an effect caused when the equipment plays back the ultrasonic frequencies. Our ears cannot hear these frequencies anyway, so their existence in the original audio only degrades what we can hear.
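The way two inaudible tones can produce an audible one is easy to work out. The sketch lists only the second-order and third-order products of two tones, which are usually the strongest; the 26 kHz and 30 kHz tones, the 20 kHz hearing limit, and the function names are assumptions for the example, and the strength of each product depends entirely on the equipment.

```python
def intermod_products(f1_hz, f2_hz):
    """Second-order and third-order intermodulation frequencies of two tones."""
    return {
        "f2 - f1": abs(f2_hz - f1_hz),
        "f1 + f2": f1_hz + f2_hz,
        "2*f1 - f2": abs(2 * f1_hz - f2_hz),
        "2*f2 - f1": abs(2 * f2_hz - f1_hz),
    }

def audible(products, limit_hz=20000):
    """Keep only the products that fall inside the assumed hearing range."""
    return {name: freq for name, freq in products.items() if freq <= limit_hz}

# Two ultrasonic tones that nobody can hear on their own.
products = intermod_products(26000, 30000)
print(products)
print(audible(products))   # the 4,000 Hz difference tone lands in the audible range
```

The 4,000 Hz product is not a harmonic of anything in the audible part of the recording, which is why this kind of distortion is heard as an inaccuracy rather than as added detail.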
Positive audible effects from using higher sample rates might be caused by the widened anti-aliasing filter that is also used in oversampling. It is better to oversample since it returns the audio to the standard sample rate after the oversampling process, but higher sample rates retain the negative effect of intermodulation distortion. One way of determining whether the ultrasonic frequencies make a positive difference is to downsample the audio. In theory, if the ultrasonic frequencies improve the sound, the downsampled audio should sound worse. However, the downsampled audio sounds the same, except for the fact that the audio equipment will not produce the intermodulation distortion.
Effects on the Audiophile Consumer
The debate over sample rates in playback and storage never seems to stop because there is misinformation based on pseudoscience in the discussion. Audio professionals show evidence that excessively high sample rates cause more harm than good, but audio equipment companies continue to market the fact that their devices can produce up to 192,000 samples per second. It becomes a numbers game where better specifications must mean that the equipment will produce higher quality sound, according to the manufacturers.
They claim that a higher sample rate will result in “smoother” sound and provide a more accurate stereo soundstage, but all it does is store frequencies that we can’t hear. They are probably referring to the phase coherence that oversampling provides, but it is important to note that there is no reason to store audio at such a high sample rate because it won’t allow us to hear any more frequencies than we can already hear. Frequencies can be interpolated correctly as long as they have more than two data points per cycle.
Sometimes a device supports sample rates up to 192 kHz only because the manufacturer wants audio files at that rate to be played back properly. However, humans do not get anything out of listening to these files, and the process of reproducing the sound alters the audible sound in a negative way. Designing a device to record or play back audio at 192 kHz justifies the use of digital audio at that sample rate, so the misinformation continues. Higher sample rates mean that the processing in the device has to be faster, and design compromises must be made that might increase noise and decrease the accuracy of each sample. Oversampling is confused with “high-definition audio,” so the benefits of oversampling in converters and software are falsely attributed to high sample rates.
Another factor in the push for high sample rate audio is the backlash against music with lossy compression. Lossy compression takes digital audio and removes parts of it to save space. Lossy compression uses perceptual algorithms to take out what is supposedly not heard, but high amounts of lossy compression can have audible effects. Many audio files are distributed as mp3 files at low to medium bit rates, creating a stigma against digital audio from a quality perspective. Lossless compression is also available, and it rearranges the data to save space without affecting the actual waveform representation. Lossless formats are common for storing audio at full quality without taking up as much space. Audio stored in a lossless format is a full recreation of audio sampled at the standard sample rate, and it contains all of the information necessary for us to hear. Many people are not aware of the existence of these lossless formats, and they may equate mp3-encoded audio with all digital audio. One problem is that almost all online music stores sell their music in lossy formats, but this is a separate issue from sample rate.
Misconceptions about digital audio can cause people who love music to buy expensive equipment that may only be expensive because it supports high sample rate recording or playback. Education about digital audio is important for both the companies that design the products and the consumers who care about the quality of audio.