Digital Signal Processing

A brief overview of audio and signal theory.

Aston Technologies
Sep 30, 2021

By: Jordan Sansing, Sr. Collaboration Engineer at Aston Technologies

This article will cover a high-level view of what audio is and how signal theory and physics have established a century of ongoing technological advancements in sonic capabilities.

There’s a significant amount of mathematical theory and transform work involved in this subject that will conveniently be glossed over to keep the scope broad and approachable. I’m not a physicist by any stretch of the imagination; I’ll leave those papers to the career professionals and instead talk at a level I’m more familiar with.

Breaking Down the Basics

MIT Research

In 2014, MIT published an article called Extracting Audio From Visual Information, describing how researchers from MIT, Microsoft, and Adobe successfully reconstructed audio using nothing more than a high-speed camera filming a potato chip bag.

I’d like to use this study as a pivot point for defining analog audio, the principle that makes the experiment feasible in much the same way the human ear provides information for the brain to interpret as sound.

Sound is defined as an acoustic wave propagating through a medium, such as air, producing measurable changes in acoustic pressure. When these mechanical waves meet the diaphragm in your ear, the minute vibrations become information your brain processes into a psychoacoustic response. The MIT study replicated this phenomenon by emulating that biological response with digital equipment.

Defining and Demonstrating Analog

Can you synthesize nature?

Analog is tricky to define. The signals are continuous, living in the realm of infinities, and the representations of the measurable signals are analogous to their naturally occurring state. During Aston Technologies’ Tech Talks event last August, I played a piece of audio I had previously synthesized, trying to persuade the audience they were listening to a field recording taken at the location shown in the image I displayed.

If you scrutinize the audio, it becomes easier to discern the nuances that sound unnatural, but at a surface level it’s relatively convincing. Although the original audio wasn’t captured as analog, it’s analogous to what it was meant to emulate.

So, how do you record analog audio? In the previous example, even if I had gone out and captured the foley with my sampler, it would still have gone through a digital conversion, since my Zoom H4n stores audio on flash memory. True analog recording dates back to the late 1800s, when sound waves could first be archived thanks to the invention of the phonograph.

When you speak into the horn, the sound waves drive the mechanical movement of a small needle that makes precise engravings in a wax or aluminum cylinder. If you’re familiar with how a record player works, the phonograph was its predecessor, performing both the engraving and the playback on the same system. Both the phonograph and the vinyl record are examples of analog audio archives. In fact, we can take this one step further and introduce real-time analog communication over the telephone’s twisted pair of copper wires.

Analog telephony arrived around the same time as the phonograph and leveraged the same approach; however, it added electromagnets to carry voice signals over copper wires across a defined distance.

These inventions paved the path toward more comprehensive systems for archiving and transmitting audio, in parallel with the modern computing paradigms that followed in the 20th century. The next sections examine how the digital ecosystem adapted this legacy methodology of recording and playing back audio.

Defining And Demonstrating Digital

Music as an algorithm

I previously defined analog as information conveyed as a continuous signal over time, theoretically made up of an infinite number of sample points. Digital signals are sampled and quantized, represented by a finite, calculated number of sample points that represent the information within a threshold of acceptable loss of integrity.

Digital signals are thus defined as functions of discrete time, approximating the information with enough sample points that a computer can connect the dots and resemble the sound to some agreed-upon accuracy. This is facilitated through a process known as Digital Signal Processing (DSP).
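To make the idea concrete, here is a minimal sketch of discrete-time sampling in Python; the 440 Hz tone, one-second duration, and 44.1 kHz rate are arbitrary illustrative choices, not anything specific to the examples above.

```python
# A minimal sketch of discrete-time sampling: a continuous 440 Hz tone
# is reduced to a finite set of sample points.
import numpy as np

sample_rate = 44_100          # samples per second
duration = 1.0                # seconds
frequency = 440.0             # Hz, concert A

# Discrete sample instants: n / sample_rate for n = 0, 1, 2, ...
t = np.arange(int(sample_rate * duration)) / sample_rate

# The "continuous" sine evaluated only at those instants
samples = np.sin(2 * np.pi * frequency * t)

print(f"{len(samples)} sample points represent {duration} s of audio")
```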

There are dedicated DSP chips responsible for this conversion, better suited for repetitive mathematical calculations in real time; however, most modern CPUs are able to handle the calculations in real time as well. Battery-powered devices will typically include a dedicated DSP to keep power consumption down.

If I had captured the soundscape from the earlier example with my sampler, the incoming signal would have been analog up to the microphone. Storing that information in flash memory requires an ADC (analog-to-digital converter). Playing the signal back through a speaker requires the opposite, a DAC (digital-to-analog converter), since playback from a speaker returns to the territory of analog.
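As a rough illustration, here is a sketch of that round trip, assuming a 16-bit depth and a synthetic sine wave standing in for the microphone signal; none of these values come from the actual H4n recording.

```python
# A minimal sketch of the ADC/DAC round trip: quantize a floating-point
# waveform to 16-bit integers (as an ADC would store it) and convert it
# back to floats for playback (as a DAC would).
import numpy as np

sample_rate = 44_100
t = np.arange(sample_rate) / sample_rate
analog_like = np.sin(2 * np.pi * 440.0 * t)      # stand-in for the mic signal

# "ADC": scale to the 16-bit integer range and round to discrete levels
quantized = np.round(analog_like * 32767).astype(np.int16)

# "DAC": map the integers back to the -1.0..1.0 range for playback
reconstructed = quantized.astype(np.float64) / 32767

max_error = np.max(np.abs(analog_like - reconstructed))
print(f"worst-case quantization error: {max_error:.6f}")  # about 1.5e-5 at 16 bits
```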

Throughout the history of music production, analog audio effects like the Leslie speaker and the spring reverb were commonly incorporated into recordings by mixing engineers as creative sound design techniques. With a DSP, those same effects can be emulated in software on a single machine, giving sound techs more room to explore creative avenues with less overhead.
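As one small example of what that emulation looks like in code, the sketch below implements a feedback comb filter, one of the classic building blocks of software reverbs (Schroeder-style designs chain several of these with allpass filters); the delay and feedback values are illustrative, and this is not a faithful spring or Leslie model.

```python
# A minimal sketch of one DSP building block behind software reverbs: a
# feedback comb filter that recirculates a delayed, attenuated copy of
# the signal.
import numpy as np

def feedback_comb(x, delay_samples, feedback=0.7):
    """y[n] = x[n] + feedback * y[n - delay_samples]"""
    y = np.zeros(len(x))
    for n in range(len(x)):
        y[n] = x[n]
        if n >= delay_samples:
            y[n] += feedback * y[n - delay_samples]
    return y

sample_rate = 44_100
impulse = np.zeros(sample_rate)      # one second of silence...
impulse[0] = 1.0                     # ...with a single click at the start

# A ~50 ms delay produces an audible train of decaying echoes
echoes = feedback_comb(impulse, delay_samples=int(0.05 * sample_rate))
```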

The theory behind DSP in the audio domain can be attributed to the Nyquist-Shannon sampling theorem, whose foundational paper cites earlier work published by E. T. Whittaker in 1915.

“A digital sampling system must have a sample rate at least twice as high as the highest audio frequency being sampled.”

The standard sampling rate for digital audio became 44,100 samples per second, or 44.1 kHz, because the range of human hearing spans roughly 20 Hz (20 cycles per second) to 20 kHz (20,000 cycles per second). Since the highest audio frequency being sampled for human hearing is 20 kHz, the Nyquist-Shannon sampling theorem says the sample rate should be at least 40 kHz, or twice the highest frequency being sampled. You will likely have noticed a discrepancy: I said the standard rate is 44.1 kHz. The extra headroom is intentional, leaving room for the anti-aliasing filter, a topic Dan Worrall covers best in his video on the subject.
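To see the theorem’s limit in action, the sketch below samples a 25 kHz tone at 44.1 kHz and shows that the resulting sample values are identical to those of a 19.1 kHz tone, meaning the out-of-band frequency aliases back into the audible range; the frequencies are chosen purely for illustration.

```python
# A minimal sketch of aliasing: a 25 kHz tone sampled at 44.1 kHz
# produces exactly the same sample values as a 19.1 kHz tone
# (44.1 kHz - 25 kHz), folded back into the audible band.
import numpy as np

fs = 44_100
n = np.arange(1_000)               # first 1,000 sample indices

above_nyquist = np.sin(2 * np.pi * 25_000 * n / fs)
alias = -np.sin(2 * np.pi * 19_100 * n / fs)   # the folded-down image

print(np.allclose(above_nyquist, alias))       # True: the samples coincide
```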

One demonstration I played in my Aston Tech Talks presentation was a pcap of an IP phone call I placed, captured with Wireshark from a SPAN of my traffic.

The IP phone was the one employing the DSP, using a dedicated chip on its motherboard to turn the analog signal of my voice into a stream of UDP packets carried by the RTP protocol, transmitting small segments of the phone call across the IP network to be reconstructed by the other end into a single audio waveform.
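For a sense of scale, the arithmetic below sketches how a G.711 stream gets packetized, assuming the common 20 ms packetization interval (a typical default rather than a value read from my capture).

```python
# A minimal sketch of the packetization arithmetic behind a G.711 call.
sample_rate = 8_000        # Hz, narrowband telephony
bits_per_sample = 8        # G.711 uses 8-bit companded samples
packet_interval = 0.020    # seconds of audio carried per RTP packet

samples_per_packet = int(sample_rate * packet_interval)          # 160
payload_bytes = samples_per_packet * bits_per_sample // 8        # 160

# Header overhead added on the wire: RTP (12) + UDP (8) + IPv4 (20)
wire_bytes = payload_bytes + 12 + 8 + 20

print(f"{samples_per_packet} samples -> {payload_bytes} payload bytes, "
      f"{wire_bytes} bytes per packet, {1 / packet_interval:.0f} packets/s")
```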

The payload used the telecommunications standard G.711 encoding, which band-limits the signal to what is deemed an acceptable loss of information, whittling down the overall packet size to save overhead on the network and during reconstruction at the other end. This encoding keeps the audio within the range of 300 Hz to 3400 Hz, sacrificing fidelity to save on bandwidth. That range intentionally fits the average human voice but is far more detrimental to hold music (Tim Carleton’s Opus №1, for example), which neatly fills out the spectrum of human hearing.
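For the curious, here is a sketch of the companding idea behind G.711’s mu-law variant, using the continuous mu-law formula rather than the segmented lookup tables the actual codec specifies; the point is that quiet samples get finer resolution than loud ones, which suits speech.

```python
# A minimal sketch of mu-law companding: compress, then expand.
import numpy as np

MU = 255.0

def mu_law_compress(x):
    """Map a -1.0..1.0 sample to a compressed -1.0..1.0 value."""
    return np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)

def mu_law_expand(y):
    """Invert the compression back to a linear sample."""
    return np.sign(y) * (np.power(1 + MU, np.abs(y)) - 1) / MU

x = np.linspace(-1, 1, 5)
round_trip = mu_law_expand(mu_law_compress(x))
print(np.allclose(x, round_trip))   # True: the curve itself is lossless;
                                    # the loss comes from quantizing to 8 bits
```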

Conclusion

Thanks to those who were able to listen to this presentation live and to anyone who found this write-up interesting.

If you have any questions, feel free to reach out:
jordan.sansing@astontech.com

