Core Audio Introduction

Nikita Kardakov
Jul 20, 2017 · 9 min read

In ancient India people used a measure of distance called Gavyuti (गव्यूति). One Gavyuti was essentially the distance over which you can hear a cow’s call: step beyond it (roughly 3.7 km, or 2.3 miles) and you wouldn’t hear the cow at all. Apart from being my favourite unit of distance, it’s also a good starting point for a pseudo-scientific introduction to the nature of sound, and the first step towards the basics of Core Audio, one of the best sound frameworks in existence today.

So how do we hear a cow? You’ve probably heard about sound waves (you may even be hearing some right now), and most definitions of a sound wave mention something called a medium: the substance the waves travel through. In the picture above our medium is air, but sound waves can travel through water, solids and plasma as well (try submerging yourself in the bath and tapping its sides, or hitting a metal rail with something). If we send the cow into space, however, it might scream but we wouldn’t hear it, because there’s no medium around it. The physicist Robert Boyle famously placed a ringing alarm clock inside a glass chamber and pumped out all the air: the clock kept ringing, but nobody could hear it (I should try that with my alarm clock).

So what happens to sound as it travels through the air? Powered by the cow’s lungs, its vocal cords set up vibrations, and the air molecules around the cow start to move. Plenty of people have tried to visualise the nature of sound waves, and I can watch videos like this one for hours.

We see here that the molecules form regular intervals of so-called rarefactions (areas where the molecules are sparse) and compressions (areas where they are packed together). Somewhere in the middle of each compression there is a point of maximum sound pressure, and the distance between these maximum points is called the wavelength.

What’s more interesting is that a sound wave behaves like simple harmonic motion. Other good examples of simple harmonic motion are a pendulum or a spring with a weight attached to it. And the nice thing about harmonic motion is that it’s sinusoidal, which is why people usually draw an audio signal as a sine curve (jumping ahead: when it comes to playing sound back on your computer, the sine-curve representation is not the only option).

So here we have our sine curve: sound pressure against time. When it comes to the audible parameters of sound we’re usually interested in pitch and loudness. It has become a little obsession of mine to figure out how different instruments actually “work”. A couple of years back researchers found “the oldest” known instrument, a flute made of bird bone. But it’s easier to start with another ancient instrument: the monochord, which is essentially just a single stretched string. Legend has it that it was invented by Pythagoras, and mathematicians and acousticians love to talk about it. These days we’re more familiar with one of its grandchildren: the guitar.

Look at the short clip above. Isn’t it beautiful? You can see that the thicker strings vibrate “slower”. And what about those air molecules around the string? For thicker strings the areas of rarefaction and compression are wider, which means a bigger wavelength (measured in metres). The inverse property of wavelength is frequency (measured in hertz), which is, in a way, a synonym for pitch: the higher the frequency, the higher the note.
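
To make that inverse relationship concrete, here’s a quick back-of-the-envelope calculation in Swift (assuming sound travels through air at roughly 343 m/s, which holds at about 20 °C; the 440 Hz “concert A” is just a convenient example):

// wavelength = speed of sound / frequency
let speedOfSound = 343.0                      // metres per second, in air at ~20 °C
let concertA = 440.0                          // frequency in hertz
let wavelength = speedOfSound / concertA      // ~0.78 metres between two compression peaks
print(wavelength)

Run the same sum for a guitar’s low E string (around 82 Hz) and you get a wavelength of roughly four metres, which is why the waves coming off thick strings are so much “wider”.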

This is all very simplified: when you actually play a note on a guitar you hear a whole composition of sound waves, and the nature of those waves depends on the wood your guitar is made of, how long your nails are today, what room you’re sitting in, what kind of strings you’re using, and so on.

So when sound reaches your ear it makes your eardrum vibrate. Microphones and speakers work in the same way: when you record or play back something, you’re essentially trying to specify the position of the microphone’s or speaker’s membrane at any given time. When people built the first recording and reproducing devices in the 19th century, they physically carved the displacement of the membrane into an object called a phonograph cylinder.

Vinyl records are made in the same fashion, and what you get in the end is an analog recording, which is very accurate (depending on the quality of the equipment, of course).

But maybe it’s about time to talk about storing sound digitally. What we’re trying to do is store the (x, y) data points of our wave, where x is time and y is the displacement of the “membrane” at that time. We can’t record a data point at every instant (that would require unlimited space on our computer), but we can record them at regular intervals. This approach is called pulse code modulation (PCM).

So we will “record” our data points at a certain interval, or rather at a certain rate. The process of storing these values is called sampling, and one important parameter of sampling is the sample rate: how often we record our wave. In the image below the wave is sampled 1 and 5 times per second, i.e. at 1 Hz and 5 Hz sample rates respectively. One of the industry standards is a 44.1 kHz sample rate, which means we record 44,100 data points every second.
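
To make sampling a little more tangible, here’s a small Swift sketch that “records” one second of a pure sine wave at a given sample rate (the 440 Hz frequency and the function name are just illustrative):

import Foundation

// Sample one second of a sine wave at the given sample rate.
func sample(frequency: Double, sampleRate: Double) -> [Double] {
    let sampleCount = Int(sampleRate)                    // one second of audio
    return (0..<sampleCount).map { n in
        let time = Double(n) / sampleRate                // x: the time of this data point
        return sin(2.0 * .pi * frequency * time)         // y: "membrane" displacement between -1 and 1
    }
}

let roughWave = sample(frequency: 440, sampleRate: 5)          // 5 data points: barely a wave at all
let cdQualityWave = sample(frequency: 440, sampleRate: 44_100) // 44100 data points: the CD standard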

Another important parameter of sampling is bit depth: how many bits we use to store each of these data points, or how expressive we can be. A bit depth of 1 means we only have one bit per sample, which allows just two different values to describe our wave. A bit depth of 32 is very common, and it’s quite obvious that bigger numbers need more space to store. Here are a few examples.

So if we “record” our wave at a 5 Hz sample rate and a 2-bit depth, we will be able to approximate (or interpolate) our sound wave later, although not very accurately. Naturally, if we increase both the bit depth and the sample rate we get a more accurate representation.
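
And here is a hedged sketch of the other half of the story, quantisation: snapping each sample to one of the levels a given bit depth allows (the function is purely illustrative, not how Core Audio does it internally):

import Foundation

// Snap a sample in the -1...1 range to the nearest level a given bit depth can represent.
func quantise(_ value: Double, bitDepth: Int) -> Double {
    let levels = Double(1 << bitDepth)                      // 2 bits -> 4 levels, 16 bits -> 65536 levels
    let index = ((value + 1) / 2 * (levels - 1)).rounded()  // which of those levels is closest
    return index / (levels - 1) * 2 - 1                     // back to the -1...1 range
}

let original = sin(2.0 * .pi * 440 * 0.0003)    // some point on our 440 Hz wave
let rough = quantise(original, bitDepth: 2)     // one of only 4 possible values
let precise = quantise(original, bitDepth: 16)  // one of 65536 values, much closer to the original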

But why don’t we listen to some real-life examples? Here’s one of my favourite arias, resampled with different sample rates and bit depths.

So what is Core Audio?

At a high level it’s the set of frameworks responsible for running audio on macOS, iOS and watchOS. Historically this was one of the most undocumented, hard-to-start-with and powerful sets of frameworks in the Cocoa world. The “powerful” part is still true, but with a few good books and the continuous improvement of the APIs it is now easy to do very sophisticated things with sound even in Swift (previously you had to do all the heavy lifting in C).

There are also some audio-related APIs that are not officially part of Core Audio, but here too things are moving towards one solidified set of APIs (which is great).

So if we were to list things that Core Audio can do we would have to mention:

  • Recording sound using all sorts of hardware (internal and external mics).
  • Playback of sound from files, network streams or real-time audio.
  • Processing sound: adding effects, generators, filters and mixing.
  • Converting audio (like the sample rate changes in the video above); there’s a small sketch of this right after the list.
  • Dealing with MIDI: creating MIDI files and playing back MIDI, and supporting external and internal musical instruments (even via Bluetooth!).
  • Communicating between different sound apps (inter-app audio and audio sessions on iOS specifically).
  • Supporting 3D audio: which is great for creating games or immersive audio experiences.
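
To give the conversion bullet above a bit of flesh, here’s a hedged sketch of sample rate conversion with AVAudioConverter; the formats, buffer sizes and the silent input buffer are purely illustrative:

import AVFoundation

// Convert one buffer of 44.1 kHz stereo audio down to 22.05 kHz.
let inputFormat = AVAudioFormat(standardFormatWithSampleRate: 44_100, channels: 2)!
let outputFormat = AVAudioFormat(standardFormatWithSampleRate: 22_050, channels: 2)!
let converter = AVAudioConverter(from: inputFormat, to: outputFormat)!

let inputBuffer = AVAudioPCMBuffer(pcmFormat: inputFormat, frameCapacity: 44_100)!
inputBuffer.frameLength = inputBuffer.frameCapacity    // pretend it is full of audio (it is silence here)
let outputBuffer = AVAudioPCMBuffer(pcmFormat: outputFormat, frameCapacity: 22_050)!

var conversionError: NSError?
var deliveredInput = false
let status = converter.convert(to: outputBuffer, error: &conversionError) { _, outStatus in
    if deliveredInput {                                // we only have this one buffer to offer
        outStatus.pointee = .endOfStream
        return nil
    }
    deliveredInput = true
    outStatus.pointee = .haveData
    return inputBuffer
}
print(status, outputBuffer.frameLength)                // how the conversion went and how many frames came out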

I’ve most likely forgotten some important things here, but this should give you a rough idea of what we can do. I want to show a really small example, so let’s just try to play back a file. I’ve created a macOS app for this purpose, which you can download here.

So first let’s create an AVAudioEngine. It is a relatively new class in Core Audio, and conceptually it is just a set of AVAudioNodes. An AVAudioNode is the basic building block of audio apps: you can generate, play back, mix and process sound using nodes. You create audio nodes separately from the audio engine and then attach them to it. If you’ve ever played in a band you might have seen something like this.

A guitar is connected to the speaker through some effect pedals. In fact, let’s do exactly the same thing, except with the cow from the beginning of the article in place of the guitar. If we wanted to reproduce this setup with the existing audio nodes, we would need something like this.

AVAudioEngine has some nodes by default, including mainMixerNode, an instance of AVAudioMixerNode that is already in place.

So we just need to create our nodes and our engine.

let audioEngine = AVAudioEngine()
let playerNode = AVAudioPlayerNode()
let delayNode = AVAudioUnitDelay()
let distortionNode = AVAudioUnitDistortion()
// we don't need to create a mixer node!

Next thing we need to do is to attach external nodes to audio engine and connect them.

// format is defined just below, from the audio file
audioEngine.attach(playerNode)
audioEngine.attach(delayNode)
audioEngine.attach(distortionNode)

audioEngine.connect(playerNode, to: delayNode, format: format)
audioEngine.connect(delayNode, to: distortionNode, format: format)
audioEngine.connect(distortionNode, to: audioEngine.mainMixerNode, format: format)

Where format is an abstraction of audio settings we’re going to use in-between our nodes (things like sample rate, number of channels, etc). And we will get the format from our audio file.

let audioFileURL = Bundle.main.url(forResource: "cow", withExtension: "caf")

guard let audioFile = try? AVAudioFile(forReading: audioFileURL!) else {
    print("Can't open your file!")
    return
}

let format = audioFile.processingFormat

So we have an audio file, cow.caf, in our project, and we just need a URL that our program can handle (we’re using our project’s Bundle for that purpose). And there we have our format: we simply take the format of our audio file and hope that Core Audio supports it (which is not always the case, of course).

Now we just need to start our engine and ask playerNode to playback our file.

do {
    try audioEngine.start()
    playerNode.scheduleFile(audioFile, at: nil, completionHandler: nil)
    playerNode.play()
} catch {
    print("Can't start the engine!")
}

I mentioned earlier that this is a very simple example, but as you can see, with just a few lines of code you can do pretty powerful things. If you want to expand on this idea, you can add more effect nodes and change their parameters, much like you can tweak the settings on guitar pedals. I encourage you to go and read some more articles on Core Audio, or even get some books (my personal favourite is “Learning Core Audio” by Chris Adamson and Kevin Avila).
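
For instance, the delay and distortion nodes we attached earlier expose their settings as plain properties; the values below are just illustrative starting points, not anything the original project ships with:

// Dial in the effects from the example above. These numbers are only a starting point.
delayNode.delayTime = 0.4                        // seconds between repeats
delayNode.feedback = 40                          // how much of the delayed signal is fed back, in percent
delayNode.wetDryMix = 30                         // 0 = completely dry, 100 = effect only

distortionNode.loadFactoryPreset(.multiEcho1)    // one of the built-in distortion presets
distortionNode.wetDryMix = 25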

Does it all feel straightforward, as if you could start working with sound on iOS and macOS right now? Well, it should actually still be a little confusing. We will try to post more on this later. Thank you.


Thanks to Alistair Poolman

