Generating Raw Audio Samples

[macOS] — Using Objective-C and the Apple Audio APIs


This is a series of posts about the Apple Core Audio API. Read the posts and practice in order.

Introduction

We are developing a command line application that generates a digital audio file. The audio stored in the file is the digital representation of a tone. The frequency of the tone to store is given as a command line argument to the program. For example, if we want to generate the tone of A4, then we give the number 440.

Note: A list of frequencies for each key on the piano is given here.

Another property of this program is that it can generate three different shapes of waves:

  1. square shape
  2. saw shape
  3. sine shape

The default is the square-shaped wave. If we want to generate a wave of another shape, we give the shape as the second argument to the program.

Code Repository

The code repo can be found here.

Most Important Parts of the Code

Prepare The Audio File Format Specification

In order to write audio data into a file, we have to tell the Core Audio API the format of the data we want to write.

We call this the audio stream basic description.

  1. Instantiate an AudioStreamBasicDescription

First we instantiate an AudioStreamBasicDescription like this:
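The instantiation is simply the declaration of a local variable of the Core Audio struct type, along these lines (the variable name matches the one used later in this post):

```c
#include <AudioToolbox/AudioToolbox.h>

AudioStreamBasicDescription audioStreamBasicDescription;
```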

  2. Populate with Format Specification Values

We then populate the audio stream basic description structure with the correct data. This is done with the help of the function buildAudioStreamBasicDescription().
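The exact implementation lives in the repo; a sketch consistent with the walkthrough that follows looks roughly like this (the line numbers discussed below count from the function's signature line):

```c
#include <string.h>
#include <AudioToolbox/AudioToolbox.h>

#define SAMPLE_RATE 44100
#define BITS_PER_SAMPLE 16
#define BYTES_PER_SAMPLE 2
#define NUMBER_OF_CHANNELS 1
#define FRAMES_PER_PACKET 1

void buildAudioStreamBasicDescription(AudioStreamBasicDescription *asbd) {
    memset(asbd, 0, sizeof(AudioStreamBasicDescription));

    asbd->mSampleRate = SAMPLE_RATE;
    asbd->mFormatID = kAudioFormatLinearPCM;
    asbd->mFormatFlags = kAudioFormatFlagIsBigEndian | kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked;
    asbd->mBitsPerChannel = BITS_PER_SAMPLE;
    asbd->mChannelsPerFrame = NUMBER_OF_CHANNELS;
    asbd->mFramesPerPacket = FRAMES_PER_PACKET;
    asbd->mBytesPerFrame = BYTES_PER_SAMPLE * NUMBER_OF_CHANNELS;
    asbd->mBytesPerPacket = FRAMES_PER_PACKET * BYTES_PER_SAMPLE * NUMBER_OF_CHANNELS;
}
```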

Let’s read this piece of code more carefully, line-by-line:

  • line 2: We zero out the structure. This is good practice: it makes sure that any properties we don't explicitly set end up with the value 0 rather than some random value.
  • line 4: We set mSampleRate to SAMPLE_RATE, which we have defined to be 44100. Hence, we are going to generate an audio sample at CD-quality sample rate.
  • line 5: We set the format to kAudioFormatLinearPCM. Linear PCM (LPCM) is a lossless format that uses the pulse code modulation algorithm to store the audio information. It stores each sample's full digital value, uncompressed. Hence it offers the best quality but requires more storage.
  • line 6: We need to set some format flags, which depend on the format we are using. For LPCM we have to say whether our samples are encoded in big-endian or little-endian byte order. Here, we set the flag kAudioFormatFlagIsBigEndian, to indicate that our sample integers follow big-endian byte ordering, i.e. the most significant byte is placed at the byte with the lowest address. Note that we are generating .aif files (type AIFF), which only accept big-endian byte ordering of integers. We also set the flag kAudioFormatFlagIsSignedInteger, to indicate that our samples are signed integers. Finally, we set the flag kAudioFormatFlagIsPacked, to indicate that the sample values use all the bits available.
  • line 7: We need to tell how many bits we are going to use for each sample. We go with 16 bits per sample. This value is defined in the constant BITS_PER_SAMPLE.
  • line 8: We specify the number of channels. Currently 1 (defined in NUMBER_OF_CHANNELS). Hence, we are generating mono audio.
  • line 9: We specify the frames per packet. LPCM does not really use packets: each packet contains exactly 1 frame. A number of frames per packet other than 1 is only useful with variable bit rate encodings, not with LPCM. Hence FRAMES_PER_PACKET is set to 1.
  • line 10: We specify the number of bytes per frame. Since we have 1 channel and each channel holds 2 bytes (16 bits), we give 2 for the number of bytes: BYTES_PER_SAMPLE is 2 and NUMBER_OF_CHANNELS is 1, hence BYTES_PER_SAMPLE * NUMBER_OF_CHANNELS is 2.
  • line 11: We specify the number of bytes per packet. This is the number of frames per packet multiplied by the number of bytes in each frame, which ends up being 2: 1 frame per packet x 2 bytes per frame = 2 bytes per packet.

Initialize the Audio File

The audio stream basic description is now ready, and this allows us to initialize the audio file. This is done with the help of the function AudioFileCreateWithURL().

  • We use the flag kAudioFileAIFFType to tell that we want to create a file of type AIFF.
  • We pass the reference to the local audio stream basic description data (&audioStreamBasicDescription).
  • We specify that if the file already exists it will first be deleted (kAudioFileFlags_EraseFile).
  • We will get a handle to the audio file stored into the local variable audioFile.
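Putting the bullets above together, the call plausibly looks like the following sketch (the helper name and the hard-coded file name are illustrative; in the real program the file name is derived from the command line arguments):

```c
#include <assert.h>
#include <AudioToolbox/AudioToolbox.h>

// Illustrative helper: creates the AIFF file for the given format description.
AudioFileID createAudioFile(const AudioStreamBasicDescription *asbd) {
    CFURLRef fileURL = CFURLCreateWithFileSystemPath(kCFAllocatorDefault,
                                                     CFSTR("440.000-square.aif"),
                                                     kCFURLPOSIXPathStyle,
                                                     false);

    AudioFileID audioFile;
    OSStatus status = AudioFileCreateWithURL(fileURL,
                                             kAudioFileAIFFType,        // create a file of type AIFF
                                             asbd,                      // the format prepared earlier
                                             kAudioFileFlags_EraseFile, // delete the file if it exists
                                             &audioFile);               // out: handle to the audio file
    assert(status == noErr);

    CFRelease(fileURL);
    return audioFile;
}
```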

Calculate The Total Number Of Samples

Before calculating the exact sample values, we need to know how many samples we have to generate.

Since we want to use the SAMPLE_RATE equal to 44,100 samples per second, and we want to record 5 seconds of audio (DURATION), then the total number of samples we want to generate is SAMPLE_RATE * DURATION.

Hence, for 5 seconds duration, the total number of samples we want to generate is equal to 220,500 (or for 1 second, it is 44,100).

This is useful to know in advance, because it determines when the program terminates: as soon as we have generated this number of samples, we stop.
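In code this is a single multiplication (the helper name is illustrative; the constants follow the ones used throughout this post):

```c
#define SAMPLE_RATE 44100
#define DURATION 5 // seconds

// Total number of samples the program has to generate.
int maxSampleCountFor(int sampleRate, int durationInSeconds) {
    return sampleRate * durationInSeconds; // 44100 * 5 = 220500
}
```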

Number of Samples Per Period (Wave Length in Samples)

We know that we have to generate 44,100 samples per second (or 220,500 samples for 5 seconds). But how is this related to the frequency of the tone we want to generate?

For example, let’s assume that we want to generate 1 second of one A4, i.e. 440Hz. The Hz frequency means that we have an acoustic signal whose values have a periodicity. In particular, 1Hz means that the signal does 1 repetition in 1 second, whereas 440Hz means that the signal does 440 repetitions within 1 second. Each repetition is called a wave and its duration is called the wave period. The greater the number of repetitions within 1 second, the higher the pitch of the sound. For example, the tone with frequency 880Hz is higher in pitch than the tone with frequency 440Hz.

Since we know that our sample values need to repeat and do 440 cycles within 1 second (for the example of the A4 tone), this means that we have to allocate the 44,100 samples of each second to 440 cycles. Or in other words, each cycle needs to have 44,100 / 440 (approximately 100) samples. This is the period or the wave length measured in samples.

This number, the length of the wave/period in samples, is very useful because it will be used as the end point when generating samples for one period. As soon as we finish with the first period, we proceed to calculate the samples for the second period, then the third, and so on, until we have generated all the samples (maxSampleCount) we want.
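In code, the wave length in samples is a single integer division (the helper name is illustrative):

```c
#define SAMPLE_RATE 44100

// Number of samples that one period (one wave) of a tone at `hz` Hz occupies.
// Note the integer division: for 440Hz this gives 44100 / 440 = 100 samples.
int waveLengthInSamplesFor(int hz) {
    return SAMPLE_RATE / hz;
}
```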

Note: this algorithm is quite basic and dumb, but it works as a basis in this series of tutorials.

Loop Generating Samples

This is the bulk of the work done by the program. The program has to generate maxSampleCount samples. Hence it gets into a loop that goes step-by-step, from 1 up to maxSampleCount:

Inner Loop to Generate Samples For a Period/Wave

But, as we said, the audio signal is a wave that repeats itself. And we already know how many samples to generate for one cycle, i.e. one period: it is waveLengthInSamples samples.

This means that we need to loop from 1 to waveLengthInSamples when generating the samples for a period. This is an inner loop inside the outer loop.
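The two nested loops can be sketched like this (the per-sample generation and file writing are elided; the helper name is illustrative):

```c
// Outer loop: keep going until maxSampleCount samples have been generated.
// Inner loop: walk the sample positions 1..waveLengthInSamples of one period.
int generateAllSamples(int maxSampleCount, int waveLengthInSamples) {
    int sampleCount = 0;
    while (sampleCount < maxSampleCount) {
        for (int i = 1; i <= waveLengthInSamples && sampleCount < maxSampleCount; i++) {
            // ...generate the sample for position i and write it to the file...
            sampleCount++;
        }
    }
    return sampleCount; // total number of samples generated
}
```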

Generating a Sample

The program supports 3 different wave shapes:

  1. square
  2. saw
  3. sine

Square-shaped Wave

The square shape wave is more or less something like this:

Square-shaped wave, 8 samples per wave, 2 waves displayed

Half of the samples of the wave are on the maximum value and the other half are on the minimum value.

This is quite easily implemented. Let’s have a look at the function generateSquareShapeSample().

For the first half (i <= waveLengthInSamples / 2) we return SHRT_MAX. Otherwise we return SHRT_MIN.

For example, if i = 1, waveLengthInSamples = 8, and we normalize the volume range so that SHRT_MAX = 1.0 and SHRT_MIN = -1.0, then 1 is less than or equal to 8 / 2 and we return 1.0. If i = 4, this is also less than or equal to 8 / 2 and we return 1.0 too. But for i = 5, which is greater than 8 / 2, we return -1.0.

Note that we use SHRT_MAX as the maximum value (maximum volume) of each sample and SHRT_MIN as the minimum value (minimum volume) of each sample. Also, the function assumes that the position of the sample (i) given is always within the limits of the number of samples per wave (waveLengthInSamples). Otherwise, it raises an error (assert(i <= waveLengthInSamples)).
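The body of generateSquareShapeSample() boils down to the following sketch, consistent with the description above:

```c
#include <assert.h>
#include <limits.h>

// First half of the period at maximum volume, second half at minimum volume.
short generateSquareShapeSample(int i, int waveLengthInSamples) {
    assert(i <= waveLengthInSamples);

    return (i <= waveLengthInSamples / 2) ? SHRT_MAX : SHRT_MIN;
}
```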

Saw-Shaped Wave

A saw-shaped wave is something like the following:

Saw-shaped wave, 8 samples per wave, 2 waves displayed

Given the sample position i, finding the correct value for the sample using the saw function is a little bit more involved. Let’s see the implementation of the function generateSawShapeSample().

The generateSawShapeSample() function implements the mathematical function y(i, numberOfSamples, maxValue) = (2 * maxValue / numberOfSamples) * (i - 1) - maxValue.

Basically, this function divides the range of volume values (-SHRT_MAX..SHRT_MAX) into waveLengthInSamples pieces and, given the index/position of the piece (i - 1), it calculates the value it should have. Note that the distance between the minimum value and the maximum value is 2 * SHRT_MAX. Hence the division of this distance into waveLengthInSamples pieces is done with 2 * SHRT_MAX / waveLengthInSamples. Which means that the value of the sample at position (i - 1), measured from the minimum, is the product (2 * SHRT_MAX / waveLengthInSamples) * (i - 1). But since we want the volume to run over the range -SHRT_MAX..SHRT_MAX rather than 0..2 * SHRT_MAX, we finally have to subtract SHRT_MAX.

Try some values for the input and see how it works. For example, if you set i equal to 1, numberOfSamples equal to 8 and maxValue equal to 1.0, then you will get y(1, 8, 1.0) = (2 * 1.0 / 8) * (1 - 1) - 1.0 = -1.0. Whereas, if i is equal to 4, then y(4, 8, 1.0) = (2 * 1.0 / 8) * (4 - 1) - 1.0 = -0.25.
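Putting the formula into code, with SHRT_MAX playing the role of maxValue, gives a sketch like this:

```c
#include <assert.h>
#include <limits.h>

// Linear ramp: starts at -SHRT_MAX for i = 1 and climbs towards +SHRT_MAX
// in waveLengthInSamples equal steps.
short generateSawShapeSample(int i, int waveLengthInSamples) {
    assert(i <= waveLengthInSamples);

    return (short)((2.0 * SHRT_MAX / waveLengthInSamples) * (i - 1) - SHRT_MAX);
}
```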

Sine-shaped Wave

An example of a sine-shaped wave is given in the following figure below:

Sine-shaped wave, 8 samples per wave, 2 waves displayed

The most basic form of the sine function as a function of time is the following:

y(t) = A * sin(2 * π * f * t)

where

  • A: is the amplitude, i.e. the maximum value measured from 0. You can think of SHRT_MAX here.
  • f: is the frequency, i.e. the number of cycles per second. You can think of the tone frequency here, i.e. hz in our case.
  • π: is the mathematical constant pi (= 3.141592653589793).

The sine function is periodic by its definition, i.e. it repeats every 1 / f seconds. All the values that the function can take for a specific period are the values that correspond to t from 0 to 1 / f. We need to divide the time distance of 0 to 1 / f into waveLengthInSamples pieces, because this is the number of samples we have to calculate. Hence,

(1 / f) / waveLengthInSamples

But waveLengthInSamples is actually SAMPLE_RATE / f.

Hence,

(1 / f) / (SAMPLE_RATE / f) or 1 / f * f / SAMPLE_RATE,

or 1 / SAMPLE_RATE

This is the step by which we need to advance the time to go from 0 to 1 / f seconds. Like this:

y(0) = A * sin(2 * π * hz * 0)

y(1) = A * sin(2 * π * hz * 1 * 1 / SAMPLE_RATE)

y(2) = A * sin(2 * π * hz * 2 * 1 / SAMPLE_RATE)

y(3) = A * sin(2 * π * hz * 3 * 1 / SAMPLE_RATE)

etc.

y(i) = A * sin(2 * π * hz * i * 1 / SAMPLE_RATE)

But hz / SAMPLE_RATE is the reciprocal of waveLengthInSamples, i.e.

1 / waveLengthInSamples

Hence, the function y for i becomes:

y(i) = A * sin(2 * π * i / waveLengthInSamples)

The above is the explanation of how we came up with the following function that returns the sample value for the position i:

Note: we actually give i - 1, and not i, to the sine computation, because our samples are indexed from 1.

Calling Sample Generating Functions

Having explained the three different ways to produce samples, here is the code that actually invokes the corresponding functions according to the shape that is given in the command line.

Watch out for the conversion on the last line. CFSwapInt16HostToBig() makes sure that the integer given as argument (sample) is converted to big-endian byte ordering, taking into account the byte ordering used to represent an integer on the host, i.e. the machine that runs this program. This function makes sure the conversion is done appropriately on the different Apple platforms, such as macOS or iOS.
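The dispatch is plausibly a plain string comparison on the shape argument; a fragment along these lines (the variable and helper names are illustrative, and the three generate functions are the ones discussed above):

```c
#include <string.h>
#include <CoreFoundation/CoreFoundation.h> // for CFSwapInt16HostToBig()

short generateSample(const char *waveShape, int i, int waveLengthInSamples) {
    short sample;

    if (strcmp(waveShape, "saw") == 0) {
        sample = generateSawShapeSample(i, waveLengthInSamples);
    } else if (strcmp(waveShape, "sine") == 0) {
        sample = generateSineShapeSample(i, waveLengthInSamples);
    } else {
        sample = generateSquareShapeSample(i, waveLengthInSamples); // default: square
    }

    return CFSwapInt16HostToBig(sample); // AIFF stores big-endian integers
}
```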

Writing The Sample Into The File

We have the sample value and we now need to write it into the audio file. This is the piece of code that does that:

The function AudioFileWriteBytes() allows us to write the bytes held at the memory address &sample into the file audioFile, at a byte offset equal to sampleCount * bytesToWrite.
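The write plausibly looks like this fragment (the variable names follow the surrounding text):

```c
UInt32 bytesToWrite = 2; // BYTES_PER_SAMPLE: one 16-bit sample
OSStatus status = AudioFileWriteBytes(audioFile,
                                      false,                       // don't cache the data
                                      sampleCount * bytesToWrite,  // byte offset within the file
                                      &bytesToWrite,               // in: bytes to write; out: bytes written
                                      &sample);                    // the bytes to write
assert(status == noErr);
```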

Running the Program

Here are three examples of running the program:

Square-shaped for 440Hz

$ ./WriteRawAudioSamples 440 square

will generate the file 440.000-square.aif

If you open this file in your DAW, you will see its waveform being something like this:

Saw-shaped for 440Hz

$ ./WriteRawAudioSamples 440 saw

will generate the file 440.000-saw.aif

If you open this file in your DAW, you will see its waveform being something like this:

Sine-shaped for 440Hz

$ ./WriteRawAudioSamples 440 sine

will generate the file 440.000-sine.aif

Sine-shaped Wave Properties

Sine-shaped (or sinusoidal) waves sound much more pleasing and clear to the human ear. A sound that is made up of more than one sine wave will have perceptible harmonics. This is what the timbre of a sound is made of, and it is how we can tell one instrument from another when both are playing the same tone. Please also note that a sound built up of aperiodic waves is considered noise.

Closing Note

We have seen how we can generate an audio file. We took a tone frequency (e.g. 440Hz) and generated 5 seconds of that tone. We sampled at 44,100 samples per second. Each sample has been calculated according to the waveform we wanted to generate. We tried three different waveforms: i) square-shaped, ii) saw-shaped and iii) sine-shaped.

The most important Core Audio API elements we used were:

  • AudioStreamBasicDescription, to give the specification about the format of the data in the audio file we create.
  • AudioFileWriteBytes(), to write bytes into the audio file.

We wrote the sample one at a time, which might not be that performant. In the next episodes we will improve this by using buffers.
