Generating Raw Audio Samples
[macOS] — Using Objective-C and the Apple Audio APIs

This is a series of posts about the Apple Core Audio API. It is best to read them and practice in order.
Series Episodes:
- S01, 01. Reading Basic Info from a Local Audio File
- [this post] S01, 02. Generating Raw Audio Samples
- S01, 03. Audio Stream Basic Descriptions
- S01, 04. Recording With Audio Queues
- S01, 05. Playing Back With Audio Queues
- S01, 06. Converting Any Audio Format to LPCM
- S01, 07. Converting An Audio File to LPCM
- S01, 08. Playing Back With Audio Units
- S01, 09. Audio Unit Input Via A Render Callback
- S01, 10. Connecting Audio Units Together
- S01, 11. Positional Sound with OpenAL
- S01, 12. OpenAL Streaming Example
- S01, 13. Connecting to a MIDI Controller
- S01, 14. Primitive MIDI Controller
Introduction
We are developing a command line application that generates a digital audio file. The audio stored in the file is the digital representation of a tone. The frequency of the tone to store is given as a command line argument to the program. For example, if we want to generate the tone of A4, we give the number 440.
Note: A list of frequencies for each key on the piano is given here.
Another property of this program is that it can generate three different shapes of waves:
- square shape
- saw shape
- sine shape
The default value is the square-shaped wave. If we want to generate another wave shape, we give it as the second argument to the program.
Code Repository
The code repo can be found here.
Most Important Parts of the Code
Prepare The Audio File Format Specification
In order to write audio data into a file, we have to tell Core Audio API what is the format of the data we want to write.
We call this the audio stream basic description.
1. Instantiate an AudioStreamBasicDescription

First, we instantiate an AudioStreamBasicDescription like this:
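(A minimal sketch; the walkthrough below refers to this local variable as audioStreamBasicDescription.)

```objc
// Local, stack-allocated structure describing the audio data format.
AudioStreamBasicDescription audioStreamBasicDescription;
```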
2. Populate with Format Specification Values

We then populate the audio stream basic description structure with the correct data. This is done with the help of the function buildAudioStreamBasicDescription().
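Here is a sketch of buildAudioStreamBasicDescription(), reconstructed from the walkthrough below; the repo's exact code may differ slightly, but the line numbers referenced in the walkthrough correspond to the lines of this sketch (the constants and the AudioToolbox header are assumed to be defined and included earlier in the file):

```objc
void buildAudioStreamBasicDescription(AudioStreamBasicDescription *absd) {
    memset(absd, 0, sizeof(AudioStreamBasicDescription));  // line 2

    absd->mSampleRate       = SAMPLE_RATE;                 // line 4
    absd->mFormatID         = kAudioFormatLinearPCM;       // line 5
    absd->mFormatFlags      = kAudioFormatFlagIsBigEndian | kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked; // line 6
    absd->mBitsPerChannel   = BITS_PER_SAMPLE;             // line 7
    absd->mChannelsPerFrame = NUMBER_OF_CHANNELS;          // line 8
    absd->mFramesPerPacket  = FRAMES_PER_PACKET;           // line 9
    absd->mBytesPerFrame    = BYTES_PER_SAMPLE * NUMBER_OF_CHANNELS;                     // line 10
    absd->mBytesPerPacket   = FRAMES_PER_PACKET * BYTES_PER_SAMPLE * NUMBER_OF_CHANNELS; // line 11
}
```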
Let’s read this piece of code more carefully, line-by-line:
- line 2: We initialize the structure to all zeroes. This is good practice, in order to make sure that properties we don’t explicitly set do not end up with random values.
- line 4: We set the mSampleRate to SAMPLE_RATE, which we have defined to be 44100. Hence, we are going to generate audio with CD-level quality.
- line 5: We set the format to kAudioFormatLinearPCM. Linear PCM (LPCM) is a lossless format that uses the pulse code modulation algorithm to store the audio information. It stores each sample’s whole digital value, uncompressed. Hence, it offers the best quality but requires more storage.
- line 6: We need to set some format flags, which depend on the format we are using. For LPCM, we have to say whether our samples are encoded in big-endian or little-endian byte order. Here, we set the flag kAudioFormatFlagIsBigEndian, to indicate that our sample integers follow the big-endian byte order, i.e. the most significant byte is placed at the byte with the lowest address. Note that we are generating .aif files (type AIFF), which only accept big-endian byte ordering for integers. We also set the flag kAudioFormatFlagIsSignedInteger, to indicate that our samples are signed integers. Finally, we set the flag kAudioFormatFlagIsPacked, to indicate that the sample values use all the available bits.
- line 7: We tell how many bits we are going to use for each one of the samples. We go with 16 bits per sample. This value is defined in the constant BITS_PER_SAMPLE.
- line 8: We specify the number of channels. Currently 1 (defined in NUMBER_OF_CHANNELS). Hence, we are generating mono audio.
- line 9: We specify the frames per packet. LPCM does not really use packets; each packet contains exactly 1 frame. Having more than one frame per packet is only useful when variable bit rate encoding is used, not with LPCM. Hence, FRAMES_PER_PACKET is set to 1.
- line 10: We specify the number of bytes per frame. Since we have 1 channel and each channel holds 2 bytes (16 bits), we give 2 for the number of bytes. BYTES_PER_SAMPLE is 2 and NUMBER_OF_CHANNELS is 1, hence BYTES_PER_SAMPLE * NUMBER_OF_CHANNELS is 2.
- line 11: We specify the number of bytes per packet. This is the number of frames per packet multiplied by the number of bytes in each frame. This ends up being 2, since 1 frame per packet x 2 bytes per frame = 2 bytes per packet.
Initialize the Audio File
We are ready with the audio stream basic description, and this allows us to initialize the audio file. This is done with the help of the function AudioFileCreateWithURL().
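A sketch of that call; the URL construction here is illustrative (in the repo, the output file name is derived from the command line arguments, e.g. 440.000-square.aif):

```objc
AudioFileID audioFile;

// Build a CFURL pointing at the output file (illustrative path).
CFURLRef fileURL = CFURLCreateWithFileSystemPath(kCFAllocatorDefault,
                                                 CFSTR("440.000-square.aif"),
                                                 kCFURLPOSIXPathStyle,
                                                 false);

// Create (or overwrite) the AIFF file with the format we just described.
OSStatus status = AudioFileCreateWithURL(fileURL,
                                         kAudioFileAIFFType,
                                         &audioStreamBasicDescription,
                                         kAudioFileFlags_EraseFile,
                                         &audioFile);
assert(status == noErr);
CFRelease(fileURL);
```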
- We use the flag kAudioFileAIFFType to say that we want to create a file of type AIFF.
- We pass a reference to the local audio stream basic description data (&audioStreamBasicDescription).
- We specify that if the file already exists, it will first be deleted (kAudioFileFlags_EraseFile).
- We get a handle to the audio file, stored in the local variable audioFile.
Calculate The Total Number Of Samples
Before calculating the exact sample values, we need to know how many samples we have to generate.
Since we want to use a SAMPLE_RATE equal to 44,100 samples per second, and we want to record 5 seconds of audio (DURATION), the total number of samples we want to generate is SAMPLE_RATE * DURATION.
Hence, for a 5 second duration, the total number of samples we want to generate is equal to 220,500 (or, for 1 second, it is 44,100).
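In code, this is just a product of the constants mentioned so far (a sketch using the values from this post):

```objc
#define SAMPLE_RATE 44100   // samples per second (CD-level quality)
#define DURATION    5       // seconds of audio to generate

long maxSampleCount = SAMPLE_RATE * DURATION;   // 220,500 samples in total
```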
This is useful to know in advance, because it determines when the program terminates: as soon as we have generated this number of samples, we stop.
Number of Samples Per Period (Wave Length in Samples)
We know that we have to generate 44,100 samples per 1 second (or 220,500 samples for 5 seconds). But how is this related to the frequency of the tone we want to generate?
For example, let’s assume that we want to generate 1 second of an A4 tone, i.e. 440Hz. The frequency in Hz means that we have an acoustic signal whose values repeat periodically. In particular, 1Hz means that the signal does 1 repetition in 1 second, whereas 440Hz means that the signal does 440 repetitions within 1 second. Each repetition is called a wave, and its duration is called the wave period. The greater the number of repetitions within 1 second, the higher the pitch of the sound. For example, a tone with frequency 880Hz is higher in pitch than a tone with frequency 440Hz.
Since we know that our sample values need to repeat and do 440 cycles within 1 second (for the example of the A4 tone), this means that we have to allocate the 44,100 samples of each second to 440 cycles. Or, in other words, each cycle needs to have 44,100 / 440 samples. This is the period, or the wave length, measured in samples.
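In code, this could look like the following sketch (hz holds the tone frequency parsed from the command line; the repo may use a different type or rounding choice):

```objc
double hz = 440.0;  // tone frequency, parsed from the first command line argument

// How many samples one period/wave of the tone occupies.
long waveLengthInSamples = SAMPLE_RATE / hz;    // 44,100 / 440 ≈ 100 samples
```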
This number, the length of the wave/period in samples, is very useful because it serves as the end point when generating the samples of one period. As soon as we finish with the first period, we proceed to calculate the samples of the second period, then of the third, and so on, until we have generated all the samples (maxSampleCount) we want.
Note: this algorithm is quite basic, but it works as a basis for this series of tutorials.
Loop Generating Samples
This is the bulk of the work done by the program. The program has to generate maxSampleCount samples. Hence, it gets into a loop that goes step-by-step, from 1 up to maxSampleCount:
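(A skeleton sketch; sampleCount here is 0-based so that the byte offset computed later, sampleCount * bytesToWrite, starts at 0.)

```objc
long sampleCount = 0;
while (sampleCount < maxSampleCount) {
    // The inner loop (next section) generates the samples of one
    // period/wave and advances sampleCount.
}
```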
Inner Loop to Generate Samples For a Period/Wave
But, as we said, audio signals are waves that repeat. And we already know how many samples to generate for one cycle, i.e. for one period: waveLengthInSamples samples.
This means that we need to loop from 1 to waveLengthInSamples when generating the samples of a period. This is an inner loop inside the outer loop:
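(A sketch of the inner loop; i is the 1-based position of the sample within the current period.)

```objc
for (long i = 1; i <= waveLengthInSamples && sampleCount < maxSampleCount; i++) {
    // 1. Generate the sample value for position i (shape-specific, see below).
    // 2. Convert it to big-endian and write it into the file.
    sampleCount++;
}
```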
Generating a Sample
The program supports 3 different wave shapes:
- square
- saw
- sine
Square-shaped Wave
The square-shaped wave is more or less something like this:

Half of the samples of the wave are at the maximum value and the other half are at the minimum value.
This is quite easily implemented. Let’s have a look at the function generateSquareShapeSample().
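Below is a sketch of generateSquareShapeSample(), reconstructed from the description that follows (SInt16 is assumed as the sample type, matching the 16-bit signed integer format; SHRT_MAX and SHRT_MIN come from limits.h):

```objc
SInt16 generateSquareShapeSample(long i, long waveLengthInSamples) {
    assert(i <= waveLengthInSamples);

    if (i <= waveLengthInSamples / 2) {
        return SHRT_MAX;  // first half of the period: maximum volume
    }
    return SHRT_MIN;      // second half of the period: minimum volume
}
```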
For the first half (i <= waveLengthInSamples / 2) we return SHRT_MAX. Otherwise, we return SHRT_MIN.
For example, if i = 1, waveLengthInSamples = 8, SHRT_MAX = 1.0 and SHRT_MIN = -1.0, then 1 is less than or equal to 8 / 2 and we return 1.0. If i = 4, this is still less than or equal to 8 / 2 and we return 1.0 too. But for i = 5, which is greater than 8 / 2, we return -1.0.
Note that we use SHRT_MAX as the maximum value (maximum volume) of each sample and SHRT_MIN as the minimum value (minimum volume) of each sample. Also, the function assumes that the given position of the sample (i) is always within the limits of the number of samples per wave (waveLengthInSamples). Otherwise, it raises an error (assert(i <= waveLengthInSamples)).
Saw-Shaped Wave
A saw-shaped wave is something like the following:

Given the sample position i, finding the correct value for the sample of a saw wave is a little bit more involved. Let’s see the implementation of the function generateSawShapeSample().
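A sketch of generateSawShapeSample(), implementing the formula explained right after; treat it as illustrative rather than the repo's exact code:

```objc
SInt16 generateSawShapeSample(long i, long waveLengthInSamples) {
    assert(i <= waveLengthInSamples);

    // Linear ramp from -SHRT_MAX (at i = 1) up towards +SHRT_MAX.
    return (SInt16)(2.0 * SHRT_MAX / waveLengthInSamples * (i - 1) - SHRT_MAX);
}
```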
The generateSawShapeSample() function implements the mathematical function y(i, numberOfSamples, maxValue) = 2 * maxValue / numberOfSamples * (i - 1) - maxValue.
Basically, this function divides the range of volume values (-SHRT_MAX..SHRT_MAX) into waveLengthInSamples pieces and, given the index/position of the piece (i - 1), calculates the value it should have. Note that the distance between the minimum value and the maximum value is 2 * SHRT_MAX. Hence, the division of this distance into waveLengthInSamples pieces is done with 2 * SHRT_MAX / waveLengthInSamples. This means that the value of the sample at position (i - 1) should be the product 2 * SHRT_MAX / waveLengthInSamples * (i - 1). But since we want the volume to run over the range -SHRT_MAX..SHRT_MAX, we finally have to subtract SHRT_MAX.
Try some values for the input and see how it works. For example, if you set i equal to 1, numberOfSamples equal to 8 and maxValue equal to 1.0, then you get y(1, 8, 1.0) = 2 * 1.0 / 8 * (1 - 1) - 1.0 = -1.0. Whereas, if i is equal to 4, then y(4, 8, 1.0) = 2 * 1.0 / 8 * (4 - 1) - 1.0 = -0.25.
Sine-shaped Wave
An example of a sine-shaped wave is given in the following figure:

The most basic form of the sine function, as a function of time, is the following:
y(t) = A * sin(2 * π * f * t)
where:
- A: the maximum deviation from 0. You can think of SHRT_MAX here.
- f: the frequency, i.e. the number of cycles per second. You can think of the tone frequency here, i.e. hz in our case.
- π: the mathematical constant pi (= 3.141592653589793).
The sine function is periodic by definition, i.e. it repeats every 1 / f seconds. All the values that the function can take within a specific period are the values that correspond to t going from 0 to 1 / f. We need to divide this time interval, from 0 to 1 / f, into waveLengthInSamples pieces, because this is the number of samples we have to calculate. Hence, the step is:
(1 / f) / waveLengthInSamples
But waveLengthInSamples is actually SAMPLE_RATE / f. Hence, the step is:
(1 / f) / (SAMPLE_RATE / f), or (1 / f) * (f / SAMPLE_RATE), or 1 / SAMPLE_RATE
This is the step with which we need to walk the time from 0 to 1 / f seconds. Like this:
y(0) = A * sin(2 * π * hz * 0)
y(1) = A * sin(2 * π * hz * 1 * 1 / SAMPLE_RATE)
y(2) = A * sin(2 * π * hz * 2 * 1 / SAMPLE_RATE)
y(3) = A * sin(2 * π * hz * 3 * 1 / SAMPLE_RATE)
etc.
y(i) = A * sin(2 * π * hz * i * 1 / SAMPLE_RATE)
But hz / SAMPLE_RATE is the reciprocal of waveLengthInSamples, i.e. it is 1 / waveLengthInSamples. Hence, the function y for i becomes:
y(i) = A * sin(2 * π * i / waveLengthInSamples)
The above explains how we came up with the following function that returns the sample value for the position i:
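(A sketch, assuming A = SHRT_MAX and using sin() and M_PI from math.h; line 4 below is the line the note that follows refers to.)

```objc
SInt16 generateSineShapeSample(long i, long waveLengthInSamples) {
    assert(i <= waveLengthInSamples);

    return (SInt16)(SHRT_MAX * sin(2 * M_PI * (i - 1) / waveLengthInSamples));
}
```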
Note: on line 4, we actually need to pass i - 1 and not i, because our samples are indexed from 1.
Calling Sample Generating Functions
Having explained the three different ways to produce samples, here is the code that actually invokes the corresponding functions according to the shape that is given in the command line.
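A sketch of that dispatch, assuming the requested shape is kept in a C string named shape (the repo's naming may differ); the byte-order conversion discussed next is the last line:

```objc
SInt16 sample;
if (strcmp(shape, "saw") == 0) {
    sample = generateSawShapeSample(i, waveLengthInSamples);
} else if (strcmp(shape, "sine") == 0) {
    sample = generateSineShapeSample(i, waveLengthInSamples);
} else {
    sample = generateSquareShapeSample(i, waveLengthInSamples);  // default: square
}
sample = CFSwapInt16HostToBig(sample);  // AIFF requires big-endian samples
```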
Watch for the conversion on the last line. CFSwapInt16HostToBig() makes sure that the integer given as an argument (sample) is converted to big-endian byte ordering, taking into account the byte-order representation of integers on the host, i.e. the system that is running this program. This function makes sure the conversion is done appropriately on different platforms, such as macOS or iOS, and across their versions.
Writing The Sample Into The File
We have the sample value, and we now need to write it into the audio file. This is the piece of code that does that:
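(A sketch; the parameter roles follow the Audio File Services API, and sampleCount is 0-based as in the loop sketch above.)

```objc
UInt32 bytesToWrite = BYTES_PER_SAMPLE;  // 2 bytes for one 16-bit sample

OSStatus status = AudioFileWriteBytes(audioFile,
                                      false,                       // don't cache the data
                                      sampleCount * bytesToWrite,  // byte offset within the file
                                      &bytesToWrite,               // in/out: number of bytes to write
                                      &sample);                    // the bytes to write
assert(status == noErr);
```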
The function AudioFileWriteBytes() allows us to write the bytes held at the memory address &sample into the file audioFile, at a byte offset equal to sampleCount * bytesToWrite.
Running the Program
Here are three examples of running the program:
Square-shaped for 440Hz
$ ./WriteRawAudioSamples 440 square
will generate the file 440.000-square.aif
If you open this file in your DAW, you will see its waveform being something like this:

Saw-shaped for 440Hz
$ ./WriteRawAudioSamples 440 saw
will generate the file 440.000-saw.aif
If you open this file in your DAW, you will see its waveform being something like this:

Sine-shaped for 440Hz
$ ./WriteRawAudioSamples 440 sine
will generate the file 440.000-sine.aif
Sine-shaped Wave Properties
The sine-shaped (or sinusoidal) waves sound much more pleasing and clear to the human ear. A sound that is made of more than one sine wave has perceptible harmonics. This is what the timbre of a sound is made of, and it is how we can tell one instrument from another when both play the same tone. Please also note that a sound built up of aperiodic waves is considered noise.
Closing Note
We have seen how we can generate an audio file. We took a tone frequency (e.g. 440Hz) and we repeated it for 5 seconds. We sampled at 44,100 samples per second. Each sample has been calculated according to the waveform we wanted to generate. We tried with three different waveforms: i) square-shaped, ii) saw-shaped and iii) sine-shaped.
The most important Core Audio API elements we used were:
- AudioStreamBasicDescription, to give the specification of the format of the data in the audio file we create.
- AudioFileWriteBytes(), to write bytes into the audio file.
We wrote the samples one at a time, which might not be that performant. In the next episodes, we will improve on this by using buffers.