Record and resample audio with Audio Engine

Maysam Shahsavari
Jul 11, 2022

(This is my digital keyboard; it was the most relevant image I could think of.)

If you cannot read it on Medium, you can find it on my personal blog: https://maysamsh.me/2022/07/11/record-and-resample-audio-with-audio-engine/

During the last month (June 2022), for one of the core components of our iOS app, we needed to capture microphone data at a fixed sample rate. The first part was easy; the second part, however, was not straightforward. When it comes to working with audio there are some hardware-imposed limitations, one of them being the sample rate; for instance, the AirPods input sample rate is 24 kHz, while a wired headphone can go up to 48 kHz. Fortunately, Apple provides a rich set of APIs for working with audio that lets us capture, convert and mix audio signals.

In this article we will create a simple class, AudioEngine, to capture, convert and write audio data into a wave file, and communicate with the outside world via a delegate. (I'll post the complete class at the end, along with the URL of a GitHub project with sample code.)

The delegate protocol declares four methods, two of which deliver audio data:

protocol AudioEngineDelegate: NSObjectProtocol {
    func audioEngineFailed(error: Error)
    func audioEngineConverted(data: [Float], time: Float64)
    func audioEngineStreaming(buffer: AVAudioPCMBuffer, time: AVAudioTime)
    func audioEngineStarted()
}

It is worth mentioning that audioEngineConverted() delivers the resampled samples, while audioEngineStreaming() hands back the buffer as it comes off the microphone tap.
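
To give a sense of how the delegate is consumed, a conforming type might look roughly like this (a minimal sketch; Recorder is a hypothetical name and the method bodies are placeholders):

final class Recorder: NSObject, AudioEngineDelegate {
    func audioEngineStarted() {
        print("Recording started")
    }

    func audioEngineFailed(error: Error) {
        print("Recording failed: \(error)")
    }

    func audioEngineConverted(data: [Float], time: Float64) {
        // Resampled Float32 samples arrive here, ready for further processing.
    }

    func audioEngineStreaming(buffer: AVAudioPCMBuffer, time: AVAudioTime) {
        // Buffers straight from the microphone tap; this method has a default empty implementation.
    }
}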

The class gets some of its parameters at initialization time: the number of channels, the audio format and the sample rate:

init(channels: Int, format: AVAudioCommonFormat, sampleRate: Double) {
    // init
}
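
For example, to get mono Float32 samples resampled to 16 kHz (the values here are purely illustrative):

// Mono, 32-bit float samples, resampled to 16 kHz (illustrative values)
let audioEngine = AudioEngine(channels: 1, format: .pcmFormatFloat32, sampleRate: 16_000)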

After getting these parameters in the init function we set up our engine, which means tapping the microphone and preparing our AVAudioEngine. A status variable helps us track the current state of the engine, so let's create an enum to cover the most common cases:

enum EngineStatus {
    case notInitialized
    case ready
    case recording
    case paused
    case failed
}

Preparation of the engine is done through the init function; after that it's up to the caller to control the recording with the functions start(), pause(), resume() and stop(), plus writePCMBuffer() to write data into a wave file (the resampling itself happens internally in convert()).
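
From the caller's point of view, the lifecycle then looks roughly like this (a sketch that assumes the audioEngine instance and the hypothetical Recorder delegate from above):

audioEngine.delegate = recorder   // an AudioEngineDelegate conformer
audioEngine.start()               // begin tapping the microphone
// ...
audioEngine.pause()
audioEngine.resume()
// ...
audioEngine.stop()                // remove the tap and re-prepare the engine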

Installing a tap on the input node is done by calling installTap(onBus:bufferSize:format:block:) on an audio node, in our case the input node of our default audio engine:

inputNode.installTap(onBus: inputBus, bufferSize: bufferSize, format: inputFormat, block: { [weak self] (buffer, time) in
    // handle the tapped buffer here
})

There is one other static function inside the class which serves an important purpose: routing audio. The default input is always the built-in microphone (and the default output is the loudspeaker), even if you plug in a headphone (wired headphones are an exception on the output side). This method makes sure that if a headphone is connected, audio is routed through its microphone. It first inspects the current output route, then iterates over the available inputs and sets the preferred input to a wired or Bluetooth headphone if one is found.

static func setupInputRoute() {
    let audioSession = AVAudioSession.sharedInstance()
    try? audioSession.setCategory(.playAndRecord, mode: .default, options: [.allowAirPlay, .allowBluetoothA2DP, .allowBluetooth])

    let currentRoute = audioSession.currentRoute
    if currentRoute.outputs.count != 0 {
        for description in currentRoute.outputs {
            if description.portType == AVAudioSession.Port.headphones || description.portType == AVAudioSession.Port.bluetoothA2DP {
                try? audioSession.overrideOutputAudioPort(.none)
            } else {
                try? audioSession.overrideOutputAudioPort(.speaker)
            }
        }
    } else {
        try? audioSession.overrideOutputAudioPort(.speaker)
    }

    if let availableInputs = audioSession.availableInputs {
        var mic: AVAudioSessionPortDescription? = nil

        for input in availableInputs {
            if input.portType == .headphones || input.portType == .bluetoothHFP {
                print("[AudioEngine]: \(input.portName) (\(input.portType.rawValue)) is selected as the input source.")
                mic = input
                break
            }
        }

        if let mic = mic {
            try? audioSession.setPreferredInput(mic)
        }
    }

    try? audioSession.setActive(true)
    print("[AudioEngine]: Audio session is active.")
}

If you look at the setCategory call on AVAudioSession, three options are set: .allowAirPlay, .allowBluetoothA2DP and .allowBluetooth. These cover AirPlay and Bluetooth routes (wired headphones need no extra option). The last important step in this method is calling setPreferredInput() and passing it the chosen input.
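
Since the route can change while the app is running (headphones unplugged, AirPods connected, and so on), one option, not part of the original class, is to call this method again whenever the system posts a route-change notification:

// Keep a reference to the observer token for as long as you want to react to route changes.
let routeChangeObserver = NotificationCenter.default.addObserver(
    forName: AVAudioSession.routeChangeNotification,
    object: nil,
    queue: .main
) { _ in
    // Re-select the preferred input whenever the audio route changes.
    AudioEngine.setupInputRoute()
}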

The next function in the chain resamples the data using an AVAudioConverter object. The converter is created from two formats, the input format and the output format, and does the work when you call its convert(to:error:withInputFrom:) method.
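
Condensed down to its essentials, the conversion step looks like this (an excerpt-style sketch; the full convert(buffer:time:) implementation appears in the class below):

// buffer is the AVAudioPCMBuffer delivered by the microphone tap;
// inputFormat is the tap's format, outputFormat the desired target format.
guard let converter = AVAudioConverter(from: inputFormat, to: outputFormat) else { return }

// Size the destination buffer for the resampled frame count.
let frameCapacity = AVAudioFrameCount(outputFormat.sampleRate) * buffer.frameLength / AVAudioFrameCount(buffer.format.sampleRate)
guard let convertedBuffer = AVAudioPCMBuffer(pcmFormat: outputFormat, frameCapacity: frameCapacity) else { return }

let inputBlock: AVAudioConverterInputBlock = { _, outStatus in
    outStatus.pointee = .haveData   // hand the tapped buffer to the converter
    return buffer
}

var error: NSError?
let status = converter.convert(to: convertedBuffer, error: &error, withInputFrom: inputBlock)
if status == .haveData {
    // convertedBuffer now holds the resampled audio.
}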

The last method of the class I'd like to introduce is the one that writes PCM data into an audio file, writePCMBuffer(buffer:output:). It takes a buffer and a destination URL and writes the data to the given path.
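
For instance, you could append every streamed buffer to a file in the app's Documents directory from the streaming delegate callback (the recording.wav name and this wiring are just illustrative):

func audioEngineStreaming(buffer: AVAudioPCMBuffer, time: AVAudioTime) {
    let url = FileManager.default
        .urls(for: .documentDirectory, in: .userDomainMask)[0]
        .appendingPathComponent("recording.wav")
    audioEngine.writePCMBuffer(buffer: buffer, output: url)
}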

Here you can see the whole class:

// AudioEngine.swift

import Foundation
import AVFoundation
import CoreAudio

protocol AudioEngineDelegate: NSObjectProtocol {
    func audioEngineFailed(error: Error)
    func audioEngineConverted(data: [Float], time: Float64)
    func audioEngineStreaming(buffer: AVAudioPCMBuffer, time: AVAudioTime)
    func audioEngineStarted()
}

extension AudioEngineDelegate {
    func audioEngineStreaming(buffer: AVAudioPCMBuffer, time: AVAudioTime) {}
}

final class AudioEngine {
    enum AudioEngineError: Error, LocalizedError {
        case noInputChannel
        case engineIsNotInitialized
        case invalidFormat
        case failedToCreateConverter
    }

    weak var delegate: AudioEngineDelegate?

    private var engine = AVAudioEngine()
    private var streamingData: Bool = false
    private var numberOfChannels: UInt32
    private var converterFormat: AVAudioCommonFormat
    private var sampleRate: Double
    private var outputFile: AVAudioFile?
    private let inputBus: AVAudioNodeBus = 0
    private let outputBus: AVAudioNodeBus = 0
    private let bufferSize: AVAudioFrameCount = 1024
    private var inputFormat: AVAudioFormat!
    private(set) var status: EngineStatus = .notInitialized

    enum EngineStatus {
        case notInitialized
        case ready
        case recording
        case paused
        case failed
    }

    init(channels: Int, format: AVAudioCommonFormat, sampleRate: Double) {
        numberOfChannels = UInt32(channels)
        converterFormat = format
        self.sampleRate = sampleRate

        setupEngine()
    }

    fileprivate func setupEngine() {
        // I don't know what the heck is happening under the hood, but if you don't
        // call these next few lines in one closure your code will crash.
        // Maybe it's a threading issue?
        self.engine.reset()
        let inputNode = engine.inputNode
        inputFormat = inputNode.outputFormat(forBus: outputBus)
        inputNode.installTap(onBus: inputBus, bufferSize: bufferSize, format: inputFormat, block: { [weak self] (buffer, time) in
            self?.convert(buffer: buffer, time: time.audioTimeStamp.mSampleTime)
        })
        engine.prepare()
        self.status = .ready
        print("[AudioEngine]: Setup finished.")
    }

    func start() {
        guard engine.inputNode.inputFormat(forBus: inputBus).channelCount > 0 else {
            print("[AudioEngine]: No input is available.")
            self.streamingData = false
            self.delegate?.audioEngineFailed(error: AudioEngineError.noInputChannel)
            self.status = .failed
            return
        }

        do {
            try engine.start()
            self.status = .recording
        } catch {
            self.streamingData = false
            self.delegate?.audioEngineFailed(error: error)
            print("[AudioEngine]: \(error.localizedDescription)")
            return
        }

        print("[AudioEngine]: Started tapping microphone.")
    }

    func pause() {
        self.engine.pause()
        self.status = .paused
        self.streamingData = false
    }

    func resume() {
        do {
            try engine.start()
            self.status = .recording
        } catch {
            self.status = .failed
            self.streamingData = false
            self.delegate?.audioEngineFailed(error: error)
            print("[AudioEngine]: \(error.localizedDescription)")
        }
    }

    func stop() {
        self.engine.stop()
        self.outputFile = nil
        self.engine.reset()
        self.engine.inputNode.removeTap(onBus: inputBus)
        setupEngine()
    }

    func writePCMBuffer(buffer: AVAudioPCMBuffer, output: URL) {
        // Reuse the buffer's own settings where possible, falling back to sensible defaults.
        let settings: [String: Any] = [
            AVFormatIDKey: buffer.format.settings[AVFormatIDKey] ?? kAudioFormatLinearPCM,
            AVNumberOfChannelsKey: buffer.format.settings[AVNumberOfChannelsKey] ?? 1,
            AVSampleRateKey: buffer.format.settings[AVSampleRateKey] ?? sampleRate,
            AVLinearPCMBitDepthKey: buffer.format.settings[AVLinearPCMBitDepthKey] ?? 16
        ]

        do {
            if outputFile == nil {
                outputFile = try AVAudioFile(forWriting: output, settings: settings, commonFormat: .pcmFormatFloat32, interleaved: false)
                print("[AudioEngine]: Audio file created.")
            }
            try outputFile?.write(from: buffer)
            print("[AudioEngine]: Writing buffer into the file...")
        } catch {
            print("[AudioEngine]: Failed to write into the file.")
        }
    }

    /**
     This method sets the right route for audio input and output; otherwise the OS will pick the built-in microphone.
     */
    static func setupInputRoute() {
        let audioSession = AVAudioSession.sharedInstance()
        try? audioSession.setCategory(.playAndRecord, mode: .default, options: [.allowAirPlay, .allowBluetoothA2DP, .allowBluetooth])

        let currentRoute = audioSession.currentRoute
        if currentRoute.outputs.count != 0 {
            for description in currentRoute.outputs {
                if description.portType == AVAudioSession.Port.headphones || description.portType == AVAudioSession.Port.bluetoothA2DP {
                    try? audioSession.overrideOutputAudioPort(.none)
                } else {
                    try? audioSession.overrideOutputAudioPort(.speaker)
                }
            }
        } else {
            try? audioSession.overrideOutputAudioPort(.speaker)
        }

        if let availableInputs = audioSession.availableInputs {
            var mic: AVAudioSessionPortDescription? = nil

            for input in availableInputs {
                if input.portType == .headphones || input.portType == .bluetoothHFP {
                    print("[AudioEngine]: \(input.portName) (\(input.portType.rawValue)) is selected as the input source.")
                    mic = input
                    break
                }
            }

            if let mic = mic {
                try? audioSession.setPreferredInput(mic)
            }
        }

        try? audioSession.setActive(true)
        print("[AudioEngine]: Audio session is active.")
    }

    private func convert(buffer: AVAudioPCMBuffer, time: Float64) {
        guard let outputFormat = AVAudioFormat(commonFormat: self.converterFormat, sampleRate: sampleRate, channels: numberOfChannels, interleaved: false) else {
            streamingData = false
            delegate?.audioEngineFailed(error: AudioEngineError.invalidFormat)
            print("[AudioEngine]: Failed to create output format.")
            self.status = .failed
            return
        }

        guard let converter = AVAudioConverter(from: inputFormat, to: outputFormat) else {
            streamingData = false
            delegate?.audioEngineFailed(error: AudioEngineError.failedToCreateConverter)
            print("[AudioEngine]: Failed to create the converter.")
            self.status = .failed
            return
        }

        let inputBlock: AVAudioConverterInputBlock = { inNumPackets, outStatus in
            outStatus.pointee = AVAudioConverterInputStatus.haveData
            return buffer
        }

        // Scale the frame count so the converted buffer can hold the whole resampled input.
        let targetFrameCapacity = AVAudioFrameCount(outputFormat.sampleRate) * buffer.frameLength / AVAudioFrameCount(buffer.format.sampleRate)
        if let convertedBuffer = AVAudioPCMBuffer(pcmFormat: outputFormat, frameCapacity: targetFrameCapacity) {
            var error: NSError?
            let status = converter.convert(to: convertedBuffer, error: &error, withInputFrom: inputBlock)

            switch status {
            case .haveData:
                // Pass the tapped buffer through as-is, and the resampled samples via audioEngineConverted.
                self.delegate?.audioEngineStreaming(buffer: buffer, time: AVAudioTime())
                let arraySize = Int(convertedBuffer.frameLength)
                let samples = Array(UnsafeBufferPointer(start: convertedBuffer.floatChannelData![0], count: arraySize))
                if self.streamingData == false {
                    streamingData = true
                    delegate?.audioEngineStarted()
                }
                delegate?.audioEngineConverted(data: samples, time: time)
            case .error:
                if let error = error {
                    streamingData = false
                    delegate?.audioEngineFailed(error: error)
                }
                self.status = .failed
                print("[AudioEngine]: Converter failed, \(error?.localizedDescription ?? "Unknown error")")
            case .endOfStream:
                streamingData = false
                print("[AudioEngine]: The end of stream has been reached. No data was returned.")
            case .inputRanDry:
                streamingData = false
                print("[AudioEngine]: Converter input ran dry.")
            @unknown default:
                if let error = error {
                    streamingData = false
                    delegate?.audioEngineFailed(error: error)
                }
                print("[AudioEngine]: Unknown converter error")
            }
        }
    }
}

GitHub: https://github.com/maysamsh/avaudioengine
