Record and resample audio with Audio Engine

Maysam Shahsavari
Jul 11, 2022

(This is my digital keyboard; it was the most relevant image I could think of.)

If you cannot read it on Medium, you can find it on my personal blog: https://maysamsh.me/2022/07/11/record-and-resample-audio-with-audio-engine/

During the last month (June 2022), for one of the core components of our iOS app, we needed to capture microphone data at a fixed sample rate. The first part was easy; the second part, however, was not straightforward. When it comes to working with audio there are some hardware-imposed limitations, one of them being the sample rate; for instance, the AirPods input sample rate is 24 kHz, while a wired headphone can go up to 48 kHz. Fortunately, Apple provides a rich set of APIs for working with audio that lets us capture, convert and mix audio signals.

In this article we will create a simple class, AudioEngine, to capture, convert and write audio data into a wave file, and communicate with the outside world via a delegate. (I'll post the complete class at the end, along with the URL of a GitHub project with sample code.)

The delegate protocol declares four methods, two of which deliver audio data:

protocol AudioEngineDelegate: NSObjectProtocol {
    func audioEngineFailed(error: Error)
    func audioEngineConverted(data: [Float], time: Float64)
    func audioEngineStreaming(buffer: AVAudioPCMBuffer, time: AVAudioTime)
    func audioEngineStarted()
}

It is worth mentioning that audioEngineConverted() delivers the resampled samples, while audioEngineStreaming() hands back the buffer as it comes off the microphone tap.
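
To give a sense of how the delegate is consumed, a conforming type might look roughly like this (a minimal sketch; Recorder is a hypothetical name and the method bodies are placeholders):

final class Recorder: NSObject, AudioEngineDelegate {
    func audioEngineStarted() {
        print("Recording started")
    }

    func audioEngineFailed(error: Error) {
        print("Recording failed: \(error)")
    }

    func audioEngineConverted(data: [Float], time: Float64) {
        // Resampled Float32 samples arrive here, ready for further processing.
    }

    func audioEngineStreaming(buffer: AVAudioPCMBuffer, time: AVAudioTime) {
        // Buffers straight from the microphone tap; this method has a default empty implementation.
    }
}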

The class gets some of its parameters at initialization time: the number of channels, the audio format and the sample rate:

init(channels: Int, format: AVAudioCommonFormat, sampleRate: Double) {
    // init
}
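
For example, to get mono Float32 samples resampled to 16 kHz (the values here are purely illustrative):

// Mono, 32-bit float samples, resampled to 16 kHz (illustrative values)
let audioEngine = AudioEngine(channels: 1, format: .pcmFormatFloat32, sampleRate: 16_000)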

After getting these parameters in the init function we set up our engine, which means tapping the microphone and preparing our AVAudioEngine. A status variable helps us track the current state of the engine, so let's create an enum to cover the most common cases:

enum EngineStatus {
    case notInitialized
    case ready
    case recording
    case paused
    case failed
}

Preparation of the engine is done through the init function; after that it's up to the caller to control the recording with the functions start(), pause(), resume() and stop(), plus writePCMBuffer() to write data into a wave file (the resampling itself happens internally in convert()).
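
From the caller's point of view, the lifecycle then looks roughly like this (a sketch that assumes the audioEngine instance and the hypothetical Recorder delegate from above):

audioEngine.delegate = recorder   // an AudioEngineDelegate conformer
audioEngine.start()               // begin tapping the microphone
// ...
audioEngine.pause()
audioEngine.resume()
// ...
audioEngine.stop()                // remove the tap and re-prepare the engine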

Installing a tap on the input node is done by calling installTap(onBus:bufferSize:format:block:) on an audio node, in our case the input node of our default audio engine:

inputNode.installTap(onBus: inputBus, bufferSize: bufferSize, format: inputFormat, block: { [weak self] (buffer, time) in
    // handle the tapped buffer here
})

There is one other static function inside the class which serves an important purpose: routing audio. The default input is always the built-in microphone (and the default output is the loudspeaker), even if you plug in a headphone (wired headphones are an exception on the output side). This method makes sure that if a headphone is connected, audio is routed through its microphone. It first inspects the current output route, then iterates over the available inputs and sets the preferred input to a wired or Bluetooth headphone if one is found.

static func setupInputRoute() {
    let audioSession = AVAudioSession.sharedInstance()
    try? audioSession.setCategory(.playAndRecord, mode: .default, options: [.allowAirPlay, .allowBluetoothA2DP, .allowBluetooth])

    let currentRoute = audioSession.currentRoute
    if currentRoute.outputs.count != 0 {
        for description in currentRoute.outputs {
            if description.portType == AVAudioSession.Port.headphones || description.portType == AVAudioSession.Port.bluetoothA2DP {
                try? audioSession.overrideOutputAudioPort(.none)
            } else {
                try? audioSession.overrideOutputAudioPort(.speaker)
            }
        }
    } else {
        try? audioSession.overrideOutputAudioPort(.speaker)
    }

    if let availableInputs = audioSession.availableInputs {
        var mic: AVAudioSessionPortDescription? = nil

        for input in availableInputs {
            if input.portType == .headphones || input.portType == .bluetoothHFP {
                print("[AudioEngine]: \(input.portName) (\(input.portType.rawValue)) is selected as the input source.")
                mic = input
                break
            }
        }

        if let mic = mic {
            try? audioSession.setPreferredInput(mic)
        }
    }

    try? audioSession.setActive(true)
    print("[AudioEngine]: Audio session is active.")
}

If you look at the setCategory call on AVAudioSession, three options are set: .allowAirPlay, .allowBluetoothA2DP and .allowBluetooth. These cover AirPlay and Bluetooth routes (wired headphones need no extra option). The last important step in this method is calling setPreferredInput() and passing it the chosen input.
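
Since the route can change while the app is running (headphones unplugged, AirPods connected, and so on), one option, not part of the original class, is to call this method again whenever the system posts a route-change notification:

// Keep a reference to the observer token for as long as you want to react to route changes.
let routeChangeObserver = NotificationCenter.default.addObserver(
    forName: AVAudioSession.routeChangeNotification,
    object: nil,
    queue: .main
) { _ in
    // Re-select the preferred input whenever the audio route changes.
    AudioEngine.setupInputRoute()
}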

The next function in the chain resamples the data using an AVAudioConverter object. The converter is created from two formats, the input format and the output format, and does the work when you call its convert(to:error:withInputFrom:) method.
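
Condensed down to its essentials, the conversion step looks like this (an excerpt-style sketch; the full convert(buffer:time:) implementation appears in the class below):

// buffer is the AVAudioPCMBuffer delivered by the microphone tap;
// inputFormat is the tap's format, outputFormat the desired target format.
guard let converter = AVAudioConverter(from: inputFormat, to: outputFormat) else { return }

// Size the destination buffer for the resampled frame count.
let frameCapacity = AVAudioFrameCount(outputFormat.sampleRate) * buffer.frameLength / AVAudioFrameCount(buffer.format.sampleRate)
guard let convertedBuffer = AVAudioPCMBuffer(pcmFormat: outputFormat, frameCapacity: frameCapacity) else { return }

let inputBlock: AVAudioConverterInputBlock = { _, outStatus in
    outStatus.pointee = .haveData   // hand the tapped buffer to the converter
    return buffer
}

var error: NSError?
let status = converter.convert(to: convertedBuffer, error: &error, withInputFrom: inputBlock)
if status == .haveData {
    // convertedBuffer now holds the resampled audio.
}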

The last method of the class I'd like to introduce is the one that writes PCM data into an audio file, writePCMBuffer(buffer:output:). It takes a buffer and a destination URL and writes the data to the given path.
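
For instance, you could append every streamed buffer to a file in the app's Documents directory from the streaming delegate callback (the recording.wav name and this wiring are just illustrative):

func audioEngineStreaming(buffer: AVAudioPCMBuffer, time: AVAudioTime) {
    let url = FileManager.default
        .urls(for: .documentDirectory, in: .userDomainMask)[0]
        .appendingPathComponent("recording.wav")
    audioEngine.writePCMBuffer(buffer: buffer, output: url)
}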

Here you can see the whole class:

// AudioEngine.swift

import Foundation
import AVFoundation
import CoreAudio

protocol AudioEngineDelegate: NSObjectProtocol {
    func audioEngineFailed(error: Error)
    func audioEngineConverted(data: [Float], time: Float64)
    func audioEngineStreaming(buffer: AVAudioPCMBuffer, time: AVAudioTime)
    func audioEngineStarted()
}

extension AudioEngineDelegate {
    func audioEngineStreaming(buffer: AVAudioPCMBuffer, time: AVAudioTime) {}
}

final class AudioEngine {
    enum AudioEngineError: Error, LocalizedError {
        case noInputChannel
        case engineIsNotInitialized
        case invalidFormat
        case failedToCreateConverter
    }

    weak var delegate: AudioEngineDelegate?

    private var engine = AVAudioEngine()
    private var streamingData: Bool = false
    private var numberOfChannels: UInt32
    private var converterFormat: AVAudioCommonFormat
    private var sampleRate: Double
    private var outputFile: AVAudioFile?
    private let inputBus: AVAudioNodeBus = 0
    private let outputBus: AVAudioNodeBus = 0
    private let bufferSize: AVAudioFrameCount = 1024
    private var inputFormat: AVAudioFormat!
    private(set) var status: EngineStatus = .notInitialized

    enum EngineStatus {
        case notInitialized
        case ready
        case recording
        case paused
        case failed
    }

    init(channels: Int, format: AVAudioCommonFormat, sampleRate: Double) {
        numberOfChannels = UInt32(channels)
        converterFormat = format
        self.sampleRate = sampleRate

        setupEngine()
    }

    fileprivate func setupEngine() {
        // I don't know what the heck is happening under the hood, but if you don't
        // call these next few lines in one closure your code will crash.
        // Maybe it's a threading issue?
        self.engine.reset()
        let inputNode = engine.inputNode
        inputFormat = inputNode.outputFormat(forBus: outputBus)
        inputNode.installTap(onBus: inputBus, bufferSize: bufferSize, format: inputFormat, block: { [weak self] (buffer, time) in
            self?.convert(buffer: buffer, time: time.audioTimeStamp.mSampleTime)
        })
        engine.prepare()
        self.status = .ready
        print("[AudioEngine]: Setup finished.")
    }

    func start() {
        guard engine.inputNode.inputFormat(forBus: inputBus).channelCount > 0 else {
            print("[AudioEngine]: No input is available.")
            self.streamingData = false
            self.delegate?.audioEngineFailed(error: AudioEngineError.noInputChannel)
            self.status = .failed
            return
        }

        do {
            try engine.start()
            self.status = .recording
        } catch {
            self.streamingData = false
            self.delegate?.audioEngineFailed(error: error)
            print("[AudioEngine]: \(error.localizedDescription)")
            return
        }

        print("[AudioEngine]: Started tapping microphone.")
    }

    func pause() {
        self.engine.pause()
        self.status = .paused
        self.streamingData = false
    }

    func resume() {
        do {
            try engine.start()
            self.status = .recording
        } catch {
            self.status = .failed
            self.streamingData = false
            self.delegate?.audioEngineFailed(error: error)
            print("[AudioEngine]: \(error.localizedDescription)")
        }
    }

    func stop() {
        self.engine.stop()
        self.outputFile = nil
        self.engine.reset()
        self.engine.inputNode.removeTap(onBus: inputBus)
        setupEngine()
    }

    func writePCMBuffer(buffer: AVAudioPCMBuffer, output: URL) {
        // Reuse the buffer's own settings where possible, falling back to sensible defaults.
        let settings: [String: Any] = [
            AVFormatIDKey: buffer.format.settings[AVFormatIDKey] ?? kAudioFormatLinearPCM,
            AVNumberOfChannelsKey: buffer.format.settings[AVNumberOfChannelsKey] ?? 1,
            AVSampleRateKey: buffer.format.settings[AVSampleRateKey] ?? sampleRate,
            AVLinearPCMBitDepthKey: buffer.format.settings[AVLinearPCMBitDepthKey] ?? 16
        ]

        do {
            if outputFile == nil {
                outputFile = try AVAudioFile(forWriting: output, settings: settings, commonFormat: .pcmFormatFloat32, interleaved: false)
                print("[AudioEngine]: Audio file created.")
            }
            try outputFile?.write(from: buffer)
            print("[AudioEngine]: Writing buffer into the file...")
        } catch {
            print("[AudioEngine]: Failed to write into the file.")
        }
    }

    /**
     This method sets the right route for audio input and output; otherwise the OS will pick the built-in microphone.
     */
    static func setupInputRoute() {
        let audioSession = AVAudioSession.sharedInstance()
        try? audioSession.setCategory(.playAndRecord, mode: .default, options: [.allowAirPlay, .allowBluetoothA2DP, .allowBluetooth])

        let currentRoute = audioSession.currentRoute
        if currentRoute.outputs.count != 0 {
            for description in currentRoute.outputs {
                if description.portType == AVAudioSession.Port.headphones || description.portType == AVAudioSession.Port.bluetoothA2DP {
                    try? audioSession.overrideOutputAudioPort(.none)
                } else {
                    try? audioSession.overrideOutputAudioPort(.speaker)
                }
            }
        } else {
            try? audioSession.overrideOutputAudioPort(.speaker)
        }

        if let availableInputs = audioSession.availableInputs {
            var mic: AVAudioSessionPortDescription? = nil

            for input in availableInputs {
                if input.portType == .headphones || input.portType == .bluetoothHFP {
                    print("[AudioEngine]: \(input.portName) (\(input.portType.rawValue)) is selected as the input source.")
                    mic = input
                    break
                }
            }

            if let mic = mic {
                try? audioSession.setPreferredInput(mic)
            }
        }

        try? audioSession.setActive(true)
        print("[AudioEngine]: Audio session is active.")
    }

    private func convert(buffer: AVAudioPCMBuffer, time: Float64) {
        guard let outputFormat = AVAudioFormat(commonFormat: self.converterFormat, sampleRate: sampleRate, channels: numberOfChannels, interleaved: false) else {
            streamingData = false
            delegate?.audioEngineFailed(error: AudioEngineError.invalidFormat)
            print("[AudioEngine]: Failed to create output format.")
            self.status = .failed
            return
        }

        guard let converter = AVAudioConverter(from: inputFormat, to: outputFormat) else {
            streamingData = false
            delegate?.audioEngineFailed(error: AudioEngineError.failedToCreateConverter)
            print("[AudioEngine]: Failed to create the converter.")
            self.status = .failed
            return
        }

        let inputBlock: AVAudioConverterInputBlock = { inNumPackets, outStatus in
            outStatus.pointee = AVAudioConverterInputStatus.haveData
            return buffer
        }

        // Scale the frame count so the converted buffer can hold the whole resampled input.
        let targetFrameCapacity = AVAudioFrameCount(outputFormat.sampleRate) * buffer.frameLength / AVAudioFrameCount(buffer.format.sampleRate)
        if let convertedBuffer = AVAudioPCMBuffer(pcmFormat: outputFormat, frameCapacity: targetFrameCapacity) {
            var error: NSError?
            let status = converter.convert(to: convertedBuffer, error: &error, withInputFrom: inputBlock)

            switch status {
            case .haveData:
                // Pass the tapped buffer through as-is, and the resampled samples via audioEngineConverted.
                self.delegate?.audioEngineStreaming(buffer: buffer, time: AVAudioTime())
                let arraySize = Int(convertedBuffer.frameLength)
                let samples = Array(UnsafeBufferPointer(start: convertedBuffer.floatChannelData![0], count: arraySize))
                if self.streamingData == false {
                    streamingData = true
                    delegate?.audioEngineStarted()
                }
                delegate?.audioEngineConverted(data: samples, time: time)
            case .error:
                if let error = error {
                    streamingData = false
                    delegate?.audioEngineFailed(error: error)
                }
                self.status = .failed
                print("[AudioEngine]: Converter failed, \(error?.localizedDescription ?? "Unknown error")")
            case .endOfStream:
                streamingData = false
                print("[AudioEngine]: The end of stream has been reached. No data was returned.")
            case .inputRanDry:
                streamingData = false
                print("[AudioEngine]: Converter input ran dry.")
            @unknown default:
                if let error = error {
                    streamingData = false
                    delegate?.audioEngineFailed(error: error)
                }
                print("[AudioEngine]: Unknown converter error")
            }
        }
    }
}

GitHub: https://github.com/maysamsh/avaudioengine
