Understanding Spatial Audio in ARKit — XR Accessibility Part 1

Raju K · Published in XRPractices · Dec 23, 2022
Source: https://www.trustedreviews.com/explainer/what-is-3d-audio-4220209

Spatial audio, aka 3D positional audio, is a key part of delivering an immersive XR experience. For a vision-impaired user, spatial audio is the only practical way to experience extended reality. In this context, this series of articles takes you through the key enablers that make the XR experience more accessible to vision-impaired users.

For this article series, we are going to stick with iOS, Swift and ARKit as the tech stack for exploration. Enough talk; let's start with a simple way of adding spatial audio to an ARKit session.

Step 1: The World Tracking configuration

Spatial audio experiences are tied to the 6DoF (six degrees of freedom) movement of the user. In simple terms, a user walking around with the phone during an AR experience is 6DoF movement. In ARKit, this 6DoF movement is tracked by an ARSession running an ARWorldTrackingConfiguration:

var defaultConfiguration: ARWorldTrackingConfiguration {
    let configuration = ARWorldTrackingConfiguration()
    configuration.planeDetection = .horizontal
    configuration.worldAlignment = .gravityAndHeading
    configuration.isAutoFocusEnabled = true
    return configuration
}

In the tracking configuration, worldAlignment is key to a few things that come later in this series. gravityAndHeading initialises the ARSession world coordinate system so that it is aligned with gravity and its heading points to true north. Once this is done, the phone's camera position can be reliably represented in the scene view when needed, for example for haptic feedback on collisions (a short sketch follows the session-start code below; more on that in the next article).

Start the ARKit session with the configuration above:

override func viewDidAppear(_ animated: Bool) {
    super.viewDidAppear(animated)

    // Be sure that the device supports world tracking
    guard ARWorldTrackingConfiguration.isSupported else {
        fatalError("ARKit world tracking is not available on this device.")
    }
    sceneView.session.delegate = self
    sceneView.session.run(defaultConfiguration)
    setupAudioEnvironment()     // See Step 2 for details. Where this call happens matters.
    setupAudioAndAddToScene()   // See Step 4 for details.
}
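Because the session delegate is set above, the camera's world pose is available on every frame when you need it later, for example for the collision haptics covered in the next article. Here is a minimal sketch using the standard ARSessionDelegate callback (the debug print is just illustrative):

// Minimal sketch: read the camera's world-space position on each frame.
// Assumes this view controller conforms to ARSessionDelegate.
func session(_ session: ARSession, didUpdate frame: ARFrame) {
    let transform = frame.camera.transform               // 4x4 camera pose in world coordinates
    let cameraPosition = SCNVector3(transform.columns.3.x,
                                    transform.columns.3.y,
                                    transform.columns.3.z)
    debugPrint("Camera world position:", cameraPosition)
}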

Step 2: Setting up the Audio Environment

The audioEnvironmentNode property of the ARSCNView object controls the key spatial audio configurations.

Attenuation is the fading of the sound as the user moves away from the audio source. Three parameters control it:

referenceDistance: the minimum distance from the audio source; no attenuation is applied closer than this.

maximumDistance: the distance beyond which no further attenuation is applied. So the fading of the audio source happens between referenceDistance and maximumDistance.

distanceAttenuationModel: the mathematical interpolation of the attenuation from referenceDistance to maximumDistance. The chart below shows how the interpolation looks under the different distanceAttenuationModel values.

Source: https://www.youtube.com/watch?v=FlMaxen2eyw&t=2325s

rolloffFactor: controls how fast or slow the attenuation happens. It sets the curvature of the inverse and exponential curves in the chart above, and in the linear model it scales the slope of the ramp.
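For intuition, here is a rough sketch of what the exponential model computes. The formula mirrors the OpenAL-style curve that these distance attenuation parameters describe; treat it as an illustration of the shape, not the engine's exact implementation:

import Foundation

// Illustration only: approximate exponential distance attenuation,
// gain ≈ (distance / referenceDistance) ^ (-rolloffFactor),
// with the distance clamped to [referenceDistance, maximumDistance].
func approximateExponentialGain(distance: Double,
                                referenceDistance: Double = 0.1,
                                maximumDistance: Double = 20,
                                rolloffFactor: Double = 1.0) -> Double {
    let clamped = min(max(distance, referenceDistance), maximumDistance)
    return pow(clamped / referenceDistance, -rolloffFactor)
}

// With the defaults above, a source 1 meter away plays at roughly 10% gain:
// approximateExponentialGain(distance: 1.0) ≈ 0.1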

The reverbParameters property lets you apply a simulated acoustic environment, or "reverb," to the audio being played. Its loadFactoryReverbPreset method lets you choose from a set of predefined reverb presets that simulate different types of acoustic environments. For example, the .mediumRoom preset simulates the sound of a medium-sized room, while the .largeHall preset simulates the sound of a large hall. Choose the preset that best fits the acoustic environment you want to simulate for your audio.

In the sample snippet below, the reference distance is set to 0.1 meters and the maximum distance to 20 meters, with the attenuation model set to exponential. The acoustic environment is set to mediumRoom.

func setupAudioEnvironment() {
    let audioEnvironment = sceneView.audioEnvironmentNode
    audioEnvironment.distanceAttenuationParameters.referenceDistance = 0.1
    audioEnvironment.distanceAttenuationParameters.maximumDistance = 20
    audioEnvironment.distanceAttenuationParameters.rolloffFactor = 1.0
    audioEnvironment.distanceAttenuationParameters.distanceAttenuationModel = .exponential
    audioEnvironment.reverbParameters.enable = true
    audioEnvironment.reverbParameters.loadFactoryReverbPreset(.mediumRoom)
    selectAndSetAudioEnvironmentRenderingAlgorithm() // See Step 3 for details
}

Step 3: Selecting the Audio Rendering algorithm

The underlying AVAudioEnvironmentNode offers a good set of audio rendering algorithms. If the user is holding just the phone, with no earbuds or headphones connected, the default equalPowerPanning will do. If earbuds are connected, we can render the audio more convincingly using algorithms that model head pose, orientation and size. These rendering algorithms are computationally intensive, however, and can drain device resources faster. The snippet below is an indicative way of selecting a rendering algorithm based on earpiece connectivity and the remaining battery charge:

func selectAndSetAudioEnvironmentRenderingAlgorithm() {
    let audioSession = AVAudioSession.sharedInstance()
    let outputDataSource = audioSession.outputDataSource
    debugPrint("Audio Output DataSource: \(String(describing: outputDataSource?.dataSourceName))")
    // If headphones or AirPods are connected, select a better rendering algorithm
    if outputDataSource?.dataSourceName == "Built-in Output: Headphones" ||
        outputDataSource?.dataSourceName == "Built-in Output: AirPods" {
        // Check the battery level of the device
        // (batteryLevel returns -1 unless battery monitoring is enabled)
        let device = UIDevice.current
        device.isBatteryMonitoringEnabled = true
        // More than 50%
        if device.batteryLevel > 0.5 {
            // Use sphericalHead if it is an AirPod
            if outputDataSource?.dataSourceName == "Built-in Output: AirPods" {
                sceneView.audioEnvironmentNode.renderingAlgorithm = .sphericalHead
            } else {
                // HRTFHQ for the rest of the headphones
                sceneView.audioEnvironmentNode.renderingAlgorithm = .HRTFHQ
            }
        } else {
            // Low battery level: use the cheaper HRTF rendering algorithm
            sceneView.audioEnvironmentNode.renderingAlgorithm = .HRTF
        }
    } else {
        // No headset is connected: use the default rendering algorithm
        sceneView.audioEnvironmentNode.renderingAlgorithm = .equalPowerPanning
    }
}
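One caveat: outputDataSource is nil on many routes, and the dataSourceName strings above depend on the exact route description. An alternative sketch (not the approach used above) inspects the current route's port types instead, which is less fragile:

// Alternative sketch: detect a headset from the audio route's port types
// instead of matching display-name strings. Requires `import AVFoundation`.
func hasHeadphonesConnected() -> Bool {
    let outputs = AVAudioSession.sharedInstance().currentRoute.outputs
    return outputs.contains { output in
        output.portType == .headphones ||     // wired headphones
        output.portType == .bluetoothA2DP ||  // AirPods and most Bluetooth headsets
        output.portType == .bluetoothLE
    }
}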

Step 4: Audio Source, Audio Player and Scene Node

So far we have only set up the audio environment. Adding spatial audio to an AR scene involves creating an audio source and an audio player, and attaching them to a node in the scene hierarchy.

IMPORTANT: Spatial audio positioning only works with mono audio files; a stereo file will not be spatialized. So make sure you export a mono file from your stereo source beforehand, using a tool such as Audacity. (A quick programmatic sanity check is sketched after the snippet below.)

private func setupAudioAndAddToScene() {
    // Instantiate the audio source
    let audioSource = SCNAudioSource(fileNamed: "mozart40mono.mp3")!
    // As an environmental sound layer, the audio should play indefinitely
    audioSource.loops = true
    // Decode the audio from disk ahead of time to prevent a delay in playback
    audioSource.load()

    let box = SCNBox(width: 0.1, height: 0.1, length: 0.1, chamferRadius: 0)
    let material = SCNMaterial()
    material.diffuse.contents = UIColor.green
    material.specular.contents = UIColor.white
    material.metalness.intensity = 1
    material.shininess = 50
    material.lightingModel = .physicallyBased
    box.materials = [material]

    let node = SCNNode(geometry: box)
    node.position = SCNVector3(0, 0, 1.0) // 1 meter from the world origin, where the camera starts

    // The order of the following steps is important:
    // first add the node to the scene ...
    sceneView.scene.rootNode.addChildNode(node)
    // ... then add the audio player. If the order is reversed, the audio never plays.
    node.addAudioPlayer(SCNAudioPlayer(source: audioSource))
}

The code snippet above adds a box as a visual indication of the audio node's position. Congrats! You can now move around the box to experience the spatial audio.
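If you want to verify at runtime that the bundled file really is mono (see the note in Step 4), here is a quick sketch using AVAudioFile; the file name is the one assumed in the snippet above:

// Sanity-check sketch: confirm the bundled audio file is mono before using it
// as a positional source. Requires `import AVFoundation`.
func isMonoAudioFile(named name: String) -> Bool {
    guard let url = Bundle.main.url(forResource: name, withExtension: nil),
          let file = try? AVAudioFile(forReading: url) else {
        return false
    }
    return file.processingFormat.channelCount == 1
}

// e.g. isMonoAudioFile(named: "mozart40mono.mp3") should return true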

Let's meet in the next part, where we track the phone's movement and use it to provide haptic feedback to vision-impaired users.

Part 2 — The art of providing haptic touch feedback
