Playing With Audio In Unity

Alisha Raizada
XRPractices
Published in
10 min read · Mar 3, 2023

We all come across games and applications where audio makes the experience more realistic. What if there were no audio in our application? We would feel that something was missing, and we would not connect with the application nearly as much. Right?

Now let's talk about the gaming world specifically. Audio is an inseparable element of our lives, and video games are no exception; there we generally speak of 'game audio'. What kinds of audio can be present in a game? Sound effects, ambient sound, background music, or even the voices of the players themselves. We have many tools that help us develop realistic audio features in an application, for example Unity and Unreal.

Development is fine, but what comes next? Testing, of course! Ever wondered whether there is a tool or piece of software that could help automate a feature like audio sharing in XR space?

Disclaimer: In this blog we will look at a case study of an application where avatars can talk to each other just as we do over a real-time audio/video application (e.g. Zoom). The only difference is that it is a desktop XR application built in Unity, and we need to test the audio sharing feature in this 3D application.

Prerequisites: Basic knowledge of Unity, C#, and test automation using Arium is required.

Tech Specifications:

  • Language: C#
  • Game development tool: Unity
  • SDK integrated to provide the audio sharing feature: Agora Video SDK
  • Testing framework: Arium
  • Application type: Desktop
  • OS: macOS

Feature Description:

A macOS-based XR desktop application built with Unity comes with an audio sharing feature provided by the Agora Video SDK. Each person joining a particular channel has an avatar, and avatars can communicate with each other using a mute/unmute button. When a person is unmuted and speaks, everyone present in the room can hear that person, and vice versa.
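For context, here is a minimal sketch of how such a mute/unmute toggle is typically wired up with the Agora Video SDK. The method names follow the Agora Unity SDK 3.x API; the button wiring and class names are illustrative, not this app's actual implementation.

using agora_gaming_rtc;
using UnityEngine;
using UnityEngine.UI;

public class MicToggle : MonoBehaviour
{
    public Button micButton;
    private IRtcEngine rtcEngine;
    private bool muted = true; // users join the channel muted by default

    void Start()
    {
        // Assumes the engine was already created elsewhere with IRtcEngine.GetEngine(appId).
        rtcEngine = IRtcEngine.QueryEngine();
        micButton.onClick.AddListener(Toggle);
    }

    void Toggle()
    {
        muted = !muted;
        // Stops or resumes publishing the local audio stream to the channel.
        rtcEngine.MuteLocalAudioStream(muted);
        // The mute/unmute icon swap would happen here as well.
    }
}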

Steps to Reproduce

  1. Log in to the application.
  2. Enter the main room.
  3. Connect to the localhost/client IP.
  4. By default a user is muted, so next unmute yourself.
  5. Speak something.
  6. Others present in the room should be able to hear you.

Exploratory testing is fine, but what about automation? How can we replicate this scenario through automation?

Automation test strategy

We will walk you through, step by step, the phases we went through to automate this scenario, which is not as simple as it appears.

When we talk about XR automation, there is no single tool that provides all the automation capabilities we need. The usual approach is to first get a good understanding of the development code. To understand the dev implementation, you can refer to this link: https://api-ref.agora.io/en/video-sdk/unity/3.x/index.html

Before proceeding to the automation strategy, let's first discuss what we are going to validate.

  • Validate that when a user is muted, others present in the room cannot hear the user.
  • Validate that when a user is unmuted, others present in the room can hear the user.
  • Validate that the Agora Video SDK is properly integrated with the application.
  • Validate that when the user clicks the mute button, the mute icon changes to the unmute icon.
  • Validate that when the user clicks the unmute button, the unmute icon changes to the mute icon.

In this blog we will only discuss the first two validations, because the last three can easily be automated using Arium and the screen-share test approach. Refer to:

https://medium.com/xrpractices/getting-started-with-3d-automation-arium-at-glance-fca27273426d

https://medium.com/xrpractices/screen-share-test-in-unity-59803935f1e3
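For completeness, the icon-toggle checks can be sketched roughly as below with Arium, in the same fragment style as the test scripts later in this post. All GameObject names are illustrative, and we assume the mute button swaps the sprite on its Image component (UnityEngine.UI).

// "MicButton" is an assumed name for the mute/unmute button in the scene
GameObject micButton = _arium.FindGameObject("MicButton");
Image icon = micButton.GetComponent<Image>();
string iconBefore = icon.sprite.name;
// click the button to toggle mute state
_arium.PerformAction(new UnityPointerClick(), micButton);
yield return new WaitForSeconds(1);
// after the click, the mute icon should have changed to the unmute icon
Assert.AreNotEqual(iconBefore, icon.sprite.name);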

Test Approach 1: AudioSource and AudioListener approach

This is the first approach that came to our mind.

1. We created two empty GameObjects.

2. To one we added an AudioSource component and renamed the GameObject microphonePrefab; the other carried an AudioListener component.

3. Next comes the AudioClip field of the AudioSource component. If we attach an audio clip to this field, that audio is played at run time, but if we leave the field empty, the AudioSource takes the microphone input instead.

4. So we kept the AudioClip field empty and converted the AudioSource and AudioListener GameObjects into prefabs, which can be instantiated at run time after adding the code below to the Arium.cs file in the Arium framework.

public GameObject InstantiatePrefab(GameObject prefab, Vector3 position)
{
    // Instantiate the given prefab at the given position with no rotation.
    GameObject go = GameObject.Instantiate(prefab, position, Quaternion.identity);
    return go;
}
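With this helper in place, the two prefabs can be spawned into the running scene from the test itself. A rough usage sketch, assuming the prefabs are kept under a Resources folder (names and positions are illustrative):

// load the prefabs created above and spawn them into the scene
GameObject micPrefab = Resources.Load<GameObject>("microphonePrefab");
GameObject listenerPrefab = Resources.Load<GameObject>("listenerPrefab");
GameObject microphone = _arium.InstantiatePrefab(micPrefab, new Vector3(0f, 1f, 0f));
GameObject listener = _arium.InstantiatePrefab(listenerPrefab, Vector3.zero);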

5. Also, before converting this microphonePrefab GameObject into a prefab, we attached the script below to it.

using UnityEngine;

public class VoiceScript1 : MonoBehaviour
{
    public AudioSource audioSource;

    void Awake()
    {
        audioSource = GetComponent<AudioSource>();
        // No clip is assigned, so feed the default microphone into the AudioSource.
        audioSource.clip = Microphone.Start("", true, 5, 44100);
        audioSource.loop = true;
        // Wait until the microphone has actually started recording before playing.
        while (!(Microphone.GetPosition(null) > 0)) { }
        audioSource.Play();
    }
}

This VoiceScript1 has a public AudioSource audioSource variable, so before converting the GameObject to a prefab, drag and drop the AudioSource component of microphonePrefab itself onto this public field.

6. Next we wrote a script to play an audio file that acts like microphone input. This is implemented using the Process class and the afplay terminal command.

using System.Diagnostics;

public class PlayAudio
{
    public void PlayAudioFile()
    {
        // Launch macOS's afplay command to play the test clip through the speakers,
        // which the microphone then picks up as input.
        var process = new Process
        {
            StartInfo =
            {
                FileName = "afplay",
                Arguments = "Assets/Tests/why-hello-there-103596.mp3"
            }
        };
        process.Start();
    }
}

7. We also wrote a script to read data from the AudioListener, which helps us calculate the audio intensity.

using UnityEngine;

public class audioData : MonoBehaviour
{
    public int qSamples = 4096;

    private float[] samples;
    private float clipLoudness = 0f; // accumulated absolute loudness (not used in the return value)

    void Start()
    {
        samples = new float[qSamples];
    }

    // Returns the root mean square of the samples currently heard by the AudioListener.
    public float GetRMS(int channel)
    {
        AudioListener.GetOutputData(samples, channel);
        float sum = 0;
        for (int i = 0; i < qSamples; i++)
        {
            Debug.Log("Sample: " + i + "::" + samples[i]);
            sum += samples[i] * samples[i]; // sum of squared samples
        }
        foreach (var sample in samples)
        {
            clipLoudness += Mathf.Abs(sample);
        }
        return Mathf.Sqrt(sum / qSamples);
    }
}
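In the test, the RMS value returned by this script is what we assert on. A minimal sketch of that assertion (NUnit, as used by the Unity Test Framework), where listener is the AudioListener GameObject instantiated in step 4:

audioData data = listener.GetComponent<audioData>();
float rms = data.GetRMS(0); // channel 0
// unmuted: some signal should reach the listener, so RMS should be above zero
Assert.Greater(rms, 0f);
// muted (the ideal case): Assert.AreEqual(0f, rms);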

Limitations:

Test flakiness due to varying RMS results.

Expected Result: When we are unmuted and the audio is played, the output should give an RMS value > 0, but when we are muted and the audio is played, it should give RMS = 0.

Actual Result: Whether we were muted or unmuted, the test gave RMS > 0.

Reason for failing test:

The AudioSource present in the game scene takes input directly from the microphone and plays the audio within the scene. Our AudioListener is not listening to the microphone, but to this AudioSource. So even when we are muted and we play the audio, the AudioSource still takes the microphone input, the AudioListener still receives a signal, and we get RMS > 0.

Test Approach 2: No AudioSource but AudioListener approach

In this strategy we thought of feeding the microphone input directly to the AudioListener, without any AudioSource in between.

Test Script:

//Loading the home scene where we have audio sharing feature
LoadHomeScene();
yield return new WaitForSeconds(15);
//connect to localhost/client ip using network manager
GameObject networkManager = _findGameObjects.FindNetworkManager();
networkManager.GetComponent<NetworkManager>().StartHost();
yield return new WaitForSeconds(10);
//click on microphone button to unmute
GameObject microphone = _findGameObjects.FindMicButton();
_arium.PerformAction(new UnityPointerClick(), microphone);
yield return new WaitForSeconds(4);
//play the audio which will act like microphone input
PlayAudio audio = new PlayAudio();
audio.PlayAudioFile();
yield return new WaitForSeconds(4);
//calling audioData script to calculate RMS value
audioData listen = _arium.GetComponent<audioData>("LoadAvator [connId=0]/Avatar");
//here 0 inside GetRMS denotes channel
float vol = listen.GetRMS(0);
Debug.Log("volume ====" + vol);

Output:

Each time, we got RMS = 0.

Limitations:

The AudioListener cannot listen to the microphone directly.

Reason for failing test:

The AudioListener present in the scene can only listen to AudioSource output, not to the microphone. So each time we play an audio clip and try to fetch data from the AudioListener, it gives RMS = 0.

Test Approach 3: Validating by voice recognition

The next approach we followed is based on voice recognition. We found this piece of code after a lot of research. The main idea is: we create a GameObject, say a cube, and instantiate it at runtime in the home scene. In the script below we define a dictionary of keywords that the cube can recognise. We then give an audio input containing some of these keywords via automation, or in other words, we simply play an audio clip with these keywords just like we did earlier. Each keyword has an action attached to it for the cube. We invoke this cube through our test script in the game scene at run time, play the audio, and the cube recognises the keywords and performs the corresponding action in the scene. This indicates that the audio is reaching our game view, which means everyone else present in the room would also be able to hear the audio being sent.

using UnityEngine;
using System.Collections.Generic;
using System.Linq;
using System;
using System.Collections;
using UnityEngine.Windows.Speech;
public class VoiceControl : MonoBehaviour
{
    // Voice command vars
    private Dictionary<string, Action> keyActs = new Dictionary<string, Action>();
    private KeywordRecognizer recognizer;

    // Var needed for colour manipulation
    private MeshRenderer cubeRend;

    // Var needed for spin manipulation
    private bool spinningRight;

    // Vars needed for sound playback
    private AudioSource soundSource;
    public AudioClip[] sounds;

    void Start()
    {
        cubeRend = GetComponent<MeshRenderer>();
        soundSource = GetComponent<AudioSource>();

        // Voice commands for changing colour
        keyActs.Add("red", Red);
        keyActs.Add("green", Green);
        keyActs.Add("blue", Blue);
        keyActs.Add("white", White);

        // Voice commands for spinning
        keyActs.Add("spin right", SpinRight);
        keyActs.Add("spin left", SpinLeft);

        // Voice command for playing sound
        keyActs.Add("please say something", Talk);

        // Voice command to show how complex it can get
        keyActs.Add("pizza is a wonderful food that makes the world better", FactAcknowledgement);

        recognizer = new KeywordRecognizer(keyActs.Keys.ToArray());
        recognizer.OnPhraseRecognized += OnKeywordsRecognized;
        recognizer.Start();
    }

    void OnKeywordsRecognized(PhraseRecognizedEventArgs args)
    {
        Debug.Log("Command: " + args.text);
        keyActs[args.text].Invoke();
    }

    void Red()   { cubeRend.material.SetColor("_Color", Color.red); }
    void Green() { cubeRend.material.SetColor("_Color", Color.green); }
    void Blue()  { cubeRend.material.SetColor("_Color", Color.blue); }
    void White() { cubeRend.material.SetColor("_Color", Color.white); }

    void SpinRight()
    {
        spinningRight = true;
        StartCoroutine(RotateObject(1f));
    }

    void SpinLeft()
    {
        spinningRight = false;
        StartCoroutine(RotateObject(1f));
    }

    private IEnumerator RotateObject(float duration)
    {
        // Rotate a full turn around the y-axis over the given duration.
        float startRot = transform.eulerAngles.y;
        float endRot = spinningRight ? startRot - 360f : startRot + 360f;
        float t = 0f;
        while (t < duration)
        {
            t += Time.deltaTime;
            float yRot = Mathf.Lerp(startRot, endRot, t / duration) % 360.0f;
            transform.eulerAngles = new Vector3(transform.eulerAngles.x, yRot, transform.eulerAngles.z);
            yield return null;
        }
    }

    void Talk()
    {
        soundSource.clip = sounds[UnityEngine.Random.Range(0, sounds.Length)];
        soundSource.Play();
    }

    void FactAcknowledgement()
    {
        Debug.Log("How right you are.");
    }
}
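Putting it together, the test for this approach would look roughly like the sketch below. The prefab name is illustrative, and we assume an audio clip containing one of the registered keywords, e.g. "red", is played (on Windows, a counterpart of the afplay-based PlayAudio helper would be needed):

// spawn the cube that carries the VoiceControl script shown above
GameObject cubePrefab = Resources.Load<GameObject>("VoiceCube");
GameObject cube = _arium.InstantiatePrefab(cubePrefab, new Vector3(0f, 1f, 2f));
yield return new WaitForSeconds(2);
// play an audio clip that speaks one of the registered keywords
new PlayAudio().PlayAudioFile();
yield return new WaitForSeconds(5);
// if the keyword reached the game, the cube's colour should have changed
Color color = cube.GetComponent<MeshRenderer>().material.GetColor("_Color");
Assert.AreEqual(Color.red, color);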

Limitation:

We had to perform this test for a macOS application, but this code works only on Windows, since it uses the UnityEngine.Windows.Speech library. We could not find a comparable library for macOS. Although we did not test it ourselves, this approach should work fine on a Windows system.

Test Approach 4: Recording the game audio

In this approach, we planned to record the game audio using the ffmpeg command. The plan was to first play an audio clip via script, then record the game audio, and then convert the resulting .mp3 or .wav file to text, i.e. a string, which could then be validated against the given input.

using System.Diagnostics;

public class RecordAudio
{
    public void RecordAudioFile()
    {
        // Launch ffmpeg to capture audio into the given file.
        var process = new Process
        {
            StartInfo =
            {
                FileName = "ffmpeg",
                Arguments = "/Users/alisha.raizada/Music/audiofilecapture.mp3"
            }
        };
        process.Start();
    }
}

Limitations:

This approach could not serve the purpose.

Expected: Record a particular microphone channel.

Actual: It just does a normal audio recording.

After implementing this approach we realised that we needed something to capture a particular AudioFrame; that would finally serve our purpose. This can only be provided by the Agora engine, through which the audio sharing feature is implemented. This is why we mentioned in the beginning that it is very important in XR automation to understand the development code and implementation well. That said, this detour opened doors for a lot of learning and exploration.

The Final Test Approach: Capturing audio-data from a particular AudioFrame:

In this approach we need to add an internal class, AudioFrameObserver, to the class that contains all the methods from joining a channel to leaving a channel.

Inside the Join function, we need to add a few lines, as shown below.

We also need to create a separate function called GetAudioFrame which returns the audio frame.

internal class AudioFrameObserver : IAudioFrameObserver
{
    private readonly SpatialAudio _videoSample;

    internal AudioFrameObserver(SpatialAudio videoSample)
    {
        _videoSample = videoSample;
    }

    // Sets whether to receive remote video data in multiple channels.
    public virtual bool IsMultipleChannelFrameWanted()
    {
        return true;
    }

    // Occurs each time the player receives an audio frame.
    public bool OnFrame(AudioPcmFrame videoFrame)
    {
        return true;
    }

    // Retrieves the mixed captured and playback audio frame.
    public override bool OnMixedAudioFrame(string channelId, AudioFrame audioFrame)
    {
        return true;
    }

    // Gets the audio frame for playback.
    public override bool OnPlaybackAudioFrame(string channelId, AudioFrame audioFrame)
    {
        return true;
    }

    // Retrieves the audio frame of a specified user before mixing.
    public override bool OnPlaybackAudioFrameBeforeMixing(string channelId, uint uid, AudioFrame audioFrame)
    {
        return true;
    }

    // Gets the playback audio frame before mixing from multiple channels.
    public virtual bool OnPlaybackAudioFrameBeforeMixingEx(string channelId, uint uid, AudioFrame audioFrame)
    {
        return false;
    }

    // Gets the captured audio frame.
    public override bool OnRecordAudioFrame(string channelId, AudioFrame audioFrame)
    {
        // Store the captured frame so the test can read it later via GetAudioFrame().
        _videoSample._audioFrame = audioFrame;
        return true;
    }
}
Inside the Join function:

public void Join(string _token, string _channelName)
{
    // ... existing channel-join logic ...

    RtcEngine.RegisterAudioFrameObserver(new AudioFrameObserver(this));

    // Set the format of the captured raw audio data.
    int SAMPLE_RATE = 16000, SAMPLE_NUM_OF_CHANNEL = 1, SAMPLES_PER_CALL = 1024;
    RtcEngine.SetRecordingAudioFrameParameters(SAMPLE_RATE, SAMPLE_NUM_OF_CHANNEL,
        RAW_AUDIO_FRAME_OP_MODE_TYPE.RAW_AUDIO_FRAME_OP_MODE_READ_WRITE, SAMPLES_PER_CALL);
    RtcEngine.SetPlaybackAudioFrameParameters(SAMPLE_RATE, SAMPLE_NUM_OF_CHANNEL,
        RAW_AUDIO_FRAME_OP_MODE_TYPE.RAW_AUDIO_FRAME_OP_MODE_READ_WRITE, SAMPLES_PER_CALL);
    RtcEngine.SetMixedAudioFrameParameters(SAMPLE_RATE, SAMPLE_NUM_OF_CHANNEL, SAMPLES_PER_CALL);
}

// Test helper: returns the last captured audio frame.
public AudioFrame GetAudioFrame()
{
    Debug.Log("Bytes:::" + _audioFrame);
    return _audioFrame;
}

After this is done, we can move to our main test file, from where we will call the GetAudioFrame method.

SpatialAudio obj = new SpatialAudio();
obj.GetAudioFrame();

Next, we can pick one of the methods listed above to fetch the data of a particular audio frame.

Note: When we call a method from a different folder or directory, we need to add a reference to it in our assembly definition.
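Once the frame is available in the test, its PCM data can be checked for actual signal. A rough sketch follows; we assume the AudioFrame exposes its raw 16-bit PCM data as a byte buffer, and the property name RawBuffer is illustrative since it differs between Agora SDK versions.

AudioFrame frame = obj.GetAudioFrame();
byte[] pcm = frame.RawBuffer; // property name is an assumption, see note above
long sumOfSquares = 0;
for (int i = 0; i + 1 < pcm.Length; i += 2)
{
    // reassemble each 16-bit little-endian PCM sample
    short sample = (short)(pcm[i] | (pcm[i + 1] << 8));
    sumOfSquares += (long)sample * sample;
}
double rms = System.Math.Sqrt((double)sumOfSquares / (pcm.Length / 2));
// unmuted and audio playing: the captured frame should carry real signal
Assert.Greater(rms, 0);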

That's all! Thank you! :)


References:

Agora

Arium
