Emotion Recognizer with CoreML
Using Sound Analysis to recognize human emotions
Research on emotion has increased significantly over the past two decades across many fields, and different types of applications have been born from it.
Voice recordings, in fact, are becoming more and more common, and they are a useful resource for studying human emotions. We thought that there could be a correlation between the audio waves contained in those recordings and human emotions.
How we built our CoreML Model
Machine Learning is a useful technology for making predictions on image data, audio data, and more. For this reason, we created a machine learning model with CreateML to analyze audio data.
In order to create our model, we used the data from the Toronto emotional speech set (TESS). This dataset contains 2,800 sample audio files divided into 7 categories. We adapted it for CreateML by dividing the files into 2 big folders:
- Training: used for creating the model
- Testing: used to give us feedback about the reliability of the model
The first contains 80% of the data and the second one the remaining 20%.
To save the model, just drag and drop the output onto your desktop.
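If you prefer to train the classifier in code rather than in the Create ML app, a minimal sketch using the CreateML framework (run from a macOS playground or command-line tool) might look like the following; the folder paths and the output file name are placeholders:

```swift
import CreateML
import Foundation

// Placeholder paths: each folder contains one subfolder per emotion label.
let trainingURL = URL(fileURLWithPath: "/path/to/TESS/Training")
let testingURL  = URL(fileURLWithPath: "/path/to/TESS/Testing")

// Train a sound classifier on the labeled training folders (80% of the data).
let classifier = try MLSoundClassifier(trainingData: .labeledDirectories(at: trainingURL))

// Evaluate the model on the held-out 20% to check its reliability.
let evaluation = classifier.evaluation(on: .labeledDirectories(at: testingURL))
print("Classification error on the test set: \(evaluation.classificationError)")

// Export the trained model so it can be dragged into an Xcode project.
try classifier.write(to: URL(fileURLWithPath: "/path/to/Desktop/EmotionClassifier.mlmodel"))
```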
How to implement the model in Swift
Create a new project in Xcode and drag and drop the model that you previously saved into the project folder, adding the reference.
First of all, import SoundAnalysis, which is used to analyze the audio with the model that you created previously. After that, in the main class, copy the following code:
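What follows is a minimal sketch of those declarations. EmotionClassifier is a placeholder for whatever class name Xcode generates from your model file, and ResultObserver is the observer class shown later in the article:

```swift
import UIKit
import AVFoundation
import CoreML
import SoundAnalysis

class ViewController: UIViewController {

    // "EmotionClassifier" is a placeholder for the class Xcode generates from the model.
    private let soundClassifier = try? EmotionClassifier(configuration: MLModelConfiguration())

    // Audio engine used to capture audio data from the microphone.
    private let audioEngine = AVAudioEngine()

    // Analyzer that feeds the captured buffers to the classification request.
    private var streamAnalyzer: SNAudioStreamAnalyzer!

    // Background queue on which the analysis runs.
    private let analysisQueue = DispatchQueue(label: "com.example.AnalysisQueue")

    // Observer (the delegate mentioned below) that receives the results.
    private let resultObserver = ResultObserver()

    override func viewDidLoad() {
        super.viewDidLoad()
        startAudioEngine()
    }
}
```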
The delegate and the startAudioEngine() function will be explained later in the article. The first variables that you see are used for instantiating the model and the audio engine that records audio data. The next step is adding the audio recording function:
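A possible version of this function, under the same assumptions as the sketch above, could look like this:

```swift
private func startAudioEngine() {
    // Build the analyzer around the microphone's native audio format.
    let inputFormat = audioEngine.inputNode.inputFormat(forBus: 0)
    streamAnalyzer = SNAudioStreamAnalyzer(format: inputFormat)

    do {
        // Create a classification request from the Core ML model and attach the observer.
        guard let model = soundClassifier?.model else { return }
        let request = try SNClassifySoundRequest(mlModel: model)
        try streamAnalyzer.add(request, withObserver: resultObserver)

        // Forward every microphone buffer to the analyzer on the background queue.
        audioEngine.inputNode.installTap(onBus: 0, bufferSize: 8192, format: inputFormat) { buffer, time in
            self.analysisQueue.async {
                self.streamAnalyzer.analyze(buffer, atAudioFramePosition: time.sampleTime)
            }
        }

        // Start capturing audio from the device microphone.
        try audioEngine.start()
    } catch {
        print("Unable to start the audio engine: \(error)")
    }
}
```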
The main purpose of this function is to enable the device microphone. Remember to add the microphone privacy usage description to the Info.plist. This function also gives input to the analyzer, which is shown in the next snippet:
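A minimal sketch of such an observer is below; the 60% confidence threshold is an illustrative value, not necessarily the one used in the original project:

```swift
import SoundAnalysis

class ResultObserver: NSObject, SNResultsObserving {

    func request(_ request: SNRequest, didProduce result: SNResult) {
        // Keep only the top classification, and only if its confidence
        // is above the (illustrative) 60% threshold.
        guard let result = result as? SNClassificationResult,
              let classification = result.classifications.first,
              classification.confidence > 0.6 else { return }

        let percent = classification.confidence * 100
        print("Detected emotion: \(classification.identifier) (\(percent)% confidence)")
    }

    func request(_ request: SNRequest, didFailWithError error: Error) {
        print("Analysis failed: \(error.localizedDescription)")
    }

    func requestDidComplete(_ request: SNRequest) {
        print("Analysis completed.")
    }
}
```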
The ResultObserver class receives the classification results produced by the request and returns only those whose confidence is above the specified threshold.
Make Dubbing Easier
With these tools, our goal is to develop an application that can help novice dubbers improve along their career path. How?
Our goal is to compare the scene to be dubbed with the actor's voice and, using our CoreML model, generate a result that helps the dubber understand how accurate their dubbing is in terms of emotions.
Here you can find our CoreML model.