Week 3 — Audio Emotion Recognition System
Hi everyone!
This is our third blog post about our project, which aims to recognize emotions from audio files. This week, we applied a Convolutional Neural Network (CNN) to our spectrogram images.
Let’s start!
What is a Spectrogram?
A spectrogram is a visual representation of the spectrum of frequencies of a signal as it varies with time. When applied to an audio signal, spectrograms are sometimes called sonographs, voiceprints, or voicegrams. [1]
The following spectrogram images come from different audio files:
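The post does not say which library produced these spectrograms, but the computation itself can be sketched with SciPy on a synthetic signal (the 440 Hz tone and sample rate here are purely illustrative):

```python
import numpy as np
from scipy import signal

# Illustrative stand-in for a real audio clip: 1 second of a 440 Hz tone
sample_rate = 16000
t = np.linspace(0, 1, sample_rate, endpoint=False)
audio = np.sin(2 * np.pi * 440 * t)

# Sxx holds power per (frequency bin, time frame); plotting it as an
# image (e.g. with matplotlib's pcolormesh) gives the spectrogram
freqs, times, Sxx = signal.spectrogram(audio, fs=sample_rate, nperseg=512)

# The strongest frequency bin should sit near 440 Hz
peak_freq = freqs[np.argmax(Sxx[:, 0])]
```

With `nperseg=512`, each frame covers 512 samples, giving 257 frequency bins at a resolution of 31.25 Hz, so the peak lands within one bin of 440 Hz.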
First, we resized our images and converted them to grayscale. Then we applied a CNN to the spectrograms of our data. In the first model, we reduced the size of the original images by 80%. We used ReLU activations between the layers and a softmax output. We chose ‘adam’ as the optimizer and ‘sparse categorical cross-entropy’ as the loss function. With 2 convolutional layers, 5 dense layers, and 3x3 kernels, the first model reached an accuracy of 0.4 after 10 epochs.
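The resizing and grayscale step can be sketched with Pillow. The post doesn’t name the image library or the original dimensions, so the input image here is a hypothetical placeholder, and “reduced by 80%” is read as keeping 20% of each dimension:

```python
import numpy as np
from PIL import Image

# Hypothetical stand-in for one spectrogram image (150x100 RGB)
img = Image.fromarray(np.random.randint(0, 255, (100, 150, 3), dtype=np.uint8))

# Reduce size by 80%, i.e. keep 20% of width and height (first model)
scale = 0.2
small = img.resize((int(img.width * scale), int(img.height * scale)))

# Convert to single-channel grayscale, as described in the post
gray = small.convert("L")
```

For the second model the same code would use `scale = 0.6`, since its images were reduced by 40%.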
In the second model, we reduced the size of the original images by 40%. Again we used ReLU activations with a softmax output, the ‘adam’ optimizer, and ‘sparse categorical cross-entropy’ loss. With 2 convolutional layers, 3 dense layers, and 3x3 kernels, the second model reached an accuracy of 0.36 after 10 epochs.
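Since the two models share everything except the input size and the number of dense layers, both architectures can be sketched with one parameterized Keras helper. This is only a reconstruction from the description above: the filter counts (32, 64), dense width (64), input size, and number of emotion classes are assumptions, not values from the post.

```python
from tensorflow.keras import layers, models

def build_model(input_shape, n_dense, n_classes):
    """CNN matching the post's description: 2 Conv layers with 3x3 kernels
    and ReLU, then n_dense Dense layers ending in a softmax output."""
    model = models.Sequential()
    model.add(layers.Input(shape=input_shape))
    # Filter counts are assumed; the post only specifies the kernel size
    model.add(layers.Conv2D(32, (3, 3), activation="relu"))
    model.add(layers.Conv2D(64, (3, 3), activation="relu"))
    model.add(layers.Flatten())
    for _ in range(n_dense - 1):
        model.add(layers.Dense(64, activation="relu"))
    model.add(layers.Dense(n_classes, activation="softmax"))
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# First model: 5 dense layers; 26x26 grayscale input and 8 classes are
# placeholders, since the post does not state them
model1 = build_model((26, 26, 1), n_dense=5, n_classes=8)
# Second model: same structure with 3 dense layers
model2 = build_model((26, 26, 1), n_dense=3, n_classes=8)
```

Training would then be a call like `model1.fit(x_train, y_train, epochs=10)`, with integer emotion labels as targets, which is what the sparse categorical cross-entropy loss expects.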
Previous posts:
Week 1 — https://medium.com/bbm406f19/week-1-introduction-557d0143e753
Week 2 — https://medium.com/bbm406f19/week-2-data-analysis-687ec86c0a71
References