Week 3 — Audio Emotion Recognition System
Hi everyone!
This is our third blog post about our project, which aims to recognize emotions from audio files. This week, we applied a Convolutional Neural Network (CNN) to our spectrogram images.
Let’s start!
What is a Spectrogram?
A spectrogram is a visual representation of the spectrum of frequencies of a signal as it varies with time. When applied to an audio signal, spectrograms are sometimes called sonographs, voiceprints, or voicegrams. [1]
The following spectrogram images come from different audio files:
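The post does not say which library produced these spectrograms, but the computation itself can be sketched with SciPy on a synthetic signal (the 440 Hz tone and sample rate here are purely illustrative):

```python
import numpy as np
from scipy import signal

# Illustrative stand-in for a real audio clip: 1 second of a 440 Hz tone
sample_rate = 16000
t = np.linspace(0, 1, sample_rate, endpoint=False)
audio = np.sin(2 * np.pi * 440 * t)

# Sxx holds power per (frequency bin, time frame); plotting it as an
# image (e.g. with matplotlib's pcolormesh) gives the spectrogram
freqs, times, Sxx = signal.spectrogram(audio, fs=sample_rate, nperseg=512)

# The strongest frequency bin should sit near 440 Hz
peak_freq = freqs[np.argmax(Sxx[:, 0])]
```

With `nperseg=512`, each frame covers 512 samples, giving 257 frequency bins at a resolution of 31.25 Hz, so the peak lands within one bin of 440 Hz.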
First, we resized our images and converted them to grayscale. Then we applied a CNN to the spectrograms of our data. In the first model, we reduced the size of the original images by 80%. We used ReLU activations between the layers and a softmax output. We chose ‘adam’ as the optimizer and ‘sparse categorical cross-entropy’ as the loss function. With 2 convolutional layers, 5 dense layers, and 3x3 kernels, the first model reached an accuracy of 0.4 after 10 epochs.
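The resizing and grayscale step can be sketched with Pillow. The post doesn’t name the image library or the original dimensions, so the input image here is a hypothetical placeholder, and “reduced by 80%” is read as keeping 20% of each dimension:

```python
import numpy as np
from PIL import Image

# Hypothetical stand-in for one spectrogram image (150x100 RGB)
img = Image.fromarray(np.random.randint(0, 255, (100, 150, 3), dtype=np.uint8))

# Reduce size by 80%, i.e. keep 20% of width and height (first model)
scale = 0.2
small = img.resize((int(img.width * scale), int(img.height * scale)))

# Convert to single-channel grayscale, as described in the post
gray = small.convert("L")
```

For the second model the same code would use `scale = 0.6`, since its images were reduced by 40%.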
In the second model, we reduced the size of the original images by 40%. Again we used ReLU activations with a softmax output, the ‘adam’ optimizer, and ‘sparse categorical cross-entropy’ loss. With 2 convolutional layers, 3 dense layers, and 3x3 kernels, the second model reached an accuracy of 0.36 after 10 epochs.
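Since the two models share everything except the input size and the number of dense layers, both architectures can be sketched with one parameterized Keras helper. This is only a reconstruction from the description above: the filter counts (32, 64), dense width (64), input size, and number of emotion classes are assumptions, not values from the post.

```python
from tensorflow.keras import layers, models

def build_model(input_shape, n_dense, n_classes):
    """CNN matching the post's description: 2 Conv layers with 3x3 kernels
    and ReLU, then n_dense Dense layers ending in a softmax output."""
    model = models.Sequential()
    model.add(layers.Input(shape=input_shape))
    # Filter counts are assumed; the post only specifies the kernel size
    model.add(layers.Conv2D(32, (3, 3), activation="relu"))
    model.add(layers.Conv2D(64, (3, 3), activation="relu"))
    model.add(layers.Flatten())
    for _ in range(n_dense - 1):
        model.add(layers.Dense(64, activation="relu"))
    model.add(layers.Dense(n_classes, activation="softmax"))
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# First model: 5 dense layers; 26x26 grayscale input and 8 classes are
# placeholders, since the post does not state them
model1 = build_model((26, 26, 1), n_dense=5, n_classes=8)
# Second model: same structure with 3 dense layers
model2 = build_model((26, 26, 1), n_dense=3, n_classes=8)
```

Training would then be a call like `model1.fit(x_train, y_train, epochs=10)`, with integer emotion labels as targets, which is what the sparse categorical cross-entropy loss expects.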
Previous posts:
Week 1 — https://medium.com/bbm406f19/week-1-introduction-557d0143e753
Week 2 — https://medium.com/bbm406f19/week-2-data-analysis-687ec86c0a71
References