Week 2 — Data Analysis

Ece Omurtay
Published in bbm406f19
Dec 10, 2019 · 3 min read

Hi everyone!

We are Ece Omurtay & Nur Altıparmak. This is our second post about our project, an audio emotion recognition system. This week, we will describe our data and the features we extract from it.

We hope these features and statistics give some insight into our data.

Let’s get started!

Data Analysis and Statistics

First, we ran a statistical analysis to explore the data. We can group the emotions into two general classes, positive and negative. The positive class contains happy, calm, neutral, and surprised; the negative class contains disgust, fearful, angry, and sad. We visualized the frequency of each emotion in the data with a histogram and found that our data is balanced.
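As a rough illustration of this kind of check, here is a minimal sketch that counts how often each emotion label occurs and groups the labels into the positive and negative classes described above. The `toy_labels` list is made up for the example and is not our real data, and the helper names (`polarity_counts`, `imbalance_ratio`) are hypothetical.

```python
from collections import Counter

# Grouping of emotions into two general classes, as described in the post.
POSITIVE = {"happy", "calm", "neutral", "surprised"}
NEGATIVE = {"disgust", "fearful", "angry", "sad"}


def polarity_counts(labels):
    """Count how many labels fall into the positive and negative classes."""
    counts = Counter()
    for label in labels:
        if label in POSITIVE:
            counts["positive"] += 1
        elif label in NEGATIVE:
            counts["negative"] += 1
        else:
            raise ValueError(f"unknown emotion label: {label!r}")
    return counts


def imbalance_ratio(counts):
    """Largest class count divided by the smallest (1.0 = perfectly balanced)."""
    return max(counts.values()) / min(counts.values())


# Toy labels for illustration only (NOT the real dataset).
toy_labels = ["happy", "sad", "angry", "calm", "neutral", "disgust",
              "fearful", "surprised", "happy", "sad"]

per_emotion = Counter(toy_labels)       # frequency of each emotion
per_polarity = polarity_counts(toy_labels)
ratio = imbalance_ratio(per_emotion)
```

A ratio close to 1.0 suggests the classes are roughly balanced; the per-emotion counts are also what a histogram of the labels would display.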

According to our research, male and female voices should be trained separately to achieve good accuracy, because female voices generally have a higher pitch than male voices. Males often speak in the 65 to 260 Hz range, while females speak in the 100 to 525 Hz range.

We visualized a man's voice and a woman's voice as time series. Both recordings are labeled as neutral, and both have the same length and contain the same sentence. We observed that the two voices are distinguishable.

man (left) & woman (right)

Feature Extraction

We used the librosa library for feature extraction. Our features are “mfcc”, “chroma_stft”, “chroma_cqt”, “chroma_cens”, “rms”, “spectral_contrast”, “spectral_bandwidth”, “tonnetz”, “zcr”.

This week, we focused on Mel Frequency Cepstral Coefficients (MFCCs). The MFCCs of a signal are a small set of features (usually about 10–20) that concisely describe the overall shape of its spectral envelope. [1] The word “cepstral” comes from “spectral” with “spec” reversed! The cepstrum captures the rate of change across spectral bands. The Mel scale relates the perceived frequency of a tone to its actual measured frequency. It rescales frequency to match more closely what the human ear can hear. A frequency f measured in Hertz can be converted to the Mel scale using the formula: Mel(f) = 2595 * log10(1 + f/700) [2]
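The Hertz-to-Mel conversion can be sketched in a few lines of Python. This assumes the logarithm in the formula is base 10, as in the commonly used form of this equation; the function names `hz_to_mel` and `mel_to_hz` are chosen for the example.

```python
import math


def hz_to_mel(f_hz):
    """Convert a frequency in Hertz to the Mel scale: 2595 * log10(1 + f/700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel: recover the frequency in Hertz from a Mel value."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


# Approximate speaking ranges quoted earlier in the post, expressed in Mel.
male_range_mel = (hz_to_mel(65.0), hz_to_mel(260.0))
female_range_mel = (hz_to_mel(100.0), hz_to_mel(525.0))
```

With these constants, 1000 Hz maps to roughly 1000 mel. Equal steps in Hertz become smaller steps in Mel as the frequency rises, which mirrors how the ear resolves low frequencies more finely than high ones.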

Basically, MFCCs give a compact representation of the vocal system that produces the sound. We visualized the MFCCs for specific examples (a man's voice and a woman's voice, both labelled as angry).

man (left) & woman (right)

See you next week!
