Week 4 — Emotion Detection

Şeyma Yılmaz · Published in bbm406f19 · 2 min read · Dec 22, 2019

Hello everyone! We are back with another progress update on our Machine Learning Project. This post is about data augmentation: what it is, what it is used for, and why we chose to use it. Read on for the answers!

As a rule of thumb, the more data available, the better a model trains and the better it tends to perform. Who wouldn’t want that? Sometimes, though, the dataset is too small, or the samples are not distributed evenly across the classes, and the performance of the model suffers as a result. Data augmentation addresses this problem: it increases the number of samples by generating synthetic data from the existing dataset.

As we mentioned in last week’s post, our dataset contains 37k images in total, split into training and validation sets, and it has 7 classes. The chart below shows how the data is distributed across those 7 classes.

As you can see in the graph above, the data is not evenly distributed across the classes. The disgust class in particular has fewer pictures than the others, and this imbalance is likely to hurt the performance of the model. To counter it, we need to augment the data. With data augmentation, for example through the rotation_rate parameter, we can use copies of a photo rotated to different angles as additional data. This gives us several variations of the same image and so enlarges the dataset. Isn’t that a smart approach?
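
To make the rotation idea concrete, here is a minimal sketch using only NumPy. It is not our project code. The helper names (`rotate_image`, `augment_minority_classes`), the 48x48 toy images, the "happy" and "sad" labels, the 15-degree limit, and the choice to top up every class to the size of the largest one are all assumptions made for illustration. Libraries such as Keras offer the same kind of transformation through the `rotation_range` argument of `ImageDataGenerator`.

```python
from collections import Counter

import numpy as np


def rotate_image(image, angle_deg):
    """Rotate a 2-D grayscale image about its center.

    Uses nearest-neighbor sampling; pixels that fall outside the
    original image are filled with zeros.
    """
    h, w = image.shape
    theta = np.deg2rad(angle_deg)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    # Coordinates of every output pixel, measured from the image center.
    ys, xs = np.indices((h, w))
    y, x = ys - cy, xs - cx
    # Inverse mapping: for each output pixel, locate its source pixel.
    src_x = np.rint(np.cos(theta) * x + np.sin(theta) * y + cx).astype(int)
    src_y = np.rint(-np.sin(theta) * x + np.cos(theta) * y + cy).astype(int)
    inside = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    rotated = np.zeros_like(image)
    rotated[inside] = image[src_y[inside], src_x[inside]]
    return rotated


def augment_minority_classes(images, labels, max_angle=15.0, seed=0):
    """Add randomly rotated copies until every class matches the largest one.

    Balancing to the largest class is an illustrative assumption,
    not necessarily the strategy used in the project.
    """
    rng = np.random.default_rng(seed)
    counts = Counter(labels)
    target = max(counts.values())
    new_images, new_labels = list(images), list(labels)
    for cls, count in counts.items():
        pool = [img for img, lab in zip(images, labels) if lab == cls]
        for _ in range(target - count):
            source = pool[rng.integers(len(pool))]
            angle = rng.uniform(-max_angle, max_angle)
            new_images.append(rotate_image(source, angle))
            new_labels.append(cls)
    return new_images, new_labels


# Toy data: random 48x48 arrays stand in for photos (hypothetical sizes and
# labels), with "disgust" deliberately under-represented.
demo_rng = np.random.default_rng(42)
labels = ["happy"] * 6 + ["sad"] * 4 + ["disgust"] * 1
images = [demo_rng.random((48, 48)) for _ in labels]

aug_images, aug_labels = augment_minority_classes(images, labels)
print("before:", Counter(labels))
print("after: ", Counter(aug_labels))
```

Small angles are used here because a face rotated by a few degrees still looks like the same expression, while a large rotation could produce images the model would never see in practice.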

We will continue to share our progress in the next post. Stay tuned, and see you next week!
