It was an ordinary day in March 2018 when three of my batchmates and I decided to submit an idea for the Mozofest Hackathon, to be held at our college, SRM Institute of Science and Technology. After a brief discussion we settled on an idea for the AI track. Since most of us were working on Machine Learning and Computer Vision, we decided to build an application that combined several concepts from ML and CV.

Our plan was to combine a recommendation system with a face recognition model that detects the user’s mood. Based on this mood, the recommendation system suggests songs, books, poems and films to the user. We also integrated a Neural Style Transfer module (more on that later), which transforms the user’s image into the style of a painting chosen according to the detected mood.

Face Detection Module :

This first part of the system is where I spent the most time. I used a convolutional neural network, trained on a local 1050 Ti GPU, with a dataset of around 13k images split 90:10 into training and testing sets. First, we used OpenCV and a pre-trained Haar Cascade to draw a rectangle around each face and extract that region for further processing.

Haar Cascade run on an Image with Multiple Faces

After this, the face recognition module passes the cropped face to the convolutional neural network (CNN). The CNN is trained in batches and optimized using the Adam optimizer. Given an image, it outputs a probability for each of the eight classes it was trained on: Neutral, Anger, Surprise, Happiness, Disgust, Fear, Contempt and Sadness.

Because our training set contained many more happy and neutral examples than examples of the other emotions, the model was somewhat skewed towards these two classes.

Convolutional Model :

The following diagram illustrates the convolution operation. Each filter slides across the image, and at every position the overlapping values are multiplied element-wise and summed. Stacked with pooling layers, as in a typical CNN, these operations reduce the image size and condense the data.

A typical CNN. Our model had the eight classes listed above instead of dog, cat, etc.
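
To make the idea concrete, here is a small NumPy sketch of a single convolution followed by 2x2 max pooling. It only illustrates the operation; the filter values, the toy 6x6 "image" and the pooling step are assumptions, and a real CNN learns its filters during training.

```python
import numpy as np


def convolve2d(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Multiply the patch by the filter element-wise, then sum.
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)
    return out


def max_pool2x2(feature_map):
    """Keep the largest value in each non-overlapping 2x2 block."""
    h, w = feature_map.shape
    trimmed = feature_map[:h - h % 2, :w - w % 2]  # drop an odd last row/column
    return trimmed.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))


image = np.arange(36, dtype=float).reshape(6, 6)     # toy 6x6 "image"
edge_filter = np.array([[1.0, -1.0], [1.0, -1.0]])   # illustrative filter
features = convolve2d(image, edge_filter)            # shape (5, 5)
pooled = max_pool2x2(features)                       # shape (2, 2)
```

The 6x6 input shrinks to 5x5 after the convolution and to 2x2 after pooling, which is the "condensing" effect described above.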

This condensed data is used at a later stage as input to a fully connected (dense) layer, which outputs a probability for each class using a softmax function.
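
As a rough sketch of that last step, the snippet below applies a dense layer and a softmax to a flattened feature vector. The feature size of 128 and the random weights are stand-ins for illustration; in the real model the weights come from training.

```python
import numpy as np

EMOTIONS = ["Neutral", "Anger", "Surprise", "Happiness",
            "Disgust", "Fear", "Contempt", "Sadness"]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / np.sum(exp)


def classify(features, weights, bias):
    """Dense layer followed by softmax; returns (label, probabilities)."""
    probs = softmax(features @ weights + bias)
    return EMOTIONS[int(np.argmax(probs))], probs


rng = np.random.default_rng(0)
features = rng.normal(size=128)       # flattened CNN output (hypothetical size)
weights = rng.normal(size=(128, 8))   # random stand-in for trained weights
bias = np.zeros(8)
label, probs = classify(features, weights, bias)
```

The mood passed on to the rest of the system is simply the class with the highest probability.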

Recommendation System :

Based on the CNN’s prediction, we go through a corpus and select items that match the detected mood. These items are varied and include music, films and poems, among others. For the hackathon we built this on a hastily assembled corpus; with a large enough database to draw from, the recommendation system could instead be built using Machine Learning.
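
A corpus lookup of this kind can be sketched in a few lines. Everything below is hypothetical: the corpus structure, the placeholder titles, the fallback to "Neutral" for moods without an entry, and the function name `recommend` are assumptions, not our actual data or code.

```python
import random

# Hypothetical hand-built corpus; every title is a placeholder.
CORPUS = {
    "Happiness": {"songs": ["upbeat_song_1", "upbeat_song_2"],
                  "films": ["comedy_film_1"],
                  "poems": ["cheerful_poem_1"]},
    "Sadness": {"songs": ["mellow_song_1"],
                "films": ["drama_film_1"],
                "poems": ["wistful_poem_1"]},
    "Neutral": {"songs": ["ambient_song_1"],
                "films": ["documentary_film_1"],
                "poems": ["nature_poem_1"]},
}
DEFAULT_MOOD = "Neutral"  # assumed fallback for moods missing from the corpus


def recommend(mood, per_category=1, seed=None):
    """Pick up to `per_category` random items of each kind for `mood`."""
    rng = random.Random(seed)
    items = CORPUS.get(mood, CORPUS[DEFAULT_MOOD])
    return {category: rng.sample(titles, min(per_category, len(titles)))
            for category, titles in items.items()}
```

For example, `recommend("Happiness")` returns one song, one film and one poem from the "Happiness" entry.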

Neural Style Transfer :

In layman’s terms NST is like applying a filter to an image, as in Snapchat or Instagram.

Style Transferred Images. The first column represents the filters and the subsequent rows correspond to different images these filters are applied to.

Technically, we use two separate quantities: the style loss and the content loss. The style loss measures how far the image’s style is from that of the filter image, and the content loss measures how much of the original content is given up to achieve that style. By balancing the two, we get an image that keeps its content but takes on the style of another.
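
The two losses can be sketched in NumPy as follows. This follows the common Gram-matrix formulation of style loss, which is an assumption about the details rather than a description of our exact module. The inputs are feature maps of shape (channels, height, width), normally taken from a pre-trained network, and the weights `content_weight` and `style_weight` are illustrative values.

```python
import numpy as np


def gram_matrix(features):
    """Channel-by-channel correlations of a (channels, height, width) map."""
    c, h, w = features.shape
    flat = features.reshape(c, h * w)
    return flat @ flat.T / (c * h * w)


def content_loss(generated, content):
    """Mean squared difference between the two feature maps."""
    return np.mean((generated - content) ** 2)


def style_loss(generated, style):
    """Mean squared difference between the two Gram matrices."""
    return np.mean((gram_matrix(generated) - gram_matrix(style)) ** 2)


def total_loss(generated, content, style, content_weight=1.0, style_weight=1e3):
    """Weighted sum that trades content fidelity against style match."""
    return (content_weight * content_loss(generated, content)
            + style_weight * style_loss(generated, style))
```

Raising `style_weight` pushes the result towards the painting, while raising `content_weight` keeps it closer to the original photo.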

We use it in a binary setting. In our case, we have Edvard Munch’s famous painting The Scream for the sad/neutral moods and another painting for the happy mood.

More on Neural Style Transfer here :

Video Demonstration of our Model :

Code on Github :