Now it’s time to dive into the deep end of deep neural networks for recommender engines.
Part 1 of this series regaled us with the insightful origin story of the startup heycar and its ambitious journey to find the optimal recommender system. For those who weren’t with us for part 1, we covered a few key fundamentals: the creation of a base model, A/B testing, and the creation of our first collaborative filtering model using LightFM.
In part 2 of this three-part series, we continue this harrowing tale and find ourselves not in the realm of libraries and quick solutions, but instead in the mystical world of Deep Learning.
Why deep learning?
Recommending “items” (in our case, vehicle listings) to people is an extremely challenging task for a few simple reasons:
- Noise: historical behaviour is an inherently noisy signal of satisfaction or interest.
- Scale: many existing recommender algorithms work well on small problems, but struggle to scale, both in serving recommendations and in handling the sheer quantity of events (millions of events).
- Freshness: being able to recommend new content to existing users and existing content to new users, better known as the “cold start” problem (which will be covered in part three of this series).
As discussed in part 1, we had a basic collaborative filtering model which generalised a positive interaction between our users and listings based on the mere fact that a user interacted with a listing in some way or another. This is a little like saying the best country at the Olympic Games was the one that entered the most athletes: more athletes does improve the chances of more gold, but it does not account for the quality of the athletes. We decided that there must be a way to get something more meaningful out of our users’ interactions.
Fortunately for us, one of the key advantages of Deep Neural Networks is that arbitrary categorical and continuous features can easily be added to the model. This was great news for us because we could now take into account each user’s unique preferences towards any number of listed features. Furthermore, another distinct advantage of Deep Neural Networks is their ability to predict an outcome based on a sequence of events, such as a user’s change of preferences over time. With all this sorcery and enough buzzwords to get you through a tech conference, how do you go about creating a Deep Neural Recommender?
The look before the leap
Before we went crazy with Deep Neural Networks, we decided to first look into the data we had collected about our users’ interactions with our listings. Like Big Brother, we see everything… well, not quite, but we do collect intuitive data about how users interact with our listings. We decided to use this information to quantise the interactions: for example, a user merely viewing a listing is not necessarily a sign of a positive interaction and should therefore be scored lower than an interaction involving contacting a dealer.
Now that we had a measure for our users’ interactions, we needed to determine which features we wanted to use in our model. The process of feature extraction involved a deep look into the similarities between features and analysing their impact on what we knew were positive interactions.
Last but not least we had to normalise everything. Neural networks require numeric values as input, so we set out to convert all non-numeric values through enumerations and grouped other data such as price or mileage into manageable buckets.
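These two normalisation steps, enumerating categorical values and bucketing continuous ones, can be sketched in plain Python. The bucket edges and example values below are illustrative assumptions:

```python
def enumerate_values(values):
    """Map each distinct non-numeric value to an integer id."""
    mapping = {}
    for v in values:
        mapping.setdefault(v, len(mapping))
    return mapping

def bucketise(value, edges):
    """Return the index of the first bucket whose upper edge exceeds value."""
    for i, edge in enumerate(edges):
        if value < edge:
            return i
    return len(edges)  # overflow bucket for values past the last edge

makes = ["audi", "bmw", "audi", "vw"]
print(enumerate_values(makes))  # {'audi': 0, 'bmw': 1, 'vw': 2}

price_edges = [5_000, 10_000, 20_000, 50_000]  # illustrative price buckets (EUR)
print(bucketise(7_500, price_edges))  # 1
```

Bucketing prices or mileage this way keeps rare extreme values from dominating and gives each bucket enough examples to learn from.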
Next, we decided on a framework to help us engineer the solution. We planned on searching high and low, but a few quick Google searches led us to the answer: Keras, a framework built on top of Google’s popular TensorFlow library, was our choice, as it is easy to implement, well documented and has a vast, growing developer community.
A plunge into the deep end
We continued our journey towards applying Deep Neural Networks for recommendations by taking the logic from what we already knew from collaborative filtering and asking the question: “what about latent features?”
To answer this question we created a base, featureless model which took into account only users, listings and their now-quantised interactions, to create what we thought would become our candidate generation model. The idea behind the candidate generation model was that we could quickly replicate the LightFM model to produce groups of listings which we could then process further through a recommendation model, not unlike re-inventing the wheel.
We soon discovered that the candidate generation model was not actually necessary, but it provided a great starting point for us to begin adding in our listing features. One by one we added features, trained and then tested our models until we no longer observed any valuable improvements. After adding our features, we found ourselves with a very shallow neural network, little more than a splash pool compared to the ocean we are looking to create.
The next thing we did was add embeddings for each of our features. Embeddings, although not exclusive to neural networks, are another component that makes them extremely powerful. Embedding layers act like boxes, in which each distinct feature value that passes through is cleverly grouped with other similar variations of that feature. The groupings created by embeddings allow us to reduce the dimensionality of the input data, making it more manageable, and allow us to view similarities between different variations of a feature.
Lastly, it was time to add some depth to our deep neural network through the use of hidden layers. Not to bore you with technical specifics: hidden layers are simply layers which sit between the input and output layers, taking in a set of weighted inputs and producing an output value depending on the activation function applied to the layer.
It may seem quite complicated, but it really isn’t: the model decides the path for each input value based on the weights and activation functions we give each layer. Finally, we get our deep neural network for recommendations, as seen below.
As you can see in our model, we have distinct embeddings for each input; this helps us reduce dimensionality and also allows for a couple of extra little tricks, which we will cover in part 3. We merge our listings and their features first, and then merge the users and listings. At first we merged the users and listings using the dot merge function; however, after some experimentation, we determined that with a deeper network a concatenation function works better.
Dropout layers are used to prevent overfitting: they merely tell the network to “ignore” a few random neurons during training. Ignoring some neurons helps to prevent neurons from developing a dependency on each other. Much like a lazy person may find an inefficient shortcut to perform a task, dropout teaches the network that it cannot rely on any particular neuron and has to put in maximum effort.
Finally, we get our hidden dense layers, two dense Rectified Linear Unit (ReLU) layers. The ReLU activation function is the most commonly used activation function in the world of neural networks and for our model seemed to work best.
You may be thinking that this is a great way to make recommendations for returning users, but what about new users and listings? In part 3 of this series, we will explore various methods employed to produce meaningful content for both user and listing cold start problems.
By the way, if you want to work with Machine Learning, TensorFlow or any related topics, take a look at our careers page: