What’s in a neural network recommender

dimalvovs · Published in The Startup · Jun 18, 2019

The Item-Based Collaborative Filtering (IBCF) algorithm received a “test of time” acknowledgement in 2017, a year after AlphaGo, Google’s neural network (NN) that learned to play Go, defeated one of the strongest human Go players, Lee Sedol.

At work I always love to choose simple algorithms over more complicated ones when both perform similarly: simpler algorithms take less time to run and to implement. But given all the hype around NNs and deep learning in particular, and the availability of packages (read: tensorflow) that make it easy to take an NN model all the way from training to production, I cannot restrain myself from finding out how a basic NN autoencoder performs at recommendations.

Let’s see how a simple autoencoder (a neural network having as many inputs as outputs) competes with the classic IBCF algorithm.

We will use the MovieLens 100K dataset in implicit mode, meaning that the recommender will predict, based on the known movie ratings per user, which other movies a person would watch (or, strictly speaking, rate).

To make the dataset correspond to the implicit representation, the ratings are converted to 1 in case there was any rating and 0 otherwise. The term “interaction” is used below to describe the fact that a user has rated a given movie.
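As an illustration, the binarization might look like this in R (a minimal sketch; the `ratings` user-item matrix with NAs for unrated movies is an assumed starting point, not the repo’s actual code):

```r
# Hypothetical sketch: convert explicit ratings to implicit interactions.
# `ratings` is assumed to be a user x item matrix with NA where a user
# has not rated a movie (all MovieLens ratings are >= 1).
interactions <- ifelse(!is.na(ratings), 1, 0)
```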

The data is split into train and test sets, 80% and 20% respectively.
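A possible sketch of the split, assuming a split by user (the actual logic lives in makeRecommendlabTrainSets()):

```r
# Hold out 20% of users for testing (assumed splitting scheme).
set.seed(1)
n_users     <- nrow(interactions)
train_users <- sample(n_users, size = round(0.8 * n_users))
train <- interactions[train_users, , drop = FALSE]
test  <- interactions[-train_users, , drop = FALSE]
```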

Model performance is assessed by hiding one interaction from each user’s set of interactions, outputting the top k recommendations, and evaluating the average hit rate at k (HR@k), HR = TP / P. In our setting P is always 1, since there is just one hidden interaction per user, and TP may be 1 or 0 depending on whether the hidden interaction is a member of the recommended set of k items or not.

The value of the metric varies depending on which item is hidden, so the assessment was repeated 10 times for each model and the average HR@k is reported.
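In code, HR@k for a single user reduces to a membership check, averaged over users and over the 10 repetitions (a minimal sketch, not the project’s actual helpers):

```r
# Hit rate at k for one user: TP / P with P = 1, so it is 1 when the
# hidden interaction is among the top-k recommendations, else 0.
hit_at_k <- function(hidden_item, recommended_items) {
  as.integer(hidden_item %in% recommended_items)
}

# Averaged over all users (recommendations is a list of top-k vectors):
# hr_at_k <- mean(mapply(hit_at_k, hidden_items, recommendations))
```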

Several helper functions (available here) are used to format data, assess performance, and make predictions:

  • makeRecommendlabTrainSets() formats the data as a user-item binary matrix and splits it into train and test sets;
  • maskInteraction() drops one of a user’s interactions so that it can be reconstructed later with a model, which is useful for assessing model performance (see the sketch after this list);
  • assessGuess() compares the actual dropped interaction with the list of interactions proposed by a model: IBCF, NN, or a simple prediction with the k most frequently occurring movies;
  • netPredictOne() is a wrapper function for making one prediction with the NN model; a bit redundant for this article, used as-is from my other project.
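For intuition, the masking step could be sketched roughly like this (a hypothetical reimplementation over a binary user-item matrix, not the actual maskInteraction()):

```r
# Hide one random interaction per user and remember which one,
# so a model's top-k guess can be checked against it later.
mask_one_interaction <- function(mat) {
  hidden <- rep(NA_integer_, nrow(mat))
  for (u in seq_len(nrow(mat))) {
    seen <- which(mat[u, ] == 1)
    if (length(seen) == 0) next                 # nothing to hide
    drop <- seen[sample.int(length(seen), 1)]   # safe even for one item
    mat[u, drop] <- 0
    hidden[u] <- drop
  }
  list(masked = mat, hidden = hidden)
}
```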

For the classic algorithm, the item-item collaborative filtering implementation from the R package recommenderlab is used with all default settings; for the NN model, a simple autoencoder with one hidden layer of 1/3 the input size, trained with keras. To compare both models against the simplest alternative, a most-frequent-interactions model (MFM) was also assessed. All models predicted the top k=10 interactions.
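Under these settings, the two models can be set up roughly as follows (a sketch with assumed variable names, epochs, and batch size; the linked repo has the exact code):

```r
library(recommenderlab)
library(keras)

# IBCF from recommenderlab with all default settings.
ibcf <- Recommender(as(train, "binaryRatingMatrix"), method = "IBCF")
ibcf_top10 <- predict(ibcf, as(test, "binaryRatingMatrix"), n = 10)

# Autoencoder: as many inputs as outputs, one hidden layer of
# size 1/3 of the input size, trained with keras.
n_items <- ncol(train)
autoencoder <- keras_model_sequential() %>%
  layer_dense(units = round(n_items / 3), activation = "relu",
              input_shape = n_items) %>%
  layer_dense(units = n_items, activation = "sigmoid")

autoencoder %>% compile(loss = "binary_crossentropy", optimizer = "adam")
autoencoder %>% fit(train, train, epochs = 30, batch_size = 32)

# Reconstructed outputs score unseen items; the top 10 per user are the
# recommendations (already-seen items would be filtered out first).
scores <- predict(autoencoder, test)
```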

The complete code for reproducible results is available here.

The NN model was the most accurate, yielding an HR@10 of 0.22: twice that of the IBCF model (0.11) and almost three times that of the most-frequent-interactions model (0.08).

The IBCF model’s recommendation accuracy corresponds to the accuracy obtained on the same dataset by others. The MFM had the lowest accuracy, as expected.

In our small experiment comparing NN and IBCF models for implicit recommendations, the NN performed surprisingly well, much better than the classic IBCF model. However, no data cleansing took place at all, and no alternative distance measures were tried for the IBCF model. Also, no hyperparameter tuning took place, except for the size of the hidden layer, where 1/3 of the input size had worked best in previous experience.

When training the NN model, the test and validation sets were the same, as the dataset is quite small; however, not using the test set during training and stopping when the training error no longer improved yielded the same results.

Usually I am quite sceptical about using neural networks to learn on anything other than machine-generated data (image/video/sound), but given how easy it is to feed auxiliary input data to neural networks, and thereby possibly improve recommendation quality further, it looks like neural networks have every chance of becoming even better recommenders.
