Recommender Systems Using RBM


In this post, I will try to explain how to use an RBM to build a recommender system, one of the most successful applications of machine learning and one widely used by web retailers to suggest products to their customers. Netflix recommends TV serials and movies based on what you have watched and what other Netflix users with similar interests have watched. Amazon likewise recommends products a user might be interested in, based on what other customers who purchased the same item have bought. You can read about RBMs in my previous post here and about one of their applications here.

Recommender Systems

There are three major categories of recommender systems:

  • Collaborative Filtering Recommender Systems
  • Content-Based Filtering Recommender Systems
  • Latent Factor Based Filtering Recommender Systems

Collaborative Filtering tries to identify similarity among users based on their past behavior, and then recommends to a user the items that are liked, bought, or rated highly by similar users.

This kind of recommender system can predict items the user might be interested in, even though the user has never expressed explicit interest in them. This is generally called user-user collaborative filtering.

The opposite of user-user collaborative filtering is to find items similar to a given item and recommend items to users who have also liked, bought, or rated other similar items highly. This goes by the name item-item collaborative filtering.
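To make the user-user idea concrete, here is a minimal NumPy sketch. It assumes cosine similarity as the similarity measure and a similarity-weighted average as the prediction rule; both are my illustrative choices, and the toy rating matrix and the helper names `cosine_similarity` and `predict_user_user` are made up for this example.

```python
import numpy as np

# Toy rating matrix (rows: users, columns: items); 0 marks "not rated".
# The numbers are invented purely for illustration.
ratings = np.array([
    [5.0, 4.0, 0.0, 1.0],
    [4.0, 5.0, 5.0, 1.0],
    [1.0, 1.0, 2.0, 5.0],
])

def cosine_similarity(a, b):
    """Cosine of the angle between two rating vectors (0.0 if either is all zeros)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def predict_user_user(ratings, user, item):
    """Similarity-weighted average of the other users' ratings for `item`."""
    num, den = 0.0, 0.0
    for other in range(ratings.shape[0]):
        # Skip the target user and anyone who has not rated the item.
        if other == user or ratings[other, item] == 0:
            continue
        sim = cosine_similarity(ratings[user], ratings[other])
        num += sim * ratings[other, item]
        den += abs(sim)
    return num / den if den > 0 else 0.0

# User 0 has not rated item 2; user 1 (similar tastes) rated it highly.
predicted = predict_user_user(ratings, user=0, item=2)
```

Item-item collaborative filtering can be sketched the same way by comparing columns instead of rows, for example by passing `ratings.T` and swapping the roles of the two indices.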

Content-Based Filtering relies on hand-coding features for the items based on their content. Based on how a user has rated existing items, a user profile is created, and the ratings provided by the user are assigned to those items.

Content-based filtering involves learning the distinct properties of an item to recommend additional items with similar properties. In content-based filtering, the user is recommended items based on their preferences. This doesn’t involve how other users have rated the items.
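A small sketch of content-based filtering, under my own assumptions: items carry hand-coded genre features, the user profile is the rating-weighted average of the features of rated items, and unrated items are scored by cosine similarity to that profile. The feature values and the helper names `build_user_profile` and `score_items` are hypothetical.

```python
import numpy as np

# Hand-coded item features (columns: comedy, action, romance); illustrative values.
item_features = np.array([
    [1.0, 0.0, 0.0],   # item 0: comedy
    [0.0, 1.0, 0.0],   # item 1: action
    [1.0, 0.0, 1.0],   # item 2: romantic comedy
    [0.0, 1.0, 1.0],   # item 3: action romance
])

# One user's ratings; 0 marks "not rated".
user_ratings = np.array([5.0, 1.0, 0.0, 0.0])

def build_user_profile(item_features, user_ratings):
    """Rating-weighted average of the feature vectors of the items the user rated."""
    rated = user_ratings > 0
    weights = user_ratings[rated]
    return weights @ item_features[rated] / weights.sum()

def score_items(item_features, profile):
    """Cosine similarity between every item and the user profile."""
    norms = np.linalg.norm(item_features, axis=1) * np.linalg.norm(profile)
    return item_features @ profile / np.where(norms > 0, norms, 1.0)

profile = build_user_profile(item_features, user_ratings)
scores = score_items(item_features, profile)

# Recommend the best-scoring item among those the user has not rated yet.
unrated = np.flatnonzero(user_ratings == 0)
best_unrated = int(unrated[np.argmax(scores[unrated])])
```

Note that no other user's ratings appear anywhere in this sketch, which is the main difference from collaborative filtering.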

Latent Factor Based Filtering recommendation methods attempt to discover latent features to represent user and item profiles by decomposing the ratings. Unlike the content-based filtering features, these latent features are not interpretable and can represent complicated features. For instance, in a movie recommendation system, one of the latent features might represent a linear combination of humor, suspense, and romance in a specific proportion.

Generally, for already rated items, the rating $r_{ij}$ given by a user $i$ to an item $j$ can be represented as

$$r_{ij} = u_i^T v_j$$

where $u_i$ is the user profile vector based on the latent factors and $v_j$ is the item vector based on the same latent factors.

The diagram above illustrates a latent-factor-based recommendation method, where the rating matrix $R_{m \times n}$ has been decomposed into the product of the user profile matrix $U_{m \times k}$ and the transpose of the item profile matrix $P_{n \times k}$, where $k$ is the number of latent factors of the model.

Based on these profiles, we can recommend items that have so far not been bought by the user by computing the inner product of the user profile and the item profile. The inner product gives a tentative rating that the user might have given had they bought the product.
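The inner-product prediction can be sketched in a few lines of NumPy. The profile matrices `U` and `P` below are hypothetical values with k = 2 latent factors, and `predict_rating` and `recommend` are helper names I made up for illustration.

```python
import numpy as np

# Hypothetical profiles with k = 2 latent factors (values are illustrative).
U = np.array([[1.0, 0.5],
              [0.2, 1.0]])    # m = 2 users, one row per user
P = np.array([[4.0, 0.0],
              [0.0, 4.0],
              [2.0, 2.0]])    # n = 3 items, one row per item

def predict_rating(U, P, i, j):
    """Tentative rating of user i for item j: inner product of the two profiles."""
    return float(U[i] @ P[j])

def recommend(U, P, i, already_bought, top_n=1):
    """Indices of the top_n items with the highest predicted rating, skipping bought ones."""
    scores = U[i] @ P.T                       # predicted ratings for all items
    scores[list(already_bought)] = -np.inf    # never recommend what the user already has
    return np.argsort(scores)[::-1][:top_n].tolist()

# Full matrix of predicted ratings, shape (m, n).
R_hat = U @ P.T

# User 0 already bought item 0, so the recommendation comes from items 1 and 2.
top_for_user0 = recommend(U, P, i=0, already_bought={0})
```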

One of the ways these user and item profiles can be created is by performing singular value decomposition (SVD) on the rating matrix after filling in the missing values with some form of mean value across the users and items as appropriate. According to SVD, the rating matrix $R$ can be decomposed as follows:

$$R = U S V^T$$

We can take the user profile matrix as $U S^{1/2}$ and the transpose of the item profile matrix as $S^{1/2} V^T$ to form the latent factor model. You might wonder how to perform SVD when there are missing entries in the rating matrix corresponding to the items that are not rated by the users. Common approaches are to impute the missing ratings with the average rating of the user, or with the global rating average, before performing SVD.
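Here is a minimal sketch of that recipe using `numpy.linalg.svd`. It assumes missing ratings are stored as `np.nan` and imputes them with each user's mean rating, falling back to the global mean for users with no ratings. The function name `svd_latent_factors` and the toy matrix are my own.

```python
import numpy as np

def svd_latent_factors(R, k):
    """Impute missing ratings, then split R into user and item profiles.

    Returns (U S^(1/2), S^(1/2) V^T), both truncated to k latent factors.
    """
    R = np.asarray(R, dtype=float)
    missing = np.isnan(R)
    global_mean = np.nanmean(R)

    # Per-user mean over the ratings that exist; global mean if a user has none.
    counts = (~missing).sum(axis=1)
    sums = np.where(missing, 0.0, R).sum(axis=1)
    user_means = np.where(counts > 0, sums / np.maximum(counts, 1), global_mean)

    # Fill each missing entry with the mean rating of its user.
    R_filled = np.where(missing, user_means[:, None], R)

    U, s, Vt = np.linalg.svd(R_filled, full_matrices=False)
    sqrt_S = np.diag(np.sqrt(s[:k]))
    user_profiles = U[:, :k] @ sqrt_S        # shape (m, k)
    item_profiles_T = sqrt_S @ Vt[:k, :]     # shape (k, n)
    return user_profiles, item_profiles_T

# Toy ratings (illustrative only); np.nan marks "not rated".
R = np.array([[5.0, 4.0, np.nan],
              [4.0, np.nan, 1.0],
              [np.nan, 1.0, 5.0]])

user_profiles, item_profiles_T = svd_latent_factors(R, k=3)
R_hat = user_profiles @ item_profiles_T
```

With `k` equal to the full rank, `R_hat` reproduces the imputed matrix exactly; choosing a smaller `k` gives the compressed latent factor model, and the formerly missing entries of `R_hat` then serve as predictions.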

MovieLens Dataset

I will use a smaller movie rating dataset known as the MovieLens 20M Dataset, provided by GroupLens, a research lab in the Department of Computer Science and Engineering at the University of Minnesota, Twin Cities. The data contains 20,000,263 ratings across 27,278 movies created by 138,493 users from January 9, 1995, to March 31, 2015.

Collaborative Filtering Using RBM

Let’s turn back to RBMs again. Recall that RBMs have two layers: the input layer, which is also known as the visible layer, and the hidden layer. The neurons in each layer communicate with neurons in the other layer but not with neurons in the same layer; there is no intralayer communication among the neurons.

  • In an RBM, the neurons from the visible layer communicate with the neurons from the hidden layer, and then the hidden layer passes information back to the visible layer. RBMs pass information back and forth several times between the visible and hidden layers to develop a generative model such that the reconstructions from the outputs of the hidden layer are similar to the original inputs.
  • In other words, the RBM is trying to create a generative model that will help predict whether a user will like a movie that the user has never seen, based on how similar the movie is to other movies the user has rated and on how similar the user is to the other users that have rated that movie.
  • The visible layer will have X neurons, where X is the number of movies in the dataset. Each neuron will have a normalized rating value from zero to one, where zero means the user has not seen the movie. The closer the normalized rating value is to one, the more the user likes the movie represented by the neuron.
  • The neurons in the visible layer will communicate with the neurons in the hidden layer, which will try to learn the underlying, latent features that characterize the user-movie preferences.
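As a rough sketch of how the visible-layer input could be prepared, the snippet below turns (user, movie, rating) triples into a matrix of normalized ratings, with zero meaning "not seen". It assumes a maximum rating of 5.0 and zero-based user and movie indices; the function name `build_visible_matrix` and the sample triples are made up for illustration.

```python
import numpy as np

def build_visible_matrix(triples, n_users, n_movies, max_rating=5.0):
    """Build a (n_users, n_movies) matrix of ratings scaled into [0, 1].

    Entries stay at 0.0 for movies the user has not rated.
    """
    V = np.zeros((n_users, n_movies))
    for user_idx, movie_idx, rating in triples:
        V[user_idx, movie_idx] = rating / max_rating
    return V

# Illustrative triples: (user index, movie index, star rating).
triples = [(0, 0, 5.0), (0, 2, 3.0), (1, 1, 4.0)]
V = build_visible_matrix(triples, n_users=2, n_movies=3)

# Each row of V is one user's input vector for the visible layer.
first_user_input = V[0]
```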

RBM Net Architecture

  • For our movie-recommender system, we will have an m x n matrix with m users and n movies. We pass a batch of k users with their n movie ratings into the RBM neural network and train for a certain number of epochs.
  • Each input x that is passed into the neural network represents a single user’s rating for all n movies. Therefore, the visible layer has n nodes, one for each movie.
  • We can specify the number of nodes in the hidden layer, which will generally be fewer than the nodes in the visible layer to force the hidden layer to learn the most salient aspects of the original input as efficiently as possible.
  • Each input v0 is multiplied by its respective weight W. The weights are learned by the connections from the visible layer to the hidden layer. Then we add a bias vector at the hidden layer called hb. The bias ensures that at least some of the neurons fire. This W*v0+hb result is passed through an activation function.
  • After this, we will take a sample of the outputs generated via a process known as Gibbs sampling. In other words, the activation of the hidden layer results in final outputs that are generated stochastically. This level of randomness helps build a better-performing and more robust generative model.
  • Next, the output after Gibbs sampling, known as h0, is passed back to the visible layer in the opposite direction; this is known as the backward pass. In the backward pass, the activations from the forward pass after Gibbs sampling are fed into the hidden layer and multiplied by the same weights W as before. We then add a new bias vector at the visible layer called vb.
  • This W*h0+vb is passed through an activation function, and then we perform Gibbs sampling. The output of this is v1, which is then passed as the new input into the visible layer and through the neural network as another forward pass.
  • The RBM goes through several forward and backward passes like this to learn the optimal weights as it attempts to build a robust generative model.
  • By iteratively adjusting the weights of the neural net so that the RBM can find the relationships among input features and determine which features are relevant, the RBM learns to approximate the original data as well as possible.
  • With this learned probability distribution, RBMs are able to make predictions about never-before-seen data. The RBM will attempt to predict ratings for movies that the user has never seen based on the user’s similarity to other users and the ratings those movies have received from the other users.
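The forward pass, Gibbs sampling, and backward pass described above can be sketched in plain NumPy as follows. This is not the code from the accompanying notebook. It assumes a sigmoid activation and a one-step contrastive divergence (CD-1) weight update, which is a common way to train RBMs but is my assumption here, and for simplicity it treats unseen movies (zeros) like any other input. The names v0, h0, v1, W, hb, and vb follow the text; everything else is hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = self.rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
        self.hb = np.zeros(n_hidden)     # hidden-layer bias
        self.vb = np.zeros(n_visible)    # visible-layer bias

    def sample(self, probs):
        """Gibbs sampling step: turn activation probabilities into 0/1 states."""
        return (self.rng.random(probs.shape) < probs).astype(float)

    def forward(self, v):
        """Forward pass: activation of W*v + hb."""
        return sigmoid(v @ self.W + self.hb)

    def backward(self, h):
        """Backward pass: activation of W*h + vb, using the same weights W."""
        return sigmoid(h @ self.W.T + self.vb)

    def train_step(self, v0, lr=0.1):
        """One forward/backward round followed by a CD-1 update (assumed rule)."""
        h0_prob = self.forward(v0)
        h0 = self.sample(h0_prob)
        v1_prob = self.backward(h0)
        v1 = self.sample(v1_prob)
        h1_prob = self.forward(v1)       # v1 goes through another forward pass

        batch = v0.shape[0]
        self.W += lr * (v0.T @ h0_prob - v1.T @ h1_prob) / batch
        self.hb += lr * (h0_prob - h1_prob).mean(axis=0)
        self.vb += lr * (v0 - v1).mean(axis=0)
        return float(np.mean((v0 - v1_prob) ** 2))   # reconstruction error

    def reconstruct(self, v):
        """Predicted normalized ratings for every movie, including unseen ones."""
        return self.backward(self.forward(v))

# Toy batch: 3 users x 4 movies, normalized ratings, 0 = not seen (illustrative).
V = np.array([[1.0, 0.8, 0.0, 0.0],
              [0.8, 1.0, 0.0, 0.2],
              [0.0, 0.0, 1.0, 0.8]])

rbm = RBM(n_visible=4, n_hidden=2)
for epoch in range(200):
    err = rbm.train_step(V)

predicted = rbm.reconstruct(V)
```

The rows of `predicted` hold a score between 0 and 1 for every movie, so the entries at positions where `V` is zero can be read as the model's guesses for movies the user has not seen.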

The accompanying Jupyter notebook for this post can be found here.


Restricted Boltzmann Machines can be used to build a recommender system for item ratings. The RBM recommender system can learn the probability distribution of item ratings for users, given their previous ratings and the ratings of the users to whom they are most similar. The RBM recommender system then uses the learned probability distribution to predict ratings on never-before-seen items.

I hope this article helped you get a basic understanding of how a Restricted Boltzmann Machine (RBM) can be used as an item recommender system.