Recommendation engine algorithm— Collaborative filtering

Dhanoop Karunakaran
Intro to Artificial Intelligence
6 min readAug 14, 2023
Collaborative filtering. Source: [1]

Today, we can see recommendation engines that enhance the user experience on websites such as Amazon, Netflix, etc. It provides the recommendation for the user on what should you watch, listen to, or buy based on the user history and content. This is one of the application of ML that we use day to day basis. In this article, we consider building a recommendation engine for movies as an example.

The simple linear regression approach

We approach the recommendation engine as a linear regression problem. We can formalize the problem in a linear regression model, y = wx+b. In the case of a movie recommendation problem, we can modify the model as below:

Model for Linear regression-based recommendation engine

To learn the w and b parameters, we need to define the cost function similar to the regular linear regression model.

The cost function to compute w & b

Now we can use gradient descent to optimise the model. After the model is learned, we have learned parameters, w & b for each user. For instance, w1 & b1 are parameters for user 1. Now, we can use it to predict the user’s likely rating of movies. Then, we can feed the highly rated movies to the users as recommendations.

Collaborative filtering

We can use linear regression when we have features, but what if there are no features of items(eg. movie features) available? In that case, we can use the collaborative algorithm to learn these features to provide recommendations. The collaborative algorithm is based on the same concept as the simple linear regression approach. The difference here is, now, we need to learn three parameters (w, x, b) instead of two parameters (w and b) as the movie's features (x) are not available. As the x is not provided, we can say that we predicting the recommendation of the particular user based on reactions by similar users.

Now we can define the cost function to learn parameters, w, b, & x as shown below. Before defining that, imagine, we have already learned w & b and the parameter x to be learned. In this case, we can determine the cost function with regularisation terms as below:

The cost function to learn feature vector x when w & b given

Similarly, we can write the cost function to learn parameters w & b given feature vector x. This equation is the same as in the linear regression approach we discussed earlier. In this case, we are adding regularisation terms to make the equation complete.

The cost function to learn w & b when the feature vector, x is given. It is the same as the equation explained in the simple linear regression approach.

Looking at both equations, we can understand that everything is the same except the regularisation term. Now we can combine both equations to build a cost function and learn three parameters, w, b, and x.

The combined cost function to learn w, b, and x parameters

Once the cost function is defined, we can employ gradient descent to optimise. There is a slight change in the gradient descent algorithm with collaborative filtering. As we are learning three parameters, we need to add x as the third parameter in the gradient descent algorithm. The pseudocode of gradient descent for collaborative filtering is shown below:

Gradient descent algorithm for learning three parameters w, b, and x in the collaborative algorithm.

Implementation based on Python

I have created a docker-based implementation for the algorithm and published it on GitHub. Please remember we tested our codebase on Ubuntu, so some of the code is more suitable for Ubuntu.

  1. Download the dataset using the Ubuntu terminal.
rm -rf ml-latest-small*
wget https://files.grouplens.org/datasets/movielens/ml-latest-small.zip
unzip ml-latest-small.zip

2. Load the dataset to pandas data frames

df_movies = pd.read_csv('ml-latest-small/movies.csv')
df_ratings = pd.read_csv('ml-latest-small/ratings.csv')
df_ratings = df_ratings.drop(columns=['timestamp'])

3. Modify data frames to create the variables

# Create a matrix with user id as rows, movieIds as columns, and ratings as values
ratings = df_ratings.pivot(index='movieId', columns='userId', values='rating')
ratings = ratings.fillna(0)

# Copying the the rating df for creating r(i,j) matrix
df_r_ratings = df_ratings.copy()

#To replace the value, we can use the code mentioned here: https://stackoverflow.com/questions/49161120/set-value-of-one-pandas-column-based-on-value-in-another-column
df_r_ratings.loc[df_r_ratings['rating'] > 0, 'rating'] = 1

# Create a r(i,j) matrix with user id as rows, movieIds as columns, and 0 or 1 as values
r_ratings = df_r_ratings.pivot(index='movieId', columns='userId', values='rating')

# Replacing the NaN value with zero, so when we apply our cost function, NaN contents are not used for computing the cost.
r_ij = r_ratings.fillna(0)

num_movies = 30
Y = tf.constant(ratings.copy().values[:num_movies,])
R = tf.constant(r_ij.copy().values[:num_movies,])
num_users = Y.shape[1]
num_features = 5

4. Setup the trainable parameters

# Set Initial Parameters (W, X), use tf.Variable to track these variables
tf.random.set_seed(1234) # for consistent results
W = tf.Variable(tf.random.normal((num_users, num_features),dtype=tf.float64), name='W')
X = tf.Variable(tf.random.normal((num_movies, num_features),dtype=tf.float64), name='X')
b = tf.Variable(tf.random.normal((1, num_users), dtype=tf.float64), name='b')

# Instantiate an optimizer.
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-1)

# Print the parameters.
print("Y", Y.shape, "R", R.shape)
print("X", X.shape)
print("W", W.shape)
print("b", b.shape)
print("num_features", num_features)
print("num_movies", num_movies)
print("num_users", num_users)
print(Y)
print(R)

5. Define the cost function

def cost_function(X, W, b, Y, R, lambda_):
"""
Returns the cost for the content-based filtering
Args:
X (ndarray (num_movies,num_features)): matrix of item features
W (ndarray (num_users,num_features)): matrix of user parameters
b (ndarray (1, num_users): vector of user parameters
Y (ndarray (num_movies,num_users): matrix of user ratings of movies
R (ndarray (num_movies,num_users): matrix, where R(i, j) = 1 if the i-th movies was rated by the j-th user
lambda_ (float): regularization parameter
Returns:
J (float): Cost
"""
j = (tf.linalg.matmul(X, tf.transpose(W)) + b - Y)*R
J = 0.5 * tf.reduce_sum(j**2) + (lambda_/2) * (tf.reduce_sum(X**2) + tf.reduce_sum(W**2))
return J

6. Create Tensorflow’s custom training loop to build the collaborative filtering algorithm.

iterations = 200
lambda_ = 1
for iter in range(iterations):

# Use TensorFlow’s GradientTape
# to record the operations used to compute the cost
with tf.GradientTape() as tape:

# Compute the cost (forward pass included in cost)
cost_value = cost_function(X, W, b, Y, R, lambda_)

# Use the gradient tape to automatically retrieve
# the gradients of the trainable variables with respect to the loss
grads = tape.gradient( cost_value, [X,W,b] )

# Run one step of gradient descent by updating
# the value of the variables to minimize the loss.
optimizer.apply_gradients( zip(grads, [X,W,b]) )

# Log periodically.
if iter % 20 == 0:
print(f"Training loss at iteration {iter}: {cost_value:0.1f}")

7. Make a prediction

# Make a prediction using trained weights and biases
p = np.matmul(X.numpy(), np.transpose(W.numpy())) + b.numpy()
print(p.shape)

# prediction of the first user in the list
my_predictions = p[:,0]
print(my_predictions)

The credit to the overall content goes to Andrew Ng’s Machine learning specialization course in Coursera. So, you may see some similarities if you had attended the course.

If you like my write-up, follow me on Github, Linkedin, and/or Medium profile.

Reference

  1. https://www.analyticsvidhya.com/blog/2022/02/introduction-to-collaborative-filtering/

--

--