5 mins Recommender systems: Neural Collaborative Filtering

Omar Tafsi
7 min read · Jun 2, 2024


Introduction

Recommender systems are essential in our digital world, guiding users to relevant content based on their preferences. Traditional methods like matrix factorization have been crucial but struggle with the growing complexity of user behavior. Enter Neural Collaborative Filtering (NCF), a model-based technique that harnesses deep neural networks to improve recommendations by modeling intricate, nonlinear user-item interactions.

NCFs rely on two main neural network building blocks: embeddings and the multi-layer perceptron (MLP). To fully grasp the workflow of NCF, we first need to define these components. Feel free to skip the next two sections if you're already familiar with the concepts.

Multi-Layer Perceptron (MLP)

MLPs and deep neural networks are a broad topic that deserves a dedicated article; in this section we only give a brief, intuitive introduction.

Let’s take a small break from recommendations and movies and imagine you’re trying to teach a computer to recognize handwritten digits. You start by showing it lots of examples, each labeled with the correct digit. Now, how does the computer learn to recognize these digits? That’s where MLPs come in.

Think of an MLP as a series of layers, like stacking transparent sheets. Each layer contains neurons, which are like little decision-makers. The first layer takes in the raw data — pixel values in our example — and passes it to the next layer. Each neuron in the next layer combines these inputs in different ways, learning to recognize patterns. This process continues through multiple layers, with each layer learning more complex patterns.

Finally, the last layer gives us the output — in our case, the recognized digit. But here’s the clever part: during training, the MLP adjusts its internal parameters — weights and biases — so that it gets better at recognizing the correct digits. It does this by comparing its predictions to the actual labels and tweaking those parameters to minimize the difference.

So, in essence, MLPs are like digital artists, learning to see and recognize patterns in data by adjusting their internal workings based on feedback.
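
As a minimal sketch (assuming the standard MNIST digits dataset that ships with Keras, not anything from this article), a digit-recognition MLP could look like this:

import tensorflow as tf
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Sequential

# Load the standard MNIST digits (28x28 grayscale images, labels 0-9)
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0  # scale pixels to [0, 1]

# A small MLP: flatten the pixels, two hidden layers, one output unit per digit
model = Sequential([
    Flatten(input_shape=(28, 28)),
    Dense(128, activation='relu'),
    Dense(64, activation='relu'),
    Dense(10, activation='softmax'),
])

# Training adjusts the weights and biases to minimize the gap between predictions and labels
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=3, validation_split=0.1)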

For a more detailed explanation, you can check out this series of videos.

Embeddings

Embeddings are low-dimensional, dense vector representations of categorical variables or entities such as words, users, or items. These representations are learned in the same way as the weights and biases of the neurons inside an MLP, allowing the model to capture semantic relationships and similarities between different entities. Embeddings condense high-dimensional, sparse data into a continuous vector space, facilitating better generalization, improved model performance, and more efficient computation. They are widely used in natural language processing, recommendation systems, and various other machine learning tasks to encode meaningful information about entities in a compact form.
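
For intuition, here is a tiny sketch (with made-up sizes) of a Keras Embedding layer mapping integer IDs to dense vectors:

import numpy as np
from tensorflow.keras.layers import Embedding

# Map 1,000 possible user IDs to dense 10-dimensional vectors (sizes are arbitrary here)
user_embedding = Embedding(input_dim=1000, output_dim=10)

# Look up the vectors for three user IDs; the values are learned during training
vectors = user_embedding(np.array([3, 42, 7]))
print(vectors.shape)  # (3, 10)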

How NCF Works

At the heart of NCF lies the multi-layer perceptron (MLP). Unlike traditional methods that rely on handcrafted features and linear models, NCF learns directly from the raw interaction data, allowing it to capture nonlinear relationships between users and items.

The framework begins with an input layer representing users and items, followed by an embedding layer that projects these inputs into a dense vector space. These embeddings serve as the latent features of users and items, capturing their underlying characteristics. The neural collaborative filtering layers then use a series of hidden layers to learn the intricate interactions between users and items, culminating in the prediction of user-item interaction scores.

Here are the simple steps of this method:

  1. Data Representation: Represent user-item interactions in a matrix format (a small sketch of this step follows the list).
  2. Initialization: Initialize embedding tables that encode user and item IDs as dense latent vectors.
  3. Training: Train the neural network model on the interaction data, adjusting parameters iteratively.
  4. Validation: Validate the model’s performance on a separate validation set to monitor for overfitting and ensure generalization.
  5. Prediction: Use the trained model to make predictions on new user-item pairs to generate recommendations.
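
As a minimal sketch of step 1 (with made-up ratings), user-item interactions can be stored either as a dense matrix or, more practically, as (user, item, rating) triples:

import numpy as np

# Toy interaction matrix: rows are users, columns are items, 0 means "not rated"
ratings = np.array([
    [5, 0, 3, 0],
    [0, 4, 0, 1],
    [2, 0, 0, 5],
])

# Equivalent sparse representation: one (user_id, item_id, rating) triple per observed interaction
users, items = np.nonzero(ratings)
triples = np.stack([users, items, ratings[users, items]], axis=1)
print(triples)
# [[0 0 5]
#  [0 2 3]
#  [1 1 4]
#  [1 3 1]
#  [2 0 2]
#  [2 3 5]]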

Here’s a graph to illustrate these steps:

As you may have noticed, matrix factorization can be easily generalized and extended through the NCF framework: replace the MLP part with a simple block that takes the user and item embeddings as input and outputs their dot product (or, more generally, their element-wise product followed by a learned linear layer).
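
Here is a minimal sketch of that matrix-factorization-style variant, using the same Keras building blocks as the implementation below and purely hypothetical sizes:

from tensorflow.keras.layers import Input, Embedding, Flatten, Multiply, Dense
from tensorflow.keras.models import Model

num_users, num_items, embedding_dim = 1000, 500, 10  # hypothetical sizes

user_input = Input(shape=(1,))
item_input = Input(shape=(1,))
user_vec = Flatten()(Embedding(num_users, embedding_dim)(user_input))
item_vec = Flatten()(Embedding(num_items, embedding_dim)(item_input))

# Element-wise product of the embeddings, followed by one learned weight per dimension;
# with fixed unit weights this reduces to the classic matrix-factorization dot product
product = Multiply()([user_vec, item_vec])
output = Dense(1)(product)

mf_model = Model(inputs=[user_input, item_input], outputs=output)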

Python implementation

import numpy as np
import pandas as pd
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
from tensorflow.keras.layers import Input, Embedding, Flatten, Concatenate, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

class NeuralCF:
    def __init__(self, num_users, num_items, embedding_dim=10, hidden_layers=[64, 32], activation='relu', learning_rate=0.001):
        self.num_users = num_users
        self.num_items = num_items
        self.embedding_dim = embedding_dim
        self.hidden_layers = hidden_layers
        self.activation = activation
        self.learning_rate = learning_rate

    def _build_model(self):
        # Input layer: one integer ID per user and per item
        user_input = Input(shape=(1,))
        item_input = Input(shape=(1,))
        # Embedding layer: project the IDs into dense latent-feature vectors
        user_embedding = Flatten()(Embedding(self.num_users, self.embedding_dim)(user_input))
        item_embedding = Flatten()(Embedding(self.num_items, self.embedding_dim)(item_input))
        # Neural CF layers: the MLP learns the user-item interaction from the concatenated embeddings
        vector = Concatenate()([user_embedding, item_embedding])
        for units in self.hidden_layers:
            vector = Dense(units, activation=self.activation)(vector)
        # Output layer: a single linear unit predicting the explicit rating
        # (the original NCF paper uses a sigmoid output for implicit, binary feedback)
        output = Dense(1)(vector)
        return Model(inputs=[user_input, item_input], outputs=output)

    def train(self, X_train, y_train, epochs=10, batch_size=10, validation_split=0.1):
        X_train = [X_train[:, 0], X_train[:, 1]]
        y_train = np.array(y_train)
        model = self._build_model()
        model.compile(optimizer=Adam(learning_rate=self.learning_rate), loss='mean_squared_error')
        model.fit(X_train, y_train, epochs=epochs, batch_size=batch_size, validation_split=validation_split)
        self.model = model

    def predict(self, X_test):
        X_test = [X_test[:, 0], X_test[:, 1]]
        return self.model.predict(X_test)

# Hyperparameters that could be tuned
embedding_dim = 10
hidden_layers = [64, 32]
activation = 'relu'
learning_rate = 0.001

# We use a subset of the Netflix rating dataset found here:
# https://www.kaggle.com/datasets/rishitjavia/netflix-movie-rating-dataset?select=Netflix_Dataset_Rating.csv
df = pd.read_csv("/content/Netflix_Dataset_Rating.csv").iloc[:1000, :]
X = df[['User_ID', 'Movie_ID']].to_numpy()
y = df['Rating'].to_numpy()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Size the embedding tables by the maximum raw ID; in practice you would remap IDs to a contiguous range
num_users = df['User_ID'].max() + 1
num_items = df['Movie_ID'].max() + 1

# Train and predict using NeuralCF
ncf = NeuralCF(num_users, num_items, embedding_dim, hidden_layers, activation, learning_rate)
ncf.train(X_train, y_train)
y_pred = ncf.predict(X_test)

# Evaluate using Mean Absolute Error
mae = mean_absolute_error(y_test, y_pred)
print("Mean Absolute Error:", mae)

As a reminder, we’re planning a dedicated article comparing all the RS methods we have introduced, testing each one against a large dataset and measuring several training and serving metrics (MAP@K, time to train, time to infer, ease of integrating new users/items, ease of parallelization, etc.).

Advantages of NCF

This family of models brings the following important properties to recommender systems:

  1. Non-Linearity: Unlike traditional collaborative filtering methods like matrix factorization, NCFs employ neural networks, allowing them to capture non-linear relationships between users and items. This enables NCF to model complex user-item interactions more accurately, leading to better recommendation performance.
  2. Flexibility: NCFs offer flexibility in modeling various types of data and interactions. The architecture can be customized with multiple layers and different activation functions, capturing diverse patterns in user behavior and preferences.
  3. Scalability: Neural networks are highly scalable and can handle large-scale datasets efficiently. NCFs can process massive amounts of user-item interactions and learn from implicit feedback, making them suitable for real-world recommendation systems deployed in large-scale online platforms. Moreover, advancements in hardware and distributed computing further enhance the scalability of NCF models, allowing them to serve millions of users and items with low latency.

Issues with NCF

Now, let’s discuss the three biggest shortcomings of the Neural Collaborative Filtering method:

  1. Cold Start Problem: Like many recommendation systems, NCF struggles with the cold start problem, where it’s challenging to provide accurate recommendations for new users or items with limited interaction data. Since NCF relies on historical interactions between users and items to make predictions, it may not perform well when there’s insufficient data for new entities.
  2. Data Sparsity: In real-world scenarios, user-item interaction data is often sparse, meaning that most users have only interacted with a small subset of items. This data sparsity can lead to challenges in learning accurate user and item embeddings, potentially resulting in suboptimal recommendations.
  3. Interpretability and Transparency: NCF models are often seen as “black boxes,” meaning their internal workings are not easily interpretable. This lack of transparency can be problematic for understanding why certain recommendations are made, which is crucial for trust and user satisfaction. This issue limits the ability to provide explanations for recommendations, which is increasingly important for user trust and regulatory compliance (e.g., GDPR). It also makes it harder for developers to diagnose and improve model performance effectively.

These shortcomings highlight the need for ongoing research and development to address the challenges associated with collaborative filtering methods like NCF.

Conclusion

As the digital landscape continues to evolve, the demand for personalized recommendation systems will only grow. In this context, NCF emerges as a groundbreaking approach, reshaping the way we perceive collaborative filtering and unlocking new possibilities for delivering tailored recommendations to users worldwide. With its ability to learn from raw interaction data and capture complex user-item relationships, NCF is a strong performer in the field of recommendation systems.
