Improving product recommendation systems using sentiment analysis

Aakash Goel
Data Science at Microsoft
8 min read · Jul 5, 2022


When customers are looking to make an online purchase, an important factor involved in their search process is how quickly they are able to locate an item they want to buy. The use of recommendation systems is an important way to help link customers with what they are looking for.

Recommendation systems learn from customers as they interact online and suggest products that customers are likely to find valuable among all the available alternatives.

Problem statement

It is commonplace for electronic commerce platforms to ask their users to rate and review the products they offer. Users typically check these product reviews for positive responses or higher ratings from other purchasers, to help determine whether a product is considered good to buy, or for negative responses or lower ratings to help determine whether a product is not considered a good buy. As a result, both ratings (numbers) and reviews (text) can be used to generate better product recommendations instead of relying only on ratings. In this way, a defining characteristic of better product recommendations is that whatever products are suggested by the recommendation system should have positive sentiment as well.

How to approach the problem?

When both reviews and ratings are available, there are three steps involved in recommending products to users.

  • Step 1: Build the product recommendation system using a User-User–based approach.
  • Step 2: Build sentiment analysis using logistic regression.
  • Step 3: Connect the dots — improve product recommendations using sentiment analysis.
Figure 1: Steps to follow in approaching the product recommendation problem.

Recommendation systems filter data using different algorithms and recommend the most relevant items to a wide variety of users. They do this by using a user’s past behavior on a platform to estimate which products that user is most likely to buy. In this way, recommendation systems present an opportunity not only to generate more revenue, but also to personalize the user journey.

There are many algorithms for implementing recommendation systems, including ones that are content based, collaborative, a hybrid of content and collaborative, ones that use matrix factorization, and more. As the objective of this article is to focus on one type of recommendation algorithm and see how it can be improved using sentiment analysis, I will use the User-User–based recommendation algorithm for product recommendations. The data supplied to the algorithm is user-product rating data and the programming language is Python.

Step 1: User-User–based recommendation system

People who have shared the same interests in the past (or, in our case, who have liked the same products) are likely to have similar interests in the future. In this way, similar users are likely to have similar tastes. Say there are two users, Ram and Shyam; Ram likes the set of products {P1, P2, P3} and Shyam likes the set of products {P1, P2, P4}. We can see already that Ram and Shyam are similar, as they have products P1 and P2 in common. This also means that we can recommend product P4 to Ram, as it is already liked by Shyam.
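As a minimal sketch of the Ram and Shyam example, the snippet below uses plain Python sets. The Jaccard overlap used here is only a simple stand-in chosen for illustration; it is not the adjusted cosine similarity used later in this article, and the function names are hypothetical.

```python
# Liked-product sets taken from the Ram/Shyam example in the text.
likes = {
    "Ram": {"P1", "P2", "P3"},
    "Shyam": {"P1", "P2", "P4"},
}


def jaccard(a, b):
    """Share of products liked by both users among all products liked by either."""
    return len(a & b) / len(a | b)


def recommend_from_peer(user, peer, likes):
    """Products the peer likes that the user has not liked yet."""
    return sorted(likes[peer] - likes[user])


similarity = jaccard(likes["Ram"], likes["Shyam"])        # 2 shared out of 4 distinct products
suggestions = recommend_from_peer("Ram", "Shyam", likes)  # what Shyam likes and Ram has not seen
print(similarity, suggestions)
```

The same idea scales up once the sets are replaced by numeric ratings and the overlap measure by a rating-aware similarity.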

Figure 2: Steps in generating recommendations using a User-User–based approach.

Consider the following code:

Figure 3: Sample data (Product Rating).

Now, let’s prepare the data that will be used for modeling and evaluation. This includes the following steps:

  • Dividing the data into training and testing sets (see section 3.1 in the code).
  • Applying a pivot operation on the training data set such that the column header contains prod_name and the row header contains userId, with the values being the rating (see section 3.2 in the code).
  • Creating dummy training and test sets. The dummy training set will be used to predict ratings for products that the user has not rated, and so it excludes products already rated by the user. The dummy test set will be used for evaluation, and so it restricts predictions to products the user has rated (see section 3.3 in the code).
Figure 4: Sample user X product matrix.
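The three preparation steps above can be sketched with pandas as follows. This is an illustration under assumptions, not the notebook’s code: the column names userId, prod_name, and rating follow the text, while the toy values, the 70/30 split ratio, and the variable names are invented here.

```python
import pandas as pd

# Toy ratings; the values are invented for illustration.
ratings = pd.DataFrame({
    "userId":    ["u1", "u1", "u2", "u2", "u3", "u3", "u3", "u4"],
    "prod_name": ["A",  "B",  "A",  "C",  "B",  "C",  "D",  "D"],
    "rating":    [5,    3,    4,    2,    4,    5,    1,    3],
})

# 1. Split into training and test sets (the 70/30 ratio is an assumption).
train = ratings.sample(frac=0.7, random_state=42)
test = ratings.drop(train.index)

# 2. Pivot: rows are users, columns are products, values are ratings.
train_pivot = train.pivot_table(index="userId", columns="prod_name", values="rating")

# 3a. Dummy training set: 1 where the user has NOT rated the product, 0 otherwise.
#     Multiplying predictions by this mask keeps only unrated products.
dummy_train = train_pivot.isna().astype(int)

# 3b. Dummy test set: 1 only where the user HAS rated the product,
#     so that evaluation compares predictions against known ratings.
test_pivot = test.pivot_table(index="userId", columns="prod_name", values="rating")
dummy_test = test_pivot.notna().astype(int)

print(train_pivot)
print(dummy_train)
```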

We need a mechanism to define similarities from user to user. Cosine similarity is one such measure, and adjusted cosine similarity is a refinement of it. The issue is that different users rate on different scales, with some users tending to give high ratings and others tending to give low ratings. To account for this, we subtract each user’s average rating from that user’s ratings for different products before computing the cosine similarity.

Figure 5: Adjusted cosine similarity (User-User–based approach).
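For readers without access to the figure, here is a small sketch of adjusted cosine similarity between users, assuming a user × product matrix in which NaN marks an unrated product. Treating unrated products as 0 after mean-centering is a common simplification and an assumption here; the function name and data are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical user x product matrix; NaN marks a product the user has not rated.
ratings = pd.DataFrame(
    [[5, 4, np.nan, 1],
     [4, 5, 2, np.nan],
     [1, np.nan, 5, 4]],
    index=["u1", "u2", "u3"],
    columns=["A", "B", "C", "D"],
    dtype=float,
)


def adjusted_cosine_similarity(matrix):
    """Cosine similarity between users after subtracting each user's mean rating."""
    # Subtract each user's own mean rating (NaNs are skipped when averaging).
    centered = matrix.sub(matrix.mean(axis=1), axis=0)
    # Unrated products become 0, i.e. "neutral" once ratings are centered.
    values = centered.fillna(0.0).to_numpy()
    norms = np.linalg.norm(values, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid dividing by zero for users with flat ratings
    unit = values / norms
    return pd.DataFrame(unit @ unit.T, index=matrix.index, columns=matrix.index)


user_sim = adjusted_cosine_similarity(ratings)
print(user_sim.round(2))
```

In this toy data, u1 and u2 both prefer A and B, so their similarity is positive, while u3 has roughly the opposite taste to u1 and gets a negative value.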

Now, to predict the rating that a user would give to each product p that the user has not rated, we calculate the weighted average of the ratings given to p by peer users, with each peer’s rating weighted by that peer’s similarity to the user.

Figure 6: Predict ratings (User-User–based approach).
Figure 7: Output of code above (recommendation without using sentiment).
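The weighted-average prediction described above can be sketched as follows. The similarity matrix is made up for the example, and two choices are assumptions of this sketch rather than confirmed details of the notebook: negative similarities are clipped to zero, and a user is excluded from their own peer group.

```python
import numpy as np
import pandas as pd

users, products = ["u1", "u2", "u3"], ["A", "B", "C"]

# Hypothetical ratings (NaN = not rated) and a made-up user-user similarity matrix.
ratings = pd.DataFrame(
    [[5, 4, np.nan],
     [4, np.nan, 2],
     [np.nan, 5, 1]],
    index=users, columns=products, dtype=float,
)
user_sim = pd.DataFrame(
    [[1.0, 0.8, 0.2],
     [0.8, 1.0, 0.0],
     [0.2, 0.0, 1.0]],
    index=users, columns=users,
)


def predict_ratings(ratings, user_sim):
    """Similarity-weighted average of peer ratings, for unrated products only."""
    sim = user_sim.clip(lower=0).to_numpy().copy()  # ignore negatively correlated peers
    np.fill_diagonal(sim, 0.0)                      # a user is not their own peer
    rated = ratings.notna().to_numpy().astype(float)
    filled = ratings.fillna(0.0).to_numpy()
    weighted_sum = sim @ filled   # sum of similarity * rating over peers who rated p
    weight_total = sim @ rated    # sum of similarity over peers who rated p
    pred = np.divide(weighted_sum, weight_total,
                     out=np.full_like(weighted_sum, np.nan),
                     where=weight_total > 0)
    pred = pd.DataFrame(pred, index=ratings.index, columns=ratings.columns)
    return pred.where(ratings.isna())  # keep predictions only where no rating exists


predicted = predict_ratings(ratings, user_sim)
print(predicted)
```

Sorting each user’s row of `predicted` in descending order then gives that user’s candidate recommendations.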

Now, let’s evaluate using the RMSE (root mean squared error) to gauge the recommendations being made. We will evaluate for products already rated by the user instead of making predictions for products not rated by the user.
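As an illustration, RMSE over the products a user has actually rated can be computed as below. The helper name and the sample numbers are hypothetical; NaN marks entries with no known rating, which are skipped.

```python
import numpy as np


def rmse(actual, predicted):
    """Root mean squared error over entries where both values are known."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    mask = ~np.isnan(actual) & ~np.isnan(predicted)
    return float(np.sqrt(np.mean((actual[mask] - predicted[mask]) ** 2)))


# Made-up ratings: NaN in `actual` means the user never rated that product.
actual = [[5, np.nan, 3],
          [np.nan, 4, 1]]
predicted = [[4, 2.5, 3],
             [3.0, 2, 2]]

score = rmse(actual, predicted)
print(round(score, 4))
```

A lower RMSE means the predicted ratings are, on average, closer to the ratings users actually gave.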

Now, let’s save the model so that it can be used later on along with the sentiment model.
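One simple way to persist such an artifact is Python’s pickle module, sketched below. The dictionary stands in for whatever object the notebook actually saves (for example, the predicted-ratings matrix), and the file name is hypothetical.

```python
import os
import pickle
import tempfile

# Stand-in for the real artifact; contents are invented for illustration.
model_artifact = {
    "users": ["u1", "u2"],
    "products": ["A", "B"],
    "predicted_ratings": [[4.2, 3.1], [2.5, 4.8]],
}

# Hypothetical file name, written to a temporary directory for this example.
path = os.path.join(tempfile.mkdtemp(), "user_user_recommender.pkl")

with open(path, "wb") as f:
    pickle.dump(model_artifact, f)

# Later (for example, in Step 3), load it back.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored["predicted_ratings"])
```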

Step 2: Sentiment analysis

Sentiment analysis is a natural language processing (NLP) technique that is used to determine whether text expresses a positive, negative, or neutral sentiment. Based on the product reviews contained in our data, we can build a machine learning model that gives the corresponding sentiment for each of the products contained in the data.

To build a model that can predict sentiment, we need to perform the following steps:

1. Import libraries and load data: We must have labeled data as we are solving a supervised text classification problem, i.e., given a review of a product, predict sentiment. 1 refers to positive sentiment and 0 refers to negative sentiment, meaning there are only two classes of data (positive and negative sentiment).

Figure 8: Sample labeled data for sentiment analysis.

2. Data pre-processing: This involves cleaning and standardizing text, making it noise-free and ready for analysis. This includes steps like lowercasing, removing punctuation and stopwords, lemmatization, and noise removal (such as whitespace or digits), which have already been performed on this data. Note: The code for this data pre-processing step is not included in this article.

3. Data preparation: This involves dividing the data into a training set (70 percent of the data) and a test set (30 percent of the data). Our data is imbalanced, with 88 percent of it containing positive review sentiment and only 12 percent containing negative review sentiment, so we need to balance it. We use an oversampling technique on the training data: data points are selected at random from the minority class with replacement (i.e., the same data point can be selected multiple times) and added to the training dataset.

Figure 9: Distribution of positive and negative reviews (making the training data balanced).
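Random oversampling of the minority class can be sketched with pandas as follows. The toy data, column names, and random seed are assumptions for illustration; only the training split is oversampled, as described above.

```python
import pandas as pd

# Toy training set: 8 positive (1) and 2 negative (0) reviews.
train = pd.DataFrame({
    "review": [f"review {i}" for i in range(10)],
    "sentiment": [1] * 8 + [0] * 2,
})

majority = train[train["sentiment"] == 1]
minority = train[train["sentiment"] == 0]

# Draw extra minority rows WITH replacement until both classes are the same size.
extra = minority.sample(n=len(majority) - len(minority), replace=True, random_state=42)

# Add the extra rows to the training data and shuffle.
balanced = (
    pd.concat([train, extra])
    .sample(frac=1, random_state=42)
    .reset_index(drop=True)
)

class_counts = balanced["sentiment"].value_counts()
print(class_counts)
```

Because the added rows are copies of existing minority reviews, the test set must stay untouched so that evaluation reflects the real class distribution.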

4. Feature engineering: Because machine learning models work only with numbers, we must convert the review text into numbers. We use TfidfVectorizer, which converts a collection of raw documents (here, reviews) into a matrix of TF-IDF features. TF-IDF assigns a numeric weight to each word in a document, thereby quantifying the importance or relevance of that word to the document. TF is term frequency (the number of times a term appears in a document), and IDF is inverse document frequency, which decreases as the number of documents containing the term increases; it therefore penalizes the importance of a term that is common across many documents. We save the TfidfVectorizer to a pkl file so that it can later be used in inference.
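To make the TF-IDF weighting concrete, here is a small sketch using scikit-learn’s TfidfVectorizer with its default settings. The three reviews are invented; the point is only that a word appearing in more documents receives a lower IDF weight.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Invented, already-cleaned reviews.
reviews = [
    "great product works great",
    "terrible product broke fast",
    "great value",
]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(reviews)  # sparse matrix: documents x vocabulary terms

# Map each term to its learned IDF weight.
vocab = vectorizer.vocabulary_         # term -> column index
idf = {term: vectorizer.idf_[col] for term, col in vocab.items()}

# "product" appears in two reviews, "terrible" in only one,
# so "terrible" is treated as more distinctive.
print(round(idf["product"], 3), round(idf["terrible"], 3))
print(X.shape)
```

At inference time, the same fitted vectorizer should be applied with `transform` rather than `fit_transform`, so that new reviews are mapped onto the vocabulary learned from the training data.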

5. Building the ML model: We use logistic regression and feed the output of feature engineering (the document × feature matrix) into the model. As our data contains only two classes (positive and negative sentiment), logistic regression estimates the probability that a review has positive sentiment. In logistic regression, the logit (the logarithm of the odds, where the odds are the probability of success divided by the probability of failure) is modeled as a linear function of the features. After training, the model is evaluated using the F1 score, calculated on both the training and test data. Finally, we save the model to a pkl file so that we can later use it in inference.

Figure 10: Evaluation of sentiment model on train and test data.
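A minimal sketch of this training and evaluation step, assuming scikit-learn, is shown below. The reviews and labels are invented and far too few for a real model, and `max_iter=1000` is an arbitrary choice; the snippet only illustrates the flow from TF-IDF features to an F1 score.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

# Invented reviews: 1 = positive sentiment, 0 = negative sentiment.
train_reviews = [
    "love this great product", "excellent quality very happy",
    "great value works well", "happy with this purchase",
    "terrible quality broke quickly", "awful waste of money",
    "poor product very disappointed", "broke after one day terrible",
]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]
test_reviews = ["great quality very happy", "terrible waste broke"]
test_labels = [1, 0]

# Fit the vectorizer on training text only, then reuse it for the test text.
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_reviews)
X_test = vectorizer.transform(test_reviews)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, train_labels)

# F1 on both splits, as described in the text.
train_f1 = f1_score(train_labels, clf.predict(X_train))
test_f1 = f1_score(test_labels, clf.predict(X_test))

# Probability of positive sentiment for each test review.
proba = clf.predict_proba(X_test)[:, 1]
print(train_f1, test_f1, proba.round(2))
```

A large gap between the training and test F1 scores would suggest overfitting, which is one reason to report both.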

Obviously, this is not a demonstration of the best sentiment classification model. For even better options, hyperparameter tuning and some advanced models can also be used such as Linear SVM, RandomForest, XGBoost, Deep Learning–based models, and more.

Step 3: Connecting the dots

In this step, we see how the recommendations from Step 1 can be improved using sentiment analysis from Step 2 on the reviews given by users to the recommended products. Basically, the sentiment analysis model helps us to fine-tune the recommendations that we get from the recommender system.

1. We start by loading the product review data and both trained models (recommendation and sentiment classification) from Step 1 and Step 2, respectively.

Figure 11: Sample product review data.

2. Now we generate a product’s ranking score with the formula (W1 * predicted rating of recommended product + W2 * normalized sentiment score of recommended product on a scale of 1–5) and use it to rank and sort product recommendations, or to filter them down to the number of recommendations we want to show. In this way, the higher the product’s ranking score, the better the product’s rating and reviews. A scale of 1–5 is used for the sentiment score because ratings use the same scale. As users usually give more weight to reviews than to ratings, we have assigned W1 = 1 and W2 = 2 (i.e., double weighting is given to reviews).
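The ranking formula can be sketched as follows. The weights W1 = 1 and W2 = 2 come from the text; everything else is an assumption for illustration, in particular the way the sentiment score is normalized here (the share of a product’s reviews predicted positive, mapped linearly from 0–1 onto 1–5), which may differ from the notebook.

```python
import pandas as pd

W1, W2 = 1, 2  # weights from the article: reviews count double

# Hypothetical candidates from Step 1 with sentiment summaries from Step 2.
candidates = pd.DataFrame({
    "prod_name": ["A", "B", "C"],
    "predicted_rating": [4.5, 4.0, 3.5],  # output of the recommender
    "positive_share": [0.40, 0.90, 0.75],  # share of reviews predicted positive
})

# Assumed normalization: map the 0-1 positive share onto the 1-5 rating scale.
candidates["sentiment_score"] = 1 + 4 * candidates["positive_share"]

# Ranking score = W1 * predicted rating + W2 * normalized sentiment score.
candidates["ranking_score"] = (
    W1 * candidates["predicted_rating"] + W2 * candidates["sentiment_score"]
)

ranked = candidates.sort_values("ranking_score", ascending=False).reset_index(drop=True)
top_n = ranked.head(2)  # keep only as many recommendations as we want to show
print(ranked[["prod_name", "ranking_score"]])
```

In this toy data, product A has the highest predicted rating but drops to last place once its weaker review sentiment is taken into account.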

Typical results are as follows:

Figure 12: Output of improved product recommendation using sentiment analysis.


We have seen that after applying sentiment analysis to the products suggested by a recommendation system, the order of recommendations changes because both ratings and reviews contribute to the final recommendation, and the products ultimately recommended are those that carry a positive sentiment score.

The code from this article is contained in the following Jupyter Notebook:

Please give a clap to this article if it has helped you. I also welcome your feedback in the Comments section below.

Aakash Goel is on LinkedIn.