Machine Learning-Powered, Pairwise Ranking of Reviews at 1mg (Part One)

Shaurya Uppal
Tata 1mg Technology
8 min read · Nov 26, 2019

ABSTRACT:
In the days before the internet, early millennials and the generation before them mostly made purchases based on word of mouth from friends and family. People often had to make do with sub-par products because there was no way of verifying whether those recommendations came from personal experience or were something people had “just heard somewhere”. Now that the internet has spread like wildfire, shopping is mostly done online. But how do you know if the products listed online are worth your money and time? That’s where product reviews come into play: a place where you can tell whether a review comes from a “verified purchase” or just from a random website user. Let’s dive into the machine learning-powered world of reviews and see what goes on behind the screen of your laptop or phone!

E-commerce applications give customers an added advantage: suggestions from other buyers in the form of reviews. Reviews are clearly useful and influential for customers who are about to buy a product. But the enormous volume of reviews also creates a problem, because customers cannot easily pick out the informative ones. This blog tackles that problem. The approach we discuss in detail below ranks reviews by their relevance to the product and pushes irrelevant reviews down.

This work was done in four phases: data preprocessing/filtering, feature extraction, pairwise review ranking, and classification.

The outcome is a list of reviews for a particular product, ranked by relevance using a pairwise ranking approach.

I. INTRODUCTION:

Ranking reviews and showing relevant information to users is a complex problem. Reviews on a product page exist to help customers make a buying decision, but the sheer count of reviews creates a problem of its own and leaves the customer conflicted.

To solve this problem, many e-commerce sites came up with different ways to filter reviews and improve customer satisfaction. Some popular ways to filter reviews are as follows:
- Amazon filters reviews by ‘Top Reviews’, based on review helpfulness score and the reviewer’s profile rank, and by ‘Most Recent’, based on review post time
- Flipkart has four ways to showcase reviews: Most Helpful, Most Recent, Positive First, and Negative First

These methods solve the issue to an extent, but they rely on reviewer sentiment signals (upvotes/downvotes) or statistical inferences (time, rating input, etc.). The market needed a sharper approach: ranking reviews by relevance. So far, only Google has adopted this, in its two popular products Google Play Store and Google Maps.

Google Play Store (1mg app)

Reviews and other implicit feedback from customers showcase their preference towards items. Of all the existing solutions described above, the pairwise ranking technique has produced state-of-the-art results. Pairwise ranking compares the features of every review with those of every other review in the dataset.
Several recent researchers have explored the pairwise ranking method for a number of applications. In 2018, Yu et al. explored multiple pairwise ranking algorithms to capture the differences among multiple pairs of items.

At 1mg we follow the Learning to Rank technique of pairwise ranking to rank reviews by relevance.

The blog is structured in the following manner: Section II presents the workflow of the overall process, designed as a well-defined, high-performance methodology. Section III discusses data preprocessing and the filtering of bad reviews. Section IV covers feature extraction in detail. Section V describes the methodology used for pairwise ranking. Section VI discusses the classification model used to validate the pairwise ranking outcome. Section VII provides a detailed account of the experiment, followed by concluding remarks.

II. Overall Workflow

Overall Workflow

III. Data Preprocessing

We currently serve 1,000+ cities across India. Our users come from all sections of society, understand and write different languages, and vary in literacy. Currently, out of 50k+ reviews (the entire review set collected), about 4% to 8% are in languages other than English or Hinglish.

Stage 1: Filtering reviews by language. Since the number of non-English/non-Hinglish reviews is currently very small, we decided to filter out reviews in languages other than English or Hinglish.
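For illustration, here is a minimal sketch of such a language filter, assuming the langdetect library (the post does not name the production tool). Hinglish is romanized Hindi and is harder to detect automatically; the real pipeline presumably handles it separately, while this sketch simply keeps text detected as English:

```python
from langdetect import detect

def keep_review(text: str) -> bool:
    try:
        # langdetect returns an ISO 639-1 code such as "en" or "hi".
        return detect(text) == "en"
    except Exception:  # raised on empty or undecidable input
        return False

reviews = ["Very effective medicine", "बहुत अच्छी दवा", ""]
print([r for r in reviews if keep_review(r)])
```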

Stage 2: About 7% of reviews were gibberish. To filter out gibberish reviews we used the logic of a Markov chain. In English, you expect that after a ‘q’ you’ll get a ‘u’; a ‘q’ followed by anything other than a ‘u’ happens with very low probability, and hence should be pretty alarming. Normalize the counts in your transition tables so that you have probabilities. Then, for a query, walk through the matrix, compute the product of the transition probabilities you take, and normalize by the length of the query. When the number is low, you likely have a gibberish query (or text in a different language).
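Here is a compact sketch of that idea: train character-transition probabilities on known-good English text, then score each review by its average log transition probability. The inline training corpus and any cutoff are illustrative assumptions; production would use a large corpus and tune the threshold on labeled good/gibberish reviews:

```python
import math

ALPHABET = "abcdefghijklmnopqrstuvwxyz "

def char_pairs(text):
    chars = [c for c in text.lower() if c in ALPHABET]
    return list(zip(chars, chars[1:]))

def train(corpus):
    # Laplace-smoothed transition counts, row-normalized into log-probabilities.
    counts = {a: {b: 1.0 for b in ALPHABET} for a in ALPHABET}
    for a, b in char_pairs(corpus):
        counts[a][b] += 1
    log_prob = {}
    for a, row in counts.items():
        total = sum(row.values())
        log_prob[a] = {b: math.log(n / total) for b, n in row.items()}
    return log_prob

def avg_log_prob(text, log_prob):
    # Length normalization: average log transition probability.
    pairs = char_pairs(text)
    if not pairs:
        return float("-inf")
    return sum(log_prob[a][b] for a, b in pairs) / len(pairs)

model = train("this medicine is very good and quite useful for daily use "
              "the delivery was quick and the product quality is good")
for review in ["works well for cold and cough", "qxzkj wqpzv zzkqx"]:
    print(review, "->", round(avg_log_prob(review, model), 2))
# Reviews scoring well below known-good text get flagged as gibberish;
# the exact cutoff is tuned on labeled samples.
```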

Stage 3: Filtering out profane reviews. We filter profanity in both English and Hinglish.
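A minimal sketch of this stage, using a simple word-list lookup; the PROFANITY set below is a hypothetical stand-in for the curated English and Hinglish lists used in production:

```python
import re

PROFANITY = {"badword", "gaali"}  # hypothetical entries

def is_profane(review: str) -> bool:
    tokens = re.findall(r"[a-z]+", review.lower())
    return any(t in PROFANITY for t in tokens)

reviews = ["Great product", "this is a badword review"]
print([r for r in reviews if not is_profane(r)])
```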

Stage 4: Spelling correction for words where the confidence of the correction is >90%. Methodology: Peter Norvig’s spell corrector. This helps improve the quality of reviews; e.g., the word ‘withut’ is changed to ‘without’.
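Below is a condensed version of Norvig’s corrector (norvig.com/spell-correct.html) with an added confidence gate. The post does not define “confidence”, so treating it as the best candidate’s share of probability mass among the candidates is an assumption, as is the tiny inline word-frequency corpus:

```python
import re
from collections import Counter

# Word frequencies; production would build this from a large corpus.
WORDS = Counter(re.findall(
    r"\w+", "without doubt the medicine works without side effects".lower()))

def P(word):
    return WORDS[word] / sum(WORDS.values())

def known(words):
    return {w for w in words if w in WORDS}

def edits1(word):
    # All strings one edit away: deletes, transposes, replaces, inserts.
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    replaces = [L + c + R[1:] for L, R in splits if R for c in letters]
    inserts = [L + c + R for L, R in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def correct(word, confidence=0.9):
    candidates = known([word]) or known(edits1(word)) or {word}
    best = max(candidates, key=P)
    total = sum(P(c) for c in candidates)
    # Only correct when the best candidate is dominant enough.
    if total and P(best) / total > confidence:
        return best
    return word

print(correct("withut"))  # -> "without" given the toy corpus
```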

Filter Stages

IV. Feature Extraction

Extracting features that cover every necessary property and viewpoint, and measuring them quantitatively, is essential for achieving highly accurate outcomes. Hence, this section discusses all the features extracted from reviews.

  1. Noun Strength (Rn): Nouns are subjects and are considered the most informative part of a language. The number of subjects indicates a review’s importance, because only nouns describe the prime factors of a review (they tell us what the review is about). We ran POS tagging to find the nouns in a review and computed the score as follows (a combined sketch for features 1-3 appears after feature 3):
    Score(Rn) = TFIDF(noun) / TFIDF(all words)
  2. Review Polarity (Rp): Its value lies between -1 and +1 and tells whether a review carries positive or negative sentiment.
Sample Table for (Review, Polarity Score)
Histogram of Review Polarity

3. Review Subjectivity (Rs): Subjectivity measures how far a sentiment lies from objective toward subjective, on a scale from 0 to 1. Objective expressions state facts, while subjective expressions are opinions that describe a person’s feelings. Consider the following expressions:
Bournvita tastes very good with milk: Subjective
Bournvita is brown in color: Objective

Sample Table for (Review, Subjectivity Score)
Histogram of Review Subjectivity
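To make features 1-3 concrete, here is a minimal sketch. The library choices are assumptions, since the post does not name its tools: TextBlob supplies POS tags, polarity, and subjectivity (it needs its NLTK corpora downloaded first), and scikit-learn supplies TF-IDF:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from textblob import TextBlob

corpus = [
    "Bournvita tastes very good with milk",
    "Bournvita is brown in color",
    "delivery was late but the medicine is effective",
]

# TF-IDF fitted over the whole review corpus of the product.
vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(corpus)
vocab = vectorizer.vocabulary_

for i, review in enumerate(corpus):
    blob = TextBlob(review)
    nouns = {w.lower() for w, tag in blob.tags if tag.startswith("NN")}
    row = tfidf[i]
    # Rn: share of the review's TF-IDF mass carried by nouns.
    rn = sum(row[0, vocab[w]] for w in nouns if w in vocab) / row.sum()
    print(
        f"Rn={rn:.2f}",
        f"Rp={blob.sentiment.polarity:+.2f}",     # polarity in [-1, +1]
        f"Rs={blob.sentiment.subjectivity:.2f}",  # subjectivity in [0, 1]
        "|", review,
    )
```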

4. Review Complexity (Rc): Evaluates how rich and complex a review is, in terms of the unique words within the review relative to the unique words across the entire review corpus of a particular product.

Rc = Number of unique words in a Review / Number of unique words in entire Corpus

5. Review Word Length (Rw): The word count of a review.
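Features 4 and 5 are straightforward to compute; a small sketch (the tokenization is a simplifying assumption):

```python
import re

def tokens(text):
    return re.findall(r"[a-z]+", text.lower())

corpus = ["very effective medicine", "effective for cold and cough"]
corpus_vocab = set().union(*(set(tokens(r)) for r in corpus))

for review in corpus:
    rc = len(set(tokens(review))) / len(corpus_vocab)  # review complexity
    rw = len(tokens(review))                           # review word length
    print(review, "->", round(rc, 2), rw)
```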

6. Service Tagger (Rd): The best review is one that talks about the product itself: how it is, how it tastes, what its uses are, and how effective it is. Reviews exist primarily to describe a product. So, a dictionary of words was created to mark reviews that are instead about service, delivery, or customer support.
Every word in a review is fuzzy-matched against the words in this dictionary using Levenshtein distance. Levenshtein distance measures the difference between two sequences and tackles spelling errors in reviews; for example, instead of “My delivery was on time”, a review may be wrongly written as “My dilivery was on time”. Fuzzy matching lets us match both reviews.
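A minimal sketch of this tagger, with a hand-rolled Levenshtein distance; the SERVICE_WORDS dictionary and the distance cutoff are illustrative assumptions:

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

SERVICE_WORDS = {"delivery", "refund", "support", "packaging"}

def is_service_review(review: str, max_dist: int = 2) -> bool:
    words = review.lower().split()
    return any(levenshtein(w, s) <= max_dist
               for w in words for s in SERVICE_WORDS)

print(is_service_review("My dilivery was on time"))  # True: 'dilivery' ~ 'delivery'
```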

7. Compound Score (Rsc): To improve the efficiency of the system, we compute a compound score using VADER (Valence Aware Dictionary and sEntiment Reasoner). VADER is a lexicon and rule-based sentiment analysis tool specifically tuned to sentiments expressed in social media. It can find the sentiment of slang (e.g. SUX!), emoji (😩, 😂), and emoticons ( :), :D ), and it distinguishes capitalized expressions (“I am SAD” and “I am sad” are different expressions).
Rsc ≥ +0.5 (Positive Sentiment)
-0.5 < Rsc < +0.5 (Neutral Sentiment)
Rsc ≤ -0.5 (Negative Sentiment)
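A small sketch of the compound score using the vaderSentiment package, bucketed with the thresholds above:

```python
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

def sentiment_bucket(review: str) -> str:
    rsc = analyzer.polarity_scores(review)["compound"]
    if rsc >= 0.5:
        return "positive"
    if rsc <= -0.5:
        return "negative"
    return "neutral"

for r in ["This medicine SUX!", "Works well, I am happy :D", "Delivered on Monday"]:
    print(r, "->", sentiment_bucket(r))
```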

Miscellaneous: We purposely did not include review ratings as a feature. Including ratings would throw the entire system off, for two reasons:
1. Common confusion between ratings and reviews. For example, someone who rates the product ‘1’ (on a rating scale of 1–5, ‘1’ being the lowest and ‘5’ the highest) writes the review comment ‘very good and useful medicine’.
2. A large portion of reviews from customers are either 5 stars or 1 star.

We are halfway through now in our workflow of the review ranking process.

We now have a filtered/preprocessed set of reviews (50k+ in count) and their features, which will help us rank reviews by relevance.

Next, in Sections V, VI, and VII, we shall discuss how we used pairwise ranking with these extracted features, how we transformed the problem into a classification problem, and the results, conclusion, and future scope.

I hope you enjoyed this blog and learned something! I will be back with Part Two soon. [TO BE CONTINUED…]

Please feel free to share your views and feedback in the comments section below.

References:

  • Sculley, D. (2009). Large Scale Learning to Rank.
  • Seki, Y. (2002). Sentence Extraction by tf/idf and Position Weighting from Newspaper Articles.
  • Yu, R., Zhang, Y., Ye, Y., Wu, L., Wang, C., Liu, Q., & Chen, E. (2018, October). Multiple Pairwise Ranking with Implicit Feedback. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management (pp. 1727–1730). ACM.
  • Liu, T. Y. (2009). Learning to Rank for Information Retrieval. Foundations and Trends® in Information Retrieval, 3(3), 225–331.
