Machine Learning-Powered, Pairwise Ranking of Reviews at 1mg (Part Two)

Shaurya Uppal
Tata 1mg Technology
5 min read · Dec 24, 2019

PART ONE Link: https://medium.com/1mgofficial/machine-learning-powered-pairwise-ranking-of-reviews-at-1mg-part-one-b06f03474e26

A short recap of our overall workflow.

Overall Workflow

In our previous blog post, we discussed how we filtered, preprocessed, and extracted relevant features from our review data, which we later used for our review ranking system.

Picking up from where we left off, here’s a quick look at what we will be discussing in this blog:
Section V elucidates the methodology used for pairwise ranking. Section VI discusses the classification model used to validate the pairwise ranking outcome.
Section VII presents the results of the experiments and, finally, the conclusions reached.

V. Pairwise Ranking Model Training

Ranking is a hard problem, even for humans. It is easy to classify whether a review is useful (informative) or not; however, ranking reviews on the basis of usefulness is a complex task. Our ranking methodology is based on this simple intuition:

We apply a pairwise ranking approach to rank reviews in a semi-supervised learning setup. The pairwise ranking approach looks at a pair of documents at a time in the loss function and predicts their relative ordering. The objective is not to determine an absolute relevance score, but to decide which document is more relevant than the other; this relative judgement captures the preference of one review over another.
In the semi-supervised learning setup, a mapping is constructed between inputs and outputs, and these input-output pairs are used to train the model.

Review Segregation: We segregated the reviews into two sets on which we train our model.
Set 0 represents reviews with label 0, i.e., reviews that are not informative. These include reviews about delivery, customer support, packaging, etc., which do not describe the product.
Set 1 represents reviews with label 1, i.e., reviews that are informative and are better than all reviews in Set 0.

How we segregated and determined labels for reviews:

Our entire review ranking system is based on the idea that it is easier for humans to binary-classify reviews into two sets, which we call Set 0 and Set 1, than to rank them.
For around 10 different products, each having 70+ reviews, we asked 10 different people to label each review as 1 (informative) or 0 (not informative). The products were Evion (Vitamin E), Neurobion Forte (Vitamin B tablets), shampoo, Accu-Chek (blood sugar test strips), Shelcal tablets, neem face wash, Omega-3 capsules, a BP monitor, face cream, and amla juice. Having multiple independent annotators kept the labels free of individual bias so that the model could learn as well as possible.

Building the training set:
We pairwise compared each review of Set 1 with every review of Set 0, and vice versa:

  • (Rx, Ry, 1) where x ∈ Set 1 and y ∈ Set 0 → Rx is better than Ry
  • (Ry, Rx, 0) where x ∈ Set 1 and y ∈ Set 0 → Ry is worse than Rx
This now becomes a classification problem.

Classification Problem
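To make the construction of these training pairs concrete, here is a minimal Python sketch. The `extract_features` helper is only a placeholder for the feature pipeline from Part One (assumed to return a fixed-length numeric vector per review); the pairing and labelling logic follows the scheme above.

```python
import numpy as np

def build_pairwise_dataset(set1_reviews, set0_reviews, extract_features):
    """Pair every Set 1 (informative) review with every Set 0 review in
    both orders: (Rx, Ry) -> label 1 and (Ry, Rx) -> label 0."""
    X, y = [], []
    for rx in set1_reviews:                      # x ∈ Set 1
        fx = extract_features(rx)
        for ry in set0_reviews:                  # y ∈ Set 0
            fy = extract_features(ry)
            X.append(np.concatenate([fx, fy]))   # Rx is better than Ry
            y.append(1)
            X.append(np.concatenate([fy, fx]))   # Ry is worse than Rx
            y.append(0)
    return np.array(X), np.array(y)
```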

Using the Trained Model to Rank Reviews:

  • For a given product, we compared each review (Ri) with every other review and obtained a prediction of whether Ri is better than the other review or not.
  • The Review Score is computed as Total #Wins / Total #Comparisons.
  • Reviews are then sorted in descending order of this Review Score (a sketch of this scoring loop follows below).
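A minimal sketch of the scoring loop, assuming a scikit-learn-style classifier trained on the pairwise dataset above and the same hypothetical `extract_features` helper:

```python
import numpy as np

def rank_reviews(reviews, model, extract_features):
    """Score each review by its win rate against every other review of the
    same product, then sort by that score (best first)."""
    feats = [extract_features(r) for r in reviews]
    scores = []
    for i, fi in enumerate(feats):
        wins, comparisons = 0, 0
        for j, fj in enumerate(feats):
            if i == j:
                continue
            pair = np.concatenate([fi, fj]).reshape(1, -1)
            wins += int(model.predict(pair)[0] == 1)   # Ri predicted better than Rj
            comparisons += 1
        scores.append(wins / comparisons if comparisons else 0.0)
    order = np.argsort(scores)[::-1]                   # descending Review Score
    return [(reviews[i], scores[i]) for i in order]
```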

VI. Classification Problem Experiments

To predict which review in a pair (Review1, Review2) is better than the other, we pairwise compared the two reviews and experimented with different classification models.

Experiments

Classifier Accuracy

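For illustration only, a model comparison of this kind can be sketched as below; the two classifiers shown (logistic regression and gradient boosting) are stand-ins, not necessarily the exact models reported in the accuracy table above.

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def compare_classifiers(X, y):
    """Train candidate classifiers on the pairwise dataset and report
    their held-out accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)
    candidates = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "gradient_boosting": GradientBoostingClassifier(),
    }
    return {
        name: accuracy_score(y_test, clf.fit(X_train, y_train).predict(X_test))
        for name, clf in candidates.items()
    }
```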

Accuracy of Ranking Methodology

  • After sorting the reviews by the review score, we wanted all reviews in Set 1 to be ranked above all reviews of Set 0.
  • To test this hypothesis, we developed the following ranking metric (a small worked example follows this list).
  • Let the number of 1s (Set 1 reviews) in a product's dataset be x.
    Ranking Accuracy on a Single Product = (Number of 1s found in the first x positions) / x
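The worked example, assuming the 0/1 labels of a product's reviews are listed in their ranked order:

```python
def ranking_accuracy(sorted_labels):
    """sorted_labels: 0/1 labels of a product's reviews in ranked order."""
    x = sum(sorted_labels)         # number of informative (Set 1) reviews
    if x == 0:
        return 0.0
    return sum(sorted_labels[:x]) / x

# 4 informative reviews in total; 3 of them land in the top 4 positions.
print(ranking_accuracy([1, 1, 0, 1, 1, 0, 0]))  # 0.75
```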

VII. Results, Conclusion and Future Improvement

Once we ensured that, for the 10 selected labeled products, the classification and ranking accuracies were satisfactory, we scaled this model to rank 60K+ reviews for 1800+ products [sample result screenshots for unseen products (Gulab Jal and soap) are attached below].

Our methodology helped us rank good/informative reviews above irrelevant reviews.

Results for Ranked Reviews for Gulab Jal
Results for Ranked Reviews for Soap

Future Scope
In the near future, we aim to add an 'Upvote'/'Downvote' feature to our ranking system as a means of collecting feedback from our users. This will enable us to make the system more powerful.

Another big challenge we face is grammatical and spelling errors in reviews. For example, users write the word 'awesome' as 'awsme', 'awsom', and so on. To rank such reviews better, we have yet to test adding a readability score to our model.
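As a rough sketch of that idea, a readability score could be computed with the textstat package; this is only a candidate feature we have not yet validated in the ranking model.

```python
import textstat  # pip install textstat

def readability_score(review_text):
    # Flesch reading ease: higher values indicate easier-to-read text.
    return textstat.flesch_reading_ease(review_text)

print(readability_score("This face wash is awsme, totally worth it"))
```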

Is there anything you would like to add? Please feel free to share your views and feedback in the comments section below.

