Wilson Lower Bound Score and Bayesian Approximation for K-Star Scale Ratings to Rate Products

Aditya Kumar
Published in tech-that-works, Jan 8, 2020

If you maintain an online community with many products that users rate based on their experience, at some point you will have to answer questions like these: How do you order products on a page under filters such as highest voted or lowest voted? How do you rate a product based on upvotes and downvotes? How do you score a product that users rate on a K-point scale?

There are some ways you can find a score and rate products:

  1. Score = Average rating of products
  2. Score = Positive rating - Negative rating
  3. Score = Proportion of Positive ratings

Evan Miller’s well-known blog post How Not To Sort By Average Rating explains why simple scores like these are poor ways to rate or sort products.
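The failure modes are easy to reproduce. The sketch below uses a hypothetical helper, naive_scores, and made-up vote counts chosen only for illustration; it is not code from the original post.

```python
def naive_scores(up, down):
    """Return (net votes, proportion positive) for one product."""
    total = up + down
    net = up - down
    proportion = up / total if total else 0.0
    return net, proportion


# Hypothetical vote counts, chosen only to show where each score misleads.
print(naive_scores(600, 400))    # 60% positive, small net score
print(naive_scores(5500, 4500))  # 55% positive, but a much larger net score
print(naive_scores(2, 0))        # 100% positive from only two votes
print(naive_scores(100, 1))      # slightly below 100% from 101 votes
```

Net votes rank the 55% product above the 60% one, while the proportion ranks two lone upvotes above 100 upvotes and a single downvote.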

The lower bound of the Wilson score confidence interval for a Bernoulli parameter provides a way to sort products based on positive and negative ratings.

The idea is to treat the existing set of user ratings as a statistical sample from a hypothetical set of ratings by all users, and to score the product by the lower bound of the resulting interval. In other words: given the ratings observed so far from a sample (a subset of the whole community), what fraction of the whole community would upvote this product, with 95% confidence?

Therefore, if we know what a sample of the population thinks, i.e. the existing user reviews for a product, we can use it to estimate the preferences of the whole community.

Suppose a product has X positive votes and Y negative votes, and we want to understand how popular it will be across the whole community. Using the Wilson score confidence interval, we can estimate with 95% confidence that between wilson_lower_bound_score and wilson_upper_bound_score percent of users will upvote this product.

Wilson Score

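Wilson Lower Bound Score

$$
\text{Wilson lower bound} = \frac{\hat{p} + \dfrac{z_{\alpha/2}^2}{2n} - z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n} + \dfrac{z_{\alpha/2}^2}{4n^2}}}{1 + \dfrac{z_{\alpha/2}^2}{n}}
$$

where p̂ = (# of positive ratings) / (total ratings),
n = total ratings,
z_(α/2) = the (1 − α/2) quantile of the standard normal distribution (about 1.96 for 95% confidence).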
Wilson Score
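A minimal sketch of the lower bound in Python follows. The signature wilson_lower_bound(pos, n, confidence) is inferred from how the function is called in the examples in this post; the body is an assumption built from the standard Wilson formula and uses only the standard library (statistics.NormalDist needs Python 3.8 or later).

```python
import math
from statistics import NormalDist


def wilson_lower_bound(pos, n, confidence=0.95):
    """Lower bound of the Wilson score interval for a Bernoulli parameter.

    pos: number of positive ratings
    n: total number of ratings
    confidence: two-sided confidence level, e.g. 0.95
    """
    if n == 0:
        return 0.0  # no ratings yet: the score is defined as zero here
    # Two-sided interval, so take the (1 - alpha/2) quantile.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    phat = pos / n
    centre = phat + z * z / (2 * n)
    margin = z * math.sqrt(phat * (1 - phat) / n + z * z / (4 * n * n))
    # Clamp at zero to absorb floating-point noise when pos == 0.
    return max(0.0, (centre - margin) / (1 + z * z / n))


print(wilson_lower_bound(20, 50, 0.95))  # approximately 0.276
```

Returning zero when n is 0 is a convention chosen to match the behaviour described in this post, not something the formula itself dictates.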

The Wilson confidence interval assumes a binomial distribution, i.e. it considers only positive and negative ratings. If a product is rated on a 5-star scale, we can treat ratings {1–3} as negative and {4, 5} as positive, and then calculate the Wilson score.
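The conversion can be sketched as follows. The helper name to_binary_counts and its positive_from parameter are hypothetical, introduced only for this illustration; the default threshold of 4 stars follows the {4, 5} split described above.

```python
def to_binary_counts(star_counts, positive_from=4):
    """Collapse a star histogram into (positive, total) counts.

    star_counts[i] is the number of ratings with i + 1 stars.
    Ratings of `positive_from` stars or more count as positive.
    """
    total = sum(star_counts)
    positive = sum(star_counts[positive_from - 1:])
    return positive, total


print(to_binary_counts([10, 10, 10, 10, 10]))  # (20, 50)
print(to_binary_counts([5, 10, 20, 0, 0]))     # (0, 35)
```

The resulting pair can be passed directly as the first two arguments of wilson_lower_bound.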

Here are some examples:

  1. If a product is rated uniformly across all categories, [10, 10, 10, 10, 10], i.e. 10 votes for each rating in {1–5}, then 20 of the 50 ratings are positive: wilson_lower_bound(20, 50, 0.95), avg_rating([10, 10, 10, 10, 10]) => (0.2760838973025655, 3.0)
  2. One product receives a single rating, which is positive, and another receives 10 positive and 2 negative ratings. The product with more ratings should score higher, and it does: wilson_lower_bound(1, 1, 0.95) < wilson_lower_bound(10, 12, 0.95).
  3. For products A (209 upvotes, 50 downvotes) and B (118 upvotes, 25 downvotes), wilson_lower_bound(209, 259, 0.95) < wilson_lower_bound(118, 143, 0.95): B has fewer votes but a higher share of positive ratings (about 82.5% versus 80.7%).
  4. Suppose a product receives [5, 10, 20, 0, 0] ratings; then wilson_lower_bound(0, 35, 0.95) = 0. Any product with no positive ratings gets a Wilson score of zero.
  5. The Wilson score cannot be applied to a new product that has yet to receive any rating. The implementation above returns zero when there are no ratings, i.e. wilson_lower_bound(0, 0, 0.95) = 0.

The Wilson score gives zero both to a product that has received no positive ratings and to a new product that has yet to receive any rating. This does not make sense, since it implies that an unrated product is the same as a poorly rated one. Also, it is not clear how tight the lower bound is, i.e. how far it deviates from the “real” proportion of thumbs-up [1]. Finally, it does not seem intuitive to convert five-star ratings into upvotes and downvotes just so that the scores follow a binomial distribution.

Bayesian approximation provides a way to score a product that is rated on a K-star scale.

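Bayesian Approximation for K scale rating

$$
S(n_1, \ldots, n_K) = \sum_{k=1}^{K} s_k \frac{n_k + 1}{N + K} - z_{\alpha/2} \sqrt{\frac{1}{N + K + 1}\left( \sum_{k=1}^{K} s_k^2 \frac{n_k + 1}{N + K} - \left( \sum_{k=1}^{K} s_k \frac{n_k + 1}{N + K} \right)^2 \right)}
$$

where s_k = k is the point value of the k-th rating level, i.e. 1 point, 2 points, …, K points on a K-point scale,

N = total number of ratings, of which n_k are at level k,

z_(α/2) = the same standard normal quantile used for the Wilson score.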

The above expression provides the lower bound of a normal approximation to a Bayesian credible interval for the average rating. For more mathematical details please check [4].

Bayesian Approximation Rating for K Star Scale
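A minimal sketch of this score in Python follows. The name bayesian_rating_products and its list argument are taken from the test samples in this post; the confidence parameter, its default of 0.95, and the body are assumptions built from the expression described above (pseudo-count weights (n_k + 1)/(N + K) and a variance divided by N + K + 1).

```python
import math
from statistics import NormalDist


def bayesian_rating_products(n, confidence=0.95):
    """Lower bound of a normal approximation to a Bayesian credible
    interval for the average rating.

    n: list where n[k - 1] is the number of k-star ratings
    confidence: two-sided level (assumed default, not from the post)
    """
    K = len(n)   # number of rating levels
    N = sum(n)   # total number of ratings
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    first = 0.0   # sum of s_k * (n_k + 1) / (N + K)
    second = 0.0  # sum of s_k^2 * (n_k + 1) / (N + K)
    for k, n_k in enumerate(n, start=1):
        weight = (n_k + 1) / (N + K)
        first += k * weight
        second += k * k * weight
    variance = (second - first * first) / (N + K + 1)
    return first - z * math.sqrt(variance)


print(round(bayesian_rating_products([0, 0, 0, 0, 1]), 4))
```

Because every level gets one pseudo-count, this sketch still returns a value for a product with no ratings at all; whether a new product should be scored that way is a design choice left open here.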

Let’s look at some test samples

  1. bayesian_rating_products([0, 0, 0, 0, 1]) = 2.2290
  2. bayesian_rating_products([0, 2, 0, 10, 0]) = 2.9921
  3. bayesian_rating_products([5, 10, 20, 0, 0]) = 2.2349
  4. bayesian_rating_products([10, 10, 10, 10, 10]) = 2.6296

Comparing 1 and 2, the second product should have a higher score, and it does. In sample 3, unlike the Wilson score, the Bayesian approximation gives a nonzero score to a product with no positive ratings, and that score is still greater than the first product's, which seems reasonable to me.

Unlike the Wilson score, which looks only at the split between upvotes and downvotes, the Bayesian approximation considers ratings across the whole K-point scale, and it works better in this scenario.
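To make the first test sample concrete, its arithmetic can be traced by hand. This is a sketch under stated assumptions: a 95% interval (z ≈ 1.96), K = 5, and a score of the form "weighted mean minus z times the square root of the weighted variance over N + K + 1", with weights (n_k + 1)/(N + K).

$$
\begin{aligned}
n &= (0, 0, 0, 0, 1), \quad N = 1, \quad K = 5, \quad \frac{n_k + 1}{N + K} = \tfrac{1}{6}, \tfrac{1}{6}, \tfrac{1}{6}, \tfrac{1}{6}, \tfrac{2}{6} \\
\sum_k s_k \frac{n_k + 1}{N + K} &= \frac{1 + 2 + 3 + 4 + 2 \cdot 5}{6} = \frac{20}{6} \approx 3.3333 \\
\sum_k s_k^2 \frac{n_k + 1}{N + K} &= \frac{1 + 4 + 9 + 16 + 2 \cdot 25}{6} = \frac{80}{6} \approx 13.3333 \\
S &\approx 3.3333 - 1.96 \sqrt{\frac{13.3333 - 3.3333^2}{7}} \approx 3.3333 - 1.96 \times 0.5634 \approx 2.229
\end{aligned}
$$

A single 5-star rating barely moves the score away from the prior, which is why this product ends up below the one with 35 ratings in sample 3.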

Please let me know your views and suggestions in the comments.
