Search Ranking with Bayesian Inference

Published in Airy ♥ Science, Jun 24, 2019

Have you ever bought something online? Let’s say you want to book a hotel from an online travel agent. How do you decide which item you would like to buy when there are many alternatives?

There are many possible considerations before buying things online, such as popularity, quality, and price. Although people have different preferences, everyone wants the best deal possible. Customer reviews can help before making a decision. According to Podium’s infographic, 93% of consumers say online reviews impact their purchasing decisions [2].

Knowing that reviews are one of the most important aspects customers consider, we will focus on making a decision based on them. But having customer reviews as a reference doesn’t make choosing the best product easy. Imagine that we are going to book a hotel and have found the three hotels nearest to our destination:

Hotel A, review score: 5.00, # reviewers: 2
Hotel B, review score: 4.80, # reviewers: 300
Hotel C, review score: 4.64, # reviewers: 50

Are we more convinced to book Hotel A or Hotel B? Hotel A has a higher review score than Hotel B, but with only two reviews there is more uncertainty about whether Hotel A is really good. Most likely we will choose Hotel B because 300 reviews are more convincing, even though its review score is lower. Comparing Hotel A and Hotel B is the easier task since their numbers of reviews are hugely different. But what if we compare Hotel A and Hotel C? It would be nice to have a tool that helps us make the decision.

Frequentist and Bayesian Inference

Statistics gives us tools to account for this uncertainty. The field of statistics is divided into two major branches: descriptive and inferential [4]. Descriptive statistics is about summarizing raw data or a sample. Inferential statistics, on the other hand, is about generalization: finding the “properties of a population” or the “parameter of the distribution” from which we took the sample. For example, from the result of flipping a coin 30 times, we want to infer what kind of coin it is. Is it a fair coin, or a loaded coin with probability p = 0.7 of giving a head?

There are two general “philosophies” in inferential statistics: frequentist inference and Bayesian inference [5]. They interpret probability differently. According to the frequentist view, only repeatable random events, like the result of flipping a coin, have probabilities, and the probability of an event equals its long-term frequency of occurrence [5]. For example, a coin flip gives a head with probability p (Bernoulli distribution). The parameter p is a property of the population. It is fixed, but we don’t know its value. To get the exact value of p, we would need to flip the coin an infinite number of times and look at the relative frequency of heads.

The Bayesian approach lets us assign probabilities to repeatable random events, and it also applies to non-repeatable events, like the probability that Team A beats Team B in a match. It also lets us assign probabilities to the properties of a population [5]. For example, given a loaded coin that gives a head with probability p (Bernoulli distribution), we can ask for the probability that p lies between 0.6 and 0.7 (a probability of a probability). In addition, we can incorporate our own belief (the prior), and this belief is updated as we get more data (giving the posterior). The Bayesian ability to incorporate uncertainty and calculate a probability distribution over the population’s parameter could help us do ranking in a different way.

Modeling Review Score Distribution

Now that we have the tools, we want to model a hotel’s review score. A review score is a bounded integer, usually taking a value from 1 to 5. But we start by treating a review as a binary value, 0 (bad) or 1 (good). This is easier to calculate and explain, although we lose information about the spread/variance of a property’s reviews. Each hotel has its own probability p of getting a good review. If a hotel has 100 user reviews and we know that p=0.9, we could ask “what is the probability that we will get up to 80 good reviews?” We can answer this with the probability mass function (PMF) and cumulative distribution function (CDF) of the Binomial distribution. The equation below is the PMF of the Binomial distribution.

\[ \mathrm{PMF}(x;\, n, p) = \binom{n}{x}\, p^{x} (1 - p)^{n - x} \]

Probability mass function of Binomial distribution

The Binomial probability mass function, PMF(x=80; n=100, p=0.9), gives the probability of getting exactly 80 good reviews out of 100 when each review is good with probability p=0.9. To answer the question, the probability of getting up to 80 good reviews, we need to sum PMF(x; n=100, p=0.9) over x from 0 to 80. That is equivalent to the cumulative distribution function, CDF(x=80; n=100, p=0.9). In general, CDF(x) is the sum of PMF(k) over all k up to and including x.
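The PMF and CDF calculation described above can be checked numerically. The sketch below assumes SciPy is available (the article does not say which library was used); the variable names are illustrative.

```python
from scipy.stats import binom

n_reviews = 100   # total number of reviews
p_good = 0.9      # assumed known probability of a good review

# Probability of exactly 80 good reviews
prob_exactly_80 = binom.pmf(80, n_reviews, p_good)

# Probability of up to 80 good reviews, via the CDF
prob_up_to_80 = binom.cdf(80, n_reviews, p_good)

# The same quantity as an explicit sum of PMF values for x = 0..80
prob_summed = sum(binom.pmf(x, n_reviews, p_good) for x in range(81))

print(f"PMF(80)           = {prob_exactly_80:.5f}")
print(f"CDF(80)           = {prob_up_to_80:.5f}")
print(f"sum of PMF(0..80) = {prob_summed:.5f}")
```

The last two printed values should agree up to floating-point error, which is the equivalence between the summed PMF and the CDF.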

But what if we want to ask the opposite? We observe 80 good reviews out of 100 user reviews. What is the probability that the parameter p>0.9? By using Bayesian inference on the Binomial distribution with a Beta prior, the probability of p given the data follows a Beta distribution, Beta(α, β), where the number of good reviews is added to the prior’s α and the number of bad reviews is added to the prior’s β. The derivation is in Bayesian Inference of a Binomial Proportion by QuantStart [3]; we will not discuss the details here. If we believe the parameter p could take any value from 0 to 1 with equal likelihood, we can use a uniform prior (our own belief), Beta(1,1). The answer to the question above, the probability of p>0.9 after observing 80 good reviews out of 100, is then 1-CDF(0.9), where the CDF is that of the posterior distribution, Beta(1+80, 1+20).
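As a minimal sketch of this posterior calculation, again assuming SciPy, we can build Beta(1+80, 1+20) and read off the tail probability above 0.9. The variable names are our own.

```python
from scipy.stats import beta

prior_alpha, prior_beta = 1, 1   # uniform prior, Beta(1, 1)
good, bad = 80, 20               # observed good and bad reviews

# Posterior over p after seeing the data: Beta(1 + 80, 1 + 20)
posterior = beta(prior_alpha + good, prior_beta + bad)

# P(p > 0.9) = 1 - CDF(0.9); sf() is SciPy's survival function, 1 - cdf()
prob_p_above_09 = posterior.sf(0.9)

print(f"P(p > 0.9 | data) = {prob_p_above_09:.4f}")
```

With 80 good reviews out of 100, most of the posterior mass sits around 0.8, so the probability that p exceeds 0.9 comes out small.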

We extend the Beta distribution to the 0–5 review scale by normalizing the review score to 0–1. A review score of 5 counts as 1 good review, and a review score of 4 counts as 0.8 of a good review. The resulting Beta distribution represents our belief about the hotel’s expected review score.
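One way to turn raw review scores into Beta parameters under this normalization is sketched below. It assumes the remainder of each review (for example, 0.2 for a score of 4) counts toward the bad side, and that the uniform Beta(1, 1) prior is used; the function name `posterior_params` and the sample scores are hypothetical.

```python
def posterior_params(scores, max_score=5, prior=(1.0, 1.0)):
    """Return (alpha, beta) of the posterior Beta for a list of review scores.

    Each score contributes score / max_score of a "good" review and the
    remainder of a "bad" review (an assumption of this sketch).
    """
    good = sum(score / max_score for score in scores)
    bad = len(scores) - good
    return prior[0] + good, prior[1] + bad


# Hypothetical reviews: two 5-star and one 4-star
alpha_post, beta_post = posterior_params([5, 4, 5])
print(alpha_post, beta_post)
```

Because only the sum of normalized scores matters, the same parameters can be computed from a hotel’s average score and its number of reviews.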

Figure 2. Pivot and Beta distribution chart of Hotel C’s review score

In Figure 2, we visualize a summary of Hotel C’s review scores, which average 4.64 over 50 reviews. The chart on the right side is the posterior Beta distribution of the normalized review score. The solid blue line is the probability density function (PDF). A PDF is like the probability mass function (PMF) mentioned above, but for a continuous distribution. The area under the PDF over an interval is the probability that Hotel C’s review score falls in that interval. For example, if we want to know the probability that Hotel C’s review score is above 4, we calculate the area under the Beta distribution with the x-axis between 0.8 and 1. That area equals 1-CDF(0.8) of the Beta distribution.
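The area calculation for Hotel C can be sketched as follows. It assumes SciPy, a uniform Beta(1, 1) prior, and that the posterior is built from the average score and review count, with the non-good remainder of each review counted as bad.

```python
from scipy.stats import beta

avg_score, n_reviews, max_score = 4.64, 50, 5   # Hotel C

# Fractional good and bad review counts from the summary statistics
good = n_reviews * avg_score / max_score   # 46.4
bad = n_reviews - good                     # 3.6

# Posterior with a uniform Beta(1, 1) prior
posterior = beta(1 + good, 1 + bad)

# P(score > 4) is the area to the right of 0.8 on the normalized scale
prob_above_4 = posterior.sf(4 / max_score)

print(f"P(Hotel C score > 4) = {prob_above_4:.3f}")
```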

Search Ranking with Lower Beta

My colleague, Ali Akbar Septiandri, was the one who proposed the idea of using the lower Beta for search ranking. Ali himself was inspired by Cam Davidson-Pilon’s notebook [1]. We want to be careful when recommending hotels to customers: we only want to recommend good hotels that consistently get high review scores. Using the Beta distribution of a hotel’s review scores, we look for a value that we believe to be the hotel’s minimum review score. From the Beta distribution, we take the 5th percentile as that minimum. This value is what we call the lower Beta. The dashed line in Figure 2 is the lower Beta of Hotel C’s review score. The blue area to the right of the dashed line covers 95% of the area of the distribution, which we can interpret as a 95% probability that the review score is above the lower Beta. The lower Beta of Hotel C’s normalized review score is 0.84, which equals 4.2 on the non-normalized scale. We are 95% certain that Hotel C’s review score is above 4.2.
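A minimal sketch of the lower Beta for Hotel C, assuming SciPy, a uniform Beta(1, 1) prior, and the fractional good/bad counting described earlier. The 5th percentile is the inverse CDF (`ppf` in SciPy) evaluated at 0.05.

```python
from scipy.stats import beta

avg_score, n_reviews, max_score = 4.64, 50, 5   # Hotel C

good = n_reviews * avg_score / max_score
bad = n_reviews - good
posterior = beta(1 + good, 1 + bad)

# Lower Beta: the value with 5% of the posterior mass below it
lower_beta = posterior.ppf(0.05)
lower_beta_score = lower_beta * max_score   # back to the 0-5 scale

print(f"lower Beta (normalized) = {lower_beta:.2f}")
print(f"lower Beta (0-5 scale)  = {lower_beta_score:.2f}")
```

Under these assumptions the result should be close to the 0.84 (about 4.2 on the 5-point scale) quoted above.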

We can sort the hotels on the search page by their lower Beta values in descending order. This ranking method is less naïve than simply sorting by average review score, and it follows the way we think when deciding which product to buy. Figure 3 shows each hotel’s Beta distribution, with its lower Beta value marked by a dashed line. Hotel A (red) has an average review score of 5, but its distribution is very wide because it has only 2 reviews and therefore more uncertainty. Table 1 shows the calculation results that we use to draw Figure 3. Hotel B ranks first with the highest lower Beta, followed by Hotel C and Hotel A.

Table 1. The lower beta calculation result
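The ranking of the three example hotels can be reproduced with a short sketch. It assumes SciPy, a uniform Beta(1, 1) prior, and the fractional good/bad counting; the helper `lower_beta` and the `hotels` dictionary are illustrative names, not the article’s original code.

```python
from scipy.stats import beta

# name: (average review score, number of reviewers)
hotels = {
    "Hotel A": (5.00, 2),
    "Hotel B": (4.80, 300),
    "Hotel C": (4.64, 50),
}


def lower_beta(avg_score, n_reviews, max_score=5, prior=(1.0, 1.0), q=0.05):
    """5th percentile (by default) of the posterior Beta, on the 0-1 scale."""
    good = n_reviews * avg_score / max_score
    bad = n_reviews - good
    return beta.ppf(q, prior[0] + good, prior[1] + bad)


# Sort hotel names by lower Beta, highest first
ranking = sorted(hotels, key=lambda name: lower_beta(*hotels[name]), reverse=True)

for name in ranking:
    score = 5 * lower_beta(*hotels[name])
    print(f"{name}: lower Beta = {score:.2f} (0-5 scale)")
```

Sorting by the 5th percentile instead of the mean is what pushes Hotel A, with its two perfect reviews, below the two hotels with more evidence.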

This method has disadvantages that we could improve on, such as the cold start problem. A new hotel that starts with no reviews will rank relatively low. Although Hotel A got a review score of 5 from 2 reviews, our model only gives a 95% probability that Hotel A’s review score is above 1.84 on the 5-point scale. To handle this, we could use our own prior belief. For example, if we believe that a hotel’s review score is usually around 4 on the 5-point scale, we could use Beta(4, 1) or Beta(8, 2) as our prior. With a prior Beta(α, β), the weight of our belief is α+β and its expected value is α/(α+β).
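To illustrate how an informative prior softens the cold start problem, the sketch below compares Hotel A’s lower Beta under the uniform prior and under Beta(8, 2). It assumes SciPy and the fractional good/bad counting; the helper name `lower_beta_score` is hypothetical.

```python
from scipy.stats import beta


def lower_beta_score(good, bad, prior, max_score=5, q=0.05):
    """Lower Beta on the 0-5 scale for given good/bad counts and prior."""
    prior_alpha, prior_beta = prior
    return max_score * beta.ppf(q, prior_alpha + good, prior_beta + bad)


# Hotel A: two reviews, both with a score of 5
good, bad = 2.0, 0.0

uniform_prior = (1, 1)       # no opinion about p
informative_prior = (8, 2)   # belief weight 10, expected value 8 / (8 + 2) = 0.8

with_uniform = lower_beta_score(good, bad, uniform_prior)
with_informative = lower_beta_score(good, bad, informative_prior)

print(f"uniform prior Beta(1, 1): {with_uniform:.2f}")
print(f"prior Beta(8, 2):         {with_informative:.2f}")
```

A heavier prior (larger α+β) pulls new hotels more strongly toward the prior mean, so the weight is a tuning choice rather than a fixed rule.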

We ran an A/B test to compare this ranking method against the current ranking method, but it didn’t significantly improve our metrics. From the A/B test results, we formed a hypothesis about why the metrics didn’t improve. We will fix the problem based on that hypothesis and iteratively improve the search ranking. Nevertheless, we have used the Bayesian method for ranking in analytics, choosing a higher or lower quantile depending on the problem at hand.

Source Code

We have provided the code to help you understand this article. You can run the code by clicking the green button. Feel free to play with it! You might want to replace the dataset with items from an online marketplace and see which item should be purchased.

Bayesian search ranking source code

Conclusion

We can use Bayesian inference as a tool to help us choose a product in an online marketplace while accounting for the uncertainty of review scores. The same tool can be used for search ranking, recommending good hotels that consistently get high review scores. We ran an A/B test to compare the Bayesian ranking method against the current ranking method, but it didn’t significantly improve our metrics. Nevertheless, we have used the Bayesian method for ranking in analytics.

Airy’s Data Team conducts weekly sharing sessions so that we can learn from each other. If you want to learn more about data, or if you have an idea that could transform the hospitality industry through data science, come and join us!

References

[1] Davidson-Pilon, C. (2013). Bayesian methods for hackers: probabilistic programming and Bayesian inference. Retrieved from http://nbviewer.jupyter.org/github/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/blob/master/Chapter4_TheGreatestTheoremNeverTold/Ch4_LawOfLargeNumbers_PyMC3.ipynb

[2] Podium. (2017). Consumers Get “Buy” With a Little Help From Their Friends. Retrieved from http://learn.podium.com/rs/841-BRM-380/images/2017-SOOR-Infographic.jpg

[3] QuantStart. Bayesian Inference of a Binomial Proportion. Retrieved from https://www.quantstart.com/articles/Bayesian-Inference-of-a-Binomial-Proportion-The-Analytical-Approach

[4] Taylor, C. (2018). Descriptive vs. Inferential Statistics. Retrieved from https://www.thoughtco.com/differences-in-descriptive-and-inferential-statistics-3126224

[5] Thaeh, T. C. (2016). Frequentist and Bayesian Approach in Statistics. Retrieved from https://www.probabilisticworld.com/frequentist-bayesian-approaches-inferential-statistics/