One of the most important questions for online shopping platforms, where users rate many products based on their experiences, is scoring: how should those ratings be turned into a score that ranks products?
In everyday life, many people look at the products they are about to buy using filters such as highest positive rating or lowest negative rating. So if we were to build such a score, what should it be based on, and how should we compute it?
There are several common ways to score and rank products:
1. Overall average of ratings
2. The difference between positive and negative ratings
3. Proportion of positive ratings
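To make the three methods concrete, here is a minimal sketch that scores some hypothetical vote counts. The function names and numbers are illustrative only and are not taken from the project's data. It shows how each simple method can be misleading.

```python
# Hypothetical helpers illustrating the three simple scoring methods

def average_rating(ratings):
    """1. Overall average of star ratings (0 when there are no ratings)."""
    return sum(ratings) / len(ratings) if ratings else 0.0

def pos_neg_difference(pos, neg):
    """2. Positive votes minus negative votes."""
    return pos - neg

def positive_proportion(pos, neg):
    """3. Share of positive votes among all votes (0 when there are no votes)."""
    total = pos + neg
    return pos / total if total else 0.0

# The average hides the distribution: a polarized and a lukewarm product look identical
print(average_rating([5, 1]), average_rating([3, 3]))  # 3.0 3.0

# A single positive vote beats 90 positive votes out of 100 on proportion
print(positive_proportion(1, 0))    # 1.0
print(positive_proportion(90, 10))  # 0.9

# 600 up / 400 down (60% positive) beats 150 up / 10 down (about 94% positive) on difference
print(pos_neg_difference(600, 400))  # 200
print(pos_neg_difference(150, 10))   # 140
```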
However, these methods can give untrustworthy results and are very prone to manipulation. For example, a product with a single positive rating has a perfect proportion of positive ratings, and the difference between positive and negative ratings favors products with many ratings even when a large share of them is negative.
At this point, the Wilson lower bound offers a valuable solution. The main idea is to treat the current set of user ratings as a statistical sample of a hypothetical set of ratings from all users, and to score with that in mind. In short: given the ratings we already have, we build a 95% confidence interval for the share of positive ratings the whole user community would give the product, and use its lower bound as the score.
So if we know what a sample of the population thinks, that is, if we have user reviews for a product, we can use them to estimate the preferences of the entire community.
Suppose an item (in this project, a review) has helpful_yes and helpful_no votes, and we want to understand how it would be received by the whole community. Here I make these estimates with wilson_lower_bound at a 95% confidence level.
So what is wilson_lower_bound?
The Wilson score interval is based on the binomial distribution, so it only takes positive and negative ratings into account. (Since I have already converted the data set in this project to helpful_yes and helpful_no, I can use it directly without another conversion.)
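For reference, with n = pos + neg votes, observed share of positive votes p̂ = pos / n, and z the standard normal quantile for the chosen confidence level (about 1.96 for 95%), the lower bound of the Wilson score interval is the expression that the wilson_lower_bound function in the fifth step returns:

```latex
\mathrm{WLB} = \frac{\hat{p} + \dfrac{z^2}{2n} - z\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n} + \dfrac{z^2}{4n^2}}}{1 + \dfrac{z^2}{n}},
\qquad \hat{p} = \frac{\text{pos}}{n}, \quad n = \text{pos} + \text{neg}
```

With few votes the z terms dominate and pull the bound well below p̂; as n grows, the bound approaches p̂.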
Now let's see this in practice by applying it to a project.
First Step: Reading and Understanding Data
The data set contains the following variables:
reviewerID: User ID. Example: A2SUAM1J3GNN3B
asin: Product ID. Example: 0000013714
reviewerName: Username
helpful: Helpfulness votes for the review, as [helpful_yes, total_vote]. Example: [2, 3]
reviewText: Review text written by the user
overall: Product rating
summary: Review summary
unixReviewTime: Review time (Unix timestamp)
reviewTime: Review time (raw)
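Since the data file itself is not shown here, the following sketch builds a tiny hypothetical DataFrame with a subset of these columns (all values are made up) so that the later steps can be tried out. With the real data you would load the file with the appropriate pandas reader instead and inspect it the same way.

```python
import pandas as pd

# Hypothetical rows with a subset of the columns described above (values are made up)
df = pd.DataFrame({
    "reviewerID": ["user_1", "user_2", "user_3", "user_4"],
    "asin": ["0000013714"] * 4,
    "helpful": ["[2, 3]", "[0, 0]", "[10, 12]", "[1, 4]"],
    "overall": [5.0, 4.0, 5.0, 2.0],
    "reviewTime": ["2014-07-23", "2013-10-25", "2012-12-23", "2013-11-21"],
})

print(df.shape)                      # (4, 5)
print(df.dtypes)                     # helpful and reviewTime are still text at this point
print(df["overall"].value_counts())  # how often each star rating occurs
```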
Second Step: Average Rating of the Product
Before weighting the ratings by date, let's take a plain average of the overall variable.
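A plain average is a single pandas call. The sketch below uses a few hypothetical ratings in place of the real overall column.

```python
import pandas as pd

# Hypothetical ratings standing in for the overall column
df = pd.DataFrame({"overall": [5.0, 4.0, 5.0, 1.0, 3.0]})

# Plain (unweighted) mean of all ratings
average_rating = df["overall"].mean()
print(average_rating)  # 3.6
```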
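Third Step: Calculate the Time-Weighted Average Rating of the Product
Recent reviews may reflect the current state of a product better than old ones, so here the reviews are split into four groups by the quartiles of their age in days, and each group's average rating gets a weight that is larger for newer reviews (28%, 26%, 24% and 22%).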
import pandas as pd

# Convert review dates and compute how many days ago each review was written
df["reviewTime"] = pd.to_datetime(df["reviewTime"], dayfirst=True)
current_date = pd.to_datetime("2021-02-12")
df["days"] = (current_date - df["reviewTime"]).dt.days

# Quartiles of review age split the reviews into four time buckets
a = df["days"].quantile(0.25)
b = df["days"].quantile(0.50)
c = df["days"].quantile(0.75)

# Newer reviews (fewer days) get larger weights; the weights sum to 100%
weighted_average = (df.loc[df["days"] <= a, "overall"].mean() * 28 / 100 +
                    df.loc[(df["days"] > a) & (df["days"] <= b), "overall"].mean() * 26 / 100 +
                    df.loc[(df["days"] > b) & (df["days"] <= c), "overall"].mean() * 24 / 100 +
                    df.loc[df["days"] > c, "overall"].mean() * 22 / 100)
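The same calculation can be wrapped in a reusable function so that the weights are easy to change. This is a minimal sketch: the name time_based_weighted_average is hypothetical, and it assumes the DataFrame already has numeric days and overall columns. If a bucket happens to be empty, its mean is NaN and so is the result.

```python
import pandas as pd

def time_based_weighted_average(dataframe, weights=(28, 26, 24, 22)):
    """Weighted mean of 'overall' over review-age quartiles.

    Assumes a numeric 'days' column (review age in days); the newest reviews
    (smallest 'days') fall in the first bucket and get the first weight.
    """
    a, b, c = dataframe["days"].quantile([0.25, 0.50, 0.75])
    buckets = [
        dataframe["days"] <= a,
        (dataframe["days"] > a) & (dataframe["days"] <= b),
        (dataframe["days"] > b) & (dataframe["days"] <= c),
        dataframe["days"] > c,
    ]
    # Average rating of each bucket multiplied by its percentage weight
    return sum(dataframe.loc[mask, "overall"].mean() * w / 100
               for mask, w in zip(buckets, weights))

# Hypothetical reviews whose ratings drop as they get older
reviews = pd.DataFrame({"days": [10, 20, 30, 40, 50, 60, 70, 80],
                        "overall": [5, 5, 4, 4, 3, 3, 2, 2]})
print(time_based_weighted_average(reviews))
print(reviews["overall"].mean())  # 3.5
```

Here the time-weighted score (about 3.6) ends up above the plain mean because the newer reviews are the better ones; with equal weights of 25 each, the function returns the plain mean of the bucket averages.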
If we want to decide which 20 reviews to show first on the product page, we have to pick the most useful and informative ones for users.
For this, we split the helpful variable into its two parts: its first number is helpful_yes and its second is the total number of votes, total_vote.
From these two values we also get helpful_no = total_vote - helpful_yes.
### First step of task 2: Create new "meaningful" variables from the helpful variable ###
# helpful holds "[helpful_yes, total_vote]" as text, e.g. "[2, 3]"
new_features = df["helpful"].str.split(",", expand=True)
new_features = new_features.astype("string")
# Column 0 looks like "[2" and column 1 like " 3]": strip brackets and spaces, then convert to integers
helpful_yes = new_features[0].str.strip("[ ").astype("int64")
total_vote = new_features[1].str.strip("] ").astype("int64")
helpful_no = total_vote - helpful_yes
df["helpful_yes"] = helpful_yes
df["helpful_no"] = helpful_no
df["total_vote"] = total_vote
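As an alternative to splitting and stripping, a single regular expression can capture both numbers with pandas' Series.str.extract. This is a sketch on hypothetical helpful values in the same "[helpful_yes, total_vote]" text format.

```python
import pandas as pd

# Hypothetical helpful values in the "[helpful_yes, total_vote]" text format
df = pd.DataFrame({"helpful": ["[2, 3]", "[0, 0]", "[10, 12]"]})

# Two capture groups produce two columns (0 and 1) holding the matched digits
parts = df["helpful"].str.extract(r"\[\s*(\d+)\s*,\s*(\d+)\s*\]").astype("int64")
df["helpful_yes"] = parts[0]
df["total_vote"] = parts[1]
df["helpful_no"] = df["total_vote"] - df["helpful_yes"]
print(df)
```

If helpful were loaded as real Python lists rather than text (for example from a JSON source), df["helpful"].str[0] and df["helpful"].str[1] would pick the elements directly.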
Third Step: Create score_pos_neg_diff Scores
As I mentioned at the beginning of the article, let's create a score from the difference between positive and negative votes and use it for ranking.
def score_pos_neg_diff(pos, neg):
    # Ranking score: number of positive votes minus number of negative votes
    return pos - neg

df["score_pos_neg_diff"] = df.apply(lambda x: score_pos_neg_diff(x["helpful_yes"], x["helpful_no"]), axis=1)
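Because the score is a simple subtraction, it can also be computed with column arithmetic, which gives the same values as df.apply(..., axis=1) without a Python-level loop over rows. The vote counts below are hypothetical.

```python
import pandas as pd

# Hypothetical vote counts for three reviews
df = pd.DataFrame({"helpful_yes": [600, 150, 1], "helpful_no": [400, 10, 0]})

# Vectorized positive-minus-negative score
df["score_pos_neg_diff"] = df["helpful_yes"] - df["helpful_no"]
print(df.sort_values("score_pos_neg_diff", ascending=False))
```

Note that the first review ranks highest even though only 60% of its votes are positive, which is the volume bias of this method.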
Fourth Step: Create a score_average_rating Variable Using the score_average_rating Function
Another ranking method is to use the proportion of positive votes among all votes.
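def score_average_rating(pos, neg):
    # Share of positive votes among all votes; 0 for reviews without any votes
    if pos + neg == 0:
        return 0
    return pos / (pos + neg)

df["score_average_rating"] = df.apply(lambda x: score_average_rating(x["helpful_yes"], x["helpful_no"]), axis=1)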
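A vectorized version of the same ratio is sketched below on hypothetical vote counts. Dividing two integer columns where both are 0 gives NaN in pandas, so fillna(0.0) reproduces the function's "0 for no votes" rule.

```python
import pandas as pd

# Hypothetical vote counts; the last review has no votes at all
df = pd.DataFrame({"helpful_yes": [90, 1, 0], "helpful_no": [10, 0, 0]})

total = df["helpful_yes"] + df["helpful_no"]
# 0 / 0 becomes NaN, which is then replaced by 0.0
df["score_average_rating"] = (df["helpful_yes"] / total).fillna(0.0)
print(df)
```

The review with a single positive vote scores 1.0 and outranks the one with 90 positive votes out of 100, which is exactly the weakness the Wilson lower bound addresses.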
Fifth Step: Create Scores with the Wilson Lower Bound Using wilson_lower_bound
import math
import scipy.stats as st

def wilson_lower_bound(pos, neg, confidence=0.95):
    # Lower bound of the Wilson score interval for the share of positive votes
    n = pos + neg
    if n == 0:
        return 0
    z = st.norm.ppf(1 - (1 - confidence) / 2)  # about 1.96 for 95% confidence
    phat = 1.0 * pos / n
    return (phat + z * z / (2 * n) - z * math.sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)) / (1 + z * z / n)

df["wilson_lower_bound"] = df.apply(lambda x: wilson_lower_bound(x["helpful_yes"], x["helpful_no"]), axis=1)
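To see how the lower bound treats small samples, the sketch below compares it with the plain proportion for a few hypothetical vote counts. To stay dependency-free it swaps scipy's st.norm.ppf for the standard library's statistics.NormalDist().inv_cdf, which computes the same standard normal quantile; the formula is otherwise unchanged.

```python
import math
from statistics import NormalDist

def wilson_lower_bound(pos, neg, confidence=0.95):
    # Lower bound of the Wilson score interval (0 when there are no votes)
    n = pos + neg
    if n == 0:
        return 0.0
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    phat = pos / n
    return (phat + z * z / (2 * n)
            - z * math.sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)) / (1 + z * z / n)

# Hypothetical (positive, negative) vote counts
for pos, neg in [(1, 0), (90, 10), (600, 400)]:
    ratio = pos / (pos + neg)
    print(pos, neg, round(ratio, 3), round(wilson_lower_bound(pos, neg), 3))
```

A single positive vote has a ratio of 1.0 but a lower bound of only about 0.21, while 90 positive votes out of 100 keep a lower bound of about 0.83, so the review with more evidence ranks first.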
Sixth Step: Conclusion
Now, suppose we have many reviews and many votes on them. Which reviews should be shown first? Should we rely on social proof and show the ones with the most helpful votes?
To answer this, let's look at the helpful variable when the reviews are sorted by each of the scores we have created.
Apart from a few minor changes, the rankings by the Wilson lower bound and by the positive-negative difference look the same, and each of these methods can be justified statistically. I wanted to go one step further and combine two of the scores into a total score: I think giving different weights to two different methods makes the result more useful.
# Weighted combination: 40% share of positive votes, 60% Wilson lower bound
df["total_score"] = (df["score_average_rating"] * 40 / 100 +
                     df["wilson_lower_bound"] * 60 / 100)
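Picking the 20 reviews to display then comes down to sorting by total_score. The sketch below uses made-up score values for three reviews; with the real data, the same two lines apply to the full DataFrame.

```python
import pandas as pd

# Hypothetical scored reviews; in the project these columns come from the earlier steps
df = pd.DataFrame({
    "reviewerID": ["r1", "r2", "r3"],
    "score_average_rating": [1.0, 0.9, 0.6],
    "wilson_lower_bound": [0.207, 0.826, 0.569],
})
df["total_score"] = df["score_average_rating"] * 40 / 100 + df["wilson_lower_bound"] * 60 / 100

# The 20 highest-scoring reviews (fewer if there are fewer rows)
top_reviews = df.sort_values("total_score", ascending=False).head(20)
print(top_reviews)
```

df.nlargest(20, "total_score") returns the same rows and is a common shorthand for this pattern.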
Thanks to total_score, we were able to surface sensible and promising reviews even when they had received fewer votes, instead of relying only on the reviews richest in social proof. As a bonus, the top of the list shows a more varied set of reviews.