Exposing fake reviews with Python
Instagram ads always try to gaslight me into thinking I have all sorts of health problems. Typically I ignore them, but when one appeared for a personalized nootropic blend, I was a little curious. The company in question, Thesis, sells blends of nootropics, which are supplements that are supposed to help with focus, memory, energy, etc.
Our algorithm was built by data scientists and has over 12,230,136 data points total. It matches you with formulas that are specifically aligned with your unique neurological make up, and identifies areas for improvement so you can reach your desired goals.
After taking a short quiz, you’re presented with a starter kit containing 4 supplement blends … out of the 5 varieties that they sell. For something billed as a major selling point, that doesn’t feel very “customized to my neurological make up”.
Highly rated brain enhancers?
From what I could tell, the nootropic starter pack had at least 4.5 stars across its 7411 reviews. For a $119/month supplement, I want to make sure I’m spending my money on something that’s going to work! The reviews section shows handfuls of snippets from people like “Elliot S” and “George L”, claiming “Love it!” or “So far, super great!” with a 5-star rating.
After clicking “next” repeatedly to page through the reviews, I didn’t see any genuinely negative reviews, or anything less than 4 stars.
Where are the reviews coming from?
In the Chrome Inspect menu, you can record network traffic. Every time I clicked “next” on the reviews page, it was making a request to https://api.yotpo.com/.
A quick Google search reveals:
Yotpo is an eCommerce marketing platform with the most advanced solutions for customer reviews, visual marketing, loyalty, referrals, and SMS marketing.
Searching for “yotpo fake reviews” reveals Trustpilot’s 1.9 star assessment of them, where one user claims:
“…the store can delete every review that they don’t like. You can’t trust any reviews on any company which uses Yotpo. All just fake.”
Though … is this review real? Can Trustpilot be trusted to not alter reviews? How deep does this rabbit hole go?
Scraping time!
To find out if the reviews have been tampered with, I figured I should first download all of them.
I right-clicked the request in the Chrome inspect menu, then clicked “Copy as cURL (bash)”.
Then I pasted this into a terminal and ran it.
This part of the request is the most interesting.
--data-raw '{"page":10,"domain_key":"starter-kit","sortings":[{"sort_by":"votes_up","ascending":false}],"free_text_search":null}'
To get all the reviews, this command needs to be run repeatedly while incrementing the page counter. Ask for page 1, then page 2, then page 3, etc. until there are no more results.
But first, let’s look at the response. I saved this command to a shell script, then ran it with some extra pretty-printing by adding | python -m json.tool after the cURL command.
The response contains a lot more detail than the reviews section on the site shows. In the reviews array, we have expected fields like title, score, and created_at. Surprisingly, we’re also getting fields like topics, which breaks individual reviews down into sections, and sentiment, which seems to guess at “how positive the review is”.
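To poke at those fields, you can save one page of the response to a file and load it in Python. A quick sketch, with the caveat that the file name and the top-level "response" key are assumptions on my part:

```python
import json

# Load one saved page of the API response (the file name is just a placeholder).
with open("page_1.json") as f:
    data = json.load(f)

# The reviews array may sit under a top-level "response" key; fall back to
# the document root if it doesn't.
reviews = data.get("response", data)["reviews"]

first = reviews[0]
print(first["title"], first["score"], first["created_at"])
print(first["sentiment"])  # Yotpo's guess at how positive the review is
print(first["topics"])     # the review broken down into sections
```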
Downloading all the reviews
Because the API is paginated, I’ll need to query it repeatedly to get all 7411 reviews.
First, I changed the page field to "page":'"$1"'. This allows me to specify the page number as a positional argument in the script. For example, sh ./curl_reviews.sh 7 would return the seventh page of reviews.
Right now, we’re only getting 5 reviews at a time. I don’t want to query their endpoint 1500 times to get all the reviews, since they might block my IP. So I found the docs for Yotpo’s review search API and saw that the request can include a per_page field. I added "per_page":1000. This caused them to send back 150 results at a time, which must be their limit per request.
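For reference, here’s the modified request body written out as a Python dict rather than the raw JSON string in the shell script:

```python
# The request body after adding per_page. Yotpo sent back at most 150
# reviews per request no matter how high this was set.
payload = {
    "page": 1,
    "per_page": 1000,
    "domain_key": "starter-kit",
    "sortings": [{"sort_by": "votes_up", "ascending": False}],
    "free_text_search": None,
}
```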
To download and combine all those reviews, I wrote the following Python script.
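The idea is to call the curl_reviews.sh wrapper once per page and collect results until a page comes back empty. A minimal sketch of that approach (the exact nesting of the response JSON is an assumption):

```python
import json
import subprocess

all_reviews = []
page = 1
while True:
    # Call the shell script from earlier with the page number as its argument.
    result = subprocess.run(
        ["sh", "./curl_reviews.sh", str(page)],
        capture_output=True, text=True, check=True,
    )
    data = json.loads(result.stdout)
    reviews = data.get("response", data).get("reviews", [])
    if not reviews:  # an empty page means we've reached the end
        break
    all_reviews.extend(reviews)
    page += 1

with open("reviews.json", "w") as f:
    json.dump(all_reviews, f)

print(f"Wrote {len(all_reviews)} reviews to reviews.json.")
```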
Wrote 7411 reviews to reviews.json.
This would more typically be done with the requests package, but there are always multiple ways to do the same thing.
Now I have a giant JSON file of all the reviews. Time to analyze it.
Analyzing the reviews
To start, I counted how many reviews of each star rating there were.
At first I thought I would set the histogram bins to [1, 2, 3, 4, 5], those being the various scores a review can have. Instead, these numbers define the edges of the bins, so we need [0.5, 1.5, 2.5, 3.5, 4.5, 5.5]. These should be read like: "a bin of 0.5 to 1.5 values", "a bin of 1.5 to 2.5 values", etc. One-star reviews will fall in the first bin, two-star reviews in the second, and so on.
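Here’s roughly what that counting looks like with matplotlib, assuming the reviews.json file from the previous step:

```python
import json
import matplotlib.pyplot as plt

with open("reviews.json") as f:
    reviews = json.load(f)

scores = [r["score"] for r in reviews]

# Bin edges chosen so that each integer score lands in its own bin.
bins = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5]
counts, _, _ = plt.hist(scores, bins=bins)
print(counts)
plt.show()
```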
So, time to interpret the results: [3, 2, 10, 3439, 3957]
Incredible! Out of 7411 reviews, only 15 were three stars or less.
Only 0.2% of reviews are 3 stars or less
Next, I checked the date distribution of these reviews.
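Assuming created_at is an ISO-style timestamp, grouping reviews by month only takes a collections.Counter:

```python
import json
from collections import Counter

import matplotlib.pyplot as plt

with open("reviews.json") as f:
    reviews = json.load(f)

# "YYYY-MM" is the first 7 characters of an ISO timestamp.
per_month = Counter(r["created_at"][:7] for r in reviews)
months = sorted(per_month)

plt.bar(months, [per_month[m] for m in months])
plt.xticks(rotation=90)
plt.show()
```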
There’s a sudden drop in reviews after July 2021. I’m not sure what this means, but I have a few theories:
- Some review incentive program ended?
- Recent reviews had been deleted?
- After spending all summer generating fake reviews, they took a sabbatical.
Just for fun, I also plotted the names of the most common reviewers. The API only gives us the first name and last initial.
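A rough tally with collections.Counter; where exactly the display name lives in the response (user.display_name here) is a guess on my part:

```python
import json
from collections import Counter

with open("reviews.json") as f:
    reviews = json.load(f)

# Reviewer names come back as first name plus last initial, e.g. "Stephanie S."
names = Counter(r["user"]["display_name"] for r in reviews)

for name, count in names.most_common(10):
    print(count, name)
```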
Stephanie S. has left 23 reviews. How many Stephanie S’s are there?
In conclusion…
The distribution of scores is enough evidence that Thesis tampered with their reviews. Don’t believe me? Try finding a single Amazon product with comparable ratings. The highest-rated products on Amazon all have 8% to 10% negative (3 stars or fewer) reviews. Even this perfect product has 9% of its reviews at 3 stars or under. Nothing comes close to 0.2%.
Five-star rating systems are inherently flawed. They take hundreds of complex experiences and output a simple number. They’re biased toward extreme reactions: the five stars and the one stars. And as naïve consumers, we have to trust that they’re actually taking the average of the ratings. Well… Amazon isn’t; they use an ML-based weighting algorithm. This means some reviews are weighted higher than others when computing the overall score. Without any transparency behind this algorithm, Amazon can effectively “remove” negative reviews, in a slightly less conspicuous way than Thesis did.