Can You Trust Amazon Reviews?

For commodities, maybe. But for supplements, no.


Most of us check the Amazon reviews before buying something online, from electronics to home goods. If an item is top-rated on Amazon, we assume it’s good. And if other products are rated better, we assume they’re better. That works well for drones and phones, but if you’re buying supplements, it might be a mistake.

That’s because Amazon reviewers are prone to subjectivity. They can only evaluate the things they can sense — appearance, flavor, smell, shape, consistency, and price — but they can’t measure purity or efficacy. That’s a serious limitation. For supplements, purity and efficacy are sometimes the only objective measurements that matter.

Two honest reviews

Take Centrum vitamins for example. How much of a positive Amazon review is thanks to a good product, and how much is just brand loyalty? What impact does the fancy packaging have? What about all the negative reviews for slow shipping times, or paid positive reviews? There’s a lot of noise to cut through when reviewers are swayed by issues unrelated to quality.

Expiration dates matter, but are these reviews throwing off the score for Centrum as a whole?

When a product affects your health, purity and ingredient levels may matter more than how the product is marketed or branded. Flavor and texture would also probably matter less if you knew that heavy metals from the supplement were accumulating in your cells. It’s presumptuous to assume that paid and subjective user reviews like those on Amazon.com actually track these criteria, but we decided to test it anyway.

Tommy Noonan of SupplementReviews.com conducted an experiment comparing Labdoor’s grades for 77 protein powders against both Amazon.com and Bodybuilding.com reviews. Labdoor’s grades and Amazon’s star ratings were converted to a 0-to-10 scale to match Bodybuilding.com’s rating system, so the scores could be compared directly. (Labdoor’s A became a 9.5, for example, and a B an 8.5; Amazon’s 5 stars became a 10, and 2 stars a 4.) He then plotted each product’s scores on a graph.

A comparison of Labdoor vs. Amazon and Bodybuilding.com ratings. If there was a correlation, points would cluster along the black line (they don’t).

The figure above is the result. The left-most blue dot, for example, shows that the product received a 6 in Amazon ratings (3 stars) and a 9.5 in Labdoor grades (an A). In this type of graph, if the dots were clustered along the black line, that would indicate a strong correlation between user reviews and lab quality. Instead, the points skew in all directions, indicating no correlation.

No correlation. That means some of the highly rated products on Amazon didn’t do well in lab tests, and some of the purest protein powders aren’t getting good reviews on Amazon. But protein powders have issues with solubility and flavor that aren’t counted in Labdoor’s ratings. So let’s look at something simpler.

Natural Zest magnesium received a grade F from Labdoor and 4.7/5 stars from Amazon reviewers

Magnesium is a great example. Of the 36 brands we tested, nine received a grade of D- or F. The worst Labdoor-ranked product, Natural Zest Ultimate Magnesium Citrate, is currently rated an almost flawless 4.7 stars on Amazon, with 108 five-star reviews, many calling it a “great product”.

Amazon Vine reviewers are given free products in exchange for reviews on Amazon. Some critics argue that these reviews are heavily biased.

In our own tests of Natural Zest, we found that 90% of the claimed magnesium content was missing, and we measured high levels of arsenic. In fact, all nine of the low-rated magnesium supplements on Labdoor had high Amazon ratings of 4.5+ stars. Amazon reviewers simply couldn’t tell when they were getting a lower-quality supplement.

So what’s the best way to judge a product? That depends on your values. Some combination of lab tests and user reviews might be ideal: lab tests can establish purity and safety, after which potential customers could turn to consumer reviews for guidance on flavor. We’re still a few years off from measuring that in a lab.
