Web scraping Trustpilot reviews

Figure 1: Screenshot of reviews page in Trustpilot.com.

So, what is the easiest way to scrape reviews from Trustpilot.com?

In this article, we will scrape user reviews of Octoparse.com, a low-code web scraping tool.

By the end of this article, you will be able to extract reviews like the one shown in figure 2 and save them as a CSV file.

Figure 2: Screenshot of individual review of a product at Trustpilot.com.

Option 1: Hire a fully managed web scraping service.

You can contact us for our fully managed web scraping service to get Trustpilot reviews data as a CSV or Excel file without dealing with any coding.

Our pricing starts at $99 for fully managed Trustpilot scraping with up to 20,000 rows of data.

You can simply sit back and let us handle all the complexities of scraping a site like Trustpilot, which has plenty of anti-scraping protections built in to dissuade people from scraping it in bulk.

We can also create a REST API endpoint for you if you want structured data on demand.

Option 2: Scrape Trustpilot.com on your own

We will use a browser automation library called Selenium to extract data.

Selenium has bindings available in all major programming languages, so you can use whichever language you like, but we will use Python here.
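A minimal sketch of this step, assuming Chrome and a matching driver are installed locally. The helper names `build_review_url` and `fetch_page_source` are illustrative, and the URL pattern `https://www.trustpilot.com/review/<domain>` is an assumption about how Trustpilot lays out its review pages.

```python
def build_review_url(domain):
    """Return the assumed Trustpilot review-page URL for a company domain."""
    return f"https://www.trustpilot.com/review/{domain}"


def fetch_page_source(url, wait_seconds=5.0):
    """Load a page in headless Chrome and return its rendered HTML."""
    # Imported here so build_review_url works even without Selenium installed.
    import time
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        time.sleep(wait_seconds)  # crude pause so page scripts can finish rendering
        return driver.page_source
    finally:
        driver.quit()  # always release the browser, even on errors


# Example call (opens a real browser session, so it is left commented out):
# html_source = fetch_page_source(build_review_url("octoparse.com"))
```

The fixed pause is the simplest possible wait; an explicit wait on a known element would likely be more reliable on slow connections.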

Extracting Trustpilot reviews

Once we have the raw HTML source, we can parse it with a Python library called BeautifulSoup.

Extracting review author

Inspecting the HTML source shows that review authors sit in div tags with the class ‘consumer-information__name’.

The next step is to extract the date of each review.
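As a rough illustration, the author lookup might look like the sketch below. It runs against a small stand-in HTML snippet rather than the real page source, and the function name `extract_authors` is hypothetical; only the tag and class name come from the inspection described above.

```python
from bs4 import BeautifulSoup

# Tiny stand-in for the page source; the real page is far larger.
SAMPLE_HTML = """
<div class="consumer-information__name">  Alice  </div>
<div class="consumer-information__name">Bob</div>
<div class="something-else">Not an author</div>
"""


def extract_authors(html):
    """Return the text of every div with the author class, whitespace stripped."""
    soup = BeautifulSoup(html, "html.parser")
    tags = soup.find_all("div", class_="consumer-information__name")
    return [tag.get_text(strip=True) for tag in tags]


authors = extract_authors(SAMPLE_HTML)
print(authors)
```

Class names on a live site can change at any time, so it is worth re-checking the page source if the list comes back empty.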

Extracting review title and review contents

For brevity, we will show only the first five results; you can verify that the first one matches the text in figure 2 above.
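The sketch below shows one way to collect titles and review bodies and print only the first five. The class names `review-content__title` and `review-content__text` are assumptions used for illustration, not confirmed by this article, and the sample HTML is generated placeholder markup; check the live page source for the real selectors.

```python
from bs4 import BeautifulSoup

# Hypothetical class names; confirm them against the live page source.
TITLE_CLASS = "review-content__title"
TEXT_CLASS = "review-content__text"


def extract_titles_and_contents(html):
    """Return two parallel lists: review titles and review body texts."""
    soup = BeautifulSoup(html, "html.parser")
    titles = [t.get_text(strip=True) for t in soup.find_all(class_=TITLE_CLASS)]
    contents = [c.get_text(" ", strip=True) for c in soup.find_all(class_=TEXT_CLASS)]
    return titles, contents


# Placeholder markup with seven fake reviews, standing in for the real page.
sample_html = "".join(
    f'<h2 class="{TITLE_CLASS}">Title {i}</h2>'
    f'<p class="{TEXT_CLASS}">Review body {i}</p>'
    for i in range(1, 8)
)

titles, contents = extract_titles_and_contents(sample_html)
print(titles[:5])    # slicing keeps only the first five results
print(contents[:5])
```

Note that collecting titles and bodies in separate passes assumes every review has both; a review with a title but no body would shift the two lists out of alignment.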

Converting into CSV file

You can take the lists above and load them into a pandas DataFrame. Once you have the DataFrame, you can easily export it to CSV, Excel or JSON.
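A minimal sketch of the conversion, using short placeholder lists in place of the scraped values. The column names and the output file names are illustrative choices.

```python
import pandas as pd

# Placeholder lists standing in for the scraped values.
authors = ["Alice", "Bob"]
dates = ["2020-01-01", "2020-01-02"]
titles = ["Great tool", "Works well"]
contents = ["Easy to set up.", "Saved me hours."]

# Each list becomes one column; all lists must have the same length.
df = pd.DataFrame(
    {"author": authors, "date": dates, "title": titles, "content": contents}
)

# With no path given, to_csv returns the CSV text instead of writing a file.
csv_text = df.to_csv(index=False)
print(csv_text)

# Writing to disk works the same way (file names are illustrative):
# df.to_csv("trustpilot_reviews.csv", index=False)
# df.to_json("trustpilot_reviews.json", orient="records")
# df.to_excel("trustpilot_reviews.xlsx", index=False)  # needs an Excel engine such as openpyxl
```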

Figure 3: Screenshot of Trustpilot.com reviews saved as a CSV file.

Scaling up to a full crawler for extracting all Trustpilot reviews of an app

Pagination

  • To fetch all the reviews, you will have to paginate through the results.
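One way to sketch pagination is to generate the URL of each results page up front. This assumes the site exposes pages through a `page` query parameter and that the total page count is known or discovered separately; both are assumptions, and `page_urls` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumed review-page URL for the product used in this article.
BASE_URL = "https://www.trustpilot.com/review/octoparse.com"


def page_urls(base_url, last_page):
    """Return URLs for pages 1..last_page; page 1 is the bare URL."""
    urls = [base_url]
    for page in range(2, last_page + 1):
        urls.append(f"{base_url}?{urlencode({'page': page})}")
    return urls


for url in page_urls(BASE_URL, 3):
    print(url)
```

Each URL can then be fetched and parsed in turn, stopping early if a page returns no reviews.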

Implementing anti-CAPTCHA measures

  • After a few dozen requests, the Trustpilot servers will either block your IP address outright or flag you and start serving CAPTCHAs.
  • To keep fetching data successfully, you will have to:
    • rotate proxy IP addresses, preferably using residential proxies
    • rotate user agents
    • use an external CAPTCHA solving service such as 2captcha or anticaptcha.com
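The rotation part of the list above can be sketched with a simple round-robin generator. The proxy addresses and user agent strings below are placeholders, `request_profiles` is a hypothetical helper, and CAPTCHA solving is not covered here.

```python
from itertools import cycle

# Placeholder values; substitute proxies and user agents you are allowed to use.
PROXIES = [
    "http://proxy1.example.com:8000",
    "http://proxy2.example.com:8000",
]
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBrowser/1.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) ExampleBrowser/1.0",
    "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0",
]


def request_profiles(proxies, user_agents):
    """Yield an endless stream of (proxy, user_agent) pairs, round-robin."""
    proxy_cycle, agent_cycle = cycle(proxies), cycle(user_agents)
    while True:
        yield next(proxy_cycle), next(agent_cycle)


profiles = request_profiles(PROXIES, USER_AGENTS)
first_four = [next(profiles) for _ in range(4)]
for proxy, agent in first_four:
    print(proxy, "|", agent)
```

With Selenium and Chrome, each pair could be applied to a fresh browser session through the `--proxy-server=` and `--user-agent=` command-line switches before loading the next page.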

After you follow all the steps above, you will realize that our pricing for managed web scraping is one of the most competitive in the market.

Originally published at https://www.specrom.com.


Jay M. Patel

Cofounder/principal data scientist at Specrom Analytics (specrom.com) natural language processing and web crawling/scraping expert. Personal site: JayMPatel.com