How to use Review API
Extracting reviews about Zara and Shein without web scraping
One of the things I’m trying to learn is how to web scrape. While I was trying to figure that out, I found a tool that could do what I wanted: Review API. In my opinion, knowing how to extract data from Review API is a great stepping stone if you are still learning web scraping and regex. The free version only lets you extract 30 reviews, but it’s still a great learning tool.
For this project, I wanted to compare reviews about two popular fashion companies: Zara and Shein. They are different in terms of pricing, product quality, marketing, and customer segments. The goal of this project was to see the differences in how their customers review them.
The documentation that Review API provides is very helpful; it shows you how to request reviews through your browser, cURL, Python, Node.js, and PHP. I will be going over how I did this using Python.
Review API supports several platforms that you can extract data from, but since I am only interested in reviews about companies, I will show how I extracted data from sitejabber and Consumer Affairs.
This is Zara’s sitejabber page and this was how I extracted 30 reviews:
import requests
import re
import pandas as pd

headers = {
    "apikey": "ENTER_YOUR_API_KEY_HERE"}

params = (
    ("url", "https://www.sitejabber.com/reviews/zara.com"),
    ("amount", "30"),
)

response = requests.get('https://app.reviewapi.io/api/v1/reviews', headers=headers, params=params)

# view response
print(response.text)
The response comes back as one long string of JSON text, but I want to create a data table containing: Username, Date, Rating, and the review Text. Now we have to use some regex to extract what we want.
Here’s a snippet of how the output looks:
{"query":{"url":"https:\/\/www.sitejabber.com\/reviews\/zara.com","amount":"30"},"reviews":[{"platform":"sitejabber.com","rating":1,"user_name":"JenniferG.","text":"It is disappointing that Zara's customer service policies are so poor. 30 minutes to wait in line to make a purchase or return is unreasonable. And, now that we're transitioning from the pandemic, the dressing rooms are still not open. And, if you buy something online, you cannot return it to a store. Terrible!ServiceValueShippingReturnsQuality","title":"TERRIBLE CUSTOMER SERVICE","timestamp":"2021-08-02","platform_specific":{"user_review_count":"1","user_image_url":"https:\/\/static.sitejabber.com\/img\/stock_photos\/200\/thumbnail_small.1476463632.jpg","user_helpful_vote":"1"}}
After some trial and error, I was finally able to figure out how to obtain the variables I was interested in:
# make empty dataframe
zara_rev = pd.DataFrame(columns=['Username', 'Date', 'Rating', 'Text'])

# extract data for each column
# [^"]* keeps everything up to the closing quote, so names like "JenniferG." keep their trailing period
zara_rev['Username'] = re.findall(r'(?<=\"user_name\"\:.)[^"]*', response.text)
zara_rev['Date'] = re.findall(r'(?<=\"timestamp\"\:.).[0-9]*\-[0-9]*\-[0-9]*(?=\"\,\"platform_specific\")', response.text)
# ratings come back as strings here, not integers
zara_rev['Rating'] = re.findall(r'(?<=\"rating\"\:)(\d)*', response.text)
# .*? is lazy, so each match stops at the first "title" that follows
zara_rev['Text'] = re.findall(r'(?<=\"text\"\:.).*?(?=\"\,\"title\")', response.text)
When it comes to regex, I learned to be VERY specific about where I want each match to start and stop (or else the pattern grabs everything in between and I end up with one data point). You may need to change some of these patterns depending on how your response looks.
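To illustrate the "one data point" problem, here is a minimal sketch comparing a greedy pattern with a lazy one. The sample string is made up for illustration (it only mimics the shape of the response snippet shown earlier), and the patterns are simplified versions of the ones above.

```python
import re

# Made-up two-review sample in the same shape as the API response (not real data)
sample = (
    '{"reviews":[{"rating":1,"text":"Too slow","title":"A"},'
    '{"rating":5,"text":"Love it","title":"B"}]}'
)

# Greedy: .* runs to the LAST '","title"', so both reviews collapse into one match
greedy = re.findall(r'(?<="text":").*(?=","title")', sample)

# Lazy: .*? stops at the FIRST '","title"' after each "text", giving one match per review
lazy = re.findall(r'(?<="text":").*?(?=","title")', sample)

print(len(greedy))  # 1
print(lazy)         # ['Too slow', 'Love it']
```

The only difference between the two patterns is the `?` after `.*`, which is why small changes to a pattern can change how many rows you end up with.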
Now, your data set should look something like this:
That’s it! Super easy, right? Just do the same thing for Shein. (Note: Shein does not have a Consumer Affairs page, so I only have sitejabber data for this company.)
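If you repeat this for another company and the regex needs retuning, one alternative is to parse the response with Python's built-in json module, since the response snippet shown earlier is JSON. This is only a sketch, assuming the response keeps the same shape as that snippet; the function name `reviews_to_rows` is my own and not part of Review API, and the sample string is a trimmed copy of the snippet.

```python
import json

def reviews_to_rows(response_text):
    """Turn a JSON response (shaped like the snippet above) into a list of row dicts."""
    payload = json.loads(response_text)
    rows = []
    # Assumption: reviews live under the "reviews" key, as in the snippet
    for review in payload.get("reviews", []):
        rows.append({
            "Username": review.get("user_name"),
            "Date": review.get("timestamp"),
            "Rating": review.get("rating"),
            "Text": review.get("text"),
        })
    return rows

# Trimmed copy of the response snippet, shortened for illustration
sample = (
    '{"reviews":[{"platform":"sitejabber.com","rating":1,'
    '"user_name":"JenniferG.","text":"Terrible!",'
    '"title":"TERRIBLE CUSTOMER SERVICE","timestamp":"2021-08-02"}]}'
)

rows = reviews_to_rows(sample)
print(rows[0]["Username"], rows[0]["Rating"])  # JenniferG. 1
```

Passing the result to `pd.DataFrame(reviews_to_rows(response.text))` should give the same four columns, with ratings as integers and JSON escapes such as `\/` already decoded. The regex version is still the better exercise if practicing regex is the goal.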
Here is a graph I made showing Zara and Shein’s rating distribution:
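As a rough sketch of how the counts behind a rating distribution can be computed, here is an example using only the standard library. The ratings below are hypothetical placeholders, not the data from this project, and the 1 to 5 star scale is an assumption.

```python
from collections import Counter

# Hypothetical ratings for illustration only (NOT the project's data)
zara_ratings = ["1", "1", "2", "5", "1", "3"]   # strings, as regex extraction returns them
shein_ratings = ["5", "4", "1", "5", "1", "1"]

def rating_distribution(ratings):
    """Count how many reviews gave each star rating (assumes a 1 to 5 scale)."""
    counts = Counter(int(r) for r in ratings)  # int() converts regex strings to numbers
    return {star: counts.get(star, 0) for star in range(1, 6)}

zara_dist = rating_distribution(zara_ratings)
shein_dist = rating_distribution(shein_ratings)

print(zara_dist)   # {1: 3, 2: 1, 3: 1, 4: 0, 5: 1}
print(shein_dist)  # {1: 3, 2: 0, 3: 0, 4: 1, 5: 2}
```

These dictionaries can then be handed to whichever plotting tool you prefer to draw the bars side by side.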
Many consumers use sites like sitejabber and Consumer Affairs to share bad experiences with a company’s product or customer service. Here are some of the 1-star reviews:
This was interesting because I hear the same sentiments from my friends who shop at these stores. Many customer complaints about Shein were about sizing and not getting refunds. Many Zara reviewers complained about bad customer service experiences.
Company reviews are a great way to learn about what customers are saying about your product and customer service. Customer reviews can help companies find areas where they can improve!
If you want to go even deeper with your text analysis, refer to my other article: NLP on Disneyland Reviews.
I hope this article helps you if you are just starting to learn how to use regex or are in the process of learning how to web scrape.