The Witcher Netflix — Social Media Analytics: web scraping, natural language processing with Python

Kiarash Yasoubi
Data Analytics Centre
4 min read · Jul 9, 2020

Social media is one of the main sources of data these days. Many organizations invest in social media analytics tools and techniques to monitor and predict events as well as people's behavior. However, when people talk about social media they usually think of Twitter, Facebook and Instagram, although there are other platforms for other fields and industries. IMDb is one of the biggest databases of films, television programs and video games; it offers a vast amount of information and lets users share their opinions about programs and rate them. Therefore, in this article we collect IMDb users' reviews of The Witcher, a series that some believe to be the next Game of Thrones. Instead of looking at other social media, we collect our data directly from the IMDb website using a single web scraping script in Python and then analyze the comments. Our goal is to use Python to explore users' comments and get insight into what they think about The Witcher.

Here is the framework we are going to implement with Python:

1. Data collection:

We load the required libraries (you may need to install some of them first).

Then we enter the URL(s) of the review pages, download each page, and use the BeautifulSoup library to parse the HTML.
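As a minimal sketch of this step, assuming the page is downloaded with the standard library's `urllib` (the article does not say which HTTP client it used), the download and the parsing can be kept as two small functions. `REVIEWS_URL`, `fetch_html` and `parse_html` are hypothetical names, and the URL is a placeholder rather than a real address.

```python
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup

# Placeholder only: replace <title-id> with the page you actually want to scrape.
REVIEWS_URL = "https://www.imdb.com/title/<title-id>/reviews"


def fetch_html(url, timeout=10.0):
    """Download a page and return its HTML as text."""
    # Some sites reject requests that do not send a browser-like User-Agent.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def parse_html(html):
    """Parse raw HTML into a searchable BeautifulSoup tree."""
    return BeautifulSoup(html, "html.parser")


# Offline demonstration on a tiny inline document.
sample = "<html><head><title>User reviews</title></head><body></body></html>"
soup = parse_html(sample)
print(soup.title.string)
```

With a real address in place of the placeholder, the two functions chain as `parse_html(fetch_html(REVIEWS_URL))`. Keeping them separate makes the parsing step easy to try on saved HTML without touching the network.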

In this step, we need to find the exact location of the HTML tags we are looking for. BeautifulSoup's find_all function is useful here. We know that the comments are stored in elements with the class name “review-container”, so we apply the function to that class and extract the comments from the HTML code.
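The extraction can be sketched as follows. Only the class name “review-container” comes from the text above; the sample HTML and the helper name `extract_reviews` are made up for illustration, and the real page's inner markup may differ.

```python
from bs4 import BeautifulSoup

# Tiny stand-in for a downloaded page (illustrative markup, not copied from IMDb).
SAMPLE_HTML = """
<div class="review-container"><p>Great show, loved the fights.</p></div>
<div class="review-container"><p>Slow start, confusing timeline.</p></div>
<div class="sidebar">Not a review</div>
"""


def extract_reviews(html):
    """Return the visible text of every element with class 'review-container'."""
    soup = BeautifulSoup(html, "html.parser")
    containers = soup.find_all(class_="review-container")
    # get_text joins nested text nodes with a space and trims the ends.
    return [container.get_text(" ", strip=True) for container in containers]


reviews = extract_reviews(SAMPLE_HTML)
print(len(reviews), "reviews found")
```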

2. Data Manipulation

Now we have messy, semi-structured data that is not ready for processing. We first need to transform it into a tabular data set and then do some cleaning.
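One way to get a tabular data set, assuming pandas is the table library (the article does not name it here), is to build a DataFrame from one record per review. The records and the column names `review` and `ranking` are hypothetical.

```python
import pandas as pd

# Hypothetical scraped records; the values are illustrative only.
raw_records = [
    {"review": "  Great show,\n loved the fights. ", "ranking": "9/10"},
    {"review": "Slow start, confusing timeline.", "ranking": "4/10"},
    {"review": "No score given for this one.", "ranking": None},
]

# One row per review, one column per field.
df = pd.DataFrame(raw_records, columns=["review", "ranking"])
print(df.shape)
print(df.head())
```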

As you may have noticed, the reviews are messy and the ranking column needs to be cleaned.
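A possible cleaning pass is sketched below. It assumes the raw ranking is a string such as "9/10" surrounded by stray whitespace, which is a guess about the scraped format; `clean_reviews` is a hypothetical helper.

```python
import pandas as pd

# Illustrative messy rows, not the scraped data.
df = pd.DataFrame({
    "review": ["  Great show,\n loved   the fights. ", "Slow start."],
    "ranking": ["\n9/10\n", " 4 / 10 "],
})


def clean_reviews(frame):
    """Collapse whitespace in the text and turn 'N/10' strings into numbers."""
    out = frame.copy()
    out["review"] = out["review"].str.replace(r"\s+", " ", regex=True).str.strip()
    # Keep only the score before the '/10'; rows without a match become NaN.
    scores = out["ranking"].str.extract(r"(\d+)\s*/\s*10", expand=False)
    out["ranking"] = pd.to_numeric(scores, errors="coerce")
    return out


clean = clean_reviews(df)
print(clean)
```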

Now that the ranking column is clean, let us see what the data looks like.

3. Visualization
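As one possible view of the cleaned data, the distribution of rankings can be drawn as a bar chart. This is a sketch with matplotlib and made-up rankings, not a reproduction of the article's figures.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so the script also runs without a display

import matplotlib.pyplot as plt
import pandas as pd

# Illustrative rankings only, not the scraped data.
rankings = pd.Series([9, 10, 4, 8, 9, 1, 10, 9], name="ranking")

# Number of reviews per ranking, ordered from lowest to highest score.
counts = rankings.value_counts().sort_index()

fig, ax = plt.subplots(figsize=(6, 3))
ax.bar(counts.index, counts.values)
ax.set_xlabel("Ranking (out of 10)")
ax.set_ylabel("Number of reviews")
ax.set_title("Distribution of user rankings")
fig.tight_layout()
```

Calling `fig.savefig("ranking_distribution.png")` writes the chart to disk; in a notebook, the figure can be displayed directly instead.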

4. Natural Language Processing

For this part, we need to build a corpus and tokenize the comments.
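A minimal tokenizer, assuming plain regular expressions rather than whichever NLP library the article used, could look like this. The stop-word list is deliberately tiny and illustrative.

```python
import re

# Small illustrative stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "is", "it", "of", "to", "in", "was", "this", "i"}


def tokenize(text):
    """Lowercase the text, split it into words, and drop stop words."""
    # Matches runs of letters, optionally followed by an apostrophe part ("didn't").
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return [word for word in words if word not in STOP_WORDS]


reviews = [
    "The Witcher is a great show!",
    "I didn't like the pacing of this series.",
]

# The corpus is simply one token list per review.
corpus = [tokenize(review) for review in reviews]
print(corpus)
```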

Now we can visualize the words that users use most frequently.
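Before plotting, the frequencies themselves have to be counted. A short sketch with `collections.Counter`, using made-up token lists:

```python
from collections import Counter

# Hypothetical tokenized reviews (one list of tokens per review).
corpus = [
    ["witcher", "great", "show"],
    ["great", "cast", "great", "music"],
    ["slow", "show"],
]

# Flatten all reviews into one stream of tokens and count them.
word_counts = Counter(token for tokens in corpus for token in tokens)

top_words = word_counts.most_common(2)
print(top_words)
```

The resulting (word, count) pairs can be passed to a bar chart or a word cloud, depending on the kind of figure wanted.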

Then we apply sentiment analysis to the comments.
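The article does not say which sentiment tool it used. To illustrate the idea only, the sketch below uses a toy word-list scorer; it is a stand-in for a proper sentiment library, and the word lists, `sentiment_score` and `label_sentiment` are all made up for the example.

```python
import re
from collections import Counter

# Toy lexicons for illustration; real tools ship far larger, weighted ones.
POSITIVE = {"great", "good", "love", "loved", "amazing", "excellent", "enjoyed"}
NEGATIVE = {"bad", "boring", "slow", "confusing", "terrible", "awful", "hate"}


def sentiment_score(text):
    """Return a score in [-1, 1]: share of positive minus negative lexicon hits."""
    words = re.findall(r"[a-z']+", text.lower())
    positive_hits = sum(word in POSITIVE for word in words)
    negative_hits = sum(word in NEGATIVE for word in words)
    total = positive_hits + negative_hits
    return 0.0 if total == 0 else (positive_hits - negative_hits) / total


def label_sentiment(score):
    """Map a numeric score to a coarse label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


reviews = [
    "Great cast and amazing fights",
    "Slow and confusing, but good music",
    "I watched it yesterday",
]
labels = [label_sentiment(sentiment_score(review)) for review in reviews]

# Share of each label, which is how a claim like "most reviews are positive" can be checked.
label_counts = Counter(labels)
print(label_counts)
```

A word-list scorer ignores negation and sarcasm, so it should be read as a rough signal rather than a verdict on individual reviews.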

We found that most people left positive comments.

Now we visualize the most frequent words in the positive and negative reviews.

Positive reviews
Negative reviews
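The word counts behind the two views can be computed by grouping tokens by sentiment label. The labeled reviews below are made up, and the labels are assumed to come from the earlier sentiment step.

```python
import re
from collections import Counter, defaultdict

# Hypothetical (review text, sentiment label) pairs.
labeled_reviews = [
    ("Great cast, great music", "positive"),
    ("Loved the fights and the music", "positive"),
    ("Boring plot, confusing timeline", "negative"),
    ("Confusing and slow", "negative"),
]

# Tiny illustrative stop-word list.
STOP_WORDS = {"the", "and", "a", "of"}

# One Counter per sentiment label.
counts_by_label = defaultdict(Counter)
for text, sentiment in labeled_reviews:
    tokens = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]
    counts_by_label[sentiment].update(tokens)

top_positive = counts_by_label["positive"].most_common(2)
top_negative = counts_by_label["negative"].most_common(1)
print(top_positive, top_negative)
```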

Conclusion:

In this article we tried to show how to combine web scraping with natural language processing in order to get insight into what users think about the series.
