Selenium Python: Movie Recommendation From IMDB Reviews (without actually reading the reviews)

Jeffrey Triandi Sabarman
Published in Analytics Vidhya
8 min read, Apr 26, 2020

Python's Selenium package (for web crawling) and BeautifulSoup package (for web scraping) can tell you whether a movie is worth watching without you reading all the reviews

Source : indiatoday.in (Breaking Bad)

Have you ever wanted to watch a movie or binge-watch a new TV show, but didn't know whether it was good or bad? You can check the rating, but ratings are sometimes biased. That leaves us with the reviews, which give a more in-depth picture of how people perceived the movie or TV show. But who has time to read all the reviews anyway?

As programmers, we can make this boring process faster and more fun with code. So let's use Python to recommend a movie or a TV show for us. Let's go!

Import Packages

We use Python to get the reviews from IMDB. We need a web crawling package to navigate to the IMDB reviews page and a web scraping package to extract all the reviews from that page. Here are the packages we need.

import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import pandas as pd
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from textblob import TextBlob
import time

If you don't have these packages yet, install them first from your command prompt with this command:

pip install (name of the package)

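For example: pip install pandas. Note that some packages are installed under a different name than the one you import: BeautifulSoup comes from pip install beautifulsoup4, and SentimentIntensityAnalyzer comes from pip install vaderSentiment.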

Here’s the explanation for all the packages :

  • requests package is for downloading the page from its URL so we can scrape the reviews
  • BeautifulSoup package is for parsing the HTML of the page so we can extract the reviews
  • webdriver package is for web crawling: it controls the browser and navigates between pages
  • Keys package provides special keyboard keys (such as ENTER), which give us another way to click a button on the page
  • pandas package is for storing the reviews in a DataFrame
  • SentimentIntensityAnalyzer is for the sentiment analysis of the reviews using vaderSentiment
  • TextBlob is for the sentiment analysis of the reviews using TextBlob
  • time package is for pausing between steps while we do the web crawling

Why use two sentiment analyzers? Because I checked the accuracy of both, and neither was very accurate on its own. So, to reduce the error, I use both vaderSentiment and TextBlob and compare their results. At the end of the day, whether we watch it or not is up to us.

Let’s begin the journey

Web Crawling To IMDB User Reviews Website

After you import all the necessary packages, let's start the web crawling journey. First of all, we have to set up the web browser we want to drive. Personally, I use Google Chrome, so I'm going to show you how to use Chrome for web crawling. Before you jump into the code, make sure to install chromedriver. If you use another web browser, you have to install the matching driver for that browser instead (for example, geckodriver for Firefox).

First, check your Chrome version: open the Chrome menu, go to Help, then click About Google Chrome.

Google Chrome Version

As you can see, my Google Chrome version is 81. Once you know your version, download the chromedriver release that matches it.

chromedriver install

After you install it, we can continue our journey.

movie = input("What movie or tv show do you want to watch? : ")

#Set the web browser
driver = webdriver.Chrome(executable_path=r"C:\chromedriver.exe")

#Go to Google
driver.get("https://www.google.com/")

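First, we ask for the name of the movie or TV show you want to watch. Then we set up the driver variable, which controls the chromedriver we installed earlier and does all the web crawling. We pass the location of chromedriver through executable_path (just copy the address as text from your file explorer). The r before the string makes it a raw string, so Python does not treat the backslashes in a Windows path as escape sequences. Note that this is the Selenium 3 API: Selenium 4 replaces executable_path with webdriver.Chrome(service=Service(r"C:\chromedriver.exe")), using Service from selenium.webdriver.chrome.service, and replaces the find_element_by_* methods used below with find_element(By.NAME, ...) and similar calls, using By from selenium.webdriver.common.by.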
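To see why the r prefix matters, here is a small stdlib-only check. The folder C:\new\tools is a hypothetical example, chosen because \n and \t are real escape sequences in Python: without the r prefix they silently become a newline and a tab.

```python
# Hypothetical folder name: "\n" and "\t" are both escape sequences in Python
plain = "C:\new\tools"   # "\n" becomes a newline, "\t" becomes a tab
raw = r"C:\new\tools"    # raw string: every backslash is kept as-is

print(len(plain), len(raw))           # 10 12
print("\n" in plain, "\n" in raw)     # True False
```

The plain string is two characters shorter because each backslash pair collapsed into a single control character, which is exactly the kind of broken path a raw string prevents.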

After that, we use the .get method to go to Google. Once the page loads, inspect the element you want to interact with: right-click it and choose Inspect. In this example, that is the Google search bar.

Inspect Element Google Search Box

As you can see from the highlighted part, the search bar has the attribute name="q".

#Enter the keyword
driver.find_element_by_name("q").send_keys(movie + " imdb")
time.sleep(1)

So we use .find_element_by_name, because the element we want is identified by its name attribute. Then we use .send_keys() to type the keyword into the Google search bar: the movie name followed by "imdb". The movie variable holds the input you entered earlier, so if you entered avengers, the keyword is avengers imdb. Finally, time.sleep(1) pauses for 1 second before the next driver command, giving the page time to react.
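If you would rather skip typing into the search box, one alternative (not the method used in this tutorial) is to build the Google results URL yourself and open it with driver.get(). This sketch assumes Google still reads the search terms from the q query parameter of https://www.google.com/search; build_search_url is a hypothetical helper name.

```python
from urllib.parse import urlencode


def build_search_url(movie: str) -> str:
    """Return a Google search URL for '<movie> imdb' (hypothetical helper)."""
    # Same keyword the tutorial types into the search bar
    keyword = movie + " imdb"
    # urlencode escapes the keyword: spaces become '+', '&' becomes '%26'
    return "https://www.google.com/search?" + urlencode({"q": keyword})


print(build_search_url("breaking bad"))
# https://www.google.com/search?q=breaking+bad+imdb
```

With this helper, driver.get(build_search_url(movie)) would stand in for both the typing step and the search button click. Escaping the keyword matters for titles that contain characters such as &, which would otherwise be read as the start of a new URL parameter.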

After you enter the keyword, you have to click the Google Search button. As before, right-click the button and click Inspect.

Inspect Element Google Search Button

As you can see from the highlighted part, the search button has the attribute name="btnK".

#Click the google search button
driver.find_element_by_name("btnK").send_keys(Keys.ENTER)
time.sleep(1)

Again we use .find_element_by_name, because this button is also identified by its name attribute. Then .send_keys(Keys.ENTER) presses Enter on the button, which submits the search. Calling .click() on the element works as an alternative.

Breaking Bad IMDB Google Search Results

We are going to use the TV show Breaking Bad as the example. After you search the keyword, you land on the results page, and the first result should be the IMDB page for the show. Inspect that link; here is its inspect element.

Inspect Element for the first url

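From the highlighted part, the link sits inside an element with the class "r". Google changes the markup of its results page from time to time, so check the class name again in your own browser.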

driver.find_element_by_class_name("r").click()

We use .click() to click the link, which takes us straight to the IMDB page of the show.

Breaking Bad IMDB page

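After that, select User Reviews. This time I use the XPath to make finding the element simpler. Keep in mind that an absolute XPath like this one stops working as soon as IMDB changes its page layout.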

User Reviews Inspect Element

#Click the user reviews
driver.find_element_by_xpath("/html/body/div[3]/div/div[2]/div[5]/div[1]/div/div/div[1]/div[1]/div[1]/a[3]").click()

After we click the user reviews, we jump into the user reviews page.

User Review Page

Once we are on the user reviews page, the web crawling part of the journey is done.
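Because the absolute XPath above breaks whenever IMDB rearranges its page, a sturdier option is to build the reviews URL from the title page URL. This sketch assumes IMDB title pages live at /title/<id>/ with an ID of "tt" followed by digits, and that the reviews page is /title/<id>/reviews; the helper name reviews_url and the example ID are hypothetical.

```python
import re


def reviews_url(title_url: str) -> str:
    """Build an IMDB reviews URL from a title page URL (hypothetical helper)."""
    # Assumed ID format: "tt" followed by digits, right after /title/
    match = re.search(r"/title/(tt\d+)", title_url)
    if match is None:
        raise ValueError(f"no IMDB title ID in {title_url!r}")
    return f"https://www.imdb.com/title/{match.group(1)}/reviews"


print(reviews_url("https://www.imdb.com/title/tt1234567/?ref_=fn_al_tt_1"))
# https://www.imdb.com/title/tt1234567/reviews
```

Once the driver is on the show's IMDB page, driver.get(reviews_url(driver.current_url)) would replace the XPath click. Raising ValueError when no ID is found makes a wrong Google result fail loudly instead of opening an unrelated page.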

Web Scraping The Reviews

We use the BeautifulSoup package to scrape the reviews from the IMDB reviews page.

#Scrape the IMDB reviews
ans = driver.current_url
page = requests.get(ans)
soup = BeautifulSoup(page.content, "html.parser")
all = soup.find(id="main")

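First, we store the URL of the user reviews page in the variable ans. We use requests to download that page and BeautifulSoup to parse its HTML. Then .find(id="main") grabs the main section of the page, which contains all the reviews, and we store it in the variable all. (Naming a variable all hides Python's built-in all() function, so a name like main would be safer if you adapt this code.)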

#Get the title of the movie
parent = all.find(class_ ="parent")
name = parent.find(itemprop = "name")
url = name.find(itemprop = 'url')
film_title = url.get_text()

We can get the title of the movie by finding the element on the page that stores the title of the film.

The Title of the Movie Inspect Element

#Get the title of the review
title_rev = all.select(".title")
title = [t.get_text().replace("\n", "") for t in title_rev]

After that, we get the review titles with .select(".title"), which returns every element with the class title. I use a list comprehension to collect all the review titles on the page into the title variable: .get_text() extracts just the text from the HTML, and .replace("\n", "") removes the newlines.
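A list comprehension is just a compact for loop. This stdlib-only sketch uses plain strings in place of the BeautifulSoup tags (the sample titles are made up) to show both forms side by side, and to show that .strip() also removes the leading and trailing spaces that .replace("\n", "") leaves behind.

```python
# Made-up review titles, shaped like text returned by .get_text()
raw_titles = ["\n A masterpiece\n", "\nBest show ever \n"]

# List comprehension, as in the tutorial
title = [t.replace("\n", "") for t in raw_titles]

# The same thing written as an ordinary loop
title_loop = []
for t in raw_titles:
    title_loop.append(t.replace("\n", ""))

# .strip() removes newlines and surrounding spaces in one go
title_stripped = [t.strip() for t in raw_titles]

print(title)           # [' A masterpiece', 'Best show ever ']
print(title_stripped)  # ['A masterpiece', 'Best show ever']
```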

#Get the review
review_rev = all.select(".content .text")
review = [r.get_text() for r in review_rev]

We get the reviews with .select(".content .text"), which matches elements with the class text inside elements with the class content; both class names come from the inspect element. As before, a list comprehension stores all the reviews in the variable review.

Both title and review are lists.

#Make it into dataframe
table_review = pd.DataFrame({
"Title" : title,
"Review" : review
})

We can store the review titles and the reviews themselves in a DataFrame. Now that we have all the reviews, let's start the sentiment analysis.
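One caveat before the sentiment analysis: pd.DataFrame only accepts the two lists if they have the same length, so a review without a title would make it raise an error. Here is a stdlib-only sketch of the same pairing as a list of records, with an explicit length check; the helper name pair_reviews and the sample data are hypothetical.

```python
def pair_reviews(title, review):
    """Pair each review title with its review text (hypothetical helper)."""
    # Fail loudly instead of silently mismatching titles and reviews
    if len(title) != len(review):
        raise ValueError(f"{len(title)} titles but {len(review)} reviews")
    return [{"Title": t, "Review": r} for t, r in zip(title, review)]


rows = pair_reviews(["Great", "Meh"], ["Loved every episode.", "Too slow for me."])
print(rows[0])  # {'Title': 'Great', 'Review': 'Loved every episode.'}
```

A list of dictionaries like rows can also be passed directly to pd.DataFrame(rows), which produces the same Title and Review columns.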

#Vadersentiment
analyser = SentimentIntensityAnalyzer()
sentiment1 = []
sentiment2 = []

for rev in review:
    score1 = analyser.polarity_scores(rev)
    com_score = score1.get('compound')
    if com_score >= 0.05:
        sentiment1.append('positive')
    elif com_score > -0.05 and com_score < 0.05:
        sentiment1.append('neutral')
    elif com_score <= -0.05:
        sentiment1.append('negative')

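For the first analysis, I use the vaderSentiment package. Each review gets a compound score between -1 and 1, which the code labels as positive at 0.05 or above, negative at -0.05 or below, and neutral in between. You can learn more about vaderSentiment in its documentation. After that, we use TextBlob.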
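Before moving on to TextBlob, note that the VADER labeling rule can be pulled out into a standalone function and checked without running the analyzer at all. In this sketch, vader_label is a hypothetical helper name, and the scores are made-up numbers standing in for analyser.polarity_scores(rev)["compound"].

```python
from collections import Counter


def vader_label(compound: float) -> str:
    """Map a VADER compound score to a label using the +/-0.05 thresholds."""
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"  # strictly between -0.05 and 0.05


# Made-up compound scores, including both boundary values
scores = [0.91, 0.02, -0.6, 0.05, -0.05]
labels = [vader_label(s) for s in scores]
print(labels)  # ['positive', 'neutral', 'negative', 'positive', 'negative']
print(Counter(labels))
```

Because the first two checks already cover everything at or beyond the thresholds, the neutral case needs no condition of its own, which is why the original three-branch chain can end in a plain return. Counter then gives the per-label totals used to read the final result.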

#TextBlob
for rev in review:
    score2 = TextBlob(rev).sentiment.polarity
    if score2 >= 0:
        sentiment2.append('positive')
    else:
        sentiment2.append('negative')

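After running the TextBlob sentiment analysis, we can see the result. TextBlob's polarity ranges from -1.0 (negative) to 1.0 (positive); note that this code counts a polarity of exactly 0 as positive, so TextBlob never labels a review as neutral here.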

The Program Output

According to our program, we should watch Breaking Bad!

As you can see, vaderSentiment labels 23 reviews as positive and 2 as negative, while TextBlob labels all the reviews as positive. The results differ, but both point the same way, so we can reasonably conclude that Breaking Bad is good and worth watching.
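To turn those counts into a yes-or-no answer automatically, one option is to compute each analyzer's share of positive labels and compare it with a cutoff. The recommend helper and its 0.6 cutoff are hypothetical choices, not part of the tutorial; the label lists mirror the counts reported above, assuming the page held 25 reviews so that TextBlob's "all positive" means 25.

```python
from collections import Counter


def recommend(labels, min_share=0.6):
    """Return (share of positive labels, whether it reaches min_share).

    The 0.6 default cutoff is an arbitrary assumption.
    """
    counts = Counter(labels)
    share = counts["positive"] / len(labels) if labels else 0.0
    return share, share >= min_share


# Lists shaped like the reported results (25 reviews assumed)
sentiment1 = ["positive"] * 23 + ["negative"] * 2
sentiment2 = ["positive"] * 25

for name, labels in [("vaderSentiment", sentiment1), ("TextBlob", sentiment2)]:
    share, ok = recommend(labels)
    print(f"{name}: {share:.0%} positive -> {'watch it' if ok else 'maybe skip'}")
```

Reporting the share per analyzer, rather than merging the two, keeps the disagreement between vaderSentiment and TextBlob visible so you can still make the final call yourself.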

As a challenge, you can make the driver click to load more reviews and collect every review on IMDB, not just the ones on the first page.
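A rough shape for that challenge is a loop that keeps loading pages until there is no next page. In this sketch, fetch_page is a hypothetical stand-in that serves canned data; in a real version it would be whatever loads the next batch of reviews, for example clicking the load-more control with the driver and re-parsing the HTML.

```python
import time

# Hypothetical canned pages: page key -> (reviews on that page, key of the next page)
FAKE_PAGES = {
    None: (["review 1", "review 2"], "k1"),
    "k1": (["review 3"], "k2"),
    "k2": (["review 4"], None),
}


def fetch_page(key):
    """Stand-in for loading one page of reviews (hypothetical)."""
    return FAKE_PAGES[key]


def collect_all_reviews(fetch, max_pages=50, delay=0.0):
    """Follow 'next page' keys until none are left or max_pages is reached."""
    reviews, key = [], None
    for _ in range(max_pages):
        page_reviews, key = fetch(key)
        reviews.extend(page_reviews)
        if key is None:  # no more pages
            break
        time.sleep(delay)  # pause between requests, like time.sleep(1) above
    return reviews


print(collect_all_reviews(fetch_page))
# ['review 1', 'review 2', 'review 3', 'review 4']
```

The max_pages cap keeps a show with thousands of reviews from running forever, and the delay plays the same role as the time.sleep calls in the crawling code.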

Here's the complete code: Web Scraping and Crawling IMDB Reviews

So that's how you can build your own movie recommendation program based on IMDB reviews. Before you start a new movie or TV show, run this program first to help you decide. Enjoy the program!

If you have any trouble or confusion with any of the steps, hit me up on Instagram or check out my GitHub.

If you like this kind of content, please let me know! You can find me on Instagram or Twitter @jeffsabarman.


Just a “keep learning” guy who wants to share his knowledge — Programming enthusiast