How to Scrape Data from YouTube Using Python: Tutorial

Uma Khanna
Published in Life and Tech
3 min read · Dec 20, 2020

I am a fervid YouTube user. Not only have I watched a lot of Python tutorials, but I also love watching travel vlogs.

So, as I was browsing YouTube for the best travel vlogs, my data scientist thought process kicked in. Given my love for web scraping and machine learning, could I extract data about the best travel vloggers?

(Image: captured during my visit to Kanyakumari)

As soon as I realized this was the perfect opportunity to combine two of my favorite hobbies, I was in. The task mainly requires web scraping, a skill I believe every data science enthusiast should have, since it is quite helpful for social media analysis. Here are the steps:

  1. Install Selenium
  2. Install the web driver
  3. Scrape data from YouTube
  4. Clean the extracted data
  5. Analyze the data

Selenium is a powerful tool for controlling web browsers through programs and performing browser automation. WebDriver, the driving force of Selenium, is what allows us to perform various operations on the elements of a webpage.

pip install selenium

Download the Chrome web driver from the link below, and just in case you encounter any error, make sure the versions of Google Chrome and ChromeDriver match.

https://chromedriver.chromium.org/downloads
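As a quick sanity check, assuming chromedriver is already on your PATH, you can print the driver version from a terminal and compare it with the browser version shown at chrome://version (or Menu > Help > About Google Chrome):

chromedriver --version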

Now that all the requirements are fulfilled, we can write a script to gather all the information related to the travel vlogs. Mention the path where you have downloaded the Chrome web driver.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Path to the ChromeDriver executable downloaded above
chrome_path = r"C:\SeleniumDriver\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
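One small hedge: this article was written against Selenium 3, where passing the driver path directly still works (later releases only warn about it). If you happen to be on Selenium 4 or newer, the path goes through a Service object instead, roughly like this:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Selenium 4+ style: wrap the ChromeDriver path in a Service object
service = Service(r"C:\SeleniumDriver\chromedriver.exe")
driver = webdriver.Chrome(service=service)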

Next, we point the driver at the URL from which we want to scrape details about the travel vloggers.

driver.get("https://www.youtube.com/results?search_query=travel+vloggers")
user_data = driver.find_elements_by_xpath('//*[@id="video-title"]')
links = []
for i in user_data:
links.append(i.get_attribute('href'))
print(links)
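Two hedges worth adding here: on Selenium 4+ the find_elements_by_xpath helper was removed in favour of find_elements(By.XPATH, ...), and some matched elements can come back without an href (for example, non-video result tiles), so it costs nothing to filter those out:

# Selenium 4+ spelling of the same lookup
user_data = driver.find_elements(By.XPATH, '//*[@id="video-title"]')

# Keep only entries that actually carry a video URL
links = [i.get_attribute('href') for i in user_data if i.get_attribute('href')]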

The next part is copying the XPath of the elements we need: the title, channel name, likes, dislikes, and views for each link. In Selenium automation, when elements cannot be found with the general locators like id, class, or name, XPath is used to find an element on the web page. To copy an XPath, first open the developer tools and inspect the element (Ctrl+Shift+I), then right-click it and copy the XPath.

wait = WebDriverWait(driver, 10)

for x in links:
    driver.get(x)
    # Wait for each element to appear before reading its text
    v_title = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "h1.title.style-scope.ytd-video-primary-info-renderer"))).text
    v_name = wait.until(EC.presence_of_element_located((By.XPATH, '//*[@id="text"]/a'))).text
    v_likes = wait.until(EC.presence_of_element_located((By.XPATH, '//*[@id="text"]'))).text
    v_dlikes = wait.until(EC.presence_of_element_located((By.XPATH, '//*[@id="top-level-buttons"]/ytd-toggle-button-renderer[2]/a'))).text
    v_views = wait.until(EC.presence_of_element_located((By.XPATH, '//*[@id="count"]/yt-view-count-renderer/span[1]'))).text

    print(v_title)
    print(v_name)
    print(v_likes)
    print(v_dlikes)
    print(v_views)

As it prints all the data for each video, you can also save it in a dataframe to do further analysis and work out some of the top vloggers by different parameters, such as the number of views or likes; a rough sketch of that step follows below.
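Here is a minimal sketch of that last step, assuming the fields scraped inside the loop above are collected into a list of rows; the rows list and the parse_count helper are illustrative names introduced here, not part of the original script.

import pandas as pd

# rows would be filled inside the scraping loop, e.g.
# rows.append({"title": v_title, "channel": v_name, "likes": v_likes,
#              "dislikes": v_dlikes, "views": v_views})
rows = []

df = pd.DataFrame(rows, columns=["title", "channel", "likes", "dislikes", "views"])

def parse_count(text):
    # YouTube shows counts as text like "1.2M views"; strip the suffixes
    # before converting to a number (rough, illustrative parsing)
    text = text.lower().replace("views", "").replace(",", "").strip()
    multiplier = 1
    if text.endswith("k"):
        multiplier, text = 1_000, text[:-1]
    elif text.endswith("m"):
        multiplier, text = 1_000_000, text[:-1]
    try:
        return float(text) * multiplier
    except ValueError:
        return None

df["views_num"] = df["views"].apply(parse_count)
print(df.sort_values("views_num", ascending=False).head(10))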

I had fun working on this social media analysis, and the best part is that I wrote this blog while traveling. I hope it was illuminating for all the budding data scientists out there. Just make sure you provide the correct path and install the required libraries beforehand. Happy bug-free coding!
