How to Scrape Data from YouTube Using Python: A Tutorial
I am a fervid YouTube user. Not only have I watched a lot of Python tutorials, but I also love watching travel vlogs.
So, as I was browsing YouTube for the best travel vlogs, my data scientist's thought process kicked in: given my love for web scraping and machine learning, could I extract data about the best travel vloggers?
As soon as I realized this was the perfect opportunity to combine two of my favorite hobbies, I was in. The task merely requires web scraping, which I believe belongs in every data science enthusiast's toolkit and which is quite helpful for social media analysis. Here is the plan:
- Install Selenium
- Install the web driver
- Scrape data from YouTube
- Clean the extracted data
- Analyze the data
Selenium is a powerful tool for controlling web browsers from code and automating browser tasks. Its WebDriver component is what lets us locate elements on a web page and perform operations on them, such as clicking, typing, or reading their text.
pip install selenium
Next, download ChromeDriver from the link below. If you run into errors, make sure your installed version of Google Chrome matches your ChromeDriver version.
https://chromedriver.chromium.org/downloads
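If you are unsure whether the two versions match, a rough check is to compare their major version numbers. The sketch below assumes you have copied Chrome's version string from chrome://version and ChromeDriver's from the output of `chromedriver --version`; the helper names `major_version` and `versions_match` are hypothetical. Recent Selenium releases (4.6 and later) also include Selenium Manager, which can fetch a matching driver automatically when none is found.

```python
import re

# Matches the first dotted version number in a string, e.g. "114.0.5735.90".
VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)(?:\.(\d+))?")


def major_version(text):
    """Return the major version from a string like 'ChromeDriver 114.0.5735.90 (...)'."""
    match = VERSION_RE.search(text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return int(match.group(1))


def versions_match(chrome_version, driver_version):
    """Rough compatibility check: do Chrome and ChromeDriver share a major version?"""
    return major_version(chrome_version) == major_version(driver_version)


print(versions_match("114.0.5735.199", "ChromeDriver 114.0.5735.90 (abc-refs/branch-heads/5735@{#1052})"))
print(versions_match("120.0.6099.109", "ChromeDriver 114.0.5735.90"))
```

This only catches the most common mismatch; it is a sanity check, not a full compatibility test.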
Now that all the requirements are in place, we can write a script to gather information about the travel vlogs. Set the path to wherever you downloaded ChromeDriver.
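from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
# Path to the ChromeDriver executable you downloaded
chrome_path = r"C:\SeleniumDriver\chromedriver.exe"
# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service(chrome_path))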
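A wrong driver path is one of the most common reasons this step fails, so it can help to check the path before launching Chrome and fail with a readable message. This is a minimal sketch; `resolve_driver_path` is a hypothetical helper, not part of Selenium.

```python
from pathlib import Path


def resolve_driver_path(raw_path):
    """Return the driver path as a Path, or raise a clear error if no file exists there."""
    path = Path(raw_path).expanduser()
    # is_file() is False both for missing paths and for directories
    if not path.is_file():
        raise FileNotFoundError(
            f"No ChromeDriver executable at {path}; double-check chrome_path"
        )
    return path
```

You could then pass `str(resolve_driver_path(chrome_path))` wherever the driver path is expected, so a typo surfaces immediately instead of as a less obvious error from the browser launch.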
Next, we point the driver at the search-results URL for travel vloggers and collect the link behind each video title.
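driver.get("https://www.youtube.com/results?search_query=travel+vloggers")
# Each search result's title is an element with id="video-title"
user_data = driver.find_elements(By.XPATH, '//*[@id="video-title"]')
links = []
for i in user_data:
    href = i.get_attribute('href')
    if href:  # skip results that have no link
        links.append(href)
print(links)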
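Two small helpers can make this step easier to reuse: one builds the search URL for any query, and one tidies the collected links. This is a sketch under stated assumptions: `search_url` and `clean_links` are hypothetical names, and `clean_links` assumes you only want regular `/watch?v=...` videos, so it drops empty entries, duplicates, and other link types such as Shorts.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def search_url(query):
    """Build a YouTube search-results URL; urlencode turns spaces into '+'."""
    return "https://www.youtube.com/results?" + urlencode({"search_query": query})


def clean_links(hrefs):
    """Keep unique /watch?v=... links in their original order."""
    seen = set()
    cleaned = []
    for href in hrefs:
        if not href:
            continue  # some result elements have no href
        parsed = urlparse(href)
        video_id = parse_qs(parsed.query).get("v", [None])[0]
        if parsed.path != "/watch" or video_id is None or video_id in seen:
            continue  # not a regular video link, or already collected
        seen.add(video_id)
        # Rebuild the URL without extra tracking parameters
        cleaned.append(f"https://www.youtube.com/watch?v={video_id}")
    return cleaned


print(search_url("travel vloggers"))
```

For example, `links = clean_links(links)` before the next loop avoids visiting the same video twice.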
The next step is copying the XPath of each element we need. Since we want the title, channel name, likes, dislikes, and views for every link, we copy the respective XPath for each one. In Selenium automation, XPath is used to find an element on the web page when general locators such as id, class, or name cannot find it. To copy an XPath, inspect the element (Ctrl+Shift+I), right-click its node in the Elements panel, and choose Copy > Copy XPath. Keep in mind that YouTube's markup changes over time, so these selectors may need updating, and that YouTube stopped showing public dislike counts in late 2021.
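An id-based XPath is only as precise as the id is unique, and pages like YouTube's can reuse the same id on several elements. The sketch below illustrates this offline with Python's built-in `xml.etree.ElementTree`, which supports a small subset of XPath. The markup is a hypothetical, heavily simplified stand-in, not YouTube's real HTML.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup in which two elements share id="text".
SAMPLE = """
<div>
  <div id="channel-name"><div id="text"><a href="/@somechannel">Some Channel</a></div></div>
  <div id="top-level-buttons"><span id="text">12K</span></div>
</div>
"""

root = ET.fromstring(SAMPLE.strip())

# Like //*[@id="text"]: every element whose id is "text", in document order.
broad = root.findall(".//*[@id='text']")
print([el.tag for el in broad])  # both the channel-name wrapper and the like count

# Adding a child step, like //*[@id="text"]/a, narrows it to the channel link.
name_links = root.findall(".//*[@id='text']/a")
print([a.text for a in name_links])
```

In Selenium, `presence_of_element_located` returns the first match, so a broad path like `//*[@id="text"]` may return a different element than the one you inspected.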
# Wait up to 10 seconds for each element before raising a TimeoutException
wait = WebDriverWait(driver, 10)
for x in links:
    driver.get(x)
    # These selectors reflect YouTube's layout at the time of writing
    v_title = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "h1.title.style-scope.ytd-video-primary-info-renderer"))).text
    v_name = wait.until(EC.presence_of_element_located((By.XPATH, '//*[@id="text"]/a'))).text
    v_likes = wait.until(EC.presence_of_element_located((By.XPATH, '//*[@id="text"]'))).text
    v_dlikes = wait.until(EC.presence_of_element_located((By.XPATH, '//*[@id="top-level-buttons"]/ytd-toggle-button-renderer[2]/a'))).text
    v_views = wait.until(EC.presence_of_element_located((By.XPATH, '//*[@id="count"]/yt-view-count-renderer/span[1]'))).text
    print(v_title)
    print(v_name)
    print(v_likes)
    print(v_dlikes)
    print(v_views)
# Close the browser once every video has been visited
driver.quit()
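The scraped values arrive as display strings such as "1,234,567 views" or "12K", so the cleaning step has to turn them into numbers before any analysis. Here is a minimal sketch of that conversion, assuming English-locale formatting with K/M/B suffixes; `parse_count` is a hypothetical helper, and other locales use different separators and suffixes.

```python
import re
from decimal import Decimal

# Suffixes used for abbreviated counts in English, e.g. "12K" or "1.2M".
MULTIPLIERS = {"": 1, "K": 1_000, "M": 1_000_000, "B": 1_000_000_000}
COUNT_RE = re.compile(r"(\d[\d,]*(?:\.\d+)?)([KMB]?)")


def parse_count(text):
    """Convert '1,234,567 views' or '12K' to an int; return None if no number is present."""
    match = COUNT_RE.search(text or "")
    if match is None:
        return None  # e.g. a button that just reads "Dislike"
    number, suffix = match.groups()
    # Decimal avoids float rounding surprises for values like "1.2M"
    return int(Decimal(number.replace(",", "")) * MULTIPLIERS[suffix])


print(parse_count("1,234,567 views"))
print(parse_count("1.2M"))
```

Returning `None` rather than `0` keeps "no data" distinguishable from a genuine zero count.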
Once the data is printed, you can save it in a DataFrame for further analysis and identify some of the top vloggers by different parameters, such as the number of views or likes.
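As a minimal sketch of that last step, the function below ranks channels by the total of one numeric field. It assumes each video has been stored as a dict with a `channel` key and numeric values (after converting the scraped strings to integers); `top_channels` is a hypothetical helper, and the records in the example are placeholders, not real scraped data.

```python
from collections import defaultdict


def top_channels(rows, metric="views", n=5):
    """Sum `metric` per channel and return the n largest (channel, total) pairs."""
    totals = defaultdict(int)
    for row in rows:
        value = row.get(metric)
        if value is not None:  # skip missing values, e.g. hidden dislike counts
            totals[row["channel"]] += value
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Placeholder records for illustration only.
rows = [
    {"channel": "Channel A", "title": "Video 1", "views": 100, "likes": 10},
    {"channel": "Channel B", "title": "Video 2", "views": 300, "likes": 25},
    {"channel": "Channel A", "title": "Video 3", "views": 50, "likes": None},
]
print(top_channels(rows))
print(top_channels(rows, metric="likes"))
```

With pandas installed, a roughly equivalent ranking is `pd.DataFrame(rows).groupby("channel")["views"].sum().sort_values(ascending=False)`.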
I had fun working on this social media analysis, and the best part is that I wrote this blog while traveling. I hope it was illuminating for all the budding data scientists out there. Just make sure you provide the correct driver path and install the required libraries beforehand. Happy bug-free coding!