Scrape Bing Videos with Python

Dmitriy Zub ☀️
Published in Geek Culture
3 min read · Apr 25, 2021

Contents: intro, imports, what will be scraped, process, code, extracting all videos, links, outro.

Intro

This blog post was requested by Rosalex A and is a follow-up to the Bing web scraping series. Here you’ll see how to scrape Bing video results using Python with the beautifulsoup, requests, lxml, and selenium libraries.

You’ll see two approaches:

  • scrape up to 10 video results with beautifulsoup.
  • scrape all video results using selenium by scrolling to the end of search results.

Note: This blog post assumes that you’re familiar with the beautifulsoup, requests, and selenium libraries and have basic experience with CSS selectors.

Imports

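from bs4 import BeautifulSoup
import requests, lxml
from selenium import webdriver
from selenium.webdriver.common.by import By      # locator strategies (By.XPATH, By.CSS_SELECTOR)
from selenium.webdriver.common.keys import Keys  # keyboard keys such as Keys.END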

What will be scraped

Process

The process comes down to choosing CSS selectors for the container, title, video URL, number of views, channel name, platform, and date published. CSS selectors reference.
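As a rough sketch of this step, the snippet below parses a small hand-written HTML fragment instead of a live Bing page. The class names (.video-card, .title, .views, .channel, .platform, .date) and the helper names are hypothetical placeholders, not Bing's real markup or the code from the Gist; swap in the selectors you find in the browser's dev tools. It uses the built-in html.parser so it runs without extra setup, and passing 'lxml' instead should behave the same way here.

```python
from bs4 import BeautifulSoup

# Hand-written sample standing in for a fetched results page.
# All class names here are hypothetical, not Bing's actual markup.
SAMPLE_HTML = """
<div class="video-card">
  <a class="title" href="https://example.com/watch?v=1">First video</a>
  <span class="views">1.2K views</span>
  <span class="channel">Some Channel</span>
  <span class="platform">YouTube</span>
  <span class="date">2 days ago</span>
</div>
<div class="video-card">
  <a class="title" href="https://example.com/watch?v=2">Second video</a>
  <span class="channel">Other Channel</span>
</div>
"""


def text_or_none(container, selector):
    """Return the stripped text of the first match, or None if it is missing."""
    node = container.select_one(selector)
    return node.get_text(strip=True) if node else None


def parse_videos(html):
    """Turn each video container into a dict of the fields listed above."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for card in soup.select(".video-card"):  # one container per video
        link = card.select_one("a.title")
        results.append({
            "title": link.get_text(strip=True) if link else None,
            "url": link.get("href") if link else None,
            "views": text_or_none(card, ".views"),
            "channel": text_or_none(card, ".channel"),
            "platform": text_or_none(card, ".platform"),
            "date": text_or_none(card, ".date"),
        })
    return results


videos = parse_videos(SAMPLE_HTML)
for video in videos:
    print(video)
```

Returning None for missing fields keeps the loop from crashing when a result has no view count or date, which is common on mixed result pages.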

Code

Extracting all videos

To extract all videos, scroll down the page, click the button when it appears, and then keep scrolling until the end of the video results.

To scroll down with Selenium, you can use the .send_keys() method.

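# Selenium 4 locator style (find_element_by_* was removed in Selenium 4.3); needs By (selenium.webdriver.common.by) and Keys (selenium.webdriver.common.keys)
driver.find_element(By.XPATH, 'xpath').send_keys(Keys.KEYBOARD_KEY)
# in our case .send_keys(Keys.END) to scroll to the bottom of the page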
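On a page that loads results as you scroll, one END key press only reaches the current bottom, so the press usually has to be repeated until nothing new appears. Here is a minimal sketch of such a loop. The names scroll_until_stable and scroll_bing_videos, their parameters, and the '.video-card' selector are illustrative assumptions, not taken from the original Gist. The loop takes plain callables so its stop condition can be checked without a browser.

```python
import time


def scroll_until_stable(scroll_once, count_items, pause=1.0, max_rounds=50):
    """Call scroll_once() until count_items() stops growing.

    Returns the final item count. max_rounds caps the loop so a page
    that keeps loading forever cannot hang the script.
    """
    previous = count_items()
    for _ in range(max_rounds):
        scroll_once()
        time.sleep(pause)  # give the page time to load more results
        current = count_items()
        if current <= previous:  # nothing new appeared: assume the end
            return current
        previous = current
    return previous


def scroll_bing_videos(driver):
    """Wire the helper to Selenium (Selenium 4 locator style).

    '.video-card' is a hypothetical selector; use the real container
    selector from the page you are scraping.
    """
    # Imported here so the helper above stays usable without Selenium.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys

    body = driver.find_element(By.TAG_NAME, "body")
    return scroll_until_stable(
        scroll_once=lambda: body.send_keys(Keys.END),
        count_items=lambda: len(driver.find_elements(By.CSS_SELECTOR, ".video-card")),
    )
```

A slow network can make the count stall before the real end, so a longer pause is a reasonable trade-off if results seem to be cut short.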

You can also scroll by running JavaScript against the DOM with .execute_script(), as I did in the YouTube Search web scraping series, but it didn’t work for me in this situation, or I was doing something wrong.

# https://stackoverflow.com/a/57076690/15164646 (contains several references)
driver.execute_script("var scrollingElement = (document.scrollingElement || document.body); scrollingElement.scrollTop = scrollingElement.scrollHeight;")

Next comes clicking. The straightforward way to click an element is its .click() method.

# Selenium 4 locator style; By comes from selenium.webdriver.common.by
driver.find_element(By.CSS_SELECTOR, '.selector').click()
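Because the button only shows up after some scrolling, looking it up with .find_element() can raise NoSuchElementException while it is not on the page yet. One way around that, sketched here with illustrative helper names (click_first_visible and click_more_button are not from the original Gist), is to use .find_elements(), which returns an empty list when nothing matches, and click only an element that is actually displayed.

```python
def click_first_visible(elements):
    """Click the first displayed element; return True if a click happened."""
    for element in elements:
        if element.is_displayed():
            element.click()
            return True
    return False


def click_more_button(driver, css_selector):
    """Click a 'load more' style button if it is currently on the page.

    find_elements() returns an empty list when nothing matches, so a
    missing button does not raise NoSuchElementException.
    """
    # Imported here so click_first_visible() can be tried without Selenium.
    from selenium.webdriver.common.by import By

    return click_first_visible(driver.find_elements(By.CSS_SELECTOR, css_selector))
```

The boolean return value makes it easy to tell whether another round of scrolling is worth attempting after the call.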

Full code to extract all video results

The GIF is sped up by 1550%.

Links

  • GitHub Gist for first 10 results
  • GitHub Gist for all video results

Outro

Selenium isn't the fastest solution, but it lets you scroll until no more results are shown. A possible next step is to add async functionality to this code.

If you find that something isn’t working correctly, or if you want to see how to scrape something with Python/Ruby that I haven’t written about yet, send me a message.

Originally published at https://dev.to on April 25, 2021.
