Scraping Dynamic Websites Through Remote WebDrivers, Selenium Part-IV

Irfan Ahmad · Published in Geek Culture · 3 min read · Dec 25, 2021

In the first three parts I used Selenium locally for web scraping. This time I am going to use a ‘Remote WebDriver’ to scrape some data.

Why Remote WebDriver?

Sometimes we need to run our scraper script on a CLI-only server, where Selenium’s local WebDrivers won’t work. In that case we can use a Remote WebDriver, a WebDriver hosted somewhere else; here we will access one through Docker.

Let’s Start

All the selectors and the rest of the script are the same. The only change is the WebDriver’s location. Let’s have a look at our script employing the remote driver.

The script contains a ‘scraper’ class with a method ‘open_support’. Besides ‘self’ (the ‘scraper’ instance), ‘open_support’ takes one argument: the name of a platform. The method then finds the support URL of the platform specified by that name.
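The full script is embedded in the original post; since it isn’t reproduced here, below is a minimal sketch of what such a class might look like. The class and method names (‘scraper’, ‘open_support’) come from the post, but the search URL, the CSS selector, and the example platform name are assumptions for illustration, to be replaced with the real ones from the earlier parts.

from selenium.webdriver import ChromeOptions, Remote
from selenium.webdriver.common.by import By

class scraper:
    def __init__(self):
        options = ChromeOptions()
        # Connect to the Selenium server running inside the container
        self.driver = Remote(
            command_executor="http://localhost:4444/wd/hub",
            options=options,
        )

    def open_support(self, name):
        # Hypothetical flow: search the web for "<name> support" and
        # return the first result's link; the selector is a placeholder
        self.driver.get(f"https://duckduckgo.com/?q={name}+support")
        link = self.driver.find_element(By.CSS_SELECTOR, "article a")
        url = link.get_attribute("href")
        self.driver.quit()
        return url

if __name__ == "__main__":
    print(scraper().open_support("github"))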

What’s New Here?

While scraping with local Selenium we defined the driver like this (the imports below cover both the local and the remote case).

from selenium.webdriver import Chrome, ChromeOptions, Remote, FirefoxOptions

driver = Chrome(options=ChromeOptions())

In the current script, we define the ‘driver’ in the constructor (the __init__ method) like this:

self.driver = Remote(
    command_executor="http://localhost:4444/wd/hub",
    options=options,
)

Defining the ‘driver’ this way is the only change needed to scrape remotely.

In ‘Remote’, the argument ‘command_executor’ is the URL of the ‘container’ that hosts the Selenium server.
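For example, if the Selenium container were running on another machine, or hosted Firefox instead of Chrome, only this definition would change. The address below is hypothetical:

from selenium.webdriver import FirefoxOptions, Remote

# Hypothetical: a selenium/standalone-firefox container on another
# host, with the same 4444 port mapping
driver = Remote(
    command_executor="http://192.168.1.50:4444/wd/hub",
    options=FirefoxOptions(),
)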

Container!!!

Briefly, a container is like a PC that may be anywhere on Earth: a PC without any hardware to care about. More on containers, maybe later.

We are done with scripting. Now we need a container that we will access through ‘command_executor’.

Let’s Prepare a Container for the Remote WebDriver

1- First we need to install ‘Docker’, a popular container platform. A tutorial about ‘How to install Docker on Ubuntu 20.04’ can be found here.
2- Pull the Selenium container image for Chrome with

docker pull selenium/standalone-chrome

While pulling the container image, the terminal displays some details about the image, including a ‘tag’ showing the image’s version, which defaults to ‘latest’ if not specified.

[Image: terminal output while pulling a Docker image]

The image above shows a Docker image being pulled. Its second line,

Using default tag: latest

shows the tag as ‘latest’. We will use this tag later to pick which version of the image to run.
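If you need a specific version instead of ‘latest’, you can pin the tag explicitly. The version below is only illustrative; check the image’s page on Docker Hub for the tags that actually exist.

docker pull selenium/standalone-chrome:4.1.0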
3- Start the docker service with

sudo service docker start

4- Start a container from the image with

docker run -d -p 4444:4444 --shm-size="2g" selenium/standalone-chrome:latest

You may replace ‘latest’ with the tag of your image.
Detailed instructions about Selenium containers can be found here.
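Before pointing the scraper at the container, you can check that the Selenium server is up. The Selenium 4 standalone image exposes a status endpoint on the mapped port; its JSON response should report ‘ready’ as true:

curl http://localhost:4444/status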

Let’s Run the Scraper

5- With the container running, run the scraper with

python scraper.py

6- When the scraper is done, it’s better to stop the container. You can either stop the container first and then the Docker service. List the running containers with

docker container ls

Copy the container ID from the output, then stop the container with

docker stop container-id  

Replace container-id with the ID you just copied, then stop the Docker service:

sudo service docker stop

OR directly stop the Docker service:

sudo service docker stop
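As a shortcut for the first option, if the Selenium container is the only one running, you can stop it without copying its ID. Note that this stops every running container, so use it with care:

docker stop $(docker container ls -q)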

This is it for today. We used a Remote WebDriver to scrape websites.

If you have not read them yet, you may like to read Scraping A Dynamic Website, Selenium Part-I, Part-II and Part-III.

Happy Scraping!
