Automate Your Browser Tasks With Python

Have you ever wished you could automate some of your browser tasks, such as searching a webpage or taking screenshots? If so, you are in the right place. We are going to walk through a working example that you can replicate easily.

Munish Goyal
The Startup
5 min read · Jul 18, 2020

Note: This article is aimed at both technical and non-technical readers. First, we are going to cover some basics, but don’t fret if you don’t get them all; they are there only to help you understand each step you perform. Even if you skip these details, you will still be able to achieve your aim by copying and pasting the code and adjusting it to your requirements.

Selenium Python Bindings

Through the Selenium Python API you can access all the functionality of Selenium WebDriver in an intuitive way. It provides access to Selenium WebDrivers such as Firefox, IE, Chrome, Remote, etc.

You can install selenium like this:

pip install selenium

Browser Drivers

Selenium requires a driver to interface with the chosen browser. Firefox, for example, requires geckodriver, which needs to be installed and on the PATH (e.g., place it in /usr/bin or /usr/local/bin).

Each of the popular browsers has its own driver: ChromeDriver for Chrome, geckodriver for Firefox, and so on.

Selenium Server

Note: The Selenium server is only required if you want to use the Remote WebDriver.

The Selenium server is a Java program; JRE 1.6 or newer is recommended to run it. It can be downloaded from the Selenium website, or you can use a Selenium Docker image. Descriptions of the various Selenium Docker images are available at SeleniumHQ/docker-selenium.

In the case of the Remote WebDriver, you need to specify command_executor (the URL for connecting to the remote WebDriver) and desired_capabilities (a dict specifying the capabilities, such as {'browserName': 'htmlunit', 'version': '2', 'javascriptEnabled': True}).

The easiest way to set this up is to use the selenium/standalone-chrome Docker image. The Remote WebDriver for Chrome can be launched as:

# pull docker image
docker pull selenium/standalone-chrome
# run image
docker run --name chrome -p 4444:4444 -v /dev/shm:/dev/shm selenium/standalone-chrome

So, now you have your Selenium server running with Chrome driver!
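Before connecting from Python, it can help to confirm that the server is actually accepting sessions. The sketch below polls the WebDriver status endpoint using only the standard library. The helper names (parse_ready, server_is_ready) are made up for this illustration, and the status URL is an assumption that matches the port mapping used above.

```python
import json
from urllib.error import URLError
from urllib.request import urlopen


def parse_ready(body: bytes) -> bool:
    """Return True if a WebDriver status response body reports value.ready == true."""
    try:
        payload = json.loads(body.decode("utf-8"))
    except (UnicodeDecodeError, ValueError):
        return False
    value = payload.get("value") if isinstance(payload, dict) else None
    return isinstance(value, dict) and value.get("ready") is True


def server_is_ready(status_url: str = "http://127.0.0.1:4444/wd/hub/status",
                    timeout: float = 2.0) -> bool:
    """Ask the server for its status; any connection problem counts as not ready.

    The default URL is an assumption based on the docker command above.
    """
    try:
        with urlopen(status_url, timeout=timeout) as response:
            return parse_ready(response.read())
    except (URLError, OSError):
        return False
```

Calling server_is_ready() right after docker run may return False for a few seconds while the container starts, so retrying a few times is reasonable.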

Using Selenium Python Bindings

Once you are done with setup, you can start using it from Python like this:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
# provide driver details
selenium_driver_url = "http://127.0.0.1:4444/wd/hub"
desired_capabilities = DesiredCapabilities.CHROME
# create an instance of the WebDriver
# (Selenium 3 style; newer Selenium 4 releases take options=webdriver.ChromeOptions()
# in place of desired_capabilities)
driver = webdriver.Remote(command_executor=selenium_driver_url, desired_capabilities=desired_capabilities)
# navigate to the given page URL
page_url = "https://www.python.org"
driver.get(page_url)
# list the methods available on `driver` (optional)
print(dir(driver))
# switch the window to full-screen mode
driver.fullscreen_window()
# scroll through the page and take screenshots
driver.save_screenshot("screenshot_01.png")
driver.execute_script("window.scrollBy(0, 500)")
driver.save_screenshot("screenshot_02.png")
driver.execute_script("window.scrollBy(0, 500)")
driver.save_screenshot("screenshot_03.png")
driver.execute_script("window.scrollBy(0, 500)")
driver.save_screenshot("screenshot_04.png")
driver.execute_script("window.scrollBy(0, 500)")
# close the browser window; skip this line if you want to keep using `driver`
driver.close()

This generates screenshots similar to the following:

screenshot_01.png:

screenshot_02.png:

screenshot_03.png:

screenshot_04.png:

You are done! Now, you can modify the script and run it according to your needs.
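One simple modification is to replace the repeated screenshot-and-scroll lines with a loop. The helper below is a minimal sketch: the function name and its parameters are invented for this example, and it only assumes that the driver object offers save_screenshot and execute_script, as the Selenium driver used above does.

```python
def capture_scrolling_screenshots(driver, count=4, step=500, prefix="screenshot"):
    """Save `count` screenshots, scrolling down `step` pixels after each one.

    Returns the list of file names that were written.
    """
    saved = []
    for index in range(1, count + 1):
        filename = f"{prefix}_{index:02d}.png"
        driver.save_screenshot(filename)
        # `step` is passed to the page script as arguments[0]
        driver.execute_script("window.scrollBy(0, arguments[0]);", step)
        saved.append(filename)
    return saved
```

With the defaults, capture_scrolling_screenshots(driver) produces the same four files as the script above, screenshot_01.png through screenshot_04.png.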

For example, let’s search for “PEP” in the search box of the page:

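# assumes `driver` from the previous script is still open (close() has not been called)
from selenium.webdriver.common.by import By
# getting info about the page (optional)
print(driver.title)
print(driver.current_url)
html = driver.page_source
# find_element(By.<strategy>, value) works in Selenium 3 and 4;
# the find_element_by_* shortcuts were removed in later Selenium 4 releases
tier_1_items = driver.find_elements(By.CLASS_NAME, "tier-1")
# playing with the search box (optional)
elem = driver.find_element(By.ID, "id-search-field")
elem.clear()
elem.send_keys("PEP")
elem.send_keys(Keys.RETURN)
driver.save_screenshot("screenshot_05.png")
# executing a script
driver.execute_script("window.scrollBy(0, 1000)")  # scroll vertically by 1000 pixels
driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")  # scroll to the bottom of the page
# close the current tab, or exit the browser if it has only a single tab
driver.close()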

screenshot_05.png:

How did it actually work?

The selenium.webdriver module provides all the WebDriver implementations. Currently supported WebDriver implementations are Chrome, Firefox, IE, and Remote.

An instance of the Remote driver is created with webdriver.Remote(), an instance of the Firefox driver with webdriver.Firefox(), and so on.

The <driver>.get(url) method will navigate to a page given by the url. WebDriver will wait until the page has fully loaded (that is, the “onload” event has fired) before returning control to your test or script.
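If you want to see this for yourself, you can ask the page for its load state after get() returns. The two helpers below are a small sketch with invented names; they rely only on the driver's execute_script method and the standard document.readyState property, which is normally "complete" once the load event has fired.

```python
def page_ready_state(driver):
    """Return the page's document.readyState ("loading", "interactive" or "complete")."""
    return driver.execute_script("return document.readyState")


def page_is_loaded(driver):
    """True once the browser reports that the document has finished loading."""
    return page_ready_state(driver) == "complete"
```

Note that "complete" only covers the initial page load; content fetched later by scripts is a separate matter, which is where waits come in.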

WebDriver offers a number of ways to find elements using the find_element_by_* methods (in Selenium 4, the equivalent form is find_element(By.<strategy>, value)). Refer to Locating Elements for all available methods.

We can also send keys (send_keys), similar to typing on your keyboard. Special keys can be sent using the Keys class imported from the selenium.webdriver.common.keys module. To be safe, it is always best to first clear any pre-populated text in the input field (e.g. “Search”) so that it doesn’t affect our search results.
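The clear-then-type pattern is easy to wrap in a small helper. This is only a sketch: fill_field and its parameters are names made up for this example, and the element is assumed to offer clear() and send_keys(), as the element returned for the search box above does.

```python
def fill_field(element, text, submit_key=None):
    """Clear an input field, type `text`, and optionally press a final key.

    Pass submit_key=Keys.RETURN (from selenium.webdriver.common.keys)
    to submit the field after typing.
    """
    element.clear()            # remove any pre-populated text first
    element.send_keys(text)
    if submit_key is not None:
        element.send_keys(submit_key)
```

For the search above, this would be called as fill_field(elem, "PEP", submit_key=Keys.RETURN).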

Finally, the browser window is closed with close. You can also call the quit method instead of close. quit exits the entire browser, whereas close closes one tab; if only one tab was open, most browsers will exit entirely by default.
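When a script fails halfway through, the browser session can be left running. One way to guard against that is a context manager that always calls quit, sketched below. The name browser_session is invented for this example; it only assumes the driver has a quit() method.

```python
from contextlib import contextmanager


@contextmanager
def browser_session(driver):
    """Yield the driver and make sure quit() runs, even if the body raises."""
    try:
        yield driver
    finally:
        driver.quit()
```

Usage would look like `with browser_session(webdriver.Remote(...)) as driver:` followed by the usual get, find and screenshot calls inside the block.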

Waits:

Nowadays most web apps use AJAX techniques, so when a page is loaded by the browser, the elements within that page may load at different times. This makes locating elements difficult: if an element is not yet present in the DOM, a locate function will raise a NoSuchElementException. Waits solve this issue by providing some slack between actions, mostly locating an element or performing some other operation on it.

Selenium WebDriver provides two types of waits:

  • implicit wait: It makes WebDriver poll the DOM for a certain amount of time when trying to locate an element.
  • explicit wait: It makes WebDriver wait for a certain condition to occur before proceeding further with execution.
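To make the idea of an explicit wait concrete, here is a plain-Python sketch of the polling loop behind it. This is not Selenium's implementation; wait_until and its parameters are invented for illustration. Selenium's own helper for explicit waits is WebDriverWait(driver, timeout).until(condition), and an implicit wait is set with driver.implicitly_wait(seconds).

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5, ignored=()):
    """Call `condition()` repeatedly until it returns a truthy value.

    Exceptions listed in `ignored` are treated as "not ready yet".
    Raises TimeoutError if the condition is still unmet after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except ignored:
            pass  # e.g. the element is not in the DOM yet
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)
```

With a real driver, the condition could be a lambda that looks up an element, with Selenium's NoSuchElementException passed in `ignored` so that a missing element means "try again" rather than "fail".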

Page Objects:

A page object represents an area in the web application user interface that your test is interacting with.
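As an illustration, the search box used earlier could be wrapped in a page object like the one below. This is a hypothetical sketch: the SearchPage class and its methods are invented here, the locator reuses the id-search-field id from the example above, and submit() assumes the field sits inside a form (sending Keys.RETURN, as in the earlier script, is the alternative).

```python
class SearchPage:
    """Page object for a page that has a single search box (hypothetical)."""

    # Locator as a (strategy, value) pair; "id" is the string behind Selenium's By.ID.
    SEARCH_FIELD = ("id", "id-search-field")

    def __init__(self, driver):
        self.driver = driver

    def open(self, url):
        """Navigate to the page and return self so calls can be chained."""
        self.driver.get(url)
        return self

    def search(self, query):
        """Type `query` into the search box and submit its form."""
        field = self.driver.find_element(*self.SEARCH_FIELD)
        field.clear()
        field.send_keys(query)
        field.submit()  # assumes the field is inside a <form>
        return self
```

A script would then read SearchPage(driver).open("https://www.python.org").search("PEP"), keeping locators and keystrokes out of the main flow.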

WebDriver API:

The WebDriver API defines interfaces of Selenium WebDriver.
