Automate Your Browser Tasks With Python
Have you ever wished you could automate some of your browser tasks, such as searching a webpage or taking screenshots? If so, you are in the right place. We are going to walk through a working example that you should be able to replicate easily.
Note: This article is aimed at both technical and non-technical readers. First, we are going to cover some basics, but don’t fret if you don’t follow all of them; they are there only to help you understand each step you perform. Even if you skip these details, you should still be able to achieve your aim by copying and pasting the code and modifying it to fit your requirements.
Selenium Python Bindings
You can install selenium like this:
pip install selenium
Selenium requires a driver to interface with the chosen browser. Firefox, for example, requires geckodriver, which needs to be installed and in the PATH (e.g., place it in …).
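A quick way to confirm that the driver is discoverable is to look it up on the PATH from Python. The sketch below uses only the standard library; the helper name find_driver is an assumption for illustration and is not part of Selenium.

```python
import shutil


def find_driver(name="geckodriver"):
    """Return the full path of a driver executable found on PATH, or None."""
    return shutil.which(name)


path = find_driver("geckodriver")
if path is None:
    print("geckodriver was not found on PATH")
else:
    print(f"geckodriver found at {path}")
```

The same check works for other drivers by passing a different executable name.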
Links to some of the most popular browser drivers follow:
Note: The Selenium server is only required if you want to use the Remote WebDriver.
Selenium server is a Java program; JRE 1.6 or newer is recommended to run it. It can be downloaded from the Selenium website, or you can use a Selenium Docker image. Descriptions of the various Selenium Docker images are available at SeleniumHQ/docker-selenium.
In the case of Remote WebDriver, you need to specify command_executor (the URL for connecting to the remote WebDriver) and desired_capabilities (a dict specifying the capabilities, such as …).
The easiest way to set this up is to use the selenium/standalone-chrome Docker image. The Remote WebDriver for Chrome can be launched as:
# pull docker image
docker pull selenium/standalone-chrome

# run image
docker run --name chrome -p 4444:4444 -v /dev/shm:/dev/shm selenium/standalone-chrome
So, now you have your Selenium server running with Chrome driver!
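Before running any script, it can help to confirm that the server is actually accepting sessions. The sketch below polls the WebDriver status endpoint using only the standard library. The URL http://127.0.0.1:4444/wd/hub/status is an assumption based on the port mapping above and may differ between image versions (some answer on /status instead); the helper names are hypothetical.

```python
import json
import urllib.request


def is_ready(payload):
    """Interpret a WebDriver status payload such as {"value": {"ready": true}}."""
    if not isinstance(payload, dict):
        return False
    value = payload.get("value")
    return isinstance(value, dict) and value.get("ready") is True


def server_ready(status_url="http://127.0.0.1:4444/wd/hub/status", timeout=2.0):
    """Return True if the server answers and reports that it is ready."""
    try:
        with urllib.request.urlopen(status_url, timeout=timeout) as response:
            return is_ready(json.load(response))
    except (OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON reply.
        return False


print("Selenium server ready:", server_ready())
```

If this prints False right after starting the container, waiting a few seconds and trying again is usually enough.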
Using Selenium Python Bindings
Once you are done with setup, you can start using it from Python like this:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

# provide driver details
selenium_driver_url = "http://127.0.0.1:4444/wd/hub"
desired_capabilities = DesiredCapabilities.CHROME

# creating instance of the WebDriver
driver = webdriver.Remote(command_executor=selenium_driver_url, desired_capabilities=desired_capabilities)

# navigate to provided page url
page_url = "https://www.python.org"
driver.get(page_url)

# list of methods available on `driver` (optional)
dir(driver)

# invoke full-screen operation
driver.fullscreen_window()

# scroll through the page and take screenshots
driver.execute_script("window.scrollBy(0, 500)")

# exit the browser
driver.quit()
This would generate screenshots similar to the following:
You are done! Now, you can modify the script and run it according to your needs.
For example, let’s search for “PEP” in the search box of the page:
# getting info about page (optional)
driver.find_elements_by_class_name("tier-1")

# playing with search box (optional)
elem = driver.find_element_by_id("id-search-field")
elem.clear()
elem.send_keys("PEP")
elem.send_keys(Keys.RETURN)
driver.save_screenshot("screenshot_05.png")

# executing a script
driver.execute_script("window.scrollBy(0,1000)")  # scroll vertically by 1000 pixels
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")  # scroll to bottom of the page

# close the current tab, or exit the browser if it has only a single tab
driver.close()
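The scroll-then-capture pattern can be wrapped in a small loop. The following is a minimal sketch rather than the article's original code: the function name, the file-name pattern, and the step size are all assumptions, and it relies only on the save_screenshot and execute_script methods already used above.

```python
def capture_while_scrolling(driver, shots=4, step_px=500, prefix="screenshot"):
    """Save `shots` screenshots, scrolling down `step_px` pixels between them.

    Returns the list of file names written, in order.
    """
    saved = []
    for index in range(1, shots + 1):
        filename = f"{prefix}_{index:02d}.png"
        driver.save_screenshot(filename)
        saved.append(filename)
        if index < shots:
            # Scroll only between captures, not after the last one.
            driver.execute_script("window.scrollBy(0, arguments[0]);", step_px)
    return saved
```

Passing the step size as a script argument avoids building the JavaScript string by hand.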
How did it actually happen?
The selenium.webdriver module provides all the WebDriver implementations. Currently supported WebDriver implementations are Chrome, Firefox, IE, and Remote.
The instance of the Remote driver is created as webdriver.Remote(); for Firefox it is created as …
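One caveat: the script in this article is written against the Selenium 3 API. More recent Selenium 4 releases dropped the desired_capabilities argument in favour of an options object, so on a current install the connection step may need to look like the sketch below. The function name connect_remote_chrome and its default URL are assumptions for illustration.

```python
def connect_remote_chrome(hub_url="http://127.0.0.1:4444/wd/hub"):
    """Open a Chrome session on a remote Selenium server (Selenium 4 style)."""
    # Imported inside the function so that defining the helper does not
    # itself require Selenium to be installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    return webdriver.Remote(command_executor=hub_url, options=options)
```

With the Docker container running, calling connect_remote_chrome() would stand in for the webdriver.Remote(...) line of the script.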
The <driver>.get(url) method will navigate to the page given by the url. WebDriver will wait until the page has fully loaded (that is, the “onload” event has fired) before returning control to your test or script.
The WebDriver offers a number of ways to find elements using one of the find_element_by_* methods. Refer to Locating Elements for all available methods.
We can also send keys (send_keys), similar to entering keys using your keyboard. Special keys can be sent using the Keys class imported from the selenium.webdriver.common.keys module. To be safe, it is always best to first clear any pre-populated text in the input field (e.g. “Search”) so that it doesn’t affect our search results.
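In Selenium 4 the find_element_by_* helpers were removed (as of release 4.3), and elements are located with find_element plus a By constant instead. The sketch below combines that locator style with clear and send_keys to run the search. The function name search_site is hypothetical, and the default element ID is the one used in the script above.

```python
def search_site(driver, query, field_id="id-search-field"):
    """Type `query` into the search box and press Enter; return the element."""
    # Imported here so the helper can be defined without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys

    box = driver.find_element(By.ID, field_id)
    box.clear()  # drop any pre-populated text first
    box.send_keys(query, Keys.RETURN)  # type the query, then press Enter
    return box
```

For the example in this article the call would be search_site(driver, "PEP").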
Finally, the browser window is closed using close. You can also call the quit method instead of close. quit will exit the entire browser, whereas close will close one tab; if just one tab was open, most browsers will exit entirely by default.
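Because a script can fail halfway through, it is worth making sure quit still runs so that no browser session is left behind on the server. One way to sketch this is a small context manager. The name browser_session and its make_driver factory argument are assumptions for illustration, not Selenium APIs.

```python
from contextlib import contextmanager


@contextmanager
def browser_session(make_driver):
    """Create a driver via `make_driver()` and always quit it afterwards."""
    driver = make_driver()
    try:
        yield driver
    finally:
        # quit() ends the whole session, even if the body raised an error.
        driver.quit()
```

The factory can wrap whichever webdriver call you already use, so the body of the with block only contains the browsing steps.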
Since most web apps nowadays use AJAX techniques, the elements within a page may load at different time intervals after the browser loads the page. This makes locating elements difficult: if an element is not yet present in the DOM, a locate function will raise a NoSuchElementException. Using waits, we can solve this issue. Waiting provides some slack between the actions performed, mostly locating an element or any other operation on the element.
Selenium WebDriver provides two types of waits:
- implicit wait: It makes WebDriver poll the DOM for a certain amount of time when trying to locate an element.
- explicit wait: It makes WebDriver wait for a certain condition to occur before proceeding further with execution.
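As an illustration of the two kinds of wait, the sketch below uses Selenium's WebDriverWait with an expected condition for the explicit case and notes the implicit alternative in a comment. The function name wait_for_element and the 10-second default are assumptions.

```python
def wait_for_element(driver, element_id, timeout=10):
    """Explicit wait: block until the element is present in the DOM.

    Raises selenium.common.exceptions.TimeoutException if it never appears.
    """
    # Imported here so the helper can be defined without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    condition = EC.presence_of_element_located((By.ID, element_id))
    return WebDriverWait(driver, timeout).until(condition)


# An implicit wait, by contrast, is set once per session and applies to
# every element lookup made through that driver:
#     driver.implicitly_wait(10)  # seconds
```

An explicit wait is the narrower tool: it targets one condition on one element, whereas the implicit wait changes how every lookup behaves.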
A page object represents an area in the web application user interface that your test is interacting with.
The WebDriver API defines interfaces of Selenium WebDriver.