Browser automation using Selenium with python — part I

Sourin Karmakar
Aug 1, 2021

Hi readers, in this post I would like to discuss how we can automate browser and web tasks with Selenium using Python. It serves as an introduction to the concept, so anyone new to this domain can use it as a starting point. Let’s dive in.

Introduction

What is Selenium?

Selenium is an open-source tool used for automating web browsers. It lets us write scripts in several programming languages, such as Java, Python and C#. It works across all major operating systems and all major web browsers.

For this blog, we will use the Chrome browser on the Windows operating system. We will write our Selenium scripts in Python and automate a few tasks.

Before automating anything, we need to install Selenium for Python and set up our environment. You are also expected to know the Python basics: loops, conditional statements, defining functions and data structures (lists, tuples etc.).

Setting things up

The first step is to install Python on your system; if Python is already installed, you can skip this paragraph. You can download the latest Python version from the official website. It is recommended to use the latest version, or at least a version newer than 3.5 (recent Selenium releases need a newer Python than that, so the latest version is the safer choice). The steps for installing Python are discussed here.

After the installation is done, open the command prompt and run this command to check that things are working fine.

python --version
'Python 3.8.5'

To install Selenium, run the following command. For more details check this page.

pip install selenium
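As a quick sanity check after both installs, something like the following could be run from Python itself. It is only a sketch: the helper names are made up for this post, and "newer than 3.5" is read here as 3.6 or later.

```python
import importlib.util
import sys


def python_is_recent_enough(minimum=(3, 6)):
    """Return True if the running interpreter is at least `minimum`."""
    # Assumption: "newer than 3.5" is interpreted as 3.6 or later.
    return sys.version_info[:2] >= minimum


def package_is_installed(name):
    """Return True if a top-level package called `name` can be imported."""
    return importlib.util.find_spec(name) is not None


print("Python recent enough:", python_is_recent_enough())
print("selenium installed:", package_is_installed("selenium"))
```

If the second line prints False, the pip command was probably run for a different Python installation than the one executing the script.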

The next step is to download the Selenium WebDriver executable for your browser. It is a browser-dependent executable file that acts as a bridge between your script and the browser. We need ChromeDriver for Chrome and geckodriver for Firefox. Since we are working with Chrome in this blog, we will focus on ChromeDriver, but the concept remains the same for other browsers.

You can download ChromeDriver from here. Note: the version of ChromeDriver must match your browser’s version. For example, if you have Chrome version 92.0.4515.107 installed, then you need a ChromeDriver version starting with ‘92’. Otherwise, it will not work and will report a compatibility error.

We will discuss how to use ChromeDriver in the following sections.

That’s all. You are now ready to write some test and automation scripts.
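The version rule above comes down to comparing the first number of the two version strings. A minimal sketch of that comparison, with hypothetical helper names and illustrative driver version strings:

```python
def major_version(version):
    """Return the leading number of a dotted version string as an int."""
    return int(version.strip().split(".")[0])


def versions_match(browser_version, driver_version):
    """Return True if browser and driver share the same major version."""
    return major_version(browser_version) == major_version(driver_version)


# The browser version is the one from the example above; the driver
# version strings are only illustrative inputs.
print(versions_match("92.0.4515.107", "92.0.4515.43"))   # True
print(versions_match("92.0.4515.107", "91.0.4472.101"))  # False
```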

Getting started

In this section, let’s try out a simple test script to check whether the setup has been done properly. The script will open “google.com” and search for something.

Let us go through the script line by line.

from selenium import webdriver
from selenium.webdriver.common.keys import Keys

The lines above import what we need from the library we just installed using ‘pip’: webdriver controls the browser, and Keys provides special keys such as ENTER.

driver = webdriver.Chrome(r"D:\chromedriver.exe")

Here we start ChromeDriver and connect to it. We pass the path where chromedriver.exe is located on your computer, and webdriver.Chrome specifies that we are using the driver for the Chrome browser. The call returns a driver object, which represents the browser. Through this object we can access web elements and the functions related to browser navigation.
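Selenium 4 deprecated passing the driver path as a positional argument in favour of a Service object, and recent releases no longer accept the older form. On those versions, the same step might look roughly like the sketch below. The function name start_chrome and the up-front file check are additions for illustration, not part of the original script.

```python
from pathlib import Path


def start_chrome(driver_path):
    """Start Chrome through the chromedriver executable at `driver_path`."""
    driver_file = Path(driver_path)
    if not driver_file.is_file():
        # Fail early with a readable message instead of a Selenium error.
        raise FileNotFoundError(f"chromedriver not found at: {driver_file}")

    # Imported here so the path check above runs before Selenium is loaded.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    service = Service(executable_path=str(driver_file))
    return webdriver.Chrome(service=service)
```

With this helper, the line above would become driver = start_chrome(r"D:\chromedriver.exe").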

driver.maximize_window()

As mentioned, we can control the browser window using the driver object. Here we are maximizing the browser window.

driver.get(r"http://www.google.com")

Here we use the driver’s get method to open google.com.

search = driver.find_element_by_name("q")

Here we are accessing the textbox for typing our search query on google.com. There are multiple ways to get the textbox (or any web element such as a table, textbox or paragraph). In this case, we locate the textbox by its name attribute. We can also use ‘id’, ‘class’ or ‘XPATH’ to access the targeted web element. Note that find_element_by_name belongs to the Selenium version this post was written with; Selenium 4 replaces it with driver.find_element(By.NAME, "q"), where By is imported from selenium.webdriver.common.by. I will discuss web elements, access methods and XPATH in a separate post, because they need a more detailed explanation of HTML tags and attributes and the DOM (Document Object Model) structure.
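To make the XPATH option a little more concrete before that post, here is a small sketch of how an XPath expression selecting an element by one of its attributes could be built. The helper xpath_by_attribute is hypothetical and only does string formatting; it does not talk to the browser.

```python
def xpath_by_attribute(attribute, value, tag="*"):
    """Build an XPath matching elements whose `attribute` equals `value`."""
    # Limitation of this sketch: `value` must not contain a single quote.
    return f"//{tag}[@{attribute}='{value}']"


print(xpath_by_attribute("name", "q"))           # //*[@name='q']
print(xpath_by_attribute("name", "q", "input"))  # //input[@name='q']
```

The resulting string can then be handed to the driver, for example driver.find_element_by_xpath(xpath_by_attribute("name", "q")) in the Selenium version used in this post. The default tag "*" matches any element type, which keeps the expression valid if the page uses something other than an input tag for the textbox.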

search.send_keys("usd to inr")
search.send_keys(Keys.ENTER)

After accessing the textbox, we type the search query. Here, we send “usd to inr” as our search text. The send_keys method sends keyboard input, either plain text or special keys such as ENTER.

The second line presses Enter to submit the search query.

driver.close()

This line will close the browser window.
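Putting the steps together, a version of the whole script for Selenium 4 might look like the sketch below. It is not the original script: the function name search_google and its parameters are made up here, the Service and By.NAME calls are the Selenium 4 counterparts of the lines discussed above, and the cleanup uses driver.quit() inside try/finally so the browser is shut down even if a step fails.

```python
def search_google(query, driver_path):
    """Open Google in Chrome, submit `query`, then shut the browser down."""
    # Selenium is imported inside the function so that loading this file
    # does not start anything by itself.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys

    driver = webdriver.Chrome(service=Service(executable_path=driver_path))
    try:
        driver.maximize_window()
        driver.get("http://www.google.com")

        # Locate the search textbox by its name attribute, as in the post.
        search = driver.find_element(By.NAME, "q")
        search.send_keys(query)
        search.send_keys(Keys.ENTER)
    finally:
        # close() only closes the current window; quit() closes every
        # window and also stops the chromedriver process.
        driver.quit()
```

It would be called as search_google("usd to inr", r"D:\chromedriver.exe"), using the same path and query as in the walkthrough.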

Congrats! We are done with our first script.

What’s Next

So, in this part we covered the prerequisites, set up the environment and ran a basic script. In the upcoming part, we will explore the HTML DOM structure, ways of accessing web elements and more of the functionality that Selenium offers.
