THE COMPREHENSIVE GUIDE

Let’s take a look at: Selenium

What is Selenium, and how do you use it?

Stephen Boyd

Published in

The Startup

14 min read · Jun 20, 2019

What is Selenium?

What do most people who have heard of Selenium think of when it comes up? The answer is something I’ve heard time and time again from those who don’t quite understand it:

That thing for running automated tests.

While that is correct, Selenium is a little bit more than that. There are many different parts, layers, and tools that make up Selenium.

Here is the article that I wish I had when first looking into writing automated tests.

Selenium IDE
Selenium RC
Selenium WebDriver
— BrowserDriver
Selenium Server
Selenium Grid
Frameworks
Getting Started
— Playing with the BrowserDriver
— Setting up Selenium Server with Chromedriver
— Creating a Node.js Project

Selenium IDE

The Selenium IDE is actually a browser plugin.

When you open this plugin you get a screen that looks like this:

From here you can create new tests. When you create a new test it will monitor your actions in a web browser.

Then it will save these actions so it can run the test again. While the IDE can be useful in many situations, large-scale automation suites are not created this way. It does not support conditionals or other features that programming languages have.

There are some pretty awesome features, so I would recommend this if you know your application is not dynamic, but static.

I do not use the IDE, but if you do and have used it in complex apps please let me know in the comments! I would love to hear about them.

Selenium RC

The Selenium Remote Control (RC), also known as Selenium 1.0, would inject JavaScript into the browsers.

We won’t go over this too much since the Selenium WebDriver has replaced it.

Selenium WebDriver

The Selenium WebDriver is the successor of Selenium RC.

The Selenium WebDriver is an API, and there are many languages that can use it. This integrated support for each language is just a dependency that can communicate with browser drivers.

Put very simply, the Selenium WebDriver sends HTTP requests (RESTful API) to your browser’s driver and handles the responses.

Browser Drivers

Each browser works and behaves differently. We can’t expect Selenium to keep up with how every single browser works. So we need some sort of standard functionality to control the browsers.

Due to this, each browser must maintain its own implementation of the WebDriver standard.

These drivers are web servers that control the browsers and are created in accordance with the WebDriver specification defined by the W3C.

Shameless plug for understanding the W3C

So what’s the difference between the Selenium WebDriver and the browser drivers?

Here is where a lot of confusion comes from. The WebDriver standard vs the Selenium WebDriver. Two different things, with a very similar name.

The Selenium WebDriver is an API created by the Selenium team to interact with any browser driver that is created with the WebDriver standard.

The WebDriver standard is a specification used for the creation of browser drivers. The browser driver is the actual implementation of the WebDriver standard that communicates with the browser.

The Selenium WebDriver is the API dependency that talks to the implementations of the WebDriver standard.

I will be using the term BrowserDriver to refer to all implementations of it, including ChromeDriver, GeckoDriver, etc… Selenium WebDriver will be the API.

In your code, you will import the Selenium WebDriver dependency. This alone is not enough to start using the browser you want.

Side note: You don’t always need to import the Selenium WebDriver API. Anything that can make HTTP requests to the BrowserDriver and follows the W3C standard will work, such as a framework.

You also need to download the BrowserDriver executable for the browser you wish to use. The BrowserDriver is a web server that runs and can talk to your browser.

The BrowserDriver can:

Start the browser
Run commands on the browser (ex. clicking or typing)
Take screenshots
Upload files
Much more…

BrowserDrivers:
Chrome: ChromeDriver
Firefox: GeckoDriver
Edge: EdgeDriver
IE: IEDriverServer
There are more drivers available, just a quick Google search away.

Here is how it works:

Diagram showing Client connecting with the BrowserDriver that connects to the Browser

Since I mostly use JavaScript, here is an example using the selenium-webdriver package:

await driver.findElement(By.name('q')).click()

This code sends a request to the BrowserDriver to find the element named q
The BrowserDriver then sends a request to the browser asking for an element named q
The browser returns the element that it found to the BrowserDriver
The BrowserDriver then returns the element that it found to the code
The code then sends a request to the BrowserDriver to click on the element it received
The BrowserDriver sends that element and the requested action of click to the browser
The browser returns a status code and message based on the result of the click to the BrowserDriver
The BrowserDriver returns the status code and message to the code for it to handle
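
From the client's point of view, the eight steps above boil down to two HTTP requests. Here is a minimal sketch of what those requests might look like, assuming the W3C endpoints for Find Element and Element Click, and assuming the name lookup is expressed as a CSS selector. The helper names (findElementRequest, clickElementRequest) and the placeholder IDs are made up for illustration. This is not a copy of how selenium-webdriver works internally.

```javascript
// Hypothetical helpers that describe the two requests behind
// driver.findElement(By.name('q')).click(). Nothing is sent over the
// network here; the functions only build request descriptions.

// Steps 1-4: ask the BrowserDriver to find the element.
function findElementRequest(sessionId, name) {
  return {
    method: 'POST',
    path: `/session/${sessionId}/element`,
    // Assumption: a name lookup expressed as a CSS attribute selector.
    body: { using: 'css selector', value: `*[name="${name}"]` },
  };
}

// Steps 5-8: ask the BrowserDriver to click the element it returned.
function clickElementRequest(sessionId, elementId) {
  return {
    method: 'POST',
    path: `/session/${sessionId}/element/${elementId}/click`,
    body: {},
  };
}

// Placeholder IDs, only for illustration.
const find = findElementRequest('my-session', 'q');
const click = clickElementRequest('my-session', 'my-element');
console.log(find.method, find.path, JSON.stringify(find.body));
console.log(click.method, click.path, JSON.stringify(click.body));
```

The second request can only be built once the first one has returned, because the element ID in its URL comes from the first response.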

Once you have the BrowserDriver downloaded, you can run it. You should get an output similar to this:

The output of starting the ChromeDriver

As we can see, this BrowserDriver has started up a web server running on Port 9515.

To see what the requests to the BrowserDriver look like, here is a list (This WebDriver W3C Living Document has replaced the JSON Wire Protocol): https://w3c.github.io/webdriver/#endpoints

If you have that BrowserDriver running you can make your own HTTP requests to it! There is a walkthrough in the Getting Started portion of this guide!

Selenium Server

For a second, I want you to think about running multiple tests at a time using just the Selenium WebDriver and the BrowserDrivers. What would that look like?

Something like this:

Diagram showing the connection of many BrowserDrivers to the Client

This looks a little messy and it leaves it up to your Client to manage all of these different driver sessions, and ensure that they are all acting accordingly. This also means that all these browsers and BrowserDrivers will be running on the Client machine!

One major problem I have seen is that sometimes the BrowserDrivers do not close and remain running on your machine.

A Selenium Server is a standalone web server that will run wherever you want. Instead of the Client controlling the BrowserDrivers directly, you can offload this work to the server.

So when you are writing your tests, you tell the Client to send the requests to the Selenium Server. Each language/framework has its own way of connecting the Client to the Server.

When creating a session with the Selenium Server, you send it desired capabilities. These capabilities tell the Selenium Server everything it needs to know such as which browser you want to use, which operating system and much more.
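
As a rough illustration, here is what a capabilities payload could look like in the W3C WebDriver shape, where the required capabilities sit under alwaysMatch. The buildNewSessionPayload helper is hypothetical, and the exact payload a given Selenium Server version accepts may differ (older servers expect a desiredCapabilities object), so treat this as a sketch rather than a reference.

```javascript
// Hypothetical helper: builds the body of a "new session" request.
// browserName and platformName are standard W3C capability names.
function buildNewSessionPayload({ browserName, platformName }) {
  const alwaysMatch = { browserName };
  // Only include the platform when the caller actually cares about it.
  if (platformName) {
    alwaysMatch.platformName = platformName;
  }
  return { capabilities: { alwaysMatch } };
}

const payload = buildNewSessionPayload({
  browserName: 'chrome',
  platformName: 'windows',
});
console.log(JSON.stringify(payload, null, 2));
```

In practice your language binding or framework builds this payload for you; seeing it spelled out mostly helps when debugging why a session was refused.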

The Selenium Server then handles all of your requests and handles each BrowserDriver. This also includes automatically shutting the browsers down if something went wrong.

Why might this be useful? A great reason for doing this is to run the tests on a remote machine, or a VM. This allows you to keep your tests from overloading your Client machine.

This is what it would look like with a Selenium Server in place:

Diagram of the Client connecting to a remote Selenium Server

Selenium Grid

Using the Selenium Server looks a lot better than just the Client running BrowserDriver instances. At least the dedicated server takes over the workload. But what if we want to run a lot of tests at the same time that would overwork the server, or we want our tests to run on different Operating Systems?

In comes Selenium Grid to make it even more complex, but so much better.

The Selenium Grid creates a connection of many different Selenium Servers (called nodes) and brings them all into one place (called the hub).

The Grid works by setting up two things: the hub, and the node.

Hub

The hub is a web server where you send all your requests.

It can either be running on your local machine, a remote server, or a VM. It also can run on a variety of Operating Systems, anything that supports Java.

The hub takes those requests and forwards them to the correct node. It is like a mailroom. It figures out where each request needs to go and sends them there.

You can specify which operating system and browser you want to use in your code when you are creating your tests. The hub will look for any attached node that runs on that operating system and has that browser’s BrowserDriver.
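
To make the mailroom idea concrete, here is a toy sketch of how a hub could match a request to a node. This is not Selenium Grid's actual matching code. The node list, the field names, and the pickNode function are all invented for illustration.

```javascript
// Invented example data: each node advertises its OS and the browsers
// it has a BrowserDriver for.
const nodes = [
  { name: 'node-a', platform: 'windows', browsers: ['chrome', 'firefox'] },
  { name: 'node-b', platform: 'mac', browsers: ['chrome', 'safari'] },
  { name: 'node-c', platform: 'linux', browsers: ['firefox'] },
];

// Returns the first node that satisfies the requested capabilities,
// or null when no attached node can serve the request.
function pickNode(availableNodes, { browserName, platformName }) {
  const match = availableNodes.find(
    (node) =>
      node.browsers.includes(browserName) &&
      // A request without a platform accepts any operating system.
      (!platformName || node.platform === platformName)
  );
  return match || null;
}

const chromeOnMac = pickNode(nodes, { browserName: 'chrome', platformName: 'mac' });
const anyFirefox = pickNode(nodes, { browserName: 'firefox' });
const safariOnLinux = pickNode(nodes, { browserName: 'safari', platformName: 'linux' });
console.log(chromeOnMac, anyFirefox, safariOnLinux);
```

A real hub also has to track which nodes are busy and queue requests, which this sketch leaves out entirely.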

Node

A node is like a Selenium Server since it has similar functions. It manages the BrowserDrivers and passes requests to them.

Like the hub, nodes can be created on your local machine, a remote server, or a VM. They can also run on a variety of operating systems, anything that supports Java. So you could have one node on macOS, another on a Windows Server, etc…

You can attach many different nodes to a hub, and these nodes can be on many separate machines. This way you are dividing the work between more than one machine.

You can also set up a node to handle many BrowserDrivers at once. That way it can run tests in Firefox, Edge, Chrome, IE, etc…

Here is how it works:

Diagram of the Client connecting to the Selenium Hub, which is connected to multiple nodes

As you can see, we have many nodes set up, each with a different number of drivers available to it.

This process offloads the work from the Client machine, and onto different machines to handle the work. If I had this setup, I could execute multiple tests at a time on many different machines, even really crappy laptops.

Frameworks

As a quick side note, there are many different frameworks out there that are built on the WebDriver specification or on the selenium-webdriver module. Since I primarily work with Node.js, all the frameworks I mention are also Node.js frameworks.

Frameworks tend to do all the heavy lifting for you. They will start up your local Selenium Server or manage direct connections to the BrowserDrivers, and they provide their own commands that wrap the lower-level calls. I would recommend using one of these great frameworks instead of building it all from scratch!

WebdriverIO: I am a big advocate of WebdriverIO. They don’t use existing tools such as the selenium-webdriver module to communicate with the WebDriver. Instead, they have created their own implementation of the WebDriver specification. Great team of people, and an awesome tool!

Protractor: Protractor is built on top of the selenium-webdriver module. It was designed specifically for Angular applications. However, any other framework also works with Angular applications of course.

Nightwatch.js: I can’t say too much about Nightwatch, considering I have never used it. But like WebdriverIO, they have also created their own implementation of the WebDriver specification. I’ve heard great things about it!

While I am an advocate of WebdriverIO, find which one fits your needs and wants better. You’ll be happy that you did!

This is by no means a comprehensive list, so do your research if you want to use a framework!

How do I get started?

BrowserDriver API

Before we dive into setting up an actual project, I want to go over how to make HTTP requests to a BrowserDriver.

If you don’t care about this part, then you can skip on ahead to Creating the Project. This is for those who want to understand how all that code communicates with the BrowserDriver on a deeper level.

I will be showing an example of the ChromeDriver, and how it accepts HTTP requests and controls the browser.

Downloading the BrowserDriver

Find the version of Chrome you are using to find the correct ChromeDriver version.
http://chromedriver.chromium.org/downloads

Starting the ChromeDriver

This step is pretty easy. Just extract the .exe file out. And run it! You should see something like this. Notice the Port number it is running on. For me it is 9515.

The output of starting the ChromeDriver

Making your requests

First things first, open Postman or something similar.

To start controlling your browser you first need a session. To create your session, make a POST request to your newly running server like this:

POST request
http://localhost:9515/session

Body of your POST request
{
  "capabilities": {
    "alwaysMatch": {
      "browserName": "chrome"
    }
  }
}

Once you make the POST request, you should see Chrome open up. In the response body, you should see the sessionId. Hang onto that because you will need it. Mine was: 3b784ca884984ecce8868025285021c0
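
If you would rather script this step than click through Postman, a minimal Node.js sketch using only the built-in http module might look like this. The function names (buildRequestOptions, postJson, readSessionId) are made up, the URL in the usage comment assumes ChromeDriver is listening on port 9515 as above, and the response handling assumes the session ID sits either under value.sessionId (the W3C shape) or at the top level (the older shape).

```javascript
const http = require('http');

// Turns a URL and a JSON payload into options for http.request.
function buildRequestOptions(urlString, payload) {
  const url = new URL(urlString);
  const data = JSON.stringify(payload);
  return {
    data,
    options: {
      hostname: url.hostname,
      port: url.port,
      path: url.pathname,
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Content-Length': Buffer.byteLength(data),
      },
    },
  };
}

// Sends the payload and resolves with the parsed JSON response.
function postJson(urlString, payload) {
  const { data, options } = buildRequestOptions(urlString, payload);
  return new Promise((resolve, reject) => {
    const req = http.request(options, (res) => {
      let raw = '';
      res.setEncoding('utf8');
      res.on('data', (chunk) => { raw += chunk; });
      res.on('end', () => {
        try {
          resolve(JSON.parse(raw));
        } catch (err) {
          reject(err);
        }
      });
    });
    req.on('error', reject);
    req.end(data);
  });
}

// Assumption: W3C responses nest the ID under "value"; older ones do not.
function readSessionId(response) {
  if (response.value && response.value.sessionId) {
    return response.value.sessionId;
  }
  return response.sessionId;
}

// Usage (needs a running ChromeDriver, so it is left commented out):
// postJson('http://localhost:9515/session', {
//   capabilities: { alwaysMatch: { browserName: 'chrome' } },
// }).then((res) => console.log(readSessionId(res)));
```

The same postJson helper would work for the later requests too, since they are all POST requests with a JSON body.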

Now we are controlling Chrome. Let’s navigate to Google!

We will be making a POST request, that contains our session ID in the URL like this:

POST request
http://localhost:9515/session/3b784ca884984ecce8868025285021c0/url

Body of your POST request
{
  "url": "https://www.google.com"
}

You should see Chrome navigate to Google.

Next, let’s find an element that is on the page. Let’s get the search bar!

POST request
http://localhost:9515/session/3b784ca884984ecce8868025285021c0/element

Body of your POST request
{
  "using": "xpath",
  "value": "//input[@name='q']"
}

We should see a return like this:

Return Value
{
  "value": {
    "element-6066-11e4-a52e-4f735466cecf": "b8363554-d3a1-4342-b1f1-92ae2d492c14"
  }
}

Let’s keep track of that element value so we can type into the search bar.
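
The long key in that response is not random: element-6066-11e4-a52e-4f735466cecf is the fixed identifier the W3C WebDriver specification uses to mark a web element reference. Here is a small sketch of pulling the element ID out of the response. The readElementId helper is a hypothetical name, not part of any library.

```javascript
// The W3C WebDriver spec uses this fixed key for web element references.
const ELEMENT_KEY = 'element-6066-11e4-a52e-4f735466cecf';

// Hypothetical helper: returns the element ID from a Find Element response.
function readElementId(response) {
  const element = response.value;
  if (!element || typeof element[ELEMENT_KEY] !== 'string') {
    throw new Error('Response does not contain a web element reference');
  }
  return element[ELEMENT_KEY];
}

// The response shown above.
const response = {
  value: {
    'element-6066-11e4-a52e-4f735466cecf': 'b8363554-d3a1-4342-b1f1-92ae2d492c14',
  },
};

const elementId = readElementId(response);
console.log(elementId);
// The element ID then goes into the URL of the next request:
console.log(`/session/<your session ID>/element/${elementId}/value`);
```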

Now we want to type into that element with:

POST request
http://localhost:9515/session/3b784ca884984ecce8868025285021c0/element/b8363554-d3a1-4342-b1f1-92ae2d492c14/value

Body of your POST request
{
  "text": "selenium automation"
}

We should see selenium automation entered into the search bar!

That’s it for now on the WebDriver, but if you want to see more of these API calls, check out the documentation here: https://w3c.github.io/webdriver/#endpoints

Creating the Project

I know I said I recommend using an automation framework instead of building it yourself, but I do believe it is important to understand how it all works. So that is what we are going to do here.

The first step of figuring out how to start is figuring out what your needs are.

For simplicity’s sake, we will be running one test at a time, so we don’t quite need a Selenium Grid. But we are going to use a Selenium Server.

We will be setting up this Selenium Server locally, but the same principles apply when setting it up on a remote machine.

As I mentioned earlier, I usually use JavaScript, so we will be making this with Node.js. So if you want to follow along, you will need to install Node. I am using version 10.16 at the time of this writing.

To use the Selenium Server you will also need Java.

I won’t be using shortcut methods (such as this package: selenium-standalone) so that everyone, including people who use other languages, can follow along with the Selenium Server setup.

Starting the Selenium Server with ChromeDriver

Download the Selenium Standalone Server:

Download link: https://www.seleniumhq.org/download/

Download the ChromeDriver

Download Link: http://chromedriver.chromium.org/

Create your project folder

Create a folder where you want your project to be
Extract the chromedriver.exe file from the downloaded zip file
Place the selenium-server-standalone-versionNumber.jar file and the chromedriver.exe file in your project folder

Start the Selenium Server

Open your favorite terminal and navigate to your project folder
Run the following, replacing *versionNumber* with your version number: java -Dwebdriver.chrome.driver=chromedriver.exe -jar selenium-server-standalone-*versionNumber*.jar
You should see something like: Selenium Server is up and running on port 4444

The command in step 2 runs the selenium-server-standalone with the option of using the ChromeDriver.

If you navigate to http://localhost:4444/wd/hub/static/resource/hub.html, you should see something like this:

Fig 2.0 Screen capture of the Selenium Server web interface

Creating your Node.js Project

Create your initial project

Open your favorite IDE or text editor
Create a new project with npm init in your project folder
Fill out the npm init questions
Install selenium-webdriver dependency with:
npm i selenium-webdriver --save
Install mocha and chai with:
npm i mocha chai --save

Writing your test (explanation below)

Gist from my GitHub account

Code Explanation

selenium-webdriver dependency:

const {until, By, Builder, Capabilities, Key} = require('selenium-webdriver');

until: Used in conjunction with driver.wait. It will wait until a condition is met
By: a mechanism for locating elements on the page
Builder: Used to create a new WebDriver instance
Capabilities: Contains sets of capabilities for WebDriver sessions
Key: This object contains key values that are not text, such as pressing the Enter key

This is the Selenium WebDriver.

A full list of what the selenium-webdriver object contains...
Look here: https://seleniumhq.github.io/selenium/docs/api/javascript/module/selenium-webdriver/index.html

chai dependency:

const chai = require('chai');
const {expect} = chai;

Next, we import our assertion library, Chai. If you have never used Chai before, I recommend checking it out!
We then use the Chai library to create the expect object.

Creating our driver

const driver = new Builder()
  .withCapabilities(Capabilities.chrome())
  .usingServer('http://localhost:4444/wd/hub')
  .build();

The driver is the API we use to make requests to our BrowserDriver. We are using the Builder to create ourselves a driver.

We set the default Chrome capabilities with Capabilities.chrome()

We set the driver to use a Selenium Server with usingServer, and giving it the location that our Selenium Server is running on. Which for us is http://localhost:4444/wd/hub

Then we build our driver!

Creating our Mocha test

If you are not familiar with Mocha, go check it out! There are many great guides on how to use it.

describe('Google Test', function() {

We give our Mocha test the name Google Test.

this.timeout(0);

We remove the timeout by setting it to 0. We do this because the default timeout is 2000 milliseconds. This is usually too short of a time for a UI test to run. You can set the timeout to whatever you want, but for now, I’m just disabling it.

beforeEach(async function() {
  await driver.get('https://www.google.com');
  await driver.wait(until.elementLocated(
    By.xpath(`//input[@name='q']`)), 20000);
});

We use the beforeEach function to have the driver navigate to Google’s homepage, and wait until the search bar is on the page.

it('Use searchbar', async function() {
  await driver.findElement(By.xpath(`//input[@name='q']`)).click();
  await driver.findElement(
    By.xpath(`//input[@name='q']`)
  ).sendKeys('selenium automation', Key.RETURN);
  await driver.wait(
    until.elementIsVisible(driver.findElement(
      By.xpath(`//div[@class='g']`))
  ), 5000);
  const firstResult = await driver.findElement(
    By.xpath(`//div[@class='g']//h3`)
  ).getText();
  expect(firstResult.toLowerCase()).to.contain('selenium');
});

We then start our first test. This test clicks on the search bar, types in selenium automation and presses Enter. It then waits for the first result to be displayed by way of XPath //div[@class='g']. It will wait 5 seconds for this result.
The test will then grab the title of the first result, and make sure that it contains the text selenium.

after(async function() { 
  await driver.quit(); 
});

After all of our tests have run, the after method gets called. This calls driver.quit(). We must call driver.quit(); otherwise the driver and browser session will remain open.

Running the test

In our package.json file, we want to add a script called test.
We want this test script to run mocha test.js:
"test": "mocha test.js"

In your terminal run the test script:
npm run test

You should see:

Your browser open
Navigation to Google
Typing of selenium automation in the search bar
Google search executed
Google search results

After the browser has closed you should see this in your console:

Congrats! You have now set up a Selenium Server, and created Automated UI tests to use that server!

Final Thoughts

So we went over a lot of information here. We covered different components of Selenium, including the Selenium IDE, Selenium RC (kinda), Selenium WebDriver, WebDriver implementations (BrowserDriver), Selenium Server, Selenium Grid, and how to create your own automated tests running against a Selenium Server!

If I missed anything please let me know in the comments below, I’d love to be able to clarify or adjust a few things!

P.S. For all the nitty-gritty details of the WebDriver standard, read the documentation here! https://w3c.github.io/webdriver/