Beyond Webdriver

Thoughts on improving website testing design

I was going to start this by mis-quoting Churchill:


1. Code Independence

One of the things that Selenium offers is a Page Object Model. To an engineer this may sound great; I’ve previously implemented something similar myself. The basic concept is to provide an automation API on top of each page of your app to allow tests to interact with it. This idea becomes more seductive when you have non-developer testers around who have to test your app. System details and complexity can hide behind a simple API, and the testers can just call simple functions. This is flawed for a number of reasons.

public HomePage loginAs(String username, String password) { … }
FIVE = 5
FOUR = 6

def enterprise_max(a, b):
    """Takes two T_INTEGER and returns the one with the higher"""
    return b if a is FOUR else a

def test_enterprise_max():
    assert enterprise_max(FIVE, FOUR) == FIVE
The POM is a formal API that must be implemented and maintained by (expensive) engineers

2. Resumability

I prefer my unit test runners to start in less than a second, and to run a single unit test in around 0.1s (on average). This is important for producing high quality code fast, and allowing fearless development.

3. Understandable Scripts

This is an easy one: all tests should be understandable, right? But for browser testing, this is arguably much more important than for other types of tests.

4. Test Semantics

Selenium tests should be about proving (for example) ‘a user can log-in to the site’ and not ‘the site can log-in a user’. Logically these are almost identical, but semantically they’re very different.

menubar = thing.find_by_css_selector("#menubar")


I went into some detail on these requirements to try to outline why I think they are important. If you have any comments or questions, please leave them, or tweet me @stestagg.

Existing Tools

Native Selenium (Webdriver)

Webdriver provides a lot: it covers the various servers for driving web browsers, and various client APIs. It’s great at automating browsers, but not so great for writing more complex tests:

  • Resumability — Because Webdriver is a raw driver, any resumability functionality is delegated to the client language, and while there are some methods to achieve this, they don’t really count in this context.
  • Understandable scripts — I’ve written many selenium scripts in Python, and none of them have been things I would want to share with anyone non-technical.
  • Test Semantics — The webdriver API actually has a few methods that I would call semantically relevant: things like click and back, and find_element_by_link_text is a good one too. However, the number of useful methods is too small to write good tests with. It’s too easy to just revert to using find_element_by_css_selector or similar.


The various BDD testing tools out there allow you to define user-readable scripts with ease, and many people use them on top of webdriver to write beautiful test cases, but I feel that there are still some things missing:

When(/^I enter "(.*?)" into the search field$/) do |arg1|
  • Understandable scripts — This is where BDD tools really shine: the test output is clear and (usually) readable. Getting non-technical people to write tests using BDD tools is theoretically possible, but actually quite complex. The language is stricter than people may be used to, grammars tend to be sensitive to whitespace and so on, and unless you have a truly abstracted BDD language, you’re really just using a POM with all the associated communication issues.
  • Test Semantics — Good but not great. Because steps are implemented behind the scenes in native, app-specific code, they will naturally tend to interact with the app internals more than they should.

Selenium IDE

  • Resumability — Not bad. The script can be edited dynamically and individual steps re-run, but the UI is flawed, and it doesn’t seem to let you easily resume a partially complete test run without running each step individually.
  • Understandable scripts — The test scripts can be output in a variety of formats, but the IDE never knows the things that are needed to be able to describe what is happening in a useful way. clickAndWait id=btnG will never mean anything to someone not involved in writing the system.
  • Test Semantics — The methods available in the IDE are the same as those provided by the underlying Selenium API, so semantically they are poor: elements are referenced by IDs and paths.


I haven’t used Watir before, and it looks like a great tool for doing this sort of thing. But from my brief overview of the examples and documentation, it seems to help with, but not concretely address the requirements listed above.


That somewhat lengthy introduction paints a picture of my understanding of the current selenium landscape. I think that with some careful design and a few lines of code, something amazing could emerge. Here are my thoughts so far, however unorganised they may seem.

Basic system organisation

1. An action list

Each test script should be composed of a nested list of operations, in some abstract language. I’m proposing a JSON array.

[
  {
    "action": "find",
    "startswith": "I agree to the terms",
    "and": [
      {"action": "click"},
      {
        "action": "remember_text",
        "as": "disclaimer text"
      }
    ]
  }
]
  1. It can be easily serialised and parsed by any language or runner
  2. It defines a common, well-defined API that can be publicly shared
  3. By designing the actions carefully, very complex scripts can be built-up with reusable components without compromising on the overall readability and structure.
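
To make the idea a little more concrete, here is a minimal sketch of how a runner might walk such a nested action list. Everything in it is an assumption for illustration: FakeElement stands in for a real browser element, run_actions is a hypothetical name, and only the three actions from the example script are handled.

```python
import json

# The example script, written as strict JSON.
SCRIPT = """
[
  {"action": "find", "startswith": "I agree to the terms",
   "and": [
     {"action": "click"},
     {"action": "remember_text", "as": "disclaimer text"}
   ]}
]
"""


class FakeElement:
    """Hypothetical stand-in for a browser element (a real runner would wrap Webdriver)."""

    def __init__(self, text):
        self.text = text
        self.clicks = 0


def run_actions(actions, elements, context, current=None):
    """Run a nested action list; `current` is the element chosen by an enclosing 'find'."""
    for step in actions:
        action = step["action"]
        if action == "find":
            prefix = step["startswith"]
            matches = [e for e in elements if e.text.startswith(prefix)]
            if not matches:
                raise LookupError(f"no element starts with {prefix!r}")
            # Nested steps under "and" operate on the element that was found.
            run_actions(step.get("and", []), elements, context, matches[0])
        elif action not in ("click", "remember_text"):
            raise ValueError(f"unknown action {action!r}")
        elif current is None:
            raise ValueError(f"{action!r} needs an enclosing 'find'")
        elif action == "click":
            current.clicks += 1
        else:  # remember_text
            context[step["as"]] = current.text
    return context


page = [FakeElement("Welcome"), FakeElement("I agree to the terms and conditions")]
context = run_actions(json.loads(SCRIPT), page, {})
print(context)
```

The point of the sketch is that the script itself is plain data, so the same list could be handed to a runner written in any language.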

A web-based designer

This is a crucial element of the system. By building a web-based tool for developing scripts, the system can easily be made interactive and cross-platform.

Example of how the Web based editor may look, with live-updating browser screenshots, and ability to run steps on-demand

A command-line runner

Because the language is well-defined, and self-contained, a script can be passed to a very simple command-line runner that can execute the script against different browsers, and without human interaction.
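
As a rough sketch of how small such a runner could be, the following uses Python's argparse. The option names, the "firefox" default and the run_script placeholder are all my assumptions; a real implementation would drive a browser instead of returning strings.

```python
import argparse
import json


def parse_args(argv):
    """Parse the (hypothetical) command-line options for the runner."""
    parser = argparse.ArgumentParser(description="Run an action-list script")
    parser.add_argument("script", help="path to a JSON action-list file")
    parser.add_argument(
        "--browser",
        action="append",
        default=None,
        help="browser to run against; may be given more than once",
    )
    return parser.parse_args(argv)


def run_script(actions, browser):
    """Placeholder executor: reports each top-level action instead of driving a browser."""
    return [f"{browser}: {step['action']}" for step in actions]


def main(argv):
    """Load the script, run it once per browser, and return a process exit code."""
    args = parse_args(argv)
    with open(args.script) as handle:
        actions = json.load(handle)
    failures = 0
    for browser in args.browser or ["firefox"]:  # assumed default browser
        try:
            for line in run_script(actions, browser):
                print(line)
        except Exception as exc:
            print(f"{browser}: FAILED ({exc})")
            failures += 1
    return 1 if failures else 0

# Entry point for a real script would be: raise SystemExit(main(sys.argv[1:]))
```

Returning a non-zero exit code on failure is what would let a CI job run the same scripts with no human involved.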

Cucumber adapter

It would also be possible to write an adapter that converted the built-in actions into BDD style grammar, and then translated between that and the underlying JSON format.
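
One direction of that translation might look something like the sketch below. The phrase table and the to_bdd name are invented for illustration, nesting is simply flattened, and translating the sentences back into JSON is left out.

```python
# Hypothetical phrase table: action name -> (argument key, sentence template).
PHRASES = {
    "find": ("startswith", 'I find the text starting with "{}"'),
    "click": (None, "I click it"),
    "remember_text": ("as", 'I remember its text as "{}"'),
}

SCRIPT = [
    {
        "action": "find",
        "startswith": "I agree to the terms",
        "and": [
            {"action": "click"},
            {"action": "remember_text", "as": "disclaimer text"},
        ],
    }
]


def flatten(actions):
    """Yield every step in order, descending into nested 'and' lists."""
    for step in actions:
        yield step
        yield from flatten(step.get("and", []))


def to_bdd(actions):
    """Render an action list as When/And sentences."""
    lines = []
    for step in flatten(actions):
        key, phrase = PHRASES[step["action"]]
        text = phrase.format(step[key]) if key else phrase
        keyword = "When" if not lines else "And"
        lines.append(f"{keyword} {text}")
    return lines


for line in to_bdd(SCRIPT):
    print(line)
```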

A User-oriented vocabulary

A fairly small collection of ‘things that people might want to do’ should cover most cases. This list has a lot of overlap with the Selenium API, but the differences are crucial.

Identify [toolbar] by css [#header #toolbar]
In [toolbar]:
    Click [Save]
Define [fast login] with summary: [Quickly login as <user> with <password>] and actions:
    Browse to [root url]
    Fill in [username] with [user]
    Fill in [password] with [password]
    Click on [login]
Do [fast login]: Quickly login as [bill] with [Passw0rd]
Evaluate [assert int(context["current year"]) ==]
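
Lines in this style are easy to pull apart mechanically. The sketch below splits a line into a verb template and its bracketed arguments; parse_step is a hypothetical helper, and it deliberately does not handle nested brackets such as those in the Evaluate example.

```python
import re

# Matches one [argument] that contains no nested brackets.
ARG = re.compile(r"\[([^\[\]]*)\]")


def parse_step(line):
    """Split a line like 'Fill in [username] with [user]' into (verb template, arguments)."""
    args = ARG.findall(line)
    verb = ARG.sub("[]", line).strip().rstrip(":")
    return verb, args


for line in [
    "Identify [toolbar] by css [#header #toolbar]",
    "In [toolbar]:",
    "Fill in [username] with [user]",
]:
    print(parse_step(line))
```

A runner could then look the verb template up in a table of known verbs, which keeps the vocabulary small and explicit.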

Comparing to the Requirements

  • Code independence — The basic verbs and language used here have no knowledge of individual implementations. Where such links are to be added, they are isolated within the test scripts. This makes coupling test code to implementations hard, which I consider a very good feature.
  • Resumability — Runtime context can be directly edited, steps can be run dynamically and on-demand from the editor, so developing and debugging scripts is trivial. The command-line runner may not have any of this functionality (except PDB?)
  • Understandable scripts — The system can convert any script into human readable output, with support for added plain-text context in the process. It should also be possible (defining the grammar may be hard) to write a full BDD style script definition language that compiles to the internal script format.
  • Test Semantics — This is down to the verbs used and defined, but by providing a built-in set of verbs that set the scene for a user-action oriented test script, it should be much simpler and more natural to write scripts that make sense to a user. This is about directionality more than anything.
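
The resumability point can be sketched in a few lines. This is only an illustration under my own assumptions: StepRunner and the two step functions are hypothetical, and steps are plain Python callables rather than parsed actions.

```python
class StepRunner:
    """Runs steps one at a time so a run can be paused, edited and resumed."""

    def __init__(self, steps, context=None):
        self.steps = list(steps)
        self.context = dict(context or {})
        self.position = 0  # index of the next step to run

    def run_next(self):
        """Run one step; the position only advances if the step succeeds."""
        self.steps[self.position](self.context)
        self.position += 1

    def run_all(self):
        """Run from the current position to the end of the script."""
        while self.position < len(self.steps):
            self.run_next()


def open_site(ctx):
    ctx["visits"] = ctx.get("visits", 0) + 1
    ctx["page"] = "home"


def login(ctx):
    if "password" not in ctx:
        raise KeyError("password")
    ctx["page"] = "dashboard"


runner = StepRunner([open_site, login])
try:
    runner.run_all()
except KeyError:
    runner.context["password"] = "Passw0rd"  # edit the runtime context by hand
    runner.run_all()  # resumes at the failed step; open_site is not re-run

print(runner.position, runner.context)
```

Because the position and context live outside the steps, an editor could expose both directly and let the author re-run a single step on demand.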


None of the system I’ve described is particularly ground-breaking. Most of the ideas above are spread around other projects out there, but bringing them all together produces a really powerful, simple-to-use system.
