Webdriver Tests in OCaml

Mark Nichols
Published in The Aleph
May 15, 2023

Overview

The project discussed here is an OCaml application with a web front end developed using Js_of_ocaml. Our goal for web testing was to implement sufficient automation of our GUI to ensure that key workflow paths would continue to work properly through future code changes. Achieving full coverage of all aspects of the UI was not part of our plan.

A secondary goal was to produce automated screen shots of important activities in the UI, suitable for use in documentation.

We considered a few different technologies for this work. The first was Selenium — an open source project focused on automated testing of code within web browsers. Selenium has an IDE in the form of a browser plug-in, and also provides a DSL for writing tests in various languages. OCaml is unfortunately not one of the supported languages, and so we chose not to work directly with Selenium.

As of Selenium 2.0, the name Selenium Webdriver refers to the API used to pass instructions to the native web browsers. These instructions are standardized rather than browser-specific, and should work the same way on Firefox as they do on Chrome. The full API is defined in the W3C Webdriver specification. The Webdriver spec can be used with or without Selenium.

The server side of the Webdriver API is implemented for specific web browsers in the form of drivers. For Firefox there is Geckodriver and for Google Chrome there is Chromedriver. We primarily use Firefox for development, and in our Alpine-based CI environment, we add Geckodriver to our Docker setup with the command:

sudo apk add geckodriver --update-cache --repository http://dl-3.alpinelinux.org/alpine/edge/testing/ --allow-untrusted

The ocaml-webdriver library provides OCaml access to the Webdriver API. We decided to use it as the basis for our tests, which run in OCaml and act as a Webdriver client through the library. The library has its own online documentation. It isn’t available through Opam, so we had to bundle it with our application.

While ocaml-webdriver doesn’t provide any unit testing framework, our needs for such things were minimal, and so we coded the few assertion testing functions that we needed.
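As an illustration of how little is needed, a handful of assertion helpers could look like the sketch below. It is not the project's actual code: the names check_equal, check_contains, and report are hypothetical, and the sketch works on plain strings (for example, text already read from an element) so that it does not depend on any Webdriver call.

```ocaml
(* Hypothetical assertion helpers; none of these names come from ocaml-webdriver. *)
type check = Pass | Fail of string

(* Succeeds when the two strings are identical. *)
let check_equal ~label ~expected ~actual =
  if String.equal expected actual then Pass
  else Fail (Printf.sprintf "%s: expected %S but got %S" label expected actual)

(* Succeeds when [needle] occurs somewhere inside [haystack]. *)
let check_contains ~label ~needle ~haystack =
  let n = String.length needle and h = String.length haystack in
  let rec scan i =
    i + n <= h && (String.sub haystack i n = needle || scan (i + 1))
  in
  if scan 0 then Pass
  else Fail (Printf.sprintf "%s: %S not found in %S" label needle haystack)

(* Print a failure to stderr and tell the caller whether the check passed. *)
let report = function
  | Pass -> true
  | Fail msg ->
    prerr_endline ("[FAIL] " ^ msg);
    false
```

A test would call report on each check and count the failures at the end of the run.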

We also considered another library, confusingly also named “ocaml-webdriver”. That library has support for unit testing, but is more focused on interacting with Selenium scripts, and so we did not pursue it.

Difficulties

Since the OCaml code running the tests is often faster than the JavaScript code executing in the browser, it is common for failures to occur when a test tries to find specific GUI elements that have not yet been fully created by the JavaScript code. To handle this type of problem, we catch errors thrown by the relevant functions, look for specific exception types, and allow some finite number of retries through which the problem can resolve itself. Typically we see between 0 and 2 retries needed for successful completion of the tests we implemented. Here is a simplified example showing how retries can be handled:

open Webdriver_cohttp_lwt_unix
module W = Webdriver_cohttp_lwt_unix
open W.Infix (* let* and related operators; assumed to live in the library's Infix module *)

(* Look up an element, retrying while the page is still being built. *)
let do_with_retry =
  let max_retries = 10 in
  let rec retry count =
    W.Error.catch
      (fun () ->
        let* _elt = find_first `xpath "my xpath search" in
        (* check attributes of the element, etc. *)
        return ())
      ~errors:[ `no_such_element; `stale_element_reference ]
      (fun e ->
        (* Pause before the next attempt to give the browser time to catch up. *)
        let* () = sleep 1 in
        if count + 1 >= max_retries
        then (
          Fmt.epr "[FAIL] Failure after max retries: %s\n%!" (W.Error.to_string e);
          return ())
        else retry (count + 1))
  in
  retry 0

As part of taking screenshots, we also used the Webdriver API to move the cursor over particular elements on the screen, in order to take advantage of mouse-over highlighting in the screen capture images. We had some difficulty with this, in particular when we attempted to mouse-over elements that were off the screen and required scrolling to be seen. The few areas in our application where this happened were not key to what we were trying to do, and after a few attempts at fixing this we decided it made more sense to turn off screen shots for those few elements rather than spending additional time troubleshooting.

We also had problems with our mouse move / screen shot code in CI environments, particularly when there was no configured display device. To get around this, we decided for the time being to disable mouse moves and screen shots for regular CI runs, and then set an environment variable when we wanted the tests to produce output images.
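A gate of this kind can be very small. The following is a minimal sketch, not the project's actual code: the variable name WEBTEST_SCREENSHOTS and the functions screenshots_enabled and when_screenshots are hypothetical, and the accepted values ("1", "true", "yes") are an arbitrary choice made for the example.

```ocaml
(* WEBTEST_SCREENSHOTS is a hypothetical variable name chosen for this sketch. *)
(* [getenv] defaults to the real environment and can be replaced in tests. *)
let screenshots_enabled ?(getenv = Sys.getenv_opt) () =
  match getenv "WEBTEST_SCREENSHOTS" with
  | Some ("1" | "true" | "yes") -> true
  | Some _ | None -> false

(* Run [action] only when screenshots are enabled; otherwise do nothing. *)
let when_screenshots ?getenv action =
  if screenshots_enabled ?getenv () then action ()
```

With this in place, a regular CI run leaves the variable unset and skips mouse moves and image capture, while a documentation run sets it before launching the tests.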

Most of our searches involve using XPath to locate HTML elements, and there are some limitations to doing this. XPath is better suited to XML documents where the tags can be both unique and meaningful to the data. Using HTML tags like <H1> or <input> as search targets can make it challenging to uniquely identify elements of interest.

We eventually found ways to make all the searches work, sometimes by combining find_all (which returns a list of matching elements) with a further filter on the value of some attribute.
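The filtering step can be sketched independently of the browser. In the example below, the element record is a stand-in for whatever the driver returns, and filter_by_attribute and exactly_one are hypothetical names; in real tests the candidate list would come from find_all and the attribute values from the Webdriver API.

```ocaml
(* A stand-in for a browser element: just its attribute names and values. *)
type element = { attrs : (string * string) list }

let attribute elt name = List.assoc_opt name elt.attrs

(* Keep only the candidates whose attribute [name] equals [value]. *)
let filter_by_attribute ~name ~value candidates =
  List.filter (fun elt -> attribute elt name = Some value) candidates

(* Insist on exactly one match so that an ambiguous search fails loudly. *)
let exactly_one = function
  | [ elt ] -> Ok elt
  | [] -> Error "no element matched"
  | matches -> Error (Printf.sprintf "%d elements matched" (List.length matches))
```

Requiring exactly one match is a design choice for this sketch: it turns a search that silently picks the wrong <input> into an explicit error.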

These issues are further complicated by dynamic parts of the web page that can alter specific XPath values depending on whether some optional components are visible or not. The downside to all of this is the time it takes to hand-craft searches that work the way we need them to.

Our design includes the concept of units of work, each of which groups together queries, actions, and assertions that depend on one another. In particular, the unit of work defines the scope of the code that will be retried on error.

A unit of work always contains a query, which identifies a particular element within the page. An optional list of commands can then act upon the found element, and finally an optional assertion can be applied to the result.

The individual queries can either return a single element (and fail if it’s not found), or can use XPath wild cards and return a list of results. In the latter case, the list is passed to a selector function that must narrow it down to a single element. Queries can also be nested, in which case the element found by an initial query is used as the base of the search for the following query.

We only needed a few actions: clicking on an element, entering some keystrokes into an input field, moving the mouse over an element, and taking a screen shot.
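To make the shape of this design concrete, here is one way the pieces described above could be modelled as data. This is a sketch under assumptions, not the framework itself: every type, constructor, and function name is hypothetical, and find_all is passed in as a parameter standing in for the driver's search function so that the example does not depend on the browser.

```ocaml
(* All names below are hypothetical; they only mirror the prose description. *)

(* How to locate the element a unit of work operates on. *)
type 'elt query =
  | Single of string                             (* XPath expected to match an element *)
  | Many of string * ('elt list -> 'elt option)  (* wildcard XPath plus a selector *)
  | Nested of 'elt query * 'elt query            (* second query searches under the first result *)

type action =
  | Click
  | Send_keys of string
  | Mouse_over
  | Screenshot of string (* output file name *)

type 'elt unit_of_work = {
  name : string;
  query : 'elt query;
  actions : action list;              (* may be empty *)
  assertion : ('elt -> bool) option;  (* optional final check *)
}

(* Resolve a query to at most one element. [find_all base xpath] stands in
   for the driver's search; [base = None] means "search the whole page". *)
let rec resolve find_all base = function
  | Single xpath ->
    (match find_all base xpath with
     | elt :: _ -> Some elt
     | [] -> None)
  | Many (xpath, select) -> select (find_all base xpath)
  | Nested (outer, inner) ->
    (match resolve find_all base outer with
     | Some found -> resolve find_all (Some found) inner
     | None -> None)

(* A test expressed purely as data; elements are plain strings here. *)
let save_button_test = {
  name = "click save";
  query = Many ("//button", fun buttons -> List.find_opt (fun b -> b = "save") buttons);
  actions = [ Mouse_over; Screenshot "save.png"; Click ];
  assertion = None;
}
```

A runner would resolve the query, apply each action in order, evaluate the assertion if there is one, and wrap the whole sequence in the retry logic, so that the unit of work is also the unit of retry.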

We built all this up into a small framework and then defined each test in terms of a single unit of work. After a while, adding new tests became just an exercise in creating the right data elements within the framework.

We are open to feedback, and can make the code available to anyone who would like to take a closer look. We would also consider contributing the code as open source if there is interest in making further contributions to this effort.

If you find that some part of your project’s GUI occasionally breaks without anyone noticing, or if your team spends a lot of time testing the same steps over and over in the web browser, it probably makes sense to give this a try. We would appreciate hearing about how it turns out.
