Algorithm-based test automation

Selenium + ML/AI

Jan 6, 2022

Introduction

We will be working with this Selenium Reinforcement Learning project.

The algorithm used for training is Q-Learning. It builds a discrete Q-matrix from a set of predefined actions and rules, with the aim of reaching a goal in as few steps as possible.

The repo's readme gives a good overview of why this approach is beneficial and what it is trying to achieve, so please read it in full. In summary:

Writing and maintaining UI tests is expensive
If we can train a system to interact with an application and reach a goal, all of the complexity in the middle becomes less of an issue
We want to abstract away as much as possible, and provide guidance to the algorithms being executed

Application under test

The application under test is a demo form that the trainer runs against. The goal is to reach the point where the .third-panel element becomes visible.

In a real-world example you might set the end goal to be submitting a form or logging in to your application, where you expect to see a specific condition that you can target through a specific element, such as being redirected to a different page or being presented with a welcome message.

Trainer goal

The trainer's goal logic verifies that the goal has been achieved. In this instance, that means the .third-panel locator is displayed and enabled.
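As a rough illustration of that check, here is a minimal Python sketch. It is not the project's own code: the function name `goal_reached` is made up for this example, and it assumes a Selenium WebDriver-style object with `find_elements`. The string "css selector" is the value behind Selenium's `By.CSS_SELECTOR`, used directly so the sketch has no imports.

```python
CSS_SELECTOR = "css selector"  # the string value behind selenium's By.CSS_SELECTOR


def goal_reached(driver, selector=".third-panel"):
    """Return True when an element matching `selector` is displayed and enabled.

    `goal_reached` is a hypothetical helper, not part of the project.
    """
    # find_elements returns an empty list (rather than raising) when nothing matches.
    elements = driver.find_elements(CSS_SELECTOR, selector)
    return any(el.is_displayed() and el.is_enabled() for el in elements)
```

With a real browser session you would call `goal_reached(driver)` after each action and stop the run as soon as it returns True.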

Analysing elements for interaction

We query all of the elements on the page that can be interacted with and store them in a collection.

There is a lot of logic going on behind the scenes, and it's better to debug and step through the code than for me to share loads of snippets. In summary, it stores locator identifiers (id, class, XPath, etc.) and element types (field, checkbox, etc.).

At the start of the test we provide it with locators (keys) to match against the list returned, and the data to send (values).
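To make the idea of matching supplied keys against discovered elements concrete, here is a small Python sketch. The `PageElement` class, the `plan_inputs` function, the "next" and "accept" locators and the sample values are all assumptions made for illustration. The project's real data structures may look quite different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageElement:
    """A discovered interactable element: how to find it and what kind it is."""
    locator: str  # e.g. an id, class, or XPath
    kind: str     # e.g. "field", "checkbox", "button"


def plan_inputs(elements, test_data):
    """Pair each discovered element with the value to send, if one was supplied.

    Elements without a matching key (buttons, checkboxes) map to None,
    meaning "interact without typing anything".
    """
    return {el: test_data.get(el.locator) for el in elements}


# Hypothetical scan result and test data, for illustration only.
discovered = [
    PageElement("name", "field"),
    PageElement("description", "field"),
    PageElement("text", "field"),
    PageElement("next", "button"),
    PageElement("accept", "checkbox"),
]
test_data = {"name": "Jane", "description": "Demo user", "text": "Hello"}

plan = plan_inputs(discovered, test_data)
```

Keeping the data keyed by locator means the test only says what to type where, not in which order, which is what leaves the trainer free to choose its own path.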

Why don’t we hardcode the locators?

This is where it differs from the normal approach. Instead of sending keys to name, description and text sequentially and following a fixed test flow, we move through the application randomly.

There is logic tracking the current and next state of the application, and we perform a random action at each point — until we reach the goal.

Trainer logic

This is where the actual interactions take place. In this example the trainer iterates through the defined algorithm 5 times and performs a maximum of 20 actions (type/click/select) each time.
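The shape of that loop can be sketched in Python without a browser by simulating the form. Everything here is a simplified assumption: the `step` rules, the "submit" and "accept" action names and the seeded random generator are stand-ins for illustration, not the project's trainer.

```python
import random

FIELDS = ("name", "description", "text")   # keys mentioned in the text
ACTIONS = FIELDS + ("submit", "accept")    # "submit"/"accept" are hypothetical names


def step(state, action):
    """Apply one action to the simulated form and return the next state.

    `state` is a frozenset of things already done. Submitting only works once
    every field is filled; accepting only works after a successful submit.
    """
    if action in FIELDS:
        return state | {action}
    if action == "submit" and all(f in state for f in FIELDS):
        return state | {"submit"}
    if action == "accept" and "submit" in state:
        return state | {"accept"}
    return state  # the action had no effect


def run_episode(rng, max_actions=20):
    """Take random actions until the goal is reached or the budget runs out."""
    state, path = frozenset(), []
    for _ in range(max_actions):
        action = rng.choice(ACTIONS)
        next_state = step(state, action)
        path.append((state, action, next_state))  # current state, action, next state
        state = next_state
        if "accept" in state:  # stand-in for ".third-panel is visible"
            break
    return path, "accept" in state


rng = random.Random(0)  # seeded so the runs are repeatable
for episode in range(5):
    path, reached = run_episode(rng)
    print(f"run {episode + 1}: {len(path)} actions, goal reached: {reached}")
```

Each recorded `(state, action, next_state)` triple is the kind of transition a Q-Learning trainer can later score, and wasted actions such as submitting too early show up as steps where the state does not change.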

Let’s run it!

This is what the scenario looks like when executed: it completes the available fields, selects the button, ticks the checkbox and reaches the goal. It completes 5 runs and follows a different path each time.

This is a nice example of what is actually happening in the background. If you look at the highlighted lines, you can see it tries to submit the form a few times before completing the last remaining field, and eventually reaches the accept conditions checkbox.

Compare that with this iteration, which made it through first time.

This is where you can start to train your system based on rewards for reaching the goal in fewer steps. If you want to deep-dive into that, start by understanding the model used in this project.

The goal of the agent is to maximize its total reward. It does this by adding the maximum reward attainable from future states to the reward for achieving its current state, effectively influencing the current action by the potential future reward. This potential reward is a weighted sum of expected values of the rewards of all future steps starting from the current state.
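The description above corresponds to the standard Q-Learning update rule, which can be sketched in a few lines of Python. The learning rate `alpha`, the discount factor `gamma`, the state names and the reward values are assumptions chosen for illustration. The project may use different settings and a different table layout.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """One Q-learning step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    # Best value currently attainable from the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    return q[(state, action)]


q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
actions = ("type", "click", "select")

# Reaching the goal from the hypothetical "panel-2" state by clicking earns 1.0.
q_update(q, "panel-2", "click", 1.0, "goal", actions)
# A step that leads to "panel-2" now inherits some of that future value.
q_update(q, "panel-1", "type", 0.0, "panel-2", actions)
```

After these two updates, the value for clicking in "panel-2" is 0.5 and the value for typing in "panel-1" is about 0.225, even though that earlier step earned no reward itself. This is how the potential future reward influences the current action.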

Conclusion

AI/ML is a hot topic in the test automation field at the moment, and a lot of enterprise companies claim to provide solutions through their own UIs and IDEs.

Based on my findings and insight into this project, there is a lot of potential for building your own projects with Selenium, and taking advantage of readily available algorithms.