Detecting the unexpected in (Web) UI

What if my CI turned into my preferred QA? Let’s fuzz UI

Inspired by the fast-check logo combined with famous web frameworks

And then one day, it happened. Despite all the unit, integration and end-to-end tests you had implemented, a customer complained about what appeared to be a bug you should have caught before releasing:

[…] Steps to reproduce the bug:
1- Start a new project
2- Save it
3- Refresh the page
4- Click on ‘go back home’ 💥

Some of those uncaught bugs turned out to be even weirder than expected:

  • Android issue: vulnerability allowing anyone to bypass the lock screen by making it crash (more details)
  • GitHub issue: submitting a comment after clicking the “Start a new conversation” button on a pull request diff raised an error under some circumstances (more details)
  • Jira issue: unable to save comments containing emojis (more details)

Bugs found with the technique explained in this article:

  • Spotify issue: enabling repeat mode ignored all tracks added after the user pressed play (more details, see “…and there were bugs”)
  • Google LevelDB issue: reappearing “ghost” key after 17 steps (more details)

Here you’ll find an easier and quicker way to detect this kind of issue. While the issue chosen to illustrate the problem comes from a web app, the solution can easily be extended to other technologies.

Hard to catch

Most of those bugs relied on scenarios far from the mainstream ones. For instance, in the Android issue, the attacker had to open the “emergency dialer”, tap hundreds of characters into it and then open the camera.

Covering such scenarios beforehand with classical end-to-end or unit tests would have seemed crazy. Indeed, we do not want to hard-code strange and unlikely scenarios a user might execute. In truth, it is not even possible: the cardinality of such a problem is infinite, so we would inevitably miss lots of cases.

Yet, humans can

But humans can. You, me or your two-year-old nephew randomly playing with your phone for hours may have detected the Android issue. This statement is a direct consequence of the infinite monkey theorem: “a monkey hitting keys at random on a typewriter keyboard for an infinite amount of time will almost surely type any given text, such as the complete works of William Shakespeare”.

Instead of applying very strict scenarios, as our tests usually do, we take a new path each time to do the “same” operation. We also know the expected outcomes of all our actions: e.g. when I click on “open camera”, I expect the camera to open, or when I log in with random credentials, I expect not to be logged in.

Computers can too

What if we were able to give such knowledge to computers?

Basically, a scenario is a succession of valid actions on a UI or automaton. Before applying an action we need to ensure it can be applied given the current state of the UI. Only then, we run it and check if it executed what it was supposed to do.

Commands can be summarized as a “combination of an action plus conditions”:

  • pre-conditions: can the command be executed given the current context?
  • action: what is the command supposed to do?
  • post-conditions: what are the visible outcomes we expect from the command?

In addition to that, we have to define and build a “model” or “current context” so that commands know what they are dealing with and update it if needed. The model can be the UI itself but it can also include additional data such as history, user settings… or any data we do not want to get back from the UI.
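To make this concrete, here is a minimal sketch in plain JavaScript of what a command could look like for a Connect Four-like UI. All the names (`playTokenCommand`, `tokensIn`, `clickColumn`) are illustrative, not part of any framework; `model` is our simplified view of the system, `real` is the system under test:

```javascript
// A minimal command shape: pre-condition (check), action + post-condition (run).
// Names are hypothetical; this is only a sketch of the idea.
const playTokenCommand = (column) => ({
  name: `play token in column ${column}`,
  // pre-condition: no winner yet and the column is not full
  check: (model) => model.winner === null && model.columns[column].length < 6,
  run: (model, real) => {
    // action: click on the column of the real system
    const before = real.tokensIn(column);
    real.clickColumn(column);
    // post-condition: exactly one new token appeared in that column
    if (real.tokensIn(column) !== before + 1) {
      throw new Error(`expected a new token in column ${column}`);
    }
    // keep the model in sync with what should have happened
    model.columns[column].push(model.currentPlayer);
    model.currentPlayer = model.currentPlayer === 1 ? 2 : 1;
  },
});
```

The key design choice is that `run` both performs the action and immediately verifies its visible outcome, throwing on any violation.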

Given that we have defined all the possible commands for our system, we are able to generate our own random scenarios. To do so, we generate a random array of commands and execute them one by one, in order.

Execution of a single command

If everything runs fine with this array, we can restart with another random scenario. If not, we can try to reduce it to a smaller failing scenario, or report it directly.
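The whole execution loop fits in a few lines. As a sketch in plain JavaScript (the `check`/`run` command shape and all names are illustrative):

```javascript
// Run a randomly generated scenario: skip commands whose pre-condition does
// not hold, execute the others, and let `run` throw on a broken post-condition.
// Each command exposes check(model) and run(model, real).
function runScenario(commands, makeInitial) {
  const { model, real } = makeInitial();
  const executed = [];
  for (const cmd of commands) {
    if (!cmd.check(model)) continue; // pre-condition not met: skip it
    cmd.run(model, real);            // throws if a post-condition fails
    executed.push(cmd.name);
  }
  return executed; // names of the commands that were actually applied
}
```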

Property based testing to the rescue

Generating random arrays is quite trivial. Generating them in a stable and reproducible way requires seeded generators.
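For illustration, a classic mulberry32 PRNG is enough to make runs reproducible: the same seed always yields the same scenario. The helper names here are mine, not fast-check’s:

```javascript
// mulberry32: a tiny seeded PRNG returning floats in [0, 1).
// Same seed => same sequence => same generated scenario.
function mulberry32(seed) {
  return function () {
    seed |= 0; seed = (seed + 0x6d2b79f5) | 0;
    let t = Math.imul(seed ^ (seed >>> 15), 1 | seed);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Pick a reproducible array of commands from a pool of candidates.
function randomScenario(pool, length, seed) {
  const rand = mulberry32(seed);
  return Array.from({ length }, () => pool[Math.floor(rand() * pool.length)]);
}
```

Replaying a failure is then just a matter of re-running with the seed that produced it.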

Wait a minute…

If a bug occurs, we expect the scenario to be marked as “failed”. As it was randomly generated, there is no certainty that it will be easy to replay manually. At worst, it can contain commands that lead us down the wrong track. For all those reasons, reducing it to a minimal scenario is more than useful.

Property based testing frameworks are built with that purpose in mind. They have been designed to generate random datasets and reduce them in case they make a program fail.
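To give an intuition of what these frameworks do, here is a naive shrinking sketch in plain JavaScript. Real frameworks are far more elaborate, but the core idea is the same: keep removing commands as long as the smaller scenario still fails.

```javascript
// Naive shrinking: try removing each command in turn; whenever the smaller
// scenario still fails, keep the removal and start over, until no single
// removal reproduces the failure anymore.
function shrink(commands, stillFails) {
  let current = commands;
  let progress = true;
  while (progress) {
    progress = false;
    for (let i = 0; i < current.length; i++) {
      const candidate = current.slice(0, i).concat(current.slice(i + 1));
      if (stillFails(candidate)) {
        current = candidate; // smaller scenario still fails: adopt it
        progress = true;
        break; // restart the scan from the new, smaller scenario
      }
    }
  }
  return current;
}
```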

Frameworks like fast-check in JavaScript or RapidCheck in C++ already provide a built-in way to handle the generation of such scenarios.

For more details on property based testing, you can refer to my previous publication on this subject.

Hands-on commands

Let’s apply it to a Connect Four game with the following capabilities:

  • User can play a token if there is no winner and if the column is not full,
  • User can restart the game by pressing “New Game” button,
  • User can copy the url to share the state of the game to someone else,
  • User can undo or redo any move…
Connect Four UI

Let’s try the system manually

The Connect Four project comes with two distinct implementations. One of them has a bug that can be detected with the command-based approach:

Can you find it?

The bugged version is available on branch “buggy-implementation”.

User capabilities

In order to extract what will become our commands and our model, we need to identify the actions a user can execute. Given the system above, a user can:

  • click on the grid
  • click on one of the available buttons: undo, redo or new game
  • re-use a bookmarked url

In terms of commands, we can extract the following ones, among others:

  • play a valid move
  • undo a move
  • redo a move
  • restart the game
  • bookmark a url
  • open a previously bookmarked url

From human specs to machine

For all those commands we can expect some pre-conditions and post-conditions to be fulfilled. All the commands defined to test this project are available at

“play a valid move” [code]:

  • pre-condition: no winner yet, column is not full — or the column is clickable
  • action: click on the column
  • post-condition: a new token has been added into the column with the color of the current player

“play an invalid move” [code]:

  • pre-condition: winner or the column is full — or the column is forbidden
  • action: click on the column
  • post-condition: the grid has not changed

“undo a move” [code]:

  • pre-condition: board is not empty
  • action: click on undo
  • post-condition: the last played token has been removed

In all the commands above the “model” could have been reduced to the UI itself. Undo might require additional details if we really want to check that it removed the last played token and not another one. The attached implementation just checks that one token has been removed.

For commands like “bookmark an url” or “open a previously bookmarked url”, the model has to include an additional field responsible for storing all the bookmarked urls. These commands might be implemented as follows:

“bookmark an url” [code]:

  • pre-condition: none, it is always possible
  • action: add the url and its associated grid into the model
  • post-condition: it is the first time we bookmark this url or it was already bookmarked with the same grid content

“open a previously bookmarked url” [code]:

  • pre-condition: model has at least one bookmarked url
  • action: open the bookmarked url
  • post-condition: the grid is the same as the one we bookmarked
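A sketch of these two bookmark commands against a minimal model, in plain JavaScript. Here `real` stands for the application under test, grids are serialized as strings for easy comparison, and all names are illustrative:

```javascript
// The model stores, for each bookmarked url, the grid we saw at bookmark time.
const bookmarkUrlCommand = {
  name: 'bookmark the url',
  check: () => true, // pre-condition: none, always possible
  run: (model, real) => {
    const url = real.currentUrl();
    const grid = real.currentGrid(); // serialized grid, e.g. a string
    // post-condition: first bookmark of this url, or same grid as before
    if (url in model.bookmarks && model.bookmarks[url] !== grid) {
      throw new Error(`url ${url} already bookmarked with a different grid`);
    }
    model.bookmarks[url] = grid;
  },
};

const openBookmarkCommand = {
  name: 'open a previously bookmarked url',
  // pre-condition: the model has at least one bookmarked url
  check: (model) => Object.keys(model.bookmarks).length > 0,
  run: (model, real) => {
    const url = Object.keys(model.bookmarks)[0];
    real.goTo(url);
    // post-condition: the grid matches the one we bookmarked
    if (real.currentGrid() !== model.bookmarks[url]) {
      throw new Error('grid differs from the bookmarked one');
    }
  },
};
```

Note how the bug described in the next section is exactly a violation of `bookmarkUrlCommand`’s post-condition: the same url ends up associated with two different grids.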

And computer found a bug

If you have not found the bug manually, no worries: the computer found it for you. The failure after shrinking was:

0- (Not a command) Open the game
1- Bookmark the url — empty grid and player 1 has to play
2- Play token in the 6th column
3- Click on “new game”
4- Bookmark the url — empty grid and player 2 has to play
 — The url is the same but the state is not

And the shrinking was definitely needed: the original failure found by the framework was far less clear and included 27 steps:

0- (Not a command) Open the game
1- Click on “new game”
2- Bookmark the url
3- Open bookmarked url: http://localhost:3000/#/
4- Press F5 to refresh the page
5- Bookmark the url
6- Play token in the 2nd column
7- Click on “new game”
8- Play token in the 2nd column
9- Play token in the 5th column
10- Play token in the 5th column
11- Play token in the 3rd column
12- Play token in the 7th column
13- Play token in the 7th column
14- Bookmark the url
15- Bookmark the url
16- Click on “undo”
17- Play token in the 6th column
18- Bookmark the url
19- Check the player stated in the top bar
20- Play token in the 1st column
21- Undo all then redo all
22- Check the player stated in the top bar
23- Click on “undo”
24- Bookmark the url
25- Click on “new game”
26- Press F5 to refresh the page
27- Bookmark the url
What does it look like? Video of one scenario

Going further

Throughout this article we introduced ways to detect unexpected bugs by turning property based testing frameworks like fast-check into a QA for our applications. With that setup enabled, we are now ready to detect bugs and regressions automatically, and to come up with very succinct counterexamples.

I personally used it successfully at work to detect unexpected behaviours on a Web UI Single Page App we are working on.

Additionally, it showed how property based testing can be used beyond the simple case of a pure function, which is often the only scenario considered when explaining the technique.

Additional readings

👋 Many thanks to the folks at Criteo for the proofreading and support, with special thanks to @sinedied and @zhenyi2697

👏 Feedback and questions are more than welcome