Emerging Technologies: Testing Challenges & Opportunities

While immersed in the hype surrounding emerging technologies at CogX in London, with discussions around the broad range of possible applications of the convergence of AI and Blockchain, I found it easy to lose sight of the software testing challenges and indeed the opportunities presented by this changing landscape.

Now that I have had some time to decompress and fire up my healthy dose of tester cynicism, I’ve collated some thoughts on where I think we need to focus our attention in the coming months and years: data, environments, usability, and tooling.

Test Data & Environments

Whether it is for testing form validation, script injection, API calls or load, data is a fundamental building block of any test effort. As Prof. AC Grayling of the New College of the Humanities reminded us at CogX: “AI is data hungry”.

If, as predicted, the software we develop uses increasing amounts of AI, and the ability of those systems to learn anything valuable is only as good as the quantity and quality of the data they consume, then we as testers need to consider the structure, growth and maintenance of our test data sets. They must always exercise the system under test in a way that is faithful to real-world usage while still unearthing those elusive edge cases.

At a previous employer, the accountants maintained the books for a fictitious company, in an attempt to simulate the growth of the books and to expand the coverage the data set provided. While this is a sensible way to simulate an organic data set that will exercise the system under test realistically, it does not scale, because it depends on people entering the data by hand.

To scale data sets in parallel with the application AI’s hunger for data, we too will need to harness AI. In the aforementioned example, this might mean creating a number of AI-managed “businesses”, trading with each other and contributing transactions to each other’s accounts. But this raises the question of reproducing failure conditions.

It is important to be able to reproduce failure conditions, and in a world where AI generates the test data we need a way to do so. We could seed the data set so that it can be regenerated on demand, or take a snapshot of the system under test, which would be expensive in terms of storage. We could also try to build replayable or reversible state into the system under test by leveraging some auditing mechanism.
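
As a minimal sketch of the seeding idea, the Python below generates trades between fictitious businesses from nothing but a seed, so a failing data set can be regenerated rather than stored. A plain pseudo-random generator stands in for the AI here, and the business names, the `Transaction` type and `generate_transactions` are all hypothetical. A real AI-driven generator would only be reproducible if every one of its sources of randomness could be seeded in the same way.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    seller: str
    buyer: str
    amount_pence: int


def generate_transactions(seed: int, businesses: list[str], count: int) -> list[Transaction]:
    """Simulate `count` trades between fictitious businesses, driven only by `seed`."""
    rng = random.Random(seed)  # private generator, independent of global random state
    transactions = []
    for _ in range(count):
        seller, buyer = rng.sample(businesses, 2)  # two distinct trading partners
        amount_pence = rng.randint(100, 1_000_000)  # inclusive range, in pence
        transactions.append(Transaction(seller, buyer, amount_pence))
    return transactions


BUSINESSES = ["Acme Ltd", "Bolt & Co", "Cobalt LLP"]  # hypothetical names

# Recording the seed of a failing run is enough to rebuild its data set later.
failing_run = generate_transactions(seed=2018, businesses=BUSINESSES, count=500)
replayed_run = generate_transactions(seed=2018, businesses=BUSINESSES, count=500)
different_run = generate_transactions(seed=2019, businesses=BUSINESSES, count=500)
```

Storing one integer per run is far cheaper than a snapshot, but it only reproduces the input data. Any state the system under test accumulated on its own would still need the snapshot or replay approaches.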

In the “AI at Scale” talk by Chris Wigley of QuantumBlack, he suggested that there is no gentle approach to doing AI at scale: “… if you do not do AI at scale now, you will never do it”. I think this is also true for testing: where the opportunity arises to use AI, we need to seize it. When the application we are testing begins to use AI, we also need to embrace it in our test framework, or we will quickly find ourselves under-tooled for the complexity of the task.

So what does this mean for our test environments? According to the World Quality Report 2017–18 from Capgemini, test environments are an increasing concern for testers, with 46% of those asked citing them as a major problem, up 3 points on the previous year’s report. Environments are already a pain point in the industry.

Traditionally, test environments are light-weight representations of the production environment, getting less and less light the nearer the environment is to production. If we need to maintain larger (and growing) data sets on these environments, the storage and computational power required for AI to make use of this data will see the provisioning for test environments get closer to that of production.

UX & Tooling

At CogX, Sarah Gold of Projects by IF raised the prospect of AI-generated customised user experience. This could negatively impact usability when the designer cannot predict the journey of the user.

Similarly, a tester or automated flow that emulates the user’s UI experience in order to highlight potential blockers and impairments will quickly find that the non-deterministic nature of the AI-generated flow hinders progress. Multiple traversals through the UI may produce different content or presentation, or change the entire outcome, based on the decisions made by the AI in response to what it learns from the user’s interaction.

To combat this, we’ll need to re-invent our current automated UI tooling, which is often quite rigid and brittle, to deal with this new non-deterministic and dynamic user experience.

The Page Object Model (POM) allows us to model a page, which makes brittleness easier to manage. When the Document Object Model (DOM) changes, there is a single place in the POM to update, and that one change fixes all the specs that rely on it. If pages are no longer as easily definable, then the POM approach needs to make more use of a Component Object Model.
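
To make the component idea concrete, here is a rough Python sketch in which a component object owns the selectors for one widget and is told at runtime which container it was rendered in. The `Driver` interface, `FakeDriver`, `BasketSummary` and all of the selectors are hypothetical, chosen so the example runs without a browser. A real suite would put an actual browser automation library behind the same interface.

```python
from typing import Protocol


class Driver(Protocol):
    """Hypothetical minimal browser driver interface."""

    def text_of(self, selector: str) -> str: ...
    def click(self, selector: str) -> None: ...


class FakeDriver:
    """In-memory stand-in for a browser, so the sketch runs without one."""

    def __init__(self, texts: dict[str, str]) -> None:
        self.texts = texts
        self.clicked: list[str] = []

    def text_of(self, selector: str) -> str:
        return self.texts[selector]

    def click(self, selector: str) -> None:
        self.clicked.append(selector)


class BasketSummary:
    """Component object: owns the selectors for one widget, wherever it is rendered."""

    def __init__(self, driver: Driver, root: str) -> None:
        self.driver = driver
        self.root = root  # selector of the container the component was placed in

    def total(self) -> str:
        return self.driver.text_of(f"{self.root} .total")

    def checkout(self) -> None:
        self.driver.click(f"{self.root} .checkout")


# Two renderings of the "same" page place the basket in different containers.
sidebar_page = FakeDriver({"#sidebar .total": "12.50"})
footer_page = FakeDriver({"#footer .total": "12.50"})

sidebar_basket = BasketSummary(sidebar_page, root="#sidebar")
footer_basket = BasketSummary(footer_page, root="#footer")
footer_basket.checkout()
```

The specs talk only to `BasketSummary`, so a change in where the component appears means changing the `root` it is given, not the specs themselves.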

In the world of Behaviour-Driven Development (BDD), Gherkin allows us to document functionality and drive automated tests. But if the flows within a system are more fluid and each interaction does not necessarily produce the same outcome, can the functionality be documented successfully in this traditional manner?

BDD should still be a perfect fit for AI: writing Given/When/Then statements that avoid the implementation technicalities and maintain a business-language description of the actions performed and outcomes expected. This focus on the measures of success, rather than the means of achieving it, is highlighted in the article “What’s The Score? Developing The Right Measurement Capability Is Critical To AI Success” by Steven Gustafson of Maana.
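
One way to picture an outcome-focused check is the Python sketch below, which follows the Given/When/Then shape in comments and asserts only on the measure of success, never on which items appear or in what order. The catalogue, the `recommend` function and the budget are invented for illustration, and a seeded shuffle stands in for the AI-driven feature.

```python
import random

CATALOGUE = {  # hypothetical product data: name -> (price in pence, in stock?)
    "kettle": (2500, True),
    "toaster": (3000, True),
    "blender": (4500, False),
    "juicer": (6000, True),
}


def recommend(budget_pence: int, rng: random.Random) -> list[str]:
    """Stand-in for an AI-driven feature: which items appear, and their order, varies."""
    eligible = [
        name
        for name, (price, in_stock) in CATALOGUE.items()
        if in_stock and price <= budget_pence
    ]
    rng.shuffle(eligible)
    # Return a random-length, non-empty prefix (assumes at least one eligible item).
    return eligible[: rng.randint(1, len(eligible))]


def check_recommendations_respect_budget(seed: int) -> None:
    # Given a shopper with a budget of 4000 pence
    budget_pence = 4000
    # When they ask for recommendations
    recommended = recommend(budget_pence, random.Random(seed))
    # Then at least one product is offered, and each one is affordable and in stock
    assert recommended
    for name in recommended:
        price, in_stock = CATALOGUE[name]
        assert in_stock and price <= budget_pence


# Many traversals with different outcomes, all judged against the same measure.
for seed in range(100):
    check_recommendations_respect_budget(seed)
```

The check is only as good as the measure it encodes: if "affordable and in stock" is not what the business actually means by success, every run will pass while the feature misbehaves.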

Consider the HalfCheetah video presented by Mike Hearn of r3 at CogX (linked under Sources & Further Reading). He explained that without adequate measures of success, AI will strive to achieve success by whatever means possible, settling into a local minimum: the 2D HalfCheetah found that it could achieve its goal by grinding along on its back.

If AI has control of the priority, placement or even inclusion/exclusion of content and features on a page, we need to consult its inner workings to make sure that our test framework knows what to expect next. AI may also have more complex control of UX: for example, the application learns that a user prefers certain input types to others (a star rating input represented by a widget, slider or drop-down). In that case, will our definitions in the POM need to be looser, and will our means of interacting with the UI require more wrappers to handle the increased diversity of interactions with non-deterministic input types?
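
One possible shape for such a wrapper, sketched in Python under heavy assumptions: the spec asks for a rating in business terms, and a lookup picks the interaction that matches whichever input type was rendered. The kind names, the action strings and `set_rating` are hypothetical, and the functions return descriptions of UI actions instead of driving a real browser.

```python
from typing import Callable


def rate_with_stars(stars: int) -> list[str]:
    return [f"click star {stars}"]


def rate_with_slider(stars: int) -> list[str]:
    return [f"drag slider to {stars}"]


def rate_with_dropdown(stars: int) -> list[str]:
    return ["open rating drop-down", f"select option {stars}"]


# One entry per way the application might choose to render the rating input.
RATING_STRATEGIES: dict[str, Callable[[int], list[str]]] = {
    "stars": rate_with_stars,
    "slider": rate_with_slider,
    "dropdown": rate_with_dropdown,
}


def set_rating(rendered_kind: str, stars: int) -> list[str]:
    """Translate 'rate this N stars' into actions for the input type actually shown."""
    if not 1 <= stars <= 5:
        raise ValueError(f"rating must be between 1 and 5, got {stars}")
    try:
        strategy = RATING_STRATEGIES[rendered_kind]
    except KeyError:
        # Fail loudly when the application invents an input type we have not wrapped.
        raise NotImplementedError(f"no wrapper for rating input {rendered_kind!r}") from None
    return strategy(stars)


# The spec stays the same whichever input type appears on this traversal.
actions_by_kind = {kind: set_rating(kind, 4) for kind in RATING_STRATEGIES}
```

The open question from the paragraph above remains: something still has to tell the framework which kind was rendered, whether by inspecting the page or by asking the application.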

The non-deterministic elements of the application will have to be tested using some of the inner workings of the system. This doesn’t sit well with me, since I have always advocated avoiding the use of the internals of a system to test the system.

“The test infrastructure would need to support learning expected test results from the same data that trains the decision-making AI” — Moshe Milman & Adam Carmi, co-founders of Applitools

An interesting take on AI-led QA is presented by Infosys, which suggests that machine learning can be used to identify problems in the system by using existing data in the form of defects, test cases, logging and the codebase itself. The historical test resources and results are consumed by AI, which learns how to predict similar faults.


Prof. AC Grayling of the New College of the Humanities suggested that something similar to the Antarctic Treaty is required to ensure that AI is only used for good. He went on to highlight the three main types of human-AI relationship in combat systems: “in the loop”, where the decision is presented to the human; “on the loop”, where the human steps in at escalation points; and “out of the loop”, where the human is removed from the equation entirely. Gil Tayar of Applitools goes one step further and defines six levels in “6 levels of AI-based testing: Have no fear, QA pros”.

The “out of the loop” (Grayling) approach is reminiscent of the film Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb. I believe that we are a long way off being fully out of the loop (Level 5, Tayar) in terms of testing, but I think we’ll soon be developing test systems in an “in the loop” (Level 2) or “on the loop” (Level 3) style to help us keep up with the growth of AI in the software we test.

We need to lay the foundations on which to build the testing approaches of the future. We need to leverage the technologies that will be used in our products to create frameworks fit to test them. In a world where systems are capable of learning and changing, where data is king and deterministic flows cannot be taken for granted, we need to be prepared. That means building test frameworks that harness AI capable of generating growing test data sets, accepting that our test environments will become less light-weight in order to handle the processing required, and acknowledging that the predictable and reproducible user journeys through our systems may soon be a thing of the past.

Sources & Further Reading:

CogX 2018 — https://www.youtube.com/c/cognitionx

5 Ways AI Will Change Software Testing — Paul Merrill — https://techbeacon.com/5-ways-ai-will-change-software-testing

Artificial Intelligence-led Quality Assurance — Infosys — https://www.infosys.com/IT-services/validation-solutions/service-offerings/Documents/machine-learning-qa.pdf

World Quality Report 2017–18: State of QA & Testing — Ericka Chickowski — https://techbeacon.com/world-quality-report-2017-18-state-qa-testing

6 levels of AI-based testing: Have no fear, QA pros — Gil Tayar — https://techbeacon.com/6-levels-ai-based-testing-have-no-fear-qa-pros

Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb (1964) — https://www.imdb.com/title/tt0057012/

HalfCheetah: Local Minimum — Patrick Coady — https://www.youtube.com/watch?v=2-cU-_bdfHQ

What’s The Score? Developing The Right Measurement Capability Is Critical To AI Success — Steven Gustafson — https://www.forbes.com/sites/forbestechcouncil/2018/01/22/whats-the-score-developing-the-right-measurement-capability-is-critical-to-ai-success/#3db01bc51898