The Present and Future of AI in Testing

Iryna Suprun
Nov 3, 2020 · 4 min read

Part 1: The Current State of AI in Testing

Photo by Markus Winkler on Unsplash


During the last fifteen years, the QA profession has gone through many transformations to adapt to changes in technology and the software development process. Many predicted that QA as a profession would be eliminated, whether by Agile development, by test automation, by Test Driven Development (TDD), or by Behavior Driven Development (BDD). However, QA has managed to prove its relevance and survive all these changes. Will Artificial Intelligence (AI)/Machine Learning (ML) be the innovation that finally succeeds in replacing human testers? This article series reviews the current state of AI in testing, surveys the available tools for implementing AI-based testing, and presents a case study that shows the benefits and limitations of AI and the continuing role for humans in QA.

For Quality Analysts (QA), the most difficult task has always been executing and automating end-to-end tests. End-to-end tests (sometimes referred to as system-level tests) validate application flows from start to finish, effectively simulating real-world scenarios. They validate the system being tested, including its components, for both integration and data integrity. These tests also cover the application’s communication with hardware, networks, databases, and other applications.

Even for a single application, where all components are designed to work together, automating end-to-end scenarios is a challenging, time- and resource-consuming task. These tests are often very complex and require a lot of ongoing maintenance, especially if the application is new and grows rapidly.

Most AI test automation tools claim that users can automate very complex scenarios without a single line of code, so the test automation process is really fast and can be easily done by anyone on the team.

Another big promise is low maintenance. AI tools claim they are able to update tests without human interaction by collecting and smartly using numerous data points for flows, UI elements, and user behavior.

There are many AI testing tools to choose from. The maturity of these tools, their usability, and the promises made vary. I reviewed the state of the AI-based tools market and tried some of them to find answers to two important questions:

  • Are tools that use AI as good as their marketing materials claim?
  • Could human QA teams be replaced by these AI testing tools?

Could AI tools deliver on their promise, enabling me to create much-needed end-to-end tests in days (instead of weeks) and decrease maintenance efforts at the same time?

In my experience, automating end-to-end tests is even harder for a software platform whose components were designed independently, use different technologies, and were implemented by separate teams (some of which work in different locations and time zones). For such a platform, the work often requires up to four times as much time and effort as it does for a single application.

The State of AI in Testing

There is no standard definition of AI. AI is generally understood to be software that can mimic human intelligence. However, based on this definition, even a simple if-else statement can be considered AI. When people talk about AI today, though, they’re typically referring to machine learning (ML) algorithms that use large amounts of data to learn how to perform complex tasks. For this discussion, we’ll assume that “AI-based testing” means a tool that uses ML algorithms to solve or perform a testing-related task.

There are four major groups of ML algorithms:

Supervised Learning is based on providing a large number of examples with known output.

Unsupervised Learning is used to find patterns in the datasets, where the outcome is not known.

Reinforcement Learning and Deep Learning behave more like humans. Reinforcement Learning learns by trial and error, guided by rewards, and Deep Learning uses multi-layer neural networks. Both can learn to solve specific tasks.
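The difference between the first two groups is easier to see on a toy example. The sketch below is only an illustration under stated assumptions: the feature values, the labels ("flaky", "real bug"), and both function names are hypothetical, and the algorithms (1-nearest-neighbor and gap-based grouping) are deliberately the simplest stand-ins for each family, not what any particular testing tool uses.

```python
def nearest_neighbor_label(examples, point):
    """Supervised: predict the label of the closest example with a known label."""
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _features, label = min(examples, key=lambda ex: distance(ex[0], point))
    return label


def split_into_groups(values, gap):
    """Unsupervised: no labels given; start a new group at each large gap."""
    groups = []
    for v in sorted(values):
        if groups and v - groups[-1][-1] <= gap:
            groups[-1].append(v)
        else:
            groups.append([v])
    return groups


# Hypothetical labeled failures: (pass rate on retry, changed files) -> label
labeled = [((0.9, 0), "flaky"), ((0.0, 5), "real bug")]
print(nearest_neighbor_label(labeled, (0.8, 1)))

# Hypothetical test durations in seconds, with no labels at all
print(split_into_groups([1.0, 1.2, 9.5, 1.1, 10.0], gap=1.0))
```

The supervised function needs examples whose outcome is already known, while the unsupervised one only looks for structure in the raw numbers.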

There are six levels of Test Automation Autonomy:

Level 0 is manual testing, using no AI or ML. Even when the goal is to automate as much testing as possible, most implementations don’t achieve greater than 70% automation. (This percentage varies from 50% to 85%, and depends on the source and calculation methods.)

Level 1 is traditional scripting and automated testing, using no AI or ML. Although this form of testing was introduced over 20 years ago, it still only accounts for 25% of all testing.

Level 2 is codeless script generation or recording. AI/ML algorithms can be used, but are not required. 5% of testing today is Level 2.

Level 3 includes self-healing scripts and bots. In Level 3, Supervised Learning algorithms are often used to improve self-healing or the existing regression suite.

Level 4 is automatic script generation, with no human intervention, scripting, or recording. At this level, Unsupervised Learning algorithms can be used. For example, Level 4 with Unsupervised Learning could be used for data-mining production logs and automatically creating new tests based on the contents of the logs.

Level 5 uses fully self-generating tests that are able to validate complex systems and do so autonomously. Some AI-based automation tools claim that they provide instruments to achieve this level of autonomy. At this level, Reinforcement Learning and Deep Learning can be used to generate automated tests based on data analysis of clicks, links, and GUI elements.
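To make the Level 3 idea of self-healing more concrete, here is a minimal sketch of how a test could recover when an element's id changes. It rests on loud assumptions: `Element`, `similarity`, and `find_with_healing` are hypothetical names, the page is modeled as a plain list rather than a real browser driver, and the matching uses a hand-written attribute score where a real tool would likely use a trained model.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """Simplified stand-in for a UI element (hypothetical, not a real driver API)."""
    tag: str
    attrs: dict = field(default_factory=dict)
    text: str = ""


def similarity(recorded, candidate):
    """Fraction of recorded properties (tag, text, attributes) the candidate matches."""
    checks = [recorded.tag == candidate.tag, recorded.text == candidate.text]
    checks += [candidate.attrs.get(k) == v for k, v in recorded.attrs.items()]
    return sum(checks) / len(checks)


def find_with_healing(recorded, page, threshold=0.6):
    """Return (element, healed). Try the recorded id first, then the best match."""
    recorded_id = recorded.attrs.get("id")
    if recorded_id is not None:
        for el in page:
            if el.attrs.get("id") == recorded_id:
                return el, False  # original locator still works
    best = max(page, key=lambda el: similarity(recorded, el), default=None)
    if best is not None and similarity(recorded, best) >= threshold:
        return best, True  # healed match; worth flagging for human review
    return None, False


recorded = Element("button", {"id": "submit", "class": "btn primary"}, "Submit")
page = [
    Element("a", {"id": "home"}, "Home"),
    Element("button", {"id": "submit-btn", "class": "btn primary"}, "Submit"),
]
element, healed = find_with_healing(recorded, page)
print(element.attrs["id"], healed)
```

The threshold is the important design choice: set it too low and the test silently clicks the wrong element, set it too high and nothing ever heals.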

The overall percentage of testing done at Levels 3 through 5 is still very low. However, these higher levels of automation are slowly and steadily replacing automation at Levels 0 through 2.
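The Level 4 example of mining production logs to create tests can be sketched in a few lines. This is an illustration, not how any specific tool works: the log format (`<session_id> <action>`), the function names `mine_flows` and `to_test_outline`, and the sample log lines are all assumptions, and simple frequency counting stands in for a real Unsupervised Learning algorithm.

```python
from collections import Counter, defaultdict


def mine_flows(log_lines, min_count=2):
    """Group events by session and return flows seen at least min_count times."""
    sessions = defaultdict(list)
    for line in log_lines:
        # Assumed log format: "<session_id> <action>"
        session_id, action = line.split(maxsplit=1)
        sessions[session_id].append(action.strip())
    flow_counts = Counter(tuple(actions) for actions in sessions.values())
    return [(flow, n) for flow, n in flow_counts.most_common() if n >= min_count]


def to_test_outline(flow, index):
    """Render a mined flow as a plain-text test outline for a human to review."""
    steps = "\n".join(f"  {i}. {action}" for i, action in enumerate(flow, start=1))
    return f"Candidate test {index}:\n{steps}"


# Hypothetical production log lines
logs = [
    "s1 open /login",
    "s1 submit credentials",
    "s2 open /login",
    "s2 submit credentials",
    "s3 open /search",
]
for i, (flow, count) in enumerate(mine_flows(logs), start=1):
    print(to_test_outline(flow, i))
```

Flows that real users follow most often become the first candidates for automated end-to-end tests, while rare one-off sessions are filtered out by `min_count`.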

In Part 2 of this article, I’ll provide an overview of currently available tools in this category. Later sections will cover my experiences using these tools in a real-life testing environment, and consider their potential for replacing human QA testers in the future.


Our latest thoughts, challenges, triumphs, try-again’s, most snarky and profound commit messages. Our proudest achievements, deepest darkest technical debt regrets (just kidding, maybe). All the humbling yet informative things you learn when you try to do things with computers.

Written by Iryna Suprun

I started testing in 2007, cannot stop since then. The software hates me and never works as expected, so I guess I was born to be a QA.
