Part 1 of this article, “The Current State of AI in Testing,” can be found here
Part 2 of this article, “An Overview of AI-Based Test Tools” can be found here
Part 3 of this article, “AI Automation in the Wild,” can be found here
Codeless script generation has limits
We should also remember that the ability to record tests does not magically solve the common problems of automation. Whether you create tests with code or by recording, you still need to make sure that you automate what matters. Tests should be well designed and automated at the correct level. You should also think about test data management, including setup and teardown.
Recording tests can speed up automation by providing an initial set of raw scenarios to work on. Still, automation engineers need to define and reuse steps that are common to many tests, set up and maintain testing accounts, and select test subsets to execute at different stages of the software development lifecycle. Recording is not a magic bullet for any of this. Engineers still have to work to make automated tests robust and easy to maintain.
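As a rough illustration of selecting test subsets for different stages, here is a minimal Python sketch. The stage names, tag names, and test names are all hypothetical, and real tools usually offer their own tagging or labeling mechanism; the point is only that someone still has to decide which tests belong to which stage.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RecordedTest:
    """A recorded or coded test plus the tags an engineer assigned to it."""
    name: str
    tags: frozenset = field(default_factory=frozenset)


# Hypothetical mapping: which tags are allowed to run at which pipeline stage.
STAGE_TAGS = {
    "commit": {"smoke"},
    "nightly": {"smoke", "regression"},
    "release": {"smoke", "regression", "end_to_end"},
}

# Hypothetical suite; the tagging is the human design decision.
SUITE = [
    RecordedTest("login_works", frozenset({"smoke"})),
    RecordedTest("invoice_totals", frozenset({"regression"})),
    RecordedTest("full_checkout", frozenset({"end_to_end"})),
]


def select_tests(tests, stage):
    """Return the names of tests that carry at least one tag allowed for the stage."""
    wanted = STAGE_TAGS[stage]
    return [test.name for test in tests if test.tags & wanted]


print(select_tests(SUITE, "commit"))
print(select_tests(SUITE, "release"))
```

With this kind of split, the fast subset runs on every commit, and the slower scenarios wait for the nightly or release run.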
Self-healing is a double-edged sword
Self-healing is a great feature, but it does not mean that all your tests will be magically fixed every time. Yes, insignificant changes are fixed automatically, but when your UI is mature, such changes are not introduced often. I would say they are rather rare after a couple of major releases. Collecting multiple data points about each element and using them for element location definitely improves the stability of tests. However, it improves robustness (tests do not fail if, for example, the text on an element changes depending on the time of day) more than it decreases maintenance time (tests that need to be updated because of changes in the code). I found that self-healing saves far more time if the application has a brand-new UI that changes often.
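To make the idea of locating an element by multiple data points more concrete, here is a simplified Python sketch. It is not how any particular tool works. The attribute names, the equal weighting, and the 0.6 threshold are assumptions chosen for illustration. A recorded "fingerprint" of the element is compared with every candidate on the page, and the best match wins if it is close enough.

```python
def match_score(fingerprint, candidate):
    """Fraction of recorded attributes that the candidate still matches."""
    if not fingerprint:
        return 0.0
    hits = sum(1 for key, value in fingerprint.items() if candidate.get(key) == value)
    return hits / len(fingerprint)


def locate(fingerprint, candidates, threshold=0.6):
    """Return the best-matching candidate, or None if nothing is close enough."""
    best = max(candidates, key=lambda c: match_score(fingerprint, c), default=None)
    if best is None or match_score(fingerprint, best) < threshold:
        return None
    return best


# Hypothetical data points recorded when the test was created.
FINGERPRINT = {"id": "submit-btn", "tag": "button", "text": "Submit", "css_class": "primary"}

# Hypothetical page after a release: the id was renamed, everything else is the same.
PAGE = [
    {"id": "cancel-btn", "tag": "button", "text": "Cancel", "css_class": "secondary"},
    {"id": "send-btn", "tag": "button", "text": "Submit", "css_class": "primary"},
]

found = locate(FINGERPRINT, PAGE)
print(found["id"] if found else "element not found")
```

A locator that relied on the id alone would fail here, while the combined score still finds the button. The same mechanism is what can silently accept a change that should have been reported.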
Different tools handle self-healing in different ways. Some of them require approval of every change, at least at the beginning while the system learns, so it is still a time investment. Others just make changes and proceed without notification, which I think is dangerous. For example, in the platform I test, the currency sign and the number of digits after the decimal point are important, but if they are missing, self-healing systems will “auto-fix” such cases instead of flagging them.
I highly recommend that you evaluate the mechanism of handling auto-changes before selecting a test tool with self-healing features. Fortunately, most AI-based tools are starting to add the ability to review auto-fixed tests, accept or decline these changes, track history, and roll back to the previous version. Don’t forget that reviewing these changes still requires time.
To overcome the “auto-healing” problem for small but important changes that should not be ignored, tests can always be built in a very specific way, where every sign and every comma has its own assertion. However, this results in huge, slow, and hard-to-maintain automated tests. We can also make tests smaller and increase their number, but if each of these tests requires its own time-consuming setup, having hundreds of them will increase test execution time. We cannot wait for hours each time we build the software.
Every big change in the application, such as introducing a new element or removing tabs, will still require test updates that have to be done manually. Of course, QAs will be able to do this faster because re-recording steps is faster than writing code. We should remember, though, that this is true of any tool that provides recording, not only AI-based tools.
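For illustration, an explicit check on a displayed amount might look like the Python sketch below. The expected format (a dollar sign, comma thousands separators, exactly two decimals) is an assumption for the example, not the format of any specific platform. A check like this fails loudly when the sign or the decimals go missing, which is exactly the kind of change that should not be healed away.

```python
import re

# Hypothetical expected format: "$", digits with comma thousands separators,
# a decimal point, and exactly two decimal digits.
AMOUNT_PATTERN = re.compile(r"\$\d{1,3}(,\d{3})*\.\d{2}")


def is_valid_amount(text):
    """True only if the whole string matches the expected money format."""
    return AMOUNT_PATTERN.fullmatch(text) is not None


for shown in ["$1,234.50", "1,234.50", "$1,234.5"]:
    print(shown, is_valid_amount(shown))
```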
Self-generated tests improve test coverage — but what about quality?
AI-based automation tools can generate tests in different ways. Some of them generate tests by collecting information on how real users use the software in production. This information can be extracted from logs, clickstream, or both. This probably works for software that uses the Business-to-Consumer model, where many users produce a lot of data for ML algorithms. In the Business-to-Business model, there are often fewer users, and it is harder to collect enough data to train ML models. There are also many applications and features that do not have “users” at all, for example, reports that are generated as a result of data processing algorithms and calculations. There is another big drawback of generating tests this way: the application or feature has to be in production already, and it has to have users, so what about new functionality? In other words, test generation based on application usage can only be used to add missing regression cases.
Another type of self-generated test is the kind produced by link crawlers. These tests check that every link in the app works. While these tests are useful, a working link doesn’t necessarily mean it’s the right link, or that it’s a functioning application.
Auto-generated tests typically cover only what can be easily tested. They find shallow bugs, such as buttons that do not work, wrong values, and broken links. They will never help you find a missed requirement, a logic flaw, or an error message that should have been shown to help the user overcome a problem, and they will not spot usability issues. Moreover, auto-generated tests might give you a false sense of security and allow major issues to escape to production after thousands of automated tests have passed. Such auto-generated tests are great as a supplement to QA efforts, but not as a full replacement.
Another concern is that marketing materials for these tools claim that one or another way of autonomous test generation will give you 100% coverage. This raises the question: 100% coverage of what? And, more importantly, do you even need to achieve this mystical 100%?
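The limitation of link crawling can be sketched in a few lines of Python. The site map below is a hypothetical in-memory stand-in for a real application, so no HTTP requests are made. The crawler reports links that point to pages that do not exist, and nothing more.

```python
from collections import deque

# Hypothetical site: each existing page maps to the links it contains.
SITE = {
    "/": ["/pricing", "/docs", "/about"],
    "/pricing": ["/", "/signup"],
    "/docs": ["/", "/docs/api"],
    "/about": ["/"],
    "/signup": [],
}


def find_broken_links(site, start="/"):
    """Breadth-first crawl; return (source, target) pairs whose target does not exist."""
    broken = []
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for link in site[page]:
            if link not in site:
                broken.append((page, link))
            elif link not in seen:
                seen.add(link)
                queue.append(link)
    return broken


print(find_broken_links(SITE))
```

If the pricing page mistakenly linked to "/about" instead of "/signup", this crawler would report nothing new, because the link still resolves. Catching that mistake requires a test that knows where the link is supposed to go.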
Usage of AI and ML testing tools is increasing daily. Almost every tool claims some AI-powered features that can help improve testing. The key word here, however, is “help”.
Test recording was introduced a long time ago, but introducing AI and ML hasn’t perfected it. These improvements save time, but automated test recording still requires human intervention, sometimes even for simple cases.
Fully autonomous test generation, although it sounds cool, is not mature enough that you can leave all your testing in its hands. While it can be useful to reveal gaps in coverage and easy-to-find bugs, you still can’t test complex systems using only autonomously generated tests, because these tests will never ask “what if?”
The same goes for self-healing tests. I may sound old-school, but the purpose of automation is to uncover changes that were accidentally introduced. If we look at self-healing from this perspective, it is more concerning than exciting, because it hides changes.
AI tools are most effective in areas where a lot of information has to be collected and quickly processed, such as visual testing, video/audio quality, log parsing, collecting real usage information, and performance testing. Visual testing was not covered much in this article only because it does not fit my particular use case, but there are many success stories across the industry. It is impossible for a human to detect every single visual change in a UI, so using software is justified and pays off. AI in this case helps to report only the differences that are perceptible to end users. These tools do not use simple pixel-to-pixel comparison; instead, they use a class of AI algorithms called computer vision, which helps to keep the signal-to-noise ratio high.
To achieve robust, stable, effective, lightweight, trustworthy, and easy-to-maintain automated tests, selecting the right tool is not enough; you must also invest in test design. Someone still needs to find out what should be tested, when, and how to do it in the most efficient way. The test design capabilities in today’s AI tools are very limited, almost non-existent. Mechanical clicking on everything clickable and extracting test scenarios from logs or clickstream can hardly count as a test design technique. With AI, it’s easy to automatically generate many tests that don’t bring any business value or improve quality. Because running thousands of tests is time- and resource-consuming, QA engineers must still identify the right balance between increasing coverage and speed of execution.
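To show why strict pixel-to-pixel comparison is noisy, here is a toy Python sketch. It is not computer vision and not what visual testing tools actually do; the tiny grayscale "images" and the tolerance value are made up for illustration. It only demonstrates that an exact comparison flags rendering noise that no user would notice, while a comparison with some tolerance keeps the real change.

```python
def changed_pixels(before, after, tolerance=0):
    """Count pixels whose grayscale value differs by more than `tolerance`."""
    return sum(
        1
        for row_before, row_after in zip(before, after)
        for b, a in zip(row_before, row_after)
        if abs(b - a) > tolerance
    )


# Hypothetical 2x3 grayscale screenshots (0 = black, 255 = white).
BASELINE = [[0, 0, 255], [0, 0, 255]]
RERENDER = [[0, 1, 254], [0, 0, 255]]       # tiny anti-aliasing noise
REAL_CHANGE = [[0, 0, 255], [255, 0, 255]]  # one pixel flipped from black to white

print(changed_pixels(BASELINE, RERENDER))
print(changed_pixels(BASELINE, RERENDER, tolerance=8))
print(changed_pixels(BASELINE, REAL_CHANGE, tolerance=8))
```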
Complex tasks require human intervention
Tests often require complex data setup, especially in software platforms or simply large applications, and none of the tools I’ve seen have features that help with this task. Data seeding, maintenance, and cleanup still have to be done with external tools or workarounds. I have found this to be sometimes the most challenging task when testing software platforms, solutions, and software that uses a business-to-business model.
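One common workaround is a small seeding helper that lives next to the tests. The Python sketch below uses an in-memory SQLite database as a stand-in for the application's data store. The table name, columns, and account names are hypothetical, and a real platform would more likely be seeded through its API or a dedicated data service.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def seeded_accounts(conn, accounts):
    """Insert test accounts, hand control to the test, then always clean up."""
    conn.executemany("INSERT INTO accounts (name, balance) VALUES (?, ?)", accounts)
    conn.commit()
    try:
        yield
    finally:
        # Cleanup runs even if the test body raises.
        names = [(name,) for name, _ in accounts]
        conn.executemany("DELETE FROM accounts WHERE name = ?", names)
        conn.commit()


def count_accounts(conn):
    """Number of rows currently in the accounts table."""
    return conn.execute("SELECT COUNT(*) FROM accounts").fetchone()[0]


# Hypothetical schema standing in for the system under test.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance REAL)")

with seeded_accounts(conn, [("acme", 100.0), ("globex", 0.0)]):
    print("during test:", count_accounts(conn))
print("after test:", count_accounts(conn))
```

Putting the cleanup in the `finally` block means a failing test does not leave stale accounts behind for the next run.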
Finally, to have effective automation, the software itself should be designed with testing in mind. No tool will be effective if the application is untestable.
To summarize, AI-based testing tools in their current state cannot fully replace QAs. They can be used in a supporting role and decrease the time spent on mechanical, boring tasks. They are great as a supplement to the QA team, to automate time-consuming activities, such as checking that every link in the application is working, or tasks that are impossible for a human, such as spotting every visual difference between the current and previous builds. AI-based tools can be used to collect data about users’ behavior to generate production-like test scenarios. They also provide the ability to generate tests from existing test documentation.
Existing AI-based testing tools are mainly aimed at web or mobile apps. Nevertheless, these tools continue to evolve and decrease human involvement in checking while at the same time increasing human productivity and giving testers more time to concentrate on testing.
Will we lose our jobs? Well, any process automation results in the disappearance of one or another job function. Do we need to adapt? Definitely! But we’ve already done that many times before.
AI tools comparison table
The table below compares the five AI-based test automation tools that are, in my opinion, the most popular and most promising. The “+” indicates that the tool has or claims to have a given feature or capability, but does not indicate the maturity of that feature. The information for TestCraft, Testim, and Mabl is based on hands-on experience and research. The information about Appvance and Functionize features is based on research only.