AI for Software Testing

Software is eating the world, and testing is next in line. Software and test engineers have always wanted to automate everything, and we are about to turn over most test design and validation to Artificial Intelligence (AI). Instead of humans manually creating all the test automation, machines will write and execute test code, continually improving as they learn from human input. This mechanization of test coverage means every app team will soon have access to a virtual test team with more collective intelligence, speed, and scale than even the best-funded app teams of today.

Hand-crafted testing is incredibly expensive in both time and money. The smartest app teams and vendors hard-code thousands of lines of test code for their apps: line after line of ‘click this’, ‘check that’. This approach has many drawbacks, starting with paying engineers upfront to execute all these tests manually or turn them into test scripts. Creating these tests distracts developers from the core value they deliver: the product itself. Worse, these hand-crafted scripts require either a fleet of machines to run or many human hours to execute manually. All of this eats up precious time; days and sometimes weeks are spent completing a full test pass. That doesn’t square with today’s app teams, who seek to build and deploy their apps daily, or even continuously. Hand-crafted testing couldn’t maintain the pace if it tried.

Maintenance is the hidden cost in test automation. When the app changes, the test code must usually be updated as well, and most automation efforts quickly turn into pure maintenance with little additional coverage. AI bots, on the other hand, thrive on change. Because the bots aren’t hard-coded, they don’t break; they automatically discover the new features and paths through the product. When the AI finds a change, it evaluates it to determine whether it is likely a feature or a regression (bug).
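As a rough illustration of that feature-vs-regression judgment, a bot could compare each newly discovered change against previously labeled examples and take a majority vote among the most similar ones. Everything below — the attribute names, the labeled history, the similarity measure — is an invented sketch, not a description of any real product:

```python
# Hypothetical sketch: label an app change as 'feature' or 'bug' by
# majority vote among the k most similar previously labeled changes.
from collections import Counter


def classify_change(change, labeled_history, k=3):
    """Return the majority label of the k examples most similar to `change`."""
    def similarity(a, b):
        # Count shared attribute values (element type, kind of change, screen).
        return sum(1 for key in a if key in b and a[key] == b[key])

    ranked = sorted(labeled_history,
                    key=lambda ex: similarity(change, ex["change"]),
                    reverse=True)
    votes = Counter(ex["label"] for ex in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy labeled history a team (or many teams) might have accumulated.
history = [
    {"change": {"element": "button", "action": "added", "screen": "checkout"},
     "label": "feature"},
    {"change": {"element": "image", "action": "missing", "screen": "home"},
     "label": "bug"},
    {"change": {"element": "text", "action": "missing", "screen": "login"},
     "label": "bug"},
]

# A missing image on a new screen looks most like past 'bug' examples.
print(classify_change(
    {"element": "image", "action": "missing", "screen": "profile"}, history))
```

Real systems would learn a far richer similarity function, but the shape of the idea — pooled labels from many teams driving the feature/bug call — is the same.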

Hand-crafted testing doesn’t scale because tests are created one at a time: adding tests is a linear activity. Adding product functionality, however, can increase complexity exponentially as new features and states interact with older ones. Early in a project it appears testing can keep up with feature development, but the gap between coverage and functionality will always rear its head.

Testing can never keep up

Perhaps most frustrating about the current state of hand-crafted testing (manual or automated) is that it validates only the specific things you chose to validate, and nothing more. If a new ‘feature’ is added, the hand-crafted test automation will all run and still all pass, even if the new feature does not work. Only exploratory human testing will find these new breaks, but human testing time is most often spent re-executing the basic tests that still pass. They miss the tree for the forest.

An AI approach to checking for quality thrives on the very things that cause so much pain for hand-crafted testing. If we had a simple AI that knew how to walk through the app like an end user (and tester), recorded everything such as performance, and kept track of where every button and text box is located, it could generate and execute tens of thousands of test cases in a matter of minutes. What if you gave the machine thousands of examples of bugs, and examples of correct functionality? The bots could suggest where the app team should focus their efforts. And if these bots had already tested thousands of other apps, learning as they went, they would have an incredible amount of testing experience to help a test team make deployment decisions. The bots are beginning to sound very plausibly intelligent.
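The walk-through-the-app step can be pictured as a simple graph exploration: each screen exposes elements, each element leads somewhere, and every (screen, element) pair the bot discovers is a candidate test case. The toy app model below is an assumption for illustration only; a real bot would build this map by driving a live app:

```python
# Hypothetical sketch: breadth-first exploration of an app, recording
# every (screen, element) pair seen as a candidate test case.
from collections import deque

# Toy app model (an assumption): screen -> {element: destination screen}
APP = {
    "home": {"login_button": "login", "search_box": "results"},
    "login": {"submit_button": "home"},
    "results": {"item_link": "detail"},
    "detail": {},
}


def crawl(app, start="home"):
    """Explore the app from `start`, returning discovered (screen, element) pairs."""
    seen, cases, queue = {start}, [], deque([start])
    while queue:
        screen = queue.popleft()
        for element, destination in app[screen].items():
            cases.append((screen, element))  # one candidate test step
            if destination not in seen:
                seen.add(destination)
                queue.append(destination)
    return cases


print(crawl(APP))
```

Combining these discovered steps into sequences is what lets even a small app model explode into thousands of executable test cases.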

Perhaps most importantly, bots trained by great testers will be better than humans in many respects at executing tests. The AI will notice every single thing added to or removed from the app. Where hand-crafted automation misses changes in the latest app build, the AI bots will automatically click every new button added to the app and notice every image removed from it. The bots will examine all changes and rank their importance based on the collective intelligence of the app team, and of every other app team that has labeled similar changes as either ‘feature’ or ‘bug’.

But, wait! “My app is special, a generic AI wouldn’t be useful.” “I’m always going to be smarter than a bot!” “Sure, but how will the AI know what data to input into the app?” “How will these bots know if the app is functioning correctly?” And will these AI bots arrive soon? Let us walk through each of these reactions to such an AI-based testing system.

“My app is special, a generic AI wouldn’t be useful.”

If you are honest about it, your app sure does look a whole lot like a lot of other apps. If you deconstruct your app, like a wanna-be chef on Top Chef, you will notice that it has buttons, text boxes, images, and so on: the same ingredients as every other app. If an AI can be created that can analyze someone else’s app, it will very likely work pretty well on yours.

“Hey, I’m always going to be smarter than a bot!”

Doctors, loan processors, and financial advisors once thought they couldn’t be replaced by AI and automation. The truth is, you only know what you know: your app and test cases, a book or two, and maybe what you learned during a conference talk on testing. The AIs are tireless, with near-limitless memory; they are crazy fast and work in parallel. Are you really smarter and able to accomplish more than 100, 1,000, or 10,000 bots analyzing your app? Even if you are, wouldn’t you rather have bots do all the boring work so you can focus on the more creative and difficult testing problems instead?

“Sure, but how will the AI know what test data to input into the app?”

If you think about it, most apps accept the same data: names, email addresses, phone numbers, shopping searches, profile photos from the camera roll, and so on. Almost all of the input into your app is pretty common. Just a small set of data would be enough to build an impressive testing AI; at the very least, it will be good enough to be your assistant.
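To make that concrete, a shared pool of common inputs — including a few deliberately awkward ones — goes a long way. The field kinds and sample values below are illustrative assumptions, not anyone's actual test corpus:

```python
# Hypothetical sketch: a small shared pool of common app inputs,
# mixing typical values with edge cases (empty, invalid, emoji).
import random

COMMON_DATA = {
    "name": ["Ada Lovelace", "Jane Doe", "李雷"],
    "email": ["test@example.com", "a+b@example.org", "no-at-sign"],  # last one is invalid on purpose
    "phone": ["+1 555 0100", "5550100", ""],
    "search": ["shoes", "''; DROP TABLE--", "😀"],
}


def pick_input(field_kind, rng=random.Random(0)):
    """Pick a representative value for a field of the given kind."""
    return rng.choice(COMMON_DATA[field_kind])


# A bot filling out a signup form might do:
print(pick_input("name"), pick_input("email"), pick_input("phone"))
```

The same small pool works across thousands of apps precisely because the inputs are so common, which is the article's point.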

“OK, but how will these bots know if the app is functioning correctly?”

The most difficult question: how do you know the app is functioning correctly? The truth is, you never really do either. You have some tests, maybe 100 manual or automated test scripts, and they pass, but they only cover a small sliver of the possible state space of your app. You also find that your biggest value is gathering feedback and bugs from real-world users and fixing those as you go. What you really want to know in testing is ‘Does it work just like yesterday?’ If not, are the differences good or bad? That is what most testing really does, and AI bots are great at looking through thousands of points in your app and checking tens of thousands of things to make sure all is still working as it did yesterday. With a quick scan, a bot can tell you that 99% of your app is working exactly as it did last release, so you can focus on the 1% that has changed.
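The ‘just like yesterday’ check is, at its core, a diff between two snapshots of the app. A minimal sketch, assuming snapshots are simply dicts of element id to observed properties (an invented representation for illustration):

```python
# Hypothetical sketch: compare today's app snapshot against yesterday's
# and report only what changed, so humans review the 1% that differs.


def diff_snapshots(yesterday, today):
    """Return elements added, removed, or altered between two snapshots.

    Snapshots are dicts mapping element id -> observed properties.
    """
    added = {k: today[k] for k in today.keys() - yesterday.keys()}
    removed = {k: yesterday[k] for k in yesterday.keys() - today.keys()}
    changed = {k: (yesterday[k], today[k])
               for k in yesterday.keys() & today.keys()
               if yesterday[k] != today[k]}
    return {"added": added, "removed": removed, "changed": changed}


yesterday = {"login_button": {"text": "Log in"}, "logo": {"src": "logo.png"}}
today = {"login_button": {"text": "Sign in"}, "search_box": {"hint": "Search"}}
report = diff_snapshots(yesterday, today)
print(report)
```

Everything outside the three reported buckets is, by construction, working exactly as it did yesterday; the team only has to judge whether each listed difference is a feature or a bug.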

Will these AI bots arrive soon?

AI bots have already started learning and testing some of the biggest apps in the app store.

Jason Arbon, CEO Appdiff