Building an AI Test Agent

Gunay Ozkan
4 min read · Jul 7, 2024


Imagine an AI agent capable of testing a software system at every level. This is the vision we’ll explore. Testing is a critical and time-consuming phase of software development. An AI-driven testing process could dramatically enhance software quality and save considerable effort.

Testing at all levels means unit tests, integration tests, functional tests, and more. For the uninitiated, here’s a brief overview:

  • Functional Tests: These are end-to-end tests of the entire system from a user’s perspective.
  • Unit Tests: These focus on the smallest testable parts of the code, such as functions or classes, using mock dependencies.
  • Integration Tests: These test multiple components together, like the backend of a web app.
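For concreteness, here is a minimal sketch of the unit-test level described above. The function under test and the price service are hypothetical; the point is that the dependency is replaced with a mock, so only the unit's own logic is exercised.

```python
from unittest.mock import Mock

# Hypothetical unit under test: computes an order total using an
# injected price-service dependency (injection makes mocking possible).
def order_total(item_ids, price_service):
    return sum(price_service.get_price(i) for i in item_ids)

# Unit test: the real price service is replaced with a mock, so the
# test verifies order_total in isolation.
def test_order_total_sums_prices():
    price_service = Mock()
    price_service.get_price.side_effect = {"a": 3, "b": 7}.get
    assert order_total(["a", "b"], price_service) == 10

test_order_total_sums_prices()
```

An integration test of the same behavior would instead wire in the real price service; a functional test would drive the whole app through its user interface.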

The AI test agent can be used at different phases of software development. It can generate tests before the code is written, after the code is written, or for a legacy system under maintenance. Writing tests means identifying test scenarios, generating test cases, creating input data, determining expected outputs, and writing the test code that verifies the system’s correctness.

The real challenge lies in understanding the expected behavior of the software. Many systems lack detailed specifications, or their documentation is outdated and incomplete. So, what can the AI rely on? Potential inputs include:

  1. Code itself
  2. Code comments
  3. Code change history
  4. Generated user interfaces when you run the app
  5. Existing docs, such as the initial idea, business concept, requirements, design decisions, trade-offs, design discussions, internal communication about the project, meeting notes, and drawings

Unfortunately, many of these sources are not trustworthy on their own. For example, you cannot assume the code itself is correct; if it is not, its incorrect behavior becomes the behavior you assume to be correct. Documents and code comments are usually out of date and rarely detailed enough to write a test from. What, then, can serve as the source of truth for testing?

Combining all the available sources can help the AI predict the system’s expected behavior, but human confirmation is still needed. The AI must communicate its understanding clearly so humans can validate or correct it. Let’s call this communicated information about the expected behavior the “specs”. Specs will likely be expressed in human language, with supporting visuals. Ideally, AI would generate all the test steps from those confirmed specs with minimal hallucination, since the specs would enable very clear prompts.
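To make the “specs” idea concrete, here is a minimal sketch in Python. The spec format, the entries, and the function under test are all hypothetical; the point is that expected behavior, once confirmed by a human, is captured in a structure clear enough for an AI (or even a plain tool) to turn into tests mechanically.

```python
# Hypothetical confirmed specs: human-readable expected behavior,
# each entry pairing an input with its expected output.
specs = [
    {"behavior": "empty cart has zero total", "input": [], "expected": 0},
    {"behavior": "total sums item prices", "input": [3, 7], "expected": 10},
]

def cart_total(prices):  # hypothetical system under test
    return sum(prices)

# Derive and run one test per confirmed spec entry.
def run_spec_tests(fn, specs):
    for spec in specs:
        actual = fn(spec["input"])
        assert actual == spec["expected"], spec["behavior"]
    return f"{len(specs)} spec-derived tests passed"

print(run_spec_tests(cart_total, specs))  # → 2 spec-derived tests passed
```

Real specs would of course be far richer than input/output pairs, but the unambiguity is what keeps hallucination low.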

The “specs” will resemble the high-level requirements, low-level requirements, use cases, or domain-specific languages of current software development processes. In the real world, well-written requirement documents, such as those in safety-critical programs, are usually huge and may not be the best communication medium between human and AI. They can take humans a very long time to write, and be too long, or impossible, for one person to comprehend fully. For the “specs” concept to succeed, we need AI to generate most of the specs itself, either from existing sources or by interacting with humans. We also need AI to present the specs back to humans, using summarization and visualization techniques and answering questions, so that a human can comprehend them all.

So far, we have assumed that the specs can be the mechanism enabling AI to develop the tests. The interesting point is that once the specs exist, AI can also implement the code itself. Instead of building a test-only AI agent, focusing on an AI-powered end-to-end software development environment built around the specs concept will bring much more value.

We can envision an integrated development environment (IDE) where architecture, multiple levels of specs, tests, code, and the build and test execution environment coexist. AI can guide each step, but a human will always be in the loop to confirm or correct the AI’s work, ensuring accurate architecture, specs, test steps, and code.

This approach will have many benefits:

  • AI, being aware of tests and architecture at the same time, can guide the design toward an architecture that enables simpler unit testing and mocking mechanisms
  • AI, being aware of tests and code at the same time, will improve code quality, with an impact similar to Test-Driven Development
  • AI will be able to maintain end-to-end traceability of every step, ensuring consistency and full test coverage while avoiding stale specs
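As an illustration of the traceability point, here is a minimal sketch, with hypothetical spec and test IDs, of the consistency checks such an environment could run automatically: every spec should be covered by at least one test, and every test should trace back to a live spec.

```python
# Hypothetical traceability data the IDE would maintain.
spec_ids = {"SPEC-1", "SPEC-2", "SPEC-3"}
test_to_spec = {"test_login": "SPEC-1", "test_logout": "SPEC-2"}

covered = set(test_to_spec.values())
uncovered = spec_ids - covered   # specs with no test yet
stale = covered - spec_ids       # tests tracing to removed (stale) specs

print(sorted(uncovered))  # → ['SPEC-3']
print(sorted(stale))      # → []
```

In the envisioned IDE, an uncovered spec would trigger test generation, and a stale link would trigger spec or test cleanup.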

So far, we have reached a point where the specs drive not just the tests but other aspects of software development as well, and I have suggested an end-to-end development environment to connect everything together. However, let’s keep our focus on testing, where several big challenges remain. We have only addressed the “specs”, which may be the hardest, but the others are hard too. Some of these challenges, especially for higher-level testing, include:

  • Specs may not be enough for AI to determine the test scenarios, and humans may need to define them to guide the AI.
  • Identifying the expected outputs is difficult, and humans may need to validate or correct AI suggestions.
  • External dependencies need well-defined APIs and protocol definitions to allow AI to include them in tests and code. This may require more human involvement.
  • Various types of software systems, such as web apps, gaming apps, hardware-in-the-loop systems, and flight control software, may require specialized IDEs, test setups, and AI models trained for each.
  • The overall computation required by the suggested AI-powered IDE may be too high, especially for complex systems. Development will need to be split into incremental steps, likely defined by humans. And once the system grows larger, maintaining consistency will require a huge amount of computation.

Conclusion

We started with the goal of an AI testing agent and focused largely on how to define what is to be tested. We also identified additional challenges to address, and we envisioned an end-to-end integrated development environment where AI and human intelligence work together. Until AI matches human intelligence, we will need such collaborative, interactive approaches to accelerate and simplify software development. As AI becomes more sophisticated and reliable, the less complex steps, such as the lower-level specs, may diminish, reducing human involvement and speeding up development.



Gunay Ozkan

Software engineering manager at Meta. Ex-Amazonian. 25+ years of experience on development of complex, high end software systems.