Software 2.1 — AI Is Coding Now: Why Test Mastery Is the Software Engineers New Job Security

13 min readMar 12, 2024

Picture this: You’ve just booted up GitHub Copilot or fired off a query to ChatGPT, ready to tackle the next item on your coding to-do list. Your experiences thus far might have ranged from marveling at the sheer brilliance of AI-driven suggestions to moments of frustration where the results fell short of expectations. Whether your journey has been smooth or a bit bumpy, one thing is becoming increasingly clear: the future of software development is undeniably intertwined with artificial intelligence.

Let’s look at some intriguing figures that underscore this point. A study highlighted that developers utilizing GitHub Copilot found themselves a whopping 55% more productive on certain tasks. Cast your mind back to just a year ago, and already, a staggering 40% of the code checked in was AI-generated and unaltered.
(See https://www.microsoft.com/en-us/Investor/events/FY-2023/Morgan-Stanley-TMT-Conference)

Yet, when we dig deeper into the specifics, such as the University of Chicago’s study “Can Language Models Resolve Real-World GitHub Issues?” revealing a mere 4.8% issue resolution rate by a specialized AI model, the initial dazzle dims significantly. This stark contrast unveils the invaluable role developers continue to play. The journey from traditional code snippet generation to the SWE-bench’s complex, multi-file editing challenges underlines the sophisticated understanding and nuanced decision-making still squarely in the human court.
(See https://www.swebench.com/#)

SWE-bench sources task instances from real-world Python repositories by connecting GitHub issues to merged pull request solutions that resolve related tests. Provided with the issue text and a codebase snapshot, models generate a patch that is evaluated against real tests.

Crucially, the use of unit tests as a benchmark for AI effectiveness in bug fixes offers a glimpse into a future where AI assists more subtly but significantly. This landscape where AI augments rather than replaces human expertise suggests a future rich with potential and fraught with challenges.

I’m Sam, and I’ve been coding since the age of 12, with a trusty Commodore 16 and its feeble cassette tape storage by my side. Now, I bring a wealth of experience with me after having professionally worked in the industry for almost 20 years. So I hope to provide a very grounded view of AI driven software development to the table. One that goes beyond the superficial excitement of AI writing a few lines of code. My passion for coding has not only shaped my career but has also instilled a lifelong love for the craft. So… as we are looking at this impending AI revolution in software development, I find myself caught between excitement and apprehension. What will AI mean for our profession? How will it transform the way we work, collaborate, and create?

Here are my thoughts about what’s about to unfold.

Software 2.0

But before we go deeper into AI in software development, let’s take a moment to look at pivotal concept that is deeply insightful: Software 2.0. This term, coined by Andrej Karpathy, who co-founded OpenAI and served as the director of artificial intelligence at Tesla a little while back, introduced this new paradigm in programming back in 2017! Software 2.0 isn’t just a buzzword; it represents a seismic shift from traditional programming methods to a future where software development is increasingly produced by AI. Understanding this transition is critical or at least fascinating to understand, trust me!
https://karpathy.medium.com/software-2-0-a64152b37c35

In Software 1.0, programmers manually craft every line of code, targeting specific behaviors within a program. These lines of code, written in familiar languages like Python or C++, define clear instructions for computers to execute.

Contrastingly, Software 2.0 is about programming in a language that’s not immediately human-friendly: the language of neural network weights and biases. This approach doesn’t involve direct coding by humans due to the sheer volume of parameters involved — imagine trying to manually adjust millions of weights! Instead, the focus shifts to setting a goal for what the program should achieve, such as replicating a dataset of input-output pairs or winning a complex game. Machine learning then takes the helm, exploring a multitude of solutions within this framework to meet the set objectives.

Now with that knowledge I want to highlight another way of looking at that separation: Distinguishing between Software 2.0 and coding with Large Language Models (LLMs). It is key to understanding the future landscape of AI in software development. Software 2.0, as envisioned by Andrej Karpathy, operates on a ‘Probabilistic/ML’ paradigm, where the artifacts of programming are the weights and biases within a model. This method leverages machine learning to adjust these parameters, achieving desired outcomes without traditional code. On the other hand, LLM-based coding falls under a ‘Deterministic/Code’ model, where the artifact is the actual code base itself. Here, AI assists in generating code that is directly understandable and usable by developers, maintaining a more traditional approach to software creation and slot’s in easily in the existing tool chain. This clear distinction sets the stage for exploring how each approach uniquely contributes to the evolution of software development, offering distinct advantages and challenges in the quest to integrate AI more deeply into our coding practices.

Here are a few examples to further illustrate the difference:

Examples: Deterministic / Code

Web Application Backend: Developing server-side logic, RESTful APIs, and database interactions for web applications.
Mobile App Development: Building mobile applications with native or cross-platform frameworks for iOS and Android.
Embedded Systems: Programming firmware and low-level software for devices like microcontrollers and IoT devices.
Networking Software: Implementing protocols, network configuration tools, or network security software.
Financial Systems: Developing secure, high-performance systems for banking, trading.
E-commerce Platforms: Creating online storefronts, shopping carts, and payment processing systems.
Software Development Tools: Designing IDEs, compilers, debuggers, and other tools that facilitate software development.

Examples: Probabilistic / ML Model

Autonomous Vehicles: Developing algorithms for perception, decision-making, and control in self-driving cars.
Robotics: Programming robots with the ability to learn and adapt to tasks through reinforcement learning.
Anomaly Detection in Network Security: Developing models to identify unusual network traffic patterns indicative of security threats.
Natural Language Processing (NLP): Building chatbots, language translation systems, or sentiment analysis models.
Personalization Engines: Designing recommendation systems for e-commerce, content streaming, or social media platforms.
Fraud Detection: Implementing models to detect unusual patterns and prevent fraud in transactions.
Healthcare Diagnostics: Building diagnostic tools that use machine learning to identify diseases from medical images.

The Convergence: Software 2.0 for All

The Deterministic and Probabilistic approaches, are using a very similar approach, yet produce different artifacts.

While Karpathy’s original vision of Software 2.0 was geared towards probabilistic problems, a fascinating realization emerges: there is likely a second variant of Software 2.0 that can be applied to more deterministic problems as well. This convergence of paradigms holds the potential to revolutionize the way we approach both probabilistic and deterministic challenges in software development.

For deterministic problems, where we require a precise output, fine-grained control, low latency, high efficiency, clear understandability, and simplicity, the traditional Software 2.0 approach hasn’t gained traction yet. However, by introducing a few key modifications, we can adapt the Software 2.0 paradigm to cater to these deterministic needs.

The crux of this convergence lies in the shared underlying principles that govern both probabilistic and deterministic problem-solving. At their core, both approaches assert a goal for what constitutes “good” — whether it’s using vast amounts of training data for probabilistic problems or defined test cases for deterministic ones. Furthermore, both define a target behavior: probabilistic problems leverage labeled training data, while deterministic problems execute test cases to confirm the desired behavior is achieved.

The true power of Software 2.0 lies in the fact that, once these goals and target behaviors are defined, the AI takes over, exploring a vast solution space to find the optimal outcome. The key distinguisher, as mentioned before, is the nature of the output: probabilistic problems produce a model with weights and biases, while deterministic problems generate code.

This convergence presents a tantalizing opportunity to harness the power of AI in a unified manner. By leveraging the strengths of Software 2.0 and adapting it to cater to both paradigms, we can unlock new frontiers in software development, where human ingenuity is amplified by the capabilities of AI, driving innovation and efficiency across the entire spectrum of coding challenges.

The Rise of Test-Driven Development in the AI Era

As we embrace the convergence of Software 2.0 and its application to both probabilistic and deterministic problems, a profound shift in our approach to software development becomes evident: the increasing criticality of robust testing. In this new paradigm, where AI takes the helm in iterating and refining code, the role of comprehensive tests becomes paramount.

Envision a future where you begin by defining requirements and creating the necessary endpoints. Instead of diving headfirst into coding, you’ll focus on crafting a robust test suite that asserts the desired behavior of these endpoints (Yes TDD friends you’ve got a real leg up here 😄). This test suite, expressing the system requirements, will then be handed over to the AI coding assistant, which will embark on an iterative journey of coding, testing, and bug-fixing until either all tests pass with flying colors or it reaches an impasse, at which point your invaluable coding skills will be called upon.

Once the AI has successfully navigated the test suite and produced code that meets all the defined criteria, you’ll have the opportunity to review its work. You may find that the AI has taken shortcuts, prompting you to introduce additional tests and reinitiate the process. Alternatively, you might discover that the code organization leaves room for improvement, necessitating the provision of better guidelines to align the AI’s output with your preferred coding standards.

In this new era, various types of tests will prove invaluable. Behavior-Driven Development (BDD) will play a crucial role in defining and validating the high-level behavior of the system, ensuring that the broader requirements are met. However, unit tests will remain indispensable for capturing more granular and detailed requirements. Additionally, performance tests may be necessary to ensure that the AI-generated code meets the desired efficiency and speed benchmarks. (Just to name a few)

While this shift towards a test-driven approach introduces new challenges, it also inherits long-standing issues in software development. The need to abstract external systems, such as databases, I/O operations, and API interactions, will receive renewed attention, as these components must be seamlessly integrated into the testing framework.

The complexity of good testing as highlighted by Toby Clemson in 2014 https://martinfowler.com/articles/microservice-testing

Ultimately, the increasing prominence of robust testing is a natural and necessary evolution in the age of AI-assisted coding.

By embracing a test-first mindset and honing our skills in crafting comprehensive test scenarios, we not only provide clear guidance to the AI but also lay the foundation for more reliable, maintainable, and efficient software systems.

And the beauty is, that this paradigm shift aligns seamlessly with the findings of the DORA research program, which has consistently demonstrated the benefits of test automation, CI/CD, test data management, and related practices. Lower change failure rates, faster recovery times, shorter lead times for changes, and improved deployment frequencies are just a few of the advantages that organizations can reap by adopting a robust testing culture — advantages that become even more pronounced when AI is introduced into the development workflow.

The AI’s Expanding Role: From Code to Tests

While the role of AI in generating code is undoubtedly profound, we must also consider its potential to assist in the creation of tests themselves. As AI capabilities continue to evolve, we can expect to see these systems not only generate code but also contribute to the development of comprehensive test suites. This prospect is both intriguing and exciting, as it could further streamline the development process and enhance the efficiency of the test-driven approach.

However, it is crucial to recognize that the most critical responsibility for software engineers will always be to understand and assert the requirements accurately. While AI may aid in the construction of tests, the ultimate responsibility for defining the desired behavior and ensuring that the tests accurately capture the requirements will remain firmly in the hands of human software engineer. This task requires a deep understanding of the problem domain, clear communication with stakeholders, and the ability to translate complex requirements into testable scenarios.

Towards Autonomous AI Coding: The Next Frontier

As we look ahead, the tantalizing or potentially scary prospect of AI systems capable of handling the entirety of the coding process looms on the horizon. However, realizing this vision will require the development of sophisticated AI agents that can seamlessly integrate with and navigate the intricate landscape of software development. Here is an educated guess, what this may look like.

Let me just coin the term Software 2.1™ right here and now. 😆

Envisioning such a set of AI agents, we would need a system that can access and manipulate individual files within a codebase, comprehend and adhere to coding guidelines and best practices, and seamlessly integrate with the linting and build processes, interpreting and addressing any error messages that arise. Furthermore, these agents must be able to execute the test suite, evaluate its success, and access error details when necessary, enabling a continuous cycle of refinement and improvement.

While this level of AI integration into the coding workflow is still largely in the research phase, the potential rewards are immense and will drive significant investment and innovation in this domain. Projects like AutoGPT, MetaGPT, and GPT Engineer are already laying valuable groundwork, while hugely popular frameworks such as LangChain (who just raised $25 million in a series A round, led by Sequoia Capital) are providing crucial building blocks for the development of these advanced AI agents.

As the pursuit of autonomous AI coding continues to gain momentum, it is only a matter of time before the major players in the tech industry, such as OpenAI, Microsoft, Google, JetBrains, or even Stack Exchange, enter the fray — in fact they have most likely already begun exploring this frontier behind closed doors. The implications of such advancements are profound, promising to reshape the very nature of software development and the role of human developers within it.

Yet, as we approach this next frontier, we must remain cognizant of the fact that true autonomy in AI coding will require not only technical prowess but also a deep understanding of the problem domain, clear communication with stakeholders, and the ability to translate complex requirements into testable scenarios — skills that will continue to be the hallmark of skilled human software engineers for some time. 🤞

(Re)defining the Software Engineer: Beyond Coding

As AI coding assistants continue to evolve and take on more responsibilities within the development cycle, it is natural to ponder the future role of software engineers. Contrary to the notion that coding is the primary focus of a developer’s job, we know that our responsibilities already extend far beyond writing lines of code. Requirements analysis, system design and architecture, deployment and integration, maintenance and updates, code reviews and mentoring, collaboration and communication, continuous learning, project management, security and compliance, and documentation are all crucial facets of a software engineer’s role — many of which will also be augmented by AI systems in the future.

Perhaps the most apt definition of a software engineer’s role is to “bridge user and technology needs through the art of code and design.”

While the tools at their disposal may be radically different, this core responsibility remains a source of pride and fulfillment. As developers adopt and master these AI-powered tools, they will likely experience a significant boost in efficiency and potentially even a boost in satisfaction, as the boundaries between coding, design, and user needs become more seamlessly integrated.

Interestingly, the rise of AI coding assistants may actually alleviate some of the communication challenges that have historically plagued software development teams. With a greater emphasis on declarative approaches via the language of requirements and goals expressed through tests becoming more prevalent, the dialogue between engineers, business analysts, product managers, and delivery teams may become more streamlined and accessible to all stakeholders. Instead of grappling with the intricacies of implementation details, discussions can focus on the desired outcomes and goals, fostering a more seamless and collaborative development process.

However, one thing is clear: as AI takes on more coding responsibilities, the importance of asserting and testing requirements will become increasingly paramount.

Software engineers will need to hone their skills in crafting comprehensive test scenarios, ensuring that the AI coding assistants have clear and accurate guidance to work with. This shift towards a test-driven mindset will not only be essential for ensuring the quality and reliability of AI-generated code but will also align with established best practices that have proven beneficial even in traditional development environments.

If you haven’t at least started to practice TDD (Test Driven Development), there is no better time then now!

The Bittersweet Transition: Embracing the AI-Augmented Future

As I reflect on the impending changes that AI coding assistants will bring to our profession, I can’t help but feel a bittersweet sense of nostalgia. The thrill of solving intricate coding problems, the exhilaration of overcoming seemingly insurmountable challenges through sheer ingenuity and perseverance — these are experiences that have defined my career and fueled my passion for software development. Letting go of this craft, which has been so deeply intertwined with my identity, is undoubtedly a difficult prospect.

However, just as my transition into management roles exposed me to the immense rewards of nurturing and guiding junior engineers, as well as tackling complex challenges with newfound efficiency by building excellent engineering teams, I remain hopeful that the integration of AI tools will offer similar fulfillment. The ability to review and refine the AI’s code, to nudge it towards more optimal solutions through targeted tests and benchmarks, could prove to be a deeply satisfying endeavor — a delicate dance between human ingenuity and artificial intelligence.

Envisioning a scenario where I craft a performance test stipulating stringent benchmarks, and then witnessing the AI’s innovative approach to meeting those criteria, fills me with a sense of wonder and curiosity. It is a tantalizing glimpse into a future where our roles evolve, but our passion for problem-solving and our pursuit of excellence remain undiminished.

As there is a good amount of projection in my writing I’m genuinely curious how you see this all playout? Am I being too optimistic or pessimistic? Do you think code will die out entirely, and natural language is all we need for coding? What other trends do you foresee shaping our profession in the coming years?

Update 14/03/2024

Just 2 days after publishing this article, Cognition Labs published “Devin” a system (most likely based on GPT-4) that managed to score 13.86% in the SWE-Bench compared to the 1.96% GPT-4 got unassisted. Super impressive and further proof of the rapid development in this space.
See here for more details: https://www.cognition-labs.com/blog