Is AI-Driven Development the future of software engineering?

Brandon Lim
Published in Layhill L. Tech
12 min read · Sep 13, 2023

Note: This essay was written in collaboration with my colleagues, Malte Hartwig and Thomas Wilhelm. Another colleague, Andrew Leong, edited it.


It has been an exhilarating year for technology. The world saw how far we have come with artificial intelligence (AI).

In its infancy, AI was restricted in its capabilities by our insufficient understanding and limited computing power. Even as its capabilities evolved, it remained highly technical and out of reach for the average consumer. Even when deployed in consumer devices, it ran hidden in the background, handling tasks such as sorting images or building photo albums.

Then, OpenAI released ChatGPT and DALL-E in 2022, alongside alternatives such as Midjourney and ChatSonic. These AIs, known as Generative AIs, are distinguished by their ability to take natural language descriptions and create new content from them. They are easily accessible via the web, ushering in a new era of changes to how we work and think. In time, the use of these AIs will require regulations to govern their ethical use.

Opinions regarding Generative AIs are split between excitement and fear. Some individuals are excited about their transformative potential, while others fear that these tools may displace them from their jobs.

It is understandable and natural for people to express such sentiments. Generative AIs can create impressive and convincing outputs when given the right prompt or instructions, and they can do so in a matter of minutes, if not seconds. However, it is important to take a pragmatic approach and look at how we can incorporate Generative AIs into our workflows to enhance productivity and efficiency.

But first, what are generative AIs?

Generative AIs refer to a subset of artificial intelligence that utilises input data to generate original content, ranging from music and images to text.

When we asked ChatGPT “What is Generative AI”, this is the response we obtained:

Generative AI is a subfield of artificial intelligence that involves training models to generate new data that resembles a particular type of data. This type of AI is capable of creating original content, such as images, music, and text, by learning the patterns and characteristics of existing data and using that knowledge to create new examples. Generative AIs have many applications, including content creation, image and video synthesis, and data augmentation.

So, what can Generative AIs (e.g. ChatGPT) do for us software engineers?

This is undoubtedly one of the most asked questions these days, not only by engineers but also by engineering managers and leads. While the answer may seem as straightforward as “writing code”, there is one truth about software engineering we need to bear in mind:

Software engineering encompasses a range of activities, from requirements analysis and design to coding, testing and project management.

So, let us take you on a software engineering journey powered by ChatGPT as we investigate the following areas:

  1. Writing epics and user stories
  2. Developing software
  3. Answering our daily work questions
  4. Testing and test data
  5. Documentation

Writing Epics and Stories

Modern software projects typically define their requirements using epics, followed by stories and tasks using software tools such as JIRA. These stories and tasks are usually created by the product owner in cooperation with the team. While software engineers are usually proficient in writing code, they often struggle with writing acceptance criteria in prose from a user’s perspective.

With that said, we think ChatGPT can help us write completely new user stories or rewrite our “is that even English?” stories. This is a great benefit, as it lets engineers spend less time on this infamous chore. In addition, letting ChatGPT write our user stories forces us to read and validate what it wrote, which can lead to further questions and thoughts.

We asked ChatGPT to write a JIRA story for a notification system that takes into account daylight saving. Here is the output:

The response is pretty good, isn’t it? It convinced us that it was written by a product owner.

Your always-available Pair Programming partner

Pair Programming is a great technique to deliver quality software but it requires a combination of factors to make it work:

  • Strong social and empathic skills from both engineers
  • Availability of a partner when needed
  • Supportive leadership that recognises the benefits of pair programming, instead of perceiving it as a waste of resources because two engineers are assigned to a single task

We have been in situations where one or more of these factors were missing, making the whole pair programming effort unproductive, frustrating, and perhaps even making us question the meaning of life.

Since ChatGPT is able to generate high quality responses, we asked ourselves: Why not use it for pair programming?

In this scenario, ChatGPT assumed the role of the driver while we took on the role of the navigator.

.

.

.

The result was stunning.

We had the “Hitchhiker’s Guide” moment when we realised our main problem had shifted from “What is the answer?” (“42”, or in our case the code) to “What is the right question to ask?” (“What is the meaning of life?”, or in our case, “Write a Spring Boot controller that parses an Excel file”).

Although the code generated by ChatGPT was functionally correct, it was far from what we had in mind. One major issue was that our brains held many implicit assumptions that were not clearly articulated in our initial prompt. As a result, we had to add more functional and non-functional requirements to our prompt to enable ChatGPT to generate code that truly met our needs. This was where ChatGPT differentiated itself from a normal Google search: its ability to understand the context in which the question was asked.

During a chat session, ChatGPT takes the chat history into consideration when generating its responses. This allows for an iterative process where the initial answer or solution can be refined step by step. With each iteration, we either improved the functionality of our feature or refactored the code to make it more readable. It was fascinating to see how ChatGPT interpreted and implemented our instructions. Without the need to write syntactically correct code, we could focus on the “what” rather than the “how”.
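
For illustration, here is a rough sketch of the kind of endpoint we were steering ChatGPT towards. It is sketched here in TypeScript with Express, multer and the xlsx package rather than Spring Boot, and the route, field name and response shape are our own assumptions, not the code ChatGPT actually produced.

```typescript
import express from "express";
import multer from "multer";
import * as XLSX from "xlsx";

// Keep uploads in memory so the buffer can go straight to the spreadsheet parser.
const upload = multer({ storage: multer.memoryStorage() });
const app = express();

// Hypothetical route: upload a spreadsheet, get its rows back as JSON.
app.post("/import", upload.single("file"), (req, res) => {
  if (!req.file) {
    return res.status(400).json({ error: "No file uploaded" });
  }

  // Parse the workbook from the in-memory buffer and take the first sheet.
  const workbook = XLSX.read(req.file.buffer, { type: "buffer" });
  const sheet = workbook.Sheets[workbook.SheetNames[0]];

  // Convert the rows into objects keyed by the header row.
  const rows = XLSX.utils.sheet_to_json(sheet);
  res.json({ count: rows.length, rows });
});

app.listen(3000);
```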

But, this was just the beginning.

Information condenser and explainer

During the software development process, there are moments when we wonder how a framework works, how to use certain library features, or simply need a quick explanation of a concept. Generally, we have to pore through documentation, much of it poorly written, and try out the provided examples to understand what is going on.

One of the things ChatGPT can do for us is summarise content and give us a good explanation of a topic based on our question.

Let us use the MongooseJS documentation as an example.

If we look at the API documentation for the query function, we see that it focuses only on what the function is and how to use it. Someone new to the library can easily follow it and get started.

But what if they want to understand more? What if they have a very specific question such as: What happens when you await on a Query?

Mongoose’s official website provides a collection of guides that can help us with these questions. Using the above question as an example, we can find out what happens when we await on a Query here.

Alternatively, you can use ChatGPT to provide you with the answer as shown below.

ChatGPT explaining await on Mongoose query with additional context

Subjectively, both sources provide the same amount of information and lead us to the same conclusion: await executes the query. The difference lies in the additional context and explanation provided by ChatGPT, which can improve our overall understanding of promises.
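
To show that behaviour in code, here is a minimal sketch using a hypothetical User model: findOne() returns a Query object, and it is the await (the Query being “thenable”) that actually sends the operation to MongoDB.

```typescript
import { Schema, model } from "mongoose";

// Hypothetical model purely for illustration.
// (Assumes a mongoose.connect(...) call elsewhere in the application.)
const User = model("User", new Schema({ name: String, age: Number }));

async function findAnAdult() {
  // findOne() builds and returns a Query; nothing has hit the database yet.
  const query = User.findOne({ age: { $gte: 18 } });

  // A Query is "thenable", so awaiting it executes it.
  // `doc` is the resolved document (or null), not the Query itself.
  const doc = await query;
  return doc;
}
```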

Now, if we scale this up to asking questions about the different libraries used in a project, ChatGPT becomes an obvious choice for engineers seeking answers, instead of searching for information on Google.

Test-Driven Development

High-quality software has decent test coverage, so we let ChatGPT write some tests for our feature. With this approach, we can spend the time saved thinking about which test scenarios are needed and how the tests should look.

The opposite could work too.

Riding on the euphoria from the unit test experience, we asked ChatGPT to implement a solution using a test-driven development (TDD) approach, the way professional software engineering is done.

And…

The experiment failed.

Pair programming with ChatGPT using TDD

As seen in the example above, ChatGPT replicated the typical software engineer’s behaviour: coming up with the solution before writing the tests. Also, when we tried a more iterative approach, ChatGPT came up with the final solution very quickly, ignoring the red-green-refactor cycle.

The test-driven approach might have failed, but we refused to give up. We tried writing only the unit tests and letting ChatGPT create the production code. We started with a straightforward task: creating a converter that turns Roman numerals into integers. The result was great, with all tests passing, and the code was quite good, close to what an average engineer would produce.
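
To make this concrete, here is a sketch along those lines using Node’s built-in test runner: the tests are the kind we wrote by hand, and the converter below is representative of, not identical to, what ChatGPT produced.

```typescript
import { describe, it } from "node:test";
import assert from "node:assert/strict";

// Symbol values used by the converter.
const VALUES: Record<string, number> = {
  I: 1, V: 5, X: 10, L: 50, C: 100, D: 500, M: 1000,
};

// Representative of the production code ChatGPT generated against our tests.
function romanToInt(roman: string): number {
  let total = 0;
  for (let i = 0; i < roman.length; i++) {
    const value = VALUES[roman[i]];
    const next = VALUES[roman[i + 1]] ?? 0;
    // Subtract when a smaller symbol precedes a larger one (e.g. IV = 4).
    total += value < next ? -value : value;
  }
  return total;
}

// The kind of unit tests we wrote by hand and handed over as the specification.
describe("romanToInt", () => {
  it("converts single symbols", () => {
    assert.equal(romanToInt("I"), 1);
    assert.equal(romanToInt("X"), 10);
  });

  it("handles subtractive notation", () => {
    assert.equal(romanToInt("IV"), 4);
    assert.equal(romanToInt("XC"), 90);
  });

  it("converts longer numerals", () => {
    assert.equal(romanToInt("MCMXCIV"), 1994);
  });
});
```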

On the other hand, do we really care about the code quality if everything is working fine?

Here is a thought experiment…

We let ChatGPT create whatever code it likes, good or crappy, as long as all of our tests pass. If a bug occurs, we write new tests and then let ChatGPT re-create the code. Once all tests pass, the bug is fixed.

However, for more complex scenarios, such as multi-threading, the above approach will probably fail. In such cases, we could still leverage ChatGPT to help us understand and improve the buggy parts of the code with a Refactor-By-Demand approach.

It sounds wrong to us. But, it might be our fear of losing our jobs that makes this a daring thesis. Our experience with developing applications tells us that software engineering is much more than simply writing code on a green field.

Test Data Generation

Test1, the infamous 42-year-old with the email user@test.com, is a well-known figure among developers. He often makes appearances in our unit tests and development databases but is a poor substitute for real data.

And therein lies the problem. When testing with unrealistic and constantly repetitive data, the question arises: is your test really worth it? How will your stakeholders or even customers react to your demo if it’s based on poor quality data?

The reason we use poor test data is very simple: Generating a high-quality test data set takes time and creativity… until now.

Yep, you can already guess it: ChatGPT. It can support us in many ways.

In the example below, we asked ChatGPT to create realistic names from 10 different countries, including names with special characters. Your customers, clients and stakeholders will thank you for a product that works in real-life scenarios, because realistic data was used for testing, improving its overall quality and usability.

Test data generated by ChatGPT
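
To give a flavour of what such a fixture might look like in code, here is a hypothetical example; these particular names and emails are our own illustrations, not the set ChatGPT actually generated.

```typescript
interface TestUser {
  name: string;
  country: string;
  email: string;
}

// Illustrative fixture only: names with diacritics and varied formats
// exercise encoding, sorting and layout far better than "Test1" ever will.
const testUsers: TestUser[] = [
  { name: "Søren Østergaard", country: "Denmark", email: "soren.ostergaard@example.com" },
  { name: "María-José Núñez", country: "Spain", email: "maria.jose.nunez@example.com" },
  { name: "Nguyễn Thị Hương", country: "Vietnam", email: "nguyen.thi.huong@example.com" },
  { name: "Jürgen Müller", country: "Germany", email: "juergen.mueller@example.com" },
  { name: "Zoë van der Meer", country: "Netherlands", email: "zoe.vandermeer@example.com" },
];
```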

Documentation

Santa Claus and documentation have a lot in common: everybody talks about them, but no one has really seen them. And, as much as we want to deny it, most engineers do not enjoy writing documentation. They would rather write code or delve into deeper technical challenges.

With ChatGPT, creating documentation has become more accessible and efficient than ever before. In fact, some individuals have even used it to write academic papers. As such, ChatGPT can be an excellent tool for generating clear and concise documentation, regardless of whether it is user or technical documentation.

Let us use ChatGPT to write an Architecture Decision Log (ADL): why are we using micro-frontends with React and Module Federation instead of iFrames?

We give the following command to ChatGPT:

Can you generate an Architecture Decision Log on why we should use Microfrontend with React and Module Federation instead of using iFrames?

The AI generated the following response, which we subsequently refined to better suit our project requirements:

Architecture Decision Log on Microfrontend with React and Module Federation instead of iFrames

The response is very good for an initial draft.

What about writing a user guide?

Let us use it to write a user guide on how to access logs in Grafana, using the following command:

Can you write a user guide on how to access logs on Grafana?

Below is the response generated by ChatGPT, which we consider reasonably good.

Awesome, right?

However, there are caveats

Generative AIs can change the way we work, but they are not silver bullets, and the generated content should not be used as is without fact-checking. Using an inaccurate statement generated by ChatGPT in official documents or publications could have legal implications.

Facts are not guaranteed

Built on top of the GPT-3 language model, ChatGPT can indeed generate high-quality text that is hard to distinguish from text written by a human. However, the correctness of the generated content depends heavily on the training data and on the probabilities the language model assigns to the next word in a sentence.

With that said, it lacks proper fact-checking abilities. A query comparing the heights of Shaquille O’Neal and Yao Ming revealed that flaw.

Shaq vs Yao Ming

Note: The above issue has been resolved in the months since this essay was written; it now serves purely as an illustration.

Therefore, the generated content should always be reviewed, and any necessary changes made, before it is published.

Solution quality is dependent on how and what questions were asked

As our pair programming session showed, ChatGPT has the potential to produce functionally correct and well-structured code.

However, as professional software engineers, we care about code that is not only functionally correct and well-structured, but also clean and thoroughly tested. ChatGPT will not discern the need for these qualities unless it is given the appropriate prompts or commands. Furthermore, the engineer must be able to judge what is lacking in a piece of code and what needs improvement. This takes practice, training and experience in writing good-quality code, which junior engineers may find challenging at the start.

It does not inherently train critical thinking skills

Humans are inherently lazy: if there is a way to get an answer or a solution to our problems with minimal effort, we will take it. With Generative AIs such as ChatGPT, it is highly probable that we will delegate all our thinking to them and simply copy-paste their answers without much critical thinking.

In the context of software engineering, we may end up asking the AI to generate architecture solutions or code without considering whether they make sense for a given problem. Junior engineers may never learn how to identify and solve issues by themselves because they can simply ask ChatGPT to come up with answers.

Although ChatGPT might be able to provide explanations of various tools, it is up to us to determine whether these tools are suitable for our needs. Skills such as planning, designing, envisioning and evaluating the final outcome are inherently human and cannot be replaced by any tool or technology.

Conclusion

Despite their limitations, Generative AIs are great tools. They can generate good-enough content given the right prompt, saving us time and energy and making us more productive and effective in our endeavours to build awesome products.

There will definitely be more improvements to these AIs in the future, resolving some of the pain points we mentioned in this article. Maybe one day they might even be able to intuit what we are asking for and give us the right answer or solution.

Until then, we will need to develop the skill of knowing what the right prompt is and/or adding specific information to a prompt to properly harness ChatGPT’s power.

We must also try to understand where our fears regarding these AIs come from, and then channel those feelings into something more productive, such as improving our critical thinking, creativity and interpersonal skills.

With that said, only by being pragmatic and embracing these AIs without prejudgment can we become better engineers for our customers and contribute to their success.

P.S.: Humans wrote this article
