Deciphering the buzz behind AI pair programmers (part 3 of 4)

Vedant Agrawal
5 min read · Feb 10, 2023


This article is the third part of a 4-part series on AI pair programmers. The first and second articles can be found here and here, and they cover the history of AI pair programmers and how they work. If you’ve read the previous articles, welcome back!

The issues with AI pair programmers

The next few hundred words cover the most notable issues with AI pair programmers today. While these tools have received a tremendous amount of hype over the past 12 months, it’s important to appreciate that we’re still in the very nascent stages of what an AI pair programmer can do, and that comes with a ton of challenges. Let’s dive in!

1/ Quality:

So, AI pair programmers are supposed to output code based on a text prompt. But is it good, usable code? According to OpenAI’s paper, Codex (the LLM that drives Copilot) only gives the correct answer 29% of the time. It’s not surprising, then, that users claim that Copilot’s output is riddled with issues, some of the most common being the following (a short illustrative sketch follows the list):

a/ Sometimes doesn’t work or compile.

b/ Produces too much code (sometimes duplicate code), which makes it hard to manage or for a developer to read.

c/ Doesn’t check edge cases.

d/ Uses obsolete methods.

e/ Doesn’t check for security vulnerabilities.
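To make complaint c/ concrete, here is a hypothetical sketch (the prompt, function names and date format are invented for illustration, not taken from any real Copilot session): a suggestion that works on the happy path but ignores edge cases entirely, leaving the reviewer to harden it themselves.

```python
from datetime import datetime

# Hypothetical suggestion for the prompt "parse an ISO date string":
# it handles the happy path but ignores edge cases (complaint c/).
def parse_date(date_string):
    return datetime.strptime(date_string, "%Y-%m-%d").date()

# What the reviewing developer ends up writing instead, once None,
# empty strings and unexpected formats are accounted for:
def parse_date_safely(date_string):
    if not date_string:
        return None
    try:
        return datetime.strptime(date_string, "%Y-%m-%d").date()
    except ValueError:
        return None
```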

As a result, some users and managers fear that using this product would lead to developers (or their managers) spending a lot of time scrutinizing the code it outputs, which reduces the efficacy of the solution.

The reasons for these issues with Copilot can be boiled down to two main problems — (A) how LLMs work and (B) the training data used to train Copilot.

LLMs don’t really understand the meaning of the training data they ingest, so Copilot doesn’t really understand the code it has ingested or the outputs it produces. It also does not return what it ‘thinks’ is best; it returns what it has seen the most. In other words, it isn’t writing, it’s reproducing or copying. This has drawn criticism from users that Copilot is “just a tool that replaces googling for Stack Overflow answers”.

Compounding the problem that LLMs don’t really understand what they’re ‘learning’ is the fact that most of the code on GitHub was written by (mostly) average programmers and is, by software standards, pretty old. So the output from Copilot resembles that of a pretty average programmer who has not updated their skills. Also, because Copilot does not compile the code it suggests, the software doesn’t know when it’s spitting out incorrect output and leaves it to the user to check.

While there is less written about Amazon CodeWhisperer and Tabnine (Copilot got the most publicity and, correspondingly, the most criticism), CodeWhisperer tries to avoid this problem at least partly by allowing users to scan their code for quality and security issues, so that sub-par snippets suggested by CodeWhisperer don’t slip through unchecked.

2/ Doesn’t check the user’s assumptions:

While this might just be a marketing problem, it is a problem nonetheless. GitHub Copilot, Amazon CodeWhisperer and Tabnine are all touted as “AI pair programmers”. However, none of them really works as a pair programmer at all. A good pair programmer is someone who helps you question your assumptions, identify hidden problems, and see the bigger picture. These tools do none of that: they blindly assume your assumptions are appropriate and focus entirely on churning out code based on the immediate context around your text cursor. That means that if you give the wrong prompt or head down the wrong path (as an inexperienced programmer, or an experienced one working in an unfamiliar language, might do), the tool will keep you on that path, leading you to write sub-standard code.
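As a hypothetical sketch of what that looks like (the prompt and code below are invented for illustration, not taken from any real session): a developer asks for email validation via a flawed approach, and rather than questioning the premise, the tool simply completes it.

```python
# Hypothetical prompt: "validate an email address by splitting on '@'".
# A human pair programmer might push back and suggest proper validation;
# an AI pair programmer typically just completes the flawed idea.
def is_valid_email(address):
    parts = address.split("@")
    return len(parts) == 2 and bool(parts[0]) and bool(parts[1])

# The wrong assumption in the prompt is never questioned, so obviously
# malformed addresses slip through:
print(is_valid_email("not an email@ "))  # True, despite the spaces and missing domain
```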

3/ IP infringement:

This last issue plagues GitHub Copilot primarily, with CodeWhisperer and Tabnine finding solutions to circumvent or avoid the problem altogether.

GitHub has millions of lines of public, open-source code in its repositories. Typically, users are allowed to use that code however they wish, modifying it or using it as is. However, the code is released under open-source licenses like the MIT or GNU GPL licenses, which require the user of the licensed code to provide attribution (a copy of the license along with the name and copyright notice of the original author) and sometimes change the way the user can use the code (e.g. the GPL requires the user to extend the original code’s license to their modified code base as well). Without getting into the specifics here (see this article for detail), the point is that the user of licensed code has to follow certain rules when using it.
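As a rough illustration of what attribution looks like in practice (the project name, URL, author and year below are invented for the example), MIT-licensed code copied into your own file is typically accompanied by a notice along these lines:

```python
# Portions of this file are adapted from the (hypothetical) project "fastparse",
# https://github.com/example/fastparse, Copyright (c) 2019 Jane Doe.
# Licensed under the MIT License; a full copy of the license text is
# included in this repository as LICENSE-MIT.
```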

GitHub Copilot is trained on public code in GitHub repositories, but the output it provides the user fails to attribute the original author of the code, who is sometimes the author of a well-recognized computer science book or a professor of Computer Science. This is likely because GitHub did not include these licenses when training the model, though we don’t know for certain.

In late 2022, a group of engineers and lawyers sued OpenAI, Microsoft and GitHub for $9B in damages because the outputs did not comply with copyright laws. Their arguments are summarized as:

1. GitHub is not allowed to use code without adhering to the proper licenses.

2. Users of GitHub Copilot might use code with licenses they aren’t aware of, getting them in trouble.

3. Copilot’s output might mix code with conflicting licenses (e.g., the MIT and GPL licenses), making such code unusable.

GitHub has defended against these claims, but its argument hinges on the idea that, under global copyright laws, it does not need consent to use code from public GitHub repositories to train its language models; it does not address attribution of that training data in the final outputs.

While the lawsuit will play out over the next few months, technology leaders might choose to be careful about allowing Copilot to be used in their organizations, lest they use code that unintentionally gets them in trouble.

Amazon CodeWhisperer, on the other hand, tells the user if the output given by the tool has an associated license. The user can hover over a piece of code suggested by CodeWhisperer and be informed of the type of license (e.g. MIT, GPL, etc.). The developer can then either include the code and provide the relevant attribution, or choose not to accept it. Tabnine takes a different approach from both and has only trained its model on code under “permissive” licenses, which are less restrictive and mostly only ask users to include a copyright notice and a copy of the license text.

If you’ve reached this point of the series, congratulations! I know this has been a lot. We’ll wrap up with the last article of this series: “Will Big Tech capture most of the value or can startups compete?”. See you there! Next article here.
