[Image: a robot working on a computer, writing something sloppily]
DALL•E 2 made this in response to “a noir depiction of a robot writing code and creating a mess” (Credit: the combined labor of everyone on the internet since 1995, but probably a few specific artists whose identities DALL•E 2 doesn’t share or seem to care about.)

Large language models will change programming… a little

Amy J. Ko
Bits and Behavior

--

I do not like hype. Perhaps it’s the skeptic in me, always seeking to question a claim, to look for evidence, to examine closely before making a judgement. This stance on the world means that I’m often last to accept dramatic change and first to question it. There is nothing more blasphemous in computer science than that kind of skepticism, which has often put me on its margins.

And so over the past decade, as I’ve watched large language models (LLMs) mimic human communication through sheer probabilistic scale, I’ve watched with doubt. Doubt that they would ever overcome bias, doubt that they would achieve any form of intelligence, and doubt that they would be applied for anything other than profit. There was only one thing I was sure of: that once they emerged from my small scholarly corner of the world into the mainstream, hype would be the primary lens through which they were seen. And that’s basically what’s happened.

As LLMs made their way to programming through systems like GitHub CoPilot and ChatGPT, however, I began to see LLMs with more nuance. Programming is, for those who don’t do it, kind of a mess. Languages are strange historical notations we live with for decades, and the most popular ones are often the most haphazardly designed. APIs are often poorly designed and poorly documented. Nearly every effort to create something simpler only adds more complexity, as it tries to interoperate with a complex world. And so while I still think of LLMs as a threat to society at large, in the world of programming, they hold some promise of helping people sift through the complexity.

But as I’ve thought more about it, and played with them, my skepticism has returned. And so I’ve come to the following predictions.

First, LLMs will reduce complexity. And for somewhat obvious reasons: the world of open source shares a massive collection of solutions to common messy problems in programming languages and API usage, and so for the past two decades of the internet, the problem hasn’t been so much about solving those problems, but finding the solutions that someone has already written down. This was what enabled the explosive popularity of Stack Overflow and the decades of research on code search. LLM-driven code synthesis will be, and to an extent already is, a better search engine for finding those solutions (albeit without giving anyone credit). This future will be a slightly better one for programmers, just like better documentation and better search have always been. It will make it easier to find the useful design patterns amidst an infinite space of wrong answers.

Of course, it will not do it correctly. Because most of that code that it’s trained on? It’s bad. It’s full of security defects, design flaws, usability problems, accessibility problems. These models know nothing of these flaws, because they do not know anything about design or software engineering. And the popularity of code, as anyone knows from upvoted wrong answers on Stack Overflow, is not a reliable indicator of its correctness. And so the best these engines can do is offer a guess that helps a developer get a jump start on a problem or think of a new direction. But developers will still have to do everything they already have to do to ensure correctness: verify, integrate, refactor, redesign, rearchitect, etc., as the world changes. These models will be, at best, a helpful but fallible brainstorming tool for short segments of code.
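To make that concrete, here is a minimal, hypothetical sketch (not drawn from any real model output) of the kind of plausible-looking code an assistant might hand back for “find a user by name.” It runs, it mirrors countless upvoted snippets online, and it carries a classic security defect that only the developer’s own verification would catch. The `users` table and `db_path` are assumptions made purely for illustration.

```python
# A hypothetical snippet of the sort an LLM might suggest: it works on
# happy-path input, and it looks like thousands of examples online,
# but it has a textbook security flaw.
import sqlite3

def find_user_unsafe(db_path: str, name: str):
    conn = sqlite3.connect(db_path)
    # Building SQL by string interpolation: any name containing a quote
    # becomes a SQL injection vector.
    query = f"SELECT id, name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()

def find_user_safe(db_path: str, name: str):
    conn = sqlite3.connect(db_path)
    # The parameterized form the developer still has to know to ask for.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()
```

Nothing about the unsafe version announces that it is unsafe; distinguishing the two is exactly the verification work that remains with the developer.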

Of course, that is the best case scenario. Some developers are going to use these tools and trust the code they get. They’re going to put it into production code. And there will be many stories of that code causing problems. But you probably won’t hear them, because tracing a failure back to the particular code that caused it is hard, and society still hasn’t decided to hold developers accountable for the code they write. And so most people will never even notice that bad code is leaking into the world, regurgitated by these stochastic parrots. They will just experience the usual failures that software always has, just for slightly different reasons, and at a slightly higher pace.

And then there are learners, including the students I teach, the youth who are first encountering code, the teachers who are learning to code to teach it. One might imagine that (currently) free and easy program synthesis would be a great boon to students and teachers who are stuck, allowing them to create things they couldn’t before, and overcome challenges with greater ease. But that would be the hype talking. The reality is that while writing code is hard, the harder part for students (and really anyone) is understanding how it executes and then making decisions about what it should do differently. Program comprehension is what makes APIs hard to use (because they intentionally hide what they do, capturing behavior only through poorly written natural language). It’s what makes programming languages hard to use (because debugging tools are so poor at slowing down enough to teach). It’s what makes large software systems hard to change and evolve (because of the sheer amount of code to understand). LLMs do nothing to make this comprehension or decision making easier.

And in an ironic way, having something else write code for you only makes program comprehension harder. Every developer already knows this: if you write code yourself, you’re far more likely to understand its behavior than if someone else wrote it. This is even true of code you wrote yourself, but a long time ago. And so getting some code that no one understands, because it was extruded from a probabilistic machine, may generate some of the hardest-to-understand code, with little of the human interaction that occurs on sites like Stack Overflow to provide some degree of rationale or context. (And, of course, if people stop writing content for sites like Stack Overflow, LLMs will have nothing to train on and stop being useful.)

But comprehension won’t be the only new burden on learners. Some learners will also see LLM-driven program synthesis as yet another shortcut to avoid the hard task of learning, just as they do now with Stack Overflow. This isn’t because they’re lazy, it’s because learning is hard, and we all do everything we can to avoid it when there are easier ways to solve a problem. Because LLMs provide a path of such low resistance, I believe they will lead, and probably already are leading, to highly unproductive cycles of guess-and-check that avoid the hard tasks of planning and verification. Stack Overflow at least has so many dead ends that learners eventually realize it’s not a great resource for most problems and get back to reasoning and planning instead of searching. But LLMs will likely just aggravate the comprehension problems above by creating the illusion of right answers. I fear that the disincentive this creates to learn will result in more struggle, more dropping out, and ultimately fewer people who will be willing to comprehend the mess.

Should LLMs exist? I think I feel the same about LLM-driven program synthesis as I do about any developer tool: we made this mess, so we should probably clean it up. Maybe LLMs can help. But it feels like a tool that just takes one mess and turns it into another mess in the endless pursuit of productivity. It does not feel like a revolution. And in the worst case, that new mess may be even harder to make sense of.

Could I be wrong? Of course! My curmudgeonly take above comes from twenty years of studying programming and programmers, but all of that research and experience could be pointing me in the wrong direction. Maybe LLMs do fundamentally change programming and I just can’t see it. If that’s the case, I’m sure I’ll be the last to notice :)

--
