Three child-like humanoid robots frolicking in a field of grass and bright flowers.
A DALL•E 2 generated image from the prompt “a robot frolicking in a field of flowers with friends” (Credit: whoever created those cute robots and took that photo of flowers, but whose identities I can’t know because DALL•E doesn’t value credit or human labor).

Large language models will change programming … a lot

Amy J. Ko
Published in Bits and Behavior
5 min read · Feb 24


Most people who know me well know that I have a bit of a contrarian streak. I’m always interested in taking the other side of a debate. I generally see a lot to gain from an iterative Hegelian thesis/antithesis discourse, even on topics for which I have strong beliefs. Sometimes this is confusing to people who read my writing, because they feel like I’m hard to pin down. What do I actually believe? What are my actual values?

Well, my values are seeking situated, informed truths, and questioning those truths through the lens of human values. And so I wouldn’t be living my values if I didn’t follow up my largely skeptical critique of large language models in programming this week with a sincere optimistic take, directly rebutting that earlier essay. After all, how can we cut through the hype unless we examine all of the values, evidence, and experiences before us?

As someone who’s played with emerging program synthesis techniques for 20 years, I’ve found it a long road to imagining systems that largely construct code for us. Program sketching, natural language programming, common-sense-driven approaches for generating software architectures: these narratives are 40 years old, and have been at the heart of many people’s passion for bringing the speech-based program synthesis of Star Trek to life.

But by far, my favorite of all of these was Greg Little’s concept of keyword programming back in the mid-2000s. Greg observed, as a premise, that most of the possible programs in the world are useless; the useful search space is actually quite small, despite being infinite in size, and so it should be the case that a little bit of information should be sufficient for finding the useful programs. He demonstrated through a few prototypes in office productivity macros and even in certain domains of Java systems programming that for many of the most common programs, even just a few keywords were enough to generate meaningful programs for common tasks. These systems weren’t always that good at the edge cases or novel requirements, but in the non-long-tail part of the distribution, many programs could be easily retrieved with very little input. I was convinced from that point on (nearly twenty years ago at this point) that we would eventually have a world in which most programs for most requirements would be easy to generate automatically. It was just a matter of data, engineering, and probably a bit of innovation to stitch them all together.
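To make the intuition concrete, here is a toy sketch of retrieving a program from a few keywords. It is not Greg’s actual algorithm, which composed expressions from API vocabularies; this is a much simpler keyword-overlap lookup. The corpus, its descriptions, and the function names are all made up for illustration.

```python
# Toy illustration only: a hypothetical corpus of "useful" programs,
# each indexed by a short plain-language description.
CORPUS = {
    "sum a column": "total = sum(row[col] for row in rows)",
    "sort rows by column": "rows.sort(key=lambda row: row[col])",
    "count lines in file": "count = sum(1 for _ in open(path))",
    "read lines from file": "lines = open(path).read().splitlines()",
}


def tokenize(text):
    """Split text into a set of lowercase words."""
    return set(text.lower().split())


def keyword_search(query, corpus=CORPUS):
    """Return the snippet whose description shares the most words with the query.

    Returns None when no description shares any word with the query,
    which is roughly what happens in the long tail of novel requirements.
    """
    query_words = tokenize(query)
    best_snippet, best_score = None, 0
    for description, snippet in corpus.items():
        score = len(query_words & tokenize(description))
        if score > best_score:
            best_snippet, best_score = snippet, score
    return best_snippet


if __name__ == "__main__":
    print(keyword_search("sort column"))
    print(keyword_search("count lines"))
    print(keyword_search("quantum teleport"))
```

Two keywords are enough to pick out the routine programs here, while the unfamiliar request matches nothing. That is the shape of the distribution described above, in miniature.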

Large language model (LLM)-driven program synthesis is basically Greg’s observation realized at scale, with more data, bigger models, and a bit of NLP to enable more sophisticated queries and to stitch together human explanations of code. And in the same way that Greg demonstrated how much was possible with a little bit of input and a large corpus, Copilot, ChatGPT, and other models are demonstrating the same thing: 80% of programs that people need can be met with a small corpus of code, stitched together with a few heuristics and a lot of data.

After 40 years of chasing this dream, this is truly exciting. I wrote more than a decade ago that most of the programs that people need in the world are of exactly this kind: mostly routine, with a bit of customization that demands a bit of knowledge of programming and systems. The vision I laid out was that what we would need to shift to this new world was a different set of skills: clearly articulating requirements, evaluating programs rather than constructing them, and likely greater attention to the infrastructure necessary for interoperability and support for people to learn its complexities. We would still have a long tail of novel requirements and still require designers and software engineers to imagine them, but most of the software needs on the planet would be met through an orchestration of skilled people and intelligent tools working together to realize relatively routine visions.

So if this future was visible 15 years ago in research, what is visible now that is coming in 15 years? In my view, most of the central issues about programming and software engineering will not be about code construction, but about everything before and after construction: namely, requirements and verification. Deciding what to make, why to make it, and whether what is made actually achieves these goals: these are the next frontier of software.

But these two big challenges have very different “attack surfaces”, if you will. Verification has long been studied in software engineering research, and I’m highly confident that its decades of sophisticated techniques will be brought to bear on LLM-driven synthesis to eventually create highly productive iterative loops of querying and verification, automating much of the construction and evaluation of programs. Give the research community 10–15 more years and we will see consistently high quality programs for this 80% of routine programs emerging from these models. Expect many researchers, and then many hype machines in industry, to celebrate having “solved” software engineering, just as we see in today’s hype around programming.
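A minimal sketch of what such a query-and-verify loop might look like follows. Everything in it is an assumption for illustration: `fake_model` is a stand-in for a real LLM, the `solve` naming convention is invented, and plain test cases stand in for the far more sophisticated verification techniques the research community has developed.

```python
def verify(candidate_source, test_cases):
    """Run a candidate program against test cases; return a list of failure messages.

    An empty list means every test passed. NOTE: exec() on generated code is
    unsafe outside a sandbox; it is used here only to keep the sketch short.
    """
    namespace = {}
    try:
        exec(candidate_source, namespace)
        func = namespace["solve"]  # hypothetical convention: candidates define solve()
    except Exception as error:
        return [f"candidate did not load: {error!r}"]
    failures = []
    for args, expected in test_cases:
        try:
            actual = func(*args)
        except Exception as error:
            failures.append(f"solve{args} raised {error!r}")
            continue
        if actual != expected:
            failures.append(f"solve{args} returned {actual!r}, expected {expected!r}")
    return failures


def synthesize(generate, requirement, test_cases, max_rounds=3):
    """Query the generator, verify the result, and feed failures back until tests pass."""
    feedback = []
    for round_number in range(1, max_rounds + 1):
        candidate = generate(requirement, feedback)
        feedback = verify(candidate, test_cases)
        if not feedback:
            return candidate, round_number
    return None, max_rounds


def fake_model(requirement, feedback):
    """Stand-in for an LLM: wrong on the first try, corrected once it sees failures."""
    if not feedback:
        return "def solve(a, b):\n    return a - b\n"
    return "def solve(a, b):\n    return a + b\n"


if __name__ == "__main__":
    tests = [((1, 2), 3), ((0, 0), 0)]
    program, rounds = synthesize(fake_model, "add two numbers", tests)
    print(f"accepted after {rounds} round(s):\n{program}")
```

Note what the loop cannot do: it only checks the program against the tests it was handed. Whether those tests capture what anyone actually needs is a separate question.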

But what this will do is put great pressure on requirements. Because ultimately, all of these systems will be garbage-in, garbage-out. And as designers, requirements engineers, and researchers in these spaces know, getting requirements right is both hard and not a technical problem that is amenable to automation. I spoke about this in my keynote at the 2021 IEEE International Requirements Engineering Conference, deconstructing why these problems are inherently about human values, equity, and justice, and not syntax, semantics, algorithms, and data structures. And so our future is one of needing to place even greater pressure on holding developers accountable, including all the people who will be brought into software development through its LLM-lowered barriers, but may not see themselves as developers.

Why wouldn’t LLMs solve requirements engineering too? Well, it’s hard to imagine a future in which predictive models of language will be the first thing we consult about what software people need in the world and who deserves to be served by it. These are not only ethical questions, but questions about what society is, who it is for, what justice is, and who gets to decide what kind of world we have. It would be awfully strange to query a probabilistic machine with the aggregate horribleness of internet speech for advice on justice. As anyone at the margins knows, including myself as a trans person of color, the majority and its speech are simply awful. We can’t expect more from LLMs if they are trained on this awfulness.

And so while I still love programming, and am still actively trying to envision a world in which everyone can participate in and be empowered by its joys, most of my attention will be on imagining futures in which everyone has literacy in social justice. Because no matter how much software we have, or how easy it is to make, software will never make the world fair, and only seems to make it less so. Creating that just world is on us, not on the machines.



Amy J. Ko

Professor of programming + learning + design + justice at the University of Washington Information School. Trans; she/her. #BlackLivesMatter.