On the short half-life of research prototypes in computing

Amy J. Ko
Published in Bits and Behavior
Nov 18, 2017
Research prototypes bit rot (Credit: CC0 Creative Commons).

As a researcher, I’ve built a lot of things, both by myself and with a lot of students. I’ve made new kinds of debugging tools, new verification tools, many kinds of learning technologies (including entire games).

All of them had research value, leading to research publications with new knowledge that informed future discoveries. Some of this knowledge has even impacted products. But the research prototypes themselves wither on the vine. They rapidly bit rot, and within months or years, no one can execute them, let alone build upon them.

In most cases, this is okay. The point of research is discovery, not necessarily robust software that others can use and depend on. Asking researchers to create and maintain software not for the purposes of science, but for the purposes of impact, does go outside the scope of how most of us view our jobs. In most cases, industry will create vastly superior implementations of our research ideas, and they’ll support them more than we ever will.

And yet, some research prototypes fill important niches that industry has no incentive to address. Take, for example, the Stanford NLP group and its free, open source implementations of many common natural language processing algorithms. Up until ten years ago, there was very little market for advanced natural language processing libraries. But the Stanford libraries played a critical role in fostering further research on NLP and supporting many hobbyists experimenting with NLP in other open source projects. Some of the projects that built on these libraries eventually turned into entrepreneurial efforts, such as Apple’s Siri.

Unfortunately, projects like Stanford NLP are the exception rather than the rule. Most labs don’t have the resources to maintain software in this way. Moreover, academia doesn’t really incentivize it: doctoral students don’t graduate because they’ve fixed a lot of bugs in implementations of old ideas. They graduate because they’ve created new implementations of new ideas. And funding agencies don’t really want to pay for it either. They fund new discoveries, not the maintenance of research infrastructure for old discoveries. Moreover, if every research lab were to maintain every research prototype it created, the amount of code to maintain would monotonically increase. I’d have over fifty software repositories to maintain (and would have to raise funding to maintain all of them), and I’m only 15 years into my career. Just imagine faculty who’ve invented hundreds of systems over their careers. The result is that most research prototypes, even the ones that could provide real value to furthering science or industry, rapidly decline in their ability to execute and provide value.

Maybe this is okay. Most research prototypes don’t matter that much. Or, maybe there are models to make this scale. For example, learning to build and maintain software is ostensibly what much of computing education is about. Why don’t we have students practice these skills on our research systems, simultaneously exposing them to research that they might take into industry? Academic units that create software could hire a small team of versatile engineers, charged with managing this maintenance, on-boarding undergraduates, and ensuring robust archiving and reproducibility of research. Occasionally, one of these implementations might even meet a real industry need, leading to opportunities for licensing.

New models for the sustainability of research prototypes might also be new models for the sustainability of higher education. Finding ways to leverage the by-products of research, whether data, software, or even writing, might even be a new form of “idea recycling,” helping to de- and re-construct the ideas from academia from the tangible forms we manifest them in. Universities could archive not just the articles we write to disseminate knowledge, but the artifacts we create to generate that knowledge. And it’s not just academia that has this problem: it’s industry too. Sharing software that no longer serves its original purpose can be hard, but can also lead to unexpected value.

Amy J. Ko
Bits and Behavior

Professor, University of Washington iSchool (she/her). Code, learning, design, justice. Trans, queer, parent, and lover of learning.