Grand challenges in program comprehension and software repository mining: my keynote on interdisciplinarity and research relevance
Last autumn I received a request to give a joint keynote at two co-located conferences at the International Conference on Software Engineering. The first conference, the International Conference on Program Comprehension, has long been concerned with how software developers perceive and reason about code, performing studies of comprehension and building tools to facilitate comprehension. The second, Mining Software Repositories, is essentially a software data science community, concerned with how to analyze version control histories, bug repositories, and other traces of software engineering activity, and with scientific questions that are amenable to data science methods. The organizers of both conferences basically asked me to speak on what the two communities might have to learn from each other.
Upon reflection, this was a nearly impossible task. The communities don’t overlap much in either the methods they use or the questions they ask. Moreover, despite investigating phenomena relevant to both fields in my own research, I haven’t really attended either conference regularly. And at some level, trying to bring together communities with a keynote always seems like a fool’s errand: communities come together around food and drink, not a stuffy keynote!
And yet, something about the challenge of trying to make the best possible argument for common ground between two communities was compelling. When you take two disconnected fields and try to connect them, what intellectual bridges emerge? I wanted to find out, so I accepted the invitation.
I spent several weeks reading papers from both communities, pondering the connections, and playing with different arguments that might appeal to both audiences.
I ended up with the following five points:
- Program comprehension research would be stronger with mining techniques, helping it make discoveries only possible with large-scale, ecologically valid data sets.
- Repository mining research would be stronger if it paid more attention to the experiences and expertise of developers, since the data the community is mining is ultimately a residual of these experiences and expertise.
- Both communities focus a lot on describing and predicting, but not at all on explaining. Explanatory theories are necessary for building a scientific foundation for both areas, and also for bridging the fields.
- Because both communities only describe and predict, the discoveries they make are of limited relevance to practice. Practitioners need to understand why, not what. They already know what.
- All of the above requires interdisciplinary research, because it requires leveraging theories from other fields, leveraging methods from other fields, and combining the expertise of multiple fields.
I gave several examples of this kind of research and advice on how to get started.
The audience response was really interesting. I got questions about the differential value of theories in the natural and social sciences. Others asked how to better impact practice through education. Others still wondered about the role of professional organizations in incentivizing interdisciplinary research. My sense was that the community was hungry for these bigger ideas and that it was a great way to kick off the week.