ICSE 2018 trip report: 50 years of software engineering
The field of software engineering research is 50 years old this year; the largest, oldest, and best software engineering conference, the International Conference on Software Engineering, is 40 years old. This year’s conference was a great chance for the community to look back on the past half century of research and ask, “What have we learned? What have we forgotten? What are we missing?” I spent the week in Gothenburg, Sweden, grappling with these questions, reflecting on the many insightful keynotes and talks that answered them, and sharing my own thoughts about how to move forward through two invited talks.
I began my time at ICSE by giving a joint keynote at ICPC 2018 and MSR 2018 about the dire need for interdisciplinary work and theory in software engineering research, using the communities that focus on comprehension and mining as examples for these larger points. I wrote about my talk in a previous post, summarizing my arguments. After my talk, and throughout the conference, I had really interesting conversations both with senior researchers who struggled to understand what I meant by theory and with new Ph.D. students fascinated by the potential for theory to make their work more impactful. I had a great group conversation with several of CMU’s doctoral students about what theory is, what it looks like, how it can transform the studies we do, and how it can make our results more profound. I also talked to an engineer from Adobe Analytics who was struggling to get internal adopters of analytics tools. It was a fascinating opportunity to try to influence how the next generation of researchers and engineers incorporates theory into their work, but it made me wonder how to teach the use of theory effectively.
On Monday, I spent some of my time in MSR and ICPC sessions, hearing about the latest explorations in error message perception, design pattern comprehension, and other efforts to investigate comprehensibility. One paper replicated an assessment of 121 code-related complexity metrics to see whether they correlated with developers’ self-reported experience of comprehensibility, finding that length and variable names were jointly predictive of developers’ ratings. These smaller co-located conferences make some really clever use of data to ask compelling questions at the intersection of comprehension and mining. As I noted in my keynote, these communities are in need of theories of comprehension, but the patterns they’re finding are a good basis for developing them.
In the afternoon on Monday, I had a riveting conversation with Tim Menzies about the relative merits of deep versus shallow models. He responded to my keynote partly with surprise that I hadn’t made deeper critiques of the repository mining community, and partly with surprise that I hadn’t acknowledged the astounding power of simple, shallow models to optimize and scale all kinds of decisions in software engineering. His argument was essentially that in some cases, perhaps many, we don’t need to explain why tools, systems, or processes work; they just need to work. We hashed out our disagreements, ultimately concluding that we probably need models of all depths (from theories, to unexplained laws, to mindless but accurate predictive engines). Such diversity is probably a sign of healthy academic discourse.
Monday night was the banquet for the Mining Software Repositories conference. I had a rich set of conversations with Mei Nagappan, Andy Zaidman, and Michael Godfrey. We talked about everything from tenure and promotion to CS learning at scale, our personal histories of learning to program, and our roles as gatekeepers in teaching CS. For me, conversations like these are the deep substance of academic networking: they’re conversations about our lives, our ideas, and how they interact.
Tuesday morning, Abram Hindle gave a Most Influential Paper award talk about his paper, “What do large commits tell us? A taxonomical study of large commits.” What was significant about this paper is that it was one of the first not just to advance mining techniques, but to actually ask a question about the content of repositories, moving the field toward more scientific questions about software engineering rather than just techniques for mining. What I found really interesting about the work was how scientifically opinionated it was: it made a strong claim that outliers are important and not ignorable, and that large commit outliers are critical indicators of the nature of software evolution. It also uniquely focused on a detailed content analysis of commits, which was (and still is) a rare method in data mining research. Abram also made a compelling argument that rejecting papers for not having immediately actionable results hobbles our science, and therefore our future, and runs counter to what scholarship is about. In Q&A, Abram made some insightful points about the economics of research and how it can warp which questions we pursue and the depth of scientific investigation we allow.
During a break, Lutz Prechelt asked me a fascinating question: why is it that, despite software being so complex and developers being so unprepared, software nevertheless gets built, adopted, and used productively? I reflected for a moment and shared my grand theory. My explanation was that software, despite having an effectively infinite state space of executions and being far too complex for developers to fully comprehend, actually has only a small space of states that users exercise in practice. This means that despite all of this complexity, developers can acquire just enough knowledge about this relevant space of executions to make software effective and robust for those users. Then, even when software isn’t effective and robust, I suspect users are resilient to most of the failures they encounter, finding workarounds or changing their goals in response to the software’s behavior. This theory of resilience explains why software is valuable despite being brittle. That’s not to say that software failures don’t matter: there are severe failures, and they often arise from developers not having an accurate idea of which parts of a complex software system’s state space actually matter in the field. Moreover, developers often lack the tools or data necessary to acquire this knowledge. Furthermore, many sub-components of systems are purely automated, and we need to be able to formally verify them to prevent severe downstream failures. There are also significant human-in-the-loop considerations that need special attention to get right, requiring HCI methods. So as resilient as the world is to brittle software, we can and must do better.
Over lunch on Tuesday, I had a riveting conversation with a junior colleague about advising Ph.D. students. He asked excellent questions, which gave me a great foundation for reflecting on my practices. I talked a lot about defining culture and about my new strategy of writing an onboarding document that sets expectations. I talked about psychological safety as the foundation of trusting relationships with my students and in my team. I talked about the critical need to actually enforce and model the norms in my onboarding document to reinforce my lab’s culture. I shared ideas about teaming students together to increase accountability, diversity of ideas, and frequency of feedback. I also discussed the pre-tenure tensions that can arise between needing publications and needing to give students space to learn, and how to resolve those tensions by maintaining a separate thread of first-authored research. Most importantly, I reminded this colleague that this learning doesn’t stop; I know senior colleagues who still seek advice after decades of learning.
I had a wonderful dinner Tuesday night with Thomas LaToza and Ph.D. student August Shi in which we had a wide-ranging discussion about basic and applied software engineering research, the role of social science in software engineering research, and the need for more honest, theoretically grounded accounts of the underlying assumptions of the field’s technical work.
Magnus Frodigh, Chairman of Ericsson Research, gave an opening keynote on Wednesday on wireless communications and 5G. He began by predicting a rapid pace of change in our digital experiences, but a slower pace of change in networking infrastructure. He argued that the stability of the 5G standard would allow for all kinds of transformative new infrastructure in IoT, including real-time machine-to-machine communication. He then took a deep dive into the particulars of 5G infrastructure, which I found dry and mostly irrelevant to software engineering, but the compelling vision buried inside the talk was the unimaginable scale of connectivity between people and machines that comes with essentially eliminating latency. Magnus argued that this will make prototyping new experiences dramatically easier, because systems can be composed entirely of low-latency networked services rather than hardware deployments.
During the Wednesday morning break, Walter Tichy, James Herbsleb, and I had a productive conversation about how to transform the software engineering research community’s use and development of theory. We began by observing that the field does have theories; they’re just implicit, and if made explicit, they might cause us to rethink our assumptions and our research directions. For example, the field has theories about the power of abstraction, about error-proneness in programming language design, and about how program comprehension works. We just don’t make these theories explicit. James had given a keynote on theory as well, and he and I had both received warm responses to our calls for more theory, so we suspect the field is ready to learn. We reflected on ways of educating the community, including developing lightweight materials to teach new doctoral students or interested faculty. We discussed potentially organizing a Dagstuhl seminar to develop and deploy these materials.
I spent the rest of Wednesday attending sessions in the Software Engineering Education and Training track (which I’ll be co-chairing in 2020, and which I think is quite important and central to my interests in computing education). This track publishes rigorous, peer-reviewed computing education research about software engineering. Chris Parnin kicked off the first session with a talk about using iTrust, a large, complex software implementation, to teach software engineering. He found that students appreciated their deep engagement with a large, complex system long after the course, but they didn’t enjoy it at all during the course: working with legacy code was overwhelming. They revised the course by aligning the class activities with the project itself, which led to much more positive sentiment about the course (as should be expected; students need a coherent narrative around class activities to sustain their engagement). Another talk found that active video watching, in which learners comment on the content and review comments, increased engagement. Some talks focused on labs, capstones, and other experiential learning projects. Generally, these studies found that experiential learning is logistically hard to execute, challenging to make authentic, and very difficult to evaluate. Sounds like we need some theories of experiential learning to organize this work.
Reid Holmes presented a nice longitudinal study of Canada’s experiential learning program for high-performing computer science students (Undergraduate Capstone Open Source Projects). The study found astoundingly positive experiences, with students highly valuing applying their classroom knowledge to real, novel tasks on real projects with a community of users, while receiving mentorship from real developers. The dark underbelly of this work is how students are selected: the program explicitly chooses the very best students from multiple institutions, which avoids many of the learning challenges that might arise with less prepared students.
The Wednesday night reception was at Universeum, a natural science museum full of animals, fish, and a massive, humid rain forest. It was a really interesting context for a reception, because rather than being a big open event space for conversation, it was full of interactive exhibits that engaged attendees in play and exploration. The exhibits weren’t particularly inviting or compelling, but they were good enough to trigger all kinds of interesting conversations we probably wouldn’t have had otherwise. I talked to attendees about Gila monsters, rattlesnakes, jellyfish, the maintenance of software-based exhibits, and the many forms of cynicism that have bled into academic computer science.
Thursday morning I had breakfast with Brendan Murphy, Laurie Williams, and UW Ph.D. student Calvin Loncaric in our hotel’s breakfast room. We had a wide-ranging discussion about two grand challenges dear to my heart: reconciling formal systems like programming languages with human and social systems, and reconciling statistical systems like machine learning with human and social systems. In my view, these are the two most important grand challenges in computer science, and yet most people in computer science are ignoring them. Brendan had a lot to say about the complexities of bringing data analysis and machine learning into real projects, Laurie talked about the same challenges in accounting for security in software development, and Calvin considered these problems in his own work on data structure synthesis, where the comprehensibility of synthesized code and the learnability of his specification language are open questions.
There were two Thursday morning keynotes. Fred Brooks, Jr., author of The Mythical Man-Month, the seminal book on software project management, gave a retrospective. Fred talked about the evolution of programs, software, software systems, and software products. He then defined software engineering as the discipline of making software products. He talked about the big ideas in the history of software engineering, including von Neumann’s programs as data and high-level programming languages like COBOL and FORTRAN. In the ’60s, the software crisis (the challenge of building big systems) led to the idea of software engineering as engineering. The big recognition here was that project complexity did not grow linearly. A lot of this led to systems contributions, like Tom Kilburn’s interactive debugging, Fernando Corbató’s time-sharing operating system, database systems, Robert Floyd’s and Tony Hoare’s ideas of formal verification, and Simula’s object orientation. In the ’70s, David Parnas’s information hiding, Barbara Liskov’s abstract data types, Harlan Mills’ and Niklaus Wirth’s incremental refinement, Michael Fagan’s code inspections, and software project management all emerged. Barry Boehm also asked questions about requirements and requirements validation. Fred highly recommended Grady Booch’s ACM webinar on the history of software engineering and Barry Boehm’s lifetime contributions.
The second Thursday morning keynote speaker was Margaret Hamilton, who coined the phrase “software engineering.” She was a math student when she decided to intern at MIT developing weather software on the LGP-30, where she developed an interest in software; she eventually built the Apollo software systems that allowed the United States to land on the moon. Her talk, “The Language as a Software Engineer,” addressed the big problems: integration is hard, evolvability is hard, reuse is hard, and software fails. She asked why we have made so little progress in 50 years, and argued that there has been some: there was no field before, and now there is; we’ve defined terms. But the reality is that software engineering is distinctly human, distinctly social, and distinctly intellectual work, and we still have not grappled with most of these factors. She gave examples of the fundamental HCI challenges of creating interactive systems between people, software, errors, and error recovery, and how these were central to landing on the moon. She realized that systems are asynchronous, distributed, and event-driven in nature, and that the languages used to write software should reflect this. She balanced this with a discussion of the need for planning through architecture, using reusable, reliable patterns. I was proud to see in the Q&A the community’s recognition of history, its value, and the distant origins of some of the field’s biggest ideas.
On Thursday I chaired a session titled “Studying Software Engineers” that had four fascinating empirical studies, including two journal-first TSE publications. The first, “Understanding Developers’ Needs on Deprecation as a Language Feature” (authored by Anand Sawant at TU Delft), discovered many useful trends about the use and misuse of deprecation features, identifying needs for deprecation dates, severity warnings, and more diverse warning types. The second paper, “On the Dichotomy of Debugging Behavior Among Programmers” (authored by Moritz Beller at TU Delft), found that in practice debugging tools are rarely used, “printf debugging” remains dominant, and knowledge of debugging tools is quite low. The first journal paper in the session, “Measuring Program Comprehension: A Large-Scale Field Study with Professionals” (Xin Xia, Lingfeng Bao, David Lo, Zhenchang Xing, and Shanping Li), found that developers spend a majority of their time comprehending code, that they use web browsers and editors to comprehend code, and that the more experience a developer has, the less time they spend on comprehension. The last paper in the session, “Data Scientists in Software Teams: State of the Art and Challenges” (Miryung Kim, Thomas Zimmermann, Robert DeLine, and Andrew Begel), surveyed 793 professional data scientists and found a really interesting set of 9 types of data science roles: polymaths, data evangelists, data preparers, data shapers, platform builders, moonlighters of different degrees, and insight actors, who interpreted data and used it to make decisions. This rich deconstruction of different roles seems really powerful for informing data science education.
The last session on Thursday was a celebration of 50 years of software engineering. Brian Randell gave a retrospective on the first software engineering conference back in 1968. Brian talked about just how little of computing had been invented yet: no internet, no networks, no reuse. And yet all of the issues were there: testing, correctness, management, etc. Brian distinguished between programming and software engineering by defining software engineering as “the multi-person development of multi-person programs” (he doesn’t remember saying this, but David Parnas insists he did). He concluded that the field has grown more than it has matured, questioning whether we’ve come far enough to be called an engineering discipline, and chastised the community for inventing yet another language and yet another technique.
Brian’s talk was followed by a panel of four of the original attendees of the 1968 conference. One question they discussed was what they regret about the past fifty years. They raised the lack of focus on requirements engineering, the lack of attention to misinformation, and the lack of attention to software maintenance. They were disappointed back in the ’60s, and they’re disappointed now. Some of the panelists were excited about formal methods but disappointed by their lack of adoption. They were also disappointed about how little we’ve discovered about how to guide design decisions in relation to software qualities. Overall, however, there seemed to be little consensus on whether things have improved. We’re certainly building more complex things, but are they any better, or delivered any more on time?
The Thursday night banquet was in a shipyard and was a weird pastiche of activities. There was a banquet-style meal, an outdoor stage with Ph.D. students singing rock music, and a Swedish cover band singing ’90s and 2000s pop songs during dinner, while a hostess celebrated the software engineering community and invited various conference organizers to the stage to say thank-yous. During dinner, a slide show played with all kinds of arbitrary software product logos from the history of computing, with the occasional retrospective video of interviews with software engineering luminaries. It was wacky, disjointed, and disorienting, especially when a bunch of attendees carved out a corner of the event space to dance.
On Friday morning, I found one of the panelists from Thursday’s retrospective, Robert McClure, sitting alone during the break, and so I decided to start a conversation. He was one of the original attendees of the 1968 conference, and an active thought leader in industry advocating for progress. I asked him about what’s changed in 50 years, what hasn’t, and what his conception of progress is. We had a fascinating, wide-ranging conversation about many fundamental issues in software engineering. He began by discussing some of the critical differences between the design of what software does (which requires understanding a problem and its context), engineering design (which requires carefully specifying a solution), and engineering (which is pure implementation of that specification). Robert made comparisons between software engineering and other engineering disciplines, so I asked him what he believed the fundamental differences were, if any. He suggested it was a matter of degree. I speculated that the critical difference was the degree to which a designer or engineering designer could be confident in their understanding of a problem or a specification: understanding the site on which you build a bridge relies on the natural sciences, which are predictable to a degree that is not true of the human, social, and organizational systems for which software is typically designed. This lack of confidence creates a need for prototyping, feedback, and evolution that is not as necessary in other engineering disciplines (and also not as feasible). We also talked about the education necessary for all of these skills, and the rate of change he expected. He had expected much more change over the past 50 years than he has observed, and speculated that human nature is far more resistant to change than he ever believed.
I suggested that it might just be a failure of effective education, combined with the rapid increase in the number of developers from about 10,000 back in 1968 to 30 million in 2018. He encouraged me to temper my expectations about change; I told him that as a tenured professor, I was in it for the next 40 years, and would be patient.
By chance, I also found Brian Randell, Thursday’s 50-years-of-software-engineering keynote speaker, sitting alone. I asked him why he believed the 2nd NATO conference was so disappointing, and what effect he believed it had on the coming decades of software engineering research and practice. He argued that much of the problem was that by the second year, there were divisions along two lines. First, some people envisioned a world in which we could ship completely defect-free software, while others believed such a thing was not possible and that we should plan for defects. Along a second dimension, some people were interested in deconstructing the problem of software engineering and others were interested in the tools, techniques, and other solutions they believed could improve it. Attendees divided along these lines just couldn’t get along. The idealists and realists didn’t know how to collaborate, and the problem-centered people spent too much time criticizing the solutions of the solution-centered people, while the solution-centered people were resistant to feedback. I suggested that many of these divisions still exist in software engineering research today and thanked Brian for helping illuminate their historical origins.
Ivar Jacobson, a major contributor to UML and the Rational Unified Process, gave a talk titled “50 years of software engineering, so now what?” He began with an anecdote about one of his first software engineering projects, where, he admitted, he knew nothing about software engineering. And yet, he still led one of the most successful Swedish products in history. His interpretation is that software’s success ultimately comes from business models and developers, not from software, and not from process. His view is that after 50 years, it’s still more of a craft than an engineering discipline. In fact, looking back at history, he argues that we’re much more driven by fashion than by science: object orientation, UML, CMMI, Agile, and whatever comes next were and will be all fashion. Ivar’s argument was that all of the methods wars have been distractions. The real problem, according to Ivar, is that methods are really compositions of practices, yet methods are monolithic and trapped in prisons guarded by gurus. In Ivar’s view, this is immature and foolish.
His recommendation was to find some common ground on methods, modularize methods, and free practices from methods. He talked about a standards body that envisioned practices that have activities, which have success criteria, and work products that come from activities and are assessed against these success criteria. His key point, however, is that all of this requires developers to be competent in all of these things. The success criteria boil down to customer needs, the solution being produced, and a team to achieve it. He presented more detail about the states one goes through in his model. What he described sounded to me like a scientific theory of process and a set of process ideas derived from this theory: something to be tested and refined, not gospel. In the end, he actually called it a descriptive theory, and called for researchers to further develop it into a predictive and explanatory theory.
Immediately after Ivar’s talk, I gave my ICSE Most Influential Paper Award talk. In the middle of an award session, I could tell people were tired and ready for the end of the week. My talk had a somber, reflective, but encouraging tone, and although the silence after the talk was deafening, the chatter on Twitter was invigorating, showing a community that really believes in and values what I had to say, and is hungry for guidance on how to do it.
Andreas Zeller spoke immediately after me, upon receiving his SIGSOFT Research Award. He told three stories about his career, all focused on impact. The first story was about his first project and presentation, in which he had contributed a solution looking for a problem. Disappointed with the feedback, he rebounded by focusing on the GNU DDD debugger, which had real practical impact. His first epiphany was that finding real problems was essential, and also a great way to have impact. His second story was about simplicity. Someone at a conference was disgusted that his idea of delta debugging was so simple. This led to impostor syndrome, a sense of intellectual inferiority. But he realized over time that simplicity was power; complexity was failure. His final story was about the work he started with Tom Zimmermann on mining software repositories. He observed that the fears about the results of their early work simply didn’t matter, because what mattered was that the work was new. Innovation is about studying the dark, understudied, but relevant parts of the world. Ultimately, Andreas argued that the only thing that really matters is impact. He ended with an inspiring call to pursue our dreams and to persist.
Saying goodbye to Gothenburg, it’s challenging to summarize everything I learned at this year’s ICSE. But let’s try anyway:
- Ultimately, we’re all in this community to improve software. Let’s focus on that, not on short term metrics.
- We need bigger ideas, most likely in the form of theories, to guide us and guide our impact.
- We must think about relevance, not publishability, to achieve the goals above.
- We ignore the human factors in software engineering at our peril.
These are lessons that each and every member of our community must eventually internalize. It’s been 50 years since we first realized their importance, and we’re only just beginning to take them seriously.