Dagstuhl trip report: creativity, code, and generative AI
Sometimes I go to a castle and talk about computer science.
I know, that sounds weird. But it’s real: Schloss Dagstuhl is a meeting venue sponsored by the German government to bring together computing researchers to work on emerging issues. I’ve worked in many different areas of computing, and so I’ve had the fortune of being invited to many of these meetings, mostly about people, programming, and the many complex problems that arise when we do it.
My first, in 2007, was on End User Software Engineering, where we talked about how to make the software that non-developers make more reliable. A few years later, I had one on Practical Software Testing (2010), where we explored the human aspects of making sure software works as intended. A few years after that was a meeting on Software Development Analytics (2014), exploring the intersections between software development and data science. And so many more: Human-Centric Software Development Tools (2015), Assessing Learning in Introductory Computer Science (2016), Rethinking Productivity (2017), Evidence in Programming Language Design (2018), Notional Machines and Programming Language Semantics (2019), and Educational Programming Languages (2022).
That is a lot of Dagstuhl meetings! And so as I traveled to my 11th meeting on Creativity, GenAI, and Software Development, I began to reflect a bit on what the impacts of all of these meetings have been, on my scholarship, and on me as a scholar in general. There have been all of the intended things, such as new research ideas and directions, books, community resources like the computing education research FAQ that I maintain. But more than that, I feel like the privilege of coming to so many of these has given me some wisdom about the roiling, ephemeral early phases of discovery more broadly:
- Progress fundamentally requires the intimate focus of people, particularly smart, curious people who are very different from one another, but nevertheless eager to learn and grow. Sequestering many of them in a castle with nothing else to do is a great way to catalyze this.
- Twenty years isn’t very long in research. I can look back to my first Dagstuhl in 2007, and point to the many impacts of the research that came from it. But 18 years is a pretty good chunk of my 45 years of life! This all reinforces how much of a privilege it is to have the freedom to think about long term futures, but also how essential it is to progress.
- We have bodies, and that cannot be ignored. Connecting with each other in this way takes long, expensive, exhausting travel. It takes food, sleep, and breaks. So much about this thought work is dependent on the careful maintenance of our wellness, safety, and security.
I tried to bring that wisdom to my 11th Dagstuhl, remembering that while we might imagine the coming decades this week in provocative new ways, it would only happen with an open mind, a lot of patience, a lot of serendipity, and much rest.
Monday mapping
We spent the morning in a lively series of activities: one a kind of speed dating around the topic of creativity in software engineering, another with a panel of three attendees where we all acted as journalists probing ideas, and then a third where we synthesized our sharing and mapped the next discussion.
Rather than try to describe all of these rich interactions, instead I’ll share some of the key insights that emerged from all of this mapping:
- We all had very different perspectives on creativity, including many different scales at which we considered it (individual, team, organization), and many different contexts in which we felt it was present or important (process, outcome).
- There was a strong bias toward utilitarian, capitalist framings of creativity, viewing it as a resource for productivity and profit, even though software is created in many contexts that are not about profit.
- There were many beliefs about how AI might change creative practices, in both transformative and disruptive ways.
- Many beliefs about creativity being innately human, with some people believing AI would free people to be more human, and others that it might take our agency, motivation, and humanity.
- There was a belief that creativity and critical thinking must come together, especially as AI generates slop requiring discernment and evaluation.
These perspectives led many to feel that our subsequent discussions needed to focus on things like how to preserve human dignity and agency when engaging AI, how to develop more principled applications of generative AI, how to map its limitations and boundaries, how to reason about skill loss, and the emotions that come with that. I could also sense an emerging tension between those of us approaching the phenomenon from a more humanist perspective, versus those approaching it from a more capitalist or engineering perspective.
After lunch, we spent much of the afternoon working through some of the many ways that generative AI might “destroy” creativity in software development, and things we might do to mitigate that destruction. The risks were many:
- Increased capabilities of models might reduce the need for human creativity.
- The valuing of generative AI creative output might devalue human creative output, reducing motivation to be creative.
- Companies might (and often do) force employees to use generative AI for creative tasks, even though it might not be constructive.
- Generative AI corporate monocultures might result in employees who refuse to use generative AI not being promoted, or even being fired, losing their distinct perspectives.
- Expectation to use generative AI might further reduce people’s unstructured creative time, forcing delegation to generative AI to meet production goals.
- People might use generative AI instead of talking to other people, narrowing the kinds of creative work that might happen.
- Human thought might be increasingly funneled through explicit or implicit prompting interfaces, rather than the diversity of ways it happens more broadly.
The group also thought of many mitigation strategies for the above, such as:
- Focusing on outcomes instead of process
- Exploring and investing in the many kinds of AI literacy necessary to prevent poor management decisions, and to foster more critical thinking about generative AI usage.
- More detailed labeling of the provenance and meaning of data involved in producing outputs, to complicate how people interpret the meaning and purpose of the output
- Injecting productive friction into generative AI tools, such as tools that induce fatigue, or interfaces that incentivize talking to subject matter experts instead of relying solely on generative AI tools.
There was a sense by the end of the day that we had explored a lot, and had identified many aspects of creativity that might be worth exploring, but that we needed to get down into the details soon to make more progress.
Personally, throughout the day, I appreciated the diversity of frames that I heard in this room. These included humanist views that started from a place of empathy around human dignity, and political views that examined who had and didn’t have the power to shift use of generative AI. I also heard engineering viewpoints, mostly about how to make generative AI better for creativity, and organizational views, about corporate culture, management, and vision.
After dinner, I played board games with several attendees, particularly Superfight, in which we argued over which ridiculous randomly generated superhero would win a battle. Maybe random fun is the best use case for generative AI…
Tuesday talking
In the morning, we began by building upon the knowledge from the previous day, ideating more concrete mitigations of the threats that generative AI poses to creativity. I was at a table with the wonderful Mary Shaw, Andre van der Hoek, and Marian Petre. We talked about two ideas:
- How to support software developers in systematically analyzing a situation requiring creative problem solving, like working through possible algorithms to an ill-defined requirement. We imagined a kind of shared knowledge representation, like a diagram, with a naive AI thought partner asking probing open ended questions to catalyze deeper thinking.
- How to help people build practical models of generative AI in particular domains, exploring the metaphor of a reference librarian, who helps people understand a domain and what kinds of questions are reasonable to ask in it.
Others looked at ways of using generative AI to instill creative friction, models of what human abilities are best fit to creative work, how to analyze the value and cost of LLMs in creative work, thinking about the pedagogies of AI free days, models of offloading non-creative work to generative AI, and ways of growing self-regulation and metacognition skills to support better decision making about AI usage.
After a nice coffee break, we then considered problems and solutions at the organizational level, brainstorming through a framework that considered the structure, trust, actions, and results in an organizational system, focusing on creativity and the role of generative AI in it. We found a wide range of interventions:
- Enforcement of best principles for creative thinking, either to constrain generative AI usage, or using generative AI to enforce them.
- Thinking of generative AI as a shadow AI workforce that acts almost like the “downstairs” in a British class hierarchy, privately judging the worker’s performance and forming recommendations for how the house might work better.
- Using generative AI to notice and flag trust gaps and opportunities to build relationships.
- Using generative AI to critique KPIs and other narrow conceptions of performance.
We ended with a brief design charrette to imagine some concrete, actionable interventions in our own contexts at work. My own personal idea was to take the design-tagged issues in open source projects and have generative AI give me ten bad design ideas for them, to help address cold-start ideation and prevent anchoring on a mediocre idea.
At lunch, my table talked a lot about organizational culture, the inherently social nature of management, and the many lived experiences everyone at the table had with poor management.
After lunch, we did some mapping of software development workflows through an ecosystem model of birth, maturity, creative destruction, and renewal. There were numerous ideas for intervention, though I found it interesting how many focused on applications in feedback synthesis, requirements engineering, collaboration, and ideation. There were of course the usual explorations of programming and verification, but there was a sense in the room that it was work at the level of specifications and above that felt like the frontier.
Before cake, we turned our attention to the many dimensions of human agency, responsibility, and values interlocked with generative AI usage in software development, and how they interact with creativity. There were many draft principles that the group converged on:
- Human judgment is a necessity; generative AI is a mere supplement.
- Human responsibility is unavoidable and therefore humans should always be in the loop.
- The diversity and richness of human experience is an irreplaceable resource for creativity.
- Generative AI seems to be a reasonable support for divergent thinking, and for routine tasks, but little else.
- Transparency, integrity, fairness, credit, and other human values are central, but broadly ignored in current manifestations of generative AI.
After cake, we used a fishbowl format, bringing folks from industry to the center, and academics around them, to discuss ways they had seen generative AI expand and flatten creativity. The stories were telling:
- There were examples of ways that AI could overcome organizational waste, but that didn’t deal with the root cause of the waste.
- There were many stories of playing with AI, asking silly questions, and experimenting with it, sometimes to escape from stress, and sometimes as a form of R&D to see if there was value downstream. Most of the bigger efforts were out of reach, unless a company had significant wealth.
- There were many ways that generative AI was being used to overcome creative cold start/blank page problems.
- Few had found good use cases for requirements engineering, design, or user research.
- Many organizations didn’t see creativity as an explicit goal or value, even despite evidence that it is essential for organizational resilience.
Overall, I appreciated the storytelling of the day, the concrete examples of success and failure and how they expanded my index of what these bullshit machines are to different people in different contexts. I don’t know that it eliminated my skepticism — in some ways it reinforced it — but there were a handful of interesting cases that felt meaningful and worth further exploration.
Wednesday wondering
Wednesday was a half day, as the afternoon was reserved for adventuring in Trier, Germany, or hikes around the grounds. I started off with breakfast, and a lovely conversation about the many ethical dimensions of generative AI that are often not accounted for. We talked about Black Mirror and the former ACM Future Computing Academy’s recommendation around requiring worst case ethical analysis in peer review.
We spent most of the morning generating research ideas, and mapping out areas of a research agenda. I started off with a question of most interest to me, wondering how developers at the margins of identity, ability, and culture view the benefits and risks of generative AI in facilitating software development. Other ideas were wide ranging, including:
- Characterizing the role of creativity in software development
- Human-AI collaboration in software development
- AI-supported design environments and tools that facilitate creativity in software development
- Synthesis of research to impact practice
- Critical perspectives on generative AI in software development
I focused on the last one for the rest of the morning, holding down the critical corner of our group. We came up with a broad, incisive agenda centered on intentional use of generative AI that accounts for equity, inclusion, climate, and more, with implications not only for developers, but for organizations, communities, governments, and society more broadly. We tried to imagine all of the specific methods that might be helpful, including microgenetic studies, organizational audits, comparative studies of AI and non-AI use, and even collaborations with climate scientists examining carbon output.
After the morning sessions, we had a great Trier tour and dinner where I had wonderful conversations about language, culture, and food with colleagues from France, India, and Russia.
Thursday thinking
We spent Thursday working through the research areas and ideas we had generated Wednesday afternoon. We started by thinking individually about what specific research ideas we wanted to explore. I wanted to work on understanding the perspectives of marginalized learners on generative AI and their creativity and learning. That includes a wide range of marginalization, spanning language, ability, and class. I wanted to amplify their voices through research, but also imagine more joyful futures, in which marginalized learners have agency to shape generative AI, if they so wish.
There were many other topics that groups focused on:
- Defining creativity in software development, including the purpose, novelty, elegance of work, and the many kinds of work in which creativity appears.
- Richer more complex measures of relationship between creativity and AI usage, to get at the value of creative work in software development.
- Explorations of human-AI collaboration and AI-mediated human collaboration, especially theories and tools that more coherently account for creative human software development tasks.
- Ways of synthesizing research results about all of the above, in order to both amplify research efforts and shape practice in the broader world.
I focused most of my time with a group of researchers interested in similarly critical perspectives of AI, as well as alternative visions of AI. My conversations were wide ranging, covering many of our project ideas from yesterday, but also the many risks of model collapse, dead internet, and other emerging trends around the loss of high quality training data. This led to an interest in a particular problem I’ve been noodling on for Wordplay: what kind of generative AI supports make sense for a language that 1) is designed by youth, 2) is multilingual, and 3) has nowhere near the training data necessary for reasonable output with foundation models. We mapped many of the challenges with this problem:
- There needs to be some way of organizing data creation and curation by a community, including incentives to do this work, focused attention on data gaps, and ways of doing quality control, especially if most content is coming from youth.
- There needs to be a benchmark that encapsulates all of the types of interactions we want to be able to support, such as asking conceptual questions about the language, getting examples of how to implement particular behaviors, and getting debugging support. The benchmark likely needs prompts, programs, test cases, and might need to be resilient to a diversity of potentially useful answers, and perhaps even wrong answers.
- It might be valuable to leverage and translate existing benchmarks to Wordplay, as well as even leveraging foundation models with a context of the programming language documentation.
- It might be necessary to synthesize content by creating variations of examples, or even by doing procedural generation of random programs.
- There are many questions about when to retrain, how to manage and communicate performance degradations, and how to manage the potential influence of multilingual localization on performance.
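A benchmark like the one described above could start as something quite simple: a structured collection of interaction cases, each pairing a prompt with the set of answers the community would accept as useful. Here is a minimal sketch in Python; the `BenchmarkCase` structure, the `score` function, and the Wordplay details in the example are all hypothetical, invented for illustration rather than drawn from any existing Wordplay tooling:

```python
from dataclasses import dataclass, field

@dataclass
class BenchmarkCase:
    """One benchmark interaction: a prompt, optional program context,
    and the diverse answers we'd accept as useful."""
    kind: str                  # e.g. "conceptual", "example", "debugging"
    prompt: str                # the question a learner might ask
    program: str = ""          # optional Wordplay code the question refers to
    acceptable: list[str] = field(default_factory=list)

def score(case: BenchmarkCase, model_answer: str) -> bool:
    """Naive grading: accept an answer if it contains any acceptable
    fragment. Real grading would need fuzzier, multilingual matching."""
    return any(frag.lower() in model_answer.lower() for frag in case.acceptable)

# A toy conceptual case (the content here is invented):
case = BenchmarkCase(
    kind="conceptual",
    prompt="What does a stream do in Wordplay?",
    acceptable=["changes over time", "sequence of values"],
)

print(score(case, "A stream is a value that changes over time."))
```

Even a naive scorer like this makes the resilience requirement concrete: a case holds several acceptable answers rather than one gold answer, which is what a diverse, youth-authored corpus would demand.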
All of the above felt like many interesting challenges for computing education, especially to the extent that we want to train small language models for educational programming languages that have particular properties under the control of learners and teachers. Maybe I’ll write a grant proposal on the idea.
Friday farewells
After a final breakfast, and a bit of final reflection about our learnings, we did an extended collaborative writing session. Our goal was to begin sketching out a manifesto, with the broad theme that creativity is the new productivity in software engineering, and that GenAI might be one catalyst for this transformation. We broke into five groups:
- Characterizing creativity
- Evidence about creativity
- Design spaces of creativity support
- Critical perspectives
- Synthesizing research
I worked with Richard Li, Guo Freeman, and Jeanette Falk on critical perspectives. We tried to organize the week’s thinking. Here’s a brief summary of what we wrote:
There are many ways of potentially using GenAI in software development to support creativity, but doing so uncritically risks overlooking potential negative impacts on developers, organizations, and society more broadly. A more critical use of GenAI would be intentional, in that all stakeholders who are designing and applying GenAI to software development would do so with a nuanced awareness of who it serves and who it does not, in what contexts, how, and what downstream impacts these uses might have on not only developers and organizations, but end users and society more broadly. Such work will also enable us to imagine more equitable and inclusive futures of GenAI in software development. We detail these many potential stakeholder groups, the many questions we might ask about them, and the many research methods that might be necessary to support this inquiry.
After we finished drafting, we made commitments to short, medium, and long term plans, visualizing them in a constellation of written commitments.
Overall, it was a warm, welcoming, and lively week. I wouldn’t say that the topic was exactly in the critical path for my research on computing education, but it was broad enough that I found plenty of space to think about research on learning and teaching computing. I even left with a fairly detailed vision for a grant proposal I want to write about generative AI. And that’s a first for me, as a genAI skeptic! It’s rare that I’ve found anything I want to use genAI for, let alone do research on, but I think the week helped me find a way to recenter power in a meaningful way, around a technology that is usually quite disempowering in its structure and governance.
I also spent some time thinking about this Dagstuhl as a pivot point. Not just because it was my 11th, perhaps the start of another batch of ten for the second half of my career, but also because the world is at such a pivotal point. Here is this technology that is destabilizing our markets, computational world, and politics. Here is this human activity of programming that I’ve focused on for much of my life, being transformed. All around me, my civil rights, my country, my institutions are under attack, probably transformed forever by a thoughtless man and his cult of ignorance and hate. I’ll return home this weekend still struggling to find my way through this increasingly uncertain world, but also secure in the belief that I will find smart, kind folks to navigate it with together.