Koli is pretty far north. It’s also pretty!

Koli Calling 2019 trip report: Computing education research at the limits

Published in Bits and Behavior, Nov 24, 2019

One perk of academic life is the endless opportunities to travel the world. I’ve been nearly everywhere, crossing North America, South America, Europe, Asia, and Australasia. And yet despite all of my travels, there’s always somewhere new to travel, full of new people to meet.

One of those places is northern Finland, far from the population center of Helsinki in the beautiful Koli region. This region is home to Koli National Park, which is 30 square kilometers of wooded hills and lake shorelines. Right on the water is the Break Sokos Hotel Koli, a cute little hotel and retreat center with a tasty breakfast buffet, very comfortable rooms with heated bathroom floors, a beautiful spa in the Finnish tradition, and an abundance of nature walks with gorgeous views of the lake. For 19 years, the Koli Calling conference has been held at this hotel, bringing together 50+ people from across the world to discuss rigorous contributions to computing education research.

A photograph of my Finn Air flight departing. — My Helsinki flight leaving without me.

Of course, one of the costs of traveling the world is traveling. My travel was fraught from the beginning. I purchased my ticket some time ago, but then came out as trans and changed my name. I had to rush to get an updated passport with my new name and then change my name with all three airlines on the reservation (British Airways, Finn Air, and American Airlines) so that my tickets could be changed to match. That took a dozen hours of phone calls, emails, and faxing, and was only finalized last week.

The day of travel was no less turbulent. It began with British Airways announcing a defect in their flight plan software that left all flight plans unavailable. This delayed my departing flight by 2 hours. I broke a nail at the beginning of the flight, my seat mates talked the whole time, and, trying to make my connection, I sprinted from Terminal 5 to Terminal 3 over a mile of Heathrow airport, but still missed my connecting flight to Helsinki. I even misplaced my passport along the way. But things picked up: someone found my passport and turned it in, and I was rebooked on slightly later flights to Helsinki and Joensuu. The conference organizers arranged a late taxi at 1 am. I eventually made it to the hotel and my bed by 2:30 am, and got a bit of sleep before the 8 am breakfast.

A photograph of the opening slide, showing Nick Falkner and “Koli Calling 2019.” — Nick Falkner opens the conference.

The conference started with a clear articulation of the retreat culture: a collaborative and inclusive community, a strong dose of “splendid isolation,” a welcoming place for work of all levels of maturity, and a commitment to the Finnish love of spa and the outdoors. Oh, and a squeaky pig to tell speakers when time is up. The 69 attendees came from all over, but mostly from Europe, including Finland, the Netherlands, Macedonia, the UK, Germany, Ireland, Norway, Estonia, and Sweden, as well as some attendees from the USA, Australia, Japan, Portugal, and Nigeria. For such a small venue, the publication activity was quite large, including 82 submissions, spanning 40 full papers, 40 short papers, and several posters.

Day 1: Sessions, Sessions, Spa

Session: Concepts and understanding

A photograph of Andreas in front of his opening slide. — The opening slide of Andreas’s talk.

Andreas Muehling presented a talk on a way of measuring students’ understanding of object orientation. This is part of a larger quest for developing “validated” instruments (context for the scare quotes will come later, when I summarize our work on assessment validation), broadening the set of educational measurement tools for supporting research and teaching. The specific focus of this work was on understanding the interaction between objects, such as messaging between objects and inheritance. Their assessments remove almost all procedural programming elements, retaining only a few classes, a small amount of state, and method calls that modify state, and focus on program tracing of state modification in encapsulation contexts. With some standard psychometric modeling, they built an assessment that is well-balanced in terms of item difficulty and suitable for early formative assessment of object orientation knowledge. This work is a good foundation for studying this learning, including a practical instrument, but it also reveals just how much work there is to do to mature measurement in our community: we need more instruments, better instruments, and instruments for a very wide range of rapidly evolving CS concepts.

A photograph of Gregor’s first slide, which says “It’s like computers speak a different language.” — Gregor’s first slide.

Gregor Große-Bölting presented the next talk on students’ conceptions of CS as a discipline as a factor in dropout. They interviewed 14 students in a first quarter CS class in higher education at the beginning of the class, and solicited definitions of CS from 311 students, investigating the variation in perceptions and how they contrast with professional views on CS. The study was primarily a replication. They found some who defined CS as programming, some as mathematics and logic, some who differentiated between hardware and software, some who thought of CS as a translation activity, and some who had no conception at all. By far, the most common response was having no conception. One of the biggest underlying factors is that only about 1 in 4 students persist in their program, making it hard to interpret the relationship between these conceptions and dropout.

A photograph of the analysis tool, showing code on the left and time complexity on the right. — Tapani demonstrating his tool.

The last talk was by Tapani Toivonen, who talked about how to teach time complexity analysis of algorithms. He scoped his work to primitive recursive functions, and approached instruction from a modeling perspective, helping students understand the asymptotic behavior of these functions. They embodied this in a tool that puts algorithms and time complexity behavior side by side, much like a profiler, but about overall behavior rather than specific test cases. The tool can support a diversity of kinds of analysis, but they haven’t evaluated any of these use cases yet.

Session: Curriculum and Course Design

A photo of the title slide of the talk. — Christine starts the talk.

After a short break, the next session started with Monica McGill and Christine Liebe, who presented an international comparison of K-12 education with a huge list of coauthors. They conducted an international survey across Australia, England, Ireland, Italy, Scotland, Malta, and the United States. They had 244 responses from K-12 teachers. The particular question was the difference between intended curriculum (national reports and curricular standards) and enacted curriculum (what actually happens in classrooms, including instruction, learning technology, and pedagogy). They contributed an instrument called METRECC, which can be found on csedresearch.org. They launched a small pilot study, which itself had a few interesting findings. Most teachers identified as women, half had taught for more than 12 years, most were teaching in secondary, and variation across the standards was exceptionally high. Really the only consistent trends were the teaching of programming skills, and that primary school students were taught with visual editors and secondary students with text editors.

A diagram showing all of the students, curricula, and concepts. — Vangel motivates his work.

Vangel Ajanovski presented the next talk on a “Body of Knowledge Explorer,” helping students over time grasp the totality of what they’re learning in CS. The work had a fascinating motivation, which was the dramatic, never-ending evolution of the programs and curricula. They used the ACM curriculum guidelines as their basis. They did an incredible amount of work to analyze the guidelines, translate them to Macedonian, create a mapping to their curriculum, and then visualize them. They haven’t evaluated it yet, so it’s not clear how it might help students, but the content analysis was certainly impressive in scope.

A slide showing a Parsons problem, showing an unordered list of statements. — Barb explaining Parson’s problems.

Barb Ericson presented a continuation of her work on adaptive Parsons problems, investigating students’ experiences with adaptivity. The particular kind of adaptivity removes distractors, provides indentation hints, and combines blocks after it detects that a student is experiencing difficulty, as well as doing “outer loop” adaptivity, making successive problems easier or harder. Barb studied teachers using this to develop CS content knowledge, and found that 1) teachers noticed the adaptivity, 2) adaptivity helped prevent teachers from giving up, and 3) adaptivity helped teachers identify errors.

A photo of Monica’s slide on the replication crisis. — Monica describes the replication crisis.

Monica McGill presented another paper, this one about the replication crisis, and how it applies to computing education. She mentioned a journal article that I worked on, which showed that replication is quite rare in computing education, but not rarer than in other fields (and possibly more common). But she pointed out that before we can have a replication crisis, we also just need better empirically-driven best practices. Her talk was intended to spur discussion; I wondered about the challenges of trying to replicate in our field, where diversity is inescapable and our phenomena are so socioculturally bound. Which phenomena span all of our contexts, and which are contextually bound? My student Greg Nelson asked about how we can incentivize publishing negative results, and what minimum bar we might set for publishing them. Monica suggested open science initiatives as one solution; others suggested ensuring that our publication venues allow enough space for the details needed to support replication.

Lunch

I had lunch with Juha Sorva and two Finnish PhD students. We had a fascinating conversation about the parallels between academic writing and program writing. One of the more interesting themes was the difficulty people often have taking the perspective of their audience: who will read a research paper or program, what will they know, what are their beliefs? This is not only one of the big gaps in skills, but it’s also behind many of the problems in program and paper writing.

Session: K-12

Judy Sheard presenting in front of her title slide. — Judy Sheard starts her talk on her team’s literature review.

Judy Sheard presented a paper on a global 15-year review (2003–17) on teaching introductory programming in schools. This particular work emerged from an ITiCSE working group two years ago. They followed systematic review guidelines, focusing on all research on introductory programming, finding 1,666 papers concerning higher education, and another 108 on K-12. The big question was how much we have advanced. Some of the trends they found include 1) there’s growth in this topic in both K-12 and higher ed, 2) most work focuses on computational thinking but not programming, and 3) there are serious gaps in CS as a discipline in K-12, with little professional development to fix this, and few resources to support teachers.

The opening slide, saying “An Examination of Abstraction in K-12 Computer Science Education.” — Christine presenting her opening slide.

The next paper was on abstraction in K-12 education, presented by Christine Liebe (a K-12 teacher and postdoc). They began with definitions of abstraction by Perrenet, Kaasenbrood, and Groote (2005) and Armoni (2013), and conducted an interview study of 12 teachers across K-12, mostly experienced secondary teachers who were new to teaching CS. She found that 1) teachers had a sense of abstraction, but did not teach or assess it, 2) those that did tended to have CS backgrounds, and 3) some teachers assumed that students would learn abstraction implicitly.

Radu’s slide showing an image being analyzed to detect an object. — Radu describes object recognition.

Radu Mariescu-Istodor and Ilkka Jormanainen presented on teaching machine learning in K-12, in the domain of object recognition. They described an approach to teaching object recognition through a series of metaphors, and experiences running a 2-hour workshop in Romania. The interesting thing about the work was less the instruction, and more the intricate effort to build upon prior knowledge of matrices, geometry, sketching objects on paper, and testing a machine learned classifier. It was an experience report with a surprising amount of instructional richness.

Session: Metacognition and Computational Thinking

James Prather started the next session, presenting a continuation of work on metacognitive scaffolding (in collaboration partly with my PhD student Dastyni Loksa, who also studies metacognition in programming). The prior work, published at SIGCSE, was a qualitative analysis of an intervention that involved having students mentally execute a test case prior to solving a problem to develop awareness of how to solve the problem. The central contribution of this work was a larger replication (n=976) of that study, amenable to quantitative analysis. They found that the intervention didn’t result in different completion rates, but it did reduce errors that resulted from an incorrect mental model of the problem (though it led to other errors). This raises some questions about how frequent misunderstanding the problem is, and how consequential it is relative to other metacognition issues.

A photograph of Friday on the left, and a title slide on the right. — Friday opening his talk.

The next talk, presented by Friday Agbo, was another systematic literature review, this time on computational thinking (CT) in higher education. This is an interesting phenomenon, because CT has generally been used in K-12, not higher education. The authors found an interesting set of works on applications of CT in non-computer science contexts that I wasn’t aware of at all.

A slide showing eye tracking results in the two domains, including many circles overlaid on programs and music notation. — Natalia discussing eye tracking as a method to understand the parallels between music and programming language notation.

Natalia Chitalkina presented the last talk of the session, investigating the parallels between source code and music notations. The paper presented a fascinating theoretical account of human perception, the role of prediction in the perception of notations, and the numerous parallels between music notation and programming language syntax. It wasn’t clear to me where this analysis leads, but it was an interesting and thought-provoking analysis.

Session: Motivation and Skill Development

A photograph of teachers and small robots used in the courses. — Bianca shows her team of people and robots.

In the last session of the day, Bianca Bergande presented a comparison of Nao Robots and Lego Mindstorms on intrinsic motivation of CS majors (a la self-determination theory). I didn’t observe a particular explanation for why these different robotics platforms would result in different outcomes, but there were differences: the Nao robots were superior at changing intrinsic motivation. The authors (and I) speculate that the anthropomorphization of the Nao robots was the important factor, particularly because the scales were sensitive to students’ self-reported “relationship” to the robot. The Nao talked, expressed emotions, and danced, whereas the Lego was merely embodied. However, the tasks were different between the two platforms, so it’s possible there were task features that explain the difference.

A photograph of a diagram of the 3 meetings, which happened over time prior to a final presentation. — Richard describes his intervention.

Next, Richard Gault presented a short paper on supporting undergraduates in research. Their goal was to develop a series of meetings that would be more useful for immersing students in a research community. With 8 students, they planned a seminar once a month, each with 3–5 student speakers presenting informally to their peers. Students described a lot of evidence of peer learning and confidence in communication skills.

The last talk of the day was by Doudou Fall, who presented a game that teaches several concepts about developing secure Internet of Things devices. Some of the skills included risk assessment, vulnerability assessment, and communication skills. They followed a competitive card game paradigm, and found that students were engaged and reported learning, though not significantly more so than with other related games. It’s unclear whether they actually learned, but the effects on motivation were more evident.

A panoramic shot of four doctoral students preparing for lightning talks. — The doctoral consortium students present.

The last session wrapped up with some very quick lightning talks from doctoral consortium students, including my student Greg Nelson, who talked about his work on programming language learning and assessment. Other students talked about concurrency, cybersecurity, and community aspects of employability.

Spa Time

A selfie of me, the Azul game, Nick Falkner, and others. — I played a round of Azul, a weird combination of constraint satisfaction and playing with tiles.

One of the traditions of Koli is having spa time, including hot tubs and saunas. The organizers were conscious of the fact that I might not be comfortable participating, having recently transitioned socially, and offered to arrange alternate activities. This was greatly appreciated: I am so far from being comfortable in a bathing suit in private, let alone in public! I happily engaged in the organizers’ alternate plan of playing board games over drinks, so I spent the evening connecting around games, including Azul and That’s a Question.

A photo of several smiling attendees posing in front of board games. — Several attendees got to know each other through a hilarious game of “That’s a Question.”

Day 2: Keynote, hiking, sessions, and posters

After waking up way too early, eating another tasty breakfast, and having an enriching conversation with Judy Sheard and Mark Guzdial about tenure, promotion, feedback, and resilience, I set up for my keynote.

A recording of my keynote, at an ominous angle.

Keynote: 21st Century Grand Challenges for Computing Education

I was very excited to give my keynote, and hopeful it would change some minds about the kinds of research questions we choose. Rather than detail it here, see my separate blog post, which includes links to my recorded presentation, rehearsal talk, and slides. The gist is that I believe CS education is uniquely positioned to address epistemic challenges in the world such as the destabilization of democracy, climate change, disinformation, and marginalization, largely because it has positioned itself as the key infrastructure behind these problems. It was wonderful to speak to an open-minded audience about these challenging issues. The response was wonderful, and transformative for some. I suspect many attendees are highly motivated to address these issues, but just don’t know how. That’s precisely why CS education research is critical: we need to invent these ways and understand what works.

Session: Assessment

After a brief break, presentations resumed, this time on feedback and assessment. Apologies for the brief summaries; I gave some of my attention to posting about my keynote.

Anni Rytkönen and Venla Virtakoivu presented a study of assessment in a programming course. They wanted to understand the value of different types of exam contexts with varying time restrictions and space restrictions. They found that students preferred writing code more authentically on computers and not paper.

Sten Mäses presented a method for adding cyber-ethical questions to a cybersecurity course. He used the Oxford Utilitarianism Scale on deontological and utilitarian views, along with a cybersecurity attitude scale, to get a sense of students’ ethics. He taught ethical dilemmas, game theory, different ethical philosophies, and intellectual property. Students didn’t change much on the ethical scales after the course, but qualitative investigations into the labs revealed a lot more of the changes in students’ thinking around risk.

Steven shares some of the clear benefits of peer assessment in teamwork.

Steven Bradley presented work on removing bias in peer evaluations in team contexts. He investigated the reliability and validity of peer assessments relative to instructor assessments. He found that students’ assessments are reasonably valid, and that students get value out of the process, but the work left open many questions about reliability.

Greg describes the design of our formative assessment.

The last talk in the session was by my PhD student Greg Nelson on contributions of a formative assessment of program tracing skills. The central idea of our work was to develop a granular assessment for tracing, making it easier to provide learners precise feedback about what they do and do not understand about programming language semantics, rather than all-or-nothing correct or incorrect feedback. We achieve this by decomposing semantics into many granular types of questions, and precisely modeling what learners do and do not get right. These ideas came from an extension of recent ideas about assessment validity in educational assessment.

Nature walk

A selfie of me bundled and smiling with the lake in the background. — A selfie at the top of the hill, facing east towards Russia.

After the session and a quick lunch, most of the attendees went out for walks of various lengths into the park. The nature, as cold and foggy as it was, was incredible and raw. I had many fascinating conversations with attendees about my keynote, ranging from changing CS curricula, to the complexity of accepting social responsibility in CS education, to the need for a wide range of examples of how to reshape CS education around the concepts of the limitations of computing and data literacy.

Session: Posters

A photograph of desks, posters, and attendees mingling. — The edge of the poster session, crammed into the tiny Koli conference room.

After the walk, we came back for a poster session, and I got in many enlightening arguments with attendees about just how responsible they are for dealing with the complexity of the world. Some said that as long as they only focused on algorithms, they could isolate themselves from any issues with the data or misinterpretations of their program’s output. I argued they were still responsible. Others shared that they were interested in CS because it was a shelter from the complexity of the world. I appreciated that desire, as I shared it when I was younger, but argued that this didn’t mean we weren’t responsible. I left the session convinced that one of the biggest barriers to changing CS education will be convincing computer scientists to accept their power, their role in society, and their duty to use their power responsibly.

Session: Teaching programming

A photo of Mark describing Vega Lite for data visualization. — Mark gives an example of working with social science educators.

Mark Guzdial presented on task-specific programming languages. His basic argument was about integrating CS into the rest of academia, and an exploration of challenges in doing it. He talked about task-specific programming as a form of highly usable and easily adoptable programming for non-CS classes. Unlike domain-specific languages, Mark’s task-specific languages focus on much narrower, more specific tasks than entire domains. One example he gave was Vega Lite, and the environment that supported it, which enabled teachers to start programming visualizations in less than 10 minutes. Another example was an image filter authoring language that uses matrix manipulations. Mark reported on a long journey of iteration, getting insights from teachers about how languages can help them teach and help their students learn.

A photo of a list of programming languages used, including Scratch, Python, Arduino, Mindstorms, and others. — Fenia discusses the long tail of programming languages used in coding clubs in The Netherlands.

Fenia Aivaloglou presented on coding clubs from a gender perspective. They specifically investigated learning barriers and gender differences that teachers reported. Some of the most notable findings were the massive diversity of programming languages taught (though most were Scratch), that most teachers had CS backgrounds but no education in teaching, and that debugging (as usual) was among the biggest challenges. The work found that teachers reported a range of gender stereotypes, across a range of countries in Europe, as well as India.

A photo of the two programming environments, showing code samples, toolbars, and output. — Mazyar describes the environments.

Mazyar Seraj presented a study about the influence of Scratch and Blockly on girls’ programming skills and attitudes. They engaged girls aged 10–14 with these two environments in informal learning settings after school; both led to learning gains, but not significantly different ones, and both had mixed impacts on a measure of enjoyment of programming.

Viren gave us context about Indian education.

The last talk of the conference was on bilingual learners, presented by undergraduate Viren Abhyankar. With his coauthors, he explicitly investigated the use of natural language in Tamil Nadu in southern India in CS education. Some are taught in English, some are taught in Tamil, and the students who don’t know English struggle significantly more since they have to learn both English and programming at the same time. They wanted to understand whether bilingual education was more effective than just one language at a time. They compared English only to English and Tamil, and used code switching to use English for definitions and facts, but Tamil for deeper instruction. They didn’t end up finding a well matched control group for prior knowledge, and so they observed few differences quantitatively, but qualitatively, students in the bilingual group asked many more in-class questions.

Attendees eating salad, chicken, fries, and a cute little chocolate cake. — We had a lovely three course meal after the last session.

Reflection

The rest of the evening was networking, more spa time, dinner, and checkout, which left plenty of time to discuss and reflect. Since this was my first time at Koli, and there are many people in the computing education research community who probably haven’t been, I have a lot of thoughts.

The venue is excellent. It’s so very cozy, clean, and unique. The rooms are fantastic, the showers are wonderful, the food is good, and the spa and saunas, even though I didn’t visit them (boo to trans insecurities), were a highlight for many. I can see why so many Finns travel hundreds of kilometers for a relaxing weekend.

The community is great. The Finns are so welcoming, so engaged, and so constructive. People presented work in all kinds of stages of maturity and rigor, and every single one of the questions reached for ways to both make the most of the contributions and help improve them.

As with many computing education research venues, the quality of the discoveries was highly variable. Some papers were outstanding in their use of the latest methods, approaches, and techniques for understanding learning and teaching. Others were theoretically and empirically sloppy. This is more a reflection of the state of our field, and less a statement on Koli. The same concerns were raised at ICER 2019 in August, which I see as a great sign that there are better researchers joining the community every year.

If it weren’t so far away, and if it weren’t right before the U.S. holiday of Thanksgiving every year, I’d consider coming every year, for the peaceful retreat and wonderful company. I highly recommend coming at least once, and skimming its proceedings each November. And if that doesn’t entice you, come for the stunning, chilly, life-threatening views of Russia, like Greg did :)