Trip Report: Computing education at CHI

The conference was a zoo, but the animals played nicely. #myfirstchi

I wandered up to Montreal, Canada, for CHI, the ACM Conference on Human Factors in Computing Systems. I’ll share summaries of and thoughts about papers related to computing education (and programming and learning more broadly), and talk about plans for a new special interest group to support work at the intersection of HCI and the learning sciences.

While the theme of CHI this year was “engage,” I also recognized an emphasis on inclusion. From gender-neutral bathrooms, to live-streaming every session, to badges recognizing the 45% of attendees who were first-timers, to reserved seating for attendees with mobility diversities, to sessions on feminist HCI and gender in HCI, CHI 2018 was about making people feel welcome. There’s still work to be done (picking better opening keynote speakers?), but I really felt CHI strove to engage with diverse communities.

View from an after-party hosted by University of Toronto. After-parties provided a nice informal setting to converse and connect with others.

Learning to program: towards more diverse engagement

The Learning to Program session featured four papers from familiar and respected faces. A common theme across these papers was developing experiences for, and understanding how to engage, more diverse programming learners.

April Wang (Simon Fraser) presented an honorable mention 🏅 paper on how current learning resources fail conversational programmers: people who want a conceptual understanding of programming but don’t need to write code. Conversational programmers tended to consult informal online resources based on recommendations from experts, but found the learning process difficult because it took too much time, over-emphasized low-level concepts (syntax, logic), and lacked helpful explanations. April recommended developing content that emphasizes conceptual understanding rather than low-level syntax. How to develop conceptual understanding with minimal emphasis on programming is an open question. There seem to be patterns of interest in particular topics (e.g. machine learning, encryption), so perhaps we can start supporting conversational programmers by creating content that addresses their needs on a topic-by-topic basis.

Philip Guo (UCSD) presented on barriers, desires, and design opportunities for supporting non-native English speakers learning programming. He addressed the key disconnect between English-centric programming resources and a largely non-English-speaking world with an interest in programming. He found that programming-specific “jargon” in learning resources was often lost in translation, that searching online for relevant resources was difficult, and that style guides and code itself (e.g. identifiers) often embedded English prose. ELL programming learners felt they were learning English and programming simultaneously. He recommended using simplified English and removing unnecessary jargon, creating high-quality code examples, supporting interactivity, and providing inline help. I think it’s critical to further consider how the English prose embedded in programming languages, style guides, and code artifacts impacts the experience of programming learners who are also English language learners.

Rahul Banerjee (UW) presented his honorable mention 🏅 paper on a text-free programming environment that empowers English-language-learning families to jointly engage in programming. Currently, nearly all programming environments (even blocks-based ones) contain at least some text, making it difficult for families with English literacy challenges to engage in coding workshops. His team developed BlockStudio, a blocks-based programming environment that is completely text-free: it uses programming by example to create games and teach programming. They then investigated how families mutually engaged with BlockStudio and co-created games. This work is exciting because it supports the inclusion of English language learners and challenges a long-standing design assumption that programming languages must have text!

BlockStudio uses programming by demonstration to create rules: (a) click the spaceship to trigger an event, (b) drag the spaceship up to demonstrate the change, (c) click the check mark to end the demonstration, (d) clicking the spaceship now moves it up. (from Figure 2, Banerjee et al., CHI 2018)

Sayamindu Dasgupta (UW) investigated how “wide walls” can increase engagement in Scratch. Sayamindu previously created and evaluated Scratch cloud variables, which let Scratch users incorporate data about their own projects (e.g. blocks used) and community participation (e.g. number of likes) into new projects. He and Benjamin Mako Hill empirically investigated whether access to these cloud variables resulted in more use of data structures, a challenging concept in Scratch. They found that a policy change in an online community (giving Scratch users access to their own data) could broaden participation (in this case, use of data structures). Scratch’s robust community continues to be an excellent environment for providing empirical evidence for theories of learning.

Augmenting learning: learners interacting with AI

My research interest is in augmenting learning by developing experiences that adapt as a learner develops. There were some very cool papers at CHI that used adaptive tools to do just that.

Molly Feldman (Cornell) presented on automatically diagnosing students’ misconceptions in K-8 math, joint work between Cornell, Penn, MSR, and UW. They developed a tool that uses program synthesis to identify misconceptions, even ones that have never been seen before. Given a student’s solved math problems as input, the tool outputs code representing the student’s problem-solving process. Molly went even further, turning this code into a GUI so teachers could use it as well. This paper deserved an award: it is a fantastic demonstration of an adaptive tool that “closes the loop” by identifying a misconception and then making the finding understandable to an instructor. I think the ability to synthesize a problem-solving process from examples to identify a misconception demands further investigation! #MustRead

Feldman et al.’s system to identify misconceptions has two major components: a thought-process reconstruction engine which uses program synthesis and a GUI for displaying reconstructed thought processes to an educator. The input is a set of problems solved by a student (possibly incorrectly). The engine attempts to synthesize a computer program from the input problems to try to explain what the student was doing. This program is then passed to the GUI, which automatically produces a step-by-step tutorial explaining the error to an educator. (Figure 1 from Feldman et al. CHI 2018)
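To build intuition for the synthesis idea, here is a toy sketch (not Feldman et al.’s actual engine, which synthesizes full programs) of the core loop: enumerate a small space of candidate solving procedures, including known “buggy” ones, and keep whichever procedure reproduces every answer the student gave. All names and the candidate bug are illustrative, borrowed from the classic “subtract smaller digit from larger” subtraction error.

```python
# Toy illustration of misconception diagnosis via program search.
# NOT Feldman et al.'s system; it only sketches the idea of finding a
# procedure ("thought process") that reproduces a student's answers.

def correct_sub(a, b):
    """Standard subtraction."""
    return a - b

def smaller_from_larger(a, b):
    """Classic bug: in each column, subtract the smaller digit from
    the larger one, ignoring borrowing entirely."""
    result, place = 0, 1
    while a > 0 or b > 0:
        da, db = a % 10, b % 10
        result += abs(da - db) * place
        a, b, place = a // 10, b // 10, place * 10
    return result

CANDIDATES = {
    "correct subtraction": correct_sub,
    "subtracts smaller digit from larger (no borrowing)": smaller_from_larger,
}

def diagnose(worked_problems):
    """Return names of candidate procedures consistent with every
    (a, b, student_answer) triple the student produced."""
    return [name for name, proc in CANDIDATES.items()
            if all(proc(a, b) == ans for a, b, ans in worked_problems)]

# A student who never borrows: 52 - 17 -> 45 (instead of 35).
print(diagnose([(52, 17, 45), (83, 29, 66)]))
# -> ['subtracts smaller digit from larger (no borrowing)']
```

The real system searches a far richer program space rather than a fixed list of hand-written bugs, which is what lets it surface misconceptions nobody anticipated.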

David A. Shamma (FXPAL) talked about adding recommendation systems to MOOCs to support professional learners. He and his colleagues investigated the effects of adding a map of resources related to a topic of study in a MOOC (see figure below). They found that professional learners could locate resources related to what they were learning more efficiently with the topic map. I like the idea of providing a topic map to learners, and I am curious about the design of the interaction; a topic map inevitably shows topics that are poorly understood or unknown to a learner, potentially causing unnecessary confusion.

Bruce McLaren (CMU) presented “Educational Game and Intelligent Tutoring System: A Classroom Study and Comparative Design Analysis” on behalf of his CMU colleagues. This work compared algebra learning outcomes from a popular educational game (DragonBox) and an intelligent tutoring system (ITS) developed at CMU. They found that students who used the ITS saw improved learning outcomes, whereas students who learned with DragonBox did not. They theorized that DragonBox failed to produce learning gains because the game had too much scaffolding, or because its tasks were too different from solving algebra problems. They recommended drawing on the best features of both systems, merging the personalization and feedback of ITSs with the narrative context of games.

Improving Feedback

Tricia J. Ngoon and C. Ailie Fraser (UC San Diego) presented their honorable mention 🏅 work on CritiqueKit (demo), which supports feedback on creative work (UI design) at scale. They developed an adaptive tool that provides real-time guidance to improve feedback and make feedback reusable, defining good feedback as specific, actionable, and justified. I found this work very strong because it supports human augmentation, delivers relevant feedback, and has a thorough evaluation.

Paul Denny (Auckland) presented honorable mention 🏅 work empirically evaluating the effect on learning outcomes of adding gamification (a badge system, points) to an optional tool. This work sought to provide empirical evidence for Landers’s theory of gamified learning. Paul and his colleagues found a relationship between gamification (badge system vs. point system) and learning outcomes, mediated by self-testing behavior. The improvement was most pronounced among high-performing students, which makes sense, since high performers are the most likely to use an optional formative assessment tool. They were unable to collect data on learner human factors (e.g. gender, year, mindset, self-efficacy). This study does a good job of using the classroom to investigate a targeted manipulation.

Cool tools: augmentation to make learning to code easier (or unnecessary)

My criteria for tools that I could see being integrated into a learning experience are that 1) they align with a learning objective, 2) they support pedagogy in a meaningful way, and 3) they are magical. While these tools were not explicitly designed with pedagogical implications in mind, I could see them inspiring or supporting the learning of programming or data science. And I’m pretty sure they’re magic.

Elena L. Glassman and Tianyi Zhang (Berkeley) sought to improve the usability of APIs by visualizing the skeleton underlying API usage examples. Their tool, Examplore, segments code by the parts of an API and provides a visualization of that skeleton. By looking at different examples with the API skeleton visualized, programmers unfamiliar with a Java API were able to answer questions about API usage more accurately.

Examplore takes an API call that a programmer is interested in, locates uses of that API call in a large corpus of mined code examples, and then produces an interactive visualization that lets programmers explore common usage patterns of that API across the corpus. (Figure 1 from Glassman et al., CHI 2018)
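The aggregation idea behind Examplore can be sketched very roughly: given many mined snippets that use an API, tally how often each slot of a canonical usage skeleton appears, so a learner can see which parts are near-universal and which are optional. This is a simplistic stand-in (substring matching over hypothetical snippets), not Examplore’s actual program analysis or its visualization.

```python
# Toy sketch of aggregating API usage examples into a "skeleton" tally
# (not Examplore's implementation, which uses real program analysis).
from collections import Counter

# Hypothetical mined Java snippets using FileInputStream.
snippets = [
    "FileInputStream in = new FileInputStream(path); in.read(buf); in.close();",
    "FileInputStream in = new FileInputStream(f); in.read(buf);",
    "try { FileInputStream in = new FileInputStream(f); in.read(buf); in.close(); } catch (IOException e) {}",
]

# Skeleton slots and a crude text pattern signaling each one.
SKELETON = {
    "constructor call": "new FileInputStream(",
    "read call": ".read(",
    "close call": ".close(",
    "try/catch guard": "try {",
}

# Count how many snippets exhibit each skeleton slot.
counts = Counter(slot for snippet in snippets
                 for slot, pattern in SKELETON.items()
                 if pattern in snippet)

for slot, n in counts.most_common():
    print(f"{slot}: {n}/{len(snippets)} examples")
```

Seeing that, say, every example constructs and reads but only some close the stream inside a try/catch is exactly the kind of pattern the real visualization surfaces at a glance.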

Andrew Head (Berkeley) and his colleagues developed and evaluated CodeScoop, which interactively extracts examples from existing code. Addressing the difficulty of producing working examples, they created a mixed-initiative tool that takes working code as input, lets a user select a segment they want to turn into an example, and then interactively supports the creation of that example. Code examples are often a key part of understanding a new programming concept or tool, so I see applications of this work to curriculum development!

Extracting example code from existing code with CodeScoop: (1) a programmer selects a few lines they want to share. To help programmers make complete, compilable examples, CodeScoop detects errors and recommends fixes by (2a) pointing to potentially missing code and (2b) suggesting literal values from the program trace that can take the place of variables. (3) It also recommends code the programmer may have overlooked, like past variable uses and nearby control structures. (from Figure 1, Head et al., CHI 2018)

Ethan Fast and his colleagues at Stanford developed Iris, a conversational agent that turns data science into a chat and direct-manipulation process. The target audience for Iris is data scientists who know what they want to do (e.g. make a scatter plot with this data) but aren’t familiar with programming for data science. I really like this idea of helping computational social scientists, biologists, journalists, etc. do data science without the burden of programming. If this tool were used to teach skills like data exploration or statistical testing, I would like to see more opportunities for learners to develop a conceptual understanding of what’s happening “under the hood.”

Example dialogue with the Iris agent, in which a data scientist creates a scatterplot with data from two data frames (from Figure 2, Fast et al., CHI 2018)

Computational notebooks: Being able to annotate code doesn’t mean you will

Computational notebooks, such as Jupyter (formerly IPython) notebooks, have become popular recently, especially among data science communities. These notebooks afford annotating blocks of code with text. Two papers investigated whether data scientists actually use this affordance. Spoiler: they don’t.

Mary Beth Kery (CMU) and her colleagues investigated whether data scientists use Jupyter notebooks to produce narratives and keep records of their experiments. This follows the older philosophy of literate programming (Knuth), in which code is interleaved with prose explaining it. They found that data scientists used notebooks for experimentation, as production pipelines, and as sandboxes, and that annotations were underused, suggesting notebooks were not typically used to create a narrative. Instead, data scientists followed an expand-reduce process: trying small snippets of code across different cells, then iterating by merging useful code and eliminating unnecessary code. This work suggests data scientists must iterate on and curate their Jupyter notebooks to make them presentable.

Adam Rule (UCSD) analyzed how computational notebooks (Jupyter notebooks, academic notebooks) used text to describe analyses in his honorable mention 🏅 paper. Similar to Mary Beth’s findings, they found that notebooks as a whole lacked text (see figure below). In a qualitative follow-up, they found that notebooks were typically not in a shareable state without further iteration. Taken together, Mary Beth’s and Adam’s studies suggest that iterative curation is typically required before a computational notebook can explain a data analysis process.

Jupyter notebook length as measured by cells, lines of code, and words of markdown. While only 2.2% of notebooks had no code, 27.6% had no text. (from Figure 2, Rule et al., CHI 2018)

Closing Thoughts: A new SIG on computers & learning?

On the final day of the conference, I invited “anyone thinking about computing education” to meet up at a break so people could meet each other and discuss the future of papers on learning at CHI. In total, 10 professors and students from the University of Washington, Northwestern, Newcastle (London), Nebraska Omaha, and Auckland (New Zealand) met. Profs. Brian Dorn and Jason Yip noted that CHI is responding to an increase in papers related to learning and is forming a paper-reviewing subcommittee on families and learning for next year. We identified CHI as an opportunity to disseminate findings on the design and understanding of learning experiences, and agreed that more could be done to support learning scientists. To better support the learning sciences within the HCI domain, we plan to create a special interest group (SIG) within CHI. Details forthcoming, but please reach out to me if you are interested in learning more! (pun definitely intended)

CHI community joining Choir! Choir! Choir! on stage to sing “Sweet Caroline.” Shivers. Literal. Shivers.

CHI’s large and diverse community includes many open-minded researchers thinking about learning whom you would otherwise miss, because learning is not their primary emphasis (so they don’t submit to learning conferences). To sum up: I encourage researchers thinking about learning and learner interactions with technology to engage with CHI, because you will walk away with new perspectives and connections!