A figure from Benji Xie’s 2019 paper “A Theory of Instruction for Introductory Programming Skills”, which has formed the basis for much of our work on the grant in the past 3 years.

Studying programming language learning: a 3-year recap

Amy J. Ko
Bits and Behavior

--

Ever since I started writing proposals to government research foundations, primarily the National Science Foundation (NSF), I’ve felt a strong civic responsibility to be as public as possible about what I’ve done with the money. NSF builds some of this into its requirements, asking investigators to write short 500-word project outcome statements, which appear on its websites. But that format always seemed too limited and formal to tell the full story behind a grant. To supplement it, I started blogging about grants that have ended, both as a way of telling that backstory and as a way of marking a milestone in my professional career, since grants have a major effect on my ability to conduct research, mentor students, and have impact. So far, I’ve shared the outcomes of three major grants: an NSF Computing Education for the 21st Century grant, which I used to explore the design of coding tutorials; my NSF CAREER grant, which I used to explore software help seeking and co-found a software startup; and an NSF Human-Centered Computing grant, which I used to investigate tools to support exploratory programming.

Now, a fourth grant for which I am PI is ending: an NSF Cyberlearning grant, which I’ve used to dive deep into theories of programming knowledge and practical tools for developing assessments of programming knowledge. The origins of this project came directly from a prior NSF grant on coding tutorials, which we’d used to create Gidget. In that project we’d focused intensely on learner engagement, trying to discover factors and designs that would keep discretionary learners engaged online. At the time, I was still a novice CS education researcher, mostly approaching problems of teaching and learning from an HCI perspective. This led us to focus so much on engagement that we left a major gap in our understanding of learning outcomes. I’d also felt personally guilty about diving into a project with so little understanding of the CS education literature. The result of that grant was an exciting body of work on learners’ attention and the design of CS learning technologies, but little insight into the foundational problems of learning we had engaged with.

Understanding learning, then, was my next goal. My new Ph.D. students Benjamin Xie and Greg Nelson and I read the immense literature on learning to code, understanding its discoveries but also its limitations. We were enticed by the community’s many empirical observations showing that learners struggled and what they struggled with. We also found many creative instructional interventions that were demonstrably effective. But what we didn’t find were foundational theories addressing basic questions: What is programming? How do we know that someone can do it? How might theories of programming inform instruction and assessment?

While Greg and Benji got to work tackling these questions in preliminary work, both funded by NSF Graduate Research Fellowships, I got to work fundraising to gather more resources to expand our team. I should note that this approach of finding a problem I want to work on and then finding a funding opportunity to support it isn’t a very efficient way of raising money or supporting a lab. If the goal is maximizing funding, it’s far more effective to monitor solicitations and write proposals for research that is likely to be funded. In well-established areas of science, this doesn’t require compromise, because there are usually “core” solicitations that cover the entire span of an established field, so scientists can simply propose the work they want to do and see if their peers and NSF want to support it. Unfortunately, this is not the case in CS education, where funding is erratic, distributed across many diverse and changing solicitations, and spread across many different academic communities. So I’ve found it best to focus on what I want to do with my students, then find solicitations that might support it.

Back in 2017, when we were fundraising, the NSF Cyberlearning solicitation seemed like the best fit. It was interested in basic research on learning, but also in computing innovations that would support CS learning. All that remained was conceiving of a 3-year project that would contribute a meaningfully new learning innovation while supporting basic discovery. I carefully read the solicitation, went to webinars about the program, read abstracts of recently funded projects, and started brainstorming projects that would fit my goals as well as the solicitation’s.

Here’s what I came up with: one of the major problems CS educators were facing was assessment. Assessments are hard to write, hard to make fair, and teachers often feel rudderless when creating them. What if we could automatically generate CS assessments that were reliable, valid, unbiased, and personalized? That would disrupt the conventions of educational assessment, which assume existing banks of test items. It would not only address a major pain point in CS education, but also require foundational discoveries in theories of programming, since part of assessment validity is having some theory of what is being assessed. And it would require entirely new approaches to developing assessments.

Since I didn’t have significant assessment expertise, I went searching for collaborators. I wrote to a colleague in our College of Education, Min Li, who had extensive expertise in educational measurement, particularly in science education. We met, I pitched my vague idea, and to my great luck, she was interested. She gave me some reading on the latest thinking about measurement and validity. After a bit of dialogue, we were able to write a three-year research design that would innovate in CS assessment in a way that might transfer to other disciplines, while also deepening theoretical foundations in CS learning.

Foundations

As tends to happen with NSF grants, our work started off focused, but our discoveries led to many pivots. The most exciting to me were the foundational pivots. We began by trying to understand what programming knowledge consists of. One of our first discoveries was that programming knowledge consists of at least four distinct kinds of knowledge:

  1. Reading programming language notations (e.g., understanding syntax and how it corresponds to semantics). This was related to program tracing skills.
  2. Writing semantics in programming language notation. We found that this was significantly different from reading semantics, in that it was really a translation task: identifying patterns of computation in the mind and finding constructs to express them.
  3. Reading templated patterns of computation in a programming language notation (e.g., recognizing that a series of statements was a swap, or that a loop was searching). This most closely corresponds to literature on program comprehension.
  4. Writing patterns of computation in a programming language notation. This most closely corresponds to algorithm design.

Our research showed that using distinct methods of direct instruction for each of these activities seemed to help with learning.
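
To make the distinction concrete, here is a small illustrative Python sketch of my own (not an example from our papers): tracing the statements line by line exercises the first kind of knowledge, recognizing the loop as a search or the assignment pair as a swap exercises the third, and producing such patterns from scratch exercises the fourth.

    # Illustrative only: two common "templated patterns of computation".
    # Reading the notation line by line (kind 1) is a different skill from
    # recognizing these patterns as units (kind 3) or producing them (kind 4).

    def find_first_negative(numbers):
        # A "search" pattern: loop until a condition holds, then stop.
        for i, n in enumerate(numbers):
            if n < 0:
                return i   # found one; report its position
        return -1          # sentinel: no negative number found

    print(find_first_negative([3, 8, -2, 5]))  # prints 2

    # A "swap" pattern: recognizable as a unit regardless of variable names.
    a, b = 10, 20
    a, b = b, a
    print(a, b)  # prints 20 10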

But our foundations went beyond this basic programming knowledge. My Ph.D. student Kyle Thayer had joined the lab when the project started, interested in studying API learning. He found that API knowledge is distinct from programming knowledge, in that it seems to involve three kinds of knowledge of its own:

  1. Knowledge of the domain concepts an API models (e.g., understanding an API that analyzes natural language might require understanding linguistic ideas like “parts of speech” and “grammar”).
  2. Knowledge of the hidden semantic rules that govern API execution (e.g., what happens when setState() is called on a React component).
  3. Knowledge of patterns of API composition used to achieve particular goals (e.g., what is typically found scattered across Stack Overflow).

Kyle’s dissertation went on to explore ways of automatically identifying the third kind of knowledge to support learning. (This work is to appear in ACM Transactions on Computing Education).
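
To make that third kind of knowledge concrete, here is a purely illustrative sketch using Python’s standard library; it is my own example, not an API or finding from Kyle’s papers.

    # Illustrative only: a small composition pattern of the kind Kyle studied,
    # built from Python's standard library rather than any API in his papers.
    import re
    from collections import Counter

    text = "the cat sat on the mat"

    # Domain concepts (kind 1): knowing what a "token" or a "word frequency" is.
    # Hidden semantics (kind 2): knowing that \w+ matches runs of word characters
    # and that Counter.most_common sorts by count, descending.
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(tokens)

    # Composition pattern (kind 3): tokenize-then-count is a recipe that shows
    # up again and again in answers online.
    print(counts.most_common(2))  # [('the', 2), ('cat', 1)]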

Another new Ph.D. student, Alannah Oleson, began exploring the intersection between CS knowledge and design knowledge, finding that there were actually two forms of design that learners engaged in:

  1. Program space design, which involved finding algorithms that would meet requirements. This was much like Benji’s fourth type of knowledge.
  2. Problem space design, which involved finding ways of framing what problem is being solved in the world.

Alannah’s work observed that these two are often confused and interleaved in CS education, making them both harder to learn. (This work is to appear in ACM Transactions on Computing Education).

These three basic theoretical ideas about programming knowledge formed the basis of much of our other work, helping us focus our interventions and innovations on particular discrete types of knowledge. In fact, along the way, Greg got really interested in how we were using theories to inform our work, and even wrote an award-winning paper about the use of theories in CS education.

Learning technologies

While we were working on these foundations, we were also building prototypes to teach programming languages, which helped us shape our theoretical ideas. Greg worked on PLTutor, a JavaScript tutorial that taught syntax and semantics by focusing specifically on program tracing skill, finding that very low levels of granularity in direct instruction promoted rapid learning, especially by students with particularly weak prior knowledge.

Benji worked on Codeitz, a Python tutorial that embodied the distinctions he’d discovered with Greg on reading and writing. Benji recently shared Codeitz in a Learning at Scale paper, in which he investigated the extent to which promoting learners’ agency in selecting learning tasks influenced learning (it didn’t).

Greg built upon these ideas, along with Kane’s seminal work on validity in educational measurement, to develop a technique for procedurally generating valid assessments of program tracing knowledge. What was particularly notable about this paper was the extent to which it bridged theories of validity with theoretical ideas about programming language semantics.
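
To give a flavor of what “procedurally generating” an assessment item can mean, here is a deliberately simplified sketch of my own; Greg’s actual technique is far more sophisticated and is described in the Koli Calling paper in the references.

    # A rough sketch of procedurally generated tracing items, for illustration
    # only; see the Koli Calling paper below for Greg's actual technique.
    import random

    def generate_tracing_item(seed):
        # Build a tiny program and compute the answer a correct trace would give.
        rng = random.Random(seed)
        a, b = rng.randint(1, 9), rng.randint(1, 9)
        op = rng.choice(["+", "*"])
        code = f"x = {a}\ny = {b}\nprint(x {op} y)"
        answer = a + b if op == "+" else a * b
        return code, answer

    item, key = generate_tracing_item(seed=7)
    print(item)  # the program shown to the learner
    print(key)   # the value used to score their trace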

Assessment Validity

As we developed these foundations and explored innovations, Min’s Ph.D. student Matt Davidson began applying theories and methods from educational measurement to CS. With Benji, he published a paper demonstrating the use of item response theory to explore validity. He recently submitted a similar paper demonstrating the use of differential item functioning (DIF) analysis to identify item bias. And his ongoing work considers the impact of item features on validity.
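
For readers unfamiliar with item response theory, here is a toy numeric sketch of my own (not Matt’s analysis): under the one-parameter Rasch model, the probability that a learner answers an item correctly depends only on the gap between the learner’s ability and the item’s difficulty.

    # A toy illustration of item response theory, not Matt's analysis: under the
    # one-parameter (Rasch) model, a learner with ability theta answers an item
    # of difficulty b correctly with probability 1 / (1 + exp(-(theta - b))).
    import math

    def rasch_p_correct(theta, b):
        return 1.0 / (1.0 + math.exp(-(theta - b)))

    # The same learner (theta = 0.0) is more likely to get an easy item
    # (b = -1.0) right than a hard one (b = 1.0).
    print(round(rasch_p_correct(0.0, -1.0), 2))  # 0.73
    print(round(rasch_p_correct(0.0, 1.0), 2))   # 0.27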

Throughout all of this work, the lab explored numerous aspects of sociocultural context and their impact on assessments. We studied CS transfer student experiences, finding that transfer students’ academic performance often dipped initially while they adjusted to their new social and academic environments. We studied informal CS peer mentorship amongst high school students, finding that assessments were a distraction from interest and identity development. We studied career aspirations, finding that assessments can warp how students see authentic practice (this work is in review). And we’ve recently studied how the COVID-19 pandemic is shaping the self-assessment that adult learners do when learning programming languages online (this work is ongoing).

Impact

The academic impact of these three years of work has been substantial. We’ve published 17 peer-reviewed conference and journal papers, with 5 more in review; we’ve published 3 book chapters; two of my Ph.D. students have graduated and secured faculty positions; five of our undergraduates have pursued doctoral studies; and the publications emerging from this project have already been cited over 200 times.

But the impact has gone well beyond academia. As part of our research, we have taught three summer camps as part of the federally funded Upward Bound program. Codeitz was also used by a team of first-generation UW students of color to introduce dozens of high school students of color to programming. These efforts have reached more than 100 students to date, mostly low-income students of color who are the first in their families to attend college, from Asian, Middle Eastern, Hispanic, and Black communities in Seattle. We have also collaborated with Code.org on many efforts to adapt our innovations and discoveries to its teacher professional development, which has the potential to reach millions of K-12 students learning CS around the United States.

And for me personally, this grant has been hugely impactful on my expertise and network. I know far more about CS learning and assessment than I did three years ago; I have new collaborators throughout the College of Education; and the project has been the basis for most of my contributions to CS education over the past three years. It’s given me the experience and confidence to feel like a genuine CS education expert, rather than a dilettante. It’s also given me the confidence to start thinking not only about CS learning and teaching, but about CS teachers as well. In fact, preparing CS educators will be the subject of my newest NSF grant, Justice-Focused Secondary Teacher CS Education.

What’s always striking to me about summarizing a grant’s impact is just how little money all of this cost. The project budget was a mere $16K a month, funding a bit of my salary, the stipends and tuition of two Ph.D. students, and the hourly pay of several undergraduates. This is small by NSF standards, and yet it enhanced the learning of a hundred people directly, possibly millions indirectly, and is shaping the work of our research community. These impacts remind me just how significant even small research investments can be for society and the world.

References

Greg Nelson, Benjamin Xie, and Amy J. Ko (2017). Comprehension First: Evaluating a Novel Pedagogy and Tutoring System for Program Tracing in CS1. ACM International Computing Education Research Conference (ICER), 2–11.

Amy J. Ko and Katie Davis (2017). Computing Mentorship in a Software Boomtown: Relationships to Adolescent Interest and Beliefs. ACM International Computing Education Research Conference (ICER), 236–244.

Kyle Thayer and Amy J. Ko (2017). Barriers Faced by Coding Bootcamp Students. ACM International Computing Education Research Conference (ICER), 245–253.

Harrison Kwik, Benjamin Xie, and Amy J. Ko (2018). Experiences of Computer Science Transfer Students. ACM International Computing Education Research Conference (ICER), 115–123.

Greg L. Nelson and Amy J. Ko (2018). On Use of Theory in Computing Education Research. ACM International Computing Education Research Conference (ICER), 31–39.

Amy J. Ko, Leanne Hwa, Katie Davis, and Jason Yip (2018). Informal Mentoring of Adolescents about Computing: Relationships, Roles, Qualities, and Impact. ACM Technical Symposium on Computer Science Education (SIGCSE), Research Track, 236–244.

Benjamin Xie, Greg Nelson, and Amy J. Ko (2018). An Explicit Strategy to Scaffold Novice Program Tracing. ACM Technical Symposium on Computer Science Education (SIGCSE), Research Track, 344–349.

Greg L. Nelson, Andrew Hu, Benjamin Xie, and Amy J. Ko (2019). Towards Validity for a Formative Assessment for Language-Specific Program Tracing Skills. ACM Koli Calling International Conference on Computing Education Research, 1–10.

Benjamin Xie, Dastyni Loksa, Greg L. Nelson, Matthew J. Davidson, Dongsheng Dong, Harrison Kwik, Alex Hui Tan, Leanne Hwa, Min Li, and Amy J. Ko (2019). A Theory of Instruction for Introductory Programming Skills. Computer Science Education, 49 pages.

Benjamin Xie, Matthew J. Davidson, Min Li, and Amy J. Ko (2019). An Item Response Theory Evaluation of a Language-Independent CS1 Knowledge Assessment. ACM Technical Symposium on Computer Science Education (SIGCSE), Research Track, 699–705.

Dastyni Loksa, Benjamin Xie, Harrison Kwik, and Amy J. Ko (2020). Investigating Novices’ In Situ Reflections on Their Programming Process. ACM Technical Symposium on Computer Science Education (SIGCSE), Research Track, 149–155.

Benjamin Xie, Greg L. Nelson, Harshitha Akkaraju, William Kwok, and Amy J. Ko (2020). The Effect of Informing Agency in Self-Directed Online Learning Environments. ACM Learning at Scale (L@S), 77–89.
