Scalable teaching principles from educational psychology and cognitive science

Learning how to write programs, or at least how to “think computationally,” has gained national attention in the last year. Last September, New York City Mayor Bill de Blasio announced that all public schools in NYC will be required to offer computer science to all students by 2025. In January, the White House released its “Computer Science For All” initiative, “offering every student the hands-on computer science and math classes that make them job-ready on day one” (President Obama, 2016 State of the Union Address).

Schools like MIT, Stanford, Berkeley, and the University of Washington have expanded to accommodate hundreds or thousands of students in a single programming class. Many more students complete exercises on tutorial websites like Codecademy, Khan Academy, and Code School or enroll in coding schools and bootcamps like General Assembly and Hackbright Academy. One-on-one tutoring is considered a gold standard in education, yet the teacher-to-student ratio is only getting worse.

As we develop new tools, techniques, and curricula to serve this incoming wave of students, it is important to be grounded in, or at least knowledgeable of, the work that researchers in educational psychology have been doing for decades: teasing apart what it takes to learn something efficiently and well.

There are many factors inside and outside the classroom that have significant effects on learning but are beyond the scope of this article, e.g., peer groups and home environment (see Walberg for a more complete list, with effect sizes). There is an entire literature on developing sustainable communities of practice that foster student development and mastery, much the way a traditional judo dojo operates. There is also great work on how identity formation can help budding experts persist and thrive in a learning environment. For a high-level treatment of both those literatures as they pertain to engineering education, read this brief handbook for practitioners: Education theories on learning: an informal guide for the engineering education scholar.

However, I will focus on a few theories and concepts that specifically help teachers develop more effective presentations of information and exercises for practice. This short list of ideas, techniques, and benchmarks from educational psychology has guided my own development of tools for teachers teaching hundreds or thousands of programming students at once.

A Gold Standard

The intuition that a good personal tutor is best was solidified when, in 1984, Prof. Benjamin Bloom published a collection of evidence from his lab that confirmed its superiority. His graduate students found that elementary and middle school students who received tutoring in groups of one to three students performed, on average, two standard deviations better than the average student in a conventional 30-student class after just 11 sessions of instruction in probability or cartography. Bloom challenged the academic community to find a method of group instruction that was as good as one-on-one tutoring. That challenge, issued over thirty years ago, still stands as a benchmark that modern systems and techniques compare against.
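To make the two-sigma figure concrete, here is a minimal sketch that converts an effect size measured in standard deviations into a percentile. It assumes scores in the conventional class are roughly normally distributed, which is an assumption of this illustration; the function name is hypothetical.

```python
from statistics import NormalDist


def percentile_of_shifted_mean(effect_size_sd: float) -> float:
    """Fraction of a normally distributed comparison class that scores
    below a student sitting `effect_size_sd` standard deviations above
    that class's mean."""
    # Standard normal: mean 0, standard deviation 1 (assumed score model).
    return NormalDist(mu=0.0, sigma=1.0).cdf(effect_size_sd)


two_sigma = percentile_of_shifted_mean(2.0)
print(f"{two_sigma:.1%}")
```

Under that assumption, the average tutored student would outscore roughly 98% of the conventionally taught class.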

One-on-One Tutoring Practices

What are tutors doing that makes them so effective? Wood and Tanner have written a summary of what we currently understand about the mechanisms that propel tutored students to their positions two standard deviations above their conventionally taught peers. Effective tutors often have many of the characteristics of Lepper and Woolverton’s INSPIRE model (summarized in this table): superior domain and pedagogical content knowledge, nurturing and encouraging relationships with students, Socratic styles that prompt students to explain and generalize, progressive content delivery, and ample feedback directed at solutions, not students. One insight stands out: the value of self-explanation for fostering student learning.


Effective tutoring elicits self-explanations, that is, explanations that students generate for themselves (Chi et al. ’94). These self-explanations foster the integration of new knowledge. Students of tutors who fostered self-explanations by asking prompting questions like “Why?” and “How?” had learning gains similar to those whose tutors gave explanations, feedback, and extra information (Chi et al. ’01).

Deliberate Practice

Prof. K. Anders Ericsson is one of the foremost experts on how learners can efficiently acquire domain-specific knowledge and skills, like those necessary for becoming an effective programmer. Based on his work, Malcolm Gladwell popularized the “10,000-Hour Rule” in his book Outliers. However, if you ask Ericsson (as the hosts of Freakonomics did), Gladwell simplified it too much. Ten thousand hours or ten years is a ballpark figure for how long it can take to reach expert level, given that the student practices “deliberately.”

Deliberate practice is generally accepted to be goal-directed, effortful, not enjoyable, repetitive, accompanied by rapid feedback, and sustained only as long as the learner can concentrate fully on the task, i.e., no more than a few hours (Deliberate Practice). For example, rather than just playing pickup basketball games in the neighborhood, an aspiring professional player might design specific drills to work on their weaknesses. Teachers help facilitate deliberate practice because they can design appropriate exercises and provide feedback until the student can differentiate between good and bad performance and provide that feedback to themselves.

Recent work incorporating deliberate practice in large classrooms has demonstrated great benefits. A recent study of undergraduate physics classrooms found that, with deliberate practice as a base of the instructional design, improvements can approach and exceed Bloom’s 2-sigma threshold.

Zone of Proximal Development and Scaffolding

The concept of the zone of proximal development, or ZPD, was first introduced in the mid-1920s by the Soviet psychologist Lev Vygotsky. It refers to the gap between what a learner can do without help and what a learner cannot yet do, no matter how much help they are given. In other words, it covers the tasks a learner can complete only with assistance. It is implied that an object of learning strictly outside the ZPD is either too easy or too hard, and little or no learning will occur.

Roughly fifty years later, Wood et al. introduced a complementary process called scaffolding. Scaffolding enables a learner to “solve a problem, carry out a task or achieve a goal which would [otherwise] be beyond his unassisted efforts” because the teacher controls the aspects of the task that are initially outside the learner’s abilities. Recent work suggests that the maximum learning gains come from giving students the hardest possible tasks they are able, with the assistance of scaffolding, to complete.

Beyond Tutoring to Ways Of Thinking

Studies of tutors and their students help us identify characteristics and styles of interaction that help explain the effectiveness of tutoring. Some of these can be successfully deployed in large classrooms. However, the way we frame content can also have large effects on how students understand, generalize, and transfer their learning to new contexts.


Concrete examples of an object of learning, like how to apply an appropriate statistical test in a statistics word problem, vary in ways that are superficial, i.e., irrelevant, and in ways that are fundamental, i.e., relevant. In the language of educational psychologists, these are often called surface and structural features. (See Quilici and Mayer’s study on this very challenge.) A simple compare-and-contrast exercise when solving equations (Rittle-Johnson & Star) or examining case studies in negotiation (Loewenstein et al.) can bring this variation to the fore and yield learning benefits.

Learning in the presence of variation in these features helps learners generalize and transfer their knowledge to new situations, such as better transfer of geometric problem-solving skills (Variability of Practice). Several educational models, e.g., Variation Theory and the 4C/ID Model, build on the value of variability by suggesting specific ways to deploy it in the classroom.

Analogical Learning

Analogies are central to human cognition; they can help learners understand and transfer knowledge and skills to new situations. Analogical learning is at play both when learners have a base of knowledge that they bring with them to a novel target and when they compare two partially understood situations that can illuminate each other, serving as both a source and recipient of information (Kurtz et al., Loewenstein et al.). Kolodner suggests creating software tools that align examples to facilitate analogical learning.

However, in order to reap the full benefits of analogical learning, learners must engage deeply. Reading two cases, serially, in a session is not enough; learners will not necessarily make the necessary connections unless there are explicit instructions to compare (Loewenstein et al., Catrambone and Holyoak).

Analogical learning can be very difficult. For example, the structural features may be aligned between a base and the new target situation, but large differences in surface features will hurt the learner’s ability to see any connection (Kurtz et al.). This may be explained by how our memories work. It appears that, for novices, the most reliable form of retrieval is based on surface similarity, not deep analogical similarity; experts can more easily retrieve situations that are structurally similar and therefore more relevant for a new situation at hand (Loewenstein et al.).

Variation Theory, discussed next, is specifically designed to help students more deeply appreciate structural features, which may help them transfer their learning to new situations instead of feeling lost, confused by superficial differences.

Ference Marton’s Variation Theory

“He cannot, England know, who knows England only.”

This aphorism, found in the foreword of Mun Ling Lo’s book Variation Theory and the Improvement of Teaching and Learning, captures the ideas at the heart of Variation Theory (VT) well. VT is relatively new and is still being investigated for its usefulness in a broad range of disciplines, including mathematics and computer science.

VT is concerned with the way in which students are taught from concrete examples. It asserts that human learning can suffer from overfitting for some of the same reasons that machine learning algorithms do. Overfitting is a term I am borrowing from the machine learning community. If a machine learning algorithm “thinks” the key difference between photos of cats and dogs is the color of the sky (which is obviously unrelated to distinguishing between photos of cats and dogs), then a possible explanation is that the algorithm was trained on photos with insufficient or biased variation. If all the cat photos it ever saw were taken on a cloudy day, and all the dog photos it ever saw were taken on a sunny day, could you blame this naive program for latching on to this obvious differentiator of housepet species?

Humans can make the same inferential mistake when not exposed to a sufficient variety of examples of an object of learning. VT catalogues a hierarchy of patterns of variation designed to immunize the learner to this kind of mistake.
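The cat-and-dog scenario above can be sketched in a few lines. This is a toy illustration, not a real machine learning algorithm: the "learner" simply picks the single yes/no feature that best separates its training photos, and the feature names (`sunny_sky`, `retractable_claws`) and data are invented for this example.

```python
def best_single_feature(photos):
    """Pick the (feature, value_meaning_cat) rule with the highest
    accuracy on the training photos."""
    def accuracy(feature, cat_value):
        hits = sum((feats[feature] == cat_value) == (label == "cat")
                   for feats, label in photos)
        return hits / len(photos)

    candidates = [(f, v) for f in photos[0][0] for v in (True, False)]
    return max(candidates, key=lambda rule: accuracy(*rule))


def classify(rule, feats):
    feature, cat_value = rule
    return "cat" if feats[feature] == cat_value else "dog"


def photo(sunny, claws, label):
    return ({"sunny_sky": sunny, "retractable_claws": claws}, label)


# Biased set: every cat on a cloudy day, every dog on a sunny day.
# One cat's paws are hidden, so the relevant feature looks imperfect.
biased = [photo(False, True, "cat"), photo(False, True, "cat"),
          photo(False, False, "cat"), photo(True, False, "dog"),
          photo(True, False, "dog"), photo(True, False, "dog")]

# Varied set: same animals, but the weather no longer tracks the species.
varied = [photo(True, True, "cat"), photo(False, True, "cat"),
          photo(True, False, "cat"), photo(True, False, "dog"),
          photo(False, False, "dog"), photo(False, False, "dog")]

biased_rule = best_single_feature(biased)
varied_rule = best_single_feature(varied)

# A new photo: a cat, claws visible, on a sunny day.
sunny_cat = {"sunny_sky": True, "retractable_claws": True}
print(biased_rule, classify(biased_rule, sunny_cat))
print(varied_rule, classify(varied_rule, sunny_cat))
```

Trained on the biased set, the learner keys on the sky and calls the sunny-day cat a dog; trained on the varied set, it keys on the relevant feature instead.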

More abstractly, VT is built on the understanding that learning is not possible without being able to discern what the object of learning is (Marton and Booth). Discernment is not possible without experiencing variation in the object of learning and the world in which it is situated (Marton et al.). This variation is described in terms of aspects and features. An aspect refers to a dimension of variation, and a feature is a value of that dimension of variation (Lo). Some features are irrelevant, while critical features collectively define the object of learning.

For a more concrete discussion of variation, consider the following examples:

  1. The phrase “a heavy object” might not make sense to the reader unless they have interacted with objects of various weights.
  2. Consider a child who recently learned how to add numbers, but always starts with the larger number: 2+1=3, 4+2=6, etc. Asking the child to add the numbers in the opposite order, smaller number first, and verify that the result is the same introduces the commutative feature of addition.
  3. No matter how wildly a cup diverges from a prototypical example of a tea cup, if it does not have the critical feature of being able to hold something, it is not a cup.

Marton et al. identify four patterns of variation: contrast, separation, generalization, and fusion. If a child is learning the concept of “three”, then contrast refers to being introduced to three apples, as well as a pair of apples, or a dozen. Generalization refers to being introduced to different groups of three: three apples, three dogs, three beaches, and three languages. This clarifies that it is not the apples that give “three” its meaning. Separation refers to a pattern of examples that helps the learner separate a dimension of variation from other dimensions of variation. A child could be introduced to a litter of nearly identical puppies that only differ in coat color, for example. Fusion is the final pattern of variation, where the learner is exposed to examples that vary along all the dimensions of variation at once, since this is most commonly encountered in the real world. These patterns of variation are intended to reveal which aspects of a concept or phenomenon are superficial and irrelevant and which are innate and critical to its definition.
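As a rough sketch of how these patterns differ mechanically, the snippet below builds example sets for the concept of “three.” The aspects, features, and function names are hypothetical choices for this illustration. In this toy encoding, contrast and separation both come down to varying one aspect while holding the rest fixed; the theory distinguishes them by purpose, not by mechanics.

```python
from itertools import product

# Hypothetical aspects (dimensions of variation) and their features (values).
ASPECTS = {
    "count": [2, 3, 12],
    "object": ["apples", "dogs", "beaches"],
    "color": ["red", "green"],
}

# The example every pattern starts from: three red apples.
BASE = {"count": 3, "object": "apples", "color": "red"}


def vary_one(base, aspect):
    """Vary a single aspect while every other aspect stays fixed."""
    return [{**base, aspect: feature} for feature in ASPECTS[aspect]]


def vary_all():
    """Vary every aspect at once (all combinations of features)."""
    return [dict(zip(ASPECTS, combo)) for combo in product(*ASPECTS.values())]


contrast = vary_one(BASE, "count")         # two, three, or twelve apples
generalization = vary_one(BASE, "object")  # three apples, dogs, or beaches
separation = vary_one(BASE, "color")       # isolate one non-critical aspect
fusion = vary_all()                        # everything varies together

for example in contrast:
    print(example)
```

A lesson planner could use sets like these as a checklist: if no examples vary the count, the learner never sees what “three” is not.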

VT has guided the design of teaching materials and has been used as an analytic framework in a variety of contexts, including lessons on critical reading, vocabulary learning, the color of light, mathematics, chemistry, Laplace transforms, supply and demand, and computing education. It has been the subject of a government-funded three-year longitudinal study in Hong Kong, with promising results. While we do not yet have data that directly compares the effectiveness of VT-inspired lessons to those delivered by personal tutors, programming teachers can keep Variation Theory in mind when they teach concepts like recursion.

Wrap Up

Learning is, and probably always will be, hard. Learning to program is no different. Helping students master an empowering skill like programming can be immensely satisfying. However, teaching or creating tools for teachers and students without a firm grounding in the rich literature from educational psychology and cognitive science can make learning more difficult than it already is. More generally, whether you are mentoring a new employee at work, teaching a classroom full of students, or teaching your child a new skill, these insights may help you share humanity’s knowledge and skills.