Blended assessment through retrieval practice: an equity imperative

Equity in education is one of the most compelling moral issues of our time, but the gulf between our approach to equity and the strategies that actually promote equity appears to be growing. One can scarcely spend an hour at a staff meeting without having a “courageous conversation”, “unpacking” something, or re-litigating identity, privilege, bias, and race issues. My experience with this framework is that it rarely leads to productive discussions about what we’re actually doing to improve teaching and learning for the students who most need our attention. Indeed, the greatest affront to equity is that we continue to promote the use of ineffective “feel-good” instructional and assessment strategies (Kirschner, Sweller, & Clark, 2006) in the schools serving our most disadvantaged student populations.

I teach chemistry at a high school where the poverty rate is nearly 90%. I survey my students at the beginning of every school year to get to know them as individuals, but also to construct a profile of the student body as a whole. The statistics are striking: they report working an average of 7.5 hours per week for pay and 11 hours per week on unpaid duties like caring for siblings, both of which are significant risk factors for school failure (Hammond, Linton, Smink, & Drew, 2007). Many of my students have out-of-school responsibilities totalling 40 or more hours every week. Despite these and other formidable obstacles, 85% plan to pursue higher education after graduation. If we are to keep that door open for students, it is essential that we hold them to high academic standards while also improving the graduation rate; one must not come at the expense of the other. How, then, can we best support meaningful academic success for our most at-risk students?

What is “blended assessment”?

Blended learning — a mode of instruction in which some elements are delivered face-to-face and some online — yields superior student outcomes relative to either traditional classroom-based instruction alone or online learning alone (Means et al. 2009), but it remains unclear specifically which component is responsible for these gains. I contend that supplementing effective in-person instruction with a robust online assessment platform is the combination most likely to produce measurable growth without compromising academic standards. I call this approach “blended assessment”.

With so many existing online learning models, why introduce yet another? I am concerned that the current blended learning paradigm effectively serves as a smokescreen for practices that undermine equity: increased class sizes, reduced student access to expert instructors, and funneling of public education funds to for-profit software and curriculum companies (McRae 2015).

I do not believe that blended learning models with heavy emphasis on online instruction are effective. Despite a paucity of evidence for its effectiveness, this approach is a common feature of alternative high school and “credit recovery” schemes. I recently had a new student transfer into my chemistry class midway through the fall from a charter school with all online delivery and no science teacher. Not surprisingly, he told me he’d learned nothing there, and a quick diagnostic quiz proved him correct. Nevertheless, after 20 minutes of tutoring, he had a firm grasp on most of the foundational first-quarter content. If the instruction delivered by computers is not the piece most responsible for student growth from blended learning, then online assessments may be doing the heavy lifting. This view is consistent with an abundance of literature from the past 20 years on the science of learning. Despite the frequent lofty claims for blended learning (personalization for everyone, 21st-century skills, etc.), something much less sexy, but much more powerful, is at work under the hood in a blended assessment classroom. It’s called retrieval practice.

Characteristics of the online assessment platform and why they work

My quizzes and tests are on a Moodle server and include the following features, which are absent from many commercial software packages often used for online learning:

  • They end the distinction between formative and summative assessment, and between assessment and practice. Traditional education theory demarcates formative assessment, which is intended to give student and instructor a snapshot of current performance and to guide instructional decisions, from summative assessment, which is done to evaluate a student’s learning and assign a grade. This distinction has outlived its usefulness in the digital age. Students who take online quizzes can re-attempt them many times over without any extra time commitment on the part of the teacher, and they can receive immediate feedback rather than waiting for papers to be graded. Scores on these quizzes can and should be used to calculate a student’s grade, with the explicit understanding that the grade is fluid and may always be improved if the student invests additional effort and demonstrates growth. This makes every quiz attempt low-stakes for the student in that quiz failure does not have adverse grade consequences. This model can properly be described as “retrieval practice”. The US Department of Education’s Institute of Education Sciences recommends “quizzing with active retrieval of information at all phases of the learning process to exploit the ability of retrieval directly to facilitate long-lasting memory traces” (Pashler et al., 2007), noting that quiz-based retrieval practice is one of the few learning strategies for which there is “strong” evidence of efficacy. Agarwal (2018), in a review of three studies ranging from middle school to medical school, noted a significant benefit to long-term retention, with effect sizes greater than d = .80. Although the quiz/test format is not the only way to promote retrieval practice, it has the advantage of encouraging maximum effort from students (who naturally want to perform well on anything that’s considered an assessment) and providing data that can inform grading and response to intervention.
  • Multiple assessment attempts on the same course material. More and more schools are implementing the practice known as standards-based grading (or “proficiency-based grading”, or “assessment for learning”), which awards no grade points for homework and behavior but also requires that students be given multiple attempts to demonstrate proficiency. Clymer and Wiliam (2007) contrast this modality with traditional grading systems:
Grades based on the accumulation of points over time are counter-productive for several reasons. First, this approach encourages shallow learning. In most classrooms, if students forget something that they have previously been assessed on, they get to keep the grade. When students understand that it’s what they know by the end of the marking period that counts, they are forced to engage with the material at a much deeper level. Second, not altering grades in light of new evidence of learning sends the message that the assessment is really a measure of aptitude rather than achievement. Students who think they will do well will engage in the assessments to prove how smart they are, whereas students who think that they are likely to fail will disengage. When assessment is dynamic, however, all students can improve. They come to see ability as incremental instead of fixed; they learn that smart is not something you are — it’s something you become.

Here we see a student’s first attempt at a quiz on what is considered by most chemistry teachers to be very difficult material for most high school students — the quantum mechanical model of the atom:

The student improved his score significantly after a few more attempts, and the timestamp indicates this one was probably completed at home:

Additionally, the data afforded by multiple reassessments provides for analysis of interventions or changes in instruction; for example, in 2017 I used this data to evaluate a new approach I used for teaching chemical formulas:

  • Considerable variation in assessment items between students and between attempts. Numeric questions make use of Moodle’s “calculated” question type, in which random numbers within set limits are generated for each question, and student responses are evaluated using a teacher-supplied formula and a margin of error. Further variation is incorporated by using an item bank with multiple variants of each question type, which are sampled at random every time a new assessment attempt is generated. Two students sitting next to each other will have completely different quizzes, and a student re-attempting a quiz will have it regenerated from scratch.

In this example, the subscripts (5, 11, and 3) are replaced with random numbers every time the question is attempted. Additional variation is generated by asking for the percent mass of hydrogen or oxygen instead of carbon.
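
The randomization and tolerance-based grading described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Moodle's own question syntax or the actual question from my course: it assumes a compound containing only carbon, hydrogen, and oxygen, and the subscript ranges, the 1% margin of error, and the function names are all hypothetical choices.

```python
import random

# Standard atomic masses in g/mol, rounded to three decimals.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}


def make_question(rng):
    """Build one randomized percent-mass question for a C/H/O compound."""
    # Hypothetical limits; a teacher would pick ranges that suit the unit.
    subscripts = {
        "C": rng.randint(1, 12),
        "H": rng.randint(1, 24),
        "O": rng.randint(1, 6),
    }
    # Extra variation: ask for carbon, hydrogen, or oxygen.
    target = rng.choice(["C", "H", "O"])
    return subscripts, target


def correct_answer(subscripts, target):
    """Teacher-supplied formula: percent mass of the target element."""
    molar_mass = sum(ATOMIC_MASS[el] * n for el, n in subscripts.items())
    return 100.0 * ATOMIC_MASS[target] * subscripts[target] / molar_mass


def is_correct(response, subscripts, target, rel_tol=0.01):
    """Accept a response within a relative margin of error (1% assumed)."""
    expected = correct_answer(subscripts, target)
    return abs(response - expected) <= rel_tol * expected
```

Calling `make_question(random.Random())` on each attempt yields a fresh set of subscripts and a fresh target element, so a memorized answer from a previous attempt is of no use; only the method carries over.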

Another example illustrates some of the unlimited variation built into a different question type:

I have seen students attempt this problem a dozen (or more!) times throughout a school year without ever encountering the same question, or even the same format.

  • Throwback questions. Every quiz and exam is cumulative, with a typical assessment consisting of around 20% items from past units. This results in assessments that are both interleaved (involving a diverse set of problems from across the curriculum) and spaced through the year on an expanding schedule (i.e., with more elapsed time between subsequent reviews of a given concept, because the chances of any particular question type appearing diminish as more content is included). This combination of spaced rehearsal and interleaved practice is supported by robust findings in cognitive science (Mozer et al. 2009; Rohrer et al. 2014; Cepeda et al. 2006).
  • Teacher feedback involving worked examples. Students are encouraged to ask for help following an unsuccessful quiz attempt. When they do, I provide them with a detailed solution to each problem they solved incorrectly. Students know that re-attempted quizzes will contain entirely new problems, so memorizing the answer will be no use — they must understand how the solution is arrived at.
  • Lab work. Blended assessment can be applied to practical exercises as well. I have written “quiz questions” that actually serve to check students’ pre-lab calculations and results. In this example, I prepared potassium chloride solutions with a range of concentrations and used a conductivity tester to generate a standard curve; then, I programmed Moodle to use the curve to check students’ empirical values. This allows every student to work on their own unique lab problem, to receive instant feedback on their results, and, most importantly, to re-attempt the procedure an unlimited number of times (with a different randomly assigned concentration each time) until they are satisfied with the results.

Students then receive feedback depending on the accuracy of their results, ranging from this:

To this:

Another implementation of this strategy provides students with immediate feedback on each step of their lab calculations, and checks their final results for accuracy:
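
For readers curious how a standard curve can be used to check a student's measurement, here is a minimal Python sketch. It assumes the curve is a straight line fitted by least squares, which may not match the real calibration. The calibration numbers are made-up placeholders in arbitrary units (not my actual potassium chloride measurements), and the tolerance and the three feedback labels are likewise hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def grade_reading(concentration, reading, slope, intercept, rel_tol=0.05):
    """Compare a student's reading with the value predicted by the curve."""
    expected = slope * concentration + intercept
    error = abs(reading - expected) / expected
    if error <= rel_tol:
        return "accurate"   # within the assumed 5% margin
    if error <= 3 * rel_tol:
        return "close"      # within 15%: check technique and try again
    return "retry"          # far off: repeat the procedure


# PLACEHOLDER calibration data in arbitrary units, invented for illustration.
CONCENTRATIONS = [0.1, 0.2, 0.3, 0.4, 0.5]
READINGS = [12.0, 22.0, 32.0, 42.0, 52.0]
SLOPE, INTERCEPT = fit_line(CONCENTRATIONS, READINGS)
```

A student assigned a concentration of 0.25 would then be graded with `grade_reading(0.25, their_reading, SLOPE, INTERCEPT)`. Because the expected value is computed from the curve rather than stored per question, any randomly assigned concentration within the calibrated range can be checked.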

  • Accessible anywhere. Students can log on to the class website from any device; it works well with smartphones. They have the option to work on assessments at home, at the library, on the bus, while waiting in line, etc. In the past, fears of the “digital divide” exacerbating equity issues may have prevented serious consideration of blended learning, but today, 92.5% of my students report owning their own smartphone, and 95% are able to access the Internet at home. Many lower-income students are saddled with responsibilities or inconveniences (public transit, etc.) that prevent them from meeting with teachers outside the school day to re-attempt assessments. The blended assessment strategy bridges this gap.
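
The cumulative "throwback" design described in the list above can also be sketched in Python. This is an assumed, simplified model rather than my Moodle configuration: the item bank is a hypothetical dictionary mapping unit numbers to lists of question labels, a quiz has 20 items, and throwback items are drawn uniformly at random from all earlier units.

```python
import random


def build_quiz(item_bank, current_unit, n_items=20, throwback_share=0.2, rng=None):
    """Assemble one quiz attempt: mostly current-unit items plus throwbacks."""
    rng = rng or random.Random()
    # Pool every item from units that came before the current one.
    past_pool = [
        item
        for unit, items in item_bank.items()
        if unit < current_unit
        for item in items
    ]
    # About 20% throwbacks, but never more than the past pool can supply.
    n_past = min(round(n_items * throwback_share), len(past_pool))
    current = rng.sample(item_bank[current_unit], n_items - n_past)
    past = rng.sample(past_pool, n_past)
    quiz = current + past
    rng.shuffle(quiz)  # interleave old and new material
    return quiz


# Hypothetical bank: 5 units with 30 question variants each.
EXAMPLE_BANK = {u: [f"U{u}-Q{i}" for i in range(1, 31)] for u in range(1, 6)}
```

Under these assumptions the number of throwback items per quiz stays fixed while the pool of past units keeps growing, so the expected number of items from any single past unit shrinks as the year goes on. That is the mechanism behind the expanding review schedule.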

How it advances equity

Blended assessment should be considered a viable equity strategy in secondary education for a number of reasons. Most importantly, additional learning time and multiple assessment opportunities can be provided to a much greater extent than is possible with traditional pencil-and-paper assessment. This disproportionately benefits disadvantaged categories of learners, including English language learners, immigrants, refugees, students with IEPs, and students who are below grade level in their math skills. (My own anecdotal classroom experience is that immigrant students often score below proficient on first attempts, but enthusiastically seek help and retake quizzes, sometimes accumulating ten or more attempts on a single learning objective.) Using retrieval practice as both a learning strategy and a low-stakes assessment strategy also promotes a growth mindset, which tempers the effects of poverty on academic achievement (Claro et al. 2016). Students with lower working memory capacity also benefit disproportionately from retrieval practice (Agarwal et al. 2017), and so do students with test anxiety (including, potentially, stereotype-threat-induced anxiety) (Smith et al. 2016; Agarwal et al. 2014).

The benefits of retrieval practice and proficiency-based grading are clear, and we can use free and open-source technology to do it; no expensive contracts or proprietary software are necessary. Equity demands that we start leveraging this high-yield strategy immediately.

References

Agarwal, P. K. (2018). Retrieval practice improves learning more than reviewing classroom content. Retrieved from https://www.retrievalpractice.org/research/

Agarwal, P. K., D’Antonio, L., Roediger, H. L., McDermott, K. B., & McDaniel, M. A. (2014). Classroom-based programs of retrieval practice reduce middle school and high school students’ test anxiety. Journal of Applied Research in Memory and Cognition, 3(3), 131–139.

Agarwal, P. K., Finley, J. R., Rose, N. S., & Roediger III, H. L. (2017). Benefits from retrieval practice are greater for students with lower working memory capacity. Memory, 25(6), 764–771.

Cepeda, N. J., Pashler, H., Vul, E., Wixted, J. T., & Rohrer, D. (2006). Distributed practice in verbal recall tasks: A review and quantitative synthesis. Psychological Bulletin, 132, 354–380.

Claro, S., Paunesku, D., & Dweck, C. S. (2016). Growth mindset tempers the effects of poverty on academic achievement. Proceedings of the National Academy of Sciences, 113(31), 8664–8668.

Clymer, J. B., & Wiliam, D. (2007). Improving the way we grade science. Educational Leadership, 64, 19.

Hammond, C., Linton, D., Smink, J., & Drew, S. (2007). Dropout risk factors and exemplary programs: A technical report. National Dropout Prevention Center/Network (NDPC/N).

Kirschner, P. A., Sweller, J., & Clark, R. E. (2006). Why minimal guidance during instruction does not work: An analysis of the failure of constructivist, discovery, problem-based, experiential, and inquiry-based teaching. Educational Psychologist, 41(2), 75–86.

McRae, P. (2015). Myth: Blended learning is the next ed-tech revolution. Alberta Teachers Association Magazine, 95(4).

Means, B., Toyama, Y., Murphy, R., Bakia, M., & Jones, K. (2009). Evaluation of evidence-based practices in online learning: A meta-analysis and review of online learning studies. US Department of Education.

Mozer, M. C., Pashler, H., Cepeda, N., Lindsey, R., & Vul, E. (2009). Predicting the optimal spacing of study: A multiscale context model of memory. In Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, & A. Culotta (Eds.), Advances in neural information processing systems 22 (pp. 1321–1329). La Jolla, CA: NIPS Foundation.

Pashler, H., Bain, P. M., Bottge, B. A., Graesser, A., Koedinger, K., McDaniel, M., & Metcalfe, J. (2007). Organizing Instruction and Study to Improve Student Learning. IES Practice Guide. NCER 2007–2004. National Center for Education Research.

Rohrer, D., Dedrick, R. F., & Burgess, K. (2014). The benefit of interleaved mathematics practice is not limited to superficially similar kinds of problems. Psychonomic Bulletin & Review, 21, 1323–1330.

Smith, A. M., Floerke, V. A., & Thomas, A. K. (2016). Retrieval practice protects memory against acute stress. Science, 354(6315), 1046–1048.