Beware the NAEP Overreactions: 4 Reasons Why Education Pundits Should Rein in the Rhetoric This Week
By Matt Barnum
On Wednesday, the latest results of the National Assessment of Educational Progress (NAEP) — the gold standard, low-stakes test given to a representative cross section of 4th and 8th graders across the country — will be released. Virtually every news outlet, including The Seventy Four, will cover the numbers, which makes sense since the scores are important and have some limited value for policymakers. But if history is any guide, for one day the educational world will collectively lose its mind over NAEP, engaging in what Mathematica researcher Steve Glazerman calls “misNAEPery.”
Here are four reasons why NAEP results should be interpreted very cautiously:
Raw NAEP data can tell us NOTHING about which education policies are effective and which aren’t.
When the 2013 NAEP scores were released, Secretary of Education Arne Duncan pointed to the relatively large student “gains” in Washington, D.C. NAEP scores, saying, “Leaders in D.C. have shown tremendous courage and taken bold steps that are resulting in strong growth.” Commentators across the country, including the editorial boards of the New York Times, Washington Post, and Wall Street Journal, picked up on this to suggest that D.C.’s NAEP gains (as well as Tennessee’s) were showing that school reform was working.
The basic reason is wonky and boring but crucially important: NAEP scores, on their own, offer no comparison (or “control”) group by which to judge specific policies or even packages of policies. Remember eighth-grade science class? To make causal inferences, there must be both a treatment group and a control group.
As an example, let’s say Wednesday brings good news in the form of higher NAEP scores. Reformers will claim their policies are working — but how do we know? Maybe scores would have been even higher if a different set of policies were pursued. Maybe scores went up for reasons entirely unrelated to reform policies. We simply can’t say.
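To make the control-group point concrete, here is a toy sketch (all states, numbers, and the policy effect are hypothetical) of why a before-and-after score change alone cannot identify a policy effect, and how a comparison group changes the picture:

```python
# Hypothetical example: two states share a nationwide upward trend of
# +3 points, and the "reform" policy itself has a true effect of zero.

reform_before, reform_after = 240, 243     # state that adopted the policy
control_before, control_after = 238, 241   # comparison state that did not

# The naive before/after "gain" that pundits quote:
naive_gain = reform_after - reform_before  # +3, looks like the policy worked

# Difference-in-differences: subtract the change seen in the comparison
# state, which absorbs the shared trend.
did_estimate = naive_gain - (control_after - control_before)  # 0

print(naive_gain)    # prints 3
print(did_estimate)  # prints 0 -- the true policy effect
```

Without the comparison state, the +3 and the 0 are indistinguishable; that is exactly the information raw NAEP releases do not contain.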
Lots of things besides schools and education policies affect NAEP scores.
Student achievement is based on everything that has happened in a student’s life before taking the test.
We tend to think of schools as driving test scores because students take tests and formally learn academic content in schools. Indeed, schools have an extremely important impact on student learning, but out-of-school factors have an even greater effect on student test scores. This is yet another reason we can’t use NAEP to judge school policies. The many out-of-school factors driving achievement — the economy, access to healthcare, etc. — mean we can’t even be sure that changes in NAEP scores had anything to do with changes in schools.
Changes in NAEP scores are not actually “growth.”
In the coverage of NAEP scores, we will almost surely hear about some state whose students “showed the most growth.” For example, in 2013, the Washington Post reported that “the District [of Columbia]’s fourth- and eighth-graders made significant gains on national math and reading tests this year, posting increases that were among the city’s largest in the history of the exam.” This is not quite right, because the fourth-graders who took the test in 2013 are not the same fourth-graders who took the previous NAEP two years earlier. In other words, all we can say is that one group of students has a higher average score than a completely different group of students from a couple years ago.
This may seem like an academic point, but it raises yet another problem with trying to make inferences about policy based on NAEP: demographic changes among students tested may contribute to changes in average test scores. What look like ‘gains’ may just be differences in which students were tested.
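A toy calculation (with hypothetical subgroup means and counts) shows how a purely compositional shift can produce an apparent “gain” even when no subgroup’s scores change at all:

```python
# Hypothetical example: two subgroups of test-takers whose average
# scores are identical in both years; only the mix of students changes.

def overall_mean(counts, means):
    """Weighted average score across subgroups."""
    total = sum(counts)
    return sum(n * m for n, m in zip(counts, means)) / total

subgroup_means = [260, 240]  # unchanged from year to year

avg_year1 = overall_mean([50, 50], subgroup_means)  # 250.0
avg_year2 = overall_mean([70, 30], subgroup_means)  # 254.0

print(avg_year2 - avg_year1)  # prints 4.0
```

A four-point statewide “gain,” with not a single student scoring any higher; only the composition of who was tested moved.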
Most people will use NAEP data to reiterate what they already believe — no matter what the data say.
I can guarantee that the NAEP results — regardless of what the actual data are — will be used by commentators to reinforce their previously held policy positions. That people will use the same data to reach opposite conclusions is an indication that we shouldn’t read too much into said data.
Advocates will surely declare, “[State X, which had ‘good’ results] did [Policy Y, which I already like]; therefore everyone should do [Policy Y].” If scores show improvement, reformers will say, “This shows our policies are working — full speed ahead!” If scores don’t improve, reformers will say, “This shows why our schools are in desperate need of reform — full speed ahead!”
Similarly, reform skeptics will gleefully point to disappointing results as evidence that reform policies are failing. But if scores rise, they will declare that NAEP scores shouldn’t be taken seriously and that tests don’t much matter.
People believe what they believe; NAEP scores won’t — and frankly shouldn’t — change this. But can we just drop the charade?
This is not to say that NAEP scores are useless. They are genuinely important indicators about whether students across the country are learning more math and reading than past students. And although raw data cannot be used to judge specific policies or policymakers, it is absolutely reasonable to make hypotheses about policy that can then be tested rigorously.
Indeed, NAEP scores have been used by researchers with careful, statistically rigorous designs to test the efficacy of certain policies. (For example, much of the research on No Child Left Behind uses NAEP data, but does so by creating controls and applying careful statistical analyses.) The key words here are statistically rigorous — an eyeball test does not count.
So, yes, although some rumors suggest that they’ll be lower, I hope NAEP scores go up on Wednesday. It will be nice to see and a hopeful sign for education reform and our country. But no, I won’t be using raw NAEP scores to judge the success of policies or politicians or to support the things I already believe — however tempting it might be.
Originally published at www.the74million.org.