Why Do We Measure? And When Did It Start? A Brief History of Public-Sector Measurement and Evaluation

Harvard Ash Center
Innovations in Government
5 min read · Mar 10, 2016

Data-driven policy. Managing for results. Evidence-based programming. We hear these phrases in the policy community constantly, but what do they mean? And when did this drive for evidence become so popular among social scientists and policymakers?

What Is Public Sector Measurement and Evaluation?

The policy community uses all three phrases above to refer to data collection systems that allow policymakers to continuously learn from the social programs they run and to make incremental improvements. Tracking metrics lets government and nonprofit managers confirm that those programs are having their intended effect. Metrics also allow for mid-course corrections, so that public managers can adjust programs to achieve better results for beneficiaries and deliver a higher return on investment for taxpayers.

Measurement (sometimes called “monitoring”) happens during program implementation itself, whereas evaluation happens after the program has concluded, often to test whether the policy intervention had a statistically significant effect on social outcomes. Together, the two are often shortened to M&E, a term that covers both the measurement tools used and the learning feedback cycle itself. To learn more about how measurement and evaluation fit into today’s public-sector program cycle, check out this video from USAID, which goes over the basics in layman’s terms.

What Is Evidence-Based Policy?

Measurement sounds great in theory, but how can we use data to move the needle on societal outcomes? Evidence-based policy means that policymakers gather and use rigorous evidence to decide which social programs to support: evidence that statistically supports or disproves a given policy intervention.

Data-driven organizations use metrics to ensure their current work is effective, whereas evidence-based programming requires policymakers to move beyond internal data into external program evaluations and research studies. And not just any evidence will suffice! Social scientists consider the best policy evidence to be randomized, meaning that program participants are drawn from a population and then randomly assigned to treatment and control groups. The treatment group participates in the policy intervention or program and the control group does not (both groups usually consent to participate). This evaluation design is called a randomized controlled trial, or RCT, and is considered the gold-standard research method for proving causal impact.
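For readers who like to see the mechanics, here is a minimal sketch in Python of the core RCT logic described above: randomly assign participants to treatment and control groups, then test whether the difference in outcomes is statistically significant. The participant pool, outcome scores, and effect size are all invented for illustration; a real evaluation would use actual program data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical pool of 1,000 study participants (IDs only).
participants = np.arange(1000)

# Random assignment: shuffle the pool, then split it evenly.
rng.shuffle(participants)
treatment_ids, control_ids = participants[:500], participants[500:]

# Simulated outcomes (say, reading scores). We build in a small
# true effect so the example has something to detect.
control_scores = rng.normal(loc=70, scale=10, size=control_ids.size)
treatment_scores = rng.normal(loc=73, scale=10, size=treatment_ids.size)

# Two-sample t-test: is the difference in means statistically significant?
result = stats.ttest_ind(treatment_scores, control_scores)

print(f"treatment mean: {treatment_scores.mean():.1f}")
print(f"control mean:   {control_scores.mean():.1f}")
print(f"p-value:        {result.pvalue:.4f}")  # below 0.05 suggests a real effect
```

Because assignment is random, the two groups should look alike on average before the program starts, so any later difference in outcomes can be attributed to the intervention rather than to who happened to enroll.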

RCTs have long been popular in the medical community to test everything from vaccine effectiveness to the first scientific treatments for tuberculosis and scurvy. But when did rigorous evidence collection through RCTs become the norm for the public sector, too?

A Brief History of Evidence-Based Policy

Since the early 20th century, social science researchers have modeled their research designs after the RCTs of their medical counterparts — and have sometimes even anticipated the work of more laboratory-based statisticians!

W.A. McCall, an education professor at Teachers College, Columbia University, was one of the first social scientists to consider using the scientific method to improve society. McCall’s groundbreaking book, How to Experiment in Education (1923), paved the way for Progressive Era educators looking to increase efficiency in schools. McCall and other education researchers of this era created a series of mental acuity and achievement assessments, which just goes to show how old the idea of standardized testing is! McCall was at the forefront of using measurement to achieve social goals, in this case improving students’ educational achievement. In his view, “all the abilities and virtues for which education is consciously striving can be measured… better than they have ever been [before].” In other words, measurement is here to stay.

In the 1920s and ’30s, social scientists became increasingly integrated with government institutions and with private philanthropy. The Laura Spelman Rockefeller Memorial, an influential humanitarian institution of its time, helped to start the Social Science Research Council, an organization committed to public measurement for policy solutions. Charles Merriam, political scientist and first president of the SSRC, helped write an analytical report for President Hoover, titled “Recent Social Trends in the United States,” in 1933. Many later SSRC reports heavily influenced President Roosevelt’s New Deal programs. FDR also built the professional foundation for public measurement: by the end of World War II, the federal government employed more than 15,000 social scientists.

In 1969, applied social scientist Donald Campbell coined the term “the experimenting society,” referring to the discourse between social scientists and policymakers as they identify effective social reforms suitable for wide-scale use. Campbell is considered by many to be the grandfather of program evaluation.

Fast-forwarding to 2008, President Obama fully committed his administration to using measurement to promote social progress. In his new book Show Me the Evidence, Brookings scholar Ron Haskins describes how the Obama administration not only made funding decisions based on existing evidence, but also demanded that current federal programs undergo rigorous evaluation to ensure that every dollar spent generates more evidence for future policymakers. Results for America, founded in 2012, is another bipartisan effort that brings together nonprofit managers and policymakers from all levels of government to move evidence to the forefront of policy decisions.

Where Do We Go From Here?

Public-sector measurement is clearly a much older concept than we tend to think! You may be wondering why the evidence agenda hasn’t gained more traction in Congress, or why all social programs aren’t evaluated through rigorous RCTs. The truth is that rigorous evaluations are expensive to run, tough to manage, and sometimes politically unpalatable. If you want to learn more about the challenges of public measurement, stay tuned for my next post!

About the Author

Melissa Bender is a current MPP student at Harvard Kennedy School focusing on innovation and program evaluation in the education space. She’ll be writing on what works in K-12 education, both domestically and internationally, as well as on the growing demand for policy-relevant evidence. Prior to HKS, she worked at a USAID contractor, where she supported program evaluations on a range of international development issues. She also ran a literacy tutoring program at an elementary school in Washington, D.C., for Reading Partners, a national literacy nonprofit. She considers herself a closet wonk, given that her favorite book is Moneyball for Government and her favorite podcast is Vox’s policy show, “The Weeds.” She tweets in between classes (and sometimes for live HKS/HGSE events) @mbender17. Outside of HKS, she loves hiking, traveling, and volunteering with K-12 students. She is a proud AmeriCorps alumna and holds a BA in both International Politics and Russian from the University of Virginia.

Originally published on the Ash Center’s Government Innovators Network.
