Data Science in Public Policy

Jun 24, 2019

I recently accepted a Data Science Fellowship at the Flatiron School in Washington, D.C. For four months, I will be immersed in data science and machine learning, honing my skills in Python and SQL and deepening my understanding of linear algebra and statistical analysis. But why would a policy analyst need to study data science?

Over the course of my public policy graduate program at the University of Chicago, I learned not only what public policy is and what good policy looks like, but — more significantly — I discovered a new way to organize information and think about the world.

The ability to collect and analyze data in an unbiased fashion is important when determining a course of action. Ensuring that you truly understand your data — including the biases that it reflects — becomes imperative when your decisions have the power to impact people’s lives.

My interest in data science stems from my need, as a policy analyst, to systematically collect, process and analyze information so that I can do the most good while hurting the fewest people.

So, what is public policy?

Public policy, in its simplest form, is a collection of rules that govern our daily lives. A simple(ish) example of a public policy issue is the question of what members of the public are and are not allowed to do in a national park. National parks are for everyone’s enjoyment, but preferred activities vary from person to person. For example, many people enjoy riding ATVs through parks while others enjoy taking peaceful hikes. Many hikers dislike hearing the sound of ATVs rumbling through the parks and would like the ATVs banned. The question of whether we allow the use of ATVs in national parks is a public policy issue. Whatever decision we make will govern what everyone can do within our national parks system. So, what is the correct decision? It is this question, how to choose the best policy to govern everyone, that we focused on at Harris.

What role does data science play in public policy?

“Social impact, down to a science.” This is the motto at the University of Chicago Harris School of Public Policy. My classmates often debated the merits of our motto. Some criticized the school’s emphasis on statistics and economics to the neglect of political theory and policy courses. According to one classmate in particular, Harris was at best attempting to “teach us empathy through math”, and at worst, trying to turn us into number-crunching automatons.

I always appreciated the motto. Empathy is incredibly important when writing policy, but one cannot rely on empathy alone to choose the best course of action on a policy issue. To dive a bit deeper into this question, let’s take a look at a very simplified version of a commonly discussed policy issue.

Prison Recidivism Intervention Programs

The difficulties that prisoners face when re-entering the community are well documented. There is a wealth of information on best practices for helping people who are released from prison both reintegrate smoothly into the community and avoid returning to prison.

Let’s say, for the sake of our example, that a number of policy analysts in the fictional state of Crystalport have reviewed the research on reentry programs in other states and have decided to implement a program to help people in their state. Because of limited resources, the state is unable to place all prisoners into reentry programs, so the analysts need to come up with criteria to decide which prisoners will be enrolled in the new program.

In our case, the policy analysts, working with statisticians, have built a model that predicts a “recidivism risk score” (note: this on its own is no small feat). With this model we are able to rank people from most at risk to least at risk of recidivism and separate them into three groups:

1. High Risk: The vast majority will return to prison without an intervention.

2. Medium Risk: More than 50% will return to prison without an intervention.

3. Low Risk: A small number will return to prison without an intervention.
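
As a rough illustration of the ranking step, here is a minimal Python sketch. The person IDs, the scores, and the choice of three equal-sized groups are all assumptions made up for this example; Crystalport's model is fictional and nothing above specifies where the cutoffs would fall.

```python
def assign_risk_groups(scores):
    """Rank people by risk score (highest first) and split them into three groups.

    `scores` maps a hypothetical person ID to a risk score between 0 and 1.
    Equal-sized groups are an assumption; real cutoffs would be a policy choice.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    n = len(ranked)
    cut1, cut2 = n // 3, 2 * n // 3
    return {
        "high": ranked[:cut1],
        "medium": ranked[cut1:cut2],
        "low": ranked[cut2:],
    }


# Made-up scores for six hypothetical people.
example_scores = {"A": 0.91, "B": 0.15, "C": 0.62, "D": 0.48, "E": 0.83, "F": 0.07}
risk_groups = assign_risk_groups(example_scores)
print(risk_groups)
```

In practice the analysts might prefer fixed score thresholds over equal-sized groups, since the size of each group would then reflect the actual distribution of risk.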

Now the question is: who should be enrolled in the program? One common response is that the group at the highest risk for recidivism should receive the intervention. This is a reasonable response given the information we have so far, but let’s say that the analysts dig a bit deeper into data from other states that have been running these programs for a number of years, and find the additional information below:

1. High Risk: The majority of people in this group respond poorly to the reentry program.

2. Medium Risk: The majority of people in this group respond well to the reentry program.

3. Low Risk: This group is unlikely to return to prison even without the intervention.

Now — with the knowledge from the additional data — which group would you say should be enrolled in the reentry program? It looks as if providing the intervention to those in the medium risk group may directly help the greatest number of people avoid returning to prison, while the majority of those in the high-risk group may return to prison regardless of their participation in the reentry program.
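
To make the comparison concrete, here is a small Python sketch of the arithmetic behind that reasoning. Every number in it is invented for illustration: the return rates, the program capacity, and the variable names are assumptions, not figures from any real reentry program. The idea is simply that the expected number of people helped depends on how much the program changes a group's return rate, not on how high that rate starts.

```python
# All rates below are invented for illustration; they do not come from any real program.
# Each entry: (return rate without the program, return rate with the program)
return_rates = {
    "high": (0.80, 0.75),    # high baseline risk, weak response to the program
    "medium": (0.55, 0.30),  # more than half return, strong response to the program
    "low": (0.10, 0.08),     # few return even without the program
}


def returns_prevented(baseline_rate, program_rate, slots):
    """Expected number of people kept out of prison if `slots` people
    from a single group are enrolled in the program."""
    return (baseline_rate - program_rate) * slots


slots = 100  # hypothetical program capacity

impact = {
    group: returns_prevented(baseline, with_program, slots)
    for group, (baseline, with_program) in return_rates.items()
}
best_group = max(impact, key=impact.get)

for group, prevented in impact.items():
    print(f"{group}: about {prevented:.0f} returns prevented per {slots} enrolled")
print("Largest expected impact:", best_group)
```

With these made-up rates the medium-risk group comes out ahead, which matches the intuition above. Different assumed rates would change the answer, which is why the underlying data matters so much.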

Closing Remarks

So, what does this mean for the Crystalport reentry program? Should the state avoid enrolling those prisoners who are at highest risk? It depends. Many programs provide services to those who are at highest risk, knowing that the outcome numbers may be lower. Others measure success purely by the number of people they keep from returning to prison within a certain number of years. No choice is strictly correct, but when we are making decisions that will affect people's lives, it is important that we understand the likely outcome of those choices.

Written by Misha Berrien