Musings at the Intersection of Data Science and Public Policy

Source: The Economist

“Interesting… What does data science have to do with public policy?”

This is a common response when I tell people that I am a Master's student in Computational Analysis and Public Policy. I'm still working on crafting the perfect, concise answer to this question. In the meantime, here are a few thoughts and observations on the subject that I have gathered so far in my ventures into this world.

Note: I will be referring to “data science” in its broadest sense, covering all aspects of the practice of drawing meaningful insights from large amounts of data.

Data has been doing plenty of good in the policy world

Just look at Data Science for Social Good (DSSG), the summer fellowship run by the great people at the Center for Data Science and Public Policy at the University of Chicago. In collaboration with governments and non-profit organizations, DSSG project teams apply data science methods to policy areas such as education, public health, and social services.

Despite significant challenges, such as data quality concerns and siloed departments, practitioners have found ways to integrate data science into the policy arena. City governments have expanded the scope of their analytics teams to include more data science capabilities. With New York, Chicago, and Boston in the lead and other cities following suit, these data teams are helping city agencies focus their efforts to serve constituents more effectively. Young companies, such as Civis Analytics and BlueLabs, offer data science consulting services to clients in government, advocacy, and politics. Members of the civic tech community (e.g., Chi Hack Night) are making use of increasingly accessible open government data to contribute to projects in their free time. The list of civic data use cases is extensive, but in short, data is helping us tackle pressing policy issues in ways that were not possible before.

But data science is not the solution to everything

At a recent data science conference, the keynote address seemed to set the tone for the room: that data science is the best thing ever to happen to an organization and that other "outdated" analytical methods are just getting in the way. As much as I believe in the merits of big data and machine learning, I refuse to believe that data science is a silver bullet for all the world's problems, especially not in the policy space.

The policy problems that data science attempts to address are interdisciplinary and complex. People have been working on these issues for generations, and we should aim to learn about the historical, legal, and socioeconomic underpinnings of the policy area in question before presenting a data science solution. The nuances of reality are difficult to capture in words, let alone in models, so the best we can do is attempt to understand the subject matter from both a qualitative and a quantitative perspective.

Let's remember to think about the big picture before working out the details in the code. The "so what" questions are often the hardest to answer: How is data science helpful in this situation? What inherent assumptions and biases will be built into the model? Who will use it, and what will they do with it? Whose lives will be affected by the outcome? Only after answering these can we decide whether data science is the right tool and how best to leverage it.

The model is only the first step

As tempting as it is, we cannot bury ourselves in the numbers and then hand off the output to the organization's decision-makers. Communicating the model accurately and effectively is just as important as building a strong analytical product. Model interpretability is a tricky topic that only gets trickier as data science capabilities advance. Nevertheless, non-technical collaborators need a baseline understanding of our models' methods and conclusions if we are to make any meaningful impact in public policy. We cannot expect people to blindly buy into what they don't trust or comprehend.

We also must be able to speak the same language as statisticians, economists, and other social scientists, who are and will continue to be an important part of policy organizations. We should explore how data science aligns with practices of causal inference, program evaluation, and more traditional methods of policy analysis. Doing so will help data science gain more credibility in the public policy space.

Data science is a privilege

Let’s talk about representativeness in the field of data science. Data science helps us scale human mental models and decision making in an unprecedented way. These algorithmic models, when implemented in public policy, could potentially influence the livelihoods of millions of constituents. It is therefore crucial that the individuals designing them can represent and empathize with the people affected.

More representation of females and minorities in the field of data science will help identify and overcome unintentional biases in the algorithms. As we move into a world where technology reigns and innovations flourish, we need more representation to ensure that the digital age does not leave entire groups of people behind.

I hear that data science is a growing field, and I am grateful for the privilege of being able to study it as part of my pursuit to influence public policy and social change. And as they say, with great privilege comes great responsibility. So let’s keep on learning, sharing, questioning, and coding, without forgetting about the humans behind the numbers on our screens.