How data science failed us

When I was working toward my degrees, there was an unspoken belief that permeated all of the research and researchers I encountered: numbers reign. Even I bought into this belief, trusting that it was true, and focused on becoming a quantitative researcher. I thought doing so would make me a better, stronger researcher. But I soon learned that numbers aren’t objective. That may seem obvious to experienced statisticians, but as a young, aspiring researcher, I wanted to do what the best researchers were acclaimed for. There was a clear divide, too, among the majority of my professors: the men in my field were more likely to be quantitative researchers, and the women were more likely to be qualitative. Since this was in the realm of education, the men usually had administrative backgrounds or had worked almost exclusively in higher education. The women often had experience as teachers in the K-12 system. The elitism and preferences were clear, even if they were never challenged.

On a societal level, we all cling to numbers and stats as if they were more reliable and powerful than the findings of qualitative and mixed-methods research.

This bit us in the ass last night, especially those of us who have spent our lives learning about, understanding, and appreciating the importance of data in all its beauty. I still believe in the research process above all else, but this collective “shock” that so many feel stems from an overreliance on methodology that wasn’t as thoughtful as we believed.

This isn’t to chastise my fellow data scientists; I think we’re all culpable, researchers and general public alike. However, we need to look at who the vast majority of data scientists are: overwhelmingly men, and white men at that. Our data was flawed from the start, because methodology, despite the rigor of the process, is still created by humans. And the people we relied on for accurate data didn’t account for the many nuances of this election.

How do you quantify fear? Hate? Joy? Identity? Polls should be qualitative, coding for tone, word use, and context, rather than quantitative ones that reduce complex answers and reasoning to binary choices.

Our valued researchers didn’t know how to quantify these nuances. It’s likely they weren’t even considered. As a result, we severely overestimated the progress many desire and underestimated the anger many feel throughout the country. We believed the numbers told the truth and were more accurate than what we heard on the streets and read on Facebook. We siloed ourselves and our data. I was worried when I drove through the town where I went to high school and saw only Trump signs. But I reassured myself that my observations, the privately acquired qualitative data I’m trained to collect and analyze, weren’t substantiated by numbers. The beauty of research is that we can study what we see and what others feel; we have systems that let us do that, and we should have used them.

My plea, as a fellow researcher, is this: diversify research, both in people and in methodology. Not solely in academia, either, but on a global level. A public effort to educate people on how data and science work is long overdue. Methodology needs to be held accountable. Numbers need to have context. Mixed-methods research needs to be the way of the future. We need to stop trusting that even the most intelligent white men have everyone’s best interests at heart, even when they wield data for good. Researchers who are women, who are people of color, who are queer, have a unique understanding of how to ask certain questions.

Because if we’ve realized anything, it’s this—numbers aren’t just abstractions on a page. They represent people. And too much is lost when we value some of those numbers over others.