The Data Science Smell Test
[disclosure: I work for the Open Society Foundations. All views my own. Co-drafted with Patrick Ball, Research Director of the Human Rights Data Analysis Group.]
Data science is a big deal, but civil society data doesn’t always meet the bar data scientists need in order to work with it. I think there’s some ecosystem thinking to be done about how willing data scientists can collaborate better with civil society, and about the implications data work has for social justice, in particular the protection and promotion of human rights. Here’s a first pass at a “smell test” for deciding whether using data science is a good idea.
1. How can you ensure that the data you have is adequate to answer your question?
2. Does the method you have chosen give you an answer to a substantive question?
3. Does the combination of your question, data, and methodology connect to a relevant human rights question?
4. F-measures that are considered excellent in academic and industry contexts still leave many false positive and false negative classifications in the results. How do those errors, as reflected in the precision and recall of your outcome, affect the human rights argument you are making? How will imperfect precision and recall affect the real-life human rights problem? Can you make an effective and compelling case for how you will mitigate imperfect F-measures?
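To make point 4 concrete, here is a minimal Python sketch. It assumes the F1 score (the balanced F-measure) and uses entirely made-up counts for three hypothetical classifiers, each run against the same 1,000 true cases. All three score an F1 of 0.9, which many would call excellent. Even so, each one still produces roughly 200 errors, and each distributes those errors very differently between false positives and false negatives. The class name `Confusion` and the classifier labels are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Hypothetical counts from comparing a classifier against ground truth."""
    tp: int  # true positives: real cases the classifier flagged
    fp: int  # false positives: flagged records that are not real cases
    fn: int  # false negatives: real cases the classifier missed

    def precision(self) -> float:
        # Share of flagged records that are real cases.
        denom = self.tp + self.fp
        return self.tp / denom if denom else 0.0

    def recall(self) -> float:
        # Share of real cases that were flagged.
        denom = self.tp + self.fn
        return self.tp / denom if denom else 0.0

    def f1(self) -> float:
        # Harmonic mean of precision and recall, i.e. 2TP / (2TP + FP + FN).
        denom = 2 * self.tp + self.fp + self.fn
        return 2 * self.tp / denom if denom else 0.0


# Made-up counts: in every case tp + fn == 1,000 true cases.
classifiers = {
    "balanced": Confusion(tp=900, fp=100, fn=100),
    "cautious": Confusion(tp=855, fp=45, fn=145),  # fewer false alarms, more misses
    "eager": Confusion(tp=945, fp=155, fn=55),     # fewer misses, more false alarms
}

for name, c in classifiers.items():
    print(f"{name}: precision={c.precision():.3f} recall={c.recall():.3f} "
          f"F1={c.f1():.3f} false_pos={c.fp} false_neg={c.fn}")
```

With these invented numbers, all three report F1=0.900. However, the “cautious” classifier misses 145 real cases, while the “eager” one wrongly flags 155 records. Which of those failures does more harm depends on the human rights question, not on the F-measure.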