I Want to Learn More About Responsible Data Science: A Recommended Reading List

Emily Hadley
RTI Center for Data Science and AI
6 min read · Nov 21, 2022

So you’re interested in responsible data science and want to learn more? Whether you’re a newcomer to the space or a seasoned data science practitioner, these resources can help deepen your knowledge and broaden your understanding.

I’m new to the responsible data science space — what should I start with?

Weapons of Math Destruction by Cathy O’Neil (2016) is a common entry point and reference in the responsible data science space. This text is accessible to both technical and non-technical contributors and highlights a number of cases where algorithms have been developed in ways that maintain or amplify harms in society.

If films are more your style, check out Coded Bias (2020). This film features the journey of Joy Buolamwini and her research that uncovered racial bias in facial recognition algorithms.

I want to learn more about best practices for incorporating responsible data science approaches into my own work

The National Institute of Standards and Technology (NIST) is leading the way in the US on developing standards and toolkits for practitioners with Towards a Standard for Identifying and Managing Bias in Artificial Intelligence. Although a comprehensive legal framework for regulating algorithmic bias in the US likely remains far off, these tools are the best standardized approach for individuals and institutions interested in best practices related to responsible data science. The voluntary AI Risk Management Framework may be of particular interest for practitioners using algorithms for decision making in high-consequence areas.

Human-Centered Data Science by Cecilia Aragon et al. (2022) is an informative text that considers numerous ways in which centering humans in the data science process can lead to more responsible development. Each chapter includes case studies with real-world examples. The text thoroughly investigates the data science workflow and provides tangible suggestions for practitioners. It could be an appropriate choice for a course textbook.

Finally, check out Dealing with Bias and Fairness in Data Science Systems: A Practical Hands-On Tutorial (2022). This free resource is available on GitHub and has been presented at the Association for the Advancement of Artificial Intelligence (AAAI) and Knowledge Discovery and Data Mining (KDD) conferences. It includes videos, slides, and interactive code tools to learn about bias and fairness in data science.

I’m concerned that (irresponsible) algorithms are harming society

Your concern is not unfounded. Check out The Black Box Society by Frank Pasquale for an investigation of how algorithms shape reputation, search, finance, and more. First published in 2015, this book identifies major areas of concern that have only been amplified in the years since.

Tell me more about algorithms and public policy

Try Automating Inequality by Virginia Eubanks (2018). This book explores three public-sector case studies related to automation, algorithms, and inequality: an algorithm for automating Medicaid eligibility decisions in Indiana, a system for matching unhoused people in Los Angeles with housing, and an algorithm for predicting the likelihood of child abuse in Pittsburgh. Eubanks provides commentary on how automation and algorithms can violate human rights and privacy, including through increased surveillance and disparate treatment of low-income individuals. This book coined the term “digital poorhouse”.

I do *a lot* of Google searches but I haven’t thought about how Google (or other search algorithms) might be biased

Check out Algorithms of Oppression by Safiya Noble (2018). This text challenges the widely held view that a Google search is neutral or akin to using a library, and highlights the ways in which search algorithms can return racist and sexist results.

I’m concerned about how data visualization can propagate misinformation

Data visualization is a critical and often overlooked component of data science. How Charts Lie by Alberto Cairo (2019) explores how data visualizations and figures have been used to mislead individuals and propagate misinformation. It offers practical suggestions to readers and developers on how to critically interrogate visualizations and create better ones.

Another great resource for improved data visualization is the free Do No Harm Guide: Applying Equity Awareness in Data Visualization produced by the Urban Institute (2021). This resource dives into how equity can be considered in specific visualization choices such as color, order, and icons, and includes broader commentary on incorporating equity awareness throughout the data visualization development process.

I am specifically interested in responsible data science related to gender and sex or race and ethnicity data

Race and ethnicity data and gender and sex data require particular attention in responsible data science because of historical discrimination based on these attributes. There are numerous examples of algorithms that have reproduced or amplified these inequities and contributed to discriminatory decisions that have affected individuals’ lives.

Race After Technology by Ruha Benjamin (2019) explores how automation has the potential to hide, speed up, and even deepen discrimination, and how discriminatory designs in everyday apps and complex algorithms can amplify racial hierarchies. This text challenges readers to question the technologies they both use and develop.

The free Centering Racial Equity Throughout Data Integration toolkit, prepared by Actionable Intelligence for Social Policy (AISP) at the University of Pennsylvania with support from the Annie E. Casey Foundation, contains examples of positive and problematic practices for centering racial equity throughout the data science lifecycle. It includes activities, worksheets, and examples of the toolkit in action.

Invisible Women by Caroline Criado Perez (2019) exposes the gender data gap with numerous examples of real-world impact. This thoroughly researched and comprehensive text illuminates the need for intentional and robust data collection with attention to gender disaggregation.

If you’re interested in technical approaches for improved collection of gender identity data, check out the free report Measuring Sex, Gender Identity, and Sexual Orientation (2022). This consensus study from the National Academies of Sciences, Engineering, and Medicine reviews major challenges in collecting sex, gender identity, and sexual orientation data and recommends guidelines, including specific questions that can be used with the general population to assess sexual orientation, sex assigned at birth, and gender identity, as well as to identify people with transgender experience and intersex traits.

Other Recommended Reading Lists

Check out these other awesome recommended reading lists:

Thanks for taking the time to read this list. Feel free to recommend other resources in the comments.

This blog post is part of a Deep Dive into Responsible Data Science and AI series.

Disclaimer: Support for this blog series was provided by RTI International. The opinions expressed by the author are their own and do not represent the position or belief of RTI International. Material in this blog post series may be used for educational purposes. All other uses, including reprinting, modifying, and publishing, require written consent.
