Ethics + Data Science

Sep 10, 2018

How much has data changed our lives over the past decade? Just over 10 years ago the iPhone was launched. Back then, our phones took grainy photos and video was just wishful thinking. It was still weird to buy shoes over the internet and we still had to carry stacks of maps when we visited a new city. And Netflix was only a DVD company.

Now, our phones take photos and videos, with more than 4,000 photos uploaded to Facebook every second and more than 400 hours of video uploaded to YouTube every minute. We worry more about having connectivity than ensuring we have a map. And our mapping apps give us real-time traffic and options to navigate around it. Don’t want to drive? No problem, use a ride-sharing app that leverages trillions of data points. The fundamental shift behind this radical change is a combination of massive increases in computational power, storage, and data. And of course the data scientists, designers, and other technologists who make these ideas real.

The transformation due to data is just starting. We’re about to go from sequencing the human genome to enabling tailored medical treatments (precision medicine). Autonomous vehicles have started to appear on our roads and we’ll see efforts to build autonomous cargo ships and airplanes. And artificial intelligence has shown us new ways to think about games as it has beaten the best human players.

At the same time, we’ve seen data used to cause harm through a combination of negligence, naivety, and sophisticated attacks. From the U.S. elections and Brexit to accidents from self-driving cars and racist algorithms, we must expect the harm from data to increase. And we are only beginning to come to grips with the social impacts of job displacement from automation.

The question we need to address is: what can we do to ensure that data and technology work for us rather than against us?

The stickers that the White House Chief of Staff’s team would hand out

There are the regulatory approaches from the European Union (GDPR) and California (CCPA), and hearings from the U.S. Congress (with no action). There have been books that have highlighted the risks ahead, such as Weapons of Math Destruction and Automating Inequality. And new think tanks such as the Partnership on AI, the AI Now Institute, and the Center for Humane Technology have launched to begin to understand the broader implications for society.

I’m a fan of these efforts. They are very much needed. The question I want to ask is: what about the data scientists and the rest of the team who are responsible for building these technologies? What is their role in implementing “good” data science? (Also discussed in this post.)

When Hilary Mason and I published Data Driven: Creating a Data Culture, we realized that little was being done to empower the people who want to do what’s right. These technologists, designers, and product managers might have the right natural instincts, but are often sidelined due to business pressures or suboptimal practices. In other cases, they don’t even know what questions to ask.

Together with Hilary Mason and Mike Loukides (who has been our editor on every book and whom we finally convinced to be a co-author), we took a look at the best practices that we’ve seen among data scientists and have released a new e-book: Ethics and Data Science.

Given how much we expect to change on this topic, we think of this book as an open source project, and this is the 0.1 release. We’ve also made sure that it will always be free and under a Creative Commons license (so you can take it and combine it with your own efforts). We also want others to consider contributing, and we’ll be posting those updates on O’Reilly Radar’s Ethics Series. We’ve intentionally kept it as short as we possibly could with the hope that you’ll be able to share it with other teams. (You can get all of our other ebooks for free here.)

What can you expect to find in the book? We cover a model checklist for building data products, modeled after The Checklist Manifesto by Atul Gawande (if you’re working on something similar, we definitely want to hear from you on what works and what doesn’t). Ideas about how to implement more ethical behaviors in the product development process, including a dissent channel if you disagree with the team. How we can start interviewing talent for cultural fit as well as ethical fit. Also what we call the 5 C’s: five framing guidelines that help us think about building data products (consent, clarity, consistency, control & transparency, and consequences & harm). Finally, we’ve included a set of case studies from Ed Felten’s team at Princeton for you and your team to work through.

Most of all, we want to hear from you. The impact of data is happening now, and we need to get ahead of it. It starts with us: those who are building these technologies. Get the ebook here.

Let’s roll. No one is coming, it’s up to us.

-dj

Written by dj patil