The Ethical AI Startup Ecosystem

A deep dive into the startups devoted to detecting and mitigating bias within the data science lifecycle — and four predictions for the future.

Abhinav Raghunathan
7 min read · Jan 20, 2022


Ethical and responsible AI is all the rage these days, and rightly so. The number of companies focusing on related issues has grown sharply in recent years, perhaps spurred by literature and talks from people like Michael Kearns (UPenn), Cathy O’Neil (ORCAA), Joy Buolamwini (MIT), Latanya Sweeney (Harvard), and many others.

Increasing interest in algorithmic bias and responsible AI / explainable AI (XAI). Data from Google Trends. Created by author.

The companies in this space work at every stage of the data science lifecycle and beyond, because bias can originate in many places. There are also niche startups that focus on mitigating bias in specific verticals (think neobanks or HR analytics companies).

Here’s a breakdown of the ethical AI and algorithmic bias mitigation space, along with some of its key players.

Examples of bias/XAI startups involved at each stage of the data science life cycle. Created by author.

Data Sourcing

Data defines models and, therefore, shapes decisions on a large scale. One of the most fitting maxims in this space is “garbage in, garbage out.” Companies in this area of the data science lifecycle focus primarily on bringing in good data and prioritizing ethical data mining. Airbloc is a startup from South Korea that allows enterprises to share data for mining/sourcing purposes in a way that preserves privacy. Co:census is an SMS survey tool that prioritizes ethical data collection and minimizes sampling bias. Datomize is part of a subgroup of sourcing called “synthetic data generation,” in which data isn’t collected but artificially generated. Tools like these are the first step in building ethical algorithms: they ensure that we are not simply dumping garbage into our models.

Data Processing

After the data is sourced, it must be processed in a way that both ensures privacy and gives good model results. Many companies focus on anonymization: for example, converting an “age” variable from an exact number such as 22 to a range such as 20–25. Some of…
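To make the age example above concrete, here is a minimal Python sketch of generalization, one common anonymization technique. The function name `generalize_age` and the 5-year bin width are assumptions chosen for illustration; they are not taken from any company or tool mentioned in this article.

```python
def generalize_age(age: int, width: int = 5) -> str:
    """Replace an exact age with a coarser range label.

    Hypothetical helper for illustration only. The range is half-open:
    the lower bound is included, the upper bound is not.
    """
    if age < 0:
        raise ValueError("age must be non-negative")
    if width <= 0:
        raise ValueError("width must be positive")
    lower = (age // width) * width  # round down to the start of the bin
    return f"{lower}-{lower + width}"


# Exact ages are replaced by ranges before the data is shared or modeled.
ages = [22, 37, 41]
print([generalize_age(a) for a in ages])  # ['20-25', '35-40', '40-45']
```

A wider bin hides more about each individual but leaves the model with less detail, so the width is a privacy versus accuracy trade-off rather than a fixed rule.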

--

Abhinav Raghunathan

Engineering @UTAustin. Data Science @UofPenn. @TEDTalks speaker. I’m all about ethical data science and its many applications (www.abhiraghunathan.com).