Should we build Affirmative Algorithms?

Anna Marie Clifton
5 min read · Aug 8, 2017

“Affirmative action: an action or policy favoring those who tend to suffer from discrimination, especially in relation to employment or education; positive discrimination.”

For the past several months I’ve been considering the ethical implications of some technical advancements. I’ve written this post to elucidate the questions I think we need to ask, and why they matter. This post was written for non-technical and technical audiences alike.

If you, or anyone you know, has been thinking about these questions as well, please let me know in a response below or on Twitter. Thank you.

Affirmative Action acknowledges that humans are flawed and biased. We are likely to favor people who look like historically successful people (straight, white guys). So large organizing bodies (usually governments) set policies that help those who are different have equal opportunities.

The way Affirmative Action functions:

  • Governments (or other organizing bodies) set up restrictions for how…
  • Entities (like corporations and schools) make decisions that ensure they give opportunities to…
  • Individuals who are otherwise disadvantaged.

But something has been subtly shifting over time… Now there’s an additional piece:

  • Algorithms — the brains we program into our computers to make decisions.

Instead of making their own decisions, entities use algorithms (often designed by outside organizations) to make decisions about individuals.

You may say: Well, governments should tell entities to use “fair algorithms” and make sure they give opportunities to individuals.

But what makes an algorithm fair?

Sidebar 1: A brief note on Algorithms

Algorithms make things more efficient.

Netflix can either pay a few engineers to maintain an algorithm that instantly suggests movies to watch or pay thousands of people to hand select that for you.

Amazon can either pay a few engineers to maintain an algorithm that instantly suggests books to buy or pay thousands of people to hand select that for you.

Lyft can either pay a few engineers to maintain an algorithm that connects you with drivers nearby or pay thousands of people to hand select that for you.

Because Netflix, Amazon and Lyft use computer brains to make these suggestions, they can make a lot more money from selling to you than companies that pay human brains to make those decisions.

With great efficiency comes great opportunity to make money.

Now, on to our hypothetical “fair” algorithm

Say you’re an enterprising person, and want to take advantage of the economic gains made possible by algorithms. You may want to build one and make ALL THE MONEY™ … but how do you start this process?

Well, you have to get data to build the algorithm in the first place.

Here’s a common path:

  1. Have an idea for a product.
    Example: a recommendation engine to effortlessly surface qualified job candidates to recruiters.
  2. Build an intermediary product.
    Example: a platform where you pay “curators” to look at job posts and scour the world for qualified candidates. Then send resumes for those candidates to recruiters.
  3. Collect data from this process.
    Example: build a database full of (1) job descriptions, (2) the candidate resumes you picked, and (3) which of those resumes the recruiters liked.
  4. Use that data to make the algorithm you wanted to make in the first place.
    Example: a recommendation engine to effortlessly surface qualified job candidates to recruiters.
  5. Fire all your curators, and capitalize on the profit from your cheap computer brain making high-value decisions.

Sidebar 2: How does this training data become a computer brain?

Engineers give computers lots and lots of data about which resumes recruiters picked and which ones they didn’t.

What you might think a computer would do is make a list based on this data:

  • If they have 80% of the keywords in their resume, give them 2 points.
  • If they have at least three jobs with related experience, give them 5 points.
  • If they went to one of these 14 colleges, give them 2.8 points.
  • … etc.

Then engineers can tally up the points and see which resume is the best one to show a recruiter.
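That hypothetical checklist is easy to picture as code. Here is a minimal sketch of it in Python; the point values come from the list above, and the fields, keywords, and college names are all invented for illustration:

```python
# A sketch of the imaginary rule-based scorer described above.
# Point values come from the list; fields, keywords, and colleges are made up.

PREFERRED_COLLEGES = {"State U", "Tech Institute"}  # stand-in for the "14 colleges"

def score_resume(resume, job_keywords):
    """Tally up points for one resume using hand-written rules."""
    score = 0.0

    # 80% of the job's keywords appear in the resume -> 2 points
    matched = job_keywords & resume["keywords"]
    if len(matched) >= 0.8 * len(job_keywords):
        score += 2

    # At least three jobs with related experience -> 5 points
    if resume["related_jobs"] >= 3:
        score += 5

    # Went to one of the preferred colleges -> 2.8 points
    if resume["college"] in PREFERRED_COLLEGES:
        score += 2.8

    return score

job_keywords = {"python", "recruiting", "sql"}
resumes = [
    {"keywords": {"python", "sql", "recruiting"}, "related_jobs": 4, "college": "State U"},
    {"keywords": {"python"}, "related_jobs": 1, "college": "Unknown College"},
]

# Show the recruiter the highest-scoring resume first.
ranked = sorted(resumes, key=lambda r: score_resume(r, job_keywords), reverse=True)
print(ranked[0])
```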

What computers actually do with this information is make themselves a secret brain that can take a resume and, in one single calculation, tell you whether a recruiter will want to see it or not.
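To make that concrete, here is a rough sketch of such a “secret brain”, using a plain logistic regression as a stand-in for whatever model a real team would actually choose. The features, labels, and numbers are invented:

```python
# A rough sketch of the "secret brain": instead of hand-written rules, a model
# learns one weighted calculation from past recruiter decisions.
# Features, labels, and numbers are invented for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row is one resume: [keyword match fraction, related jobs, preferred college?]
X = np.array([
    [0.9, 4, 1],
    [0.8, 3, 0],
    [0.3, 1, 0],
    [0.7, 2, 1],
    [0.2, 0, 0],
    [0.6, 5, 0],
])
# 1 = the recruiter wanted to see this resume, 0 = they passed on it.
y = np.array([1, 1, 0, 1, 0, 1])

brain = LogisticRegression().fit(X, y)

# One single calculation per new resume: the probability a recruiter picks it.
new_resume = np.array([[0.85, 3, 1]])
print(brain.predict_proba(new_resume)[0, 1])
```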

But there’s a catch

All along, we were building this based on what recruiters did, and recruiters are biased… because they’re human.

Here are just two examples:

  1. Recruiters looking for engineers are less likely to call a woman with the same resume as a man (source).
  2. Recruiters are less likely to call someone with a stereotypically African-American name than someone with a more Caucasian-sounding name (source).

Without realizing it, we have included this bias in our computer brain.

So, I ask you, is the algorithm fair?

Hard to say.

Technically, we built our algorithm based on what recruiters told us to show them.

Is it our fault that recruiters prefer to look at white, male resumes?

Most people would agree that a system that didn’t show recruiters any candidates named “John” is unfair. But what about a system that only slightly downgrades people named “John”?

What if resume #32905 would have been the first one on the list if it were “Jane”, but because it’s “John” we show it on the second page?

Some questions start percolating in this space:

  1. How do we even determine that our system actually is biased against “John”s?
  2. Once we know that our system is biased against “John”s, how do we measure by how much? (One rough way to approach both of these is sketched just after this list.)
  3. And, trickiest yet, how do we even know where to look for that bias against “John”s? We have lots of intuition and studies around gender bias and racial bias, but what about the bias we aren’t even looking for?
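For questions 1 and 2, one common starting point is simply to compare how often each group gets surfaced. A minimal sketch, assuming we can tag each resume with the group we care about and log whether our system recommended it (the groups and decisions below are made up):

```python
# A minimal sketch of one way to approach questions 1 and 2: compare how often
# each group gets recommended. The groups and decisions here are invented.
from collections import defaultdict

# (group, was_recommended) pairs logged from the system's output.
decisions = [
    ("John", True), ("John", False), ("John", False), ("John", True),
    ("Jane", True), ("Jane", True), ("Jane", False), ("Jane", True),
]

totals = defaultdict(int)
recommended = defaultdict(int)
for group, picked in decisions:
    totals[group] += 1
    recommended[group] += picked

rates = {group: recommended[group] / totals[group] for group in totals}
print(rates)  # {'John': 0.5, 'Jane': 0.75}

# One rough answer to "by how much": the ratio of selection rates.
print(rates["John"] / rates["Jane"])  # well below 1.0 hints at bias against "John"s
```

Question 3 is the hard part: a check like this only works for the groups we already thought to tag.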

Training data is everywhere, and so is our bias

Back to the beginning: Should we build Affirmative Algorithms?

Interestingly enough, we already know how to modify our systems to mathematically offset known bias. (See this paper for more information on how one team approaches that.)
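To give a flavor of what that kind of offsetting can look like, here is a sketch of one well-known approach (not necessarily the one in that paper): re-weight the training examples so that group membership no longer predicts the historical “picked” label before the model learns from it. The groups and decisions are invented:

```python
# A sketch of one well-known offsetting approach (re-weighting the training
# data), not necessarily the one in the linked paper. Data is invented.
import numpy as np

groups = np.array(["John", "Jane", "John", "Jane", "John", "Jane"])
picked = np.array([1, 1, 0, 0, 0, 1])  # historical recruiter decisions

weights = np.ones(len(picked), dtype=float)
for group in np.unique(groups):
    for label in (0, 1):
        in_cell = (groups == group) & (picked == label)
        if in_cell.any():
            # Weight = (rate expected if group and label were independent)
            #          / (rate actually observed)
            expected = (groups == group).mean() * (picked == label).mean()
            observed = in_cell.mean()
            weights[in_cell] = expected / observed

print(weights)
# These weights can then be handed to most learners, e.g.
#   LogisticRegression().fit(X, picked, sample_weight=weights)
```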

But should we?

Should we offset our inputs to account for the bias we know about? By how much? Should we simply squelch unfairness where we can see it, or should we make our systems “unfair” to the benefit of disadvantaged people?

The algorithms that decide today are built on data from the biased humans of yesterday.

Should we intentionally modify our algorithms to build for a tomorrow that we want?

And if so, who should decide what we want tomorrow to look like?
