Unintended Consequences

Justin Wetz
Published in IBM Design
Jul 11, 2018 · 7 min read

“Ill-conceived mathematical models now micromanage the economy, from advertising to prisons.”

– Cathy O’Neil “Weapons of Math Destruction”

On the trail

Michelle McNamara was zeroing in. She was writing a book about a series of assaults, rapes and murders that happened in California during the 1970s and ’80s. She doggedly pursued clues for years and was close to naming a suspect. Her work renewed interest in the monster she named “The Golden State Killer,” which would eventually lead to his arrest a couple of years later, but Michelle wouldn’t live to see the killer come to justice.

In the middle of writing the book, after spending countless hours and sleepless nights researching and writing, she needed to rest. She took her prescribed drugs (Xanax, Adderall and Fentanyl) and went to sleep. Her husband took their daughter to school the next morning, intending to let his wife get the rest she needed.

After picking up a coffee for her on the way back home, he set it on her nightstand, expecting she would wake up soon. After a couple of hours he checked on her again and found her unresponsive and not breathing. Michelle had died at age 46 from the interaction of her prescription drugs with an undiagnosed heart condition.

Fired by an algorithm

In her book Weapons of Math Destruction, Cathy O’Neil shares the story of fifth-grade teacher Sarah Wysocki. Parents and colleagues agreed she was a great teacher.

Despite what seemed to be above-average performance from her students on standardized tests, she was targeted for dismissal because a newly instituted algorithm placed her class scores in the bottom tier, largely based on test results from the previous year.

O’Neil explains that this algorithm did take some variables into account, but not enough to accurately determine who was a “good teacher” and who was not. There was also evidence that teachers in the students’ previous classes had edited the students’ tests before they were scored (by literally erasing and re-filling answers), and the algorithm definitely did not account for this possible variable in its outcome.

What’s the connection?

Both of these women were affected in life-changing ways by an unintended consequence of a well-intentioned effort to help. In McNamara’s case, her doctor had prescribed these drugs in good faith, trusting she would take the right dosages (there is no evidence to show she took more than the prescribed amounts) and not knowing, of course, about her undiagnosed heart condition.

In Wysocki’s case, an effort to increase performance in underperforming schools by identifying poor teachers zeroed in on a teacher who was by all accounts performing very well, because the algorithm didn’t take into account all possible variables.

In each case an unintended, life-changing consequence occurred because of unknown variables. The difference between the two is how the consequences can be reported and what action can be taken to fix them.

Largely untested and unquestioned

“Like gods, these mathematical models were opaque, their workings invisible to all but the highest priests in their domain…”

– Cathy O’Neil “Weapons of Math Destruction”

In Wysocki’s case, she had an extraordinarily difficult time finding out how the algorithm that recommended she be fired actually arrived at that recommendation. No one in her school district knew how it worked, just that it was “very complicated.” It was built by a company that had no stake in whether it performed correctly or not, yet the district still trusted it without a shadow of a doubt. There was no recourse for Sarah Wysocki to appeal her case, or even to report what happened and why she thought it was wrong.

Here’s where the two cases differ.

For McNamara, even though she had lost her life, the event of her prescription drugs interacting unexpectedly with her undiagnosed heart condition can be reported, logged and tracked. In the drug safety industry this is called an “adverse event.” These reports are sent to the maker of the drug, which has a legal responsibility to process each one, determine whether the event was serious and unexpected, and then report it to a regulatory body.

Even further, these reports must be aggregated, and drug makers must proactively look for patterns that point to possible new adverse events, ideally stopping them before they happen.
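
To make that reporting chain concrete, here is a minimal sketch in Python of the path an adverse event report takes: intake, a seriousness/expectedness check that decides whether it goes to a regulator, and aggregation for signal detection. The class and field names are my own illustrative assumptions, not any real pharmacovigilance system’s schema.

```python
# A minimal, hypothetical sketch of the adverse event workflow described
# above. Class and field names are illustrative, not a real PV schema.
from collections import Counter
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class AdverseEventReport:
    drug_name: str
    reporter: str            # patient, caregiver, physician, etc.
    description: str         # what happened
    event_date: date
    serious: bool = False    # e.g. death, hospitalization, disability
    expected: bool = True    # is the reaction already on the drug's label?


@dataclass
class DrugMaker:
    name: str
    reports: List[AdverseEventReport] = field(default_factory=list)

    def intake(self, report: AdverseEventReport) -> None:
        """Every report is logged and tracked; none may be dropped."""
        self.reports.append(report)

    def requires_regulatory_submission(self, report: AdverseEventReport) -> bool:
        """Serious *and* unexpected events must be reported to a regulator."""
        return report.serious and not report.expected

    def signal_detection(self, threshold: int = 3) -> List[str]:
        """Aggregate reports to surface possible new adverse events."""
        counts = Counter(r.description for r in self.reports if not r.expected)
        return [desc for desc, n in counts.items() if n >= threshold]
```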

Pharmacovigilance

Inside each drug company there is a unit called Pharmacovigilance (PV). This business unit is responsible for understanding the benefits and risks of their drugs, intake and processing of adverse event reports and communicating this information to regulatory bodies and the public (through a drug’s label directions and recommended uses).

Even though this unit is a mandatory requirement for all drug companies, the people working in it have a vested interest in keeping patients safe and actively seek out ways to get more reports of adverse events so they can better understand the benefits and risks of their drugs.

The PV reporting workflow

PV is not a perfect system by any means, but it does offer a model for how algorithms, AIs and the companies that make them can be more accountable. Both industries ask for a level of trust from the people using their products, and both make products that can have life-changing consequences, but right now only one must offer a viable feedback channel and is expected to improve its products based on real data.

Feedback

“But you cannot appeal to a WMD. That’s part of their fearsome power. They do not listen. Nor do they bend. They’re deaf not only to charm, threats, and cajoling but also to logic — even when there is good reason to question the data that feeds their conclusions.”

– Cathy O’Neil “Weapons of Math Destruction”

In the Wysocki case, the company that made the algorithm for assessing teachers did a poor job of accounting for the variables that should go into a decision like this. However, I think it’s unrealistic to expect that a group of people can think of every possible scenario for something this complex before actually introducing it to a larger audience.

Most algorithms and AIs are tested with a small set of data, sometimes real, sometimes made up, before they’re released on the world. This is in stark contrast with how drugs are developed. Drug scientists come up with an initial idea for a potentially beneficial molecule, which then requires testing through increasingly larger groups of patients. Of course, even with these tests, the drug may still have unexpected reactions among the potentially billions of people who could take it when it’s released to the world at large.

Proposed feedback and maintenance for algorithm makers

Imagine if a drug company could release a drug and no matter how many patient deaths resulted from taking it, would never have to address the problems or take it off the market. They would never even have to hear from the patients who took it and had a minor problem like a rash or a headache. They could just deliver the drug and drop out of the discussion.

This is the situation companies creating complex algorithms and AIs are in today. They can deliver their contracted work, collect their check and never have to update a thing.
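
By way of contrast, here is a hypothetical sketch of what an “adverse event” channel for an algorithm vendor could look like, mirroring the PV intake above. Every name and field is an illustrative assumption, not a description of any real system; the point is that a report is logged, the reporter gets a case number they can follow up on, life-changing outcomes trigger human review, and aggregated reports drive maintenance.

```python
# A hypothetical "adverse event" channel for an algorithm vendor,
# mirroring the pharmacovigilance intake sketched earlier.
from collections import Counter
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class AlgorithmAdverseEvent:
    model_name: str
    model_version: str
    decision: str                 # e.g. "recommend dismissal"
    reported_by: str              # the person affected, or an advocate
    description: str              # why the outcome seems wrong
    event_date: date
    life_changing: bool = False   # job loss, denied loan, denied parole...


@dataclass
class AlgorithmVendor:
    name: str
    events: List[AlgorithmAdverseEvent] = field(default_factory=list)

    def intake(self, event: AlgorithmAdverseEvent) -> str:
        """Log the report and return a case number the reporter can track."""
        self.events.append(event)
        return f"{self.name}-case-{len(self.events):06d}"

    def requires_human_review(self, event: AlgorithmAdverseEvent) -> bool:
        """Life-changing outcomes trigger a human review of the decision."""
        return event.life_changing

    def maintenance_signals(self, threshold: int = 5) -> List[str]:
        """Aggregate reports per model version to decide what to fix next."""
        counts = Counter((e.model_name, e.model_version) for e in self.events)
        return [f"{m} {v}" for (m, v), n in counts.items() if n >= threshold]
```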

Transparency

Drug labels are confusing. There is too much information in print that’s too small, and when you pick up your prescription there’s always a booklet of at least 10 pages stapled to the bag that covers every single possible side effect. If you read all of it, you may not want to take your prescription at all.

However, it does have valuable information. How much you should take, and how often. How many refills you get. Who it’s prescribed for (you, hopefully). What drug is contained in the bottle. What the intended effects are, and more importantly the warning signs of side effects you should call your doctor about.

When someone is interacting with an algorithm (or one step further, an AI), most of the time they’re not even aware of it. A pill bottle (even one without a label) has a structure and experience around it that lets you know you’re taking a drug. With AI, we’re letting someone else grind up our pills and slip them into our oatmeal without our knowledge, consent or a way to report feeling weird after eating it.

For algorithms that have life-changing consequences, we should do everything in our power to let people know what is an expected outcome of their interaction and what is an “adverse event”. And especially what they can do about it if they become a victim of an AI.
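
One way to picture this is a “label” for an algorithm, analogous to a drug label: intended use, expected outcomes, known limitations, warning signs and a place to report problems. The structure below is purely illustrative; the field names and the example entries are my assumptions, not an existing standard.

```python
# A purely illustrative "label" for an algorithm, modeled on a drug label.
from dataclasses import dataclass
from typing import List


@dataclass
class AlgorithmLabel:
    name: str
    intended_use: str               # who and what the algorithm is for
    expected_outcomes: List[str]    # the "directions": what a normal result looks like
    known_limitations: List[str]    # the "side effects" already known to the maker
    adverse_event_signs: List[str]  # warning signs that something has gone wrong
    report_channel: str             # where an affected person can file a report


# A hypothetical label for a teacher-scoring algorithm like the one in
# Wysocki's story (all entries are invented for illustration).
teacher_scoring_label = AlgorithmLabel(
    name="Teacher value-added score",
    intended_use="Identify classrooms that may need extra support",
    expected_outcomes=["A score that tracks multi-year student growth"],
    known_limitations=[
        "Small class sizes make scores unstable",
        "Cannot detect tampering with prior-year tests",
    ],
    adverse_event_signs=["A score that contradicts observed performance"],
    report_channel="https://example.com/report-an-adverse-event",
)
```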

Testing

Most makers of algorithms and AIs don’t do sufficient testing before releasing them into the wild. There should be more testing done at earlier stages with increasingly complex data sets, not just to understand the possible pitfalls of what’s being made but also to understand possible positive effects.

A well-known instance of this in the drug realm is Viagra. Sildenafil, the drug sold as Viagra, was originally developed to treat angina and high blood pressure. While it had only marginal success at that, clinical trials revealed it was more adept at the things we know it for today, and the intended purpose of the drug was changed. Now the benefit and risk of the drug can be tracked against the new intended outcome.

We can’t be expected to account for everything before releasing an algorithm into the wild, but built-in feedback mechanisms can help makers continue refining until the algorithm performs with a level of expected consistency.
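
As a rough sketch of what testing at earlier stages with increasingly complex data sets could look like, here is a staged evaluation harness loosely modeled on clinical trial phases: the algorithm only advances to a larger, messier data set after clearing the previous stage’s bar. The stage structure, thresholds and the accuracy metric are all illustrative assumptions, not a prescribed methodology.

```python
# An illustrative staged-testing harness, loosely modeled on clinical
# trial phases: each stage uses a larger, messier data set, and the
# algorithm only advances if it clears that stage's bar.
from typing import Callable, Dict, List, Tuple

Case = Dict[str, float]
Dataset = List[Tuple[Case, bool]]   # (input case, expected outcome)
Model = Callable[[Case], bool]


def accuracy(model: Model, data: Dataset) -> float:
    correct = sum(1 for case, expected in data if model(case) == expected)
    return correct / len(data)


def staged_rollout(model: Model, stages: List[Tuple[str, Dataset, float]]) -> bool:
    """Run the model through progressively larger and harder data sets.

    Each stage is (name, data, minimum accuracy). Stop at the first failure
    and send the model back for refinement instead of releasing it widely.
    """
    for stage_name, data, bar in stages:
        score = accuracy(model, data)
        print(f"{stage_name}: accuracy {score:.1%} (bar {bar:.0%})")
        if score < bar:
            print(f"Stopped at {stage_name}; refine and re-test before wider release.")
            return False
    return True
```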

Conclusion

These are not meant to be perfect parallels, but a lot can be learned from how drug companies are held responsible for the life-changing effects of the products they make.

Companies creating AIs don’t have the same incentive right now, but we should investigate some of the practices in pharmaceuticals and PV and how they could mitigate the negative outcomes of unexpected AI consequences.


Justin Wetz
IBM Design

Senior Design Lead at IBM Watson Health, focused on AI for drug discovery and patient safety