What we talk about when we talk about fair AI

Fionntán O’Donnell
BBC News Labs
17 min read · Dec 11, 2017

BBC News Labs’ AI research engineer, Fionntán O’Donnell, has been thinking about how fairness and machine learning relate to his own work. Here he shares some of his personal views on ethical AI and its implications for news.

Over the coming years, we’ll see the BBC develop and roll out more and more machine learning algorithms for storing, retrieving, tagging and possibly even creating content. The move towards a more personalised service also means our audience can navigate and discover the BBC output that matters most to them.

Though these will be hugely beneficial, both internally and to the audience, they also raise questions about our responsibilities as a public broadcaster. What if we lock our audiences into what they just want to hear, never challenging them? What data are our audiences comfortable with us collecting? Should the algorithms we use be open to public scrutiny? How do we maintain ethical standards across our products? The answers to these are not simple, and they’ll affect teams all across the BBC, from software to design to marketing to communications and even legal.

New technologies have always posed their own particular concerns. What’s different with AI and machine learning, though, is the pace and sheer scale of the changes. From a niche interest ten years ago, machine learning now affects the news, social lives, work and shopping habits of hundreds of millions, even billions, of people, with no letup in sight. And as these technologies continue to spread into new domains, we’ll see a growing number of unintended consequences and controversies that system developers could never have predicted.

A few of the many, many examples out there include:

While getting angry about these news stories is often justified, we should also take a pause and think about the deeper questions they raise.

Take the case of Louise Kennedy, an Irish vet in Australia who failed an English oral test because of a speech recognition system.

Obviously it strikes us as unfair that a native English speaker be denied a visa by a computer. But we can also ask further questions:

  • Who else has failed this test?
  • Are there troubling patterns among the false negatives?
  • Why was it decided that an automated system should be used at all?
  • What criteria were used to choose the outsourced company?
  • What data did this company use to train their speech recognition system?
  • Can Louise bring a legal case against the government?

Through questions like these we can not only inform our own thinking but also help push for fair AI and demand that better systems are built in future.

And that’s the aim of this article: to help you come away with a more involved idea of the AI fairness question. It’s not to tell you what is and is not fair, but to give you some mental frameworks for thinking about how these problems should be addressed. To do this, I’ll review some of the research and discussion on what is meant by fair AI. Inspired by the FATML community, I’ll split the discussion into three sections: fairness, accountability, and transparency. While I try to be wide-ranging, these topics involve many domains — maths, law, engineering, government — and no one article can do them justice. The views in this field are extensive, in flux and sometimes even conflicting. This article is simply my attempt to give an introduction to a fascinating next stage in our relationship with AI.

Fairness

Courtesy of Laura Amaya from the Noun Project

We all want to be treated fairly, don’t we? But what does that mean when it comes to AI systems? Does it mean Spotify should give me recommendations that are just as good as my friends’? Does it mean Alexa should recognise Gaelic speakers as accurately as English speakers? Should Facebook examine what type of ads it shows to people of different incomes?

An excellent paper by researchers Michael Skirpan and Micha Gorelick in the FATML workshop looks at exactly this. They define the criteria of fairness as follows:

A machine learning system can only be fair with a contextual justification for the choice of a fairness construct and offering a channel for affected parties to actively assent or dissent to the fairness of the system.

The fairness construct they mention is a three-step process in which you ask a series of questions when building your machine learning system so that you can be confident you are building something fair. We’ll deal with each of these steps in turn.

1. First, is it fair to make this at all?

Often we see tech companies boast of a “fail-fast” culture as they aim to be first to release new products and features. While this speeds technological progress, the lack of ethical due diligence in their approach means these systems can produce large and often disturbing side-effects.

This first step encourages teams to ask what the ethical concerns of building this ML system are. Questions like:

  • Does the system fit within the company’s ethics?
  • What would be the fallout if something went terribly wrong?
  • How do cultural differences affect the product?
  • What about privacy of the individual?
  • What about the cost of false negatives? Who would they tend to fall on?

All of these questions should be discussed, justified and documented.

An important part of this is to look beyond the engineers and invite different stakeholders across the organisation to the discussion. Legal, marketing, product and UX will all have their own unique insights into these topics.

The difficulty of this step varies greatly depending on what the machine learning product is trying to solve. For instance, reducing electricity bills is unlikely to harm any particular human but an AI that spots suicidal people on social networks is in extremely complex territory.

2. We’re going to make it, is there a fair technical approach?

Now that the organisation has decided the system will go ahead, the next step aims to build fairness into the system during its design and production.

Here are just some of the questions to think about as the software and algorithms are developed:

  • Can we agree on who the affected groups are? (race? class? gender?)
  • Can we ensure balanced data on all affected groups?
  • Can we use interpretable AI models that explain themselves?
  • Can we guarantee users a right to an explanation of decisions?
  • Should we get external oversight for this project? Who?
  • Which people internally should decide whether this system is fair?

Again, documenting the discussion and decisions on these is important. If anything were to go wrong, it would help greatly during a retrospective analysis.

3. We’ve made it, how do we test if the system is fair?

This is the investigation of a finished machine learning product, checking it for fairness failures or bias. Testers can look at what gets misclassified — does it affect any group in particular? They can create qualitative reports on how the system performs and even add tests into the software itself, as in the sketch below. These checks would be iterative and periodic, repeated as new data is collected and functionality changes. The hard part here is knowing what to assess for fairness. While you may have already agreed upon affected groups, there may be other unfair biases lurking that you haven’t thought of.
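To make the idea concrete, here is a minimal Python sketch of the kind of check a tester might add to the software: comparing false negative rates across the affected groups agreed on in step 2. The column names and data are hypothetical placeholders, not anything from a real system.

```python
import pandas as pd

def false_negative_rate_by_group(df: pd.DataFrame, group_col: str) -> pd.Series:
    """False negative rate (missed positives) for each group in group_col.

    Expects 0/1 columns named 'actual' and 'predicted'.
    """
    positives = df[df["actual"] == 1]
    missed = positives["predicted"] == 0
    return missed.groupby(positives[group_col]).mean()

# Hypothetical evaluation data: true labels, model predictions and a
# protected attribute agreed on during the design stage.
results = pd.DataFrame({
    "actual":    [1, 1, 1, 1, 0, 0, 1, 1],
    "predicted": [1, 0, 1, 0, 0, 1, 1, 0],
    "group":     ["a", "a", "b", "b", "a", "b", "b", "a"],
})

rates = false_negative_rate_by_group(results, "group")
print(rates)
# A large gap between groups is a red flag worth investigating and
# documenting before the next release.
print("max disparity:", rates.max() - rates.min())
```

A real test suite would of course use the system’s own evaluation data and the groups agreed in step 2, and would run every time the data or the model changes.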

AI and Accountability

Courtesy of Dan Hetteix from the Noun Project

Algorithms are often described in news headlines as if they have risen zombie-like out of GitHub and are now lurching around the earth forcing decisions on us hapless humans. Algorithms are sending us to prison! Algorithms are grading our teachers! Algorithms are the cause of populism!

There’s something very important to get across here: algorithms are not responsible, people are.

Algorithms exist within a code base, which exists within a software team, which exists within a company, which exists within a market, which exists within a legal framework within a system of governance. Much like we don’t hold chickens responsible for salmonella outbreaks, we should stop assigning blame to algorithms and shift it to the humans and organisations that are in charge of the software.

Independent Oversight

There are many industries where we’ve developed a legal accountability framework and created bodies to oversee and enforce its laws. For centuries you could make and sell food without worrying about who ate it or what happened to them afterwards. But thankfully, we now have food safety laws and governmental bodies like the Food Standards Agency. And indeed, we’ve seen similar agencies in areas as diverse as medicinal drugs, consumer protection, urban planning and airline safety. Each of these markets differs and has its own specificities, but the best laws and practices are honed and improved over time. This is something we’d hope to do with the large-scale deployment of algorithms.

The important phrase to note here is large-scale. Much in the way you can still have your own bake sale, this government body wouldn’t stop you cooking up your own neural network for research. But if you plan to release a product to millions of people, there needs to be some oversight body looking out for the public’s safety and the harmony of the market. We’d see it as criminal if a construction company put up a skyscraper in a city without approval, or if a new drug were released without any testing. And so we should expect a similar level of oversight for algorithms that are deployed to the public at large.

In a recent talk at the Turing Institute, Ben Shneiderman outlined how an algorithmic accountability body could work in practice. He discusses the points during a product’s lifecycle where a company would come into contact with this agency and gives examples of how this currently works in other domains. We’ll go through each of these now.

Image and accountability thoughts adapted from Ben Shneiderman’s excellent talk

Planning Oversight — Here you’d apply to a board, looking for a stamp of approval to go ahead with your AI system. The board would review the documentation, see where it could affect the public, raise possible problems and decide on whether to give the project the go-ahead. Perhaps they’d have their own seal of approval that could be used in marketing — “AI safety rating of 10!”. We’d imagine some of the board’s work would use the first step of the fairness construct we described earlier.

No doubt a lot of tech companies would see this as a slow and expensive process, but it’s worth remembering that these boards protect both the consumer and the company. Stopping a drug company from releasing an unsafe drug is obviously good for the public, but it also saves the company from potentially fatal lawsuits down the line. For instance, if an algorithm were found to be deeply gender-biased before release, it would save the tech company from future bad press, service downtime, a massive data overhaul or other negative consequences.

Continuous Monitoring — Consider this the food-safety inspection step. A company has an AI product in place and gets a periodic visit from a friendly AI safety-inspection team to check that it remains safe. The inspectors would come in and perform some standardised tests on the AI setup, similar to step 3 of our fairness construct: check whether the data is fair, examine who or what is being unfairly judged by the algorithm, and test the model with their own prepared data. If the company failed any of these tests, penalties would be imposed. A sketch of one such standardised check follows below.
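As one illustration of what a standardised inspection check might look like, here is a small Python sketch that measures how well each group is represented in a training set. Everything here (the records, the “accent” attribute, the function name) is a hypothetical placeholder, not a real inspection standard.

```python
from collections import Counter

def group_representation(records, group_key):
    """Share of records belonging to each group: a first, crude check
    on whether the training data itself is balanced."""
    counts = Counter(record[group_key] for record in records)
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}

# Hypothetical speech-recognition training records with a protected attribute.
training_records = [
    {"transcript": "...", "accent": "irish"},
    {"transcript": "...", "accent": "english"},
    {"transcript": "...", "accent": "english"},
    {"transcript": "...", "accent": "english"},
]

print(group_representation(training_records, "accent"))
# {'irish': 0.25, 'english': 0.75} - a skew an inspector might flag
```

Think of the Louise Kennedy case above: a training set skewed this heavily towards one accent is exactly the kind of thing an inspection would want to surface before the system decides anyone’s visa.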

Retrospective Analysis — Like an airplane’s black box, this is the safety board going in after a disaster and investigating what went wrong. They’d inspect the code, audit the design decisions made, analyse the data and try and figure out why this disaster happened and who is responsible. The board would also work with law enforcement to decide who should be compensated and how offenders should be punished.

Transparency

Courtesy of David Swanson from the Noun Project

Or how the AI interacts with you. Take your typical automated news feed on Facebook, Google or Twitter. Do you feel you have a clear idea why you’re being recommended certain content? Do you feel you have control over what is being recommended? Do you know what personal data is being used? Have you given consent to how they use this data?

Sadly, the answer for users is quite often “No”. It’s common that people are not even aware an algorithm is being used at all. And when they are, they concoct all sorts of pet theories as to why certain content shows up.

There was a wonderful study asking participants why they think certain items appear in their Facebook feed. The paper’s authors defined a number of theories along with supporting interviewee quotes:

Theory of Loud and Quiet Friends: “If someone posts a lot on Facebook, then [you] will be more likely to see their posts.”

The Narcissus Theory: “I feel like people that I have sort of the least in common with are the ones I tend not to see very much.”

The Eye of Providence Theory: “Because I do news searches on world news, but that’s been through like CNN and Fox, not Facebook. But if Facebook is linked to Google, then Facebook is getting that search saying, ‘Hey, [they’re] looking for political news.’”

What’s going on here is a chronic imbalance of information. These systems collect massive amounts of data on people, who in turn have no idea how this data is used and have no way to find out. This imbalance is often discussed in terms of black-box algorithms. In popular culture, this usually means we have no idea what is going on inside some AI software and can only see its output. But if we take the phrase black box from other parts of software, it means a system whose internals we can’t view but whose inputs and outputs we can. This is certainly not the case in most large-scale AI systems. We often have no idea what data about us goes into the algorithm to make its judgements. We only see the decisions that come out.

While we would want to make the workings of these systems more transparent, there is an important question here. Even if we entirely exposed the internals of the black-box, data and all, could we even understand how it works?

Algorithmic Opacity

The reasons for the lack of transparency with algorithms can be quite nuanced and differ greatly between different companies. In a research paper, Jenna Burrell of UC Berkeley breaks down the types of opacity into three categories:

Intentional Opacity: Frequently an organisation will deliberately conceal the workings of its AI system and block access to scrutiny from outsiders. This is often in the name of trade secrets, which can be extremely valuable intellectual property and a significant market advantage over competitors. It can also be governmental organisations that conceal their algorithms in the name of national security. Companies may also fear that people could “game the system”, manipulating the algorithm to work in their favour. For instance, Google keeps its search algorithm protected as it doesn’t want undeserving sites to climb the search rankings. In this way, it tries to protect the quality of the service.

Illiterate Opacity: Even if we were to open up Facebook’s code to the public, would it be understandable to us? Not really. Often the inner workings of machine learning systems are opaque because only those with years of technical expertise understand how they work. Also, experts would need to investigate the huge amount of data that was used to train the system. The inner logic of the system is opaque to all apart from machine learning researchers with the time and skill to do a deep analysis.

Intrinsic Opacity: Similar to the Wittgenstein quote “If a lion could speak, we could not understand him”, Burrell discusses how there is sometimes an intrinsic mismatch between how humans and AI systems reason about the world. Deep neural nets are, to put it simply, an alien intelligence. While the human world is mostly one of anecdote and simple statistics, the AI system operates on a stupefyingly huge scale. It has so many millions of parameters and billions of data points that no human could ever understand how a neural network makes a specific decision at a specific time. The question of why Facebook didn’t show someone their cousin’s baby photo on February 12th is simply unanswerable by any human, no matter their efforts.

All of this may sound grim…

But there is some good news! One piece is legal, the other comes from the computer scientists themselves.

Interpretable Algorithms — A rapidly expanding field of machine learning research, this is the study of algorithms that help explain what is going on inside machine learning models. The main intended applications are algorithm debugging, safety, and scientific enquiry for its own sake. Interpretable algorithms come in various forms, from simpler models explaining themselves (e.g. Random Forests), to algorithms explaining other algorithms (LIME), to visualisation tools (TensorBoard), and even algorithms that try to manipulate data so as to break other algorithms (DeepXplore).
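As a small taste of the “models explaining themselves” end of that spectrum, here is a sketch using scikit-learn’s Random Forest, whose feature_importances_ attribute gives a rough global ranking of which inputs the model leans on. The dataset and feature names below are made up purely for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data; the feature names are hypothetical.
X, y = make_classification(n_samples=500, n_features=4, n_informative=2,
                           random_state=0)
feature_names = ["age", "accent_score", "word_rate", "pause_length"]

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# feature_importances_ gives a coarse, global ranking of the inputs the
# model relies on: a starting point for a fairness conversation, not proof.
for name, importance in sorted(zip(feature_names, model.feature_importances_),
                               key=lambda pair: -pair[1]):
    print(f"{name}: {importance:.2f}")
```

Even this crude kind of summary can prompt useful questions, such as why a protected or proxy attribute is ranked so highly, though it says nothing about any individual decision.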

It’s worth saying that some people doubt this is even possible, or even necessary — we would never get doctors to explain every neuron in their heads when they make a judgement, so why should we expect this for algorithms? For me, this ignores an intangible human factor: trust. While we may never fully understand how they operate, we need to have a justifiable sense of faith in the algorithm to do the right thing.

An important caveat to the academic work is that it is mainly focussed on explaining machine learning systems to machine learning researchers. The user or the layperson has been mostly left out. As such, there is a huge gap in UX research — understanding how people feel about AI explanations and what they expect from them. We’re not going to explain every unit in the neural network to every user but it might be that a simple one-line approximation is sufficient. Sadly, software designers and machine learning researchers mostly work in complete isolation from each other, having completely different goals and world views. But if more research were done around creative ways of communicating decisions to software audiences, it would be a great benefit. So if you’re a UX Designer or a User Researcher, consider getting involved in AI!

The GDPR — Drafted into UK law as the Data Protection Bill and coming into effect next year, this piece of legislation will overhaul the EU’s laws on personal data. It gives the individual more control over how their data is used by both data controllers and data processors (these can often be different companies). It also gives regulators the ability to impose large fines on non-compliant actors.

The extended rights for the individual include:

  • Right to be informed — You have the right to a clear, concise explanation over how and why organisations are using your personal data.
  • Right to access — You have the right to access all personal data collected about you free of charge within one month of the request.
  • Right to be forgotten — You have the right to withdraw your consent for your data to be processed and for your data to be deleted.
  • Right on automated decisions — If a decision that has legal or other significant effects is made about you, you have the right to require that a human is involved in the decision-making process.
  • Right to rectification — Should an organisation hold data on you that is inaccurate or incomplete, you have the right to ensure its correction.

How successful the GDPR will be in practice remains to be seen but it is a huge step in what we can and should demand from companies, both individually and as a society.

FAT AI and the Beeb

The BBC is driven by its public purposes, the first of which is “to provide impartial news and information to help people understand and engage with the world around them”. Technological advancements, for all their benefits, are simply a means to those ends.

Which is why we need to consider these purposes as we adopt new technologies. A future of machine learning and personalisation in our products is no doubt coming, but we’ll have to proceed through the changes with thought and care. Along the way, we should ask ourselves questions such as:

  • Should our personalised recommendations be transparent? Is this even technically feasible?
  • What tasks within journalists’ and editors’ workflows are okay to automate away?
  • How do we balance being a national broadcaster for everyone with individual personalisation of services?
  • If we use A/B testing in our digital products, should the audience have a right to know when they’re taking part in these tests?
  • How can we ensure that all AI systems we build pass our ethical criteria?
  • Do we set up an internal ethics committee?
  • How do we create metrics of success, beyond clicks or hours watched, that align with our public purposes?

No-one says answering any of these will be easy, but it is something we’ll have to tackle in the coming years. We’ve started on this journey through our research, our engagement with the AI community, our journalism, our organisational strategy and more. We expect it of ourselves, and so does our audience.

While initially there was much excitement about social media companies’ ability to connect people, in the past year or so we’ve seen a large shift in public opinion against these black-box AI systems that massively influence our lives. This has mainly reached the public in the form of scare and outrage stories in the popular news. The global scale and opacity of these algorithms often leaves us with a feeling that there is nothing to be done (except perhaps complain on these very same social platforms). But seemingly insurmountable problems in other areas, such as tobacco and food safety, have brought us great benefits like anti-smoking laws, health inspections and drug testing. There’s no reason to believe we can’t achieve similar victories in machine learning and AI. Here in the BBC, we certainly aim to do our part.

References

If you’re interested in digging deeper into this topic, I’ve collected all the references for this post (and more) below. I’d also encourage you to read the research papers coming out of great conferences like FATML, WHI, AI Now and ML and the Law. Many of them are well-written and very approachable.

AI in the News

Fairness

Accountability

Transparency
