CS @Harvard | I write about fairness & ethics in AI/ML for @fairbytes | Storyteller, hacker, innovator | Visit me at

A technique to explain how black-box machine learning classifiers make predictions

It’s needless to say: machine learning is powerful.

At the most basic level, machine learning algorithms can be used to classify things. Given a collection of cute animal pictures, a classifier can separate the pictures into buckets of ‘dog’ and ‘not a dog’. Given data about customer restaurant preferences, a classifier can predict what restaurant a user goes to next.
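The "buckets" idea above can be sketched with a toy classifier. This is a minimal, hypothetical example (the two numeric features and all the data points are invented for illustration): it assigns a new input to whichever class has the nearest average of its training examples.

```python
# Toy "dog vs. not a dog" classifier: each image is reduced to two
# invented numeric features (e.g., ear roundness and snout length).
def centroid(points):
    # Average of a list of feature vectors.
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def classify(x, centroids):
    # Assign x the label of the nearest class centroid.
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(centroids, key=lambda label: dist2(x, centroids[label]))

train = {
    "dog":       [[0.9, 0.8], [0.8, 0.9], [0.7, 0.7]],
    "not a dog": [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3]],
}
centroids = {label: centroid(pts) for label, pts in train.items()}

print(classify([0.85, 0.75], centroids))  # → dog
```

Real classifiers learn far richer decision rules, but the principle is the same: map an input to features, then to a bucket.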

However, the role of humans in this technology is often overlooked. It does not matter how powerful a machine learning model is if no one uses it.

Elmo, Bert, and Marge (Simpson) aren’t just beloved TV characters from your childhood — they’re also machine learning & NLP models

Bart. Elmo. Bert. Kermit. Marge. What do they have in common?

They’re all beloved fictional characters from TV shows many of us watched when we were young. But that’s not all — they’re also all AI models.

In 2018, researchers at the Allen Institute published the language model ELMo. The lead author, Matt Peters, said the team brainstormed many acronyms for their model, and ELMo instantly stuck as a “whimsical but memorable” choice.

What started out as an inside joke has become a full-blown trend.

Google AI followed with BERT, an incredibly powerful and now widely used Transformer-based language model…

A guide for college students on factors to consider and options for what to do in the next school year

In the last month, many US universities have announced their fall reopening plans.

Some universities have proposed a hybrid model, with some students returning to campus for in-person classes while others stay home, or with a select number of small classes held in person. Others have proposed entirely in-person or entirely virtual classes.

This week, Harvard announced that it plans to charge its regular $50K tuition while all students receive 100% virtual classes.

While I truly believe that virtual classes still have merit, many aspects of in-person education are difficult to replicate remotely.

One alternative…

Why many gold-standard computer vision datasets, such as ImageNet, are flawed

Even though it was created in 2009, ImageNet remains the most impactful dataset in computer vision and AI today. Consisting of more than 14 million human-annotated images, ImageNet became the standard for large-scale datasets in AI. For years, it even hosted an annual competition (ILSVRC) to benchmark progress in the field.

There’s no denying ImageNet’s influence and importance in computer vision. However, given the growing evidence of biases in AI models and datasets, we must approach curation with awareness of ethics and social context to improve future datasets.

A recent paper by Vinay…

Papers, books, and resources to learn about fairness in vision, NLP, and more

Recent discussion in the machine learning community has highlighted the necessity of understanding not just machine learning itself, but also the considerations of bias and fairness behind every algorithm’s use.

“This isn’t a call for ‘diversity’ in datasets or ‘improved accuracy’ in performance — it’s a call for a fundamental reconsideration of the institutions and individuals that design, develop, deploy this tech in the first place.” — Vidushi Marda

For newcomers to this field of fairness in AI, here is a compilation of helpful papers, books, and resources for learning more about the field and specific applications…

An overview of how to use counterfactual fairness to quantify the social bias of crowd workers

Crowdsourcing is widely used in machine learning as an efficient way to annotate datasets. Platforms like Amazon Mechanical Turk allow researchers to collect data or outsource the labeling of training data to individuals all over the world.

However, crowdsourced datasets often contain significant social biases, such as gender or racial preferences and prejudices. Algorithms trained on these datasets then produce biased decisions as well.

In this short paper, researchers from Stony Brook University and IBM Research proposed a novel method to quantify bias in crowd workers:

One Line Summary

Integrating counterfactuals into the crowdsourcing process is a new method…
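To make the counterfactual idea concrete, here is a minimal, hypothetical sketch (not the paper’s actual method; the swap list, the example worker, and the tasks are all invented): a worker labels each task and a counterfactual version of it with a protected attribute swapped, and the fraction of label flips serves as a rough bias score.

```python
# Hypothetical sketch: score a crowd worker's bias by comparing their
# labels on original tasks vs. counterfactual versions in which a
# protected attribute (here, gendered words) is swapped.
SWAPS = {"he": "she", "she": "he", "his": "her", "her": "his"}

def counterfactual(text):
    # Build the counterfactual task by swapping gendered words.
    return " ".join(SWAPS.get(w, w) for w in text.split())

def bias_score(worker_label, tasks):
    # Fraction of tasks where the label changes under the swap.
    flips = sum(
        worker_label(t) != worker_label(counterfactual(t)) for t in tasks
    )
    return flips / len(tasks)

# Invented worker: flags a sentence as "toxic" whenever it mentions "she".
worker = lambda text: "toxic" if "she" in text.split() else "ok"
tasks = ["she is a leader", "he writes code", "they play chess"]
print(bias_score(worker, tasks))  # flips on 2 of 3 tasks → 0.666...
```

A score of 0 would mean the worker’s labels are invariant to the protected attribute; scores near 1 suggest the attribute is driving their decisions.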

Reason #5: Just because you CAN develop an algorithm, doesn’t mean you SHOULD

In recent years, jobs across all levels have come to require an understanding and use of technology. As a result, computer and digital literacy is the #1 entry-level skill needed in the job market.

Computer literacy allows us to engage with society — finding a job, ordering takeout, searching for an answer to a question — in ways previously unimaginable. AI literacy is becoming increasingly necessary as well, as artificial intelligence systems become more integrated into our daily lives.

Jordan Harrod, a Harvard-MIT Ph.D. student and AI education YouTuber, gave an excellent talk on the importance of AI literacy at TEDxBeaconStreet last year.

Despite its impressive performance, the world’s newest language model reflects societal biases in gender, race, and religion

Last week, OpenAI researchers announced the arrival of GPT-3, a language model that blew away its predecessor GPT-2. GPT-2 was already widely regarded as the state-of-the-art language model; GPT-3 uses 175 billion parameters, more than 100x the 1.5 billion parameters GPT-2 used.
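The "more than 100x" scale comparison can be checked directly from the two parameter counts:

```python
# Parameter counts reported for the two models.
gpt2_params = 1.5e9   # GPT-2: 1.5 billion parameters
gpt3_params = 175e9   # GPT-3: 175 billion parameters

ratio = gpt3_params / gpt2_params
print(ratio)  # → 116.66..., i.e. more than 100x
```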

GPT-3 achieved impressive results: OpenAI found that humans have difficulty distinguishing between articles written by humans and articles written by GPT-3.

Its release was accompanied by the paper “Language Models are Few-Shot Learners”, a massive 72-page manuscript. …

Curricula, projects, and even fiction books to empower students to learn about AI ethics

Artificial intelligence is a flourishing field, and its presence in the K-12 classroom is growing too. This article compiles resources for introducing AI and, in particular, AI ethics in the K-12 setting:

  • Big Ideas
  • Curriculum Materials
  • Hands-On Projects and Demos
  • Books (for educators)
  • Books (fiction)

Big Ideas

What does it mean for a machine learning algorithm to be “transparent”?

When looking at fair and ethical algorithms, transparency is a key concept often brought up in discussion. But what exactly does it mean for a machine learning algorithm to be “transparent”?

Like “fairness” and “privacy”, it sounds important and useful, but the concept of transparency is quite ambiguous and is worth exploring in more detail.


When we seek a transparent algorithm, we are asking for an understandable explanation of how it works. For example:

  • What is the algorithm doing?
  • Why is it outputting this particular decision?
  • How did it know to do this?

Types of Transparency

Have you ever read an article or watched…

Catherine Yeo
