Overcoming the Limitations of Learning from Data

and all that Jazz

Sherol Chen
Aug 1, 2017 · 6 min read

It seems like AI/ML is the big thing in Silicon Valley. If you check out how the VC money is moving, what the TechCrunch articles say, and what’s being posted on Hacker News, it’s like ML is the greatest thing since social networks. If Machine Learning is so great, why don’t we use it everywhere? Why not get rich building prediction models for the stock market? According to this NVIDIA blog post, Deep Learning, which is responsible for the current ML boom, wasn’t so great until recently:

Over the past few years AI has exploded, and especially since 2015. Much of that has to do with the wide availability of GPUs that make parallel processing ever faster, cheaper, and more powerful. It also has to do with the simultaneous one-two punch of practically infinite storage and a flood of data of every stripe (that whole Big Data movement) — images, text, transactions, mapping data, you name it.

In a previous article, I talked about the differences between ML, deep learning, and AI. It’s important to understand that each one behaves differently and has its own gotchas. For example, the non-rule-based way in which Deep Learning builds intelligence can also produce harder-to-detect and unintended consequences. A famous example is Microsoft’s Twitter chatbot, Tay, which made headlines as “Microsoft silences its new A.I. bot Tay, after Twitter users teach it racism.” Here’s another headlining example from Google: “Google Apologizes For Tagging Photos Of Black People As ‘Gorillas’.” So, while the algorithms are sound, we’re still bound by garbage-in / garbage-out. Let’s take a look at some other edge cases.
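To make the garbage-in / garbage-out point concrete, here’s a minimal, hypothetical sketch (synthetic data and a made-up “approval” scenario, not any real system) of how a perfectly ordinary learning algorithm reproduces whatever skew is baked into its training labels:

```python
# A toy illustration of "garbage in, garbage out": a classifier trained on
# skewed historical labels simply learns and reproduces the skew.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000
group = rng.integers(0, 2, size=n)     # a sensitive attribute: group 0 or 1
skill = rng.normal(size=n)             # the feature that *should* matter

# Biased historical labels: group 1 was approved less often at the same skill.
labels = (skill + np.where(group == 1, -0.8, 0.0) + rng.normal(0, 0.5, n)) > 0

model = LogisticRegression().fit(np.column_stack([skill, group]), labels)
preds = model.predict(np.column_stack([skill, group]))

for g in (0, 1):
    print(f"group {g}: approval rate {preds[group == g].mean():.2f}")
```

Nothing in the model is “broken”; it has faithfully learned the historical bias as if it were signal, which is the same failure mode as the examples above, just at a smaller scale.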

Machine Learning Falling Short — Example: ML for Games

Subtle Biases of Data — Example: Data Driven Curation

The Grey Areas of What’s Fair or True — Example: Search Results

[Screenshot: search results for “jazz” (May 30th, 2017)]

So what are the results that I’d like? If you noticed, the top result is not a video; it’s a Mix. The videos in that Mix are more appropriately curated and a great sample of jazz music.


So which of the three types of results are “jazz”: the Results, the Top Tracks, or the Mix? The real question is whether my opinion of jazz results is more meaningful than what most of the world thinks. It’s not an easy question. If we go by search results, jazz could be what everyone else thinks it is. I, on the other hand, would rather jazz be defined by the more informed. I’d love for the rest of the world to meet my standards for jazz, but is that fair? There doesn’t seem to be a straightforward answer, but as data drives our technology, we have to be mindful of such things. It all comes back to garbage in, garbage out. When you’ve got an AI being trained on human-provided data, you’re going to get skewed results. One perfect example is the case of crash test dummies.

Discrimination in Data and Design — Example: Crash Test Dummies

Algorithms in themselves long predate computers. An algorithm is simply a sequence of instructions. Law codes can be seen as algorithms. The rules of games can be understood as algorithms, and nothing could be more human than making up games. Armies are perhaps the most completely algorithmic forms of social organisation. Yet too much contemporary discussion is framed as if the algorithmic workings of computer networks are something entirely new. It’s true that they can follow instructions at superhuman speed, with superhuman fidelity and over unimaginable quantities of data. But these instructions don’t come from nowhere. Although neural networks might be said to write their own programs, they do so towards goals set by humans, using data collected for human purposes. If the data is skewed, even by accident, the computers will amplify injustice. (The Guardian, 2016)

A study in 2011 showed that seat-belted female drivers had a 47% higher chance of serious injuries than belted male drivers in comparable collisions. This was due to the lack of female crash-test dummies. For the 2011 Sienna, the federal government replaced average-sized male dummies with average-sized female dummies to test the discrepancy. They found that, “when the 2011 Sienna was slammed into a barrier at 35 mph, the female dummy in the front passenger seat registered a 20 to 40 percent risk of being killed or seriously injured, according to the test data. The average for that class of vehicle is 15 percent.” The difference in statistics was even greater for minor injuries. (Washington Post, 2012)

Discrimination may be as subtle as inappropriate auto-completes for “why are women…,” or as life-altering as the under-diagnosis of diabetes in Asian Americans. Anything driven by data, whether medical research or Artificial Intelligence, should be thoughtfully and ethically grounded in accurate representation. Just as a doctor ought not to knowingly misdiagnose a patient based on race or gender, we should make our best attempt to build technologies that suit all people.

(Image source: https://www.searchenginepeople.com/blog/google-autocomplete-fails.html)

Augmenting Fairness instead of Automating Bias

  1. Abstract: What sort of prediction model are we building?
  2. Problem Statement: What problem are we trying to solve?
  3. Previous Work: How are we currently solving the problem?
  4. Methods: How will we collect data and build the model?
  5. Data Analysis: How do we debug and monitor good and fair data?
  6. Results: How will we measure and vet the success of the model?
  7. Discussion: Who are we helping and who benefits most?
  8. Limitations: How could this work be exploited or fall short?
  9. Conclusions: How will this technology be marketed and used?
  10. Future Work: How will this model be improved and maintained in the future?
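One hypothetical way to keep these ten questions attached to a project (the names and structure below are mine, not any standard format) is a lightweight report template that lives next to the model code and flags which questions are still unanswered:

```python
# A hypothetical "model report" template for the ten questions above.
from dataclasses import dataclass, fields

@dataclass
class ModelReport:
    abstract: str = ""          # 1. What sort of prediction model are we building?
    problem_statement: str = "" # 2. What problem are we trying to solve?
    previous_work: str = ""     # 3. How are we currently solving the problem?
    methods: str = ""           # 4. How will we collect data and build the model?
    data_analysis: str = ""     # 5. How do we debug and monitor good and fair data?
    results: str = ""           # 6. How will we measure and vet the model's success?
    discussion: str = ""        # 7. Who are we helping and who benefits most?
    limitations: str = ""       # 8. How could this work be exploited or fall short?
    conclusions: str = ""       # 9. How will this technology be marketed and used?
    future_work: str = ""       # 10. How will the model be improved and maintained?

    def unanswered(self):
        """Return the sections that still have no answer."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

report = ModelReport(abstract="A ranking model for music search results.")
print(report.unanswered())  # everything except 'abstract' still needs an answer
```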

Finally, here is a very cool interactive demo and visualization based on the paper “Equality of Opportunity in Supervised Learning”: https://research.google.com/bigpicture/attacking-discrimination-in-ml/

As machine learning is increasingly used to make important decisions across core social domains, the work of ensuring that these decisions aren’t discriminatory becomes crucial.
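The idea behind that demo, equal opportunity, can be sketched in a few lines: instead of one global decision threshold, pick a threshold per group so that qualified people are accepted at (roughly) the same rate in every group. The data and function below are a toy illustration I made up under those assumptions, not the demo’s actual code:

```python
# Toy sketch of "equal opportunity" thresholding: equalize the true positive
# rate (acceptances among the truly qualified) across groups.
import numpy as np

def equal_opportunity_thresholds(scores, groups, qualified, target_tpr=0.8):
    """For each group, pick the score cutoff whose true positive rate
    is closest to target_tpr."""
    thresholds = {}
    candidates = np.linspace(scores.min(), scores.max(), 101)
    for g in np.unique(groups):
        mask = (groups == g) & qualified
        tprs = np.array([(scores[mask] >= t).mean() for t in candidates])
        thresholds[g] = candidates[np.argmin(np.abs(tprs - target_tpr))]
    return thresholds

# Synthetic data: group 1's scores are shifted down even among the qualified,
# so a single global cutoff would accept fewer of its qualified members.
rng = np.random.default_rng(1)
n = 5_000
groups = rng.integers(0, 2, size=n)
qualified = rng.random(n) < 0.5
scores = rng.normal(loc=2.0 * qualified - 0.7 * groups, scale=1.0)

print(equal_opportunity_thresholds(scores, groups, qualified))
```

With one global cutoff, the group whose scores are shifted downward has fewer of its qualified members accepted; equalizing the true positive rate removes exactly that disparity, which is the trade-off the interactive visualization lets you explore.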

