Chyrons, Racist Machines, “Gender Gap”, Getting Value from ML

Weekly Reading List #1


Issue #1: 2018/04/16 to 2018/04/22

This is an experimental series in which I briefly introduce the interesting data science things I read, watched, or listened to during the week. Please give this post some claps if you’d like this series to be continued.

The Differences in How CNN, MSNBC, and Fox Cover the News

This visualization uses data from the Third Eye Project (which captures chyrons via OCR). Chyrons between August 25, 2017, and January 21, 2018, were processed and transformed to answer three questions:

  1. Which network uses which words the most?
  2. How has the use of a certain word changed over time?
  3. What are the words that were most often used with a certain word in the same chyron?
(Chart answering question #1)
(Chart answering question #2)
(Chart answering question #3)

It’s really impressive. I especially appreciate how they have written the “Data and methodology” section.
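To make the pipeline a bit more concrete, here is a rough sketch of how the three questions could be computed from the chyron dumps. The CSV layout, column names, and network labels below are my own assumptions for illustration, not the project’s actual schema:

```python
# Rough sketch of the three computations, assuming a CSV of OCR'd chyrons
# with columns: network, timestamp, text. Column names are hypothetical.
from collections import Counter

import pandas as pd

chyrons = pd.read_csv("chyrons.csv", parse_dates=["timestamp"])
chyrons["tokens"] = chyrons["text"].str.lower().str.findall(r"[a-z']+")

# 1. Which network uses which words the most?
per_network = {
    network: Counter(w for tokens in group["tokens"] for w in tokens)
    for network, group in chyrons.groupby("network")
}
print(per_network["FOXNEWS"].most_common(10))  # "FOXNEWS" is a made-up label

# 2. How has the use of a certain word changed over time?
word = "russia"
mentions = chyrons[chyrons["tokens"].apply(lambda tokens: word in tokens)]
weekly = mentions.set_index("timestamp").resample("W").size()
print(weekly)

# 3. Which words were most often used with that word in the same chyron?
co_counts = Counter()
for tokens in chyrons["tokens"]:
    unique = set(tokens)
    if word in unique:
        co_counts.update(unique - {word})
print(co_counts.most_common(10))
```

(The real project presumably also deals with OCR noise and near-duplicate chyrons, which this toy version ignores.)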

Can a Machine be Racist or Sexist?

Slides from a talk given by Renee M. P. Teate, creator of the Becoming a Data Scientist blog and podcast, give an overview of the biases that can be hidden inside machine learning algorithms.

  • What is already going on with AI and Machine Learning that should concern us?
  • Are the impacted people and communities aware of what’s already happening?
  • Are the people who design these systems aware of the possible impact of their work on people’s lives?

The presenter then discussed typical types of machine learning models (regression, classification, and clustering) and how things can become problematic. I find the “Crime Forecasting Using Spatio-Temporal Pattern with Ensemble Learning” example very intriguing for its subtlety.

Basically, predictive model development involves a lot of decision making, and all of these decisions are made by humans, which is how bias gets injected. The presenter gave quite a few examples of this kind of flawed process, including:

  • Incorrect data
  • Manipulated data
  • Not representative data
  • Data with historic biases
  • Imbalanced data (when incorrectly evaluated; see the sketch after this list)
  • Dropping missing data without investigation
  • Bias amplification (in training)
  • (Biased) feature & algorithm selection
  • Problematic selection of evaluation metric
  • Models that are easily gamed
  • And more…
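To illustrate the “imbalanced data (when incorrectly evaluated)” item, here is a minimal toy sketch in scikit-learn (my own example, not from the slides) showing how plain accuracy can make a model that never flags the minority class look great:

```python
# Toy illustration: accuracy looks fine on imbalanced data even when the
# model ignores the minority class entirely. Synthetic data, not from the talk.
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, balanced_accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# 95% majority class, 5% minority class
X, y = make_classification(n_samples=10_000, weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A "model" that always predicts the majority class
clf = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
pred = clf.predict(X_test)

print(accuracy_score(y_test, pred))           # ~0.95, looks impressive
print(balanced_accuracy_score(y_test, pred))  # 0.5, reveals the problem
print(f1_score(y_test, pred))                 # 0.0 for the minority class
```

This is exactly why the choice of evaluation metric is itself one of those human decisions where bias can slip in.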

So a machine CAN indeed be racist or sexist. What can we do about it? I recommend reading the slides for answers (pages 86–90). They are definitely worth your time.

An excellent book on this topic — Weapons of Math Destruction:

(The three key ingredients of a weapon of math destruction: 1. Opacity, 2. Scale, 3. Damage.)

Too Many Men

(For some reason I wasn’t able to embed the link) A great piece from the Washington Post on the huge gap between the male and female populations in India and China. Not exactly a data science article, but I just love its interactive charts. Here are some sneak peeks:

Getting Value from Machine Learning

In short, the gap for most companies isn’t that machine learning doesn’t work, but that they struggle to actually use it.

This article argues that the main reason machine learning hasn’t made as much impact in the business world as we’d anticipated is not under-performing models, but the difficulty of deployment.

It describes an “AI project manager” they built to predict red flags in ongoing projects. They found the biggest requirements for this product were:

… a robust software engineering practice, automation that allowed domain experts to come in at the right level, and tools that could support comprehensive model testing.
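To make “comprehensive model testing” a bit more concrete, here is a minimal, hypothetical pytest-style gate one could run before promoting a model. The thresholds, metric, and the load_candidate_model / load_holdout helpers are placeholders I made up, not anything from the article:

```python
# Hypothetical pre-deployment test: a candidate model must clear a minimum
# score on a held-out set and must not degrade on a slice we care about.
# load_candidate_model, load_holdout, and the thresholds are placeholders.
import pytest
from sklearn.metrics import roc_auc_score

from my_project.registry import load_candidate_model  # placeholder import
from my_project.data import load_holdout              # placeholder import

MIN_AUC = 0.80        # overall performance floor
MIN_SLICE_AUC = 0.75  # floor for a specific slice (e.g. small projects)


@pytest.fixture(scope="module")
def scored_holdout():
    model = load_candidate_model()
    X, y, slice_mask = load_holdout()
    return y, model.predict_proba(X)[:, 1], slice_mask


def test_overall_auc(scored_holdout):
    y, scores, _ = scored_holdout
    assert roc_auc_score(y, scores) >= MIN_AUC


def test_slice_auc(scored_holdout):
    y, scores, mask = scored_holdout
    assert roc_auc_score(y[mask], scores[mask]) >= MIN_SLICE_AUC
```

The idea is simply that a model has to pass the same kind of automated checks as any other piece of software before it ships.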

Finally, they propose a new machine learning paradigm, with key steps described in this paper and supported by open-source tools.