Machine Learning with Humans in the Loop: Lessons from Stitch Fix
In this video, you’ll meet Brad Klingenberg (VP of Data Science at Stitch Fix) and learn about machine learning with humans in the loop, along with the many lessons Brad and his team have learned. The lessons include:
- Lesson 0: Humans and machines are a winning combination
- Lesson 1: You have more than one feedback loop
- Lesson 2: Human selection changes your objective function
- Lesson 3: Even humans need feature selection
“It works really well, but it’s complicated.” — Brad Klingenberg
*Note: Stitch Fix is hiring data scientists! See openings at Stitch Fix here.
Meet the Speaker: Brad Klingenberg, VP of Data Science
Brad Klingenberg leads a team of 20+ data scientists working on human-in-the-loop machine learning at Stitch Fix.
Brad’s team develops the recommendation algorithms that guide Stitch Fix stylists, the human experts who curate the items selected for clients. His team also matches clients and stylists together and measures, monitors, and optimizes the role of human selection in the recommendation system.
Episode Notes:
Lesson 0: Humans and machines are a winning combination
Human Judgement:
- helps leverage unstructured data
- provides a human element to the process (empathy, creativity) & builds relationships
- frees the algorithm developer from edge cases (the developer can focus on the average case rather than the worst case)
Lesson 1: You have more than one feedback loop
Stitch Fix recommendation => human stylist (middle layer) makes a selection => client receives the shipment
- The stylist’s selection is an extra layer of feedback, on top of the client’s response (sketched below)
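To make the two loops concrete, here is a minimal sketch (the schema and names are hypothetical, not Stitch Fix’s actual data model) of logging both feedback signals: the stylist’s accept/reject decision on each recommendation, and the client’s keep/return decision on each shipped item.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FeedbackRecord:
    """One recommended item, annotated with both feedback loops."""
    item_id: str
    client_id: str
    algo_score: float            # model's predicted success probability
    stylist_selected: bool       # loop 1: did the stylist pick it?
    client_kept: Optional[bool]  # loop 2: kept/returned; None if never shipped

records = [
    FeedbackRecord("dress_01", "c42", 0.81, stylist_selected=True, client_kept=True),
    FeedbackRecord("top_07", "c42", 0.74, stylist_selected=False, client_kept=None),
]

# The two loops give different training signals: every recommendation gets
# stylist feedback, but only shipped items ever get client feedback.
selection_rate = sum(r.stylist_selected for r in records) / len(records)
shipped = [r for r in records if r.client_kept is not None]
keep_rate = sum(r.client_kept for r in shipped) / len(shipped)
print(f"selection rate: {selection_rate:.2f}, keep rate: {keep_rate:.2f}")
```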
Lesson 2: Human selection changes your objective function
What should you predict?
Suppose you are going to train a model at Stitch Fix. You could use a traditional supervised learning model: based on past data, make predictions for the future.
Naive Approach: ignore selection and train on historical shipment data
Advantages:
- traditional supervised problem
- simple historical data
Problem 1: human selection can censor your data
“It’s ironic that the rules that are most censored are often the most important.” — Brad
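A tiny simulation (with invented numbers) shows the censoring: if stylists rarely select a category, the naive model sees almost no labeled outcomes for it, so training on historical shipments alone tells you little about exactly the items where human selection is doing the most filtering.

```python
import random

random.seed(0)

# Hypothetical: two item categories with different stylist selection rates.
# Client outcomes (kept or returned) are only observed for selected items.
categories = {"safe_basic": 0.60, "edgy_dress": 0.05}  # P(stylist selects)

observed = {c: [] for c in categories}
for category, p_select in categories.items():
    for _ in range(1000):  # 1000 recommendations per category
        if random.random() < p_select:      # stylist selection step
            kept = random.random() < 0.5    # client outcome, observed
            observed[category].append(kept)
        # else: the outcome is censored -- we never learn what would
        # have happened if the item had shipped

for category, outcomes in observed.items():
    print(f"{category}: {len(outcomes)} labeled examples out of 1000 recommendations")
# safe_basic yields ~600 labels; edgy_dress only ~50. The naive model is
# starved of data precisely where selection matters most.
```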
Problem 2: success probabilities can make for terrible recommendations
Important metric: the probability that the stylist will be able to send the item to the right client
Example: a low-coverage item
An edgy dress: not many clients will like it, but it is easy for a stylist to identify the clients who will. High score!
Example: a high-coverage item
A more neutral item: many clients will like it, but it is hard for stylists to identify exactly which ones. Average score!
In both cases, compelling recommendations require understanding human selection.
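A back-of-the-envelope decomposition (numbers invented for illustration) makes the point: historical shipment data estimates P(kept | shipped), which is conditioned on the stylist’s selection, not the item’s broad appeal.

```python
# Hypothetical items: (base rate of clients who'd like the item,
#                      stylist's precision at finding those clients)
items = {
    "edgy_dress":  {"p_like": 0.10, "stylist_precision": 0.90},
    "neutral_top": {"p_like": 0.60, "stylist_precision": 0.65},
}

for name, it in items.items():
    # What naive training data shows: success rate among *shipped* items.
    # Shipments are filtered by the stylist, so this reflects their precision.
    p_kept_given_shipped = it["stylist_precision"]
    print(f"{name}: P(kept | shipped) = {p_kept_given_shipped:.2f}, "
          f"base appeal = {it['p_like']:.2f}")

# The edgy dress looks better on historical keep rate (0.90 vs 0.65) even
# though far fewer clients would like it: the stylist's targeting, not the
# item's broad appeal, drives the observed success probability.
```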
Generally, selection data will be much larger and more complicated to collect and work with:
- Negative cases: logging the set of items that were available to be selected but were not selected (see the logging sketch below)
- Presentation effects: how and where items are displayed influences selection, much as position effects are studied in search-engine results
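In practice this means logging impressions, not just selections. A minimal sketch (field names hypothetical) of a log entry that captures both the negative cases and the presentation context:

```python
import json
import time

def log_styling_session(client_id, shown_items, selected_ids, layout):
    """Log every item that was *available* to the stylist, not just the picks.

    shown_items: list of (item_id, position) as rendered in the UI, so that
    presentation effects (e.g. position bias) can be modeled later.
    """
    entry = {
        "ts": time.time(),
        "client_id": client_id,
        "layout": layout,  # which UI variant served the items
        "impressions": [
            {"item_id": item_id, "position": pos, "selected": item_id in selected_ids}
            for item_id, pos in shown_items
        ],
    }
    print(json.dumps(entry))  # stand-in for a real event pipeline

log_styling_session(
    client_id="c42",
    shown_items=[("dress_01", 1), ("top_07", 2), ("jeans_03", 3)],
    selected_ids={"dress_01"},
    layout="grid_v2",
)
```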
Lesson 3: Even humans need feature selection
Recap:
Algorithm recommendations => human curation => recommendation to client
Problem: Unstructured data can be overwhelming for humans
Feature Engineering: creating useful summaries for human consumption
Creating Features for the Human Classifier:
At Stitch Fix, stylists use a tool (a UI) to help them make decisions. When building features for this tool (a sketch follows the list), it’s important to focus on:
- interpretability
- compelling evidence
- orthogonality (complementary, non-redundant signals)
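As a sketch of what “summaries for human consumption” might look like (an entirely hypothetical helper, not the actual stylist UI), the goal is a few interpretable, complementary cues rather than a raw feature vector:

```python
def summarize_for_stylist(client_history, item):
    """Turn raw data into a few interpretable, complementary cues.

    Each cue answers a different question (orthogonality) and is phrased
    as evidence the stylist can verify (interpretability, compelling
    evidence), rather than as an opaque model score.
    """
    same_category = [h for h in client_history if h["category"] == item["category"]]
    kept_similar = [h for h in same_category if h["kept"]]
    return {
        "category_track_record": f"kept {len(kept_similar)} of "
                                 f"{len(same_category)} past {item['category']} items",
        "brand_familiarity": "familiar brand" if item["brand"] in
                             {h["brand"] for h in client_history}
                             else "new brand for this client",
    }

history = [
    {"category": "dress", "brand": "A", "kept": True},
    {"category": "dress", "brand": "B", "kept": True},
    {"category": "top", "brand": "A", "kept": False},
]
print(summarize_for_stylist(history, {"category": "dress", "brand": "C"}))
```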
A/B testing
Ultimately, this is an empirical question.
Experiment with:
- production systems
- simulations for human classifiers (a toy simulation follows)
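A simulation of the combined system can be as simple as a stochastic model of the stylist sitting between the algorithm and the client (all parameters below are invented). A sketch like this lets you ask whether an algorithm change survives the human selection layer before running a production A/B test:

```python
import random

random.seed(1)

def simulate(algo_quality, stylist_skill, n_sessions=10_000, top_k=10):
    """Crude end-to-end simulation: the algorithm ranks items, a simulated
    stylist picks one from the shortlist, the client keeps it or not."""
    kept = 0
    for _ in range(n_sessions):
        # True appeal of each candidate item to this client, in [0, 1].
        appeal = [random.random() for _ in range(top_k)]
        # The algorithm's ranking is noisy around true appeal.
        scores = [a + random.gauss(0, 1 - algo_quality) for a in appeal]
        ranked = sorted(range(top_k), key=lambda i: -scores[i])
        # The stylist re-ranks the top of the list with their own noisy judgment.
        pick = max(ranked[:3],
                   key=lambda i: appeal[i] + random.gauss(0, 1 - stylist_skill))
        kept += random.random() < appeal[pick]
    return kept / n_sessions

# Compare two algorithm variants *through* the human selection layer.
for quality in (0.6, 0.8):
    print(f"algo_quality={quality}: keep rate = {simulate(quality, stylist_skill=0.7):.3f}")
```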
Things to Think About:
The lessons above are just the tip of the iceberg.
Introducing human selection changes the way you think about your data, the way you think about what you are optimizing, and the way you evaluate the combined system.
Here are some additional problems to ponder:
- Balancing exploration and exploitation with humans in the loop
- Making decisions when humans and machines disagree
- Randomness in human decision making
Have ideas / thoughts on these problems? Chat with Brad :)