Machine Learning with Humans in the Loop: Lessons from Stitch Fix
In this video, you’ll meet Brad Klingenberg (VP of Data Science at Stitch Fix) and learn about machine learning with humans in the loop, along with the many lessons Brad and his team have learned. The lessons include:
- Lesson 0: Humans and machines are a winning combination
- Lesson 1: You have more than one feedback loop
- Lesson 2: Human selection changes your objective function
- Lesson 3: Even humans need feature selection
“It works really well, but it’s complicated.” — Brad Klingenberg
*Note: Stitch Fix is hiring data scientists! See openings at Stitch Fix here.
Meet the Speaker: Brad Klingenberg, VP of Data Science
Brad Klingenberg leads a team of 20+ data scientists working on human-in-the-loop machine learning at Stitch Fix.
Brad’s team develops the recommendation algorithms that guide Stitch Fix stylists, the human experts who curate the items selected for clients. His team also matches clients and stylists together and measures, monitors, and optimizes the role of human selection in the recommendation system.
Episode Notes:
Lesson 0: Humans and machines are a winning combination
Human Judgement:
- helps leverage unstructured data
- provides a human element to the process (empathy, creativity) & builds relationships
- frees the algorithm developer from edge cases (the developer can focus on the average case rather than the worst case)
Lesson 1: You have more than one feedback loop
Stitch Fix recommendation => human stylist (middle layer) makes a selection => client receives the shipment
- The stylist’s selection is an extra layer of feedback, on top of the client’s response (sketched below)
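To make the two loops concrete, here is a minimal sketch (the schema and names are hypothetical, not Stitch Fix’s actual data model) of logging both feedback signals: the stylist’s accept/reject decision on each recommendation, and the client’s keep/return decision on each shipped item.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FeedbackRecord:
    """One recommended item, annotated with both feedback loops."""
    item_id: str
    client_id: str
    algo_score: float            # model's predicted success probability
    stylist_selected: bool       # loop 1: did the stylist pick it?
    client_kept: Optional[bool]  # loop 2: kept/returned; None if never shipped

records = [
    FeedbackRecord("dress_01", "c42", 0.81, stylist_selected=True, client_kept=True),
    FeedbackRecord("top_07", "c42", 0.74, stylist_selected=False, client_kept=None),
]

# The two loops give different training signals: every recommendation gets
# stylist feedback, but only shipped items ever get client feedback.
selection_rate = sum(r.stylist_selected for r in records) / len(records)
shipped = [r for r in records if r.client_kept is not None]
keep_rate = sum(r.client_kept for r in shipped) / len(shipped)
print(f"selection rate: {selection_rate:.2f}, keep rate: {keep_rate:.2f}")
```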
Lesson 2: Human selection changes your objective function
What should you predict?
Suppose you are going to train a model at Stitch Fix. You could use a traditional supervised learning model: based on past data, make predictions for the future.
Naive Approach: ignore selection and train on historical shipment data
Advantages:
- traditional supervised problem
- simple historical data
Problem 1: human selection can censor your data
“It’s ironic that the rules that are most censored are often the most important.” — Brad
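A tiny simulation (with invented numbers) shows the censoring: if stylists rarely select a category, the naive model sees almost no labeled outcomes for it, so training on historical shipments alone tells you little about exactly the items where human selection is doing the most filtering.

```python
import random

random.seed(0)

# Hypothetical: two item categories with different stylist selection rates.
# Client outcomes (kept or returned) are only observed for selected items.
categories = {"safe_basic": 0.60, "edgy_dress": 0.05}  # P(stylist selects)

observed = {c: [] for c in categories}
for category, p_select in categories.items():
    for _ in range(1000):  # 1000 recommendations per category
        if random.random() < p_select:      # stylist selection step
            kept = random.random() < 0.5    # client outcome, observed
            observed[category].append(kept)
        # else: the outcome is censored -- we never learn what would
        # have happened if the item had shipped

for category, outcomes in observed.items():
    print(f"{category}: {len(outcomes)} labeled examples out of 1000 recommendations")
# safe_basic yields ~600 labels; edgy_dress only ~50. The naive model is
# starved of data precisely where selection matters most.
```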
Problem 2: success probabilities can make for terrible recommendations
Important metric: the probability that the stylist will be able to send the item to the right client
Example: a low-coverage item
An edgy dress: not many clients will like it, but it is easy for a stylist to identify the clients who will. High score!
Example: a high-coverage item
A more neutral item: many clients will like it, but it is hard for stylists to identify exactly which ones. Average score!
In both cases, compelling recommendations require understanding human selection.
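A back-of-the-envelope decomposition (numbers invented for illustration) makes the point: historical shipment data estimates P(kept | shipped), which is conditioned on the stylist’s selection, not the item’s broad appeal.

```python
# Hypothetical items: (base rate of clients who'd like the item,
#                      stylist's precision at finding those clients)
items = {
    "edgy_dress":  {"p_like": 0.10, "stylist_precision": 0.90},
    "neutral_top": {"p_like": 0.60, "stylist_precision": 0.65},
}

for name, it in items.items():
    # What naive training data shows: success rate among *shipped* items.
    # Shipments are filtered by the stylist, so this reflects their precision.
    p_kept_given_shipped = it["stylist_precision"]
    print(f"{name}: P(kept | shipped) = {p_kept_given_shipped:.2f}, "
          f"base appeal = {it['p_like']:.2f}")

# The edgy dress looks better on historical keep rate (0.90 vs 0.65) even
# though far fewer clients would like it: the stylist's targeting, not the
# item's broad appeal, drives the observed success probability.
```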
Generally, selection data will be much larger and more complicated to collect and work with:
- Negative cases: logging the set of items that were available to be selected but were not selected (see the logging sketch below)
- Presentation effects: how and where items are displayed influences selection, much as position effects are studied in search-engine results
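In practice this means logging impressions, not just selections. A minimal sketch (field names hypothetical) of a log entry that captures both the negative cases and the presentation context:

```python
import json
import time

def log_styling_session(client_id, shown_items, selected_ids, layout):
    """Log every item that was *available* to the stylist, not just the picks.

    shown_items: list of (item_id, position) as rendered in the UI, so that
    presentation effects (e.g. position bias) can be modeled later.
    """
    entry = {
        "ts": time.time(),
        "client_id": client_id,
        "layout": layout,  # which UI variant served the items
        "impressions": [
            {"item_id": item_id, "position": pos, "selected": item_id in selected_ids}
            for item_id, pos in shown_items
        ],
    }
    print(json.dumps(entry))  # stand-in for a real event pipeline

log_styling_session(
    client_id="c42",
    shown_items=[("dress_01", 1), ("top_07", 2), ("jeans_03", 3)],
    selected_ids={"dress_01"},
    layout="grid_v2",
)
```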
Lesson 3: Even humans need feature selection
Recap:
Algorithm recommendations => human curation => recommendation to client
Problem: Unstructured data can be overwhelming for humans
Feature Engineering: creating useful summaries for human consumption
Creating Features for the Human Classifier:
At Stitch Fix, stylists use a tool (a UI) to help them make decisions. When building features for this tool (a sketch follows the list), it’s important to focus on:
- interpretability
- compelling evidence
- orthogonality (complementary, non-redundant signals)
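As a sketch of what “summaries for human consumption” might look like (an entirely hypothetical helper, not the actual stylist UI), the goal is a few interpretable, complementary cues rather than a raw feature vector:

```python
def summarize_for_stylist(client_history, item):
    """Turn raw data into a few interpretable, complementary cues.

    Each cue answers a different question (orthogonality) and is phrased
    as evidence the stylist can verify (interpretability, compelling
    evidence), rather than as an opaque model score.
    """
    same_category = [h for h in client_history if h["category"] == item["category"]]
    kept_similar = [h for h in same_category if h["kept"]]
    return {
        "category_track_record": f"kept {len(kept_similar)} of "
                                 f"{len(same_category)} past {item['category']} items",
        "brand_familiarity": "familiar brand" if item["brand"] in
                             {h["brand"] for h in client_history}
                             else "new brand for this client",
    }

history = [
    {"category": "dress", "brand": "A", "kept": True},
    {"category": "dress", "brand": "B", "kept": True},
    {"category": "top", "brand": "A", "kept": False},
]
print(summarize_for_stylist(history, {"category": "dress", "brand": "C"}))
```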
A/B testing
Ultimately, this is an empirical question.
Experiment with:
- production systems
- simulations for human classifiers (a toy simulation follows)
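A simulation of the combined system can be as simple as a stochastic model of the stylist sitting between the algorithm and the client (all parameters below are invented). A sketch like this lets you ask whether an algorithm change survives the human selection layer before running a production A/B test:

```python
import random

random.seed(1)

def simulate(algo_quality, stylist_skill, n_sessions=10_000, top_k=10):
    """Crude end-to-end simulation: the algorithm ranks items, a simulated
    stylist picks one from the shortlist, the client keeps it or not."""
    kept = 0
    for _ in range(n_sessions):
        # True appeal of each candidate item to this client, in [0, 1].
        appeal = [random.random() for _ in range(top_k)]
        # The algorithm's ranking is noisy around true appeal.
        scores = [a + random.gauss(0, 1 - algo_quality) for a in appeal]
        ranked = sorted(range(top_k), key=lambda i: -scores[i])
        # The stylist re-ranks the top of the list with their own noisy judgment.
        pick = max(ranked[:3],
                   key=lambda i: appeal[i] + random.gauss(0, 1 - stylist_skill))
        kept += random.random() < appeal[pick]
    return kept / n_sessions

# Compare two algorithm variants *through* the human selection layer.
for quality in (0.6, 0.8):
    print(f"algo_quality={quality}: keep rate = {simulate(quality, stylist_skill=0.7):.3f}")
```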
Things to Think About:
The lessons above are just the tip of the iceberg.
Introducing human selection changes the way you think about your data, the way you think about what you are optimizing, and the way you evaluate the combined system.
Here are some additional problems to ponder:
- Balancing exploration and exploitation with humans in the loop
- Making decisions when humans and machines disagree
- Randomness in human decision making
Have ideas / thoughts on these problems? Chat with Brad :)