Generating Labeled Training Data for Your ML/AI Models featuring Angie Hugeback

TWiML Talk 006

Published in The TWIML AI Podcast

Mar 3, 2017

Subscribe: iTunes / SoundCloud / Google Play / Stitcher / RSS

My guest this time is Angie Hugeback, who is principal data scientist at Spare5. In this show, Angie and I discuss the real-world practicalities of generating training datasets.

This week’s podcast is sponsored by Spare5 (now Mighty AI). Spare5 helps customers generate the high-quality labeled training datasets that are so crucial to accurate machine learning models.

Angie and I talk through the challenges faced by folks who need to label training data, and how to develop a cohesive system for performing the various labeling tasks you’re likely to encounter. We discuss some of the ways that bias can creep into your training data and how to avoid that. We explore some of the popular third-party options that companies look at for scaling training data production, and how they differ. And Angie gives us her top three tips for folks tasked with generating training data for AI.

Thank You Spare5!

Spare5 has graciously sponsored this episode. If you’re struggling with generating labeled training data for your machine learning or AI-based products, you should definitely take a look at what they’ve got to offer.

Above all, I’m just very grateful to Spare5 for helping to make this podcast possible for all of you, and I really encourage you to show them some love back: reach out to them on Twitter at @spare5 and thank them, visit their website, or request a demo. All of those things let them know how much you appreciate this podcast and their support for it.

Finally, they’ve got a special offer for 25 lucky TWiML Talk listeners. Learn more on the podcast and sign up here: spare5.com/podcast.

UPDATE 1/10/17: Spare5 is now Mighty AI. The company announced its new name in conjunction with the close of a $14M financing round led by Intel Capital, with GV, Accenture Ventures and others.

About Angie Hugeback

Angie Hugeback on LinkedIn

Mentioned in the Interview

Spare5 | Training Data as a Service
Metropolis–Hastings Algorithm
Importance Sampling
Rserve — Binary R server
Machine Learning: The High Interest Credit Card of Technical Debt (Note: In the interview, Angie recommended a paper that she referred to as a Microsoft paper. After the interview she realized it was this one, by Google Research.)
Seven Rules of Thumb for Web Site Experimenters
ModelTracker: Redesigning Performance Analysis Tools for Machine Learning [PDF] [YouTube]
A cautionary tale about humans creating biased AI models | TechCrunch

IMAGE CREDIT: UMN