Building an ML keynote demo for 100,000+ people
Did you miss the AutoML announcements and demos during the Cloud Next ’18 keynote? I’ve got you covered! In this post I’ll provide an overview of the AutoML products launched and the demos I showed during the keynote. I’ll also share some insights on the demo building process so that you can apply it to your own demos, if that’s your thing.
What is AutoML?
AutoML is a new ML offering on Google Cloud that lets you train custom machine learning models on your own data. The best part? You don’t have to worry about writing any of the model code: just press a train button and you’ll magically get access to your custom predictions through a REST API endpoint.
If you think of machine learning tools across a spectrum, AutoML falls right in the middle:
AutoML is currently available in 3 variants, all of which launched in beta at Next:
- Vision: build image classification models trained on images from your dataset to do things like classifying the type of cloud in an image.
- Natural Language: build custom text classification models to classify sentences and text documents into your own categories.
- Translation: build domain-specific translation models to improve translation for industry-specific jargon and linguistic nuances.
During the keynote I showed a demo of AutoML Vision and NL. If you want to watch me be loaded into a 14x14-foot rotating cube (there’s something I never thought I’d say), you can watch the recording here:
One of my favorite things about AutoML is that it gives you a domain-specific model, trained on your own dataset. This also makes it difficult to demo since any AutoML example will be niche and therefore not generally applicable. We cycled through lots (seriously lots) of different datasets for AutoML keynote demo ideas, with the goal of finding something that would inspire people to think about how they could apply AutoML to their own problem. I’ll outline both demos below.
You won’t be-leaf the AutoML Vision demo
To demo AutoML Vision I built a model that would take an image of a leaf and predict its species. But let’s start from the beginning — how did I land on leaves? Behind any good machine learning model is a high quality set of training data.
Finding an image dataset
The size of a training dataset depends on the type of model you’re building. If you’re building a model from scratch, you’ll need on the order of thousands of images per label for high accuracy. Luckily, AutoML makes use of a technique called transfer learning, which uses a model pre-trained on a similar task as a starting point rather than training from scratch. As a result, training a model with AutoML doesn’t require nearly as much data: you can start with as few as 10 images per label, though you’ll probably want a few more for a high-accuracy model.
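AutoML’s internal training setup isn’t public, but to give a feel for the general technique, here’s a minimal transfer-learning sketch in Keras (purely illustrative, not AutoML’s actual code): load a network pre-trained on ImageNet, freeze it, and train a small new classification head on your own labels.

```python
import tensorflow as tf

# Start from a network pre-trained on ImageNet, minus its classification head.
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False,
    weights="imagenet", pooling="avg")
base.trainable = False  # freeze the pre-trained weights

# Train only a small new head on your own labels (e.g. 10 leaf species),
# which needs far less data than training the whole network from scratch.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_images, train_labels, epochs=5)
```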
With that in mind, I needed to find or create an image dataset specific enough to highlight the value of AutoML Vision. There are a lot of great publicly available image datasets out there, but many of them use higher-level labels we could already get from something like the Cloud Vision API.
Here’s another kicker: I needed to make sure I had the rights to use the dataset for this demo. That meant it needed to either be public domain, have a Creative Commons license that allows commercial use, or comprise images taken by folks (like me) who released all rights to them. Along the way I discovered two handy tools for filtering datasets and images by license:
- CC Search: this searches across multiple sources for images, and you can filter by licenses that allow for commercial use or modification
- Kaggle: my favorite place to find all sorts of datasets. You can filter by licenses in the dropdown on the Datasets page. Or, you can look at the license for any individual dataset on the Overview page:
Note that there are many different types of Creative Commons licenses. Anything with “NC” in the name (like “CC BY-NC-SA 4.0”) means non-commercial, so you can’t use those datasets for any commercial purpose (and an enterprise cloud conference is pretty much as commercial as it gets).
After weighing all these requirements we settled on the Leafsnap dataset, which comprises over 30k images of leaves from 185 different species.
Preparing the data and training a model
To keep things simple I used only 10 types of leaves from the original dataset. After narrowing down the images I was ready to upload my data to AutoML Vision. There are a couple ways to do this (more details in this video):
- Upload your images directly to Cloud Storage and create a CSV with the file path of each image in one column and the associated label(s) in another. Once I got my images into GCS, I wrote a quick Python script to generate the CSV (there’s a sketch of it after this list)
- Put your images in folders named after their label and create a zip. When you upload the zip, AutoML will assign labels based on the folder names
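Here’s roughly what that CSV-generating script looked like (a hypothetical sketch: the bucket name and folder layout are made up, and it assumes your local folders mirror what you uploaded to GCS):

```python
import csv
import os

BUCKET = "gs://my-leaf-images"  # hypothetical bucket name
IMAGE_DIR = "images"            # local folders, one per label

# Write one row per image: the gs:// path in one column, the label in the other.
with open("labels.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for label in os.listdir(IMAGE_DIR):
        for filename in os.listdir(os.path.join(IMAGE_DIR, label)):
            writer.writerow([f"{BUCKET}/{label}/{filename}", label])
```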
Once you’ve imported all your images you’ll see them in the UI:
Training a model is really as simple as pressing a button:
Evaluating the model and generating predictions
Once your model is trained, you can look at some common ML metrics to evaluate its accuracy. My favorite is the confusion matrix, which shows, for each label, what percentage of test-set images the model classified correctly and which labels it confused them with. Ideally you want to see a strong diagonal running from the top left, like this:
You can click on any of the squares to see which images the model found confusing. Next, it’s time for the best part — generating predictions on images your model hasn’t seen before.
You may have noticed that the leaf images in the Leafsnap dataset are pretty homogeneous: they’re all taken on white, lab-like backgrounds. Since this is the entire “world” our model was trained on, we shouldn’t expect it to do a great job classifying images of leaves with busy backgrounds, like this:
I was, however, impressed to discover that the model was able to classify some images of leaves “in the wild” where the shape of the leaf was more clearly defined in the image:
If I add more varied images to the training dataset, I can expect higher-accuracy predictions on leaf images taken in the wild.
The screenshot above shows how to generate a prediction in the AutoML UI. But chances are you’ll want to build an app that dynamically generates predictions from your trained model. That’s where the AutoML API comes in. Here’s all you need in your JSON request to the AutoML API (a sketch of the v1beta1 Vision predict body; the image bytes below are a truncated placeholder):
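```json
{
  "payload": {
    "image": {
      "imageBytes": "/9j/4AAQSkZJRg...base64-encoded-image..."
    }
  }
}
```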
And here’s how you’d make a request to your custom model with curl (PROJECT_ID and MODEL_ID are placeholders for your own project and trained model, and request.json is the file above):
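```bash
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "Content-Type: application/json" \
  https://automl.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/us-central1/models/${MODEL_ID}:predict \
  -d @request.json
```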
AutoML Natural Language demo
The same dataset selection criteria from Vision also applied to NL. I was looking for a dataset specific enough that you couldn’t just take the text and run it through the Cloud NL API’s content classification method (which is great in a lot of cases), and I again needed to filter by dataset license. I did some fun experiments while looking for NL dataset options, like building a model that predicts the region a wine came from based on its description, a model that predicts the source of an article from only its headline, and a model that classifies text messages as “spam” or “not spam” (but think about showing spammy text messages on a giant screen at a livestreamed event 😂).
For the demo we chose to go with a dataset on Kaggle provided by the non-profit DonorsChoose:
DonorsChoose matches teachers who need resources for their classroom with donors. They currently have a team of volunteers that manually screens and categorizes each submission, so that donors are matched with projects they care about. The categories are specific to their application (things like “Lab Equipment,” “Art Supplies,” and “Trips”), so a generic pre-trained API wouldn’t work. DonorsChoose released a public domain dataset with over 1 million teacher requests to Kaggle to see if a community of ML experts and data scientists could build them a solution. I wanted to see if AutoML could help.
Preparing the data and training a model
Because AutoML NL has a limit of 100,000 examples per dataset, I wrote a script to gather a subset of the original 1M+ examples, limiting it to 9 categories. Here’s roughly what that script looked like (a pandas sketch: the file and column names are hypothetical, not the dataset’s real ones):
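```python
import pandas as pd

# Hypothetical file and column names -- adjust to match the actual Kaggle CSV.
df = pd.read_csv("donorschoose_projects.csv")

CATEGORIES = ["Lab Equipment", "Art Supplies", "Trips"]  # ...9 categories total

subset = df[df["category"].isin(CATEGORIES)]
# Stay under the 100,000-example dataset limit.
subset = subset.sample(n=min(len(subset), 100_000), random_state=42)

# AutoML NL expects a CSV with the raw text in one column, the label in another.
subset[["essay", "category"]].to_csv("automl_nl_dataset.csv",
                                     index=False, header=False)
```

Uploading text data to AutoML NL is as simple as creating a CSV with your raw text in one column and the associated category in another (you can also build models with multiple labels per text input). Once the data has been imported, you’ll see something similar to AutoML Vision: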
To train the model, you guessed it — just press a button! Note that training NL models currently takes considerably longer than Vision. I’ve found NL models take about 3–4 hours to train (you’ll get an email when it completes).
Evaluating the model and generating predictions
To evaluate the accuracy of single-label NL models, we can also look at the confusion matrix:
Here we also see a pretty strong diagonal from the top left, which means our model categorized the majority of our test data correctly. Now it’s time to generate a prediction on text input our model hasn’t seen before:
In this example our model predicted the correct category of “Lab Equipment” with 98% confidence for this particular text input:
My students need aquarium lighting for our environmental clownfish aquaculture project.
Like AutoML Vision, we have access to a custom API endpoint for generating predictions. The request looks much like the Vision one, except the payload is a text snippet (again a v1beta1 sketch with placeholder project and model IDs, using the example input from above):
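```bash
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "Content-Type: application/json" \
  https://automl.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/us-central1/models/${MODEL_ID}:predict \
  -d '{
    "payload": {
      "textSnippet": {
        "content": "My students need aquarium lighting for our environmental clownfish aquaculture project.",
        "mime_type": "text/plain"
      }
    }
  }'
```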
It’s important to note that only I (and any developers I’ve shared my project with) have access to this model. None of the training data for these models will be used to improve the Cloud ML APIs.
Presentation time
Once you’ve built the demo, what could possibly go wrong? Turns out, lots of things. Live demos of developer products are always high risk, but my team is a big believer in them. It’s infinitely more authentic to show a product being used live than to show a slide and talk through a product’s features. Even if something goes wrong, that’s real and relatable to developers.
I was amazed at how thorough our A/V team was in helping me think about backup plans for every possible type of demo failure:
- Tech (AutoML) fail: What if the prediction request failed? If this happened I had screenshots of a successful prediction ready to go in a separate browser window. For some demos a backup screen recording is best.
- Hardware fail: If the machine I was demoing from stopped working, we had a second machine with the demo configured, and the A/V team was ready to switch to it if necessary. Not only was the demo configured on the backup machine, but one of my teammates was backstage mirroring everything I did on screen in near real time, so that if we had to switch machines, the demo would pick up right where I left off.
- Human fail: Humans are not perfect, so in the event that I failed to show up or couldn’t deliver the demo, another teammate was backstage ready to present.
Before I presented I had a few hours to freak out backstage and convince myself that no one would actually watch this thing. I spent most of that time listening to Hamilton. Luckily I didn’t find out until after I presented that there were over 100k people on the livestream 😮
My biggest takeaway from all of this: it takes a village to build a four-minute demo. I was the one on stage, but tons of people made this happen.
Train your own models
If you made it this far, maybe you’ve got some ideas for your own AutoML models! Dive into the docs for AutoML Vision, Natural Language, and Translation (I’ll cover Translation in a future post). Want to share what you’re building with AutoML, or have suggestions of interesting datasets? Find me on Twitter at @SRobTweets.