Pidentifier, the Computer Vision Model for your Sweet Tooth
How We Pidentified Your Breakroom on Pi Day with AWS Serverless and Machine Learning
Written by Matt Paterson and Nick Galvez
It was a cold and dark night in February and I was on a Zoom chat with coworkers. As Data Science Instructors, we were going over some class details while the students were in breakout rooms building their first linear regression models. I suddenly got excited for no reason at all.
“Hey, wait! It’s February! Do you know what that means?” I said. Crickets. “Only six more weeks until PI DAY!” Even amongst nerds I got head shakes and eye-rolls. Out of nowhere, I told my colleagues that I wanted to build a computer vision model that could identify pies.
“You’re a total nerd, huh?” said the lead instructor. I told her she was right.
The next morning I told my boss my idea. As a Machine Learning Engineer at a small agile software consultancy, I was salivating to build more demo models that I could talk about publicly: projects that were not cloaked by client NDAs.
“We could use this as a marketing tool,” I told Chris Miller, Founder and CEO of Cloud Brigade in Santa Cruz, California. “Think of all the geeks who will take pictures of their pie on Pi Day! And what a great example of what we can do for [the companies that we were pitching for computer vision projects at the time].”
At first, Chris was not enthralled by my idea. It wasn’t something that we could bill customers for, and we had other work that needed doing. I returned to my IoT work with the Raspberry Pi and told myself I’d have to try building it in TensorFlow on the weekend.
The next day, in our daily stand-up, Chris surprised me. “OK, so this pie idea is good, but let’s identify the kind of pie. I want the app to use a photo of a slice of pie and tell the user if it’s Apple Pie, or Peach Pie, or Pumpkin Pie…that’s what we’ll do.” I was at once over the moon that he liked my idea and skeptical that we could build a model that differentiates between kinds of pie. After all, in black and white, a peach pie and an apple pie look identical. Only their colors separate them.
He insisted we differentiate the flavors, and I’m glad that he did. It made for a great demonstration, for current and future clients alike, of just how good the technology available to ordinary small and medium-sized businesses has become!
With a general architecture drawn up, we next broke down each step into more detail to determine what we’d do. In my opinion, there is still no substitute for the original whiteboard for this purpose.
First things first, we had to build an image classification model that could discern between various types of pie. I started with a Convolutional Neural Network (CNN) in TensorFlow. To do this, I pulled several hundred photos from the internet and created a dataset of Pie versus Not_Pie. TensorFlow has built-in functions that will label your dataset for you, provided that you save the images into folders named after your labels. Easy peasy.
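To show what folder-based labeling means, here is a minimal pure-Python sketch of the idea. It is not TensorFlow's implementation. It assumes a layout like `pies/Pie/*.jpg` and `pies/Not_Pie/*.jpg`, where the `pies` root name and the extension list are assumptions. Like Keras's `tf.keras.utils.image_dataset_from_directory(..., labels="inferred")`, it treats each subfolder name as a class and numbers the classes in sorted order.

```python
from pathlib import Path

# File extensions treated as images (an assumption for this sketch).
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".bmp"}


def label_from_folders(root):
    """Return (class_names, samples), where samples is a list of (path, label_index).

    Each immediate subfolder of `root` is one class and its name is the label.
    Indices follow sorted folder names, so "Not_Pie" -> 0 and "Pie" -> 1.
    """
    root = Path(root)
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    samples = []
    for index, name in enumerate(class_names):
        for image_path in sorted((root / name).iterdir()):
            # Skip anything that doesn't look like an image file.
            if image_path.suffix.lower() in IMAGE_EXTENSIONS:
                samples.append((image_path, index))
    return class_names, samples
```

With TensorFlow installed, `tf.keras.utils.image_dataset_from_directory("pies", labels="inferred", image_size=(180, 180))` does the same labeling and also decodes and resizes the images into batches.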
As you’d imagine, a few hundred photos in a made-from-scratch CNN does not a perfect model make. However, the exercise was important for getting a baseline. In data science, it is important to get a minimal model working with a small dataset so that you can see the failure points before you move too far into a huge project. Also, watching your metrics soar as the model improves gives you a little faux ego boost.
Since we didn’t have a dataset of tens of thousands of photographs of pie slices, we wanted to use a pre-trained image classification model and fine-tune it, rather than using a made-from-scratch CNN model. There are many pre-trained models on the market, some that even specialize in food, but Amazon actually has a microservice that is already optimized to do exactly this.
Amazon Rekognition Custom Labels is a microservice that allows you to use your own curated image dataset to build a fine-tuned CNN model for either image classification (our use) or object detection (those bounding-box models).
Rekognition is not cheap. At about $4/hour, a live inference endpoint can get pretty expensive pretty fast, but luckily we were able to get AWS to sponsor our model endpoint for this project (Thanks Amazon!).
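Here is a back-of-the-envelope sketch of what "pretty expensive pretty fast" means. It uses only the approximate $4/hour figure above. It assumes billing per inference unit for every hour the model runs, and the rate should be checked against current AWS pricing.

```python
# Approximate rate quoted above; an assumption to check against current AWS pricing.
HOURLY_RATE_USD = 4.00


def endpoint_cost(hours_running, inference_units=1, hourly_rate=HOURLY_RATE_USD):
    """Estimated cost of keeping a Custom Labels model running.

    Assumes billing per inference unit for every hour the model is running,
    whether or not any images are actually sent to it.
    """
    return hours_running * inference_units * hourly_rate


one_day = endpoint_cost(24)          # 96.0
one_month = endpoint_cost(24 * 30)   # 2880.0
```

A model left running around the clock comes to roughly $96 a day, or $2,880 over 30 days. That is why it pays to stop the model when nobody is using it. The boto3 Rekognition client exposes this as `start_project_version` and `stop_project_version`.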
To use Rekognition, I first collated a dataset of about 1,500 images of slices of pie. I made sure to put like images into their own folders: an apple_pie folder, a peach_pie folder, a whoopie_pie folder, and so on. I then saved the 14 folders to an S3 bucket under a common root folder. Finally, using Rekognition Custom Labels was as easy as filling in a few fields with the paths of these folders within the S3 bucket, clicking a button, and making a cup of tea while the model trained.
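If you would rather script the upload than drag folders into the console, the minimal sketch below mirrors a local one-folder-per-label layout into S3 under a common root. It assumes the label folders already exist locally. The `pie-dataset/` prefix and the bucket name are hypothetical, and the client is expected to be `boto3.client("s3")`, whose `upload_file(Filename, Bucket, Key)` method does the transfer. This layout matters because, when you import images from S3, Rekognition Custom Labels can assign image-level labels based on folder names.

```python
from pathlib import Path


def plan_uploads(local_root, s3_prefix):
    """Map every file under local_root/<label>/ to the key <s3_prefix>/<label>/<filename>.

    One folder per label under a common root lets the folder name serve as
    the image-level label when the images are imported.
    """
    local_root = Path(local_root)
    plan = []
    for label_dir in sorted(p for p in local_root.iterdir() if p.is_dir()):
        for image in sorted(label_dir.iterdir()):
            if image.is_file():
                key = f"{s3_prefix.rstrip('/')}/{label_dir.name}/{image.name}"
                plan.append((str(image), key))
    return plan


def upload_all(plan, bucket, s3_client):
    """Upload each planned file; s3_client is e.g. boto3.client("s3")."""
    for local_path, key in plan:
        s3_client.upload_file(local_path, bucket, key)
    return len(plan)
```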
To put this model into production, we needed a RESTful API that could serve it. Here, too, there wasn’t much heavy lifting. Amazon takes care of the inference API for you by providing an ephemeral endpoint server and sample code for calling it, in your choice of Python or the AWS command line interface (CLI). From there, I just had to program API Gateway.
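Below is a minimal sketch of calling the endpoint from Python with boto3. It is our own illustration, not Amazon's sample code. It assumes the photo is already in S3 and the model version has been started. `detect_custom_labels` and its `ProjectVersionArn`, `Image`, and `MinConfidence` parameters belong to the boto3 Rekognition client. The ARN, bucket, key, and 50% threshold are placeholders.

```python
def top_pie_label(rekognition_client, project_version_arn, bucket, key, min_confidence=50.0):
    """Classify one photo stored in S3 and return (label, confidence), or None.

    rekognition_client is e.g. boto3.client("rekognition"); the model version
    must be running for detect_custom_labels to succeed.
    """
    response = rekognition_client.detect_custom_labels(
        ProjectVersionArn=project_version_arn,
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MinConfidence=min_confidence,
    )
    labels = response.get("CustomLabels", [])
    if not labels:
        return None  # nothing scored above min_confidence
    # Pick the single most confident pie label.
    best = max(labels, key=lambda label: label["Confidence"])
    return best["Name"], best["Confidence"]
```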
Since we don’t want to expose our AWS account to the world and risk hacking or other cyber attacks, we used Amazon API Gateway to shield the Rekognition Custom Labels API. This also lets us call the model without passing our AWS credentials anywhere they could be exposed to attack. API Gateway lets us define our own RESTful API: the front-end application sends a request, and API Gateway invokes Lambda functions that either call the Rekognition API, save data to S3, or return a response to the front-end application, all without opening the door to our own account.
Programming API Gateway does take a bit of care and practice. Unlike Rekognition Custom Labels, API Gateway involves many layers of configuration. Both the Python and the JSON have to be written correctly, and the API schema has to accept requests and return the correct error messages. Each request also has to be routed to the correct Lambda function, depending on the information the client sends.
The result is that we now have three separate Lambda functions that respond in their own way to different formats of requests sent to them, which will be described further below.
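A classification Lambda function might look roughly like the minimal sketch below. It assumes a Lambda proxy integration and a JSON request body of the form `{"image": "<base64 JPEG>"}`. The field names, the `PROJECT_VERSION_ARN` environment variable, and the error messages are hypothetical. The function validates the request and returns a 400 with a readable message if the request is malformed. Only then does it call Rekognition, using the function's IAM role, so the front end never holds AWS credentials.

```python
import base64
import json
import os

# Hypothetical configuration: the model version ARN comes from an environment variable.
PROJECT_VERSION_ARN = os.environ.get("PROJECT_VERSION_ARN", "")

# Created lazily so warm Lambda invocations reuse one client.
_rekognition = None


def _response(status_code, payload):
    """Build the response shape an API Gateway Lambda proxy integration expects."""
    return {
        "statusCode": status_code,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }


def classify_event(event, rekognition_client, project_version_arn):
    """Validate the request, run inference, and map failures to HTTP errors."""
    # Reject bodies that are missing or not valid JSON.
    try:
        body = json.loads(event.get("body") or "")
    except json.JSONDecodeError:
        return _response(400, {"error": "Request body must be JSON."})
    if not isinstance(body, dict) or not isinstance(body.get("image"), str):
        return _response(400, {"error": "Expected a JSON object with an 'image' string."})

    # Decode the base64 photo; binascii.Error is a subclass of ValueError.
    try:
        image_bytes = base64.b64decode(body["image"], validate=True)
    except ValueError:
        return _response(400, {"error": "'image' must be base64-encoded."})
    if not image_bytes:
        return _response(400, {"error": "'image' is empty."})

    # Credentials come from the function's IAM role, not from the front end.
    try:
        result = rekognition_client.detect_custom_labels(
            ProjectVersionArn=project_version_arn,
            Image={"Bytes": image_bytes},
        )
    except Exception:  # e.g. the model version is not running
        return _response(502, {"error": "Inference failed."})

    labels = sorted(result.get("CustomLabels", []), key=lambda label: label["Confidence"], reverse=True)
    return _response(200, {"labels": [{"name": label["Name"], "confidence": label["Confidence"]}
                                      for label in labels]})


def lambda_handler(event, context):
    """Entry point configured in Lambda."""
    global _rekognition
    if _rekognition is None:
        import boto3  # bundled with the AWS Lambda Python runtime
        _rekognition = boto3.client("rekognition")
    return classify_event(event, _rekognition, PROJECT_VERSION_ARN)
```

The handler is kept separate from `classify_event` so that the validation and error mapping can be exercised with a stand-in client, without AWS access.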
The GUI Application
For the front end, Nick made a Webpack project with Babel for transpiling and a simple templating engine, choosing Handlebars because he’d used it before.
According to Nick, “It turned out that all this setup saved me time in the long run. Being able to run the Webpack dev server and set up a proxy to the backend allowed me to work without running into CORS issues. Also, being able to quickly plug new datasets into Handlebars when the API responses changed meant I didn’t have to fiddle with manipulating responses that much.”
Nick goes on to tell me that “The most challenging part of the front-end was getting the canvas image manipulation to work cross-browser and cross-platform. Sadly, I didn’t realize there were some very good Node libraries that I could just pop into Webpack until I was almost done figuring it out for myself!”
Once we were ready to deploy, Nick created a production Webpack config with minification and optimization plugins and produced a build that could be pushed with the AWS CLI. Next time, it would be awesome to set up a command line build system that builds and deploys automatically whenever changes are committed.
Projects like this are great because they give us the chance to do things we’ve wanted to learn about but haven’t had a reason or the budget to dig into!
The Final Workflow
With all of the pieces in place, the following workflow happens in the app:
- The user takes a photo of the slice of pie
- A unique filename is assigned to the photograph
- The photograph is saved to s3
- API Gateway receives the photograph and runs inference on it with the Rekognition Custom Labels API
- A response is sent to the application with the image classification
- That response is also saved to the data store as the image’s classification
- The user is asked to verify if the pidentification was correct
- If the user fails to respond, a negative, or “false” response is sent to the data store and the original prediction stays in the data store
- Else if the user responds and verifies, their answer is logged and sent to the data store
- If the user indicates that our prediction was wrong:
  - then they are asked to tell us what the name of the pie should have been
  - their answer is sent to the data store
- The user is then given the chance to take a new photo of another slice of pie
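The bookkeeping side of this workflow can be sketched in a few lines of Python. This is a minimal illustration, not the app's actual code. A plain dict stands in for the data store, and `PieRecord`, its field names, `new_photo_key`, and the `uploads/` prefix are all hypothetical. It follows the rules above: no response is logged as `False` with the original prediction kept, and a "wrong" answer also records the corrected pie name.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class PieRecord:
    """One data-store entry per photo (hypothetical field names)."""
    photo_key: str
    predicted_label: str
    prediction_confirmed: Optional[bool] = None  # None until the user answers or times out
    corrected_label: Optional[str] = None


def new_photo_key(prefix="uploads"):
    """Give each photo a unique S3 key so uploads never overwrite each other."""
    return f"{prefix}/{uuid.uuid4().hex}.jpg"


def record_prediction(store, photo_key, predicted_label):
    """Save the model's classification for a newly uploaded photo."""
    store[photo_key] = PieRecord(photo_key, predicted_label)
    return store[photo_key]


def record_feedback(store, photo_key, user_answer=None, corrected_label=None):
    """Apply the user's verification step to an existing record.

    user_answer=None (no response) is logged as False and the prediction is kept;
    True confirms the prediction; False plus corrected_label stores the right pie.
    """
    record = store[photo_key]
    record.prediction_confirmed = bool(user_answer)
    if user_answer is False and corrected_label:
        record.corrected_label = corrected_label
    return record
```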
In this article, we’ve walked you through how to:
- Create and label a sample dataset
- Train and save a deployable Machine Learning Model using Amazon Rekognition Custom Labels
- Build and deploy a RESTful API with Amazon API Gateway and AWS Lambda
- Build and deploy a front-end application to showcase this to our customers
Whether you’re a little nerdish, or a full-silicon-jacket nerd like me, I hope that you’ve enjoyed this read and that you’ll consider the expertise that Cloud Brigade can lend to help your business modernize by embracing AI.