Taking Selfies (and More) to the Next Level with Open Source Deep Learning Models
Everything I’ve Learned About Machine Learning I’ve Learned From Machines.
I’ve done some experimenting with using machine learning in apps before. It usually produces some impressive results, but training a model to be able to make meaningful predictions can be a lot of work. One of my first attempts at this was during my time as an intern with the Watson Data Lab @ IBM, when I built an app to predict a Reddit post’s potential “karma” score. I remember having a lot of fun with that project, but it involved giving myself a crash course in data science, delving deep into Python, and struggling to choose machine learning algorithms from a pretty intimidating-sounding list.
Over time my knowledge of the algorithms and tools available to developers improved, which led me to a new approach on my next project: Transfer Learning. This is a machine learning concept that builds on a very human idea, which suggests that skills or knowledge learned while solving one task should help when solving a different, but related task. With humans, we would expect an expert race car driver to have a comparatively easy time learning to race a different type of vehicle. Similarly, in deep learning it is much faster to retrain an image recognition model on additional categories than it is to train one completely from scratch.
The benefits of transfer learning are two-fold: less actual training work is being done since only the top “classification” layer needs to be retrained, and less data needs to be collected since we are only responsible for training our model on the new categories. In programming we are always looking for ways to avoid “reinventing the wheel”, and transfer learning is really just a way to accomplish this with deep learning.
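To make that idea concrete, here’s a toy sketch of “train only the top layer.” Everything in it is fabricated for illustration (the two-number “features” and the tiny pixel lists are stand-ins, not the actual Inception/MobileNet pipeline): the pre-trained feature extractor stays frozen, and only a small classification layer on top gets trained on the new categories.

```python
import math

def frozen_features(image):
    # Stand-in for a pre-trained network's penultimate layer. In real
    # transfer learning these weights are reused as-is and never updated.
    return [sum(image) / len(image), max(image) - min(image)]

def train_top_layer(samples, labels, epochs=200, lr=0.5):
    # Train only a logistic "classification" layer on top of the frozen
    # features; this is the cheap part of transfer learning.
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            f = frozen_features(x)
            z = w[0] * f[0] + w[1] * f[1] + b
            pred = 1.0 / (1.0 + math.exp(-z))
            err = y - pred
            w = [wi + lr * err * fi for wi, fi in zip(w, f)]
            b += lr * err
    return w, b

def predict(x, w, b):
    f = frozen_features(x)
    return 1 if w[0] * f[0] + w[1] * f[1] + b > 0 else 0

# Toy "images" (pixel lists) for two new categories: bright vs. dark.
images = [[0.9, 0.8, 0.7], [0.1, 0.2, 0.1], [0.8, 0.9, 0.9], [0.2, 0.1, 0.3]]
labels = [1, 0, 1, 0]
w, b = train_top_layer(images, labels)
```

The frozen extractor is why the data requirements shrink too: the base model already learned general features, so the new layer only has to learn how those features map onto your categories.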
By taking advantage of pre-trained models like Inception and MobileNet (available as part of TensorFlow, an open source machine learning framework) and retraining them on a data set of images curated by me, I was able to implement a “food recognition” REST API that could identify pictures of 101 different types of food with pretty remarkable accuracy. I may have been inspired by a certain scene from Silicon Valley.
Retraining a model involves a lot less work than starting from scratch, but it does require a fair amount of time and lots of computing power. Even equipped with a high-powered CUDA-capable NVIDIA GPU, it took several hours to train and test my model each time I wanted to try a new set of parameters. That’s all before I started the process of getting the model into a format I could use in an API server. Overall, this was a much smoother process and a better result than my first go-round, but a hefty workload nonetheless.
If only there was an even easier way to incorporate Artificial Intelligence in my apps…
The Power of MAX
As it turns out, there is an easier way to harness the power of machine learning in your apps. Powerful pre-trained models like the ones I mentioned above are out there, and no longer are they hidden behind the closed walls of big corporations or university research centers. On the Model Asset eXchange (MAX), you can browse a wide variety of models that have been pre-trained for effective use in specific areas, and more are being added all the time. Better still, the models are packaged with a fully documented API server already built-in, so you can go from idea to development almost instantly.
The models are free and open source, so there are no fees, and no account credentials or API keys are needed to use them. I can’t think of an easier or quicker way to get up and running with real deep learning models that you can interact with, and it’s available to everyone! As long as you have Docker installed on your machine, you can download the model and fire up the API server with one simple command that (depending on which model you choose) will look something like this:
$ docker run -it -p 5000:5000 codait/max-image-segmenter
Perhaps you’re developing apps in the cloud, and you want to deploy a model to your Kubernetes cluster. Most of the models on MAX are already set up to be deployed with no extra configuration steps, like this:
$ kubectl apply -f https://raw.githubusercontent.com/IBM/MAX-Image-Segmenter/master/max-image-segmenter.yaml
So really, no matter what programming language you’re working with, or what kind of development environment you’re operating in, working with MAX models couldn’t be simpler. They allowed me to build on the successful machine learning work done by others and focus on the development of my app.
The Test Run: Building my App
After taking one quick look at the models available on MAX and reading what they were capable of, my mind was flooded with ideas for apps based on them. A fake tweet generator, a program that recreates images in the style of another one, even cryptographic applications — all possible with these MAX models without needing to do any of the “heavy lifting” machine learning work.
I chose to work first with the MAX “Image Segmenter” model, with the idea that I could make a tool producing a green-screen-type effect, cropping individual objects out of images without any guidance or direction from the user.
The model was set up to do almost exactly that, right out of the box. The Swagger-generated API docs are very complete, and allowed me to experiment with the model server right away, getting a real look at the data structures I’d be working with and the types of information the model could give me. If you haven’t experienced this before, it’s hard to overstate how simple it is. With zero coding required, you can interact with and get predictions from deep learning models using your own data.
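Once you graduate from the Swagger UI to code, the call itself is just an HTTP POST with an image attached. Here’s a sketch using only the Python standard library; the `/model/predict` route and the `image` form field follow the Image Segmenter’s docs, but treat the exact names as assumptions if you’re using a different model.

```python
import uuid
from urllib import request

MODEL_URL = "http://localhost:5000/model/predict"  # default port from the docker run command

def build_predict_request(image_bytes, filename="photo.jpg", url=MODEL_URL):
    # Hand-rolled multipart/form-data body so the request shape is visible;
    # in practice a library like requests would do this encoding for you.
    boundary = uuid.uuid4().hex
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="image"; filename="{filename}"\r\n'
        "Content-Type: image/jpeg\r\n\r\n"
    ).encode() + image_bytes + f"\r\n--{boundary}--\r\n".encode()
    return request.Request(
        url,
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )

# With the Docker container from earlier running, the actual call would be:
#   import json
#   with open("photo.jpg", "rb") as fp:
#       resp = json.load(request.urlopen(build_predict_request(fp.read())))
```

Nothing here is MAX-specific beyond the route: it’s a plain REST endpoint, which is exactly why the models work from any language or environment.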
When all was said and done, I was able to implement the “Magic Cropping Tool” with relatively little effort, thanks to the MAX model. It gave me all the data I needed to construct a colormap of the different objects identified in the image, and then, using canvas, crop out each of those segments and save them as individual images. As a developer, having the machine learning work already completed allowed me to focus on the “fun part” of development (for me) instead of getting bogged down with data science work that has already been done by others, which was a huge win.
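To give a feel for the data the model hands back, here’s a small Python sketch of that colormap-and-crop logic (the app itself does this in the browser with canvas). It assumes a response shaped like the Image Segmenter’s output, a grid of integer object labels with 0 for background; the tiny grid below is fabricated for illustration.

```python
# Fabricated 4x4 segmentation map: two objects (labels 15 and 8) on a
# zero-valued background, standing in for the model's per-pixel output.
SAMPLE_SEG_MAP = [
    [0, 0, 15, 15],
    [0, 0, 15, 15],
    [8, 8, 0, 0],
    [8, 8, 0, 0],
]

def object_labels(seg_map):
    # Every distinct non-zero value is a detected object class --
    # this is what drives the colormap of identified objects.
    return sorted({v for row in seg_map for v in row if v != 0})

def bounding_box(seg_map, label):
    # Smallest rectangle containing all pixels of one object: the
    # region the app would crop out of the original image.
    cells = [(r, c) for r, row in enumerate(seg_map)
             for c, v in enumerate(row) if v == label]
    rows = [r for r, _ in cells]
    cols = [c for _, c in cells]
    return min(rows), min(cols), max(rows), max(cols)
```

With those two pieces, the “magic crop” is just a loop: for each label, paint its pixels a distinct color for the colormap, then copy its bounding box out of the source image.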
I know I’ve linked to it elsewhere in the article… but now that you know a little about the process of creating it, take a look at the app I created with MAX on GitHub. Play around with the tool by cropping objects out of some images, and share the good ones!
MAX + Me = A Better Developer
There’s no doubt that of the machine learning apps I’ve built, this Magic Cropping Tool comes the closest to having a real-world, useful function. It already does basically what I want it to do, but I’ve got a growing list of features I’d like to add at some point:
- Server-side image processing
- Kafka & Cloud Function integration to handle batches/streams of images
- In-browser models and predictions with TensorFlow.js
- More identifiable object types
My best advice if you’re curious about integrating deep learning into your apps? Go see what new models are available on MAX… and start creating!