What can machine learning do for you?

Stanley Chin
Published in Building CrowdRiff
Oct 31, 2017 · 7 min read
Image by Maddie Baker: https://unsplash.com/@maddybakes

Machine learning is hard to learn, but it’s getting easier and easier to use it to solve problems.

Several years ago, when I first started my CS major, machine learning was the freshest, coolest thing to do, and Google was the dream place to work. Several years later, machine learning is still pretty darn cool.

Imagine you go to the grocery store and compare two similar food items to see which one is healthier. You’re looking at the calories, checking which one has fewer carbs or less fat while maximizing fiber and protein. 300 calories in 30g of product A, compared to 100 calories in 55g of product B, carry the 1… how many carbs were in A again? Once properly trained, a machine learning model can give you an answer immediately! These kinds of solutions are what originally drew me to machine learning.

It’s cool because we can theoretically train a program to learn about anything and it will predict the results of a problem! Dare I say, witchcraft!?

I tried many ways to pick up machine learning: going through the gold standard of Andrew Ng’s Coursera course, taking courses in data science and artificial intelligence, attempting Kaggle problems, and following blog posts to do personal projects. All of them fell short because a) I have a short attention span and b) there was a lot to learn before actually building something that does machine learning.

Recently, at CrowdRiff, we had the opportunity to explore our interests through our quarterly hack days, and I had a vested interest in tackling a problem with machine learning.

Image by Patrick Fore: https://unsplash.com/@patrickian4

You can follow the process of what we did near the bottom of this post, where I provide a rundown of each step taken to set up, train, and test a model.

Three things I know to be true

  1. Google created an amazing convolutional neural net called Inception (v3) that allows us to make a copy of it and retrain its last layer so that we have a machine learning model that will classify our own images.
  2. As a company that provides a pool of images for our clients to curate, an incredibly useful exercise would be to see whether our model can predict if a certain image will be curated by a client.
  3. Getting data is difficult, getting good data is VERY difficult.

What we did was use the Tensorflow library to retrain parts of the Inception model by using our own labeled data and then classify a new, unseen image and see what the model predicts.

After retraining the model, one small exercise we tried was to personally curate certain images and verify the model’s responses. This, in a way, was validation of how well our model performed.

The figure below illustrates our own example of curating images from Niagara Falls. The results were enlightening, and the model did much better than anticipated. Images that looked like ads were filtered out, as were things irrelevant to the Falls. This machine learning model helped figure out what a bad image is and presented us with options for what a good image could be.

Output scores above 70% were labeled as ‘curatable’.
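To make that cut-off concrete, here is a minimal sketch of how such a threshold could be applied. The file names and scores below are invented for illustration; they are not our real data or our real pipeline.

```python
# Hypothetical model scores per image; a score >= 0.70 counts as 'curatable'.
THRESHOLD = 0.70

def curatable(scores):
    """Keep only the images whose score clears the threshold."""
    return [name for name, score in scores.items() if score >= THRESHOLD]

scores = {
    "falls_panorama.jpg": 0.93,  # clearly on topic
    "ad_banner.jpg": 0.12,       # looks like an ad, filtered out
    "blurry_selfie.jpg": 0.68,   # just under the cut-off
}
print(curatable(scores))  # ['falls_panorama.jpg']
```

The 70% threshold itself is a judgment call; raising it trades recall for precision.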

Three more things I know to be true

  1. On a scale of one to over-trusting my data, I am pretty naïve. To truly understand what our model is learning, and thus predicting, we had to see its output first, then go back and retrain.
  2. This time around, doing a project with machine learning was much more fun and approachable because we clearly defined our problem statement, we knew what tools to use, and we cared and were driven by its results.
  3. Machine learning is scary, especially if you are learning it for fun! It took me a while, but persevere; it can pay off. What I found very motivating is working on a problem you care about, not the generic “Hello World!” of machine learning, predicting the type of iris flower (yes, I included a flower example in my tutorial).

What I learned

Good data is defined by what the problem statement is. Predicting what the client will curate has a LOT of meaning in reference to data. Questions like, "Has the client seen this picture before?", or "Is a previously curated picture still relevant after x amount of days?" are important to think about when building a model. However, we are not fazed by this result; these are the kinds of things we expect to learn over time.

Sometimes, it’s easier to run before you walk. Luckily for us, Tensorflow did some heavy lifting. The idea was to build something first with the least amount of background knowledge, make it work, somehow, and then review what we actually did.

DIY — Dip your toes into image classification

I outlined a sample workflow of what we did below. Please follow along if you are interested in making your own image classification model. This means the model will read in the pixels of each image and learn from them.

Essentially, what we’re going to do is use the Tensorflow library to retrain parts of the Inception model by using our own labeled data and then classify a new, unseen image and see what the model predicts.

Learning objectives

  • experience with a machine learning technique called supervised learning
  • retrain a convolutional neural net
  • tackle binary/multi-class image classification (binary is two labels, multi-class is more than two)

Ingredients

  • Docker
  • some terminal skillz
  • images, the more the merrier

Environment setup

Docker is a neat sandboxing tool that uses ‘container’ technology so that you can fire up applications within a stripped-down, no-bells-and-whistles version of your operating system.

Get Docker: https://www.docker.com/community-edition

After installation, if Docker hasn’t started yet, start it manually by launching the application.

Create a Docker container with a volume to persist data. This particular container runs a Tensorflow image, so you get Tensorflow out of the box.

docker run -d -it --name <CONTAINER_NAME> -v <VOLUME_NAME>:/tf_files gcr.io/tensorflow/tensorflow:latest-devel
docker exec -it <CONTAINER_NAME> bash

Cool, Docker has been set up.

exec opens an interactive shell inside the Docker container, similar to SSHing into a machine.

Getting data

The next step is a very important one: getting data. You could get by with 100 images of each label, but the more the merrier. Our project used upwards of 2,000 images for each label.

A label is the category you want a set of images to be filed under. For example, if you have 100 images of tulips, your label for that set is tulips.

From here, you can download a bunch of images or copy them from your local drive. These images must be labeled by the folder name.

So if you have 100 images of tulips, store them in a folder called tulips; the same goes for daisies, lilies, etc. The flower example already separates the data into categories for you.
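To see the folder-per-label convention in action, here is a small sketch that builds a throwaway directory mimicking the flower layout and lists the labels the retraining script would infer from it. The folder names are illustrative.

```python
import os
import tempfile

def labels_from(root):
    """Each sub-folder of the training directory becomes one label."""
    return sorted(
        name for name in os.listdir(root)
        if os.path.isdir(os.path.join(root, name))
    )

# Build a throwaway directory that mimics the flower layout.
root = tempfile.mkdtemp()
for label in ("tulips", "daisies", "lilies"):
    os.makedirs(os.path.join(root, label))

print(labels_from(root))  # ['daisies', 'lilies', 'tulips']
```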

This will be your training data which means it is the images that the model will learn from.

(while inside your Docker container)
mkdir /tf && cd /tf
curl -O http://download.tensorflow.org/example_images/flower_photos.tgz
tar xzf flower_photos.tgz
-----------------------------------------------------------------
(to copy images from your own drive into Docker)
docker cp <FOLDER_NAME> <CONTAINER_NAME>:/tf/<FOLDER_NAME>

Your training images will now live in /tf/flower_photos.

The following command will start training your model to learn your images:

cd /tensorflow && git pull
python /tensorflow/tensorflow/examples/image_retraining/retrain.py \
  --bottleneck_dir=/tf_files/bottlenecks \
  --model_dir=/tf_files/inception \
  --output_graph=/tf_files/retrained_graph.pb \
  --output_labels=/tf_files/retrained_labels.txt \
  --image_dir=/tf/flower_photos/

Depending on the size of your sample set and computer, this could take anywhere from 15 minutes to an hour. But afterwards, you have a fully functional classifier!

The reported accuracy (around 91% in our run) means that out of 100 images, the model can on average correctly predict 91 of them.
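Put as arithmetic, accuracy is just the fraction of held-out images the model gets right; the 91-of-100 figure here is an example, not a guaranteed result of your run:

```python
def accuracy(correct, total):
    """Fraction of validation images predicted correctly."""
    return correct / total

# A model that gets 91 of 100 held-out images right has 91% accuracy.
print(f"{accuracy(91, 100):.0%}")  # 91%
```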

Now you are ready to classify your images. Hop back over to your local drive and grab an image somewhere. You can open a new tab on your terminal.

Again, to copy images you want to classify from your local drive to the Docker:

docker cp <FILE_NAME> <CONTAINER_NAME>:/tf/<FILE_NAME>

Xblaster came up with this nifty script to classify your images very simply. So throw that into the container too:

docker cp label.py <CONTAINER_NAME>:/tf/label.py

So, grab an image that IS NOT from the training data. Make sure YOU know what it is, and then throw that in the classifier.

Hop back into the Docker container with:

docker exec -it <CONTAINER_NAME> bash

and then run the Python script:

python /tf/label.py /tf/<SOME_IMAGE>.jpg
Dandelion image by Azmi Semih: https://unsplash.com/@azsemok

The figure above illustrates the model predicting that the dandelion image is a dandelion with 99.8% confidence.

So now I ask you, what can machine learning do for you?
