Is this a taco? A machine learning experiment with Custom Vision

Can a machine see this as a taco?

Ever wondered how to get your hands on machine learning (ML)? Well, now you can through machine learning services provided over the cloud, called Function as a Service (FaaS).

While I was studying what machine learning is all about, I had a chance to learn about Custom Vision.

Custom Vision is a cloud-based machine learning service that lets users provide custom image sets to train an algorithm to recognize certain features of images.

FaaS could bring ML closer to us

In the past, I have played with apps that give feedback on the pictures I upload, such as the sentiment of a facial expression.

While those apps provide certain business value, an out-of-the-box solution might not meet specific needs.

Custom Vision by Microsoft

So when I learned about Custom Vision, where I have control over what to recognize, I got excited.

I am by no means an expert in data science and would much rather take an easy route to experience the world of machine learning.

With Custom Vision, I just upload training sets and tag them accordingly, and I can quickly test the algorithm against my test image sets.

It doesn’t get any easier than that to just play with machine learning.

The training sets

When we had tacos for lunch at our office, I decided to put Custom Vision to the test against photos of handmade tacos.

The test went like this:

1. Upload six photos of tacos that are “correct” by our human standards, and tag them as ‘tacos’ and ‘food’
2. Upload five photos of food around the office that is not tacos, and tag them as ‘food’
3. Run the training process
4. Test the algorithm by uploading photos of tacos and others
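
To make the tagging in steps 1 and 2 concrete, here is a minimal sketch of the training set as a plain Python mapping from image to tags. The file names are hypothetical placeholders (the real photos were uploaded through the Custom Vision web page), and `images_per_tag` is just an illustrative helper, not part of the service.

```python
from collections import Counter

# Hypothetical file names; the tag sets mirror steps 1 and 2 above.
training_set = {f"taco_{i}.jpg": {"tacos", "food"} for i in range(1, 7)}
training_set.update({f"other_food_{i}.jpg": {"food"} for i in range(1, 6)})


def images_per_tag(manifest):
    """Count how many images carry each tag."""
    counts = Counter()
    for tags in manifest.values():
        counts.update(tags)  # each tag in the set is counted once per image
    return dict(counts)


print(images_per_tag(training_set))
```

Every taco photo carries both tags, so the service sees eleven examples of ‘food’ but only six of ‘tacos’.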

Uploaded images with associated tags

As you can see, the training data set is minimal: quick and dirty photos of whatever I could grab in 15 minutes.

It’s nowhere near data science, but bear with me as we test the algorithm and review its performance below.

The testing sets

Let’s start with an image that we know is definitely not a taco.

Neither food nor taco

OK, let’s keep moving. How about just the beginning of a taco, with only the shell?

A food but not a taco

Interestingly, it did recognize the taco shell as food with an 87.5% prediction, but the shell doesn’t quite qualify as a taco.

Let’s move on to another one. How about we put some beef in the shell?

A food and a taco

Now we are getting pretty high confidence that this is a taco, with a 90.5% prediction.

This is interesting because the training sets did not necessarily feature beef in the foreground.

With that in mind, let’s jump to the finished tacos.

A food and a taco

This photo clearly resembles a complete taco, and the algorithm picked it up accordingly.

In the picture, the beef is not even visible, so it’s possible the algorithm is looking at shapes rather than hue.

Let’s test with tacos that are deconstructed, in the sense that the ingredients are outside of each taco.

Pretty close to being a taco
Definitely a taco
Definitely a taco

It appears that the shape of a taco alone may not be the determining factor either, as you can see in the results above.

Also, it is debatable whether a group of ingredients positioned closely together would qualify as a taco.

How about we throw in extras that resemble a taco in their appearance?

Some similarity to a taco
Definitely not a taco

The tests above might be giving us a hint that texture might be playing a bigger role in determining the taco.

How sure are we about the prediction?

It’s one thing to say a photo is 99% taco, but how confident are we about the prediction itself?

After all, I provided just 11 photos to make a judgment call on what is a taco and what is not.

In the Performance tab, Custom Vision provides its own measures of Precision and Recall.

In general, higher Precision and Recall mean more accurate predictions from this service: Precision is the share of predicted tags that were correct, and Recall is the share of actual tags that the model found.
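
As a rough illustration of how those two numbers are computed, here is a minimal sketch using the standard definitions. The counts in the example call are made up for illustration; they are not the values from my Performance tab, and Custom Vision may apply its own probability threshold before counting.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) for one tag, using the standard definitions."""
    predicted = true_positives + false_positives  # everything the model tagged
    actual = true_positives + false_negatives     # everything that truly had the tag
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Made-up counts: 5 tacos found correctly, 1 non-taco tagged as a taco,
# and 1 real taco missed.
precision, recall = precision_recall(
    true_positives=5, false_positives=1, false_negatives=1
)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

With only a handful of photos, a single wrong prediction moves both numbers a lot, which is one reason to treat them with caution here.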

In this case, the Precision is somewhat reliable for tacos but not to be trusted all the time.

How do we utilize the service beyond testing?

Custom Vision and similar tools provide API endpoints where you can post an image, either to extend the training set with user input or to get a prediction for a given image.
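
Here is a minimal sketch of what posting an image for prediction could look like from Python, using only the standard library. The URL path, the `Prediction-Key` header, and the `predictions` / `tagName` / `probability` fields reflect my understanding of the v3.0 prediction REST API and should be treated as assumptions; check the current API reference before relying on them. The endpoint, project ID, iteration name, and key are placeholders you would copy from your own project settings.

```python
import json
import urllib.request


def build_prediction_request(endpoint, project_id, published_name,
                             prediction_key, image_bytes):
    """Build (but do not send) a POST request carrying the raw image bytes."""
    # Assumed URL layout for image classification; verify against the docs.
    url = (
        f"{endpoint.rstrip('/')}/customvision/v3.0/Prediction/{project_id}"
        f"/classify/iterations/{published_name}/image"
    )
    return urllib.request.Request(
        url,
        data=image_bytes,
        headers={
            "Prediction-Key": prediction_key,
            "Content-Type": "application/octet-stream",
        },
        method="POST",
    )


def tags_above(response_body, threshold):
    """Keep only the tags whose probability meets the threshold."""
    payload = json.loads(response_body)
    return {
        p["tagName"]: p["probability"]
        for p in payload.get("predictions", [])
        if p["probability"] >= threshold
    }


def classify_image(endpoint, project_id, published_name, prediction_key,
                   image_path, threshold=0.5):
    """Send one image file to the service and return the confident tags."""
    with open(image_path, "rb") as image_file:
        request = build_prediction_request(
            endpoint, project_id, published_name,
            prediction_key, image_file.read(),
        )
    with urllib.request.urlopen(request) as response:
        return tags_above(response.read(), threshold)
```

Keeping the request building and the response parsing in separate functions makes it possible to check both without calling the service.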

This is why they are called Function as a Service: each one is just a function in the cloud that your app can tap into, instead of extending your existing back-end capability.

If you want to learn more about Custom Vision, head over to its website to start testing.

Verdict

Cloud-based services like Custom Vision bring machine learning closer to those of us who are not data scientists.

Although it would still require the knowledge and critical thinking skills of data scientists to judge the accuracy of the algorithm, this could be a great way to prototype an integration with your app.


I’m a front-end developer lead at Fresh Consulting. I write about influential technical leadership (The Pragmatic Question) and the latest technology experiments.
