
What the fruit!

Jeroen Rietveld
Nov 1, 2018 · 6 min read

Some time ago, I was standing at the fruit basket with my colleague Casper when we noticed a rather odd-looking fruit. We asked each other: “What the * is that?!”

Neither of us could answer it, leaving us hopeless and desperate. You need to know what you eat, right? At least we thought so. We figured the only logical approach would be to build an app. An app that allows you to point the camera of your phone at a fruit, and tells you what kind of fruit it is!

WTFruit was born.

The goal of this article

I discussed this topic during a recent talk we gave. The problem we’re trying to solve is an image recognition problem: incredibly easy for us humans, but (until somewhat recently) a whole lot more difficult for computers. In this blog post I want to share some things we learned while building WTFruit. I won’t go into detail on the ‘how’, but will refer to great articles that give more in-depth information on that matter. I will include a few lines of code, but if you’re not a programmer, don’t worry if you don’t understand them.

Computers have advanced so much that image recognition is now a solvable problem using machine learning. Not only that, it’s also a very hot topic! Searching Google for ‘image recognition’ turns up tons of interesting reading material. So, if you want to dive deeper into neural networks, or into how to approach image recognition with ML, check out the Machine Learning is Fun article series. Otherwise, no worries: keep on reading!

Getting started

What we need first is a dataset of fruit images. We found one online. We download the repo to our Jupyter environment and do some basic preprocessing (resizing the images to fit our neural network and normalising the pixel values).
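The preprocessing code itself isn’t shown in the post; here’s a minimal sketch of the normalisation step, assuming the images are loaded as 8-bit RGB NumPy arrays (the resizing would typically be done beforehand with a library such as Pillow, and the target size of 100×100 is just an illustrative guess):

```python
import numpy as np

def preprocess(image, size=100):
    """Normalise an 8-bit RGB image to float values in [0, 1].

    `size` is illustrative; the actual input size of the network
    isn't stated in the post.
    """
    assert image.shape == (size, size, 3), "resize before normalising"
    return image.astype(np.float32) / 255.0

# Example: a fake 100x100 RGB image standing in for a fruit photo
fake = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
x = preprocess(fake)
print(x.min() >= 0.0 and x.max() <= 1.0)  # True
```

Scaling pixel values into [0, 1] keeps the inputs in a range that neural networks train on more stably than raw 0–255 integers.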

Now that we have our data we choose a model. We start out with a very basic one:

[Figure: Basic CNN]

What we have is a deep neural network with 2 convolutional layers and 3 fully connected layers, resulting in X outputs. The number of outputs depends on how many classes we have. In our case there are 81 classes, which is the number of different types of fruit in the dataset.
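The post doesn’t give the exact layer sizes, but the standard convolution output-size formula shows how an input image shrinks on its way to the 81-class output. A sketch with made-up sizes (32×32 input, 5×5 kernels, 2×2 max-pooling, 16 channels after the second conv layer are all assumptions for illustration):

```python
def conv_out(size, kernel, stride=1, pad=0):
    # Standard formula for the spatial output size of a conv/pool layer.
    return (size + 2 * pad - kernel) // stride + 1

size = 32                    # hypothetical input width/height
size = conv_out(size, 5)     # conv1, 5x5 kernel -> 28
size = conv_out(size, 2, 2)  # 2x2 max-pool      -> 14
size = conv_out(size, 5)     # conv2, 5x5 kernel -> 10
size = conv_out(size, 2, 2)  # 2x2 max-pool      -> 5

# Flatten, then 3 fully connected layers narrow this down to 81 outputs,
# one per fruit class.
flat = size * size * 16      # assuming conv2 has 16 output channels
print(flat)                  # 400 inputs to the first FC layer
```

The final fully connected layer always has 81 units here, regardless of these intermediate sizes, because each output corresponds to one fruit class.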

We start training, and get nice results:

[Figure: training results]

The first challenge

However, we run into a problem. Since our dataset only has images with white backgrounds, the network “assumes” that the background is part of the object (fruit) it is trying to learn to recognise. We have a feeling this will be an issue, and verify it by replacing the white background of one of the test images with black.

[Figure: prediction for a banana on a black background]

As you can see, the result is terrible: the image clearly shows a banana, but the network thinks it’s a cherry! The ‘normal’ solution to this problem would be to find images of fruit in context: images where people are holding the fruit, or where it’s lying on a surface. We could create such a dataset ourselves by taking a lot of pictures of fruit, or we could try to find them online. However, we want to try a different method. What will happen if we just manipulate the white background?

Our little experiment

Our reasoning was as follows: if we can change the background colours so there is no correlation between the backgrounds of different images, the network may learn to “regard” this data as irrelevant. We create a simple script that replaces the white background with random noise, which looks like this:

[Figure: bananas with random noise backgrounds]
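The replacement script isn’t shown in the post, but the idea can be sketched with NumPy, assuming that “background” means pixels that are near-white (the threshold of 240 is a guess):

```python
import numpy as np

def randomise_background(image, threshold=240):
    """Replace near-white background pixels with random noise.

    A pixel counts as background when all three RGB channels
    exceed `threshold` (an assumed heuristic, not the authors' code).
    """
    out = image.copy()
    mask = (image > threshold).all(axis=-1)  # HxW boolean background mask
    noise = np.random.randint(0, 256, image.shape, dtype=image.dtype)
    out[mask] = noise[mask]
    return out

# A white image with a dark "fruit" patch in the middle
img = np.full((100, 100, 3), 255, dtype=np.uint8)
img[40:60, 40:60] = 30
noisy = randomise_background(img)
print((noisy[40:60, 40:60] == 30).all())  # True: the fruit is untouched
```

Because the mask only selects background pixels, the fruit itself survives the transformation unchanged while every background pixel becomes independent noise.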

After we create our modified training set, we feed the images to the network and wait for it to finish training. Once it’s finished we test our network. The test set still contains images with a white background.

The results? Terrible! We get an accuracy of 24%…

Now, it’s difficult to reason about why the results are this bad. However, we can make a good guess by looking at the images: even though the backgrounds are random, they still look very similar! So, presumably, this isn’t ‘different’ enough.

It’s time for approach 2: making it more different!

[Figure: bananas with random solid-colour backgrounds]

We decide to just replace the backgrounds with a single random colour per image.
We now get an accuracy of 42%, which is already better. If we had more (similar) training data, we could likely get a higher accuracy. This shows the importance of having a lot of data!

An even more important lesson we can take from this case is the importance of having data that represents the problem you’re trying to solve. If we just want to recognise fruit on a white background (and nicely cropped), this dataset would suffice. However, in our case we want to point the phone’s camera at any surface and still be able to recognise the fruit. Your data should reflect your scenario.

Main take-aways

My goal was to give you an idea of how we started the machine learning process for this problem. If you’re new to the subject, I hope you got a grasp of what it takes to process data, and of the experiment we did by giving the fruit random background colours. Please note that this is just the first step towards creating an actual app, which I left untouched in this blog post.

At the moment we’re training on more and more data on our servers to improve the recognition process. As this is a project we’re doing outside work hours, creating the app will take a while. Once we’ve actually built it, I’ll write another blog post going into more detail on that process!

Summarising our experience in one sentence: you’ll always need more and better data! We may even go as far as saying: data is more important than the network!

For the developers amongst us

We wrote our ML script in Python using one of the popular machine learning libraries. There are a lot of nice Python libraries out there; our choice wasn’t for any specific reason, except that we had used it before. On our local server we installed Jupyter, which gives you a nice environment for interactive programming. I highly suggest checking it out.

After our talk, an attendee pointed me towards a hosted notebook service from Google: you get a free Jupyter notebook without having to do any setup yourself, and it even includes a GPU!


Join the Label A team!

Interested in other cool things we do? Do you want to keep developing yourself and the products you make? You might be a good fit for Label A! Check out our vacancies.

Label A

Label A develops intuitive and sexy apps, websites and online platforms. Dummy-proof and high-tech, with a focus on mobile and cloud technologies. We have everything in-house to design, develop and support web and mobile applications. Visit us at www.labela.nl.