How to fail at a Computer Vision project

Published in

The Startup

7 min read · May 15, 2020

Having an idea for a project is cool, but at the end of the day we need to implement it and develop it into a prototype; otherwise it just remains an idea in our heads. In this article we’ll see how I failed to implement my idea, why I failed at this Computer Vision project, and what we can do to avoid these mistakes in the future.

Artificial Intelligence is all about making things easier for people. One fine day I had an idea: an app that helps blind people recognize currency notes using their smartphone’s camera. So I fired up my machine and started researching. After some time I decided the task could be handled by a simple classification model, so I moved on to the next stage: gathering the data. I looked on many platforms (Kaggle, GitHub, Indian government databases) for a dataset of Indian currency notes but couldn’t find one, so I decided to create my own. After that, as in any data science lifecycle, I chose a model and trained it on that data. As you must’ve guessed, I failed terribly.

Let me tell you a data scientist’s dirty secret: we very much LOVE TO DO “experimentation”. See the little yellow square in the diagram above?

If this part takes more than 20% of your project effort, then believe me: you’ll fail. Why? Because starting without a careful problem description, without proper analysis (which may reveal that Deep Learning is not the right solution for this particular problem), and without organizing data collection, and then deploying something half-baked without properly communicating… Naah.

Here are the main mistakes I made:

Undervaluing the task
Creating my dataset
Choosing the wrong model
Choosing the wrong evaluation metric

Undervaluing the task

This may be hard to believe, but the first model I tried for this project was a support vector machine. Believe it or not, it is a very powerful algorithm, but I undervalued the task: I didn’t think it through and just wanted to get on with it as soon as possible. I began the project hastily, without proper research, and the result was terrible. After that I decided I needed to go with a CNN.

Creating my dataset

After a long discussion with myself I decided to build the dataset from Google Open Images, so I wrote a Python script to scrape the images I wanted from it. The images didn’t look like what a real phone camera would capture, but I thought they would do. I ended up with around 9 classes (10, 20… various denominations and a background class), each with around 150 images. Then I trained a model on this dataset as well, and I failed again.

Speaking of data, there’s a magical formula in Data Science: garbage in, garbage out. If you don’t feed your model with data that represents what your algorithm will work on — you’re screwed.

You’ll get garbage if you feed your algorithm garbage. Photo by Gary Chan on Unsplash

I thought the NN architecture was at fault, so I tried different architectures, but that didn’t work either. Finally I realized that the data was the problem. So I created a new dataset with my smartphone, collecting every note and photographing it in various lighting conditions. Spoiler alert: it didn’t work either. You wanna make sure your Computer Vision project is a failure? Feed it with inaccurate data.

Choosing the wrong model

When my first attempt with the SVM failed, I thought the pattern might be too complex for an SVM, so I decided to use a neural network classifier instead. I was correct to some extent, but it wasn’t enough.

I tried various sequential models, training them from scratch and normalizing my input, but I couldn’t even get 20% training accuracy. I soon realized that this amount of data was too little for a CNN trained from scratch, and that’s when I moved on to transfer learning. After trying various architectures (VGG, ResNet, Inception) and fine-tuning their parameters, I chose ResNet simply because it performed well on the training set.
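Picking a model because it scores well on the training set says little about how it will handle notes it has never seen. Here is a minimal sketch of a stratified hold-out split that keeps some images aside for validation. It assumes the roughly 9 classes of 150 images mentioned earlier; the class names, file paths, the `stratified_split` helper, and the 20% validation share are all hypothetical and only there for illustration.

```python
import random
from collections import defaultdict


def stratified_split(samples, val_fraction=0.2, seed=0):
    """Split (path, label) pairs so every class keeps the same validation share."""
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append(path)

    rng = random.Random(seed)
    train, val = [], []
    for label, paths in sorted(by_label.items()):
        paths = sorted(paths)  # fixed order before shuffling, for reproducibility
        rng.shuffle(paths)
        n_val = max(1, int(len(paths) * val_fraction))
        val += [(p, label) for p in paths[:n_val]]
        train += [(p, label) for p in paths[n_val:]]
    return train, val


# Hypothetical dataset: 9 placeholder classes with 150 image paths each.
labels = [f"class_{i}" for i in range(9)]
samples = [(f"{label}/img_{i:03d}.jpg", label) for label in labels for i in range(150)]

train, val = stratified_split(samples)
print(len(train), len(val))  # 1080 270
```

Comparing architectures on the `val` part instead of the training part would at least show whether a model generalizes or has just memorized the training images.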

Choosing the wrong path will never take you to your desired destination; Photo by Free To Use Sounds on Unsplash

Examples are countless. At best, your algorithm will be completely inaccurate. At worst, it will be biased and nobody will notice before harm is done. OK, want some examples?

Remember our autonomous cars? Algorithms are currently trained mostly in the US. Imagine you’re an Indian carmaker. Uh oh: suddenly, driving on the left side of the road seems to be a problem…
West Air Sweden Flight 294 crash in 2016: investigators were able to determine that the Air Data Inertial Reference Unit, or ADIRU (a device that tells the plane how it’s moving through space), had begun to send erroneous signals. But they couldn’t figure out why.

In cases like these, a wrong choice of algorithm can end up costing a company dearly.

Choosing the wrong evaluation metric

One more blunder I made was choosing accuracy as my evaluation metric. You may already know that accuracy isn’t always the best evaluation criterion. For those who don’t, let me explain.

For example, say a client asks me to develop a model for handwritten digit recognition (the Hello World of Deep Learning), and let’s say I frame it as a binary classification task (one or not one, or pick any other digit). Now I create a dummy model that just outputs “not one” for every input. If there are 100 examples for each of the 10 classes, predicting “not one” for every input gives me 90% accuracy, because 900 of the 1,000 examples really aren’t ones, even though the model never recognizes a single one.
So now you know why choosing accuracy as an evaluation metric isn’t always the wise thing to do.

Instead of accuracy you can use other evaluation metrics, such as precision, recall, the F1 score, or a confusion matrix.
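To make the digit example concrete, here is a small sketch in plain Python that scores the dummy “always not one” model. The `binary_metrics` helper is my own illustration, not a library function, and it reports precision and recall as 0.0 when they would otherwise be undefined (a convention chosen here for simplicity).

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    # Undefined ratios (no positive predictions or no positives) are reported as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# 100 examples for each of the 10 digits; label 1 means "this image is a one".
y_true = [1 if digit == 1 else 0 for digit in range(10) for _ in range(100)]
y_pred = [0] * len(y_true)  # dummy model: always answers "not one"

metrics = binary_metrics(y_true, y_pred)
print(metrics["accuracy"])  # 0.9
print(metrics["recall"])    # 0.0
```

The accuracy looks respectable, while a recall of zero makes it obvious that the model never finds the class we actually care about.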

The final call

After some days I was browsing Google and found a service it offers called Teachable Machine. After trying it out I realized that we can create a model on Teachable Machine just by uploading the training data for each class, and then download the model in h5 file format and use it however we like.

Now, before you go judging me, let me tell you that I had no intention of using that model as it was. I just wanted to look at its architecture and then reverse engineer a new custom architecture myself from scratch. I just wanted to see what I was doing wrong.

Now, I could’ve analyzed that architecture the traditional way using model.summary(), but I am too lazy for that stuff, so I searched for a way to visualize a neural network. That’s when I came across this awesome app, Netron. Using this app you can visualize a trained neural network model as a flow chart and inspect the properties of each layer by selecting it.

So this was the final nail in the coffin: if this didn’t work, I was done with a simple classification model for this project. I mean, if engineers at Google (with their unlimited resources) couldn’t do it, I certainly couldn’t. I tested the model and, once again, the results were terrible.

As you can see from the above image, the model was always predicting the background class. It simply wasn’t able to differentiate between two different currency notes, and just to be clear, it was a 60-layer architecture. I mean, if this level of complexity couldn’t do it, then either I am reading the problem statement wrong or choosing the wrong technique altogether.
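A collapse like this, where every input lands in one class, is easy to spot if you count the predictions and compute recall per class instead of looking at a single overall number. Below is a minimal sketch in plain Python; the labels and predictions are made-up placeholders (not my real results), and `per_class_recall` is a hypothetical helper written just for this illustration.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Fraction of each true class that the model labeled correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Placeholder data: a model that answers "background" for every image.
y_true = ["note_a"] * 4 + ["note_b"] * 4 + ["background"] * 2
y_pred = ["background"] * len(y_true)

predicted_counts = Counter(y_pred)
recalls = per_class_recall(y_true, y_pred)

print(predicted_counts)  # Counter({'background': 10})
print(recalls)           # {'note_a': 0.0, 'note_b': 0.0, 'background': 1.0}
```

If only one label ever shows up in the prediction counts, the model has not learned to separate the classes at all, no matter how deep it is.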

Conclusion…

My final piece of advice would be to take as much time as you need to understand and research the problem statement, and not to be stubborn about a particular technique. There are a lot of other algorithms out there; just try a few and see what works best for you.

And one more thing: I am not gonna abandon this project. I am now thinking of another approach:
Using object detection, most probably YOLOv3, to detect a particular area and then feeding it to a digit classifier.

Let’s save it for my next post. There are a lot of other algorithms that can be used. What’s your opinion about it? Let’s discuss!