State Of Computer Vision

Dharti Dhami
Nov 25, 2018

If you look across a broad spectrum of machine learning problems, you see that the more data you have, the more often people get away with simpler algorithms and less hand-engineering.

In a machine learning application, the learning algorithm has two sources of knowledge:

  1. The labeled data
  2. Hand-engineering.

Hand-engineering can mean carefully designing the features, the network architecture, or other components of your system.

When you don’t have much labeled data, you have to rely more on hand-engineering. In computer vision, even though data sets keep getting bigger, we often still don’t have as much data as we need. This is why computer vision, historically and even today, has relied more on hand-engineering and has developed rather complex network architectures.

Fortunately, one thing that helps a lot when you have little data is transfer learning.

Computer vision researchers pay a lot of attention to doing well on benchmarks. As a result, you see people in papers doing things that help on a benchmark but that you wouldn’t really use in a production system deployed in an actual application.

A few tips on doing well on benchmarks.

  1. Ensembling.

This means that, after you’ve figured out what neural network you want, you train several neural networks independently and average their outputs.

So, initialize, say, 3, 5, or 7 neural networks randomly, train all of them, and then average their outputs. This may let you do maybe 1% or 2% better. It also slows down your running time by a factor of 3 to 15, or sometimes even more. So ensembling is one of those tips people use to do well on benchmarks and to win competitions, but it is almost never used in production to serve actual customers, unless you have a huge computational budget and don’t mind burning a lot more of it per customer image.
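As a rough illustration of the averaging step, here is a minimal sketch in plain Python. The three `model_*` functions are hypothetical stand-ins for independently trained networks (they just return fixed class probabilities), and `ensemble_predict` is a name made up for this example. Note that it averages the models’ outputs, not their weights.

```python
from typing import Callable, List, Sequence

# A "model" here is any function mapping an input to class probabilities.
Model = Callable[[Sequence[float]], List[float]]


def ensemble_predict(models: Sequence[Model], x: Sequence[float]) -> List[float]:
    """Average the class-probability outputs of several trained models."""
    if not models:
        raise ValueError("need at least one model")
    outputs = [model(x) for model in models]  # one forward pass per model
    n_classes = len(outputs[0])
    return [sum(out[c] for out in outputs) / len(outputs) for c in range(n_classes)]


# Hypothetical stand-ins for three independently trained networks.
def model_a(x: Sequence[float]) -> List[float]:
    return [0.6, 0.4]


def model_b(x: Sequence[float]) -> List[float]:
    return [0.4, 0.6]


def model_c(x: Sequence[float]) -> List[float]:
    return [0.8, 0.2]


probs = ensemble_predict([model_a, model_b, model_c], [0.0])
predicted_class = max(range(len(probs)), key=lambda c: probs[c])
```

Every prediction calls every model once, which is where the extra running time comes from: the cost per image grows with the number of networks in the ensemble.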

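  2. Multi-crop at test time.

Another thing you see in papers that really helps on benchmarks is multi-crop at test time: you run the classifier on several crops of the same test image and average the results. If you have the computational budget, you could do this.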
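To make multi-crop concrete, here is a minimal sketch of one common variant, often called 10-crop: take the center crop and the four corner crops, add the mirror image of each, run the classifier on all ten, and average the outputs. The choice of ten crops, the helper names, and the `toy_model` classifier are all assumptions for illustration. The image is a plain list of pixel rows so the example needs no libraries.

```python
from typing import Callable, List

Image = List[List[float]]  # grayscale image as rows of pixel values


def crop(img: Image, top: int, left: int, size: int) -> Image:
    """Cut a size x size square whose top-left corner is at (top, left)."""
    return [row[left:left + size] for row in img[top:top + size]]


def mirror(img: Image) -> Image:
    """Flip an image horizontally."""
    return [row[::-1] for row in img]


def ten_crop(img: Image, size: int) -> List[Image]:
    """Four corner crops and the center crop, followed by their mirror images."""
    h, w = len(img), len(img[0])
    if size > h or size > w:
        raise ValueError("crop size is larger than the image")
    offsets = [
        (0, 0), (0, w - size), (h - size, 0), (h - size, w - size),
        ((h - size) // 2, (w - size) // 2),
    ]
    crops = [crop(img, top, left, size) for top, left in offsets]
    return crops + [mirror(c) for c in crops]


def multi_crop_predict(model: Callable[[Image], List[float]],
                       img: Image, size: int) -> List[float]:
    """Run the model on every crop and average the per-class outputs."""
    outputs = [model(c) for c in ten_crop(img, size)]
    n_classes = len(outputs[0])
    return [sum(out[k] for out in outputs) / len(outputs) for k in range(n_classes)]


def toy_model(img: Image) -> List[float]:
    """Hypothetical classifier: scores 'bright' vs 'dark' from mean pixel value."""
    pixels = [p for row in img for p in row]
    bright = sum(pixels) / len(pixels) / 15.0
    return [bright, 1.0 - bright]


image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
scores = multi_crop_predict(toy_model, image, size=2)
```

Unlike ensembling, this needs only one trained network, but it still multiplies the inference cost per image by the number of crops.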

Let’s take a look at two important applications of CNNs:

Object Detection and localization

Face Recognition/Detection

Dharti Dhami

Mom, Tech Enthusiast, Engineering lead @Youtube Music.