4. Introduction to Computer Vision with Deep Learning: Datasets and DL libraries

Inside AI
Deep-Learning-For-Computer-Vision
Jul 14, 2019 · 4 min read

Written by Nilesh Singh & Praveen Kumar.

Prerequisite: In case you missed the first session, please go through the previous session on kernels & channels.

Let’s begin…

Now that we have a faint idea of what a kernel and a channel are (trust me, that’s all the idea you are ever going to get), the next logical thing to do is to look at the other components that go into making a deep learning model.

Channels and kernels are conceptually and theoretically very important, but in practice we do not decide upon them until a much later stage. The first thing we need to do while building a DL model is to look at our data. Working in the computer vision domain, there’s a good chance you’ll spend 80% of your time manipulating the data, and maybe hurling random abuses every once in a while. Once the data is how you envisioned it to be (or, more practically, once you have run out of energy to spend on it), the next logical step is to choose the framework you are going to use to implement the model. Choosing a library is a relatively easier task, and in all probability you’ll stick with the one framework you are comfortable working with.

NOTE: We are starting a new Telegram group to tackle questions and queries of any sort. You can openly discuss concepts with other participants and get more insights; this will become more helpful as we move further along the publication. (Telegram is preferred over WhatsApp because of group member constraints.) [Follow this LINK to join]

So, let’s talk about data first.

1. Data

In the computer vision field, the data is, unsurprisingly, just visual data (a bunch of numbers), either images or videos. We’ll be focusing solely on images here.

Just for the sake of it, let’s try to define an image.

An image is basically just a matrix of pixels, with each pixel having a fixed color and intensity. Our computer perceives the image as a matrix of numbers, and we build our algorithms around this very fact.
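To see this for yourself, here is a minimal sketch (assuming Pillow and NumPy are installed; the file name is just a placeholder) that loads an image and prints the matrix of numbers the computer actually works with:

```python
# A minimal sketch of how a computer "sees" an image.
# Assumes Pillow and NumPy are installed; the file path below is hypothetical.
import numpy as np
from PIL import Image

img = Image.open("arc_reactor.jpg")   # hypothetical file path
pixels = np.array(img)                # convert the image to a matrix of numbers

print(pixels.shape)   # e.g. (height, width, 3) for an RGB image
print(pixels[0, 0])   # intensity values of the top-left pixel, e.g. [12 34 56]
```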

But images alone will not make much sense to our model.

Look at the following image:

Arc Reactor

We know that it is an arc reactor; let’s say your friend didn’t. What will you do? After an hour-long lecture on why they should watch all the MCU movies, you’ll most probably tell them that it is an arc reactor.

Now, next time you show an image of an arc reactor to your friend, they’ll instantly identify it.

Similarly, if we show the same image to our computer, then in order for it to identify any future arc reactors shown to it, someone has to tell it initially that it is looking at an arc reactor.

This mapping between images and their classes is of utmost importance, and together with the images it is called a dataset. The mapping can be stored in text files, JSON files, or a bunch of other formats.
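For instance, a tiny label mapping might look like the sketch below (the file names and classes are purely hypothetical, and storing the mapping as JSON is just one of many options):

```python
# A hypothetical image-to-class mapping, built as a Python dict for illustration.
# In practice it usually lives in a CSV/JSON/text file next to the images.
import json

labels = {
    "img_001.jpg": "arc_reactor",
    "img_002.jpg": "arc_reactor",
    "img_003.jpg": "not_arc_reactor",
}

# Save the mapping alongside the images so the model knows what it is looking at.
with open("labels.json", "w") as f:
    json.dump(labels, f, indent=2)
```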

The mapping can also contain other information, such as the exact location of an arc reactor or whether multiple reactors are present, but let’s defer that reaction to a future article.

Initially, we’ll be working with a very basic dataset called MNIST. It is a very simple dataset containing handwritten digits.

Here’s what the images in MNIST look like:

MNIST Dataset
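If you’d like to take a peek yourself, here is a minimal sketch (assuming Keras is installed; `mnist.load_data()` downloads the dataset on first use) that loads MNIST and inspects its shape:

```python
# A minimal sketch that loads MNIST through Keras and inspects it.
from keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

print(x_train.shape)  # (60000, 28, 28) -> 60,000 grayscale images of 28x28 pixels
print(y_train[:5])    # the class labels for the first five images, e.g. [5 0 4 1 9]
```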

2. Libraries

Once we have the data, we need to decide which library we are going to use for our model. There are a ton of libraries available, and the choice also depends on the language you are going to use. We’ll be sticking with Python for our learning here; below is a list of frameworks and libraries that support Python.

§ Caffe

§ Theano

§ TensorFlow

§ Keras

§ PyTorch

§ MXNet

Well, if you can already write models in Theano or TensorFlow, then you probably don’t need to be here. Caffe is not a native Python library but does provide Python bindings. It’s extremely fast and very powerful, but sadly not very beginner friendly.

We’ll be sticking with Keras in this series: it’s powerful, it’s intuitive, and above all, it’s very simple to code in. Keras works on top of TensorFlow or Theano, allowing you to wield their full power; it’s modular and it works natively with Python. PyTorch is also a great option and is becoming increasingly popular.
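Just to illustrate how simple Keras code is, here is a minimal sketch of a small fully connected model on MNIST (the layer sizes, optimizer, and epoch count are arbitrary choices for illustration; we’ll build proper convolutional models later in the series):

```python
# A minimal Keras sketch: a small fully connected network trained on MNIST.
# Layer sizes, optimizer, and epoch count are illustrative choices, not a recipe.
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Flatten

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0  # scale pixel values to [0, 1]

model = Sequential([
    Flatten(input_shape=(28, 28)),      # 28x28 image -> 784-long vector
    Dense(128, activation="relu"),      # one hidden layer
    Dense(10, activation="softmax"),    # one output per digit class
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.fit(x_train, y_train, epochs=3, validation_data=(x_test, y_test))
```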

You can read more about them here: LINK

We conclude this article here; hopefully, it was a quick read. We’ll learn about convolutions in the next article and get some hands-on practice.

Hope you enjoyed it. See you soon…
