Diving into AI Consciousness
We have all heard the terms neural networks, object detection and deep learning, and we see these networks transforming science and engineering as if by magic. Face detection, object detection, self-driving cars: these are all amazing applications of deep learning. But why do these networks work the way they do? (Read till the very end to find out!)
In this blog post we’ll dive a bit deeper into the “AI consciousness” to build some intuition about how these networks work and what they see. We’ll skip the intimidating math and develop an intuitive understanding of how they work instead.
But let’s refresh some basics first.
Here we have two simple neural network architectures: one-layer and two-layer. The way this works: you have a huge training set of inputs and outputs, you train the network, and it figures out the function that maps inputs to outputs.
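To make "mapping inputs to outputs" concrete, here is a minimal sketch of a two-layer network's forward pass in plain Python. The weights and inputs below are made-up numbers for illustration; a real network would learn its weights from that huge training set.

```python
import math

def sigmoid(x):
    # Squashes any number into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    # Each neuron: a weighted sum of its inputs plus a bias, passed
    # through the sigmoid. One output value per neuron.
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

def forward(x):
    # Hypothetical, hand-picked weights -- normally these are learned.
    hidden = layer(x, [[2.0, -1.0], [-1.5, 1.0]], [0.1, -0.2])  # layer 1
    output = layer(hidden, [[1.0, 1.0]], [0.0])                 # layer 2
    return output[0]

print(forward([0.5, 0.8]))  # a single number between 0 and 1
```

Training is just the process of nudging those weight numbers until `forward` produces the right outputs for the training inputs.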
Here’s a convolutional neural network (CNN), a special kind of neural network originally designed to deal with images. It comprises two parts: feature learning and classification/detection. In a CNN, we have filters instead of plain neurons, and the network learns features in images. It then uses these features to classify images or detect objects in them.
As a result, you can detect people, baseballs, horses, dogs and so on.
The decimal value indicates how confident the convolutional neural network is in its prediction, e.g., it is 99.3 percent confident that the object on the far left, inside the green bounding box, is a car.
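One common way such confidence values arise is a softmax: the network's raw class scores are converted into values between 0 and 1 that sum to 1. A minimal sketch, with made-up raw scores:

```python
import math

def softmax(scores):
    # Exponentiate each score, then normalize so the results sum to 1.
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores (logits) for one detected object.
logits = {"car": 8.2, "person": 3.1, "horse": 1.4}
probs = softmax(list(logits.values()))

for name, p in zip(logits, probs):
    print(f"{name}: {p:.1%}")  # car gets roughly 99.3%
```

The highest-scoring class ends up with almost all of the probability mass, which is exactly the kind of 99.3-percent figure shown next to the bounding box.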
But how do these networks learn to detect such complex structures? What do they see when they look at an image? What do they see at each layer in the network?
This is what we’ll try to build some intuition on today. In the initial layers, for example the first layer, this is what the network sees in the image:
It sees edges: slanted edges, horizontal, vertical… all sorts of edges!
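An edge filter is easy to sketch in code. Below, a 3×3 vertical-edge filter is slid (convolved) over a tiny made-up grayscale image whose left half is dark and right half is bright; the filter produces large values exactly where the edge sits:

```python
# Tiny grayscale image: dark (0) on the left, bright (9) on the right.
image = [
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
]

# Responds strongly where bright pixels sit to the right of dark ones.
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

def convolve(img, kernel):
    # Slide the kernel over every position and sum the element-wise products.
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(img) - kh + 1):
        row = []
        for j in range(len(img[0]) - kw + 1):
            row.append(sum(kernel[a][b] * img[i + a][j + b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out

for row in convolve(image, vertical_edge):
    print(row)  # large values mark the vertical edge
```

A CNN's first layer learns many small filters like this one, except that their values are learned from data rather than written by hand.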
And as we go deeper into the network, the layers see more and more complex structures. The first layer learns simple features such as edges; the next layer learns features made up of edges, like shapes; the layer after that learns features made up of shapes, e.g. smaller components of bigger objects; and so on, with each deeper layer learning ever more complex structures built from the features of the layers before it. The output of every layer acts as the input features for the next layer. And that is the key point to understand: what each layer learns are features composed of the features (outputs) of the previous layer. Eventually, in the last layer, we get to whole objects.
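This composition can be sketched with two layers of made-up filters: a first "layer" that produces a vertical-edge map and a horizontal-edge map, and a second "layer" that combines them into a corner detector, since a corner is a spot where a vertical and a horizontal edge meet. All the numbers here are illustrative, not learned:

```python
# Tiny image with a bright square in the bottom-right corner.
image = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
]

vertical_edge = [[-1, 0, 1]] * 3                # bright to the right of dark
horizontal_edge = [[-1] * 3, [0] * 3, [1] * 3]  # bright below dark

def convolve(img, kernel):
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(kernel[a][b] * img[i + a][j + b]
                 for a in range(kh) for b in range(kw))
             for j in range(len(img[0]) - kw + 1)]
            for i in range(len(img) - kh + 1)]

# "Layer 1": two edge feature maps.
v_map = convolve(image, vertical_edge)
h_map = convolve(image, horizontal_edge)

# "Layer 2": combine layer-1 outputs. A corner needs BOTH edges,
# so take the element-wise minimum of the two maps.
corner_map = [[min(v, h) for v, h in zip(vr, hr)]
              for vr, hr in zip(v_map, h_map)]
```

The corner map peaks at the square's top-left corner, where the two edge maps overlap: layer 2 never looks at raw pixels, only at layer 1's outputs, which is exactly the feature-on-feature stacking described above.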
To make the intuition clearer, here’s an example of how you might draw a bicycle.
So what do you do? You first draw simpler things, like circles, then combine them to make a tyre, then combine those with lines to make the fork, frame and pedals. You do the same for the handlebars and seat, and then you combine these components (handlebars, seat, etc.) to make the bicycle. So the bicycle is made up of smaller objects, each of which is made up of even smaller objects; eventually those are made up of shapes, and the shapes are made up of edges. This is how a neural network learns features too, using the features of each layer as building blocks for the more and more complex structures of the next.
Going from smaller to bigger, here is another example of what each layer sees in a five-layer network: edges, then some textures, then smaller parts/components of objects, and eventually objects and people.
In the upcoming blog posts we’ll dive a bit deeper into the ‘AI consciousness’ through Neural Style Transfer and Google Deep Dream. To make sure you don’t miss out, Subscribe!
For a comprehensive guide on CNNs, have a look at this amazing article by Sumit Saha: