Deep Learning and Sample Applications
Deep learning has lately become one of the most popular topics in artificial intelligence, and its popularity keeps growing every day. But what exactly is deep learning, and why did it become so popular so suddenly? Let’s discuss this.
To understand the whole story, we need to start from the beginning. Work on neural networks began in the 1950s, not long after AI research itself had begun. Neural networks seemed interesting because they simulated the human brain in a simple way. Briefly, a neural network is a set of virtual neurons, with initially arbitrary numerical values, known as “weights”, assigned to the connections between the neurons. These weights determine how each neuron responds. In these early networks, each neuron generally output either 0 or 1.
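To make this concrete, here is a minimal Python sketch of a single early-style neuron, assuming a simple step rule: it multiplies each input by its weight, adds a bias, and outputs 1 if the total is positive and 0 otherwise. The weights and bias are hypothetical values picked by hand so that the neuron behaves like a logical AND.

```python
# Sketch of an early-style neuron with a step output (0 or 1).
# The weights and bias used below are hypothetical, chosen by hand.

def step_neuron(inputs, weights, bias):
    """Return 1 if the weighted sum of the inputs plus the bias is positive, else 0."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if total > 0 else 0

# Hypothetical hand-picked values that make the neuron act like a logical AND.
weights = [1.0, 1.0]
bias = -1.5

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", step_neuron([a, b], weights, bias))
```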
A network can be trained for tasks such as image recognition or speech recognition. When the network fails to recognize an input, the training algorithm adjusts the weights so that it recognizes better next time. The ultimate goal is for the network to recognize meaningful patterns consistently.
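The weight adjustment described above can be sketched with the classic perceptron learning rule, which is one simple example of such an algorithm, not the only one. In this sketch, whenever the neuron gives the wrong answer, each weight is nudged in the direction that would have reduced the error. The training data is a hypothetical toy task: learning the logical AND of two inputs.

```python
# Sketch of the perceptron learning rule on a hypothetical toy task (logical AND).

def predict(x, w, b):
    """Step neuron: 1 if the weighted sum plus bias is positive, else 0."""
    return 1 if sum(xi * wi for xi, wi in zip(x, w)) + b > 0 else 0

def train_perceptron(samples, epochs=10, lr=1):
    w = [0] * len(samples[0][0])  # start with all weights at zero
    b = 0
    for _ in range(epochs):
        mistakes = 0
        for x, target in samples:
            error = target - predict(x, w, b)  # 0 if correct, +1 or -1 if wrong
            if error != 0:
                mistakes += 1
                # Move each weight in the direction that reduces the error.
                w = [wi + lr * error * xi for wi, xi in zip(w, x)]
                b += lr * error
        if mistakes == 0:  # every sample classified correctly: stop early
            break
    return w, b

and_samples = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train_perceptron(and_samples)
print([predict(x, w, b) for x, _ in and_samples])  # [0, 0, 0, 1]
```

A learning rate of 1 keeps the arithmetic in whole numbers; on this data the loop stops after six passes, once every sample is classified correctly.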
However, back in the 1970s, neural networks with only a small number of neurons were not very successful at recognizing complex patterns.
Later, in the mid-1980s, the idea of networks with multiple layers came up. See, for example, the article Geoffrey Hinton published in 1985. Models with many layers of neurons, so-called “deep” models, started to make an impact, but the hardware of the time lacked the computing power they needed.
In 2006, Hinton developed a better and more efficient way to train individual layers of neurons. The first layer learns very basic features, such as an edge in an image. Once the first layer recognizes these basic features, they are fed to the next layer, which learns more complex features. The procedure continues layer by layer until the system can reliably recognize objects.
A deep neural network (DNN) is an artificial neural network (ANN) with multiple layers between the input and output layers. The DNN finds the correct mathematical manipulation to turn the input into the output, whether it be a linear relationship or a non-linear relationship.
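Why do the extra layers matter? A single step neuron can only separate its inputs with a straight line, so it cannot compute a non-linear function such as XOR (output 1 when exactly one input is 1). The sketch below, with hand-picked hypothetical weights rather than learned ones, shows that one hidden layer is enough: one hidden neuron acts as OR, the other as AND, and the output neuron fires for "OR but not AND".

```python
# Sketch: a two-layer network of step neurons computing XOR.
# All weights and biases are hypothetical, hand-picked values (not learned).

def step(z):
    return 1 if z > 0 else 0

def dense_layer(inputs, weights, biases):
    """One layer of step neurons; each row of weights holds one neuron's incoming weights."""
    return [step(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def xor_network(x):
    # Hidden layer: the first neuron acts as OR, the second as AND.
    hidden = dense_layer(x, [[1, 1], [1, 1]], [-0.5, -1.5])
    # Output layer: fires for "OR but not AND", which is XOR.
    return dense_layer(hidden, [[1, -1]], [-0.5])[0]

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, "->", xor_network(x))
```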
Convolutional Neural Networks
Convolutional neural networks, also called ConvNets or CNNs, are deep networks that are commonly applied to image classification. ConvNets are very similar to the regular neural networks described above: they are made up of neurons that have learnable weights. Each neuron receives some inputs, performs a dot product and optionally follows it with a non-linearity.
So what changes? ConvNet architectures make the explicit assumption that the inputs are images, which allows us to encode certain properties into the architecture. These properties make the forward function more efficient to implement and vastly reduce the number of parameters in the network.
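To see how much the image assumption saves, here is a small parameter-count sketch. It assumes a 32x32 color image (32x32x3 values), a fully connected layer of 100 neurons, and a convolutional layer of 100 filters, each 5x5 pixels across all 3 color channels; these sizes are illustrative, not taken from a particular network. Each count includes one bias per neuron or filter.

```python
# Sketch: parameter counts (weights + biases) for illustrative layer sizes.

def dense_params(height, width, channels, num_neurons):
    """Fully connected layer: every neuron is connected to every input value."""
    inputs = height * width * channels
    return inputs * num_neurons + num_neurons

def conv_params(kernel_size, channels, num_filters):
    """Convolutional layer: each filter covers only a small kernel_size x kernel_size patch."""
    return kernel_size * kernel_size * channels * num_filters + num_filters

print(dense_params(32, 32, 3, 100))    # 307300
print(conv_params(5, 3, 100))          # 7600
print(dense_params(200, 200, 3, 100))  # 12000100
```

A fully connected neuron needs one weight per input value, so its cost grows with the image size. A filter is reused at every position of the image, so the convolutional layer's count does not depend on the image size at all.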
The main idea behind image classification is to take an image and represent it as a 2D matrix of its pixel values. For example, a handwritten number 8 might look like this 2D matrix.
In a CNN, instead of taking each pixel in raw form as in the picture below, we can examine the image region by region, for example the upper-right corner of that 8. This allows the computer to understand features such as curves or edges.
Looking at the picture, you can see how a ConvNet detects, for example, a car in an image. Notice that each layer above has different filters, responding to things such as the upper part of a car, its body or its wheels.
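Examining an image region by region is what a convolution does: a small filter (kernel) slides over the image, and at each position we compute a weighted sum of the pixels underneath it. The Python sketch below assumes a tiny hypothetical grayscale image whose left part is dark (0) and right part is bright (1), and a hand-made 3x3 filter that responds to vertical edges. Like most deep learning libraries, it does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
# Sketch of a 2D "valid" convolution (no padding, stride 1) in plain Python.

def conv2d_valid(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the kh x kw region whose top-left corner is (i, j).
            total = 0
            for u in range(kh):
                for v in range(kw):
                    total += image[i + u][j + v] * kernel[u][v]
            row.append(total)
        output.append(row)
    return output

# Hypothetical 4x5 image: two dark columns on the left, three bright columns on the right.
image = [[0, 0, 1, 1, 1] for _ in range(4)]

# Hand-made filter: responds when the right side of a region is brighter than the left.
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

print(conv2d_valid(image, vertical_edge))  # [[3, 3, 0], [3, 3, 0]]
```

The output is large where the window straddles the dark-to-bright boundary and 0 where the window sees only uniform pixels. In a real CNN the filter values are learned rather than hand-made.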
Today ConvNets are widely used in real-life applications. Let’s take a look at some examples.
Some Applications of ConvNets
1- Self-Driving Cars
ConvNets play a crucial role in self-driving cars. While the car is moving, CNNs perform real-time detection of objects such as other cars, pedestrians, traffic signs and more. As a result, CNNs are widely used in the self-driving car industry.
2- Face Recognition
In face recognition, ConvNets handle several problems: identifying and locating the faces in a picture, and extracting unique features from each face. The system then compares those features against an existing database of features to find matches.
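The matching step can be sketched as a nearest-neighbor search. In this sketch, assumed for illustration, each face has already been turned into a short feature vector (an "embedding") by a CNN; the vectors, names and the 0.8 similarity threshold are all hypothetical. The query is compared to every stored vector with cosine similarity, and the closest one counts as a match only if it is similar enough.

```python
import math

def cosine_similarity(a, b):
    """1.0 for vectors pointing the same way, lower for less similar ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def find_match(query, database, threshold=0.8):
    """Return the name of the most similar stored face, or None if nothing is close enough."""
    best_name, best_score = None, -1.0
    for name, embedding in database.items():
        score = cosine_similarity(query, embedding)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None

# Hypothetical database of face embeddings (real ones are much longer).
database = {
    "alice": [0.9, 0.1, 0.3],
    "bob": [0.1, 0.8, 0.5],
}

print(find_match([0.85, 0.15, 0.35], database))  # alice
print(find_match([-0.5, -0.5, 0.1], database))   # None: no stored face is similar enough
```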
3- Scene Labeling
According to the paper “Learning Hierarchical Features for Scene Labeling” by Clement Farabet, Camille Couprie, Laurent Najman and Yann LeCun:
Scene labeling, also known as scene parsing, consists in labeling every pixel in the image with the category of the object it belongs to. After a perfect scene parsing, every region and every object is delineated and tagged.
4- Image Classification
In image classification, we define a set of target classes (objects to identify in images) and train a model to recognize them using labeled sample photos.
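Once a trained model has produced one score per class, turning those scores into a prediction is simple. Here is a sketch, assuming three hypothetical classes and made-up scores: the softmax function converts the scores into probabilities that sum to 1, and the class with the highest probability is the prediction.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical classes and scores, as a trained classifier might output for one photo.
classes = ["cat", "dog", "car"]
scores = [2.0, 1.0, 0.1]

probs = softmax(scores)
predicted = classes[probs.index(max(probs))]
print(predicted)  # cat
```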
5- Action Recognition
From the paper “Review of Action Recognition and Detection Methods” by Soo Min Kang and Richard P. Wildes:
Action recognition refers to the act of classifying an action that is present in a given video and action detection involves locating actions of interest in space and/or time.
6- Human Pose Estimation
Human pose estimation is a computer vision technique that detects human poses in photos or videos. According to the paper “PoseTrack: A Benchmark for Human Pose Estimation and Tracking” by Mykhaylo Andriluka, Umar Iqbal, Eldar Insafutdinov, Leonid Pishchulin, Anton Milan, Juergen Gall and Bernt Schiele:
Human poses and motions are important cues for analysis of videos with people and there is strong evidence that representations based on body pose are highly effective for a variety of tasks such as activity recognition, content retrieval and social signal processing.
7- Document Analysis
Document analysis is the process of extracting information from images of documents. The analysis of the handwritten number “8” above is one example. The technique is also used to detect many properties of documents, such as language, context and category.
Recurrent Neural Networks
Recurrent neural networks, also known as RNNs, are very useful when your data is sequential. A sequence here can be, for example, a time series or the text of a conversation between two people. But why are RNNs better for this kind of data? What is the difference?
To understand the difference, let’s look at an example: sentence generation. Say we want to create an application that generates sentences. To generate a sentence, we need to output multiple words that are correlated with each other. CNNs generally perform poorly on this kind of task, and that is where RNNs shine. RNNs can use their internal memory to process sequences of inputs, so they can use the words they have already seen to generate new words that fit with them.
The diagram above shows that an RNN uses the ‘old state’ from its internal memory to generate a ‘new state’. Because each new state is computed from the previous one, new states are not completely independent of old states.
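The state update in the diagram can be written as new_state = tanh(W_x · input + W_h · old_state + b). The sketch below implements this vanilla RNN step in plain Python; the weights, the two-unit state size and the one-number inputs are hypothetical and untrained. Feeding the same numbers in a different order gives a different final state, which shows that the network remembers what came before.

```python
import math

def rnn_step(x, h_prev, W_x, W_h, b):
    """One vanilla RNN step: h_new = tanh(W_x @ x + W_h @ h_prev + b)."""
    h_new = []
    for i in range(len(b)):
        z = b[i]
        z += sum(W_x[i][j] * x[j] for j in range(len(x)))            # contribution of the input
        z += sum(W_h[i][k] * h_prev[k] for k in range(len(h_prev)))  # contribution of the old state
        h_new.append(math.tanh(z))
    return h_new

def run_rnn(sequence, W_x, W_h, b):
    """Feed a sequence through the RNN, starting from an all-zero state; return every state."""
    h = [0.0] * len(b)
    states = []
    for x in sequence:
        h = rnn_step(x, h, W_x, W_h, b)
        states.append(h)
    return states

# Hypothetical weights: 1 input feature, 2 hidden units.
W_x = [[0.5], [-0.3]]
W_h = [[0.8, 0.0], [0.2, 0.5]]
b = [0.0, 0.1]

# Same inputs, different order: the final states differ because the RNN remembers.
print(run_rnn([[1.0], [0.0]], W_x, W_h, b)[-1])
print(run_rnn([[0.0], [1.0]], W_x, W_h, b)[-1])
```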
Long Short Term Memory (LSTM) Neural Networks
LSTM networks are a special kind of recurrent neural network that generally works much better than the standard version. A common LSTM unit consists of a cell, an input gate, an output gate and a forget gate. The cell remembers values over arbitrary time intervals, and the three gates regulate the flow of information into and out of the cell.
Some Applications of Recurrent Neural Networks
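Here is a minimal sketch of one step of a single-unit LSTM, assuming the common formulation with sigmoid gates and a tanh candidate value; the weights are hypothetical and hand-picked. The forget gate scales the old cell value, the input gate scales the new candidate value, and the output gate controls how much of the cell is exposed as the new hidden state.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, params):
    """One step of a single-unit LSTM.

    params maps each of "forget", "input", "output" and "candidate"
    to a (w_x, w_h, bias) triple. All values used below are hypothetical.
    """
    def pre_activation(gate):
        w_x, w_h, bias = params[gate]
        return w_x * x + w_h * h_prev + bias

    f = sigmoid(pre_activation("forget"))       # share of the old cell value to keep
    i = sigmoid(pre_activation("input"))        # share of the candidate to write
    o = sigmoid(pre_activation("output"))       # share of the cell to expose
    g = math.tanh(pre_activation("candidate"))  # candidate value to write
    c = f * c_prev + i * g                      # new cell value
    h = o * math.tanh(c)                        # new hidden state (output)
    return h, c

def run(forget_bias, steps=3):
    """Start with cell value 1.0, feed zeros, and return the final cell value."""
    params = {
        "forget": (0.0, 0.0, forget_bias),
        "input": (0.0, 0.0, -5.0),  # input gate mostly closed
        "output": (0.0, 0.0, 0.0),
        "candidate": (1.0, 0.0, 0.0),
    }
    h, c = 0.0, 1.0
    for _ in range(steps):
        h, c = lstm_step(0.0, h, c, params)
    return c

print(run(forget_bias=5.0))   # stays close to 1.0: the cell remembers
print(run(forget_bias=-5.0))  # drops to almost 0: the cell forgets
```

With a large forget-gate bias the cell keeps its value over several steps; with a strongly negative one it forgets almost immediately. In a trained LSTM these gate weights are learned.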
1- Language Modeling
According to Wikipedia:
2- Translation
This is one of the most popular applications of RNNs; examples include Google Translate and many other translators. This type of application generally consists of two recurrent neural networks: one, called the “encoder”, processes the input text, and the other, called the “decoder”, generates the output from the first network’s result.
3- Speech Recognition
Here you get an audio clip as input, and the goal is to generate a text version of that clip. Today speech recognition is used in in-car systems, medical documentation, the military, telephony, tools for people with disabilities and many other domains.
Deep Reinforcement Learning
These algorithms learn from rewards. For example, when the algorithm makes a wrong move in a game, you penalize it, and when it makes the right move, you give it a reward. Reinforcement learning algorithms can become so good that they reach a superhuman level; for example, they can beat world champions at the game of Go.
Reinforcement Learning Definitions
Agent: The agent is the one that takes actions, for example the player in a game of Go.
Action: A move the agent makes. The agent generally chooses it from a list of possible actions.
Environment: The world the agent lives in. The environment takes the agent’s current state and action as input and outputs the agent’s next state and reward.
State: A state is the situation of an agent.
Reward: A reward is the feedback from the environment to the agent.
Policy: The policy is the strategy of the agent. The agent uses the policy to decide its next action based on its current state.
Discount Factor: A reward received t steps in the future is multiplied by the discount factor raised to the power t. The discount factor is usually smaller than 1, so distant rewards count for less than immediate ones.
Value: The value is the expected long-term discounted reward.
Trajectory: A sequence of states and the actions that influence those states.
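Several of these definitions fit together in a few lines of Python. The sketch below assumes a hypothetical environment (a corridor of five positions, 0 to 4, with the goal at position 4), a fixed policy that always moves right, and a discount factor of 0.9. Running one episode produces a trajectory, and the discounted return adds up each reward multiplied by the discount factor raised to its time step (0 for the first reward).

```python
def env_step(state, action):
    """Hypothetical environment: a 1-D corridor with positions 0..4 and the goal at 4.

    Takes the current state and action; returns (next_state, reward, done).
    """
    next_state = max(0, min(4, state + (1 if action == "right" else -1)))
    reward = 1.0 if next_state == 4 else 0.0
    return next_state, reward, next_state == 4

def policy(state):
    """A fixed, hand-written strategy: always move right."""
    return "right"

def discounted_return(rewards, gamma):
    """Sum of gamma**t * reward_t over the episode."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

state, done = 0, False
trajectory, rewards = [], []
while not done:
    action = policy(state)
    next_state, reward, done = env_step(state, action)
    trajectory.append((state, action))  # the state and the action taken in it
    rewards.append(reward)
    state = next_state

print(trajectory)  # [(0, 'right'), (1, 'right'), (2, 'right'), (3, 'right')]
print(discounted_return(rewards, gamma=0.9))
```

Only the final step earns a reward, so the return is 0.9 to the power 3, about 0.729.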
Some Applications of Deep Reinforcement Learning
1- Robotics and Industrial Automation
RL is used for high-dimensional control problems, and it is widely used in industrial robotics to build products.
2- Games
Reinforcement learning shines when it comes to game-playing agents. You might have heard that OpenAI bots recently played Dota 2 against human players. You can watch those games at this link. But this was not the beginning, of course. Let’s go back a little.
Below is the abstract of the paper ‘Playing Atari with Deep Reinforcement Learning’ by Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra and Martin Riedmiller from DeepMind Technologies:
We present the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning. The model is a convolutional neural network, trained with a variant of Q-learning, whose input is raw pixels and whose output is a value function estimating future rewards. We apply our method to seven Atari 2600 games from the Arcade Learning Environment, with no adjustment of the architecture or learning algorithm. We find that it outperforms all previous approaches on six of the games and surpasses a human expert on three of them.
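The paper’s method trains a convolutional network with a variant of Q-learning. As a much smaller illustration of the underlying idea, here is a sketch of plain tabular Q-learning, which uses a table of values instead of a neural network and is therefore not the paper’s method. It assumes a hypothetical five-position corridor with a reward of 1 at the goal, and illustrative settings for the learning rate (alpha), discount factor (gamma) and exploration rate (epsilon). After each move, the value of the chosen action is pulled toward the reward plus the discounted best value of the next state.

```python
import random

N_STATES = 5  # hypothetical corridor: positions 0..4, goal at position 4
ACTIONS = ("left", "right")

def env_step(state, action):
    """Returns (next_state, reward, done); reward 1.0 only when the goal is reached."""
    next_state = max(0, min(N_STATES - 1, state + (1 if action == "right" else -1)))
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward, next_state == N_STATES - 1

def greedy(Q, state, rng):
    """Pick the action with the highest Q-value, breaking ties randomly."""
    best = max(Q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if Q[(state, a)] == best])

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done, steps = 0, False, 0
        while not done and steps < 100:
            # Epsilon-greedy: explore with probability epsilon, otherwise exploit.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = greedy(Q, state, rng)
            next_state, reward, done = env_step(state, action)
            # Q-learning target: reward plus the discounted best value of the next state
            # (no future value once the goal is reached).
            if done:
                target = reward
            else:
                target = reward + gamma * max(Q[(next_state, a)] for a in ACTIONS)
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            state, steps = next_state, steps + 1
    return Q

Q = q_learning()
print([greedy(Q, s, random.Random(1)) for s in range(N_STATES - 1)])
```

After training, acting greedily on the learned values moves right from every position, which is the shortest path to the reward.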
References
1- https://www.technologyreview.com/s/513696/deep-learning/
2- https://en.wikipedia.org/wiki/Deep_learning
3- http://cs231n.github.io/convolutional-networks/
4- Farabet, Clement, et al. “Learning Hierarchical Features for Scene Labeling.” IEEE Transactions on Pattern Analysis and Machine Intelligence 35.8 (2013): 1915–1929. http://yann.lecun.com/exdb/publis/pdf/farabet-pami-13.pdf
5- Kang, Soo Min, and Richard P. Wildes. “Review of Action Recognition and Detection Methods.” arXiv:1610.06906v2 [cs.CV], 1 Nov 2016. https://arxiv.org/pdf/1610.06906.pdf
6- Mykhaylo Andriluka, Umar Iqbal, Eldar Insafutdinov, Leonid Pishchulin, Anton Milan, Juergen Gall, Bernt Schiele “PoseTrack: A Benchmark for Human Pose Estimation and Tracking” arXiv:1710.10000
7- https://skymind.ai/wiki/deep-reinforcement-learning
8- https://www.oreilly.com/ideas/practical-applications-of-reinforcement-learning-in-industry
9- Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. Playing atari with deep reinforcement learning. In NIPS Deep Learning Workshop. 2013. https://arxiv.org/pdf/1312.5602v1.pdf