Deep Learning

Brief introduction and Recent developments

Dashanka Nadeeshan
AI Geeks
9 min read · Nov 29, 2017


Introduction & Supervised learning

Throughout this article, I will discuss what deep learning is within machine learning, the main deep learning techniques, the major types of deep networks (such as convolutional neural networks), and their applications and recent achievements.

The basic deep learning paradigm

Machine learning techniques have become a popular tool in modern society. They power web-based applications such as search engines, e-commerce, social media, government and administration, health and welfare, medicine and genomics, consumer electronics such as smartphones, and industrial automation, and they are used in computer vision, pattern and object recognition, probabilistic estimation, decision making, speech recognition, text generation, natural language processing and customer-product optimization.

Conventional machine learning techniques are limited in their ability to process data in raw form (image pixels, for example); they require expert knowledge to design feature extractors that transform the raw data into representations at a more abstract level. A basic deep learning architecture is instead constructed as a stack of learnable modules, each of which transforms the representation at one level into a representation at a slightly more abstract level; stacking enough such transformations allows very complex functions to be learned. Crucially, these layers are not hand-engineered: they learn to improve their own representations from training data using a general-purpose learning procedure such as backpropagation. Deep learning methods have now reached a level where they can discover intricate structure in high-dimensional data, and they are being applied successfully across science, commerce, consumer products and many other fields.

Supervised learning is the most common and popular form of machine learning. The idea is to train a neural network on data whose desired outputs are already known: after feeding a training example to the network, we compute the error between the estimated output and the desired output and modify the internal parameters (often called weights) so as to reduce that error. These adjustable parameters can be thought of as a set of knobs that define the input-output function of the machine, and learning is the process of tuning them. Early systems were simple binary linear classifiers, but modern tasks demand classification into many classes, learning of complex features that are insensitive to irrelevant variations, and good generalization. This calls for good feature extractors and powerful training procedures.
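
To make the "knobs" concrete, here is a minimal sketch (in NumPy, on made-up toy data) of supervised learning with gradient descent: a linear classifier whose weights are nudged, step by step, in the direction that reduces the error between estimated and desired outputs. All sizes and constants here are arbitrary choices for the example.

```python
import numpy as np

# Toy supervised learning: fit a linear classifier with gradient descent.
# The "knobs" are the weights w and bias b; the error tells us how to turn them.

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                  # 200 training examples, 2 features
y = (X[:, 0] + X[:, 1] > 0).astype(float)      # desired outputs (labels)

w = np.zeros(2)
b = 0.0
lr = 0.1                                        # learning rate

for step in range(500):
    z = X @ w + b
    p = 1.0 / (1.0 + np.exp(-z))                # estimated outputs (probabilities)
    error = p - y                               # difference from desired outputs
    # Gradients of the cross-entropy loss with respect to the parameters
    grad_w = X.T @ error / len(y)
    grad_b = error.mean()
    w -= lr * grad_w                            # turn the knobs to reduce the error
    b -= lr * grad_b

print(f"training accuracy: {((p > 0.5) == y).mean():.2f}")
```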

Backpropagation to train multilayer architectures

Difference between a shallow neural network and a deep neural network

The goal of replacing hand-engineered features with trainable multilayer networks was achieved with the use of stochastic gradient descent to train multilayer architectures. To compute the gradients of the error with respect to the inputs and weights, a fast and effective algorithm called backpropagation was introduced. Backpropagation works far faster than earlier approaches to learning, and it has become the workhorse of neural network training today.

In the basic learning procedure, data is transformed from the input layer to the output layer: each unit computes a weighted sum of its inputs from the previous layer and passes the result through a non-linear function to the next layer; the most commonly used function today is the rectified linear unit (ReLU). Backpropagation is a method for calculating the gradient of the cost function with respect to every weight in the network, where the cost function measures the error between the estimated and actual outputs. The goal is to minimize the cost (that is, the error) by tweaking the weights (and biases), thereby optimizing the network; this is the gradient descent algorithm. Although backpropagation is fast and good at learning, gradient descent has drawbacks: it can converge to a local minimum, where no small change of the weights reduces the average error, or end up near saddle points of the cost landscape where the gradient becomes zero.
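
As a rough illustration of these ideas, the following sketch trains a tiny two-layer network on random toy data: the forward pass computes weighted sums and a ReLU, and backpropagation applies the chain rule layer by layer to obtain the gradient of the cost with respect to each weight. The layer sizes and data are arbitrary choices for the example.

```python
import numpy as np

# Forward pass: weighted sums with a ReLU non-linearity in between,
# then backpropagation of the squared-error cost through both layers.

rng = np.random.default_rng(1)
X = rng.normal(size=(64, 3))                    # batch of toy inputs
y = rng.normal(size=(64, 1))                    # toy targets

W1 = rng.normal(scale=0.1, size=(3, 8))
W2 = rng.normal(scale=0.1, size=(8, 1))
lr = 0.01

for step in range(200):
    # Forward: weighted sum -> ReLU -> weighted sum
    h_pre = X @ W1
    h = np.maximum(0.0, h_pre)                  # ReLU activation
    y_hat = h @ W2
    cost = ((y_hat - y) ** 2).mean()            # error between estimated and actual

    # Backward: chain rule, one layer at a time
    d_y_hat = 2.0 * (y_hat - y) / len(y)
    dW2 = h.T @ d_y_hat
    d_h = d_y_hat @ W2.T
    d_h_pre = d_h * (h_pre > 0)                 # ReLU gradient: 1 where input > 0
    dW1 = X.T @ d_h_pre

    W1 -= lr * dW1                              # gradient descent step
    W2 -= lr * dW2
```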

Around 2006, researchers brought together by CIFAR introduced an unsupervised learning procedure that could create layers of feature detectors without requiring labelled training data. Each layer of feature detectors learns to model or reconstruct the activity of the layer below, which makes it possible to train very deep networks by pre-training one hidden layer at a time using the unsupervised learning procedure for restricted Boltzmann machines. These methods were applied successfully to recognizing handwritten digits and to speech recognition, helped by the fast computational power of GPUs. The speech recognition tasks started with simple, small vocabularies and progressed to large-vocabulary tasks, and the resulting systems were deployed on the Android smartphone platform for general use. For small datasets in particular, such unsupervised pre-training leads to better generalization.

Convolutional Neural Networks and Image Understanding

Basic Convolutional Neural Network architecture, source: Internet

Convolutional neural networks (ConvNets) are a type of artificial neural network designed to process data that comes in the form of multiple arrays, and they perform recognition or classification tasks on such data: images (mostly), audio spectrograms and video. In this article, the ConvNet architecture is briefly explained in terms of its structure and the layers it is composed of. A ConvNet is basically composed of four types of layers: convolutional, pooling, ReLU (rectified linear unit) and fully connected layers. The most important layers, and the ones discussed here, are the convolutional and pooling layers. Through the alternately stacked initial layers and the final fully connected layers, the network transforms the original input (an image, for example) layer by layer, from the raw data values to the final class scores.

A convolutional layer is organized as a set of feature maps. Each feature map is produced by a filter that slides over the input array (an array of image pixels, for example) or the input volume from the previous layer, and responds when it detects a particular feature. In an image, for example, a filter will activate when it detects a visual feature such as an edge in some orientation or a blotch of some colour. Within a feature map, each local connection to the input volume is made through a shared set of weights, which are tweaked to optimize the network during learning. These learned weights are used to compute a weighted sum of the inputs, which is fed to the next layer through a non-linear activation function.

How convolution layer works. source: CS231n Stanford University
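
To see what "sliding a filter" means in code, here is a naive sketch of a single convolution producing one feature map. The 3x3 vertical-edge filter is a hand-picked illustrative example; in a real ConvNet the filter weights are learned.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide a filter over a 2-D input and record its response at each
    position, producing one feature map ("valid" convolution, no padding)."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kH, j:j + kW]   # local patch of the input
            out[i, j] = np.sum(patch * kernel)  # weighted sum = filter response
    return out

image = np.random.rand(8, 8)
edge_filter = np.array([[1., 0., -1.],          # responds to vertical edges
                        [1., 0., -1.],
                        [1., 0., -1.]])
feature_map = np.maximum(0.0, conv2d(image, edge_filter))  # ReLU activation
print(feature_map.shape)  # (6, 6)
```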

The role of the pooling layer is to shrink a high-dimensional feature representation while preserving its most important information. Typically, a pooling unit computes the maximum of a local patch of units, thereby reducing the spatial dimensions of the output that becomes the input volume of the next layer (usually another convolutional layer). This reduces the number of parameters and helps keep the computational load manageable.
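
A minimal sketch of 2x2 max pooling on a single feature map shows how the strongest response in each local patch is kept while the spatial size is halved:

```python
import numpy as np

def max_pool(feature_map, size=2):
    """Max pooling: keep the strongest response in each local patch,
    shrinking each spatial dimension by the pooling size."""
    H, W = feature_map.shape
    H, W = H - H % size, W - W % size            # drop any ragged border
    pooled = feature_map[:H, :W].reshape(H // size, size, W // size, size)
    return pooled.max(axis=(1, 3))               # maximum over each patch

fm = np.arange(36, dtype=float).reshape(6, 6)
print(max_pool(fm).shape)  # (3, 3): 4x fewer values to process
```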

Generic Convolutional Neural Network architecture, source: Internet

Training such a network is done with the backpropagation algorithm, which adjusts the weights in the filter banks of the convolutional layers. ConvNets perform classification by composing lower-level features into higher-level features through the stack of layers. In an image, for example, the first layers respond to edges at particular orientations at the pixel level, the next layers to local combinations of edges that form patterns, and later layers to more complicated motifs, up to high-level representations such as human faces. ConvNet applications date back to the 1990s, with systems for handwriting recognition and for face detection and recognition.

Since the spectacular ConvNet results at ImageNet 2012, ConvNets have approached human performance at some tasks. One of the breakthroughs is accurate image captioning and description, achieved by combining convolutional neural networks with recurrent neural networks. This success was made possible by increased computational power (GPUs in particular), regularization techniques such as dropout, and new methods for generating more training examples from existing ones (distortions and changes of orientation and position, for example). Major tech companies such as Google, Facebook and Microsoft, as well as a number of startups, have launched research and development efforts based on the outstanding performance of ConvNets. The development of specialized hardware (so-called ConvNet chips) also enables ConvNets to perform well in real-time vision and understanding applications on smartphones, robots, cameras and autonomous vehicles.
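
As an aside, the dropout idea mentioned above is simple to sketch. The version below is the common "inverted dropout" formulation, written from the general idea rather than any particular library: during training, a random fraction of activations is zeroed so the network cannot rely on any single feature.

```python
import numpy as np

def dropout(activations, p_drop=0.5, training=True):
    """Inverted dropout: randomly zero activations during training and
    rescale the survivors so the expected value stays the same; at test
    time the layer is used unchanged."""
    if not training:
        return activations
    mask = np.random.rand(*activations.shape) > p_drop
    return activations * mask / (1.0 - p_drop)

h = np.random.rand(4, 8)                         # hidden-layer activations
print(dropout(h, p_drop=0.5))                    # about half are zeroed
```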

Recurrent Neural Networks and Language processing

The idea of Natural language processing, source: Internet

Consider language processing and understanding with deep learning. Predicting the next word in a sequence from the preceding context of words can be performed by a trained multilayer network: the first layer creates word vectors from the context words, and the following layers convert this input into an output that gives, for every word in the vocabulary, the probability of it appearing next. The network learns word vectors that contain many active components, each of which can be interpreted as a separate feature of the word.
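
The following sketch shows the shape of such a next-word predictor, using an invented five-word vocabulary and untrained random parameters; a real model would learn the embedding matrix and weights from large amounts of text.

```python
import numpy as np

# Context words -> word vectors -> hidden layer -> softmax probability
# for every word in the vocabulary.

vocab = ["the", "cat", "sat", "on", "mat"]
V, D, H = len(vocab), 16, 32                     # vocab size, vector dim, hidden dim

rng = np.random.default_rng(2)
E = rng.normal(scale=0.1, size=(V, D))           # word vectors (embeddings)
W1 = rng.normal(scale=0.1, size=(2 * D, H))      # 2-word context -> hidden layer
W2 = rng.normal(scale=0.1, size=(H, V))          # hidden -> score per vocab word

context = [vocab.index("the"), vocab.index("cat")]
x = E[context].reshape(-1)                       # concatenate the word vectors
h = np.maximum(0.0, x @ W1)
scores = h @ W2
probs = np.exp(scores) / np.exp(scores).sum()    # softmax over the vocabulary

for word, p in zip(vocab, probs):
    print(f"P(next = {word!r}) = {p:.3f}")
```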

Recurrent neural networks (RNNs) are another major type of neural network, and they are particularly good at processing sequential data. An RNN processes an input sequence one element at a time, while loops in its hidden units allow information to be stored and carried across units as the input is read. Because each unit can use its internal memory to maintain information about previous inputs, the network keeps a kind of index of the history of all the past elements of the sequence, much like a human reading a sentence. Advances in architectures and training methods have made RNNs very good at predicting the next character in a text, and at more complex tasks such as language translation and understanding: an encoder network is trained to read the words of a sentence in one language, one at a time, and develop a representation that expresses its meaning, and a jointly trained decoder network then produces the words of a sentence in another language. Google Translate is a popular application of this approach.

Unfolded illustration of a neuron from a recurrent neural network, source: Internet
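
The core recurrence is only a few lines. Here is a minimal sketch of a plain (vanilla) RNN reading a random toy sequence one element at a time, with the same recurrent weights reused at every step:

```python
import numpy as np

# One hidden state is carried across time steps: each new input is
# combined with the memory of everything read so far.

rng = np.random.default_rng(3)
D, H = 8, 16                                     # input and hidden sizes
W_xh = rng.normal(scale=0.1, size=(D, H))
W_hh = rng.normal(scale=0.1, size=(H, H))        # recurrent (shared) weights
b = np.zeros(H)

sequence = rng.normal(size=(5, D))               # 5 time steps of toy input
h = np.zeros(H)                                  # empty memory at the start

for x_t in sequence:                             # process one element at a time
    h = np.tanh(x_t @ W_xh + h @ W_hh + b)       # new state mixes input and memory

print(h.shape)  # (16,): a summary of the whole sequence
```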

Beyond translating one language into another, the same idea can translate the meaning or description of an image into readable language, a task known as image captioning. In this application, a convolutional neural network acts as the encoder network that reads and understands the image, and a recurrent neural network acts as the decoder network that performs the language modelling and generation.

In a recurrent neural network, all the time steps share the same weights, which becomes apparent when a simple network is unfolded. Although the main purpose of an RNN is to learn long-term dependencies, it is difficult for it to store information in its units for long durations. One remedy is to augment the network with an explicit memory, as in the long short-term memory (LSTM) architecture. LSTM networks have proven more effective than conventional RNNs: they use special hidden units containing a memory cell that acts as an accumulator, connected to itself at the next time step with a weight of one. Recurrent networks have since advanced in many ways, from RNNs augmented with memory modules, such as Neural Turing Machines, to networks that can express the meaning of, or answer questions about, a paragraph of text after reading and learning from it.
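
A rough sketch of a single LSTM step, written from the standard equations with arbitrary sizes and random untrained weights, shows how gates control what is written to and read from the memory cell:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, b):
    """One LSTM step: gates decide what to write to, erase from, and
    read out of the memory cell c, which acts as the accumulator."""
    z = np.concatenate([x, h]) @ W + b
    i, f, o, g = np.split(z, 4)                  # input, forget, output gates + candidate
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g) # keep or overwrite the memory
    h = sigmoid(o) * np.tanh(c)                  # exposed hidden state
    return h, c

D, H = 8, 16
rng = np.random.default_rng(4)
W = rng.normal(scale=0.1, size=(D + H, 4 * H))
b = np.zeros(4 * H)

h, c = np.zeros(H), np.zeros(H)
for x_t in rng.normal(size=(5, D)):              # read a toy sequence
    h, c = lstm_step(x_t, h, c, W, b)
```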

Conclusion

Supervised learning has achieved great success in deep learning. But since the basic learning of living creatures is largely unsupervised, unsupervised techniques are expected to become far more important in the longer term. This article discussed computer vision and recognition with convolutional neural networks and their advances, as well as natural language processing and translation with recurrent neural networks; recurrent networks are expected to develop further, towards reading and understanding whole texts, forming strategies, and using extended memory capabilities for better performance. Combined systems of convolutional and recurrent networks that use reinforcement learning are also expected to outperform existing systems. Finally, further progress is expected from combining deep learning with reinforcement learning, which has already been applied successfully.


Dashanka Nadeeshan
AI Geeks

Student at Hamburg University of Technology, studying Mechatronics, Robotics and Intelligent Systems. Visit me at https://www.linkedin.com/in/dashankadesilva/