Introduction to Deep Learning

Random Nerd
Published in Deep Learning
4 min read · Nov 10, 2017
Buy the book from here.

The real challenge for artificial intelligence has been solving tasks that are easy for people to perform but hard for people to describe formally. The solution is to allow computers to learn from experience and understand the world in terms of a hierarchy of concepts, with each concept defined through its relation to simpler concepts. By gathering knowledge from experience, this approach avoids the need for human operators to formally specify all the knowledge that the computer needs. The hierarchy of concepts enables the computer to learn complicated concepts by building them out of simpler ones. A graph showing how these concepts build on one another is deep, with many layers; hence, this approach is termed Deep Learning.

A person’s everyday life requires a huge amount of knowledge about the world. Much of this knowledge is subjective and intuitive, and therefore difficult to articulate in a formal way. Computers need to capture this knowledge in order to behave intelligently. One of the key challenges in artificial intelligence is how to get this informal knowledge into a computer. People struggle to devise formal rules with enough complexity to accurately describe the everyday world. The difficulties faced by systems relying on hard-coded knowledge suggest that AI systems need the ability to acquire their own knowledge by extracting patterns from raw data. This capability is known as Machine Learning.

The introduction of machine learning enabled computers to tackle problems involving knowledge of the real world and to make decisions that appear subjective. The performance of simple machine learning algorithms depends heavily on the representation of the data they are given. This dependence on representations is a general phenomenon that appears throughout computer science and even daily life. Many artificial intelligence tasks can be solved by designing the right set of features to extract for that task, then providing these features to a simple machine learning algorithm. For many tasks, however, it is difficult to know which features should be extracted.
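To make the dependence on representation concrete, here is a minimal sketch in plain Python. It assumes a toy dataset that is not from the original text: points on two concentric circles, labelled by ring. The helper names (`make_rings`, `to_polar`, `best_threshold_accuracy`) are hypothetical and chosen only for illustration. A very simple learner, a single threshold on one feature, struggles on the Cartesian x coordinate but separates the classes once the same points are re-expressed in polar coordinates.

```python
import math

def make_rings(n_per_ring=50, radii=(1.0, 3.0)):
    """Toy data (an assumption): points on two concentric circles, labelled 0 or 1 by ring."""
    points, labels = [], []
    for label, radius in enumerate(radii):
        for k in range(n_per_ring):
            angle = 2 * math.pi * k / n_per_ring
            points.append((radius * math.cos(angle), radius * math.sin(angle)))
            labels.append(label)
    return points, labels

def to_polar(points):
    """Re-express each (x, y) point as (radius, angle)."""
    return [(math.hypot(x, y), math.atan2(y, x)) for x, y in points]

def best_threshold_accuracy(values, labels):
    """Accuracy of the best single-threshold rule on one feature (either direction)."""
    n = len(values)
    best = 0.0
    for t in sorted(set(values)):
        correct = sum((v > t) == bool(y) for v, y in zip(values, labels))
        best = max(best, correct / n, 1 - correct / n)
    return best

points, labels = make_rings()
cartesian_acc = best_threshold_accuracy([x for x, _ in points], labels)
polar_acc = best_threshold_accuracy([r for r, _ in to_polar(points)], labels)
print("threshold on x:     ", cartesian_acc)
print("threshold on radius:", polar_acc)
```

The learner is identical in both cases; only the feature it is given changes. That is the sense in which a well-chosen representation can turn a hard problem into an easy one.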

One solution to this problem is to use machine learning to discover not only the mapping from representation to output but also the representation itself. This approach is known as Representation Learning. Learned representations often result in much better performance than can be obtained with hand-designed representations. They also enable AI systems to rapidly adapt to new tasks, with minimal human intervention. A representation learning algorithm can discover a good set of features for a simple task in minutes, or for a complex task in hours to months.

A classic example of a representation learning algorithm is the Autoencoder. An Autoencoder is the combination of an Encoder function, which converts input data into a different representation, and a Decoder function, which converts this new representation back into the original format. Autoencoders are trained to preserve as much information as possible when an input is run through the Encoder and then the Decoder, but they are also trained to give the new representation various useful properties. Different kinds of Autoencoders aim to achieve different kinds of properties.
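The Encoder/Decoder idea can be sketched with a deliberately tiny linear Autoencoder in plain Python. Everything here is an illustrative assumption rather than anything from the book: the data are 2-D points that all lie on the line y = 2x, the code is a single number, and the function names, initial weights, learning rate, and epoch count are hypothetical choices. Training uses plain stochastic gradient descent on the squared reconstruction error.

```python
def encode(w, x):
    """Encoder: compress a 2-D input into a 1-D code."""
    return w[0] * x[0] + w[1] * x[1]

def decode(v, code):
    """Decoder: expand the 1-D code back into a 2-D reconstruction."""
    return [v[0] * code, v[1] * code]

def train_autoencoder(data, epochs=2000, lr=0.01):
    """Fit encoder weights w and decoder weights v by per-sample gradient descent."""
    w = [0.5, 0.5]  # hypothetical starting values
    v = [0.5, 0.5]
    for _ in range(epochs):
        for x in data:
            code = encode(w, x)
            recon = decode(v, code)
            err = [recon[i] - x[i] for i in range(2)]
            # Gradient of the squared error with respect to the code
            grad_code = 2 * (err[0] * v[0] + err[1] * v[1])
            for i in range(2):
                v[i] -= lr * 2 * err[i] * code   # decoder update
                w[i] -= lr * grad_code * x[i]    # encoder update
    return w, v

# Toy data (an assumption): points on the line y = 2x
data = [(t, 2 * t) for t in (-1.0, -0.5, 0.0, 0.5, 1.0)]
w, v = train_autoencoder(data)

x_new = (0.3, 0.6)  # a point on the same line, not seen during training
code = encode(w, x_new)
print("code:", code)
print("reconstruction:", decode(v, code))
```

Because the data have only one underlying degree of freedom, a one-number code is enough to reconstruct them. Real Autoencoders use nonlinear Encoders and Decoders and extra training objectives to obtain the properties mentioned above; this sketch only shows the encode-then-decode structure.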

When designing features or algorithms for learning features, our goal is usually to separate the factors of variation that explain the observed data. Here the word "factors" simply refers to separate sources of influence; the factors are usually not combined by multiplication. They are often not directly observed. They may be unobserved objects or forces in the physical world that affect observable quantities, or constructs in the human mind that provide simplified explanations or inferred causes of the observed data. They can be thought of as concepts or abstractions that help us make sense of the rich variability in the data.

A major difficulty is that many factors of variation influence every piece of data we observe, and extracting such high-level, abstract features from raw data can be very hard. Deep learning addresses this central problem in representation learning by introducing representations that are expressed in terms of other, simpler representations, which enables the computer to build complex concepts out of simpler ones. The quintessential example of a deep learning model is the feedforward deep network, or multilayer perceptron (MLP). A multilayer perceptron is just a mathematical function that maps a set of input values to output values, and it is formed by composing many simpler functions. Each application of a different mathematical function provides a new representation of the input.
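The "function built from simpler functions" view can be sketched directly in plain Python. The example below is a two-layer network that computes XOR; the weights are set by hand for illustration rather than learned, and the helper names (`affine`, `relu`, `compose`) are hypothetical. The hidden layer produces a new representation of the input, and the output layer is a simple linear function of that representation.

```python
def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, z) for z in values]

def affine(weights, bias):
    """Build a layer computing weights @ x + bias (weights given as a list of rows)."""
    def layer(x):
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(weights, bias)]
    return layer

def compose(*functions):
    """Chain functions left to right: compose(f, g)(x) == g(f(x))."""
    def composed(x):
        for f in functions:
            x = f(x)
        return x
    return composed

# Hand-picked weights (an assumption for illustration, not learned from data)
hidden = compose(affine([[1.0, 1.0], [1.0, 1.0]], [0.0, -1.0]), relu)
output = affine([[1.0, -2.0]], [0.0])
mlp = compose(hidden, output)

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, "hidden representation:", hidden(x), "output:", mlp(x))
```

In the hidden representation, the inputs (0, 1) and (1, 0) land on the same point, so a single linear function can then produce the XOR output, something no linear function of the raw inputs can do.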

Learning the right representation for the data is one perspective on deep learning. Another is that depth enables the computer to learn a multi-step program. Each layer of the representation can be thought of as the state of the computer’s memory after executing another set of instructions in parallel, so networks with greater depth can execute more instructions in sequence. Later instructions can refer back to the results of earlier instructions, so not all of the information in a layer’s activations necessarily encodes factors of variation that explain the input. The representation also stores state information that helps to execute a program that can make sense of the input and keeps the model’s processing organized.

The depth of a model can be measured in two ways: by the number of sequential instructions that must be executed (the depth of the computational graph), or by how concepts are related to one another (the depth of the probabilistic modeling graph). There is neither a single correct value for the depth of an architecture nor a consensus about how much depth a model requires to qualify as ‘deep’. However, Deep Learning can safely be regarded as the study of models that involve a greater amount of composition of either learned functions or learned concepts than traditional machine learning does.

Edit: Thanks to Jeff Clune, who reminded me that I had not credited the original authors here. This article is pretty much a subset of what I learned from a recently (late 2017) published book, Deep Learning (Adaptive Computation and Machine Learning series). This book has been one of the best investments I have made recently, and I highly recommend it to beginners and professionals who wish to understand each and every concept of Deep Learning. It was authored by a few of the living legends in this domain, namely Yoshua Bengio, Ian Goodfellow and Aaron Courville, and published by MIT Press.
