Hi friends,

I am starting a series of blog posts explaining the concepts of machine learning and deep learning. You can think of them as short notes from a few books that I will be following, namely:

1. Deep Learning, by Ian Goodfellow, Yoshua Bengio and Aaron Courville (Link)

2. Machine Learning: A Probabilistic Perspective, by Kevin Murphy

3. The Elements of Statistical Learning, by Trevor Hastie, Robert Tibshirani and Jerome Friedman


Today, artificial intelligence (AI) is a thriving field with many practical applications and active research topics. The true challenge to artificial intelligence is to solve problems that humans solve intuitively, such as recognizing a spoken accent or faces in an image.

The solution to the above problem is to allow computers to learn from experience and understand the world in terms of a hierarchy of concepts, with each concept defined in terms of its relation to simpler concepts. By gathering knowledge from experience, this approach avoids the need for human operators to formally specify all of the knowledge that the computer needs. The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones. If we draw a graph showing how these concepts are built on top of each other, the graph is deep, with many layers. For this reason, we call this approach to AI deep learning.

Several artificial intelligence projects have sought to hard-code knowledge about the world in formal languages. A computer can reason about statements in these formal languages automatically using logical inference rules. This is known as the knowledge base approach to artificial intelligence. None of these projects has led to a major success. One of the most famous such projects is Cyc (Lenat and Guha, 1989).

However, it is not always possible to hard-code every piece of knowledge into a machine. AI systems therefore need the ability to acquire their own knowledge by extracting patterns from raw data. This capability is known as machine learning. The performance of simple machine learning algorithms depends heavily on the representation of the data they are given. Each piece of information included in the representation of a problem is known as a feature (Fig 1).

Fig 1. Representation of an image as a set of possible features.

Features matter a great deal. For example, humans can easily perform arithmetic on Arabic numerals, but doing arithmetic on Roman numerals is much more time-consuming. It is not surprising, then, that the choice of representation has an enormous effect on the performance of machine learning algorithms.

Fig 2. Importance of data representation: in the first plot the data is represented in Cartesian coordinates, and in the second plot the same data is represented in polar coordinates. In the second plot the task becomes simple to solve with a vertical line. (photo courtesy: Deep Learning Book)
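To make the effect of representation concrete, here is a small sketch in Python with NumPy. The data (two classes lying on circles of radius 1 and 3), the helper name `to_polar`, and the threshold of 2.0 are all made up for illustration and are not taken from the book. In Cartesian coordinates no straight line separates the two circles, but after converting to polar coordinates a single threshold on the radius does the job.

```python
import numpy as np

def to_polar(points):
    """Convert an (n, 2) array of Cartesian (x, y) points to polar (r, theta)."""
    x, y = points[:, 0], points[:, 1]
    r = np.hypot(x, y)          # distance from the origin
    theta = np.arctan2(y, x)    # angle in radians
    return np.column_stack([r, theta])

# Hypothetical toy data: class 0 on a circle of radius 1, class 1 on radius 3.
angles = np.linspace(0.0, 2.0 * np.pi, 50, endpoint=False)
inner = np.column_stack([1.0 * np.cos(angles), 1.0 * np.sin(angles)])
outer = np.column_stack([3.0 * np.cos(angles), 3.0 * np.sin(angles)])
points = np.vstack([inner, outer])
labels = np.array([0] * 50 + [1] * 50)

polar = to_polar(points)

# In the polar representation, one threshold on r (a "vertical line"
# when r is on the horizontal axis) separates the two classes.
predicted = (polar[:, 0] > 2.0).astype(int)
accuracy = float(np.mean(predicted == labels))
print("accuracy with a single threshold on r:", accuracy)
```

The classifier here is just a comparison against a constant; all of the work was done by choosing a representation in which the classes line up along one axis.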

Choosing a good representation by hand is hard. To address this, we can use machine learning to discover not only the mapping from representation to output but also the representation itself. This is called representation learning: learning representations of the data that make it easier to extract useful information when building classifiers or other predictors. In the case of probabilistic models, a good representation is often one that captures the posterior distribution of the underlying explanatory factors for the observed input (we will revisit this topic in greater detail later). The autoencoder is a good example of representation learning. An autoencoder is the combination of an encoder function that converts the input data into a different representation, and a decoder function that converts the new representation back into the original format.
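The encoder/decoder idea can be sketched with a very small linear example. This is only an illustration under stated assumptions: the toy data, the names `encode` and `decode`, and the use of an SVD to pick the code direction are my own choices. A real neural autoencoder would learn its weights by gradient descent rather than computing them in closed form.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 2-D points that all lie on one line through the
# origin, so a single number per point is enough to describe them.
direction = np.array([3.0, 4.0]) / 5.0
t = rng.normal(size=(20, 1))
data = t * direction                      # shape (20, 2)

# Pick the 1-D code direction with an SVD (an assumption made for brevity;
# a neural autoencoder would learn this by training instead).
_, _, vt = np.linalg.svd(data, full_matrices=False)
basis = vt[:1]                            # shape (1, 2)

def encode(x):
    """Encoder: map 2-D inputs to a 1-D representation."""
    return x @ basis.T

def decode(z):
    """Decoder: map the 1-D representation back to the original 2-D format."""
    return z @ basis

codes = encode(data)
reconstruction = decode(codes)
max_error = float(np.max(np.abs(reconstruction - data)))
print("code shape:", codes.shape, "max reconstruction error:", max_error)
```

The reconstruction is essentially exact here only because the toy data is truly one-dimensional; on real data the new representation usually keeps the most useful structure and loses some detail.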

Fig 3. Illustration of a deep learning model.

Of course, it can be very difficult to extract such high-level, abstract features from raw data. Many of these representations, such as a speaker’s accent, can be identified only using sophisticated, nearly human-level understanding of the data. When it is nearly as difficult to obtain a representation as to solve the original problem, representation learning does not, at first glance, seem to help us.

Deep learning solves this central problem in representation learning by introducing representations that are expressed in terms of other, simpler representations. Deep learning allows the computer to build complex concepts out of simpler concepts (Fig 3). There are two main ways of measuring the depth of a model (Fig 4):

  1. Number of sequential instructions that must be executed to evaluate the architecture.
  2. Depth of the graph describing how concepts are related to each other.

It is not always clear which of these two views — the depth of the computational graph, or the depth of the probabilistic modeling graph — is most relevant, and because different people choose different sets of smallest elements from which to construct their graphs, there is no single correct value for the depth of an architecture, just as there is no single correct value for the length of a computer program. Nor is there a consensus about how much depth a model requires to qualify as “deep.”

Fig 4. Illustration of computational graphs mapping an input to an output where each node performs an operation.
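The point that depth depends on which operations you treat as the smallest elements can be shown with a short sketch. The function `graph_depth`, the dictionary encoding of a graph, and the logistic-regression example are illustrative assumptions of mine, not a standard API. Depth is counted as the longest chain of operations from any input to the output.

```python
from functools import lru_cache

def graph_depth(graph, output):
    """Length of the longest chain of operations leading to `output`.

    `graph` maps each operation node to the list of nodes it reads from.
    Names that are not keys of `graph` are treated as raw inputs (depth 0).
    """
    @lru_cache(maxsize=None)
    def depth(node):
        if node not in graph:
            return 0
        return 1 + max(depth(parent) for parent in graph[node])

    return depth(output)

# The same computation, sigmoid(w1*x1 + w2*x2), described at two granularities.
# Fine-grained: multiplication, addition and sigmoid are separate operations.
fine_grained = {
    "w1*x1": ["w1", "x1"],
    "w2*x2": ["w2", "x2"],
    "sum": ["w1*x1", "w2*x2"],
    "sigmoid": ["sum"],
}
# Coarse: the whole logistic regression is treated as one operation.
coarse = {"logistic_regression": ["w1", "x1", "w2", "x2"]}

fine_depth = graph_depth(fine_grained, "sigmoid")
coarse_depth = graph_depth(coarse, "logistic_regression")
print(fine_depth, coarse_depth)  # prints: 3 1
```

The model is identical in both cases; only the choice of building blocks changes, which is why there is no single correct value for depth.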

Turning to the different types of learning, Fig 5 gives a good idea of the differences and similarities between them.

Fig 5. Flowcharts showing how the different parts of an AI system relate to each other within different AI disciplines. Shaded boxes indicate components that are able to learn from data. (photo courtesy: Deep Learning Book)

The earliest predecessors of modern deep learning were simple linear models. These models were designed to take a set of n input values x1, ..., xn and associate them with an output y. These models would learn a set of weights w1, ..., wn and compute their output f(x, w) = x1*w1 + ... + xn*wn. This first wave of neural network research was known as cybernetics. In the 1950s, the perceptron (Rosenblatt, 1958, 1962) became the first model that could learn the weights defining the categories given examples of inputs from each category. The adaptive linear element (ADALINE), which dates from about the same time, simply returned the value of f(x) itself to predict a real number (Widrow and Hoff, 1960), and could also learn to predict these numbers from data.

Fig 6. Functions which can be predicted by linear models.
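The weighted sum f(x, w) and the idea of learning the weights from examples can be sketched as follows. This is a minimal version of the classic perceptron update rule, written under a few assumptions of mine: a constant 1 is appended to every input to act as a bias term (the formula above has no bias), the output is 1 when f(x, w) > 0, and the AND function is used as a toy linearly separable task. The function names and default parameters are hypothetical.

```python
import numpy as np

def linear_output(x, w):
    """f(x, w) = x1*w1 + ... + xn*wn."""
    return float(np.dot(x, w))

def train_perceptron(inputs, targets, epochs=20, learning_rate=1.0):
    """Perceptron rule: nudge the weights whenever a prediction is wrong."""
    # Append a constant 1 to every input so the last weight acts as a bias.
    augmented = np.hstack([inputs, np.ones((len(inputs), 1))])
    w = np.zeros(augmented.shape[1])
    for _ in range(epochs):
        for x, target in zip(augmented, targets):
            prediction = 1 if linear_output(x, w) > 0 else 0
            w += learning_rate * (target - prediction) * x
    return w

def predict(inputs, w):
    """Classify each input as 1 if the weighted sum is positive, else 0."""
    augmented = np.hstack([inputs, np.ones((len(inputs), 1))])
    return [1 if linear_output(x, w) > 0 else 0 for x in augmented]

# Toy, linearly separable task: the logical AND of two binary inputs.
and_inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
and_targets = np.array([0, 0, 0, 1])

weights = train_perceptron(and_inputs, and_targets)
predictions = predict(and_inputs, weights)
print(predictions)  # prints: [0, 0, 0, 1]
```

Because AND is linearly separable, the perceptron rule settles on weights that classify all four inputs correctly after a handful of passes over the data.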

Models based on the f(x, w) used by the perceptron and ADALINE are called linear models. Linear models have many limitations. Most famously, they cannot learn the XOR function, where f([0,1], w) = 1 and f([1,0], w) = 1 but f([1,1], w) = 0 and f([0,0], w) = 0 (Fig 7).

Fig 7. The XOR function cannot be solved by a linear model.
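One way to see this limitation numerically is to fit the best possible linear model to the four XOR points. The sketch below uses a least-squares fit with NumPy, which is my choice for the demonstration rather than the training rule of the perceptron or ADALINE, and it appends a constant 1 to each input as a bias feature (also an assumption). The best linear fit ends up predicting 0.5 for every input, so it cannot tell the two classes apart.

```python
import numpy as np

inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
xor_targets = np.array([0, 1, 1, 0], dtype=float)

# Append a constant 1 to each input so the model can also learn a bias.
design = np.hstack([inputs, np.ones((4, 1))])

# Least-squares solution for the weights [w1, w2, bias].
weights, *_ = np.linalg.lstsq(design, xor_targets, rcond=None)
predictions = design @ weights

print("weights:", np.round(weights, 3))
print("predictions:", np.round(predictions, 3))
```

Both input weights come out as (numerically) zero and the bias as 0.5: the linear model's best option is to ignore the inputs entirely.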

This limitation of linear models has led to more sophisticated techniques. Today deep learning is growing faster than ever before, likely for the following reasons:

  1. Increasing Dataset Sizes
  2. Increasing Model Sizes
  3. Increasing Accuracy, Complexity and Real-World Impact

Please share your feedback so that I can improve future articles.

Thanks all

Articles in Sequence:

  2. Deep Learning: Basic Mathematics for Deep Learning
  3. Deep Learning: Feedforward Neural Network
  4. Back Propagation
