I am starting a series of blog posts explaining the concepts of Machine Learning and Deep Learning; you could also say these will be short notes from the following books:
1. Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville (Link)
2. Machine Learning: A Probabilistic Perspective by Kevin Murphy
3. The Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani, and Jerome Friedman
Today, artificial intelligence (AI) is a thriving field with many practical applications and active research topics. The true challenge for artificial intelligence is to solve the problems that humans solve intuitively and by observation, such as recognizing a spoken accent or faces in an image.
The solution to the above problem is to allow computers to learn from experience and understand the world in terms of a hierarchy of concepts, with each concept defined in terms of its relation to simpler concepts. By gathering knowledge from experience, this approach avoids the need for human operators to formally specify all of the knowledge that the computer needs. The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones. If we draw a graph showing how these concepts are built on top of each other, the graph is deep, with many layers. For this reason, we call this approach to AI deep learning.
Several artificial intelligence projects have sought to hard-code knowledge about the world in formal languages. A computer can reason about statements in these formal languages automatically using logical inference rules. This is known as the knowledge base approach to artificial intelligence. One of the most famous such projects is Cyc (Lenat and Guha, 1989), yet none of these projects has led to a major success.
Now, it is not always possible to hard-code every feature into our machine. So the ability of the machine to acquire its own knowledge, by extracting patterns from raw data, becomes necessary. This capability is known as machine learning. The performance of simple machine learning algorithms depends heavily on the representation of the data they are given. Each piece of information included in the representation of our problem is known as a feature (Fig 1).
The choice of features is crucial. Take human beings, for instance: we can easily perform arithmetic on Arabic numerals, but doing arithmetic on Roman numerals is much more time-consuming. It is not surprising, then, that the choice of representation has an enormous effect on the performance of machine learning algorithms.
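To make this concrete, here is a minimal sketch (my own toy example in NumPy, not taken from any of the books above) showing how much the representation matters: points lying on two concentric rings cannot be separated by thresholding the raw Cartesian coordinates, but a single hand-crafted "radius" feature separates them almost perfectly.

```python
# Toy illustration: the same data under two different representations.
import numpy as np

rng = np.random.default_rng(0)

def ring(radius, n=200):
    """Sample n noisy points around a circle of the given radius."""
    angles = rng.uniform(0, 2 * np.pi, n)
    r = radius + rng.normal(0, 0.1, n)
    return np.column_stack([r * np.cos(angles), r * np.sin(angles)])

inner, outer = ring(1.0), ring(3.0)            # class 0 and class 1
X = np.vstack([inner, outer])
y = np.array([0] * len(inner) + [1] * len(outer))

# Representation 1: raw Cartesian coordinates (x1, x2).
# No single threshold on either coordinate separates the two rings.
best_cartesian = max(
    np.mean((X[:, j] > t) == y)
    for j in range(2)
    for t in np.linspace(X[:, j].min(), X[:, j].max(), 100)
)

# Representation 2: a hand-crafted "distance from the origin" feature.
radius = np.linalg.norm(X, axis=1)
best_radius = max(np.mean((radius > t) == y) for t in np.linspace(0, 4, 100))

print(f"best threshold accuracy on raw coordinates: {best_cartesian:.2f}")
print(f"best threshold accuracy on radius feature:  {best_radius:.2f}")
```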
To solve this problem, we can use machine learning to discover not only the mapping from representation to output but also the representation itself. This is called representation learning, i.e., learning representations of the data that make it easier to extract useful information when building classifiers or other predictors. In the case of probabilistic models, a good representation is often one that captures the posterior distribution of the underlying explanatory factors for the observed input (we will revisit this topic later in greater detail). A good example of representation learning is the autoencoder: the combination of an encoder function that converts the input data into a different representation, and a decoder function that converts the new representation back into the original format.
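As a rough picture of the encoder/decoder idea, here is a minimal sketch (my own illustration, not code from the book): a tiny linear autoencoder, trained with plain gradient descent, that compresses 4-dimensional inputs into a 2-dimensional code and reconstructs them again.

```python
# Tiny linear autoencoder trained by gradient descent (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 4-dimensional points that actually live on a 2-dimensional plane,
# so a 2-dimensional code can represent them almost perfectly.
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 4))
X = latent @ mixing

W_enc = rng.normal(scale=0.1, size=(4, 2))   # encoder weights (4 -> 2)
W_dec = rng.normal(scale=0.1, size=(2, 4))   # decoder weights (2 -> 4)

lr = 0.01
for step in range(2000):
    code = X @ W_enc          # encoder: input -> new representation
    recon = code @ W_dec      # decoder: new representation -> original format
    error = recon - X
    loss = np.mean(error ** 2)

    # Gradient-descent updates on the mean squared reconstruction error
    # (constant factors folded into the learning rate).
    grad_dec = (code.T @ error) / len(X)
    grad_enc = (X.T @ (error @ W_dec.T)) / len(X)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

print(f"final reconstruction error: {loss:.4f}")
```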
Of course, it can be very difficult to extract such high-level, abstract features from raw data. Many representations, such as a speaker's accent, can be identified only using sophisticated, nearly human-level understanding of the data. When it is nearly as difficult to obtain a representation as to solve the original problem, representation learning does not, at first glance, seem to help us.
Deep learning solves this central problem in representation learning by introducing representations that are expressed in terms of other, simpler representations. Deep learning allows the computer to build complex concepts out of simpler concepts (Fig 3). There are two main ways of measuring the depth of a model (Fig 4):
- Number of sequential instructions that must be executed to evaluate the architecture.
- Depth of the graph describing how concepts are related to each other.
It is not always clear which of these two views (the depth of the computational graph, or the depth of the probabilistic modeling graph) is most relevant, and because different people choose different sets of smallest elements from which to construct their graphs, there is no single correct value for the depth of an architecture, just as there is no single correct value for the length of a computer program. Nor is there a consensus about how much depth a model requires to qualify as “deep.”
Coming to the division between the various types of learning, Fig 5 will give you a good idea of the differences and similarities between them.
The earliest predecessors of modern deep learning were simple linear models. These models were designed to take a set of n input values x1, . . . , xn and associate them with an output y. These models would learn a set of weights w1, . . . , wn and compute their output f(x, w) = x1*w1 + ··· + xn*wn. This first wave of neural networks research was known as cybernetics. In the 1950s, the perceptron (Rosenblatt, 1958, 1962) became the first model that could learn the weights defining the categories given examples of inputs from each category. The adaptive linear element (ADALINE), which dates from about the same time, simply returned the value of f(x) itself to predict a real number (Widrow and Hoff, 1960), and could also learn to predict these numbers from data.
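To see what "learning the weights" looks like in practice, here is a minimal sketch of the classic perceptron learning rule (my own toy implementation on the AND function, not code from the book): the weights are nudged whenever the model misclassifies an example.

```python
# Perceptron learning rule on the (linearly separable) AND function.
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])   # inputs
y = np.array([0, 0, 0, 1])                        # AND labels

w = np.zeros(2)   # weights w1, w2
b = 0.0           # bias term

for epoch in range(10):
    for x_i, y_i in zip(X, y):
        prediction = 1 if x_i @ w + b > 0 else 0
        # Update only when the prediction is wrong.
        w += (y_i - prediction) * x_i
        b += (y_i - prediction)

print("learned weights:", w, "bias:", b)
print("predictions:", [1 if x_i @ w + b > 0 else 0 for x_i in X])
```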
Models based on the f(x, w) used by the perceptron and ADALINE are called linear models. Linear models have many limitations. Most famously, they cannot learn the XOR function, where f([0,1], w) = 1 and f([1,0], w) = 1 but f([1,1], w) = 0 and f([0,0], w) = 0 (Fig 7).
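You can check this limitation numerically. Below is a minimal sketch (my own illustration) that fits a linear model to the XOR truth table by least squares, in the spirit of ADALINE; the best the model can do is predict 0.5 for every input, so no threshold on its output separates the two classes.

```python
# Fitting a linear model f(x, w) = w1*x1 + w2*x2 + b to XOR by least squares.
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)           # XOR labels

# Append a constant column so the bias b is learned along with w1 and w2.
X_bias = np.hstack([X, np.ones((4, 1))])
w, *_ = np.linalg.lstsq(X_bias, y, rcond=None)

print("weights (w1, w2, b):", w)       # ends up at (0, 0, 0.5)
print("predictions:", X_bias @ w)      # all four predictions are 0.5
```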
This limitation of linear models led to the development of more sophisticated techniques. Today deep learning is booming at a much higher rate than ever before, for the following possible reasons:
- Increasing Dataset Sizes
- Increasing Model Sizes
- Increasing Accuracy, Complexity and Real-World Impact
Please provide your feedback so that I can improve future articles.
Articles in Sequence:
One last thing…
If you liked this article, click the 💚 below so other people will see it here on Medium.