Statistical Learning Theory Part 2
Approaches to the Learning Problem Part 0
Recap
In the first part of the Statistical Learning Theory series, we gave an introduction to the subject. We discussed the contributions made to it by eminent scientists over a considerable period of time, which ultimately led to the development of theoretical statistics as we know it today and subsequently acted as a catalyst in the gradual development of other fields of study such as Machine Learning and Data Science.
Approaches to the Learning Problem
The First Approach
The first approach treats learning as the problem of minimizing a Risk functional on the basis of empirical data. The Risk functional evaluates the quality of the approximating function that is chosen from a given set of functions.
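Concretely, in the notation of the reference below, the Risk functional has the form

R(α) = ∫ Q(z, α) dF(z), α ∈ Λ,

where Q(z, α), α ∈ Λ, is the set of loss functions under consideration and the distribution F(z) is unknown; the only available information is an i.i.d. sample z₁,…,zₗ drawn according to F(z). Learning then means choosing, on the basis of this sample alone, a value of α for which R(α) is small.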
The Second Approach
The second approach requires the solution of integral equations to estimate stochastic dependencies in situations where some elements of the equations are known only approximately.
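A classical instance, also taken from the reference below, is density estimation. A density p(t) solves the integral equation

∫₋∞ˣ p(t) dt = F(x),

but the distribution function F(x) on the right-hand side is not known exactly; it has to be replaced by an approximation, the empirical distribution function Fₗ(x) constructed from the sample. Solving an equation whose right-hand side is known only approximately is the prototype of an ill-posed problem, a point we return to below.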
General Model of Learning from examples
The model of learning from examples consists of three elements:
- Generator of the data G, which acts as the source of situations and determines the environment in which the other two elements act. The generator G draws independent and identically distributed (i.i.d.) vectors x∈ X according to some fixed distribution F(x).
- Supervisor S, which takes x as input and returns the target output value y. The supervisor is the target operator; it is unknown, but we are sure that it exists.
- Learning Machine LM, which observes the pairs (x₁,y₁),…,(xₗ,yₗ) (the training set) and constructs an operator that is used to predict the target value yᵢ for every input vector xᵢ. The goal of the LM is to learn an approximation of the target operator S from the data generated by G.
The supervisor S returns the target values y on a vector x according to a conditional distribution F(y|x). Thus the Learning Machine LM observes an independent and identically distributed training set drawn according to the joint distribution F(x,y) = F(y|x) F(x). A minimal simulation of this model is sketched below.
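To make the model concrete, here is a minimal sketch in Python. The specific choices (a Gaussian generator, a supervisor that is a noisy linear operator, and a learning machine that picks an affine function by least squares) are illustrative assumptions, not part of the general model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Generator G: draws i.i.d. vectors x according to some fixed distribution F(x).
# A standard Gaussian is an arbitrary illustrative choice.
def generator(l):
    return rng.normal(loc=0.0, scale=1.0, size=l)

# Supervisor S: returns y for each x according to a conditional distribution F(y|x).
# Here S is assumed, purely for illustration, to be a noisy linear operator.
def supervisor(x):
    return 2.0 * x + 1.0 + rng.normal(scale=0.3, size=x.shape)

# Learning Machine LM: observes the pairs (x_i, y_i) and chooses, from a fixed
# set of functions (here: affine functions w*x + b), the one that best fits
# the training set in the least-squares sense.
x_train = generator(50)
y_train = supervisor(x_train)
w, b = np.polyfit(x_train, y_train, deg=1)

# The chosen function is then used to predict the supervisor's outputs on new inputs.
x_new = generator(5)
print("learned w, b:", w, b)
print("predictions:", w * x_new + b)
```

Any other generator distribution, supervisor, and set of functions could be substituted; what the model fixes is only the structure G → S → LM.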
For the construction of the approximation of the Target Operator S, the Learning Machine LM can pursue one of two goals:
- Imitation of the Target Operator S.
- Identification of the Target Operator S.
The above two goals may seem similar, but there is a fine line between them. Imitating the target operator means only achieving the best results in predicting the supervisor's outputs in the environment generated by the Generator G. Identifying the target operator requires constructing a good approximation of it in a certain metric, which in turn ensures good prediction of the supervisor's outputs. These two goals give rise to the two approaches to the learning problem described above.
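The distinction can be stated in terms of what is being made small. Imitation asks only that the risk R(α) of the chosen function be close to the minimal possible risk. Identification asks that the chosen function be close to the operator S itself in a given metric, for instance (an illustrative choice) the squared L₂(F) distance

ρ²(f, S) = ∫ (f(x) − S(x))² dF(x).

Good identification implies good imitation, but not conversely.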
The problem of imitation leads to the development of a nonasymptotic theory, whereas the problem of identification, which belongs to the class of ill-posed problems, leads to the development of an asymptotic theory.
Constructing an operator means that the learning machine implements some fixed set of functions and chooses from this set a function that approximates the supervisor's operator well. The learning process is therefore essentially a process of choosing an appropriate function from a given set of functions.
In the next part, we will continue the discussion of the learning problem with the problem of Imitation.
Please clap if you liked the post or feel that it will be helpful to other learners!!
Reference
Vapnik, Vladimir N. Statistical Learning Theory. Wiley, 1998.