Optimization Algorithms for Machine Learning
Chapter-1: Introduction
This is the first chapter in the series on Optimization Algorithms. In this series, we will look into different Optimization concepts ranging from convex programming problems to geometric and dynamic programming problems.
In this chapter, we will look into:
- What is optimization?
- Preliminary concepts required to understand the mathematical form of an optimization problem
- General mathematical form of an optimization problem
- Applications of optimization problems
So, what do we mean by ‘optimization’? The term ‘optimization’ simply means to find the best results of a particular problem — to optimize a problem. “Best result” could either mean finding the minimum value of a problem or the maximum value of a problem, depending on the kind of problem that we are dealing with. For example, say you are at the top of a hill and your goal is to minimize your potential energy. Potential energy is given by: PE = m*g*h, where ‘m’ is mass, ‘g’ is acceleration due to gravity and ‘h’ is height. So, what is the solution to this problem? Go down the hill of course and find the point where the height is minimum (since ‘m’ and ‘g’ are constant). This is a case in which we would be able to find the optimal solution to a problem by minimizing it. Similarly, there may be problems in which we may have to find the maximum value of the problem in order to optimize it. So in a nutshell, ‘optimization’ means “to find the best value”.
Let us now look at some preliminary concepts required to understand the general mathematical form of an optimization problem.
CONCEPT OF VECTORS AND VECTOR ELEMENTS:
So, what is a vector? A vector can be thought of as a point in an n-dimensional space. So, if n=1, a vector represents a point on a line. If n=2, a vector represents a point in a plane. If n=3, a vector is a point in 3-D space. For higher values of 'n', a vector is a point in a higher-dimensional space. N-dimensional space is denoted by Rⁿ, the set of all n-tuples of real numbers.
The following diagrams show vectors in spaces of different dimensions:
There are many ways to represent a vector. However, in our case we will need to know only 2 of them. These are listed and detailed below:
1. Polar vector notation: In this notation, a vector is represented using the length of the vector and the angle that the vector makes with the positive x-axis, measured in an anti-clockwise direction. Below is an example that shows the vector as a red dot along with its polar representation. 'r' is the length of the vector, that is, the distance between the red point and the origin. 'Theta' is the angle between the positive x-axis and the vector, which gives the vector's direction.
The above vector, shown as the red dot, is written in polar notation as (r, theta).
2. Cartesian vector notation: In this notation, a vector is represented by its rectangular co-ordinates. This is the notation that will mostly be required in our further discussion. As can be seen below, the vector Q is represented by its co-ordinates in the x-y plane. x and y are called the elements (or components) of the vector.
Note that the number of elements in a vector depends on the dimension of the space that it lies in. If the vector lies in 2-D space, it will have 2 elements as shown in the figure above. If the vector lies in 3-D space, it will have 3 elements, and so on.
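The two notations above describe the same point, so we can convert between them. Here is a minimal sketch (the function name and the sample vector are my own, chosen for illustration):

```python
import math

def polar_to_cartesian(r, theta):
    """Convert a 2-D vector from polar notation (r, theta) to its
    Cartesian elements (x, y). theta is in radians, measured
    anti-clockwise from the positive x-axis."""
    return (r * math.cos(theta), r * math.sin(theta))

# A vector of length 2 making a 60-degree angle with the x-axis:
x, y = polar_to_cartesian(2.0, math.radians(60))
print(round(x, 3), round(y, 3))  # prints: 1.0 1.732
```

The same point, one representation using (length, angle) and the other using its two elements.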
The general mathematical form of an optimization problem is given by:

minimize f(x)
subject to: g_i(x) <= 0, i = 1, ..., m (inequality constraints)
h_j(x) = 0, j = 1, ..., p (equality constraints)

where x = (x1, x2, ..., xN) is the vector of variables and f(x) is called the objective function.
Note that an optimization problem can also be given in the form "maximize f(x)". In such a case, the problem needs to be converted into a minimization problem by placing a '-' sign in front of f(x) (maximizing f(x) is equivalent to minimizing -f(x)) and by changing the inequality constraints accordingly to fit the general form shown above. The general form basically asks us to find the values of the 'n' elements of the vector x for which the function attains its minimum value. The values of the n elements thus found are the best values of the elements of the vector x. The domain of the optimization problem is defined by the constraints in the problem, that is by g(x) and h(x), together with the domain of the objective function. We will look into this concept in more detail when we discuss mathematical problems. Another point to note is that one or both kinds of constraints may be absent: a problem with no constraints is called an unconstrained optimization problem, a problem with only equality constraints is called an equality-constrained optimization problem, and so on. We will delve into these concepts later. The values x1, x2, ..., xN are nothing but the elements of a point x in N-dimensional space.
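To make the general form concrete, here is a minimal sketch with a made-up objective and a single made-up inequality constraint, solved by a crude grid search (real solvers are far more efficient; this only illustrates "search the feasible domain for the minimum"):

```python
import itertools

# Hypothetical problem in the general form:
#   minimize   f(x) = (x1 - 1)^2 + (x2 - 2)^2
#   subject to g(x) = x1 + x2 - 3 <= 0   (one inequality constraint)
f = lambda x1, x2: (x1 - 1) ** 2 + (x2 - 2) ** 2
g = lambda x1, x2: x1 + x2 - 3

# Candidate values for each element of x, over a bounded region [0, 3].
grid = [i / 10 for i in range(0, 31)]

# Keep only the points that satisfy the constraint (the feasible domain),
# then pick the feasible point with the smallest objective value.
feasible = [(a, b) for a, b in itertools.product(grid, grid) if g(a, b) <= 0]
best = min(feasible, key=lambda p: f(*p))
print(best, f(*best))  # prints: (1.0, 2.0) 0.0
```

Note how the constraint g(x) <= 0 shrinks the set of points the search is allowed to consider, exactly as described above.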
Now, let's look at some practical examples where the concepts of optimization can be used.
- Stock portfolio optimization: Let's assume we want to invest in stocks belonging to 3 organizations, viz. Google LLC, Apple Inc. and Microsoft Corporation. Say x1 represents the number of Google LLC stocks to be purchased, x2 represents the number of Apple Inc. stocks to be purchased and x3 represents the same for Microsoft Corporation. The objective function can represent "a measure" of the overall risk in the portfolio, which needs to be minimized and is a function of x1, x2 and x3. Hence, X=[x1, x2, x3] and the objective function can be represented by f(x1, x2, x3). The inequality constraints, among many other things, could represent limits on the investment amount in Google and Microsoft stocks, which would obviously be a function of x1 and x3. Similarly, the equality constraints could represent the exact number of stocks that must be bought for Apple Inc. As one can now see, the equality and the inequality constraints together put restrictions on the domain of the original problem. For example, say the measure of the risk in the portfolio is given by the function f(x1, x2, x3) = a*x1 + b*x2 + c*x3 and this function needs to be minimized, subject to the inequality constraint y*x1 + z*x3 < 500 and the equality constraint x2 = 20. These constraints, though not written in the standard form, put restrictions on the values of x1, x2 and x3 and thus contribute to defining the domain of the objective function.
- Distribution parameter value selection: For someone who is acquainted with statistical modelling, consider the task of finding the best model, from a family of potential models, that fits the training data and the prior information (on the distribution of the training data) in the best possible way. Here the variables x1, x2, ..., xN would represent the parameters of the statistical distribution of the training data (such as the mean and variance in the case of a normal distribution), and the objective function would be the likelihood of the data under those parameter values, which we are trying to maximize.
- Optimizing the cost function in ML: In the case of Machine Learning, the objective function could represent the cost function of the model that needs to be minimized. In a basic sense, the cost function represents the error between the actual labels of the training data and the corresponding predictions. So, here is a simple question for you:
In linear regression, the cost function of which is given by the formula below, what elements does the X-vector in the objective function have, if you relate the cost function with the objective function in optimization? The cost function of linear regression is given by:

J(theta) = (1/(2m)) * sum over i=1..m of (h_theta(x^(i)) - y^(i))^2

where m is the number of training examples, h_theta(x^(i)) is the model's prediction for the i-th example and y^(i) is its actual label.
The answer to this question is given at the end of Chapter-2. Check it out to find the answer.
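As a minimal sketch of cost-function minimization (the tiny dataset, learning rate and iteration count are all made up for illustration), gradient descent on a 1-D linear-regression cost looks like this:

```python
# Hypothetical 1-D dataset that exactly follows y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
m = len(xs)

def cost(theta0, theta1):
    """Half mean squared error between predictions and actual labels."""
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

# Gradient descent: repeatedly step the parameters opposite the gradient.
theta0, theta1, lr = 0.0, 0.0, 0.1
for _ in range(2000):
    errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
    theta0 -= lr * sum(errors) / m
    theta1 -= lr * sum(e * x for e, x in zip(errors, xs)) / m

print(round(theta0, 2), round(theta1, 2))  # prints: 1.0 2.0
```

The quantities being searched over are the model's parameters, not the training data; the data stays fixed while the cost is driven down.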
Thus, we can clearly see the manner in which optimization concepts will help us in the field of Data Analytics and Machine Learning. In a way, optimization concepts power the engine of Machine Learning from the inside.
Ready to rock now?!!
The link to chapter 2: Convex Sets and Functions is here. Check it out!!