Optimization Algorithms for Machine Learning
Chapter-3: Some Important Convex Sets
The link to “Chapter-2: Convex Sets and Functions” is here. Chapter 3 is about the different kinds of convex sets that are used in the world of optimization. These sets become very important because they give us an idea on how to think about and handle problems in n-dimensional space.
In this chapter, we will look into:
- Some more vector concepts
- Hyperplane
- Half-space
- Separating and supporting hyperplane theorem
- Norm ball
- Polyhedron
In the end, we will have a look at the questions that were asked in Chapter-2. The chapter is a continuation of the divergence from the optimization track that we took in chapter 2. All these topics will get connected to optimization in later chapters.
Some symbols that will be used in the chapter are: ∀ meaning “for all”, ⊆ meaning “subset of or equal to”, ∈ meaning “belongs to”, ∩ meaning ‘intersection’, ∥ ∥ meaning ‘magnitude’
So, lets start with dot product of vectors. Vector multiplication are of two type, viz. dot product and cross product. We wont really need cross product of vectors here. So we wont get into it. But only to have an idea about it, cross product of vectors focuses on the multiplication of the magnitude of the vectors and the vector direction components separately. On the other hand, dot product of vectors (also called scalar product) takes into consideration the scalar components (length) of the vectors along with the angle between them. The mathematical representation of dot product is given by:
Unit vector is a vector whose length is considered 1. So, if you divide a vector by the magnitude of the same vector, the resultant vector is a unit vector. That is as shown below:
Now, lets have a look at the dot product using the matrix form of the vectors.
Another important point to note is that the magnitude of a vector, ∥a∥(say) can be found using the formula shown below. Remember that ∥a∥ is also called ‘norm’ of vector a.
Now lets see what a position vector is. This is simple! Position vector is a vector to a particular point in space from the origin of the system. These vectors are not movable and hence, are also called bound vectors. Also, these vectors cannot be added to each other. But on the other hand, these vectors can be subtracted. Below in an example of position vector.
In the above example, OB and OA are position vectors as they describe the positions of points B and A with respect to the origin. AB is a vector that is the vector subtraction of OB and OA. But how? Yes, let look into what is vector addition and subtraction is. Say in the figure above, you perform an action of moving from point O to A and then from A to B. So, what is the total distance covered? OA+AB of course. But, what is the resultant displacement from O to B? OB. So, OA+AB=OB (Note that AB is not a position vector. Hence, it can be added to OA). Now, look at the example below.
Look at vectors A and B (these are not position vectors). How do you add them since one does not start from where the other ends? So here we use a very important concept in vectors. If two vectors have the same magnitude and direction, they are considered the same vectors, no matter where the two vectors lie in space. As a result of this, vector B has been moved (without changing the magnitude or location) to a different location, which is the point where vector A ends. Now, we can use the addition technique that was used earlier and find that the resultant displacement is given by the vector A+B.
Now lets have a look at what a hyperplane is. The concept of hyperplane is nothing but an extension of the concept of a line. Yes, it is that simple and you will soon see how. Firstly, lets have a look at the the general equation of a line. We know that the general equation of a line in 2D is: y=mx+c. From high school mathematics, we know that this equation comes from ax+by+c=0. This equation can also be written as y= -(c/b) +(-a/b)x, where (-a/b) forms the slope ‘m’ of the line and (-c/b) is the y-intercept. Hence, you can clearly see the similarity between the above the two equations. Now, if the axes are x1 and x2 instead of x and y, the equation can also be written as w1x1+w2x2+w0=0, where w1 and w2 are nothing but coefficients of x1 and x2 and same as ‘a’ and ‘b’ in case of ax+by+c=0. Also, w0 is just a constant.
Moving a little ahead into 3-D geometry, equation of a plane is given by: w1x1+w2x2+w3x3+w0=0. So, what do you think equation of a plane would be in n-dimension? Simple. It would be w1x1+w2x2+w3x3+…+wNxN+w0=0. In n-dimension, a plane is called a hyperplane and the equation of a hyperplane, in a more concise form, can be written as:
Now we need to figure out what w0 is. Lets go back to 2-D to see what equation of a line says. w1x1+w2x2+w0=0. Here the y-intercept will be given by c=-(w0/w1). Now, if the line passes through origin, then c=0, which implicitly means w0=0. In that case,
We can extend the same concept to a hyperplane and get the above equation to represent it. We will do that in a bit. For now if we consider W and X as vectors, the above equation basically means a dot product of the vectors as we have seen in the earlier section. And if the dot product of 2 vectors is 0, this means that the angle between the vectors is 90 (since cos90=0)and hence, the vectors are perpendicular to each other. Lets keep this in mind.
Now refer to the diagram below:
In the diagram above, vector W is the normal vector to the plane. This means that W is perpendicular to the plane, which implicitly means that it is perpendicular to any vector that lies on the plane. Again, vector A is a position vector that points to a particular point ‘A’ on the plane with coordinates (a1,a2,a3). X is another position vector that refers to any random point on the plane with coordinates (x1,x2,x3). Now, the vector AX that lies on the plane is given by:
In a way, ‘b’ determines how far is the hyperplane from the origin or the ‘offset’ of the hyperplane. So, let see what happens to the hyperplane equation when the hyperplane passes through origin. When this happens, vector A can be taken as [0,0,0] since the plane passes through origin. Therefore, we get:
A python code to generate a hyperplane is shown below:
Now that we have a good understanding of hyperplanes, lets look at what half-spaces are in the diagram below.
As in the diagram above, assume that ∏n is a hyperplane in 3D. Vector W is a normal vector to the hyperplane. Let (A1,A2,A3) be a point above the plane such that it lies in the same direction as vector W. Let A be a vector from the base of W to point (A1,A2,A3). So, in order to measure the distance of the point (A1,A2,A3) to the plane, we drop a perpendicular from the given point to the plane, such that the perpendicular is parallel to the normal vector W. Let the length of this perpendicular be ‘d’. Now, since W is parallel to the perpendicular from A, the angles marked as θ in the diagram below are equal. So from here, we can say that:
Now, we can clearly see that the hyperplane divides the 3D space into half. One half lies above the hyperplane and other lies below it. These halves are nothing but half-spaces. Half space is a subset of the entire n-dimensional space created by the presence of a hyperplane in the space. Let us now understand the role that θ plays in determining the half space. In the above diagram, vectors W and point (A1,A2,A3) lie on the same side of the hyperplane. Therefore,
Since distance cannot be negative, we take the absolute value of ‘d’. However, by looking at the sign of ‘d’, we can identify which half space the point lies in. And thus, the equation of a half space in set builder notation is given by:
So, the half space in which the normal vector W lies in the same direction with the concerned point is given by the greater than/equal to sign. And the other half space is represented by the less than/equal to sign. So, here I have a question for you. What do you think will be the equation of the half space created by a hyperplane that passes through origin? To find the answer, check out the last section of the next chapter here.
Lets solve a question here. Prove that a half space is a convex set.
The same kind of proof can also be used for the other convex sets discussed in this chapter.
We now need to look at a couple of important theorems on hyperplane. These theorems dictate how hyperplanes interact with convex sets. Below are the theorems.
THEOREM 1 (SEPARATING HYPERPLANE THEOREM):
THEOREM 2 (SUPPORTING HYPERPLANE THEOREM):
Now, lets look at what a norm ball is. For that, lets understand the meaning of norm. Norm means size or length of a vector and is given by ∥x∥. There are many ways to find norm. Lets look at a 2D space for now. The same concept can be extended to N-dimension. So, consider a vector A in 2D with components [2,3] as shown below:
If norm-1 and norm-2 equate to 1, we get a unit square and a unit circle. Now, let there be a circle with center at origin and radius ‘r’. So, equation of any point, denoted by vector x, that lies inside or on the circle (norm-2 of vector x) is given by:
A point here to be kept in mind is that the circle shown above is simply the space created by the length of a particular vector and vector x is just a point in that space. Generalizing the equation further, if we shift the center of the circle to another point Xc, the equation of any point, denoted by vector x that lies in the interior of the circle is given by:
When this concept is extended to 3D, we get a norm ball. A norm ball is nothing but representation of a vector length in its norm-2 form in 3D and is the space created by the length of a vector. Also, as stated above, vector a represents any point in that space. Have a look at the below diagram to have a better understanding of it.
Here, I have a question for you. Can you tell the magnitude of the vector whose norm ball has been shown in the diagram above? To find the answer, have a look at the end of the next chapter here.
The last topic in this chapter is a polyhedron. You must know what a polyhedron is from high school geometry. If not, have a look at the figure below. So now, can you interpret what a polyhedron is with respect to the topics that we have covered in this chapter?
Are you able to guess the answer to the above question by looking at the diagram above? What do you think the arrows are? Well kudos to you if you could guess it right. The arrows are normal vectors of the corresponding planes. So, a polyhedron can be defined as an intersection of ‘n’ number of hyperplanes and half spaces. Each of the triangles and squares are formed by the intersection of hyperplanes at certain degrees. At the same time, if you expand your imagination a little bit, you will realize that even the half spaces, which have been created by each hyperplane, are intersecting in the space inside the polyhedrons. So, that’s a polyhedron for you. Mathematically, a polyhedron is given by:
So finally with this, we come to the end of the chapter.
Solutions to questions from previous chapter.
- I had asked how does the following expressions have the same meaning:
Both these expressions are same because if there are only two points x, y and the second expression is expanded, in that case x1(in second expression)=x(in first expression) and x2(in second expression)=y(in first expression). And thus:
2. Next, I had asked to come up with the inequality of a concave function. This should have been fairly simply. The inequality conditions for concave functions can be both:
Continue to Chapter-4: Important Convex Functions and Convex Properties here.