Vector Norms

Indira KriGan
Jun 4, 2019


Going by this article, I started exploring the first book on the list, Deep Learning (Adaptive Computation and Machine Learning series). This is the first post in a series documenting on Medium a few of the concepts that are beautifully covered in the book, starting with ‘vector norms’, one of the fundamental topics in linear algebra.

What is a vector?

A vector is simply an array of numbers that denotes a point in an n-dimensional space. In data science, this point could represent, say, an instance of a house, with each of the numbers encoding some ‘property’ or feature of the house.

For example, let the following be the features (n = 4) of the house that we want to consider:

  • the number of bedrooms
  • the number of bathrooms
  • the number of balconies
  • age of the house (current year minus year built)

[3,3,1,3] is a vector that represents one such house, each number corresponding to one of the features listed above.

With the idea of a vector clear, an important measure in data science is the magnitude or size of a vector. This is given by what is called a ‘norm’. A norm is fundamentally a function that maps a vector to a single non-negative number.

The norm in ‘p form’, known as the Lp norm, is given by:

$$\|x\|_p = \left(\sum_{i=1}^{n} |x_i|^p\right)^{1/p}$$
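To make the formula concrete, here is a minimal Python sketch using NumPy; the helper name lp_norm is just illustrative, and the result is cross-checked against NumPy’s built-in np.linalg.norm:

```python
import numpy as np

def lp_norm(x, p):
    """(sum of |x_i|^p) raised to 1/p, as in the formula above."""
    x = np.asarray(x, dtype=float)
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

house = np.array([3, 3, 1, 3])  # bedrooms, bathrooms, balconies, age

print(lp_norm(house, 2))         # 5.2915... (the Euclidean norm, see below)
print(np.linalg.norm(house, 2))  # same value from NumPy's built-in
```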

Euclidean Norm

When p = 2, the norm is called the ‘Euclidean norm’, which is frequently used in machine learning. It is also known as the SRSS, the square root of the sum of squares:

$$\|x\|_2 = \sqrt{\sum_{i=1}^{n} x_i^2}$$

For the house example, the Euclidean norm is given by:

$$\|x\|_2 = \sqrt{3^2 + 3^2 + 1^2 + 3^2} = \sqrt{28} \approx 5.29$$
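As a quick sanity check, NumPy’s np.linalg.norm computes this directly (its ord argument defaults to the Euclidean norm for 1-D arrays):

```python
import numpy as np

house = np.array([3, 3, 1, 3])

print(np.sqrt(np.sum(house ** 2)))  # 5.291502622129181, i.e. sqrt(28)
print(np.linalg.norm(house))        # same: ord defaults to the L2 norm
```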

Manhattan Norm

The 1-norm, also known as the Manhattan norm, is the sum of the absolute values of the features, given by:

$$\|x\|_1 = \sum_{i=1}^{n} |x_i|$$

For the house example, the Manhattan norm is given by:

$$\|x\|_1 = |3| + |3| + |1| + |3| = 10$$
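The same check with NumPy, passing ord=1:

```python
import numpy as np

house = np.array([3, 3, 1, 3])

print(np.sum(np.abs(house)))         # 10, the sum of absolute values
print(np.linalg.norm(house, ord=1))  # 10.0
```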

Max Norm

The max norm (the L∞ norm) is the largest absolute value among the features:

$$\|x\|_\infty = \max_i |x_i|$$

For the house example, the max norm is max(|3|, |3|, |1|, |3|) = 3.
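NumPy accepts ord=np.inf for the max norm:

```python
import numpy as np

house = np.array([3, 3, 1, 3])

print(np.max(np.abs(house)))              # 3, the largest absolute value
print(np.linalg.norm(house, ord=np.inf))  # 3.0
```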

Isosurfaces of the Lp norm

For the sake of simplicity, and to be able to visualize, let’s consider just two features instead of 4. Suppose we want to plot, for p = 0.5, 1, and 2, the features x1 and x2 such that the Lp norm is constant, say equal to 1.

Lp where p = 1

Visualizing the L1 and L2 norms is helpful in understanding the differences between ridge and lasso regression (two regularization methods), specifically why L1 (lasso) can be used for feature selection while L2 cannot.

Note the diamond shape of the L1 plot vs. the circular shape of the L2 plot.

Lp where p = 2
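If you would like to reproduce plots like these, here is a small matplotlib sketch; the parametrization by angle is just one convenient way to trace the level set, and the styling choices are arbitrary:

```python
import numpy as np
import matplotlib.pyplot as plt

# Trace the curve ||x||_p = 1 for several values of p.
# For the direction (cos t, sin t), ||r * (cos t, sin t)||_p = 1
# gives the radius r = (|cos t|^p + |sin t|^p)^(-1/p).
theta = np.linspace(0, 2 * np.pi, 400)
for p in [0.5, 1, 2]:
    r = (np.abs(np.cos(theta)) ** p + np.abs(np.sin(theta)) ** p) ** (-1.0 / p)
    plt.plot(r * np.cos(theta), r * np.sin(theta), label=f"p = {p}")

plt.gca().set_aspect("equal")  # keep the circle circular
plt.legend()
plt.title("Points with Lp norm equal to 1")
plt.show()
```

Note that for p < 1 the shape curves inward, i.e. it is not convex, which is why Lp with p < 1 is not a true norm (the triangle inequality fails).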

If you have worked with regularization techniques before, you will appreciate how the geometric properties of these shapes determine where the cost-function contours intersect the norm balls.

Figure from the book ‘The Elements of Statistical Learning’

Regularization techniques deserve a separate post and are beyond the intended scope of this one.

I hope you have gained some understanding of norms after reading this. Thanks for your time!
