Linearization and Gaussian

Ujjwal Saxena
Apr 3, 2018


For someone who is new to these topics, or who has read about them and still cannot find the relation between the two, I'll start the whole thing from scratch. Let's first understand the difference between linear and non-linear data.

Linear Data: There is nothing called "linear data". However, data can be linearly distributed. Linearly distributed data lies more or less along a line, and thus can be approximated using a straight line.

image source: wikipedia

Non Linear Data: Non-linearly distributed data does not lie along a straight line. Simple enough.

However, an important thing to understand about these distributions is that they represent a model. A model is nothing but a function that says: if you give me input "x", I'll give you output "y". If a change in x changes y linearly, it is a linear model or function. If it doesn't, it's non-linear.

For example:

y = 3x + 2, y = 3 and y = (3/4)x + 7 are linear, whereas y = sqrt(x² + y²), y = x⁵ and y = log(x) are non-linear.

Important: If you pick a linear function from above and generate 1000 random numbers to substitute for x one by one, you'll get a linear plot like the one displayed above. Similarly, if you pick a non-linear function you'll get something like the plot below, where the x axis represents the random numbers you generated as inputs to the function, and the y axis represents the function's outputs.
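As a minimal Python sketch of the idea above (the function names are my own, not from the article): evaluate one linear and one non-linear function from the examples on 1000 random inputs; plotting the resulting (x, y) pairs would give a straight line in the first case and a curve in the second.

```python
import math
import random

# One linear and one non-linear function from the examples above.
def linear(x):
    return 3 * x + 2          # y = 3x + 2

def nonlinear(x):
    return math.log(x)        # y = log(x), defined only for x > 0

# 1000 random inputs; (x, linear(x)) pairs lie on a straight line,
# while (x, nonlinear(x)) pairs trace a curve.
xs = [random.uniform(0.1, 10.0) for _ in range(1000)]
linear_pts = [(x, linear(x)) for x in xs]
nonlinear_pts = [(x, nonlinear(x)) for x in xs]
```

Feeding either list of points to any plotting library reproduces the two figures.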

image source: wyzant

But how do the real-world examples of RADAR and LIDAR fit into these categories? Both of these sensors use EM waves to perceive the world, and scan the surroundings in a similar way. The difference lies in how they take measurements. A LIDAR finds the location of an object directly in the x-y coordinate system. A RADAR, however, measures in polar coordinates: rho (range), phi (bearing) and rho-dot (range rate), where rho is given by rho = sqrt(x² + y²), a non-linear function.

So changing x or y directly changes the X and Y positions that a LIDAR perceives. But changing x and y changes rho through sqrt(x² + y²); it's just not linear in the x-y coordinate system. So can we convert a non-linear function to a linear one?
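The two measurement models can be sketched in a few lines of Python (a hedged illustration; the function names and signatures are my own):

```python
import math

def lidar_measurement(x, y):
    # LIDAR reports position directly in Cartesian coordinates:
    # the measurement is a linear function of (x, y).
    return (x, y)

def radar_measurement(x, y, vx, vy):
    # RADAR reports range rho, bearing phi and range rate rho_dot,
    # all non-linear functions of the Cartesian state (x, y, vx, vy).
    rho = math.sqrt(x**2 + y**2)
    phi = math.atan2(y, x)
    rho_dot = (x * vx + y * vy) / rho
    return (rho, phi, rho_dot)
```

Doubling x and y doubles the LIDAR measurement exactly, but rho, phi and rho_dot do not all scale that way, which is what "non-linear in the x-y coordinate system" means here.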

Yes, there are some ways to do so. One way is to approximate. Let’s visualize this.

image source: Wikipedia

Let's say we have a non-linear function and we want to approximate, or "linearize", it at a point. We draw a tangent at that point.

This is equivalent to taking the slope of the function, i.e. calculating the derivative of the function at that point.

Please keep in mind that this is just an approximation and just gives a fair estimate at and near that point.

Let's call the curve f(x) and the tangent L(x); L stands for Linear here. The equation of a line through the point (x1, y1) with slope m is y - y1 = m(x - x1). For the tangent at x = a, the point is (a, f(a)), so this can be written as L(x) - f(a) = m(x - a).

Here m is the slope or differentiation of f(x) at point a, represented by f ’(a).

so L(x) - f(a) = f ’(a)(x - a) => L(x) = f(a) + f ’(a)(x - a)

This is the first-order Taylor approximation. The full Taylor series is infinite and contains higher-order terms as well, but it's safe to ignore them when they are small. Also keep in mind that if we include higher-order terms we can approximate better, as the linearizing function L(x) can then bend and follow f(x) more closely.
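The formula L(x) = f(a) + f ’(a)(x - a) can be checked numerically. A minimal sketch (my own example, using f(x) = sqrt(x), one of the non-linear functions mentioned earlier): the tangent is accurate near the linearization point a and drifts away from f far from it.

```python
import math

def f(x):
    return math.sqrt(x)        # a non-linear function

def df(x):
    return 0.5 / math.sqrt(x)  # its exact derivative f'(x)

def L(x, a):
    # First-order Taylor approximation of f about the point a:
    # L(x) = f(a) + f'(a) * (x - a)
    return f(a) + df(a) * (x - a)

a = 4.0
near_err = abs(f(4.1) - L(4.1, a))  # tiny: 4.1 is close to a
far_err = abs(f(9.0) - L(9.0, a))   # large: 9.0 is far from a
```

This is exactly the "fair estimate at and near that point" caveat from above, made concrete.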

But how do we take this derivative if f(x) is a vector-valued function F of several variables?

This is where the Jacobian matrix comes into the picture. A Jacobian matrix contains the partial derivatives of F with respect to each input variable. For the brevity of the article I'll not get into its details.

This is the same approach that the Extended Kalman Filter follows. I mention it here to show the relation between Kalman filtering and linearization: in the Extended Kalman Filter we linearize the non-linear function about the mean of the state (just like "a" above). If you don't know about Kalman filters, don't worry about it.

But as I said earlier, linearization using a tangent gives a proper estimate only near the point about which the approximation is done. An Unscented Kalman Filter does not linearize the non-linear function about a single point. Instead, it takes several sample points around the mean, maps them through the non-linear function, and then fits a Gaussian to the mapped points. This generally gives a better approximation than the Extended Kalman Filter.
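A minimal one-dimensional sketch of that sample-and-refit idea (the unscented transform; my own simplified version, with kappa as an assumed tuning parameter): pick sigma points around the mean, push each through the non-linear function, then recompute a weighted mean and variance.

```python
import math

def unscented_transform_1d(mean, var, f, kappa=2.0):
    # 1-D unscented transform sketch: choose sigma points spread
    # around the mean, map each through f, then fit a Gaussian
    # (weighted mean and variance) to the mapped points.
    spread = math.sqrt((1 + kappa) * var)
    points = [mean, mean + spread, mean - spread]
    weights = [kappa / (1 + kappa),
               0.5 / (1 + kappa),
               0.5 / (1 + kappa)]  # weights sum to 1
    mapped = [f(p) for p in points]
    new_mean = sum(w * m for w, m in zip(weights, mapped))
    new_var = sum(w * (m - new_mean) ** 2 for w, m in zip(weights, mapped))
    return new_mean, new_var
```

For a linear function the transform is exact, which is a good sanity check; for a non-linear function it captures the curvature better than a single tangent would.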

But what is this Gaussian anyway?

Many distributions in nature are Gaussian. What do I mean by that? I'll show you with an example. Let's assume we have data about the heights of students in a class. Generally heights are normally distributed, which means that if there are 100 students in a class, many of them will be of roughly the same height, a few will be shorter and a few will be taller.

This is represented by a bell curve, which says that the probability of finding a student with a height around the class's mean height is the highest. With a small data set this is a discrete histogram; it approaches a continuous curve as the data set grows. If all 100 students had different heights, every height would be equally likely: the state of maximum confusion, where you can't tell which height is prevalent in the class. But if, out of 100, 50 students are 5 ft, 15 are 4.5 ft, 15 are 5.5 ft, 10 are 4 ft and 10 are 6 ft, we can say with much more certainty that a height of 5 ft is the most common. This is not the best example, but I hope it gives you an idea of what a Gaussian is.

Measurement noise often behaves in a similar way: a few very low values, a few very high values, and a majority of medium ones.

image source: Stack overflow

The probability density function of a large number of such random measurements is often Gaussian, and the area under it is always 1. We can find the probability that a value falls within a certain range by calculating the area under the curve within that range. A key property: a Gaussian passed through a linear function remains Gaussian, while a non-linear function distorts it. This is why it is necessary to linearize non-linear functions: so that the output can still be treated as a Gaussian and inferences can be drawn from its distribution. Multivariate Gaussians are often represented by contour lines.
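The area-under-the-curve idea can be computed directly with the Gaussian CDF (a short sketch using Python's standard-library error function; the helper names are my own):

```python
import math

def gaussian_cdf(x, mean, std):
    # Cumulative area under a Gaussian from -infinity to x.
    return 0.5 * (1 + math.erf((x - mean) / (std * math.sqrt(2))))

def prob_in_range(lo, hi, mean, std):
    # Probability of a value landing in [lo, hi] = area under
    # the bell curve between lo and hi.
    return gaussian_cdf(hi, mean, std) - gaussian_cdf(lo, mean, std)

# Total area is 1; roughly 68% of the mass lies within
# one standard deviation of the mean.
p_one_sigma = prob_in_range(-1.0, 1.0, 0.0, 1.0)
```

With the heights example, prob_in_range(4.75, 5.25, mean_height, std_height) would give the fraction of students expected around 5 ft.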

image source: Bayesian Decision Theory

The Gaussian distribution plays an important role in representing a measurement together with its uncertainty, and the related math underpins much of probabilistic AI.

For more articles please visit: https://wordpress.com/posts/erujjwalsaxena.wordpress.com



Ujjwal Saxena

A learner in AV development, DNNs and computer perception. I worked at Infosys earlier and now work at Nvidia as a test developer verifying AV features.