Regressing Regression
It has just been a couple of weeks at DS12, and we are still very unabashed about our geekery as we truck along. In our models class, one of three unique modules that DS12 Residents engage in weekly, Chris Mckinlay gave a very interesting lecture on linear regression.
Linear regression is the ur-model in data science whenever we want to predict real-valued outputs. Finding new angles to introduce the subject is a challenging, and perhaps thankless, exercise. The task is all the more challenging given that our talented DS12 students already have some background in machine learning. Ah well, Chris managed to find a fresh angle by starting, of all places, with the Moore-Penrose pseudoinverse.
For example, a prototypical regression problem would be to predict home prices based on square footage. This should not be an unreasonable task; after all, a link between area and home price is expected: usually, the bigger the square footage, the more expensive the home.

A regression model would posit that home price = constant_1 * square_footage + constant_2. Finding the constants reduces to solving a linear system Ax = y. Here, A is a matrix with n rows, one per home in the training set, and m feature columns. Our home-price example has two columns: one for the square footage and a column of ones that carries the intercept constant_2. And y is an n-dimensional vector holding the home price for each row of the training set.
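For concreteness, here is a rough sketch in NumPy of what that system looks like (the square footages and prices below are made up for illustration): each row of A holds one home's square footage plus a constant 1, so the intercept constant_2 can be learned along with the slope.

```python
import numpy as np

# Hypothetical training data: square footage and sale price for four homes.
square_footage = np.array([850.0, 1200.0, 1500.0, 2100.0])
prices = np.array([200_000.0, 270_000.0, 330_000.0, 450_000.0])

# Design matrix A: one column for square footage, one column of ones
# so that constant_2 (the intercept) can be learned too.
A = np.column_stack([square_footage, np.ones_like(square_footage)])
y = prices  # n-dimensional vector of home prices

print(A.shape)  # (4, 2): n = 4 rows, m = 2 feature columns
```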
We can think of matrices as linear maps that transform one vector into another. A transforms x into y if A*x = y; in other words, y lives in the image of A. Once we have such a map, we can ask for an inverse map that takes y back to x. We call this mapping the inverse of A: Inv(A)*y = x. The inverse of a matrix undoes the transformation by A, so Inv(A)*A*x = x.
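A quick numerical sanity check of that idea, using an arbitrary invertible 2-by-2 matrix:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])          # a square, invertible matrix
x = np.array([1.0, -2.0])

y = A @ x                           # A transforms x into y
A_inv = np.linalg.inv(A)            # Inv(A) undoes the transformation
print(np.allclose(A_inv @ y, x))    # True: Inv(A) * A * x == x
```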
Inverses don’t exist for all matrices; in particular, they never exist for non-square matrices. A matrix with more columns than rows must send some nonzero vectors to zero, meaning A*x = 0 for some x ≠ 0. Once a vector is mapped to zero, no transformation can recover x from the zero vector. The vectors that get mapped to zero are special enough to get a name, the kernel of A: if x is in the kernel of A, then A*x = 0. A matrix with more rows than columns fails for the opposite reason: its image does not cover the whole target space, so some vectors y have no x mapping to them at all.
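To see a kernel in the wild, here is a small sketch with a wide matrix (the entries are arbitrary): the last right-singular vector from the SVD spans the kernel.

```python
import numpy as np

# A wide matrix: 2 rows, 3 columns, so it must squash some direction to zero.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

# The right-singular vectors beyond the rank of A span its kernel.
_, s, Vt = np.linalg.svd(A)
kernel_vector = Vt[-1]                      # a direction A sends to zero

print(np.allclose(A @ kernel_vector, 0))    # True: this nonzero x satisfies A x = 0
```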
We can’t define an inverse for vectors that get mapped to zero, but we can still define an inverse mapping where it makes sense. It so happens that the target space decomposes into two parts: the image of A, and the directions orthogonal to it. If y lives in the image of A, there must have been an x such that A*x = y. The Moore-Penrose pseudoinverse of A inverts A on exactly those vectors: a vector in the image of A gets sent back to an x that maps onto it, a vector orthogonal to the image gets sent to zero, and a general y is split into those two pieces, with only the image piece inverted. The pseudoinverse does what an inverse is supposed to do, but only on the vectors for which it makes sense.
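Here is a small numerical illustration of that split, using NumPy's pinv on a toy tall matrix (chosen so its image is easy to see): a vector in the image of A gets mapped back to a preimage, while a vector orthogonal to the image gets sent to zero.

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])              # tall matrix: its image is the z = 0 plane in R^3

A_pinv = np.linalg.pinv(A)              # Moore-Penrose pseudoinverse

y_in_image = np.array([2.0, 3.0, 0.0])      # lies in the image of A
y_orthogonal = np.array([0.0, 0.0, 5.0])    # orthogonal to the image

print(A_pinv @ y_in_image)    # [2. 3.]  -> the x with A x = y
print(A_pinv @ y_orthogonal)  # [0. 0.]  -> the part outside the image is discarded
```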
What does this mean for our linear regression problem? Remember we are trying to solve a linear system Ax = y, i.e. find the vector x such that, for the given matrix A and vector y, A*x = y. You could solve this easily by taking an inverse, x = Inv(A)*y, but that only works if A is a square invertible matrix. Typically A is a tall, skinny matrix with many more rows than columns; our real estate data has one row per home and just two columns. And with noisy data, y rarely lies exactly in the image of A. So instead of finding Inv(A), we take the pseudoinverse of A: x = PseudoInverse(A)*y, which gives the best (least-squares) fit.
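Putting it together for the home-price system, a minimal sketch with the same made-up data as above; as a sanity check, the pseudoinverse answer matches NumPy's dedicated least-squares routine.

```python
import numpy as np

# Same hypothetical data as before.
square_footage = np.array([850.0, 1200.0, 1500.0, 2100.0])
prices = np.array([200_000.0, 270_000.0, 330_000.0, 450_000.0])
A = np.column_stack([square_footage, np.ones_like(square_footage)])

# Least-squares fit via the Moore-Penrose pseudoinverse.
x = np.linalg.pinv(A) @ prices
slope, intercept = x
print(f"price ≈ {slope:.1f} * square_footage + {intercept:.1f}")

# Sanity check: NumPy's least-squares routine gives the same constants.
x_lstsq, *_ = np.linalg.lstsq(A, prices, rcond=None)
print(np.allclose(x, x_lstsq))  # True
```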
And now that we have learned something about matrices, onwards to the state monad.