Solving systems of linear equations using matrices and Python

Denys Golotiuk
DataDenys
6 min read · Oct 19, 2022


Matrices sit at the very foundation of the math used for ML. Let's understand why that is, and how matrices can be used to solve systems of linear equations, from the perspective of two different methods.

Systems of linear equations in ML

The standard task that ML methods solve can be generally illustrated as follows:

We deal with a set of features: a "table" of feature values paired with target values. The target value is defined for a subset of the data, which we use for learning. Once learning is done, we can apply our findings to get results for data with unknown target values:

As we remember from math, a line function can be represented as:
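Using $a$ for the slope and $b$ for the intercept (standard notation; the article's original formula image is not shown):

$$f(x) = a \cdot x + b$$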

This is a single-dimension example; for 2 or more dimensions:
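With weights $a_1 \dots a_n$ and intercept $b$ (again, standard notation in place of the missing image):

$$f(x_1, \dots, x_n) = a_1 x_1 + a_2 x_2 + \dots + a_n x_n + b$$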

Now, if we treat x1, x2 and so on as our features and f as the target value, we can rewrite our initial learning dataset in the form of linear equations:

And now we basically only need to write that down as a system of linear equations and solve it:
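With $x_{ij}$ denoting the value of feature $j$ in training row $i$ and $f_i$ its target (this indexing is assumed here, since the original figure is not shown), the unknowns are the weights $a_j$:

$$\begin{cases} a_1 x_{11} + a_2 x_{12} + \dots + a_n x_{1n} = f_1 \\ a_1 x_{21} + a_2 x_{22} + \dots + a_n x_{2n} = f_2 \\ \dots \\ a_1 x_{m1} + a_2 x_{m2} + \dots + a_n x_{mn} = f_m \end{cases}$$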

After solving it, we get specific values of a1, a2 … (also called weights) that we will use in our linear function to find unknown target values (f) from known feature values (x). And this final function with calculated weights is what is called an ML model.

Real life note

As you may have noticed, this sounds too easy to be the basis of the powerful ML industry. In fact, this is the ideal case. In real life you never have enough data to build an ideal model, which is why this approach leaves you with either infinitely many solutions or no solutions at all. In practice, the best you can do is get a solution that is close to the ideal one. But, also in practice, this is more than enough in most cases.

Still, ML engineers deal with systems of linear equations, but on a different level. Instead of directly solving equations for the target value as a function of features, we solve equations that help us minimize the so-called error (the distance between the real target value and the one calculated from given weights). E.g., if we use MSE based on the Euclidean norm:
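A common form of MSE over $n$ samples, with $f_i$ the real target and $\hat{f}_i$ the calculated one (the article's original formula image is not shown; this is the standard definition):

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \bigl( f_i - \hat{f}_i \bigr)^2$$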

I will write more on linear regression and error function optimization later; for now, let's set that aside and return to equation systems and how matrices can be used to solve them.

Matrix form of linear equations system

So how are matrices related to systems of linear equations? Say we have the following system of linear equations to solve:

Now let's build a matrix of weights (those numbers before our x1…x3) and two vectors: X and the target values Y:

As we know, matrices can be multiplied, which is why writing the following in matrix (vector) form is the same as writing our initial system of linear equations:
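Spelled out, multiplying each row of w by the vector X reproduces the left side of one equation, so the whole system collapses into:

$$w \cdot X = Y$$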

Great, so now we know matrices are cool for less writing. But not only that: we can also use matrices to find solutions to our equations.

Finding solution using inverse matrix

Linear algebra actually lets us easily find the solution in matrix form:
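That is, multiplying both sides of $wX = Y$ by the inverse of $w$ on the left gives:

$$X = w^{-1} \cdot Y$$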

Here, this strange w with -1 is called the inverse matrix. It can easily be found with Python, as can the X vector, by multiplying the inverted matrix by Y:
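The original snippet was an image; here is a minimal reconstruction. The values of w and Y are assumed (the article does not show them), chosen so that the solution matches the output printed below:

```python
import numpy as np

# Assumed example system (original values not shown in the article),
# chosen so that the solution is x1 = 1, x2 = 2, x3 = 3:
#   4*x1 + 2*x2 + 2*x3 = 14
#    -x1 + 6*x2        = 11
#     x1 + 2*x2 +   x3 =  8
w = np.array([[ 4, 2, 2],
              [-1, 6, 0],
              [ 1, 2, 1]])
Y = np.array([[14], [11], [8]])

w_inv = np.linalg.inv(w)  # inverse of the weights matrix
X = w_inv @ Y             # X = w^-1 * Y
print(X)
```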

Here we've found w_inv using the np.linalg.inv method, and then multiplied the w_inv matrix by Y (which is a vector) to get the resulting X vector. This script gives us the following:

[[1.]
[2.]
[3.]]

Which is our calculated X vector: x1 = 1, x2 = 2, x3 = 3.

Gaussian elimination method

While the previous method is cool, it is hard on computers, as the complexity of computing the inverse matrix grows dramatically with the number of objects and dimensions in the matrix.

Another approach (used by a group of methods) is to transform the w matrix into a form where the solution of the equations becomes trivial. This form is called the triangular form of a matrix, and the method itself is called Gaussian elimination (because we actually eliminate some elements of the matrix).

The transformation is done in multiple iterations, and the following operations are allowed:

  • multiply any row by any number (except zero, of course),
  • add any row to any other row,
  • swap any rows.

Another thing the Gaussian method asks us to do is append our Y vector to our w matrix to get a so-called extended (augmented) matrix:

Now let's iterate to get the final triangular form. First, let's swap the first and last rows:

Now let’s subtract first row multiplied by 4 from third row:

Next, we’ll add first row to the second one:

Good, we need to do one last thing to get the triangular form: get 0 where we have -6. To achieve this, we multiply the second row by 6/8 and add it to the third row:

This is the final form of our matrix, since we have zeros under its diagonal (this is the triangular form we were searching for). As we remember, this matrix represents our w (applied to the X vector) with the Y vector on the right side, so we can write an updated system based on the new matrix form:

This is the trivial system form that lets us iterate from the third equation up to the first, using each lower equation's solution in the one above it:
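The whole procedure can be sketched in numpy. The extended matrix values are assumed (the article's figures are not shown), but they are chosen so that every step matches the text: swap the first and last rows, subtract the first row times 4 from the third, add the first row to the second, then multiply the second row by 6/8 and add it to the third, and finally back-substitute:

```python
import numpy as np

# Assumed extended matrix [w | Y] (original values not shown in the text)
m = np.array([[ 4.0, 2.0, 2.0, 14.0],
              [-1.0, 6.0, 0.0, 11.0],
              [ 1.0, 2.0, 1.0,  8.0]])

m[[0, 2]] = m[[2, 0]]   # swap first and last rows
m[2] -= 4 * m[0]        # subtract first row * 4 from third row -> -6 appears
m[1] += m[0]            # add first row to second row -> 8 appears
m[2] += (6 / 8) * m[1]  # multiply second row by 6/8, add to third -> zero under diagonal

# back-substitution from the third equation up to the first
x3 = m[2, 3] / m[2, 2]
x2 = (m[1, 3] - m[1, 2] * x3) / m[1, 1]
x1 = (m[0, 3] - m[0, 1] * x2 - m[0, 2] * x3) / m[0, 0]
print(x1, x2, x3)  # prints 1.0 2.0 3.0
```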

In Python, scipy can be used to get an optimized triangular matrix:
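A sketch using scipy's LU factorization with partial pivoting (scipy.linalg.lu), which is one way to obtain a triangular form; the matrix values are assumed, same as above:

```python
import numpy as np
from scipy.linalg import lu, solve_triangular

# Assumed system (original values not shown in the text)
w = np.array([[ 4.0, 2.0, 2.0],
              [-1.0, 6.0, 0.0],
              [ 1.0, 2.0, 1.0]])
Y = np.array([14.0, 11.0, 8.0])

# extended matrix [w | Y], factored as P @ L @ U with row pivoting
ext = np.hstack([w, Y.reshape(-1, 1)])
P, L, U = lu(ext)
print(U)  # upper-triangular form; row order differs from the manual steps

# back-substitution on the triangular system gives the same solution
X = solve_triangular(U[:, :3], U[:, 3])
print(X)
```

Since row swaps and row additions do not change the solution set, back-substituting on U yields the same X as our manual elimination.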

Note that scipy will return a different matrix than the one we iterated to. This is because scipy picks a different first row (it takes the row with the maximum first element). But the solution to the system using this matrix will be the same as the one we got. So a single matrix can have different triangular forms.

Summary

We can write down any system of linear equations in matrix form. From there, we can use matrix transformations to find the system's solution, for example using the inverse matrix:
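In matrix notation:

$$X = w^{-1} \cdot Y$$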
