Null space. The black hole of Math.

Mathphye
6 min read · Aug 29, 2022


Why do neural networks learn? What does the null space have to do with information? Why does a system of linear equations have no solution when the determinant is zero? A better understanding of the concept of the null space can help you answer these questions.

Modified from: Photo by Jeremy Perkins on Unsplash and Photo by Markus Spiske on Unsplash

My personal experience

Linear algebra is one of my favorite subjects. It is one of the most important branches of mathematics and has many applications: solving systems of linear equations, systems of differential equations, and more.

When I was taught the concept of the null space, it was part of a routine to make sure we covered all the course content. Teachers put a checkmark next to each concept, and you're good to go!

There is usually no interest in understanding what the null space is, or why it might be useful. We are usually only interested in doing mathematical exercises, without any sense of the reason behind them.

However, I enjoyed asking myself: what is this for, and why do we learn about something that never seems to get used?

But what is null space?

The null space is the set of all vectors that a given matrix sends to zero. In other words, it is the set of all solutions to a linear system of equations whose right-hand side is zero.

Put differently, every vector in the null space is mapped to zero by the matrix. This makes it an information killer: you can't recover information once it falls into the null space.
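As a quick illustration, here is a minimal numpy sketch (the matrix A is a made-up example with linearly dependent rows) that extracts a null-space basis from the SVD and verifies that those vectors are indeed mapped to zero:

```python
import numpy as np

# A made-up singular 3x3 matrix: the third row is the sum of the
# first two, so the rows are linearly dependent and det(A) == 0.
A = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [5., 7., 9.]])

# Right-singular vectors whose singular values are (near) zero
# span the null space of A.
_, s, Vt = np.linalg.svd(A)
null_basis = Vt[s < 1e-10]

for n in null_basis:
    print(A @ n)   # ~[0. 0. 0.]: A sends every null-space vector to zero
```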

General context

In mathematics, it is very common to use transformations, be they matrices, vectors, derivatives, or equations, to map from one domain to another. For example, the Laplace transform maps functions from the time domain to the complex frequency domain.

A matrix is no exception: what it does is make a change of "perspective", a change of reference, a change of basis, or a change of dimensions.

When the determinant of a matrix is different from zero, we have a reversible mapping. This means we can transform objects without losing information about them, since performing the inverse transformation gives us back the original object.

The problem comes when the determinant is equal to zero; many people stop as soon as they find this. The most common thing is to say "this cannot be done" or "this has no solution". But that is not quite right: when solutions exist, they are not unique. There are multiple, in fact infinitely many, but being infinite does not mean they are whatever you choose. The structure of those solutions is the null space.
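A small numeric sketch of both situations, with made-up matrices: when the determinant is nonzero the mapping can be undone exactly, and when it is zero the solutions of Ax = b (when b is reachable at all) form a whole family, one for every null-space vector you add:

```python
import numpy as np

# Invertible case: det != 0, so the transformation is reversible.
A = np.array([[2., 1.],
              [1., 3.]])
x = np.array([1., -2.])
print(np.linalg.inv(A) @ (A @ x))   # recovers [1, -2]: nothing was lost

# Singular case: det == 0, so there are infinitely many solutions.
B = np.array([[1., 2.],
              [2., 4.]])            # second row is twice the first
b = np.array([3., 6.])
x0 = np.array([3., 0.])             # one particular solution of B x = b
n = np.array([-2., 1.])             # spans the null space: B @ n == 0
for t in (0., 1., -5.):
    print(B @ (x0 + t * n))         # always [3, 6]: a whole line of solutions
```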

What is it for?

The desire is usually to find a single answer, not many, because one of them is more relevant than the others (optimization). But what if you want to work with an infinity of answers, or at least not stop just because the determinant is zero? Then you should know about the null space.

Technically

This is all about dot products and linear combinations, so let's briefly review them:

Linear combination: $w = c_1 v_1 + c_2 v_2 + \dots + c_n v_n$

A linear combination is the result of multiplying each vector in a list by a constant and then adding them, generating a new vector from the combination.
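In numpy terms, and with made-up vectors and constants, a linear combination looks like this; note that the matrix-vector product V @ c is exactly the same operation:

```python
import numpy as np

v1 = np.array([1., 0., 2.])
v2 = np.array([0., 1., -1.])
c1, c2 = 3., -2.

# Scale each vector by its constant, then add: a linear combination.
w = c1 * v1 + c2 * v2
print(w)                          # [ 3. -2.  8.]

# The same combination written as a matrix-vector product:
# the columns of V are v1 and v2.
V = np.column_stack([v1, v2])
print(V @ np.array([c1, c2]))     # [ 3. -2.  8.], identical
```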

Linear system as a linear combination: $Ax = x_1 a_1 + x_2 a_2 + \dots + x_n a_n = b$, where the $a_i$ are the columns of $A$

Matrix notation is mainly a notation for dot products and linear combinations. When the determinant is different from zero, the solution is found by applying the inverse of A on both sides: x = A⁻¹b. But if not:

Splitting $x = x_r + x_n$ into row-space and null-space components: $Ax = Ax_r + Ax_n = Ax_r + 0$. The null-space information is multiplied by zero.

For a vector x, its component in the null space is lost, while its component in the row space survives and determines the output, which lives in the column space.
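To ground this, here is a sketch (reusing the toy singular matrix from above) that splits a vector into its null-space and row-space components and shows that only the row-space part reaches the output:

```python
import numpy as np

B = np.array([[1., 2.],
              [2., 4.]])                 # singular, rank 1
n = np.array([-2., 1.]) / np.sqrt(5.)    # unit vector spanning the null space

x = np.array([4., 7.])
x_null = (x @ n) * n                     # component of x inside the null space
x_row = x - x_null                       # component in the row space

print(B @ x)       # [18. 36.]
print(B @ x_row)   # [18. 36.]: the row-space part alone gives the same output
print(B @ x_null)  # ~[0. 0.]: the null-space part is destroyed
```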

Generally, the null space is defined through the homogeneous equation:

Homogeneous equation of a linear system: $Ax = 0$

By linearity, this same null space describes the solutions of Ax = b: if x₀ is one solution and n is in the null space, then x₀ + n is a solution too. The null space is the set of vectors x whose linear combination with the columns of A results in the zero vector.

The reason the inverse of matrix A fails is that it would have to reverse the null space, which is mathematically impossible: the multiplication by zero has already deleted the information.
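This is visible numerically: asking numpy for the inverse of a singular matrix raises an error, because no inverse can rebuild the component that was multiplied by zero (same toy matrix as before):

```python
import numpy as np

B = np.array([[1., 2.],
              [2., 4.]])   # det(B) == 0

try:
    np.linalg.inv(B)
except np.linalg.LinAlgError as err:
    print(err)             # "Singular matrix": the lost information cannot come back
```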

However, the most important and relevant abstraction of the null space is what happens to the information.

Optimization

Since there are many valid answers, the possibility of choosing the best of them arises. This is known as optimization, and it depends heavily on the criterion that defines which answer is better than the others. You can't optimize without a comparison metric.

I may explain an interesting example of an application in a future post; basically, it ends in the use of the pseudoinverse. The relation with the null space is that the whole solution set is a translate of the null space, and the optimal solution is one element of that set.

The spoiler is that the optimal solution given by the pseudoinverse is the point of that solution set closest to the origin.
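Here is a sketch of that spoiler using numpy's pinv on the toy system from above. Among all solutions x0 + t·n, the pseudoinverse picks the one closest to the origin, which is also orthogonal to the null space:

```python
import numpy as np

B = np.array([[1., 2.],
              [2., 4.]])
b = np.array([3., 6.])

x_star = np.linalg.pinv(B) @ b        # minimum-norm solution
print(x_star)                         # [0.6 1.2]
print(B @ x_star)                     # [3. 6.]: it really solves B x = b
print(x_star @ np.array([-2., 1.]))   # 0: orthogonal to the null space

# Every other point of the solution set x0 + t*n is farther from the origin.
x0, n = np.array([3., 0.]), np.array([-2., 1.])
for t in (0., 0.5, 2.):
    print(np.linalg.norm(x0 + t * n) >= np.linalg.norm(x_star))   # always True
```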

What happens when the determinant is equal to zero?

If the determinant is different from zero, there is no nontrivial null space; it is 0-dimensional, containing only the zero vector.

When the determinant is equal to zero, we map a higher-dimensional space onto a lower-dimensional one. This means many inputs end up sharing the same output. So we should no longer expect the single solution we get when the determinant is different from zero; instead, we get an entire set of solutions.

What does this have to do with information?

So when the determinant is equal to zero, we know there is a null space. Since dimension decreases and some inputs share the same output, we can interpret this as a loss of information during the mapping or transformation.

This means the process is no longer reversible because of this loss of information. That looks bad, but it isn't necessarily: if you understand what happens to that information, you can redefine the matrix to control what gets lost.

Why do neural networks learn?

Why do neural networks "learn"? The short answer is that they practically memorize everything first, then forget the noise. But really, both happen at the same time.

Strictly speaking, this holds for isolated layers, since nonlinearities come into play at the activation functions, which provide a kind of abstraction into a new domain.

From the point of view of the null space, what a neural network does is increase the dimensions as much as possible, in such a way that the noise or unnecessary information lands inside the null space, where it is killed. Remember that the null space is practically an information killer.

"Perfect learning", or memorization, would be equivalent to a one-to-one mapping with no loss of information; this is the case where the determinant is different from zero. The other way around, with a determinant equal to zero, "junk information" can live in the null space without problems.

The idea is to keep the right information aligned with the basis of the row space, so it survives into the column space, and to delete the useless information or noise by keeping it in the basis of the null space.
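As a purely illustrative toy (my own construction, not how real networks are trained), here is a linear "layer" whose weight matrix is built so that a chosen noise direction lies in its null space; the signal passes through, the noise is annihilated:

```python
import numpy as np

signal = np.array([1., 2., 0.]) / np.sqrt(5.)  # direction worth keeping (unit vector)
noise = np.array([0., 0., 1.])                 # direction we want to forget

# A rank-deficient weight matrix: it projects onto the signal direction,
# so anything proportional to `noise` lives in its null space.
W = np.outer(signal, signal)

x_clean = 3. * signal
x_noisy = x_clean + 5. * noise

print(W @ x_clean)   # the signal survives
print(W @ x_noisy)   # identical output: the noise fell into the null space
```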

Conclusion

Now you should know the insights of the null space and be able to answer the questions from the beginning: Why do neural networks learn? What does the null space have to do with information? Why does a system of linear equations have no solution when the determinant is zero?

This can also be used as a mental tool for reinterpreting mappings, transformations, and information processing. We can see the learning process not only as memorizing information but also as ignoring some of it, throwing it into this black hole of math called the null space.
