The Pain in the Sparse
Gems in STEM: A Brief Discussion of Sparsity
Wouldn’t it be nice if this article were over in just a couple of sentences? It’d certainly be less work for both of us and free up some time to binge Netflix...I mean, read a book.
Unfortunately, no can do. There would definitely be a major loss of information—you’d probably have no idea what in the world we’re talking about, nor could you reconstruct the missing information with the tiny glimpse you got from a few words. No bueno. That means that this wouldn’t be a very good article (and that I’m not a very good writer), and we can’t have that.
Turns out, we have this same difficulty of trying to simplify information in the field of compressed sensing! We start out with some high-dimensional signal, like an audio clip or an image (which typically takes a lot of energy and storage to work with), and we want to compress it without losing too much of its quality or information when we reconstruct it. After all, a picture is worth 1000 words, and we don’t want that to become like 10.
Let’s first look at this standard compression example. We have some signal x (which is the cute jellyfish in this case) that we can represent as ℱS, where ℱ is a Fourier transform and S is a sparse vector, an array in which most of the elements are zero. You don’t have to worry too much about what a Fourier transform is; just know that it’s a nice way to decompose the image.
To compress the image x, we can Fourier transform it (to get the top right image), throw away a lot of the coefficients and only keep, say, 5% (to get to the bottom right image), and finally inverse transform it to get a pretty faithful image back (the bottom left image).
This standard compression is easy to do and is actually already built into your phones and computers!
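If you'd like to see those three steps in action, here's a minimal sketch in Python (assuming NumPy is installed), with a toy 1-D signal standing in for the jellyfish image: transform, keep only the biggest 5% of coefficients, and transform back.

```python
import numpy as np

# Toy stand-in for the jellyfish: a smooth 1-D signal sampled at 256 points.
n = 256
t = np.linspace(0, 1, n, endpoint=False)
x = np.cos(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 12 * t)

# Step 1: Fourier transform the signal.
X = np.fft.fft(x)

# Step 2: throw away coefficients, keeping only the largest 5%.
k = max(1, int(0.05 * n))
threshold = np.sort(np.abs(X))[-k]
X_kept = np.where(np.abs(X) >= threshold, X, 0)

# Step 3: inverse transform to get a faithful signal back.
x_rec = np.real(np.fft.ifft(X_kept))

rel_error = np.linalg.norm(x - x_rec) / np.linalg.norm(x)
print(f"kept {k} of {n} coefficients, relative error {rel_error:.2e}")
```

Because this toy signal happens to be genuinely sparse in the Fourier basis, 5% of the coefficients reconstruct it almost perfectly; for a real photo the error is small but not zero.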
But if we’re throwing away like 95% of the coefficients, why did we collect all that information in the first place?
What if we instead started with only a few measurements of the image?
Say we instead had some measurement that is only around 10% of the original image x. Let’s call this measurement y. Then, we can write:

y = Cx = CℱS
(Recall that x = ℱS, which is why y = Cx = CℱS.) Here, C is like a pixel mask that selects some sample of the pixels in x at random. It’s a random sample because if we only had measurements of some corner, we wouldn’t really be able to accurately infer the rest of the image.
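Here's a tiny sketch of what taking those measurements looks like (again with a toy 1-D signal standing in for the image; the 10% figure and the random seed are just illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# The full signal x (a toy 1-D stand-in for the jellyfish image).
n = 128
t = np.arange(n)
x = np.cos(2 * np.pi * 3 * t / n) + 0.5 * np.sin(2 * np.pi * 10 * t / n)

# C acts like a pixel mask: it keeps ~10% of the entries, chosen at random
# (a random spread, not one corner, so the whole image is represented).
m = int(0.10 * n)
rows = rng.choice(n, size=m, replace=False)

y = x[rows]  # the measurement y = Cx; this is all we store
print(f"measured {m} of {n} samples")
```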
So we have some measurement y (the first image on the top left), and we want to try to solve for a sparse vector S that is consistent with y. If we do this, then we can inverse transform to get back something close to the original image x.
We like sparse solutions because they can be compressed more easily (since most of the entries in the array are zero), meaning they need less storage, thereby making solving problems with them a lot more efficient!
Turns out there are actually infinitely many S that satisfy the equation y = CℱS, so the real question is: how do we find a sparse S to successfully compress the image?
To do this, we use something called L1-norms, known to promote sparse solutions!
You’re probably already familiar with something called the L2-norm, which is how you find the distance to a point in geometry. For example, if you have a point W = (w1, w2) in the xy-plane, the distance of that point from the origin is the square root of the sum of the squares of w1 and w2.
Or, to write it more clearly:

‖W‖₂ = √(w1² + w2²)
Now, recall that we have infinitely many solutions S that satisfy the linear equation y = CℱS. So, we can represent this as a line of possible solutions, shown as the red line below!
We’re now going to get slightly more technical, so try your best to follow along! It’s okay if this doesn’t make total sense, I’ll clarify the “moral of the story” at the end.
Suppose we wanted to find a specific solution S that had the smallest possible L2-norm. Well, each circle you see in the image has an equal L2-norm (which makes sense because every point on the circle is the same distance from the origin, called the radius). Then, we can make this circle bigger and bigger until it intersects the red line.
This point of intersection is the point that has the smallest, or minimum, L2-norm — voila!
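You can watch this happen in code with a tiny example (a sketch assuming NumPy; the particular line w1 + 2w2 = 4 is made up for illustration). The pseudoinverse hands you exactly the minimum-L2-norm point on the line:

```python
import numpy as np

# A made-up "red line" of solutions: one equation, two unknowns,
# w1 + 2*w2 = 4, so infinitely many points (w1, w2) satisfy it.
A = np.array([[1.0, 2.0]])
b = np.array([4.0])

# The pseudoinverse picks out the solution with the smallest L2-norm:
# geometrically, where the growing circle first touches the line.
w_l2 = np.linalg.pinv(A) @ b

print(w_l2)  # ≈ [0.8, 1.6]: on the line, but neither entry is zero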
But, remember, we don’t want the smallest L2-norm, we want to find the sparsest S.
So, what we would actually want to do is to minimize the L0-norm, which just counts the number of nonzero entries in a vector S. Thus, if we looked for the smallest L0-norm, we would be finding the vector with the fewest nonzero entries, which is exactly what it means to find the sparsest solution S!
Uh oh, stop right there. Houston, we have a problem. Unfortunately, the L0-norm is not technically a norm at all, and we can’t easily minimize it; this is actually a combinatorially hard (in fact, NP-hard) problem!
Turns out, a big advancement in computation was when people realized that since we can’t use the L0-norm to find sparsity, we can use the L1-norm as a good substitute instead!
Wait, what is the L1-norm?
Let’s take the same example as before. If we have a point W = (w1, w2) in the xy-plane, the L1-norm is the sum of the absolute values of w1 and w2.
Or, to write it mathematically:

‖W‖₁ = |w1| + |w2|
We can solve for the solution with the smallest L1-norm using a variety of optimization methods, and it usually gives us the sparse solution that we want!
Wait, hold up, why?
Well, every diamond you see in the image has an equal L1-norm (the same idea as the circles before). Then, when we grow this diamond, the first intersection of the diamond with the red line (aka the set of possible solutions) is usually at a sparse solution! For example, in this image, the intersection is at a point where w2 = 0, which is a sparse solution since one of the two components (w1, w2) is zero. The intuition behind this is that a diamond is spikier than a circle, meaning that it’s more likely than the circle to intersect the red line at a sparse solution.
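If you'd like to watch the diamond win, here's a sketch (assuming SciPy is available) that minimizes the L1-norm over the same made-up line w1 + 2w2 = 4 as the circle example, using a standard trick: split w into positive and negative parts so the problem becomes a linear program.

```python
import numpy as np
from scipy.optimize import linprog

# The same made-up line as the circle example: w1 + 2*w2 = 4.
A = np.array([[1.0, 2.0]])
b = np.array([4.0])
n = A.shape[1]

# Standard trick: write w = p - q with p, q >= 0, so |w_i| = p_i + q_i
# at the optimum. Minimizing the L1-norm then becomes a linear program.
c = np.ones(2 * n)                # objective: sum(p) + sum(q) = ||w||_1
A_eq = np.hstack([A, -A])         # constraint: A(p - q) = b
res = linprog(c, A_eq=A_eq, b_eq=b, bounds=[(0, None)] * (2 * n))
w_l1 = res.x[:n] - res.x[n:]

print(w_l1)  # ≈ [0, 2]: the diamond's corner lands on a sparse solution
```

(For this particular line the sparse corner has w1 = 0 rather than w2 = 0; which component vanishes depends on the line's slope.)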
The idea of the L1-norm promoting sparse solutions generalizes even better in higher dimensions! This is because these diamond norms get pointier and spikier, so the norm has a better chance of hitting sparse solutions.
It’s kind of like how you would have a higher chance of being poked in the hand by a hedgehog than a turtle.
Moral of the story: if we want to find a sparse solution S, we should minimize the L1-norm.
To do this, we write our problem as something called a convex optimization problem, where we’re minimizing the L1-norm of S subject to some constraints. (Convex optimization problems are nice and typically easier to solve because the property of convexity lends itself to several powerful theoretical tools in computation.)
If you’re curious, the optimization problem for our image compression equation is written as something like this:

minimize ‖S‖₁ subject to CℱS = y
To find yourself a solution, you could easily code this up (with some additional conditions for the variables).
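Here's one way that code might look (assuming NumPy and SciPy; the DCT basis, sizes, and sparsity pattern are toy choices standing in for the Fourier transform and the jellyfish): measure a few random samples, then solve the L1 problem as a linear program, just like before.

```python
import numpy as np
from scipy.fft import idct
from scipy.optimize import linprog

rng = np.random.default_rng(1)

# Toy setup: a length-64 signal that is 3-sparse in the DCT basis
# (a real-valued stand-in for the Fourier basis in the article).
n = 64
S_true = np.zeros(n)
S_true[[2, 7, 15]] = [1.5, -1.0, 0.8]
Psi = idct(np.eye(n), axis=0, norm="ortho")   # x = Psi @ S
x = Psi @ S_true

# Measure only m random samples: y = Cx = (C Psi) S.
m = 24
rows = rng.choice(n, size=m, replace=False)
Theta = Psi[rows, :]
y = x[rows]

# Minimize ||S||_1 subject to Theta @ S = y, as a linear program
# (same split S = p - q with p, q >= 0 as before).
c = np.ones(2 * n)
A_eq = np.hstack([Theta, -Theta])
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * n))
S_rec = res.x[:n] - res.x[n:]
x_rec = Psi @ S_rec

print("reconstruction error:", np.linalg.norm(x - x_rec))
```

With enough random measurements relative to the sparsity, the recovered S typically matches the true sparse vector almost exactly; shrink m and the reconstruction starts to degrade.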
Ta-da! We have now figured out how to compress an image with less work but not too much information loss. I’d call that a success, even if it did take us slightly more than a couple sentences to get there. ❤
If you’re curious, I did research on a similar problem of finding sparse solutions at the Research Science Institute this summer, under the direction of Mo Chen, a graduate student at MIT, and Professor Steven Johnson. You can find my paper here!
Until next time! If you found this interesting, make sure to follow me to stay updated on the next ones.
In the meantime, check out other articles in my column here! If you have any questions or comments, please email me at email@example.com.
To be the first one to hear about all my new articles, recent events, and latest projects, make sure to subscribe to my newsletter: Letter? I Hardly Know Her!
As a reminder: this column, Gems in STEM, is a place to learn about various STEM topics that I find exciting, and that I hope will excite you too! It will always be written to be fairly accessible, so you don’t have to worry about not having background knowledge. However, it does occasionally get more advanced towards the end. Thanks for reading!