Compression and Loss

I recently spent a week trying to compress data, which is a new thing for me. I had this little Arduino audio project I wanted to build, and the audio transmission from the device was too slow. The only solution I could think of was to crunch the data, so that there would be less of it to send. As noted, I’m not a professional compression algorithmer (if that is indeed a thing), nor even a full-time engineer, and I’ve really never written C++ before, so to call the process irritating makes a vice of moderation. The fact is that I wanted to break things and do crimes. I wanted to flog C++ in a public square while a glass-eyed midget yelled in its face, “How’s your illegal type conversion now, muthafucka?”

But as irritation turned into rage, so rage subsided into knowledge, and knowledge yielded my desired 50% compression ratio, and finally, at the end, I realized that thinking incessantly about compression had managed to catalyze some thoughts about the basic construction of the universe. And if that’s not a sufficiently grandiose lede for you, well, then I pity your friends and family.

Here’s a basic idea about compression: there is lossy compression, where some of the data gets left behind, estimated, thrown out, and there is lossless compression, where the end product exactly matches what you started out with before you crunched it. For my project, I needed to do the latter.

Here’s a basic idea about lossless compression: some data is very compressible and some data is not compressible at all. In general, this corresponds to entropy. If the data exhibits structure and order, you can make use of that to predict or make shorthand for what comes next. Otherwise, tough luck. In other words, compression is about finding patterns. If you find a pattern, you can merely send a description of the pattern, which is often shorter than the expression of the data itself.
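To put a rough number on "structure and order," here is a minimal Python sketch (Python for readability, not the C++ from the project, and not the code used there). It estimates the Shannon entropy of a byte stream from its histogram alone. The function name `byte_entropy` and the sample inputs are made up for illustration.

```python
import math
import os
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte histogram, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # Sum of p * log2(1/p) over every byte value that actually occurs.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


constant = bytes([60]) * 1000        # one value repeated: fully predictable
two_values = bytes([0, 255]) * 500   # two values, each seen half the time
random_ish = os.urandom(100_000)     # operating-system randomness

print(byte_entropy(constant))    # 0.0
print(byte_entropy(two_values))  # 1.0
print(byte_entropy(random_ish))  # close to 8.0
```

This estimate only counts how often each value appears and ignores the order the values come in. The alternating 0, 255 stream scores one bit per byte, yet its pattern can be described in a sentence, which is exactly the kind of structure a pattern-finding compressor exploits.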

The most obvious example is the case where the data is all the same. Let’s say I have a piece of music that is simply playing a C chord over and over at a fixed time interval for two minutes. Well, boring song, but easy to transmit. In fact, I just transmitted it to you in the prior sentence. Done.
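The usual name for this trick is run-length encoding. A minimal Python sketch follows, assuming (for illustration only) one chord per second, so two minutes is 120 entries. The function names are hypothetical.

```python
def rle_encode(values):
    """Collapse consecutive repeats into [value, count] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs


def rle_decode(runs):
    """Expand [value, count] pairs back into the original sequence."""
    out = []
    for value, count in runs:
        out.extend([value] * count)
    return out


song = ["C"] * 120              # hypothetical: one C chord per second for two minutes
encoded = rle_encode(song)
print(encoded)                  # [['C', 120]]
assert rle_decode(encoded) == song
```

The whole song collapses to a single pair, and decoding gives back exactly what went in, so nothing is lost.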

Another common technique in compression is to look at the change from one data point to the next. If the step is repeated for a period of time, I can send that pattern. Say the song goes up one step every second for five steps, then goes back down every second for five steps, and it keeps doing this for two minutes. Again, bad song, but easily compressed.
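Looking at the change from one point to the next is usually called delta encoding. Here is a small Python sketch of the up-and-down song, assuming the notes are written as step numbers 0 through 5, one per second. All names are illustrative.

```python
from itertools import groupby


def delta_encode(values):
    """Keep the first value, then store each step to the next value."""
    return values[:1] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Rebuild the values by adding the steps back up."""
    out = []
    for d in deltas:
        out.append(d if not out else out[-1] + d)
    return out


def runs(values):
    """Run-length summary: (value, repeat count) pairs."""
    return [(v, len(list(g))) for v, g in groupby(values)]


# Hypothetical song: up one step a second for five steps, then back down, repeated.
cycle = [0, 1, 2, 3, 4, 5, 4, 3, 2, 1]
song = cycle * 12                      # 120 notes, one per second

deltas = delta_encode(song)
print(runs(deltas)[:4])                # [(0, 1), (1, 5), (-1, 5), (1, 5)]
print(len(song), len(runs(deltas)))    # 120 25
assert delta_decode(deltas) == song
```

The raw notes never repeat back to back, so run-length encoding alone would gain nothing here. The steps between them do repeat, and 120 notes shrink to 25 runs.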

A more interesting song will have more variable changes, so it’s harder to compress. In my case, I was dealing with human speech, and I was recording 8000 amplitude values every second. These values were between 0 and 255. In other words, each one was one byte of data. If we look at the change between one byte and the next, we’ll see that much of the data moves by rather small amounts, usually in a wave-like pattern that ends up making the somewhat melodic sounds of the human voice. So one thing you can do is look for runs of bytes where the change is in the 0–7 range, and represent that change in three bits instead of the full eight bits of that byte. Add in one bit for positive or negative change and you can cut the size of the data in half.
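The sign-plus-three-bits idea can be sketched in Python as follows. This is an illustration under stated assumptions, not the encoder from the project: the essay does not say how runs are framed or how larger jumps are stored, so this version handles a single run, keeps the first sample as a raw byte, packs two steps per byte, and simply gives up (returns `None`) when a step is bigger than 7. The function names are hypothetical.

```python
def pack_small_deltas(samples: bytes):
    """Pack a run of 8-bit samples as 1 raw byte plus 4 bits per following sample.

    Each 4-bit nibble is a sign bit followed by a 3-bit magnitude (0-7).
    Returns None if any step is too large; a real encoder would need some
    other representation (for example raw bytes) for that stretch.
    """
    if not samples:
        return b""
    nibbles = []
    for prev, cur in zip(samples, samples[1:]):
        step = cur - prev
        if abs(step) > 7:
            return None
        nibbles.append((0x8 if step < 0 else 0x0) | abs(step))
    if len(nibbles) % 2:
        nibbles.append(0)                       # pad to a whole byte
    packed = bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))
    return samples[:1] + packed


def unpack_small_deltas(data: bytes, count: int) -> bytes:
    """Rebuild `count` samples from the output of pack_small_deltas."""
    if count == 0:
        return b""
    out = [data[0]]
    for byte in data[1:]:
        for nibble in (byte >> 4, byte & 0x0F):
            if len(out) == count:
                break                           # ignore the padding nibble
            magnitude = nibble & 0x7
            out.append(out[-1] - magnitude if nibble & 0x8 else out[-1] + magnitude)
    return bytes(out)


wave = bytes([128, 131, 135, 138, 140, 139, 136, 131, 127])   # gentle rise and fall
packed = pack_small_deltas(wave)
print(len(wave), len(packed))                                 # 9 5
assert unpack_small_deltas(packed, len(wave)) == wave
print(pack_small_deltas(bytes([128, 200])))                   # None: the jump is too big
```

Nine samples become five bytes, and the ratio approaches one half as the run gets longer. The decoder needs to be told the sample count, since an odd number of steps leaves a padding nibble at the end.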

Now, the simple patterns offer the best compression, because the description of the problem takes the least amount of space. As the pattern gets more complex, the description takes more and more space until, eventually, it’s fatter than the problem it was trying to shrink.

Anyway, long story short, you always end up with a bunch of data that doesn’t really compress, and this data is often the most important part: in my case, the part where the speaker is most active and expressive and, so, creating the most information. The part before she starts speaking has little or no change; the part where she’s lingering over an “uh” may look very consistent. But lots of words and enunciation make a lot of scatter in the values. Some of this can be estimated, but again, I wanted lossless compression, something that would decompress into exactly what was recorded.

But wait a minute, you might say, isn’t the low entropy data, the stuff with lots of information, supposed to be compressible while the random stuff is unpredictable and incompressible? How is it that at a certain level of information complexity, the data becomes unpredictable and unpatterned again?

It turns out we have two different categories of stuff that’s hard to compress, what we might call (abusing the terms only slightly) signal and noise. Noise tends to be stochastic; that is, it results from chaotic inputs that create disordered things like static. On the other hand, we distinguish signal as precisely the part that is not random: the communication, the part intended by the sender. Neither ends up being completely predictable, but for very different reasons.

Noise is unpredictable because the kinds of problems that generate it are very, very sensitive to input conditions. The input conditions are themselves the result of problems very sensitive to input conditions. The movement of air particles next to a microphone. The induction of various electromagnetic waves bouncing around the room. And so on. Lots of problems are like this: predicting the weather a year from now, or predicting the next number a roulette wheel will stop on. You might say that prediction is an attempt to do compression on history. And much like my problem with human speech, parts of it are compressible, parts are not. Orbit of planets, sure; weather, meh; roulette wheel, not so much. But for my purposes, the good thing about noise is that if you can recognize it, you can throw it out. Lossy is okay for that. Certain frequency ranges, for instance, are not going to be human voice, so you can get rid of that. Or if you see a bunch of data that’s close to zero, you can probably flatten it all to zero, because no one is talking yet.
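Flattening the quiet stretches might look like the Python sketch below. It is a lossy step and rests on assumptions the essay does not spell out: samples are treated as signed values centered on zero, and the `threshold` and `min_run` values are invented for the example. The function name is hypothetical too.

```python
def flatten_quiet_stretches(samples, threshold=3, min_run=4):
    """Lossy: zero out stretches of at least `min_run` samples that all stay
    within +/- `threshold` of zero. Shorter quiet moments are left alone."""
    out = list(samples)
    start = None
    for i, s in enumerate(list(samples) + [None]):   # sentinel closes the final run
        quiet = s is not None and abs(s) <= threshold
        if quiet and start is None:
            start = i
        elif not quiet and start is not None:
            if i - start >= min_run:
                out[start:i] = [0] * (i - start)
            start = None
    return out


samples = [1, -2, 0, 2, -1, 40, -35, 2, 28, -1, 1, 0, -2]
print(flatten_quiet_stretches(samples))
# [0, 0, 0, 0, 0, 40, -35, 2, 28, 0, 0, 0, 0]
```

The minimum run length is there because speech crosses zero all the time. A single small value between two loud ones is part of the waveform, not silence, so it is kept. The flattened stretches are then long runs of identical values, which compress very well.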

Signal becomes hard to compress for very different reasons. Namely, you just never know what a person will say next. Or as we used to say in my youth: hey, it’s a free country. And because a person can say whatever they want, they often do, and that tends to distribute values in a way that is hard to summarize with a formula. Composition — the general descriptor for this particular behavior — presents a quite distinct data set from stochastics.

The difference, of course, goes back to entropy. While noise is the disorder generated by highly derivative lawful systems, signal is the order generated by living things. Living things are able to do this, because living things are able to do stuff, period. They get knocked down, but they get up again. Piles of bricks do not. Living things, by definition, have this quality of agency, which is precisely the capacity to reduce entropy.

Now some people will argue that this is illusory, that our experience of freedom as we go through the day deciding what to eat or what to wear or what to say into a microphone is really just a higher order complexity problem, indistinct from weather prediction or the roulette wheel. This idea is probably dismissible on a purely pragmatic basis: there can be no sadder charade than deterministic creatures arguing passionately to convince another determined creature that it has no choice in the matter. But the more interesting discussion lies in the science.

The idea that all things can be compressed is appealing to us, but science rarely bears this out. However much we may yearn for simplicity, the world tends to make things complicated and messy.

If you think back to stochastic randomness like the roulette wheel, you’ll see that this derives from the very fine resolution of the deterministic inputs of matter. This is loosely related to a mathematics problem called P versus NP. There’s a very good Wikipedia article for those so inclined, but a cheap and easy explanation is this: there are certain problems one can solve in a reasonable amount of time for a given data set. There are other problems that are extremely difficult to solve, but that are easy to verify, if someone says they know an answer. Imagine a case where trial and error is an easy way to proceed, but the number of potential trials is just friggin exhausting: say you have a big set of numbers and you want to know if any subset of them adds up to seventeen. As far as anyone knows, in the worst case you basically just have to run through all the possible cases. The problem is not compressible. However, if someone says, “The answer is yes, because the set contains 10, 3, and 4, and they add up to 17,” then it becomes trivially easy to check the set for those numbers and verify that they add up to 17.
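The gap between searching and checking shows up in a few lines of Python. This is a sketch with a made-up set of numbers. The search is plain brute force; cleverer methods exist for special cases, but none is known to be fast in general.

```python
from itertools import combinations


def find_subset(numbers, target):
    """Brute force: try every non-empty subset (2**n - 1 of them for n numbers)."""
    for size in range(1, len(numbers) + 1):
        for subset in combinations(numbers, size):
            if sum(subset) == target:
                return subset
    return None


def verify_subset(numbers, target, claimed):
    """Checking a claim is quick: confirm the numbers are in the set, then add them up."""
    pool = list(numbers)
    for x in claimed:
        if x not in pool:
            return False
        pool.remove(x)              # each element may be used only once
    return len(claimed) > 0 and sum(claimed) == target


numbers = [10, 25, 3, 8, 4, 31]     # hypothetical set
print(verify_subset(numbers, 17, (10, 3, 4)))   # True
print(find_subset(numbers, 17))                 # (10, 3, 4)
```

With six numbers there are only 63 subsets to try. Each extra number doubles that, so a set of 60 numbers already has more than a billion billion subsets, while checking a claimed answer still takes a handful of steps.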

A related but even harder problem is the halting problem, which is not in NP at all: it is undecidable. There turns out to be no sure way to determine if a computer program will come to a conclusion or will keep calculating forever. I mean, sometimes it’s really obvious just by looking at a program that it just does one thing and then stops. Others, like the main menu loop of a DVD, obviously have designs on eternity. But there’s no general way to take a program and an input to it and determine if it goes on forever or stops. The running of the program can’t always be compressed. You may just have to live through letting it run. Most of history is like this.

Mathematicians want to know if the set of P problems can be proved to be equal or unequal to NP. That is, if a problem’s solutions can be easily verified does that mean a reasonably quick method exists for solving the problem itself (if we just figured out what it is)? Most of them say no, but they want a mathematical proof, and there’s a million bucks in it if you can supply one.

I find it sort of remarkable that even in something like math, where we expect cleanliness and order, we have very basic problems where the solution defies simplification.

You’ll find the same problem rattling around various other aspects of nature. One obvious one is the tension between genotype and phenotype. We like to think of our genes as basically the compressed version of us. You know, just add water or something. Of course identical twins know this is not the case. There is no completely determinative encoding of the lives of organisms. Like the roulette ball or the weather next June, the degree of complexity in the development of organisms — how their endless interactions with the world dance with their genetics — is truly mind-boggling.

Even more than that one-way challenge of predicting lives and characteristics from genetic code, however, the whole notion of heritable traits has been evolving (no pun intended) in the past couple of decades. Recent evidence suggests that the experience of organisms — those myriad interactions with external reality — has not merely a critical impact on the expression of genes in their own lifetimes, but even transmits characteristics across generations. The sins of the parents are, to some degree, visited on the children.

In any case, your body, like the roulette ball, is on a path through time-space that is nigh impossible to predict. You’re shaped as much by the particular way you bounce off the nooks and corners of the world as by any tidy code. Sorry if you were hoping to be reconstituted from a genome file some hundreds of years hence. The Franken-you that results may look like you, but you’ll still be dead.

One is tempted to say that most of the problem of figuring out what’s going to happen next — in our lives, on the roulette table, with a gestating infant — is a measurement problem. We just can’t really get the precision we need. But there was a great scientist in the 19th century named CS Peirce who addressed this question — dice, rather than roulette, to be precise — and suggested that the problem is not what we might think. He had noted, as an experimental scientist, that the problem of measurement was not merely one of precision, but that when one achieved higher precision, it tended to come at the cost of higher deviation from expected values. The range of error was wider. You may call this experimenter error, Peirce said, but there is no real way around it. All results arise from imperfect circumstances. And so, he suggested, one might as well admit that the world has a real quality called luck. Peirce summed up the problem this way: the world deviates from the law all the time in minute ways, and in large ways very rarely. But it does deviate.

You may start to see a trajectory in that statement, one that finds its target some decades later in Werner Heisenberg’s uncertainty principle. The more precisely you pin down one quantity, such as a particle’s position, the less precisely its partner quantity, momentum, can be known; push the first toward perfect precision and the uncertainty in the second grows without bound. Heisenberg summed up the problem this way: “in the question of whether, if we knew the exact state of all the particles in the universe, we could know its entire future, it is the premise, rather than the conclusion that is false.” We can articulate laws well enough, but the world gets to decide things about itself without regard to them. And it is this characteristic that makes it largely incompressible.

To choose a very simple example: there is nothing in the whole history of the universe that is going to allow you to determine exactly when an individual uranium atom will decay. You can’t do extra credit research and get closer. You can’t introspect its state and determine the date.

It’s not just that we can’t measure the dice precisely enough: it’s not even clear that they have a position at the degree of precision we need them to in order to exhaust the prediction problem.

But this still doesn’t address whether perhaps “signal” is just a highly evolved type of “noise.” Certainly, one must consider that the continuity of complex systems (physics yielding chemistry yielding biology yielding culture) means that we must be part of the same process. But I think the conclusion is not that human freedom is a form of determinism obfuscated by over-complication, but that freedom is a much more fundamental characteristic of nature than we have previously thought. That is, even things we think of as simple are exhibiting freedom all the time. This certainly was Peirce’s take on experimenter error, and it is also a conclusion shared by the mathematicians John Conway and Simon Kochen in their 2009 paper “The Strong Free Will Theorem.”

Instead of viewing signal as merely highly evolved noise, we ought to reverse, and see the noise as meaningful, but in a foreign tongue. This is less a piece of advocacy for astrology (although I do endorse the practice of gazing at the night sky in remote places, just don’t expect it to help with your stock picks), than it is a paean to the mundane, and a challenge to take delight in its strange liberties and prerogatives. You may be able to learn a lot more stopped at an ill-timed red light or hunting for an encoder bug than you might otherwise think. The world is organized in a fundamentally mysterious way and trying to compress it into something pat is a very risky business.

We may not be able to explain the full meaning of freedom, since by definition it lies outside the descriptive power of science, but it is a fundamental part of our experience, an experience that cannot be perfectly compressed, nor perfectly reduced to laws — at least not without discarding massive chunks of it; and as I said, for my project, like the project of life, I require a strategy that is lossless.

Like what you read? Give James Albrecht a round of applause.
