Objectives

Loonar Technologies
Loonar Technologies

--

  • Learn the principles behind data compression
  • Understand how numbers can be represented in a computer

Introduction

The Loonar high-altitude balloon kits can send down custom data, but not a huge amount of it. The pipe of satellite or radio communications isn’t that big, which means you can’t send that much data through it. Luckily, there’s a way to make the amount of data smaller and get more information back from near space: compression. Uncompressed the entire works of Shakespeare are 5.5MB, but compressed, they’re only 2MB.

Ask your students to brainstorm ideas about how it’s possible to shrink the size of Shakespeare’s works.

Excercise

The Loonar flight computer sends down the current altitude with each message. What if, instead of just the current altitude, it sent down the last 50 altitudes? Assuming an ascent rate of 5 m/s, and logging the altitude every 50 ms, you might have an array of altitudes that looked like:

float altitudes[50] = [0, 0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2, 2.25, 2.5, 2.75, 3, 3.25, 3.5, 3.75, 4, 4.25, 4.5, 4.75, 5, 5.25, 5.5, 5.75, 6, 6.25, 6.5, 6.75, 7, 7.25, 7.5, 7.75, 8, 8.25, 8.5, 8.75, 9, 9.25, 9.5, 9.75, 10, 10.25, 10.5, 10.75, 11, 11.25, 11.5, 11.75, 12, 12.25]

Let’s think about the different ways we could send this.

As a comma-separated string

The most simple option is to represent this as a string, separated by commas.

"0,0.25,0.5,0.75,1,1.25,1.5,1.75,2,2.25,2.5,2.75,3,3.25,3.5,3.75,4,4.25,4.5,4.75,5,5.25,5.5,5.75,6,6.25,6.5,6.75,7,7.25,7.5,7.75,8,8.25,8.5,8.75,9,9.25,9.5,9.75,10,10.25,10.5,10.75,11,11.25,11.5,11.75,12,12.25"

This weights in at 208 bytes. Even worse, if the altitudes were more irregular, it would take even more space: “5.26725” uses twice as many bytes as “5.25”. That’s because, under the hood, a computer writes a string as a series of bytes. A byte is a number between 0 and 255 (that’s 28–1), represented in binary. Normal characters are represented with one byte each: the letter “A”, for example has a value of 65. In binary, that’s 01000001. That’s the reason why “5.26725” uses twice as room “5.25”. Let’s try something else to shrink the size of our data.

Representing the numbers as bytes

Strings aren’t the only way to store numbers: your computer stores them as binary directly. For integers, that’s easy: 1 is stored as 00000000 00000000 00000000 00000001, 2 is stored as 00000000 00000000 00000000 00000010, etc. For normal integers, your computer allocates 32 bits or 4 bytes, which gives a maximum value of 232–1, or 4,294,967,295. If allows negative numbers, 231–1, or 2,147,483,647, since one bit is dedicated to saying whether or not the number is negative.

Non-integers, or floats, are represented using the binary equivalent of scientific notation. Where in scientific notation 48.5 would be written as 4.85 x 101, your computer will write it as 25 x (1 + 2–1 + 2–6). Computers write this in binary using the IEEE 754 standard. One bit is given to whether or not it’s negative, eight bits are given to the exponent, and twenty three are given to the “mantissa”, the equivalent of scientific notation’s 4.85. You can have your students experiment with how these numbers are stored at https://www.h-schmidt.net/FloatConverter/IEEE754.html. Luckily, we don’t need to worry about the details too much, because we can convert directly to a character array.

float altitude = 48.5;

char *s = (char *) &altitude;

By converting each altitude to its byte equivalent, we get down to 200 bytes:

0x0 0x0 0x0 0x0 0x0 0x0 0x80 0x3e 0x0 0x0 0x0 0x3f 0x0 0x0 0x40 0x3f 0x0 0x0 0x80 0x3f 0x0 0x0 0xa0 0x3f 0x0 0x0 0xc0 0x3f 0x0 0x0 0xe0 0x3f 0x0 0x0 0x0 0x40 0x0 0x0 0x10 0x40 0x0 0x0 0x20 0x40 0x0 0x0 0x30 0x40 0x0 0x0 0x40 0x40 0x0 0x0 0x50 0x40 0x0 0x0 0x60 0x40 0x0 0x0 0x70 0x40 0x0 0x0 0x80 0x40 0x0 0x0 0x88 0x40 0x0 0x0 0x90 0x40 0x0 0x0 0x98 0x40 0x0 0x0 0xa0 0x40 0x0 0x0 0xa8 0x40 0x0 0x0 0xb0 0x40 0x0 0x0 0xb8 0x40 0x0 0x0 0xc0 0x40 0x0 0x0 0xc8 0x40 0x0 0x0 0xd0 0x40 0x0 0x0 0xd8 0x40 0x0 0x0 0xe0 0x40 0x0 0x0 0xe8 0x40 0x0 0x0 0xf0 0x40 0x0 0x0 0xf8 0x40 0x0 0x0 0x0 0x41 0x0 0x0 0x4 0x41 0x0 0x0 0x8 0x41 0x0 0x0 0xc 0x41 0x0 0x0 0x10 0x41 0x0 0x0 0x14 0x41 0x0 0x0 0x18 0x41 0x0 0x0 0x1c 0x41 0x0 0x0 0x20 0x41 0x0 0x0 0x24 0x41 0x0 0x0 0x28 0x41 0x0 0x0 0x2c 0x41 0x0 0x0 0x30 0x41 0x0 0x0 0x34 0x41 0x0 0x0 0x38 0x41 0x0 0x0 0x3c 0x41 0x0 0x0 0x40 0x41 0x0 0x0 0x44 0x41

Compression

We can do a lot better by understanding the data we’re working with. These are altitudes, and so we can assume they’ll be between 0 and 30,000 meters. The altitudes also don’t have that much precision. By using these properties, we can chop the range into 216 little buckets, which will cut the amount of data we have to send in half. The first bucket will represent altitudes from 0m to 0.5m, the next 0.5m to 1m, the next 1m to 1.5m, all the way up to 32,767m to 32,767.5m.

How do we represent this and send it? Well, first we need a function that converts an altitude into its bucket number. 0.2m should go to bucket 0, 0.7m should go to bucket 1, 1.2m should go to bucket 2, etc. We can do this by multiplying by two (since there are two buckets for every meter) and rounding it.

int bucket = (int)(altitude * 2);

Then we need to convert this into a byte array with length two. We can do this with “masking”: taking only the bits in the first 8 positions, then taking the bits in the next 8 positions.

char encoded[2] = [(char)(bucket & 0b0000000011111111), (char)((bucket & 0b1111111100000000) >> 8)];

We’re down to 100 bytes!

0x0 0x0 0x0 0x0 0x1 0x0 0x1 0x0 0x2 0x0 0x2 0x0 0x3 0x0 0x3 0x0 0x4 0x0 0x4 0x0 0x5 0x0 0x5 0x0 0x6 0x0 0x6 0x0 0x7 0x0 0x7 0x0 0x8 0x0 0x8 0x0 0x9 0x0 0x9 0x0 0xa 0x0 0xa 0x0 0xb 0x0 0xb 0x0 0xc 0x0 0xc 0x0 0xd 0x0 0xd 0x0 0xe 0x0 0xe 0x0 0xf 0x0 0xf 0x0 0x10 0x0 0x10 0x0 0x11 0x0 0x11 0x0 0x12 0x0 0x12 0x0 0x13 0x0 0x13 0x0 0x14 0x0 0x14 0x0 0x15 0x0 0x15 0x0 0x16 0x0 0x16 0x0 0x17 0x0 0x17 0x0 0x18 0x0 0x18 0x0

Discussion

  1. We’ve more than halved the amount of data we need to send, but it’s actually possible to do better. Brainstorm other ideas for compression. Example answer: only send how much it rose or fell relative to the previous altitude. Since these differences are lower, we don’t need to use as much space for each element of the array.
  2. What are the problems with this approach? Example answer: it has a pretty low resolution. For example, 0.6m and 0.7m will both be decompressed as 0.5m.

Followup Projects

  • Have your students implement this compression algorithm on their Loonar flight computers
  • Have your students write the algorithm to decompress this data
  • Teach a class on “lossy” versus “lossless” compression
  • Have your students research other compression algorithms, such as Huffman encoding

--

--

Loonar Technologies
Loonar Technologies

We’re a High-Altitude Balloon startup that sells easy-to-use educational kits. Order one of our kits at https://loonar.tech/ and launch in minutes.