Divide, compress and conquer: Building an Earth data server in Go (Part 2)

Pablo Rozas Larraondo
5 min read · Jan 3, 2018


This is the second article in a series of three. Here, we continue working with the NASA Blue Marble image proposed in the previous article, focusing on the effect of compression. The third article will use the concepts developed previously to create an Earth data server using the Google Cloud Platform.

Artistic representation of NASA’s Blue Marble Next Generation image as a mosaic.

In the previous article, we saw how dividing a large image into smaller adjacent tiles can improve performance when accessing subsets of a large satellite image. The sample image is encoded using the PNG format, which significantly reduces the size of the file on disk. However, decoding this format is quite compute-intensive and slow. In this article, we will see how PNG compares with other data compression methods and how the performance of accessing the image data can be improved.

Before moving into the different compression techniques, let’s have a look at some basic concepts of the PNG format. PNG, as opposed to other formats such as JPEG, offers a lossless image compression algorithm. This means that the decoded image is exactly the same as the original one. There is no quality or information lost in the encoding/decoding process. Lossless image formats are normally required for most satellite image processing applications.
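To make the lossless property concrete, here is a minimal sketch that encodes a small synthetic grayscale image to PNG in memory, decodes it again and checks that every pixel survives. The helper names (roundTrip, testImage, identical) and the 16x16 test data are assumptions made for illustration; they are not part of the original experiments.

```go
package main

import (
	"bytes"
	"fmt"
	"image"
	"image/png"
)

// roundTrip encodes img as PNG in memory and decodes it again.
func roundTrip(img *image.Gray) (*image.Gray, error) {
	var buf bytes.Buffer
	if err := png.Encode(&buf, img); err != nil {
		return nil, err
	}
	decoded, err := png.Decode(&buf)
	if err != nil {
		return nil, err
	}
	// An 8-bit grayscale PNG is expected to decode to *image.Gray.
	gray, ok := decoded.(*image.Gray)
	if !ok {
		return nil, fmt.Errorf("unexpected image type %T", decoded)
	}
	return gray, nil
}

// testImage builds a small synthetic 8-bit image (hypothetical data).
func testImage() *image.Gray {
	img := image.NewGray(image.Rect(0, 0, 16, 16))
	for i := range img.Pix {
		img.Pix[i] = byte(i * 7)
	}
	return img
}

// identical reports whether two images have the same bounds and pixels.
func identical(a, b *image.Gray) bool {
	return a.Rect == b.Rect && bytes.Equal(a.Pix, b.Pix)
}

func main() {
	src := testImage()
	dst, err := roundTrip(src)
	if err != nil {
		panic(err)
	}
	fmt.Println("identical after PNG round trip:", identical(src, dst))
}
```

Running the same check against a JPEG encoder would generally fail, which is the practical difference between lossy and lossless formats.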

The PNG encoding process is composed of two stages, called filtering and compression. The filtering stage tries different spatial patterns per channel to maximise redundancy. The compression stage applies the deflate algorithm to the output of the filtering stage. If you’re interested in finding out more about the internals of the PNG format, I recommend this excellent post by Colt McAnlis.
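As a rough illustration of why filtering helps the compressor, the sketch below applies one of PNG’s filters, Sub, to a single row of one channel and compares the deflated sizes before and after. It is a simplification: a real encoder chooses among five filter types per scanline, and the synthetic gradient row and the helper names (subFilter, unSubFilter, deflatedSize, gradientRow) are assumptions for this example only.

```go
package main

import (
	"bytes"
	"compress/flate"
	"fmt"
)

// subFilter applies PNG's "Sub" filter to one row of single-channel pixels:
// each byte becomes its difference (modulo 256) from the byte to its left.
func subFilter(row []byte) []byte {
	out := make([]byte, len(row))
	var prev byte
	for i, v := range row {
		out[i] = v - prev // byte arithmetic wraps around, as PNG requires
		prev = v
	}
	return out
}

// unSubFilter reverses subFilter, recovering the original row.
func unSubFilter(filtered []byte) []byte {
	out := make([]byte, len(filtered))
	var prev byte
	for i, d := range filtered {
		out[i] = prev + d
		prev = out[i]
	}
	return out
}

// deflatedSize returns the number of bytes data occupies after deflate.
func deflatedSize(data []byte) int {
	var buf bytes.Buffer
	w, err := flate.NewWriter(&buf, flate.DefaultCompression)
	if err != nil {
		panic(err)
	}
	w.Write(data)
	w.Close()
	return buf.Len()
}

// gradientRow builds a synthetic row that brightens by one level per pixel.
func gradientRow(n int) []byte {
	row := make([]byte, n)
	for i := range row {
		row[i] = byte(i)
	}
	return row
}

func main() {
	row := gradientRow(4096)
	filtered := subFilter(row)
	fmt.Println("deflated size, raw row:     ", deflatedSize(row))
	fmt.Println("deflated size, filtered row:", deflatedSize(filtered))
}
```

After filtering, the smooth gradient turns into a long run of identical small values, which is exactly the kind of redundancy deflate exploits.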

Our Blue Marble PNG image is encoded using an RGB colour model. This means that the image is stored interleaving the Red, Green and Blue values for each pixel. Unfortunately, Go’s image package doesn’t include an image.RGB type and instead defines a more generic image.RGBA, which adds a transparency or alpha channel. The decoded image therefore uses four bytes per pixel instead of three, increasing its memory footprint by a third.

In order to provide a fair comparison between the different compression algorithms, we will separate the RGB channels of the image. The following function shows how an RGBA image can be split into the different colour channels using the image.Gray type.

import (
	"image"
	"image/png"
	"log"
	"os"
)

const fileName = "world.topo.bathy.200412.3x21600x10800.png"

// GetChannels decodes the PNG and returns one image.Gray per colour channel.
func GetChannels() []*image.Gray {
	data, err := os.Open(fileName)
	if err != nil {
		log.Fatal(err)
	}
	defer data.Close()
	img, err := png.Decode(data)
	if err != nil {
		log.Fatal(err)
	}
	// An opaque 8-bit RGB PNG is decoded by Go as *image.RGBA.
	rgba, ok := img.(*image.RGBA)
	if !ok {
		log.Fatalf("unexpected image type %T", img)
	}
	rect := rgba.Bounds()
	ch1 := make([]byte, len(rgba.Pix)/4)
	ch2 := make([]byte, len(rgba.Pix)/4)
	ch3 := make([]byte, len(rgba.Pix)/4)
	for i := 0; i < len(ch1); i++ {
		ch1[i] = rgba.Pix[i*4]
		ch2[i] = rgba.Pix[i*4+1]
		ch3[i] = rgba.Pix[i*4+2]
		// Alpha channel is not needed (opaque image)
	}
	red := image.Gray{Pix: ch1, Stride: rect.Dx(), Rect: rect}
	green := image.Gray{Pix: ch2, Stride: rect.Dx(), Rect: rect}
	blue := image.Gray{Pix: ch3, Stride: rect.Dx(), Rect: rect}
	return []*image.Gray{&red, &green, &blue}
}

Each colour channel is now stored in its own image.Gray instead of the original image.RGBA with interleaved colour pixels. The reason for doing this is to reduce the entropy of the data and facilitate the work of the different compressors. PNG’s filters also treat the channels separately, predicting each byte from the corresponding byte of neighbouring pixels before deflate is applied, so, to be fair in our comparison, we separate the channels too. Now we can apply Go’s deflate compression to the pixels of each colour channel to see how they compare to the original PNG.

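import (
	"bytes"
	"compress/flate"
	"log"
	"os"
)

// FlateWriter deflates data into the file fName.
func FlateWriter(fName string, data []byte) {
	outFile, err := os.Create(fName)
	if err != nil {
		log.Fatal(err)
	}
	defer outFile.Close()
	// flate.NewWriter requires a level; BestSpeed is level 1.
	flateWriter, _ := flate.NewWriter(outFile, flate.BestSpeed)
	flateWriter.Write(data)
	flateWriter.Close() // flushes the remaining compressed bytes
}

// FlateReader inflates the file fName and returns the raw bytes.
func FlateReader(fName string) []byte {
	data, err := os.ReadFile(fName)
	if err != nil {
		log.Fatal(err)
	}
	flateReader := flate.NewReader(bytes.NewReader(data))
	var resB bytes.Buffer
	resB.ReadFrom(flateReader)
	return resB.Bytes()
}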

Apart from deflate, the Go standard library offers a few other compression algorithms, such as gzip and lzw. Currently, there is great interest in developing new compression algorithms that prioritise speed over compression ratio. Examples of fast compression libraries available in Go as third-party packages are lz4 and snappy. As we did with deflate, we can write equivalent functions to encode and decode data for each of these algorithms and compare their performance.

The following plot shows the results of compressing the original Blue Marble image (size and speed results for each method are an average of the three colour channels). The x axis represents the access time [ms] and the y axis represents the relative compression ratio [%].

Comparison between the different compression algorithms on the RGB channels. (Flate compression level=1)
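The equivalent encode and decode functions for the other standard-library algorithms could look roughly like the sketch below, which round-trips synthetic data through gzip and lzw in memory and reports the compressed size and decode time. The codec type, the helper names and the synthetic input are assumptions for illustration, and the numbers it prints are unrelated to the measurements in the plot. The lz4 and snappy packages are left out because they are not part of the standard library.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"compress/lzw"
	"fmt"
	"io"
	"time"
)

// codec pairs an in-memory encoder with its decoder.
type codec struct {
	name   string
	encode func([]byte) ([]byte, error)
	decode func([]byte) ([]byte, error)
}

// compressWith runs data through any io.WriteCloser-based compressor.
func compressWith(newWriter func(io.Writer) io.WriteCloser, data []byte) ([]byte, error) {
	var buf bytes.Buffer
	w := newWriter(&buf)
	if _, err := w.Write(data); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil { // Close flushes pending output
		return nil, err
	}
	return buf.Bytes(), nil
}

// codecs returns the two standard-library codecs used in this sketch.
func codecs() []codec {
	return []codec{
		{
			name: "gzip",
			encode: func(d []byte) ([]byte, error) {
				return compressWith(func(w io.Writer) io.WriteCloser { return gzip.NewWriter(w) }, d)
			},
			decode: func(d []byte) ([]byte, error) {
				r, err := gzip.NewReader(bytes.NewReader(d))
				if err != nil {
					return nil, err
				}
				defer r.Close()
				return io.ReadAll(r)
			},
		},
		{
			name: "lzw",
			encode: func(d []byte) ([]byte, error) {
				return compressWith(func(w io.Writer) io.WriteCloser { return lzw.NewWriter(w, lzw.LSB, 8) }, d)
			},
			decode: func(d []byte) ([]byte, error) {
				r := lzw.NewReader(bytes.NewReader(d), lzw.LSB, 8)
				defer r.Close()
				return io.ReadAll(r)
			},
		},
	}
}

// roundTripOK reports whether decode(encode(data)) reproduces data exactly.
func roundTripOK(c codec, data []byte) bool {
	enc, err := c.encode(data)
	if err != nil {
		return false
	}
	dec, err := c.decode(enc)
	return err == nil && bytes.Equal(dec, data)
}

// sampleChannel builds synthetic single-channel data (not Blue Marble pixels).
func sampleChannel(n int) []byte {
	data := make([]byte, n)
	for i := range data {
		data[i] = byte(i / 64)
	}
	return data
}

func main() {
	data := sampleChannel(1 << 20)
	for _, c := range codecs() {
		enc, err := c.encode(data)
		if err != nil {
			panic(err)
		}
		start := time.Now()
		if _, err := c.decode(enc); err != nil {
			panic(err)
		}
		ratio := 100 * float64(len(enc)) / float64(len(data))
		fmt.Printf("%-5s size: %6.2f%% of raw, decode: %v\n", c.name, ratio, time.Since(start))
	}
}
```

A single timed decode like this is noisy; averaging several runs, or using Go’s testing benchmarks, would give steadier figures.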

As you can see in this plot, none of the compressors tested matches PNG’s level of compression, which implies that PNG’s filtering stage is doing a pretty good job. On the other hand, Snappy offers the best reading speed, more than an order of magnitude faster than the PNG format. The Snappy library used in this experiment uses hand-written assembly, unlike the LZ4 implementation. In theory, LZ4 is faster than Snappy. Unfortunately, there is no equivalent assembly implementation available in Go yet. If you’re interested in this amazing field of fast compressors, I recommend reading Yann Collet and Francesc Alted’s work on LZ4 and Blosc.

We can now apply this new way of storing image data, using the Snappy compression, to the tiles proposed in the previous article getting an extra gain in performance. As with the Mosaic function defined in the previous article to assemble the PNG tiles, we can create similar functions to read the uncompressed and the Snappy compressed tiles.

prl9000$ go run get_region_tiles.go -lat 42 -lon -1 -chan 1
Generating PNG tile: 28.031036ms
Generating Raw tile: 17.357828ms
Generating Snappy tile: 15.885718ms

PNG turns out to be much slower than both the Snappy and the uncompressed Raw formats. What’s really interesting is that reading Snappy compressed files is sometimes faster than reading the uncompressed data. This effect is normally more noticeable when the data is stored on spinning disks, but I ran the tests on my laptop, which has an SSD, and Snappy is still faster. Although this might seem surprising at first, it’s a well-known effect in which the reading operation is dominated by the time it takes to load data from disk into memory. Compressed data is loaded faster than uncompressed data because it’s smaller, and decompression can be performed really fast by the CPU, especially if the data fits in the processor’s L1 cache. You can find more details here.

The complete code to reproduce these experiments can be found in the “part2” folder of this repository. As in the previous article, the functionality is exposed as stand-alone programs that receive the latitude, longitude and colour channel values as command line flags.

In the next article, we’ll finally expose our Earth Data Service written in Go on the Google Cloud Platform.


Pablo Rozas Larraondo

Geospatial Data Analyst @NCInews scalable data services | deep learning | weather forecasting | go | python | surfing