Let’s get our hands dirty with compression libraries.

Sahil Paudel · Published in Codantastic · May 26, 2020
Photo by Sahil Paudel (IG: thehighpaudel)

Today we will explore the land of data compression: we will go through several of the libraries available, write code to compress a file’s contents, and check which one performs better.

Let's get started.

Snappy

Developed By: Google
Written in: C++
Plugins are available in Go, Java, NodeJS, Python, PHP, Ruby, and more.
GitHub stats: Stars: 4.1k, Forks: 712

Let's write some code using Golang.

package main

import (
	"fmt"
	"io/ioutil"
	"time"

	"github.com/golang/snappy"
)

func main() {
	// Read the whole file into memory as a byte slice.
	data, err := ioutil.ReadFile("./sample.txt")
	if err != nil {
		fmt.Println("Error Occurred While Reading The File")
		return
	}
	fmt.Println("Original Length", len(data))

	compressionStartTime := time.Now()
	out := snappy.Encode(nil, data) // start compression
	fmt.Println("Compressed Length", len(out))
	fmt.Println("Time Taken For Compression", time.Since(compressionStartTime))

	decompressionTime := time.Now()
	_, _ = snappy.Decode(nil, out) // start decompression
	fmt.Println("Time Taken For Decompression", time.Since(decompressionTime))
}

We read the file into a byte slice, compress it, and record the compressed length as well as the time taken by the compression and decompression operations.

Output:

The output of the above code: go run snappy_test.go
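Besides the block-level Encode/Decode used above, the Go snappy package also exposes a streaming (framed) format through snappy.NewBufferedWriter and snappy.NewReader, which is handy when the data does not fit in memory. Here is a minimal sketch; the output file name is just a placeholder, not something from the benchmark above.

package main

import (
	"io"
	"log"
	"os"

	"github.com/golang/snappy"
)

func main() {
	src, err := os.Open("./sample.txt")
	if err != nil {
		log.Fatal(err)
	}
	defer src.Close()

	// Placeholder output path for the framed (streaming) snappy data.
	dst, err := os.Create("./sample.txt.sz")
	if err != nil {
		log.Fatal(err)
	}
	defer dst.Close()

	// NewBufferedWriter compresses everything written to it into the snappy framing format.
	w := snappy.NewBufferedWriter(dst)
	if _, err := io.Copy(w, src); err != nil {
		log.Fatal(err)
	}
	if err := w.Close(); err != nil { // flush the last frame
		log.Fatal(err)
	}
}

Reading the data back is the mirror image: wrap the compressed file in snappy.NewReader and io.Copy out of it.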

LZ4

Developed By: Yann Collet, an individual contributor (employed at Facebook)
Written in: C
Plugins are available in Go, Java, NodeJS, Python, PHP, Ruby, and more.
GitHub stats: Stars: 5k, Forks: 818

Let’s dive into the code and see the result.

package main

import (
	"fmt"
	"io/ioutil"
	"time"

	"github.com/pierrec/lz4"
)

func main() {
	data, _ := ioutil.ReadFile("./sample.txt")
	fmt.Println("Original Length", len(data)) // print original length

	buf := make([]byte, len(data))
	ht := make([]int, 64<<10) // hash table used by the block compressor

	compressionStartTime := time.Now()
	n, _ := lz4.CompressBlock(data, buf, ht)
	if n >= len(data) {
		fmt.Println("Is not compressible")
	}

	buf = buf[:n]                              // compressed data
	fmt.Println("Compressed Length", len(buf)) // print compressed length
	fmt.Println("Time Taken For Compression", time.Since(compressionStartTime))

	decompressionTime := time.Now()
	// The block format does not store the original size, so allocate
	// a buffer that is comfortably larger than the compressed data.
	out := make([]byte, 10*len(data))
	n, _ = lz4.UncompressBlock(buf, out)

	out = out[:n] // uncompressed data
	fmt.Println("Time Taken For Decompression", time.Since(decompressionTime))
}

Output:

The output of the above code: go run lz4_test.go
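The block API used above makes you manage the buffers yourself and guess the decompressed size. pierrec/lz4 also ships a frame-level Writer/Reader that works over plain io streams and produces the standard LZ4 frame format. A minimal sketch, where the output file name is only a placeholder:

package main

import (
	"io"
	"log"
	"os"

	"github.com/pierrec/lz4"
)

func main() {
	src, err := os.Open("./sample.txt")
	if err != nil {
		log.Fatal(err)
	}
	defer src.Close()

	// Placeholder output path for the LZ4 frame data.
	dst, err := os.Create("./sample.txt.lz4")
	if err != nil {
		log.Fatal(err)
	}
	defer dst.Close()

	// lz4.NewWriter compresses everything written to it into LZ4 frames.
	zw := lz4.NewWriter(dst)
	if _, err := io.Copy(zw, src); err != nil {
		log.Fatal(err)
	}
	if err := zw.Close(); err != nil { // flush the trailing frame data
		log.Fatal(err)
	}
}

Decompression works the same way in reverse with lz4.NewReader.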

Blosc

Developed By: The Blosc Development Team
Written in: C
Plugins are available in Python (libraries for other languages are not actively maintained).
GitHub stats: Stars: 664, Forks: 113

Let’s write some code in Python, shall we?

import blosc
import time

with open("/Users/sahilpaudel/Documents/idea/CompressorTest/src/sample.txt", 'rb') as f:
    byte_array = f.read()

print("Original Length", len(byte_array))

start_time = time.time()
# Compress using the lz4 codec with bit-level shuffling.
out = blosc.compress(byte_array, shuffle=blosc.BITSHUFFLE, cname='lz4')
print("Compressed Length", len(out))
print("Time Taken For Compression", str(round(time.time() - start_time, 6)), "s")

start_time = time.time()
data = blosc.decompress(out)
print("Time Taken For Decompression", str(round(time.time() - start_time, 6)), "s")
print(data == byte_array)  # verify the round trip

Output:

The output of the above code: python3 blosc_test.py

Brotli

Developed By: Google
Written in: C, C#, Java, JavaScript, Python, C++, and more
Plugins are available in C, C#, Java, Python, Go (all in the same repository)
GitHub stats: Stars: 8.7k, Forks: 818

Shall we try this one too? Of course we will.

import brotli
import time

with open("/Users/sahilpaudel/Documents/idea/CompressorTest/src/sample.txt", 'rb') as f:
    byte_array = f.read()

print("Original Length", len(byte_array))

start_time = time.time()
comp = brotli.compress(byte_array)  # default settings
print("Compressed Length", len(comp))
print("Time Taken For Compression: " + str(round(time.time() - start_time, 6)), "s")

start_time = time.time()
deco = brotli.decompress(comp)
print("Time Taken For Decompression: " + str(round(time.time() - start_time, 6)), "s")
print(deco == byte_array)  # verify the round trip

Output:

The output of the above code: python3 brotli_test.py

Phew, that was a heck of a lot of compression going on there. We can see that Snappy and lz4 are the best at the business, but the others are effective too if we look only at the size of the compressed output.

We can see that Snappy is almost 40–50% faster than lz4 at compression, although the compressed sizes are similar. Decompression, on the other hand, is much faster with lz4 than with Snappy. We can pick either library based on our requirements, since there is not much difference overall.

Blosc compression is faster than lz4 but slower than Snappy, and it achieves a much better compression ratio than both, with almost a 30% difference in size. Its decompression is also faster than Snappy’s. The only downside is that it is not as popular as Snappy and lz4, although commits are pushed to its GitHub repository very frequently.
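If you would rather reproduce the Snappy vs. lz4 numbers in a single run instead of comparing two screenshots, a small harness like the sketch below works; the report helper is mine, not part of either library, and the timings will of course depend on your machine and sample file.

package main

import (
	"fmt"
	"io/ioutil"
	"log"
	"time"

	"github.com/golang/snappy"
	"github.com/pierrec/lz4"
)

// report prints the compression ratio and timings for one codec.
func report(name string, original, compressed int, comp, decomp time.Duration) {
	ratio := float64(original) / float64(compressed)
	fmt.Printf("%-8s ratio=%.2f compress=%v decompress=%v\n", name, ratio, comp, decomp)
}

func main() {
	data, err := ioutil.ReadFile("./sample.txt")
	if err != nil {
		log.Fatal(err)
	}

	// Snappy block API.
	t := time.Now()
	sz := snappy.Encode(nil, data)
	szComp := time.Since(t)
	t = time.Now()
	if _, err := snappy.Decode(nil, sz); err != nil {
		log.Fatal(err)
	}
	report("snappy", len(data), len(sz), szComp, time.Since(t))

	// LZ4 block API.
	buf := make([]byte, lz4.CompressBlockBound(len(data)))
	ht := make([]int, 64<<10)
	t = time.Now()
	n, err := lz4.CompressBlock(data, buf, ht)
	if err != nil {
		log.Fatal(err)
	}
	lzComp := time.Since(t)
	out := make([]byte, len(data)) // we know the original size, so size the buffer exactly
	t = time.Now()
	if _, err := lz4.UncompressBlock(buf[:n], out); err != nil {
		log.Fatal(err)
	}
	report("lz4", len(data), n, lzComp, time.Since(t))
}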

Please let me know what you think, and whether you would like me to add more libraries to the comparison.

Be Safe :)
