Let’s get our hands dirty with compression libraries.
Today, we will explore the land of data compression. We will go through several of the available libraries, write code to compress the contents of a file, and check which one performs better.
Let's get started.
Snappy
Developed By: Google
Written in: C++
Plugins are available in Go, Java, NodeJS, Python, PHP, Ruby, and more.
GitHub stats: Stars: 4.1k, Forks: 712
Let's write some code using Golang.
package main

import (
	"fmt"
	"os"
	"time"

	"github.com/golang/snappy"
)

func main() {
	data, err := os.ReadFile("./sample.txt")
	if err != nil {
		fmt.Println("Error occurred while reading the file:", err)
		return
	}
	fmt.Println("Original Length", len(data))

	compressionStartTime := time.Now()
	out := snappy.Encode(nil, data) // start compression
	compressionTime := time.Since(compressionStartTime)
	fmt.Println("Compressed Length", len(out))
	fmt.Println("Time Taken For Compression", compressionTime)

	decompressionStartTime := time.Now()
	_, err = snappy.Decode(nil, out) // start decompression
	decompressionTime := time.Since(decompressionStartTime)
	if err != nil {
		fmt.Println("Error occurred while decompressing:", err)
		return
	}
	fmt.Println("Time Taken For Decompression", decompressionTime)
}
We read the file as a byte array, compress the data, and record the compressed length as well as the time taken by the compression and decompression operations.
Output:
LZ4
Developed By: Individual Contributors (Employee at Facebook)
Written in: C
Plugins are available in Go, Java, NodeJS, Python, PHP, Ruby, and more.
GitHub stats: Stars: 5k, Forks: 818
Let’s dive into the code and see the result.
package main

import (
	"fmt"
	"os"
	"time"

	"github.com/pierrec/lz4"
)

func main() {
	data, err := os.ReadFile("./sample.txt")
	if err != nil {
		fmt.Println("Error occurred while reading the file:", err)
		return
	}
	fmt.Println("Original Length", len(data)) // print original length

	buf := make([]byte, len(data))
	ht := make([]int, 64<<10) // buffer for the compression table

	compressionStartTime := time.Now()
	n, err := lz4.CompressBlock(data, buf, ht)
	compressionTime := time.Since(compressionStartTime)
	if err != nil {
		fmt.Println("Error occurred while compressing:", err)
		return
	}
	if n == 0 || n >= len(data) { // no space was saved
		fmt.Println("Is not compressible")
		return
	}
	buf = buf[:n] // compressed data
	fmt.Println("Compressed Length", len(buf)) // print compressed length
	fmt.Println("Time Taken For Compression", compressionTime)

	// The decompressed data is exactly as long as the original input.
	out := make([]byte, len(data))
	decompressionStartTime := time.Now()
	n, err = lz4.UncompressBlock(buf, out)
	decompressionTime := time.Since(decompressionStartTime)
	if err != nil {
		fmt.Println("Error occurred while decompressing:", err)
		return
	}
	out = out[:n] // uncompressed data
	fmt.Println("Time Taken For Decompression", decompressionTime)
}
Output:
Blosc
Developed By: Blosc
Written in: C
Plugins are available in Python (Other language libraries are not maintained).
GitHub stats: Stars: 664, Forks: 113
Let’s write some code in Python, shall we?
import time

import blosc

with open("/Users/sahilpaudel/Documents/idea/CompressorTest/src/sample.txt", 'rb') as f:
    byte_array = f.read()
print("Original Length", len(byte_array))

# Time only the compression call itself, not the printing.
start_time = time.perf_counter()
out = blosc.compress(byte_array, shuffle=blosc.BITSHUFFLE, cname='lz4')
compression_time = time.perf_counter() - start_time
print("Compressed Length", len(out))
print("Time Taken For Compression", str(round(compression_time, 6)), "s")

start_time = time.perf_counter()
data = blosc.decompress(out)
decompression_time = time.perf_counter() - start_time
print("Time Taken For Decompression", str(round(decompression_time, 6)), "s")

# Confirm the round trip reproduced the original bytes.
print(data == byte_array)
Output:
Brotli
Developed By: Google
Written in: C, C#, Java, JavaScript, Python, C++ (You add on…)
Plugins are available in C, C#, Java, Python, Go (All in the same package)
GitHub stats: Stars: 8.7k, Forks: 818
Shall we try this one too? Of course we will.
import time

import brotli

with open("/Users/sahilpaudel/Documents/idea/CompressorTest/src/sample.txt", 'rb') as f:
    byte_array = f.read()
print("Original Length", len(byte_array))

# Time only the compression call itself, not the printing.
start_time = time.perf_counter()
comp = brotli.compress(byte_array)
compression_time = time.perf_counter() - start_time
print("Compressed Length", len(comp))
print("Time Taken For Compression: " + str(round(compression_time, 6)), "s")

start_time = time.perf_counter()
deco = brotli.decompress(comp)
decompression_time = time.perf_counter() - start_time
print("Time Taken For Decompression: " + str(round(decompression_time, 6)), "s")

# Confirm the round trip reproduced the original bytes.
print(deco == byte_array)
Output:
Phew, that was a heck of a lot of compression going on there. We can see that Snappy and LZ4 come out on top overall, but the others are effective too if we look only at the compressed size.
We can see that Snappy is faster than LZ4 at compression by almost 40–50%, although the compressed size is similar. Decompression, on the other hand, is much faster in LZ4 than in Snappy. We can pick either library based on our requirements, although there is not much difference between them.
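Single-run timings like the ones above can be noisy, so when two libraries are this close it may help to repeat each measurement and keep the best run. Here is a minimal sketch of that idea. The helper name `bench` is hypothetical, and the standard library's `zlib` is used purely as a stand-in codec so the snippet runs without extra installs; it is not one of the libraries tested in this post.

```python
import time
import zlib


def bench(compress, decompress, data, repeats=5):
    """Return (compressed_size, best_compress_seconds, best_decompress_seconds).

    `bench` is a hypothetical helper, not part of any library in this post.
    """
    best_c = best_d = float("inf")
    compressed = b""
    for _ in range(repeats):
        start = time.perf_counter()
        compressed = compress(data)
        best_c = min(best_c, time.perf_counter() - start)

        start = time.perf_counter()
        restored = decompress(compressed)
        best_d = min(best_d, time.perf_counter() - start)

        # A benchmark only means something if the round trip is lossless.
        if restored != data:
            raise ValueError("round trip did not reproduce the input")
    return len(compressed), best_c, best_d


if __name__ == "__main__":
    # Made-up, highly repetitive sample data (an assumption for the demo).
    sample = b"compression libraries love repetitive text. " * 2000
    size, c_time, d_time = bench(zlib.compress, zlib.decompress, sample)
    print("Original Length", len(sample))
    print("Compressed Length", size)
    print("Best Compression Time", round(c_time, 6), "s")
    print("Best Decompression Time", round(d_time, 6), "s")
```

Any pair of compress/decompress callables can be passed in the same way, for example `bench(brotli.compress, brotli.decompress, byte_array)` with the Brotli snippet's data.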
Blosc compresses faster than LZ4 but slower than Snappy, and it achieves a much better compression ratio than both: almost a 30% difference in size. Its decompression is also faster than Snappy's. The only downside is that it is not as popular as Snappy and LZ4, though code is pushed to its GitHub repository very frequently.
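To make a statement like "almost 30% difference in size" concrete, the arithmetic can be sketched as follows. The helper names and the byte counts are hypothetical, chosen only to illustrate the calculation; they are not the measurements from the runs above.

```python
def compression_ratio(original_len, compressed_len):
    """Original size divided by compressed size (higher is better)."""
    return original_len / compressed_len


def percent_smaller(size_a, size_b):
    """How much smaller size_a is than size_b, as a percentage of size_b."""
    return (size_b - size_a) * 100 / size_b


if __name__ == "__main__":
    original = 1_000_000  # hypothetical input size in bytes
    codec_a = 350_000     # hypothetical compressed size from one library
    codec_b = 500_000     # hypothetical compressed size from another
    print("Ratio A:", round(compression_ratio(original, codec_a), 2))
    print("Ratio B:", round(compression_ratio(original, codec_b), 2))
    print("A is", round(percent_smaller(codec_a, codec_b), 1), "% smaller than B")
```

Plugging in the "Original Length" and "Compressed Length" values printed by each snippet gives numbers that can be compared directly across libraries.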
Please let me know what you think, and also whether you want me to add more libraries to my testing.
Be Safe :)