compress-tools: A Swiss Army Knife for handling compressed data in Rust

Jonathas Conceição
O.S. Systems
Apr 15, 2020

Here at O.S. Systems we have been working with Rust for over a year now.
From type safety and a well-designed syntax to high performance and modest memory usage, Rust has been a great tool for the new products we have been working on, and even for older ones we are migrating.

Working with Embedded Linux often brings some common issues to the table when specifying a new product: binary size, disk usage, and memory scalability, just to name a few. The ability to use dynamically linked libraries is usually a welcome way to tackle these problems.

We present here our motivation for developing compress-tools, a Rust library we created to offer some of libarchive’s functionality with handy integration into Rust’s type system.

The Rust ecosystem has a really nice and robust compression and decompression library called flate2, with support for DEFLATE-based compression formats. It has a native implementation and also supports libz and miniz backends for the user to choose from, and it has served us really well in past projects. However, for the new version of the Updatehub Agent, which is being fully rewritten in Rust, we need to support a larger pool of compression and archive formats, which led us to search for alternatives.

libarchive was already used in the original version of the Updatehub Agent for handling these compressed objects, and its ability to handle so many compression and archive formats made it our main choice once again. Unfortunately, as far as we could find, Rust’s package registry didn’t have any actively maintained library for using libarchive. The crate published as libarchive would probably serve well, but it has not seen a release in over 4 years, and there was basically no activity on the issues and pull requests open on its GitHub repository.

Maintaining a full port of libarchive with type-safe Rust bindings to make it more usable is not a simple task, but what we need for this project at this point is just a small subset of its functionality, which shouldn’t be hard to use or to maintain. That was when we decided to create compress-tools, offering a simple API for handling compressed data that we can integrate into our embedded software projects in Rust.

Updatehub is an enterprise-grade solution for Software and Firmware Update Over The Air. The update packages are always compressed to reduce network usage, and multiple compression formats are supported. The Updatehub Agent is the application that runs on the embedded device to install the updates, so it frequently has to deal with uncompressing packages.

The initial version of compress-tools was aimed at better defining the API we would need to provide in order to use it in Updatehub, as well as at testing a whole execution cycle of the Updatehub Agent implementation we had at the time. To avoid complexity, we started with an initial implementation that, instead of using the libarchive library, just uncompressed the objects using shell commands under the hood. After testing disk space and memory usage, we had defined three main functions that compress-tools would need to implement in order to fully meet our needs:

[Figure: compress-tools function prototypes]

The functions should all be able to take any type that implements std::io::Read, so the compressed data can come straight from memory, from a buffered OS file read, or even from a special reader that supports timeouts, such as the one provided by the timeout-readwrite crate, to handle the disk failures that are fairly common with flash memory devices.
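As a rough illustration of why a generic reader is convenient, the sketch below defines a hypothetical `stream_to` function (not part of compress-tools) that is generic over `std::io::Read` and `std::io::Write`. A plain byte copy stands in for the decompression step, which is an assumption made only to keep the example self-contained.

```rust
use std::io::{self, Read, Write};

/// Hypothetical stand-in for a decompression routine: it moves bytes from
/// any `Read` into any `Write` in fixed-size chunks and returns the number
/// of bytes written. A real implementation would decode each chunk.
fn stream_to<R: Read, W: Write>(mut source: R, mut target: W) -> io::Result<u64> {
    let mut buf = [0u8; 4096];
    let mut total = 0u64;
    loop {
        let n = match source.read(&mut buf) {
            Ok(0) => break, // end of input
            Ok(n) => n,
            // Retry reads interrupted by a signal.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        // Assumption: the identity copy below is where decoding would happen.
        target.write_all(&buf[..n])?;
        total += n as u64;
    }
    target.flush()?;
    Ok(total)
}
```

Because the source is only required to implement `Read`, the same function accepts a byte slice, a `std::io::Cursor`, a `std::io::BufReader` wrapping a `std::fs::File`, or a reader that adds timeouts, without any change to its body.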

The output would be a generic std::io::Write in most cases, so it can also be manipulated more freely. When uncompressing an archive file, however, a single Write object won’t do the trick, and there are always properties of the files we need to preserve, such as ownership and permissions. For this case, the function accepts a Path as its target and writes its output straight to disk.
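To make the Path-based case more concrete, here is a minimal sketch, assuming a Unix target. The `Entry` struct and the `unpack_to` function are hypothetical names invented for this example; they are not the compress-tools API, and the entries are supplied in memory instead of being decoded from a real archive. Ownership is left out because changing it normally requires elevated privileges.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::os::unix::fs::PermissionsExt;
use std::path::{Component, Path};

/// Hypothetical, already-decoded archive entry (illustration only).
struct Entry<'a> {
    name: &'a str,  // path relative to the archive root
    mode: u32,      // Unix permission bits, e.g. 0o644
    data: &'a [u8], // file contents
}

/// Writes every entry below `dest`, restoring its permission bits.
fn unpack_to(entries: &[Entry<'_>], dest: &Path) -> io::Result<()> {
    for entry in entries {
        // Refuse absolute paths and `..` so entries cannot escape `dest`.
        let escapes = Path::new(entry.name)
            .components()
            .any(|c| !matches!(c, Component::Normal(_)));
        if escapes {
            return Err(io::Error::new(
                io::ErrorKind::InvalidInput,
                "entry path escapes destination",
            ));
        }

        let path = dest.join(entry.name);
        if let Some(parent) = path.parent() {
            fs::create_dir_all(parent)?;
        }

        let mut file = File::create(&path)?;
        file.write_all(entry.data)?;

        // Apply the stored mode explicitly so the umask does not alter it.
        fs::set_permissions(&path, fs::Permissions::from_mode(entry.mode))?;
    }
    Ok(())
}
```

A single `Write` could not express this, since each entry needs its own file, directory structure, and metadata, which is why a destination directory is the more natural target here.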

Using these handy compress-tools functions in the Updatehub Agent has given us more control when handling compressed data. These are some of the changes we were able to make:

  • We can now read the Metadata structure stored inside a uhupkg (Updatehub’s update package format), as well as validate its digital signature, without needing to extract the whole package, greatly reducing disk usage.
  • We were able to implement our Raw Install mode, which installs an update from a compressed source directly onto one of the device’s disks, much like gzip -dc foo.gz | dd of=/dev/bar would do in plain shell.
  • The Tarball Install mode can install files while keeping the expected permissions, ownership, and timestamps without needing tar to be installed on the embedded device.
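The Raw Install item compares the mode to a shell pipeline. As a sketch of that data flow when it is delegated to an external command, in the spirit of the shell-backed prototype mentioned earlier, the hypothetical `run_filter` helper below pipes a reader through a child process and streams the process output into a writer. The function name and its signature are assumptions made for illustration and are not taken from compress-tools.

```rust
use std::io::{self, Read, Write};
use std::process::{Command, Stdio};
use std::thread;

/// Hypothetical helper: feeds `source` to the stdin of an external command
/// and copies the command's stdout into `target`. Returns bytes written.
fn run_filter<R, W>(program: &str, args: &[&str], mut source: R, mut target: W) -> io::Result<u64>
where
    R: Read + Send,
    W: Write,
{
    let mut child = Command::new(program)
        .args(args)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;

    let mut stdin = child.stdin.take().expect("stdin was requested as piped");
    let mut stdout = child.stdout.take().expect("stdout was requested as piped");

    let written = thread::scope(|scope| {
        // Feed stdin from another thread so that a full pipe cannot block
        // us while stdout is drained here. Dropping `stdin` at the end of
        // the closure closes the pipe and signals end of input.
        let feeder = scope.spawn(move || io::copy(&mut source, &mut stdin).map(|_| ()));
        let copied = io::copy(&mut stdout, &mut target);
        feeder.join().expect("feeder thread panicked")?;
        copied
    })?;

    let status = child.wait()?;
    if !status.success() {
        return Err(io::Error::new(
            io::ErrorKind::Other,
            format!("{program} exited with {status}"),
        ));
    }
    Ok(written)
}
```

Calling it with "gzip" and the argument "-dc", an opened foo.gz as the source, and the opened target device as the writer would mirror the pipeline above, provided gzip is installed on the device. Removing that dependency on external tools is one reason to prefer a library such as libarchive.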

Another advantage of using libarchive for handling compression in this IoT context is that the library itself can be built with different configurations depending on what the device has to support, which works really well for adapting it to the embedded device’s constraints.

Any new requirement we might have for using libarchive with Rust will now be handled through our new compress-tools crate. Other handy functions might become necessary, and we are considering adding support for compressing and archiving objects at some point too, so feel free to follow the repository on GitHub to keep track of new releases.

Be sure to check the compress-tools documentation if you want more details and examples on how to use these tools.


Jonathas Conceição
O.S. Systems

Computer Scientist from Brazil. Interested in programming languages, concurrent and parallel systems, high performance, open source software, GNU/Linux and IoT.