
Tarring files with Elm

On November 14, 2018, Evan Czaplicki released two new Elm packages, Bytes and File. The packages have a simple, elegant interface and will open many new possibilities for Elm developers.

As a simple use case for the new packages, I will describe a library, jxxcarlson/elm-tar, which one can use to create a tar archive from a list of strings and binary data, the latter represented as Bytes values. One can then download the archive using File.Download.bytes. (I needed this for exporting LaTeX files and image files for the MiniLatex app hosted on knode.io, but there should be many other applications.)

The library has a fairly small API:

module Tar exposing
    ( Data(..), FileRecord, encodeFiles, encodeTextFiles, defaultFileRecord )

The Data type is used to discriminate between strings and binary data:

type Data = StringData String | BinaryData Bytes

We will show below how it is used.

Archiving Text

Suppose you have some text data, like this:

content1 =
    "This is a test (ho ho ho).\nIt is a frabjous day!"

content2 =
    "Also a test: The green meanies awoke today in a fine mood."

For each piece of content we need to create a FileRecord, because the tar format stores a header record with every file. There are many fields in a file record, so we supply a default, then modify it to suit our data:

import Tar exposing (defaultFileRecord)

fileRecord1 =
    { defaultFileRecord | filename = "text1.txt" }

fileRecord2 =
    { defaultFileRecord | filename = "text2.txt" }

Next we encode the data:

bytes =
    Tar.encodeTextFiles
        [ ( fileRecord1, content1 )
        , ( fileRecord2, content2 )
        ]
        |> Bytes.Encode.encode

Finally, we download the Bytes data as a tar archive:

File.Download.bytes "myArchive.tar" "application/x-tar" bytes

That’s all there is to it.
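Note that File.Download.bytes produces a Cmd, so in an app it has to be returned from update. A minimal sketch of that wiring, assuming the archive has already been encoded and stored in the model; the names Model, Msg, DownloadClicked and init are hypothetical, and the empty archive in init is only a placeholder:

```elm
module DownloadSketch exposing (Model, Msg(..), init, update)

import Bytes exposing (Bytes)
import Bytes.Encode as Encode
import File.Download as Download


-- Hypothetical model: holds an archive that has already been encoded.
type alias Model =
    { archive : Bytes }


-- Placeholder: an empty Bytes value stands in for a real tar archive.
init : Model
init =
    { archive = Encode.encode (Encode.sequence []) }


type Msg
    = DownloadClicked


update : Msg -> Model -> ( Model, Cmd Msg )
update msg model =
    case msg of
        DownloadClicked ->
            -- The download only happens when this command is run by the Elm runtime.
            ( model
            , Download.bytes "myArchive.tar" "application/x-tar" model.archive
            )
```

A button with onClick DownloadClicked in the view would then trigger the download.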

Archiving arbitrary data

Now suppose we want to tar both text and binary data. To make a silly example, first use the function Hex.toBytes from the package jxxcarlson/hex:

content3 =
    Hex.toBytes "B0C1D2E4F4"
        |> Maybe.withDefault (encode (Bytes.Encode.unsignedInt8 0))

We can tar all the content created so far like this:

tarArchive =
    Tar.encodeFiles
        [ ( fileRecord1, StringData content1 )
        , ( fileRecord2, StringData content2 )
        , ( fileRecord3, BinaryData content3 )
        ]
        |> encode

where fileRecord3 is a suitable file record that we have created for the binary data. To download the archive, we proceed as before:

File.Download.bytes "tarArchive.tar" "application/x-tar" tarArchive

A Demo App

For a demo app, see the source code for the tar package. There you will find /examples/Main.elm, which you can compile using elm make to create an app residing in index.html. Click on index.html to run the app. Behind the scenes, it loads two images from given URLs, creates Bytes values for them, then downloads a tar archive with the two (uncompressed) images. You should be able to click on the downloaded archive, test.tar, or use tar xvf test.tar to untar the files. Pop quiz: what are the images?

The Development Process

I used the description of the tar file format on Wikipedia to write the encoders. Each file is encoded as a 512-byte file record (the header) with information such as filename, permissions, last modification date, etc. There is also an 8-byte checksum field, which is computed by adding the bytes of the file record, where the checksum field is initially taken to be a sequence of eight blanks (ASCII encoded). The eight blanks are then replaced by the checksum. The file record is followed by the data, which must be padded with nulls so that the padded data consists of a multiple of 512 bytes. Call the header plus the padded data a tarred file. A tar archive consists of a sequence of tarred files placed end-to-end, followed by two 512-byte blocks of nulls.
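The checksum rule can be sketched in a few lines. This is an illustration only, not the code in elm-tar: it assumes the ustar header layout as documented on Wikipedia, where the checksum field occupies 8 bytes starting at offset 148, and it represents the header as a plain list of byte values rather than Bytes. The names headerChecksum and toOctal are hypothetical.

```elm
module ChecksumSketch exposing (headerChecksum, toOctal)

-- Assumed ustar layout: the checksum field starts at byte 148 ...


checksumOffset : Int
checksumOffset =
    148



-- ... and is 8 bytes long.


checksumLength : Int
checksumLength =
    8



-- Sum all byte values of the header, counting every byte of the
-- checksum field as an ASCII space (32), whatever it currently holds.


headerChecksum : List Int -> Int
headerChecksum headerBytes =
    headerBytes
        |> List.indexedMap
            (\index byte ->
                if index >= checksumOffset && index < checksumOffset + checksumLength then
                    32

                else
                    byte
            )
        |> List.sum



-- Tar stores numbers as octal digits, so the sum is rendered in base 8.


toOctal : Int -> String
toOctal n =
    if n < 8 then
        String.fromInt n

    else
        toOctal (n // 8) ++ String.fromInt (modBy 8 n)
```

For a header consisting entirely of nulls, only the eight blanks contribute, so the sum is 8 × 32 = 256, which is 400 in octal.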

Here is the encoder for text strings:

encodeTextFile : FileRecord -> String -> Encode.Encoder
encodeTextFile fileRecord_ contents =
    let
        fileRecord =
            { fileRecord_ | fileSize = String.length contents }
    in
    Encode.sequence
        [ encodeFileRecord fileRecord
        , Encode.string (padContents contents)
        ]

The Encode.sequence function is used to pack bytes end-to-end. It is used repeatedly in the definition of encodeFileRecord to build up the required sequence of bytes.
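To see Encode.sequence at work on its own, here is a small sketch that pads a string with nulls up to the next multiple of 512 bytes and builds the two null blocks that end an archive. It is not the library's padContents: the names paddingLength, nulls, paddedString, endOfArchive and encodedWidth are hypothetical, and the sketch measures the string with Encode.getStringWidth (its UTF-8 byte count) on the assumption that the padding should be computed from bytes rather than characters.

```elm
module SequenceSketch exposing (encodedWidth, endOfArchive, paddedString)

import Bytes
import Bytes.Encode as Encode



-- Number of null bytes needed to reach the next multiple of 512.


paddingLength : Int -> Int
paddingLength size =
    modBy 512 (512 - modBy 512 size)



-- n null bytes, packed end-to-end with Encode.sequence.


nulls : Int -> Encode.Encoder
nulls n =
    Encode.sequence (List.repeat n (Encode.unsignedInt8 0))



-- The string followed by its null padding.


paddedString : String -> Encode.Encoder
paddedString contents =
    Encode.sequence
        [ Encode.string contents
        , nulls (paddingLength (Encode.getStringWidth contents))
        ]



-- Two 512-byte blocks of nulls mark the end of a tar archive.


endOfArchive : Encode.Encoder
endOfArchive =
    nulls 1024



-- Helper for inspecting results: how many bytes does an encoder produce?


encodedWidth : Encode.Encoder -> Int
encodedWidth encoder =
    Bytes.width (Encode.encode encoder)
```

With these definitions, encodedWidth (paddedString "Hello") is 512, since 5 bytes of text are followed by 507 nulls.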

It wasn’t easy to get the encoder to work: it is an all-or-nothing matter, since an archive is either valid or it is not. To help, I wrote another package, jxxcarlson/hex, to create Bytes values and to convert Bytes values to strings of hexadecimal digits so that I could look at them. Here is an example:

$ elm repl
> import Hex exposing(..)
> import Bytes.Encode as Encode exposing(encode)

> encode (Encode.string "Hello") |> Hex.fromBytes
"48656C6C6F" : String
> Hex.toBytes "FF66" |> Maybe.map Hex.fromBytes
Just "FF66" : Maybe String
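For readers curious how such a conversion can be written with elm/bytes alone, here is a minimal sketch of a Bytes-to-hex function. It is not the implementation in jxxcarlson/hex; the names toHex, byteToHex, hexDigit, step and example are hypothetical. It walks the bytes one at a time with Decode.loop and renders each as two uppercase hex digits.

```elm
module HexSketch exposing (example, toHex)

import Bytes exposing (Bytes)
import Bytes.Decode as Decode exposing (Decoder, Step(..))
import Bytes.Encode as Encode



-- One hex digit for a value in the range 0..15.


hexDigit : Int -> String
hexDigit n =
    String.slice n (n + 1) "0123456789ABCDEF"



-- Two hex digits for a byte value in the range 0..255.


byteToHex : Int -> String
byteToHex byte =
    hexDigit (byte // 16) ++ hexDigit (modBy 16 byte)



-- One loop step: stop when no bytes remain, otherwise read one more byte.


step : ( Int, List String ) -> Decoder (Step ( Int, List String ) String)
step ( remaining, acc ) =
    if remaining <= 0 then
        Decode.succeed (Done (String.concat (List.reverse acc)))

    else
        Decode.unsignedInt8
            |> Decode.map (\byte -> Loop ( remaining - 1, byteToHex byte :: acc ))


toHex : Bytes -> String
toHex bytes =
    Decode.decode (Decode.loop ( Bytes.width bytes, [] ) step) bytes
        |> Maybe.withDefault ""



-- Same input as in the repl session: the bytes of the string "Hello".


example : String
example =
    toHex (Encode.encode (Encode.string "Hello"))
```

Here example evaluates to "48656C6C6F", matching the repl output for the same input.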

Although the hex package helped in the initial stages, the tar archives created were still invalid. I eventually had to resort to experimental science: making a tar archive as described above, downloading it, examining it with a hex editor, and comparing it with an archive created with tar cvf. That way I could spot the differences between a valid tar archive and the one I made with Elm. After some detective work, which included generous use of pencil and paper, I was able to resolve the differences and create a valid archive using pure Elm.