Loading Binary Files in Python that Were Written Using C Structures

Martín Lamas
Published in Trabe
3 min read · Jul 27, 2020
Photo by asoggetti on Unsplash

In C/C++ applications it’s common to use structures to group related data fields. The data contained in these structures can be saved to and loaded from a file through a serialization/deserialization process.

While developing one of my pet projects I needed to analyze the workout statistics recorded by my Geonaute ONMove 220 GPS watch. Earlier research had shown me that this watch stores the workout statistics and the GPS track log in binary files. The content of these files is defined by C structures, so I had to dig a little deeper. Fortunately, I found the omx2gpx project, which showed me the format of these structures.

The workout format

The watch records each workout in two separate files: an OMH file containing the workout data, such as the distance, the time and the date, and an OMD file containing the GPS track data.

In this post I’ll show you how to load the workout data stored in the OMH file using Python. As I said before, this file holds the serialized version of a structure containing the data. In the following gist we can see the format of this structure:

Loading the data by reading the raw bytes

A first approach to load this data is just to open the file in binary mode and read the different fields in byte chunks, performing the needed conversions:
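A minimal sketch of this byte-by-byte approach (not the original gist, which is not reproduced here). It assumes the structure starts at offset 0 of the OMH file and that multi-byte fields are little-endian, and it uses hypothetical field names taken from the field list described later in this post. Each field is read with `f.read()` and converted with `int.from_bytes()`:

```python
def read_workout_raw(path):
    """Read the leading workout fields from an OMH file, one byte chunk at a time.

    Assumes the structure starts at offset 0 and that multi-byte fields are
    little-endian. The field names are hypothetical labels. There is no check
    for truncated files.
    """
    with open(path, "rb") as f:
        # One 4-byte unsigned integer.
        distance = int.from_bytes(f.read(4), "little")
        # Four 2-byte unsigned shorts, read in file order.
        time, avg_speed, max_speed, kcal = (
            int.from_bytes(f.read(2), "little") for _ in range(4)
        )
        # Eight 1-byte unsigned chars: unpacking a bytes object yields ints.
        (avg_hr, max_hr, year, month, day,
         hour, minute, file_number) = f.read(8)
    return {
        "distance": distance, "time": time,
        "avg_speed": avg_speed, "max_speed": max_speed, "kcal": kcal,
        "avg_hr": avg_hr, "max_hr": max_hr,
        "year": year, "month": month, "day": day,
        "hour": hour, "minute": minute, "file_number": file_number,
    }
```

Every field needs its own read and conversion, which is exactly the boilerplate the struct module removes.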

Notice that, inside the file, the structure fields are stored sequentially, one after another. We also have to keep in mind the byte order of the multi-byte fields (little-endian in this case).
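To see why the byte order matters, here is the same made-up 4-byte sequence interpreted both ways with `int.from_bytes()`:

```python
# Four bytes as they might appear in a file (made-up example value).
raw = bytes([0x04, 0x29, 0x00, 0x00])

# Little-endian: the first byte is the least significant one.
as_little = int.from_bytes(raw, "little")  # 0x00002904
# Big-endian: the first byte is the most significant one.
as_big = int.from_bytes(raw, "big")        # 0x04290000

print(as_little, as_big)  # 10500 69795840
```

Reading a little-endian file as big-endian does not raise any error; it silently produces wrong numbers.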

This approach is valid and simple, but I thought there might be a more elegant way to do it. That is how I discovered the struct module from the Python standard library.

Loading the data using struct

As we can see in the docs, the struct module performs conversions between Python values and C structs represented as Python bytes objects. We can use this module to handle binary data stored in files or network connections, among other sources.
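A quick round trip illustrates both directions of the conversion: `struct.pack()` turns Python values into a bytes object laid out like a C struct, and `struct.unpack()` turns those bytes back into a tuple of Python values:

```python
import struct

# Pack an unsigned int (I) and an unsigned short (H) in little-endian order.
record = struct.pack("<IH", 10500, 3600)
print(len(record))   # 6 bytes: 4 + 2, no padding with "<"

# Unpacking the same bytes with the same format gives the values back.
values = struct.unpack("<IH", record)
print(values)        # (10500, 3600)
```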

Using this module is simple: we define a declarative format string that specifies the type of each value being packed or unpacked. In addition, a special first character in the format string lets us control the byte order, size and alignment.
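The effect of that first character on size and alignment can be checked with `struct.calcsize()`. With `<` the standard sizes are used and no padding is inserted, so the format string describes the bytes exactly as they lie in the file; with the default `@` (native mode) padding may be added for alignment, and the exact result depends on the platform:

```python
import struct

# "<": little-endian, standard sizes, no alignment padding.
packed_size = struct.calcsize("<BI")   # 1 + 4 = 5

# "@" (the default): native byte order, sizes and alignment.
# The unsigned int is usually aligned to 4 bytes, so padding follows the B;
# the result is platform dependent (commonly 8).
native_size = struct.calcsize("@BI")

print(packed_size, native_size)
```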

In the following table we can see the available format characters with their type equivalences:
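For the three format characters used in this post (B for unsigned char, H for unsigned short and I for unsigned int, all unpacked as Python int), `struct.calcsize()` reports their standard sizes in bytes, and a repeat count such as `4H` simply multiplies the size:

```python
import struct

# Standard sizes (selected by the "<" prefix) of the format characters used here.
sizes = {char: struct.calcsize("<" + char) for char in "BHI"}
print(sizes)                     # {'B': 1, 'H': 2, 'I': 4}

# A repeat count is equivalent to writing the character several times.
print(struct.calcsize("<4H"))    # 8, same as "<HHHH"
```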

With these format characters we declare the format string that describes the structure. Loading the workout data is then pretty simple:

We only need to pass two arguments to the unpack function: the format string <I4H8BH18B5H10B and the byte data. The first character of the format string specifies the byte order: we use < to tell the unpack function that the fields are stored in little-endian byte order (this prefix also selects standard sizes with no alignment padding). The remaining format characters describe the fields of the structure: one unsigned int (I) for the distance (field 0); four unsigned shorts (4H) for the time (1), average speed (2), max speed (3) and kilocalories (4); eight unsigned chars (8B) for the average heart rate (5), max heart rate (6), year in short format (7), month (8), day (9), hour (10), minute (11) and file number; and, for the rest of the structure, one unsigned short (H), eighteen unsigned chars (18B), five unsigned shorts (5H) and ten unsigned chars (10B).
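A minimal sketch of such a loader, assuming the structure sits at offset 0 of the OMH file. It is not the original gist: the `load_workout` function and the key names are hypothetical, and the values are returned raw, without interpreting their units:

```python
import struct

# Format string from the post: little-endian, standard sizes, no padding.
OMH_FORMAT = "<I4H8BH18B5H10B"

# Hypothetical names for fields 0-11 as described in the post.
FIELD_NAMES = (
    "distance", "time", "avg_speed", "max_speed", "kcal",
    "avg_hr", "max_hr", "year", "month", "day", "hour", "minute",
)

def load_workout(path):
    """Unpack the workout structure at the start of an OMH file (assumed offset 0)."""
    size = struct.calcsize(OMH_FORMAT)  # 60 bytes for this format
    with open(path, "rb") as f:
        data = f.read(size)
    # unpack() returns one value per field (47 here) and raises struct.error
    # if fewer than 60 bytes were read.
    fields = struct.unpack(OMH_FORMAT, data)
    # zip() stops at the shorter sequence, so only the named fields are kept.
    return dict(zip(FIELD_NAMES, fields))
```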

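Notice that, although we’re only using some fields of the structure, we must provide a format string for the whole structure: unpack requires the data to be exactly as long as the size described by the format string (60 bytes here).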
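An alternative the post does not use: the unused tail of the structure can be declared with pad bytes (`x`), which still account for the full 60 bytes but produce no values. Below, the file number plus the remaining fields add up to 1 + 2 + 18 + 10 + 10 = 41 pad bytes. A precompiled `struct.Struct` avoids re-parsing the format string on every call. The `WORKOUT_HEAD` and `unpack_workout_head` names are hypothetical:

```python
import struct

# Same 60-byte layout: fields 0-11, then 41 pad bytes for everything after "minute".
WORKOUT_HEAD = struct.Struct("<I4H7B41x")

def unpack_workout_head(data):
    """Return only fields 0-11; pad bytes ("x") are skipped and yield no values."""
    return WORKOUT_HEAD.unpack(data)

print(WORKOUT_HEAD.size)  # 60, the size of the whole structure
```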

And that’s it. Using the struct module we can read binary files in a simple way. I hope you found this post useful :)
