Loading Binary Files in Python that Were Written Using C Structures
In C/C++ applications it’s common to use structures to group related data fields. The data contained in these structures can be saved to and loaded from a file through serialization and deserialization.
During the development of one of my pet projects I needed to analyze the workout statistics recorded by my Geonaute ONMove 220 GPS watch. While researching the watch, I discovered that it stores the workout statistics and the GPS track log in binary files. The content of these files is defined by C structures, so I had to do some more research. Fortunately, I found the omx2gpx project, which documents the format of these structures.
The workout format
The watch records each workout in two separate files: an OMH file containing the workout data, such as the distance, the time and the date, and an OMD file containing the GPS track data.
In this post I’ll show you how to load the workout data stored in the OMH file using Python. As I said before, this file holds the serialized version of a structure containing the data. In the following gist we can see the format of the structure stored in the OMH file:
Loading the data by reading the raw bytes
A first approach to load this data is just to open the file in binary mode and read the different fields in byte chunks, performing the needed conversions:
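Notice that, inside the file, the structure fields are stored sequentially, one after another. We also have to keep in mind the byte ordering of these fields: every multi-byte value has to be converted using the same byte order the watch used when writing it.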
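As a rough sketch of this manual approach, the snippet below reads the first five fields chunk by chunk. The field order and sizes follow the description given later in this post (one 4-byte distance followed by four 2-byte values); the function name, the dictionary keys and the sample bytes are made up for illustration and are not real watch data.

```python
import io


def read_first_fields(stream):
    """Read the first five fields by hand, one chunk at a time."""
    # Fields are stored sequentially, so each read() picks up where the
    # previous one stopped. int.from_bytes takes care of the byte order.
    distance = int.from_bytes(stream.read(4), byteorder="little")
    time = int.from_bytes(stream.read(2), byteorder="little")
    avg_speed = int.from_bytes(stream.read(2), byteorder="little")
    max_speed = int.from_bytes(stream.read(2), byteorder="little")
    kcal = int.from_bytes(stream.read(2), byteorder="little")
    return {
        "distance": distance,
        "time": time,
        "avg_speed": avg_speed,
        "max_speed": max_speed,
        "kcal": kcal,
    }


# Synthetic bytes standing in for the start of an OMH file.
sample = bytes([
    0x10, 0x27, 0x00, 0x00,  # distance  = 10000
    0x08, 0x07,              # time      = 1800
    0xC8, 0x00,              # avg speed = 200
    0x2C, 0x01,              # max speed = 300
    0xC2, 0x01,              # kcal      = 450
])

fields = read_first_fields(io.BytesIO(sample))
print(fields)
```

With a real file, the same function could be called on a handle opened in binary mode, for example `open(path, "rb")`, where `path` points to an OMH file.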
This is a valid and simple approach, but I thought there might be a more elegant way to do this. That was how I discovered the struct module from the Python standard library.
Loading the data using struct
As we can see in the docs, the struct module performs conversions between Python values and C structs represented as Python bytes objects. We can use this module to handle binary data stored in files or network connections, among other sources.
Using this module is simple: we need to define a declarative format string to specify the type of data being packed/unpacked. Also, with some special characters we can control the Byte Order, Size, and Alignment.
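To make the effect of those special characters concrete, here is a small sketch. The variable names are arbitrary; the calls are plain struct.pack and struct.calcsize from the standard library.

```python
import struct

value = 1

# '<' selects little-endian, '>' selects big-endian byte order.
little = struct.pack("<I", value)
big = struct.pack(">I", value)
print(little)  # b'\x01\x00\x00\x00'
print(big)     # b'\x00\x00\x00\x01'

# With '<' (or '>') the standard sizes are used and no padding is added:
# one unsigned char (1 byte) plus one unsigned int (4 bytes).
packed_size = struct.calcsize("<BI")
print(packed_size)  # 5

# With '@' (native mode) the compiler's alignment rules apply, so the
# result is platform dependent and may include padding bytes.
native_size = struct.calcsize("@BI")
print(native_size)
```

For a file written by another device, an explicit byte-order character is usually the safer choice, because it keeps the layout independent of the machine running the script.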
In the following table we can see the available format characters with their type equivalences:
With these format characters we declare the format string that specifies the structure. Then, loading the workout data is pretty simple now:
We only need to pass two arguments to the unpack function: the format string <I4H8BH18B5H10B and the byte data. With the first character of the format string we specify the byte ordering: we’re using < to tell the unpack function that the registers are stored using little-endian byte ordering. Next, we use the format characters to describe the format of the structure registers: one unsigned integer I for the distance (field 0); four unsigned short 4H for the time (1), average speed (2), max speed (3) and kilocalories (4); eight unsigned char 8B for the average heart rate (5), max heart rate (6), year in short format (7), month (8), day (9), hour (10), minute (11) and file number; one unsigned short H; eighteen unsigned char 18B; five unsigned short 5H and ten unsigned char 10B.
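Putting the pieces together, a minimal sketch of the loading step could look like this. The format string and field positions are the ones described above; the function name, the dictionary keys and the synthetic record are assumptions made for the example, not part of the watch's documentation.

```python
import struct

OMH_FORMAT = "<I4H8BH18B5H10B"


def parse_omh(data):
    """Unpack a full OMH record and keep only the fields named in the post."""
    fields = struct.unpack(OMH_FORMAT, data)
    return {
        "distance": fields[0],
        "time": fields[1],
        "avg_speed": fields[2],
        "max_speed": fields[3],
        "kcal": fields[4],
        "avg_heart_rate": fields[5],
        "max_heart_rate": fields[6],
        "year": fields[7],  # short format, as stored by the watch
        "month": fields[8],
        "day": fields[9],
        "hour": fields[10],
        "minute": fields[11],
        "file_number": fields[12],
    }


# Build a synthetic record (not real watch data) to exercise the parser.
named = (10000, 1800, 200, 300, 450, 140, 175, 17, 6, 3, 18, 30, 1)
rest = [0] * (1 + 18 + 5 + 10)  # the remaining H, 18B, 5H and 10B fields
sample = struct.pack(OMH_FORMAT, *named, *rest)

workout = parse_omh(sample)
print(struct.calcsize(OMH_FORMAT))  # 60
print(workout["distance"], workout["file_number"])  # 10000 1
```

With a real file, the bytes passed to the function would come from reading the whole OMH file in binary mode.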
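Notice that, although we’re only using some of the registers of the struct, we must provide a format string for the whole structure, because unpack requires the size of the byte data to match exactly the size described by the format string.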
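The following sketch illustrates that size requirement and one possible way to shorten the format string: the pad byte character x, which consumes a byte without producing a value. The shortened format is my own variation, not the one used in this post, and the record is just a zero-filled placeholder.

```python
import struct

FULL_FORMAT = "<I4H8BH18B5H10B"

# Same total size, but everything after the file number is skipped with
# pad bytes: 2 (H) + 18 (18B) + 10 (5H) + 10 (10B) = 40 bytes.
SHORT_FORMAT = "<I4H8B40x"

record = bytes(struct.calcsize(FULL_FORMAT))  # zero-filled placeholder

# A format that describes only the first 20 bytes is rejected, because
# the buffer holds 60 bytes.
try:
    struct.unpack("<I4H8B", record)
    mismatch_detected = False
except struct.error:
    mismatch_detected = True

fields = struct.unpack(SHORT_FORMAT, record)
print(mismatch_detected)  # True
print(len(fields))        # 13
```

Both format strings describe the same number of bytes; the shorter one simply returns only the thirteen named fields.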
And that’s it. Using the struct module we can read binary files in a simple way. I hope you found this post useful :)