Prasanna
Nov 5

MDF4 Wrapper

I recently got the opportunity to work on a proof of concept for integrating MDF4 files with Apache Spark.
MDF stands for Measurement Data Format. It is a binary file format for storing measurement data efficiently and is the de facto standard in the automotive field. The benefits of MDF4 include high data rates for both reading and writing,
recording of ECU software signals and sensor values, compatibility with earlier versions of MDF, and memory-space-saving, fast storage of measurement data.
An MDF file contains both raw data and its associated metadata. Open-source and commercial solutions are available to process MDF4 files.
The problem with MDF4 files is that, being binary, they are not splittable, so they are not well suited for storage and parallel processing in big data systems like Apache Hadoop and Apache Spark.
To work around this, we cap MDF4 files at a fixed size, such as 1 GB or 2 GB, and store them in Hadoop’s HDFS.
In a typical advanced driver assistance system (ADAS), a hardware logger attached to the vehicle captures all the data from RADAR, LIDAR, cameras, etc. and stores it as multiple MDF4 files with a 2 GB limit.
These files are written to an SSD attached to the logger, then transferred over a physical cable to a server and on to Hadoop’s HDFS. The hardware logger has very high bandwidth and writes data very quickly on a FIFO basis.
The MDF4 files we tested are unsorted: they contain multiple small blocks holding metadata and one large contiguous block storing the record payload data.
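Because the record payload sits in one large block with no per-record markers, a reader dropped at an arbitrary byte offset cannot resynchronize; only the start of the file is self-identifying. As a small illustration, here is a hypothetical helper (a sketch, based on the 64-byte ID block that the ASAM MDF specification places at the start of every MDF4 file) that recognizes such a file head:

```python
def looks_like_mdf4(header: bytes) -> bool:
    """Sketch: check whether a buffer starts with an MDF4 ID block.

    Per the ASAM MDF specification, an MDF file begins with a 64-byte
    ID block: bytes 0-7 hold the identifier "MDF     " and bytes 8-15
    the format version string (e.g. "4.10    ").
    """
    if len(header) < 16 or header[:8] != b"MDF     ":
        return False
    version = header[8:16].decode("ascii", errors="replace").strip()
    return version.startswith("4")

# A synthetic 64-byte ID block, for illustration only:
fake_id_block = b"MDF     " + b"4.10    " + b"asammdf " + b"\x00" * 40
print(looks_like_mdf4(fake_id_block))   # True
print(looks_like_mdf4(b"\x00" * 64))    # False
```

Note that this check only works at byte offset 0 — an arbitrary split point inside the payload block carries no such signature, which is exactly why naive byte-range splitting fails.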
There is an open-source Python library called ASAM MDF (asammdf) for reading and operating on MDF files.
This library is not fully compatible with Apache Spark out of the box. I will demonstrate an approach, as a starting point, for processing MDF4 files in parallel
using the ASAM MDF library. The general challenges in processing MDF4 files in parallel are that MDF files are dynamic in length and do not have clear record boundaries.
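For a single file on one machine, reading with the ASAM MDF library is straightforward. The sketch below is illustrative only: the file path and channel name passed in are placeholders chosen by the caller, and the import is kept inside the function so the same function can later be shipped to Spark executors.

```python
def read_channel(path, channel_name):
    """Sketch: read one channel's samples and timestamps from an MDF4 file.

    Requires the asammdf package (pip install asammdf); path and
    channel_name are whatever the caller's data uses.
    """
    from asammdf import MDF      # lazy import: only needed where data is read
    mdf = MDF(path)
    signal = mdf.get(channel_name)   # Signal object with samples/timestamps
    samples, timestamps = signal.samples, signal.timestamps
    mdf.close()
    return samples, timestamps

# Example call (assumes the file and channel actually exist):
# samples, ts = read_channel("measurement.mf4", "EngineSpeed")
```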

Since we cannot split MDF4 files, we need to hint to Hadoop’s HDFS that the block size should be at least 2 GB. This keeps each file within a single HDFS block
and lets us process one file per input split. The dfs.blocksize configuration should be changed to 2 GB.
Hadoop configuration change => dfs.blocksize=2GB
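This can be done cluster-wide in hdfs-site.xml, or per upload using Hadoop’s generic -D option so only the MDF4 ingest uses 2 GB blocks; the file and directory names below are placeholders:

```shell
# Cluster-wide default: set dfs.blocksize to 2147483648 (2 GB) in hdfs-site.xml.
# Per-upload override (placeholder paths):
hdfs dfs -D dfs.blocksize=2147483648 -put measurement.mf4 /data/mdf4/
```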

Along with this option, Apache Spark provides a way to read binary files.
Detailed documentation and code are provided in the GitHub repository below.
https://github.com/ERS-HCL/MDF4Wrapper
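On the Spark side, SparkContext.binaryFiles returns one (path, content) pair per file, so with 2 GB HDFS blocks each MDF4 file lands in a single partition. The sketch below is a starting point only, not the repository’s implementation: the HDFS path is a placeholder, it assumes asammdf is installed on the executors, and the bytes are written to a temporary file because asammdf reads from a path.

```python
import os
import tempfile

def parse_mdf4(path_and_bytes):
    """Runs on an executor: parse one whole MDF4 file held in memory.

    Returns the file path and the channel names found in it — an
    illustrative result; a real job would extract signal data instead.
    """
    path, raw = path_and_bytes
    from asammdf import MDF          # imported lazily, on the executor
    with tempfile.NamedTemporaryFile(suffix=".mf4", delete=False) as tmp:
        tmp.write(raw)
        tmp_path = tmp.name
    try:
        mdf = MDF(tmp_path)
        names = sorted(mdf.channels_db)   # channel-name index kept by asammdf
        mdf.close()
        return path, names
    finally:
        os.remove(tmp_path)

def main():
    # Call on a cluster where pyspark is available; the path is a placeholder.
    from pyspark import SparkContext
    sc = SparkContext(appName="mdf4-poc")
    rdd = sc.binaryFiles("hdfs:///data/mdf4/*.mf4")   # one (path, bytes) per file
    for path, names in rdd.map(parse_mdf4).collect():
        print(path, len(names))
```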


References:
https://www.asam.net/standards/detail/mdf/wiki/
http://spark.apache.org/docs/latest/api/python/pyspark.html?highlight=parallelize

MDF4 wrapper: provides a starting point to integrate MDF4 files with Apache Spark and decode images from MDF4 files.
