This is the second article in our MAFAT Radar competition series, where we take an in-depth look at the different aspects of the challenge and our approach to it. If you want a recap, check out this post.
Let’s jump straight in.
The competition organizers give a clear explanation of the data they provide:
The dataset consists of signals recorded by ground doppler-pulse radars. Each radar “stares” at a fixed, wide area of interest. Whenever an animal or a human moves within the radar’s covered area, it is detected and tracked. The dataset contains records of those tracks. The tracks in the dataset are split into 32 time-unit segments. Each record in the dataset represents a single segment. A segment consists of a matrix with I/Q values and metadata. The matrix of each segment has a size of 32x128. The X-axis represents the pulse transmission time, also known as “slow-time”. The Y-axis represents the reception time of signals with respect to pulse transmission time divided into 128 equal-sized bins, also known as “fast-time”. The Y-axis is usually referred to as “range” or “velocity”.
The following datasets were provided:
- Five CSV files containing the metadata: the Training set, the Public Test set, and the Auxiliary set (3 files),
- Five pickle files (Python’s serialized-object format) containing the doppler readings that track the object’s center of mass, plus the slow/fast-time readings in the form of a standardized I/Q matrix.
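To make the data layout concrete, here is a minimal sketch of pairing a segment’s I/Q matrix with its metadata row. The field names (`segment_id`, `iq_sweep_burst`, `doppler_burst`) and shapes are assumptions based on the description above, and synthetic in-memory stand-ins replace the real files:

```python
import io
import pickle

import numpy as np
import pandas as pd

# Synthetic stand-ins for the real files; field names and shapes are
# assumptions -- check the competition's data documentation for the real ones.
rng = np.random.default_rng(0)
data = {
    "segment_id": np.arange(4),
    "iq_sweep_burst": rng.standard_normal((4, 32, 128))
    + 1j * rng.standard_normal((4, 32, 128)),
    "doppler_burst": rng.integers(0, 128, size=(4, 32)),
}

# The provided .pkl files hold a dict-like structure; simulate a round trip.
buf = io.BytesIO()
pickle.dump(data, buf)
buf.seek(0)
loaded = pickle.load(buf)

# The CSV metadata has one row per segment (label, SNR type, track id, ...).
metadata = pd.DataFrame({
    "segment_id": np.arange(4),
    "target_type": ["human", "animal", "human", "animal"],
})

# Look up a segment's I/Q matrix via its metadata row.
seg = metadata.iloc[0]
iq = loaded["iq_sweep_burst"][loaded["segment_id"] == seg["segment_id"]][0]
print(iq.shape)  # (32, 128)
```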
The Auxiliary datasets consisted of:
- An Auxiliary “Experiment” Dataset of human-only labeled recordings. These were recorded in a controlled environment, however, which doesn’t necessarily reflect a “natural” recording.
- An Auxiliary “Synthetic” Dataset with low SNR segments that were created by transforming the high SNR signals from the train set.
- An Auxiliary “Background” Dataset — Segments that were recorded by a sensor in parallel to segments with tracks but at a different range. These segments contain the recorded “noise.” Each segment also contains a field mapping to the original High or Low SNR track id.
Braden Riggs & George Williams from GSI Technology (spoiler alert: the eventual winning team) wrote a very thorough post at the start of the competition, in which they provide a great overview of the dataset and give key insights into the challenges it poses. We’ll give a summary below, and for those who want to read the whole thing, it’s available here:
Is It Human or Is It Animal? Target Classification With Doppler-Pulse Radar and Neural Networks
The Role of Radio Signals in Distinguishing Between Targets and MAFAT’s Latest Data Science Challenge.
The radar data had a few different important characteristics worth explaining:
Signal-to-Noise Ratio (SNR)
The SNR refers to the quality of the signal that produced the data, i.e., to what degree the signal was generated by the movement of the target as opposed to some other internal or external noise-generating process, such as the weather or the inherent noise of the radar itself.
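SNR is conventionally expressed in decibels. As a toy illustration (the organizers did not publish how their High/Low SNR labels were computed), the ratio of signal power to noise power on a log scale looks like this:

```python
import numpy as np

def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise)."""
    return 10.0 * np.log10(signal_power / noise_power)

# A signal 100x stronger than the noise floor is +20 dB;
# equal signal and noise power is 0 dB.
print(snr_db(100.0, 1.0))  # 20.0
print(snr_db(1.0, 1.0))    # 0.0
```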
The I/Q Matrix
An I/Q matrix is an N x M matrix of complex values, in our case 32 x 128. The real and imaginary parts come from the amplitude and phase components of the doppler radar reading. In short, even though the radar is picking up a very complicated wave, it can still be described using only the amplitude and phase of two sinusoidal signals in quadrature. For a good explanation, read this lengthier description. Each row corresponds to a “slow-time” radar pulse, while each column is a point in the “fast-time” reading of the reflected signal, which corresponds to the distance from the radar.
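A common way to turn one of these complex matrices into something viewable is a log-magnitude spectrogram. The sketch below assumes the 32 x 128 layout described above and uses a random stand-in segment; the exact preprocessing (windowing, axis choice) varied between competitors:

```python
import numpy as np

rng = np.random.default_rng(42)
# Stand-in for one segment: 32 slow-time pulses x 128 fast-time bins.
iq = rng.standard_normal((32, 128)) + 1j * rng.standard_normal((32, 128))

# FFT along the fast-time axis, shift zero frequency to the center,
# then convert to log-magnitude (dB) for a spectrogram-style image.
spectrum = np.fft.fftshift(np.fft.fft(iq, axis=1), axes=1)
spectrogram_db = 20.0 * np.log10(np.abs(spectrum) + 1e-12)

print(spectrogram_db.shape)  # (32, 128), real-valued
```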
If you want to do a deep dive — MIT has a lecture series just for the courageous few:
The Doppler Burst
The doppler burst reading is a vector indicating the location of the “center of mass” for each slow-time radar pulse.
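One plausible use of this vector is to re-center each pulse so the target’s center of mass sits in the middle bin, making segments easier to compare. This is only a sketch with random stand-in data, not the organizers’ preprocessing:

```python
import numpy as np

rng = np.random.default_rng(1)
spectrogram = rng.random((32, 128))        # stand-in: 32 pulses x 128 bins
doppler_burst = rng.integers(0, 128, 32)   # center-of-mass bin per pulse

# Circularly shift each pulse (row) so its center of mass lands at bin 64.
centered = np.empty_like(spectrogram)
for i, com in enumerate(doppler_burst):
    centered[i] = np.roll(spectrogram[i], 64 - com)

# Every row's former center-of-mass value is now at column 64.
assert np.allclose(centered[:, 64],
                   spectrogram[np.arange(32), doppler_burst])
```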
Segments vs Tracks
The data was originally recorded in tracks, which had no set length. These tracks were split into 32 time-unit segments, and we needed to predict a classification from a single segment alone. While in the training data we were given the track id for each segment (and could therefore, in theory, stitch a track back together), in the test data the track ids were unknown, so we couldn’t rely on a longer timeframe for prediction.
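The splitting itself is straightforward to picture: take non-overlapping 32-unit windows along the slow-time axis. The exact procedure the organizers used (e.g. how leftover samples were handled) wasn’t specified; this sketch simply drops the ragged tail:

```python
import numpy as np

# Stand-in for one track: a variable-length run of 128-bin fast-time sweeps.
track = np.arange(100 * 128).reshape(100, 128)   # 100 slow-time units

SEG_LEN = 32
n_segments = track.shape[0] // SEG_LEN           # 100 // 32 = 3 full segments

# Trim the leftover tail, then carve into non-overlapping segments.
segments = track[: n_segments * SEG_LEN].reshape(n_segments, SEG_LEN, 128)
print(segments.shape)  # (3, 32, 128)
```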
The small size of the training data
The Training set consisted of only 6,656 segments, while the test set had 106 segments. To put that into perspective, the CIFAR-10 dataset has 60,000 images, and ImageNet has over 14 million. In short, we’d need to generate a lot more data if we wanted to use any deep learning algorithm as a classifier.
Signal to Noise Ratio Imbalance
There was a 1.7:1 ratio of Low SNR to High SNR segments in the train set. Not only was the SNR inconsistent across segments, but the majority of them (roughly two-thirds) were extremely noisy.
In the test set, the Low SNR to High SNR ratio was much more balanced, closer to 1:1.
Class Imbalance
The training data also contained a majority of animal segments/tracks, which would inevitably bias any model towards predicting animals. Again, in the test set the ratio of labels was more balanced.
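Before training anything, it’s worth quantifying both imbalances from the metadata CSV. A sketch with a toy DataFrame (the column names `target_type` and `snr_type` are assumptions):

```python
import pandas as pd

# Toy metadata mirroring the imbalances described above (values illustrative).
metadata = pd.DataFrame({
    "target_type": ["animal"] * 5 + ["human"] * 2,
    "snr_type": ["LowSNR"] * 4 + ["HighSNR"] * 3,
})

# Inspect class and SNR balance before training.
print(metadata["target_type"].value_counts(normalize=True))
print(metadata["snr_type"].value_counts(normalize=True))

# Joint view: how labels and SNR levels co-occur.
print(pd.crosstab(metadata["target_type"], metadata["snr_type"]))
```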
The Scoring Metric
To quote the official website:
Submissions are evaluated on the Area Under the Receiver Operating Characteristic Curve (ROC AUC) between the predicted probability and the observed target as calculated by roc_auc_score in scikit-learn (v 0.23.1).
If you’re unfamiliar with ROC AUC — check out this article.
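Since the metric is exactly scikit-learn’s roc_auc_score, it’s easy to evaluate predictions locally. A minimal example, assuming labels are coded 0 = animal and 1 = human:

```python
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1]            # ground-truth labels (coding assumed)
y_score = [0.1, 0.4, 0.35, 0.8]  # predicted probability of class 1

# AUC is the probability that a random positive outranks a random negative;
# here 3 of the 4 positive/negative pairs are ordered correctly.
print(roc_auc_score(y_true, y_score))  # 0.75
```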
The next article in the series will outline how we dealt with the primary limitation we saw, namely the limited number of training examples, by going deeper into the data augmentation techniques we used for this challenge. Stay tuned!