Help Robots Navigate!!!

Rijul Vohra
Apr 28, 2019

This problem comes from a Kaggle competition. Although I could not take part while the competition was running, I worked through the problem afterwards. It is a multi-class classification problem in which we have to predict the type of surface a robot is standing on.

Data Exploration

The data contains features whose values come from sensor readings: orientation, linear acceleration, angular velocity, a series id and a measurement number. The key point to note is that orientation is given across four dimensions (x, y, z, w), i.e. as a quaternion.

The data folder had three files: train.csv, test.csv and y_train.csv (which contains the values of the dependent variable for the training set).

The train and test sets had 3810 and 3816 series ids respectively, with 128 measurement values for each series id.
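Loading the files with pandas makes those shapes easy to verify (file paths here are assumptions):

```python
import pandas as pd

train = pd.read_csv('train.csv')      # sensor readings, 128 rows per series
test = pd.read_csv('test.csv')
y_train = pd.read_csv('y_train.csv')  # surface label for each training series

print(train['series_id'].nunique())   # 3810 training series
print(test['series_id'].nunique())    # 3816 test series
```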

The problem is a multi-class classification problem with 9 possible output classes.

Count plot for each of the nine classes

I further plotted the values of each feature for the 0th series.

The inference drawn from this plot was that there is a lot of noise in the angular velocity and linear acceleration signals. This noise can be reduced by filtering the signals in the frequency domain with a Fast Fourier Transform, as many Kaggle kernels for this problem have done.
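As a minimal sketch of that idea (the cutoff fraction below is an assumption, not a value taken from those kernels), one can keep only the low-frequency FFT coefficients of a signal and transform back:

```python
import numpy as np

def fft_denoise(signal, keep_fraction=0.1):
    """Keep only the lowest-frequency FFT coefficients and invert.

    keep_fraction is a hypothetical cutoff; the Kaggle kernels tune
    their own filtering.
    """
    coeffs = np.fft.rfft(signal)
    cutoff = int(len(coeffs) * keep_fraction)
    coeffs[cutoff:] = 0                          # zero out high frequencies
    return np.fft.irfft(coeffs, n=len(signal))
```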

Feature Engineering

I used summary statistics to engineer new features in the train and test datasets, generating features such as mean, median, max, min, mean absolute deviation, rolling mean and many more for each series.
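A minimal sketch of this aggregation with pandas, assuming the standard competition column names; the feature set I actually used is broader (it also covers the orientation channels):

```python
import pandas as pd

SENSOR_COLS = ['angular_velocity_X', 'angular_velocity_Y', 'angular_velocity_Z',
               'linear_acceleration_X', 'linear_acceleration_Y',
               'linear_acceleration_Z']

def make_stat_features(df):
    """Collapse each 128-measurement series into one row of statistics."""
    grouped = df.groupby('series_id')[SENSOR_COLS]
    feats = grouped.agg(['mean', 'median', 'max', 'min', 'std'])
    feats.columns = ['_'.join(col) for col in feats.columns]
    # mean absolute deviation, spelled out for pandas >= 2.0 compatibility
    mad = grouped.apply(lambda g: (g - g.mean()).abs().mean())
    return feats.join(mad.add_suffix('_mad'))
```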

Model Building

  1. I encoded the classes using the train_cats() and proc_df() functions of fastai v0.7.
  2. I built a Random Forest, initially with 120 estimators, min_samples_leaf set to 5 and max_features set to 0.5 (see the sketch below).

The scores were reported as [RMSE(train), RMSE(validation), R² (train), R² (validation), OOB score].
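As a minimal scikit-learn sketch of that model (the post used fastai v0.7's proc_df for encoding; here sklearn's own label handling stands in, and random_state is an assumption):

```python
from sklearn.ensemble import RandomForestClassifier

X_train = make_stat_features(train)                     # from the sketch above
y = y_train.set_index('series_id').loc[X_train.index, 'surface']

rf = RandomForestClassifier(n_estimators=120, min_samples_leaf=5,
                            max_features=0.5, oob_score=True,
                            n_jobs=-1, random_state=42)  # random_state assumed
rf.fit(X_train, y)
print(rf.oob_score_)                                     # out-of-bag accuracy
```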

Thereafter, I identified the important features from the feature importance plot and kept only those with an importance value greater than 0.005, which reduced the number of features to 55.
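With the fitted forest above, that filtering step looks roughly like:

```python
import pandas as pd

# Rank features by impurity-based importance; keep those above 0.005.
fi = pd.Series(rf.feature_importances_, index=X_train.columns)
keep = fi[fi > 0.005].index
X_keep = X_train[keep]
print(X_keep.shape)   # 55 columns remained with my full feature set
```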

Removing Redundant Features

I drew a dendrogram to identify clusters of redundant features, so that I could check the score with each candidate removed before permanently dropping it from the dataset.

Dendrogram
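A sketch of the dendrogram, following the fastai-course recipe of hierarchically clustering features on Spearman rank correlation (X_keep carries over from the previous sketch):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster import hierarchy
from scipy.spatial.distance import squareform
from scipy.stats import spearmanr

# Features that always move together in rank terms end up in tight clusters.
corr = np.round(spearmanr(X_keep).correlation, 4)
z = hierarchy.linkage(squareform(1 - corr), method='average')
plt.figure(figsize=(14, 10))
hierarchy.dendrogram(z, labels=list(X_keep.columns), orientation='left')
plt.show()
```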

Having identified the redundant features, I dropped each one individually and checked the oob_score. The score did not change drastically when any of these features was removed one at a time, so I decided to permanently drop the following (the check is sketched after the list):

['orientation_Z_mean','orientation_Z_median','orientation_Y_mean','orientation_Y_median','orientation_X_mean','orientation_X_median','orientation_W_mean','orientation_W_median']
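The per-feature check can be sketched as: refit a quick forest without each candidate and compare OOB scores before committing to the drop (the smaller tree count here is an assumption, for speed):

```python
from sklearn.ensemble import RandomForestClassifier

def get_oob(X, y):
    """OOB score of a quick forest, cheaper than the final 120-tree model."""
    m = RandomForestClassifier(n_estimators=40, min_samples_leaf=5,
                               max_features=0.5, oob_score=True, n_jobs=-1)
    m.fit(X, y)
    return m.oob_score_

to_drop = ['orientation_Z_mean', 'orientation_Z_median',
           'orientation_Y_mean', 'orientation_Y_median',
           'orientation_X_mean', 'orientation_X_median',
           'orientation_W_mean', 'orientation_W_median']

for col in to_drop:                                   # one feature at a time
    print(col, get_oob(X_keep.drop(col, axis=1), y))

X_final = X_keep.drop(to_drop, axis=1)                # drop them for good
```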

Possible Improvements

The overall performance could be improved by denoising the noisy features with an FFT. One could also convert the quaternion orientation to Euler angles. Building other classifiers such as LightGBM could improve the score as well.
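For the quaternion-to-Euler idea, scipy's Rotation class handles the conversion; its from_quat expects scalar-last (x, y, z, w) ordering, which matches the competition columns:

```python
from scipy.spatial.transform import Rotation

quat_cols = ['orientation_X', 'orientation_Y', 'orientation_Z', 'orientation_W']
euler = Rotation.from_quat(train[quat_cols].to_numpy()).as_euler('xyz')

# Extrinsic x-y-z angles in radians, commonly read as roll, pitch, yaw.
train['roll'], train['pitch'], train['yaw'] = euler[:, 0], euler[:, 1], euler[:, 2]
```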

The entire code can be found at:

  1. https://www.kaggle.com/rijboy/help-robot-navigate-random-forest-fastai
  2. https://github.com/rijulvohra/Help-Robots-Navigate

References

  1. https://www.kaggle.com/prashantkikani/help-humanity-by-helping-robots
  2. https://www.kaggle.com/artgor/where-do-the-robots-drive
