Gowtham Dongari
Aug 12, 2017 · 4 min read

Understanding the problem:
Metal-detecting strips are fixed on a highway and take a reading every second. We are given a sample of these readings.

Understanding the data and analysing it
1) How many readings are recorded while a vehicle crosses the detector?
2) In the given data, the average number of readings a vehicle takes to cross differs clearly between vehicle types.

For the given data, let's first try to classify it into categories without a model.

Given: all the vehicles are moving at a speed of 15 kmph.

Let us check the upper and lower boundaries of the data by observing the mean and standard deviation for each vehicle type.

We find the six sigma range (mean ± 3 sigma) for each vehicle type and, aiming for an accuracy of 99%, build on it further.

column: ampere_reading
datatype: float64

vehicle      mean       -3 sigma   +3 sigma
Bus          1.490000   1.063620   1.916380
Car          5.246667   3.498067   6.995266
Motorbike    0.450000   0.237868   0.662132
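
As a rough sketch of how bounds like these can be computed, the snippet below groups readings by vehicle type and derives mean ± 3 sigma with pandas. The sample values are hypothetical and are not the article's data; the column name `ampere_reading` comes from the article, while `vehicle`, `three_sigma_bounds` and the output column names are assumed for illustration. pandas uses the sample standard deviation (ddof=1) by default, which may or may not match how the table above was produced.

```python
import pandas as pd

# Hypothetical sample readings, for illustration only (not the article's data).
sample = pd.DataFrame({
    "vehicle": ["Bus", "Bus", "Bus", "Car", "Car", "Car",
                "Motorbike", "Motorbike", "Motorbike"],
    "ampere_reading": [1.4, 1.5, 1.6, 5.0, 5.2, 5.6, 0.40, 0.45, 0.50],
})


def three_sigma_bounds(df):
    """Return mean, std and mean +/- 3*std of ampere_reading per vehicle type."""
    stats = df.groupby("vehicle")["ampere_reading"].agg(["mean", "std"])
    stats["minus_3_sigma"] = stats["mean"] - 3 * stats["std"]
    stats["plus_3_sigma"] = stats["mean"] + 3 * stats["std"]
    return stats


bounds = three_sigma_bounds(sample)
print(bounds)
```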

We included these +/- 3 sigma levels as a new feature for our data. Our assumptions are:
1) The data provided is a representative sample of the full data.
2) The ranges of the vehicle types do not overlap.

Result: With this approach we can identify the type of vehicle passing over the metal strip without deploying a model.
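
A minimal sketch of this model-free rule, using the +/- 3 sigma ranges from the table above. The names `RANGES` and `classify_reading` are made up for illustration, and returning `None` for a reading outside every range is an assumption, since the article does not say how such readings are handled.

```python
# +/- 3 sigma ranges per vehicle type, taken from the table above.
RANGES = {
    "Bus": (1.063620, 1.916380),
    "Car": (3.498067, 6.995266),
    "Motorbike": (0.237868, 0.662132),
}


def classify_reading(ampere_reading):
    """Return the vehicle type whose range contains the reading, else None."""
    for vehicle, (low, high) in RANGES.items():
        if low <= ampere_reading <= high:
            return vehicle
    return None  # assumption: readings outside every range stay unclassified


for value in (0.45, 1.5, 5.0, 2.5):
    print(value, classify_reading(value))
```

Because the ranges do not overlap, at most one vehicle type can match a given reading.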

CREATING DATA for building our model.

Now we create data that replicates our sample data, so that a machine learning model can be trained on it. To create the data we take the average lengths of the vehicles as follows:
bike: 1.8 m, car: 4.5 m, bus: 14 m

As per the given condition, each vehicle travels at 15 km/hr, i.e. approximately 4.167 m/s.
Now let us assume the length of the metal detector is 0.1 m, so the time any vehicle takes to cover that distance is negligible.

For a bike, the time taken to cover its own length is 1.8/4.167 = 0.432 seconds.
For a car, the time taken to cover its own length is 4.5/4.167 = 1.080 seconds.
For a bus, the time taken to cover its own length is 14/4.167 = 3.360 seconds.
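
The same arithmetic as a small sketch: convert the speed to m/s and divide each vehicle length by it. The function name `crossing_time_seconds` and the constant names are chosen for illustration; the lengths and the speed are the ones stated above.

```python
SPEED_KMPH = 15  # speed given in the problem statement
LENGTHS_M = {"bike": 1.8, "car": 4.5, "bus": 14.0}  # average lengths from the article


def crossing_time_seconds(length_m, speed_kmph=SPEED_KMPH):
    """Time in seconds for a vehicle to travel its own length at the given speed."""
    speed_mps = speed_kmph * 1000 / 3600  # km/h to m/s
    return length_m / speed_mps


for vehicle, length in LENGTHS_M.items():
    print(vehicle, round(crossing_time_seconds(length), 3))
```

Passing a different `speed_kmph` shows how quickly the crossing times shrink at higher speeds.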

We calculated the average time each vehicle type takes to cover its own length and created a new data-set accordingly:
1) Occurrences give us the choice of repetitions for each vehicle type, based on the typical time it takes to cross the metal strip.
2) We select how many times a vehicle appears each time it crosses the metal strip.
We then defined the data and distributed it randomly so that it resembles actual recorded data.
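
One way this generation step could look is sketched below. It is not the code used for the article: the occurrence choices in `OCCURRENCES` are hypothetical, drawing readings from a normal distribution is an assumption, and `make_dataset` and `vehicle_id` are names invented for illustration. The means come from the table above, and each standard deviation is derived from it as (+3 sigma - mean) / 3.

```python
import numpy as np
import pandas as pd

# (mean, std) per vehicle type; std derived from the table as (+3 sigma - mean) / 3.
STATS = {
    "Bus": (1.49, 0.142127),
    "Car": (5.246667, 0.582866),
    "Motorbike": (0.45, 0.070711),
}
# Hypothetical choices for how many consecutive readings one vehicle produces.
OCCURRENCES = {"Bus": [6, 7, 8], "Car": [3, 4], "Motorbike": [2, 3]}


def make_dataset(n_vehicles, seed=0):
    """Generate readings for a random sequence of vehicles crossing the strip."""
    rng = np.random.default_rng(seed)
    rows = []
    for vehicle_id in range(n_vehicles):
        vehicle = str(rng.choice(list(STATS)))        # random vehicle type
        mean, std = STATS[vehicle]
        count = int(rng.choice(OCCURRENCES[vehicle]))  # readings for this crossing
        for reading in rng.normal(mean, std, size=count):
            rows.append({
                "vehicle_id": vehicle_id,
                "vehicle": vehicle,
                "ampere_reading": float(reading),
            })
    return pd.DataFrame(rows)


data = make_dataset(100)
print(data.groupby("vehicle")["ampere_reading"].describe())
```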
Here we see the distributions of the data. Let us create some features that help the model learn.

The graphs above show how the data is distributed.

Creating new features
1) Created polynomial features x², x³.
2) Captured a moving average over 3 ampere readings, since a bike has at least 2 consecutive occurrences; the average jumps when the vehicle changes.
3) This average is also assigned where the ampere readings are zero or negative. To counter this, we created a new feature, ampere variation, which solves the problem.
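
The three features above might be built as in the following sketch. The article does not define ampere variation precisely, so it is assumed here to be the difference between consecutive readings. The sample readings and the names `add_features`, `ampere_sq`, `ampere_cube`, `moving_avg_3` and `ampere_variation` are hypothetical.

```python
import pandas as pd


def add_features(df):
    """Add polynomial, moving-average and variation features to a readings frame."""
    out = df.copy()
    x = out["ampere_reading"]
    out["ampere_sq"] = x ** 2      # polynomial feature x^2
    out["ampere_cube"] = x ** 3    # polynomial feature x^3
    # Moving average over the current and two previous readings.
    out["moving_avg_3"] = x.rolling(window=3, min_periods=1).mean()
    # Assumed definition: change from the previous reading (0 for the first row).
    out["ampere_variation"] = x.diff().fillna(0.0)
    return out


# Hypothetical readings: zeros mark seconds with no vehicle on the strip.
readings = pd.DataFrame(
    {"ampere_reading": [0.0, 0.45, 0.47, 0.0, 5.1, 5.3, 5.2, 0.0]}
)
features = add_features(readings)
print(features)
```

In this sketch the moving average stays non-zero on the zero readings that follow a vehicle, which is the problem item 3 describes; the variation column instead shows a sharp drop there.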

Fitting the model

We used RandomForestClassifier: a random forest is a meta estimator that fits a number of decision tree classifiers on various sub-samples of the data-set and uses averaging to improve the predictive accuracy and control over-fitting. By default, each sub-sample is the same size as the original input sample, and the samples are drawn with replacement if bootstrap=True.

For more information about the random forest classifier: link1, link2.
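
A minimal sketch of fitting scikit-learn's RandomForestClassifier on data of this kind. The training data is generated on the spot from the table's means and derived standard deviations under a normal-distribution assumption, and the hyperparameters (`n_estimators=100`, a 25% test split, fixed random seeds) are illustrative choices, not the settings used for the article.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# (mean, std) per vehicle type; std derived from the table as (+3 sigma - mean) / 3.
STATS = {
    "Bus": (1.49, 0.142127),
    "Car": (5.246667, 0.582866),
    "Motorbike": (0.45, 0.070711),
}

rng = np.random.default_rng(42)
X_parts, y_parts = [], []
for vehicle, (mean, std) in STATS.items():
    x = rng.normal(mean, std, size=200)                  # assumed normal readings
    X_parts.append(np.column_stack([x, x ** 2, x ** 3]))  # reading plus x^2, x^3
    y_parts.extend([vehicle] * 200)

X = np.vstack(X_parts)
y = np.array(y_parts)

# Hold out a quarter of the rows to estimate accuracy on unseen readings.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

model = RandomForestClassifier(n_estimators=100, bootstrap=True, random_state=42)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print("held-out accuracy:", accuracy)
```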

This approach rests on certain assumptions and works best for the given scenario only.
Further developments you can try:
1) the speed of the vehicles is 60–80 kmph
2) there are multiple detectors on a highway with thousands of vehicles passing through


GreyAtom is committed to building an educational ecosystem for learners to upskill & help them make a career in data science.

