BMW used cars analysis.

Rodrigo Antonio Sebben
6 min read · Jan 16, 2022

--

This blog post is part of the Udacity Data Scientist Nanodegree Program. The analysis is described here, the code can be found on GitHub, and the notebook is here.

Introduction:

As a BMW insider, I wanted to do this first project with a data set directly related to my day-to-day business activities.

This course is my first contact with data science. It is a good opportunity to learn the concepts and apply them to something related to my job and possible future activities.

With that in mind, I searched for a BMW-related data set and found a list of BMW cars with the potential to answer some good questions, such as:

1. Which is the most desirable model that BMW produces?

The 3 Series is one of BMW's best-selling cars. Is this true for all years? To answer this question, I need to find the most desirable car produced in each year.

2. Which model loses the most market value?

It is common sense that a used car loses value every year, but which car lost the most in each year? This is useful to know because it can later help to understand, for instance, whether something is wrong with a specific unit produced, or to start a further investigation that could improve the quality of future cars.

3. Which model gains the most market value?

While some cars lose market value, others may gain it. This might happen because of a change in market preferences, a movie that features a specific model, and so on. So it would be nice to know which car gained the most market value in each year.

Before answering any of these questions, we first need to understand how much of the data can be trusted. Since the study is based on sales, let’s first check how many cars were sold per year according to this data set.

As the image shows, the number of cars sold before 2013 is small and might lead to wrong conclusions, but I will include those years anyway as a matter of study. The code was written assuming that the amount of data would be substantial, so once the data set grows, new results can be drawn from it.
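The per-year check described above can be sketched in pandas. This is only a minimal illustration, not the notebook's code: the toy rows, the column names `model`, `year` and `price`, and the `MIN_CARS` threshold are all assumptions made for the example.

```python
import pandas as pd

# Toy stand-in for the data set; rows and column names are assumptions.
cars = pd.DataFrame({
    "model": ["3 Series", "3 Series", "1 Series", "5 Series", "3 Series", "1 Series"],
    "year":  [2012, 2016, 2016, 2016, 2017, 2017],
    "price": [9000, 18000, 15000, 21000, 20000, 16500],
})

# Number of cars listed for each year.
cars_per_year = cars.groupby("year").size().rename("cars")

# Hypothetical threshold: years with fewer rows than this are treated as thin.
MIN_CARS = 2
sparse_years = cars_per_year[cars_per_year < MIN_CARS].index.tolist()

print(cars_per_year)
print("Years with few records:", sparse_years)
```

Flagging the thin years instead of dropping them matches the choice above to keep them in the study while reading their results with care.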

1. Which is the most desirable model that BMW produces?

Knowing which model is the most desirable can help the company focus both sales and production. Measuring this constantly can help the company understand whether its marketing and sales strategies are working.

The first step is to group the data set by model and year and then calculate, for each year, the number of cars sold per model. The result will look something like this.
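As a rough sketch of that grouping step (again with made-up rows and assumed column names `model` and `year`, not the real data set), the count per model and year and the best seller of each year could be computed like this:

```python
import pandas as pd

# Toy rows; the values are illustrative only.
cars = pd.DataFrame({
    "model": ["3 Series", "3 Series", "1 Series", "1 Series", "1 Series", "3 Series"],
    "year":  [2019, 2019, 2019, 2020, 2020, 2020],
})

# Number of cars for every (year, model) pair.
counts = cars.groupby(["year", "model"]).size().rename("cars").reset_index()

# For each year, keep the row with the highest count.
# Note: on a tie, idxmax keeps the first row it finds.
best_per_year = counts.loc[counts.groupby("year")["cars"].idxmax()].set_index("year")

print(best_per_year)
```

Because ties are resolved silently, it is worth inspecting `counts` for years where two models are close before calling one of them the winner.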

With the current analysis it is possible to see that the 3 Series was the best-selling model in 3 years and the 1 Series in 2 years. From the information in hand, it is possible to say that between 2016 and 2020 the 3 Series was the leading model. Now let’s calculate which model is dominant across all years.

Looking at the full data, the 3 Series dominates, being the best seller 60% of the time. Looking only at the smaller window (2016–2020), there was no clear dominance, as the 3 Series led in 3 years and the 1 Series in 2, but looking at the overall picture it is easy to say that the 3 Series is the dominant model.
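The comparison between the full period and a recent window can be sketched as follows. The `best_seller` series here is a hypothetical input with invented values, so its numbers do not reproduce the results quoted above; it only shows the mechanics.

```python
import pandas as pd

# Hypothetical best seller per year; values are made up for illustration.
best_seller = pd.Series(
    ["3 Series", "3 Series", "1 Series", "3 Series"],
    index=pd.Index([2017, 2018, 2019, 2020], name="year"),
)

# Share of years in which each model was the best seller (full period).
dominance = best_seller.value_counts(normalize=True)

# Same question restricted to a recent window (label-based, inclusive slice).
recent = best_seller.loc[2019:2020].value_counts()

print(dominance)
print(recent)
```

In this toy case the full period favors one model while the recent window is evenly split, which is the kind of contrast worth checking before declaring a dominant model.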

2. Which model loses the most market value?

It’s important to understand which car lost the most market value so the brand can study, for instance, whether there was an architectural or concept failure and avoid repeating the same error.

To start the analysis, it’s important to first calculate the mean price of a model per year.
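A minimal sketch of the mean price per model and year, assuming columns named `model`, `year` and `price` and using invented rows:

```python
import pandas as pd

# Toy rows; column names and prices are assumptions for the example.
cars = pd.DataFrame({
    "model": ["5 Series", "5 Series", "5 Series", "M3"],
    "year":  [2014, 2014, 2015, 2018],
    "price": [19000, 21000, 25000, 30000],
})

# One row per (model, year) holding the average listed price.
mean_price = (
    cars.groupby(["model", "year"], as_index=False)["price"]
        .mean()
        .rename(columns={"price": "mean_price"})
)

print(mean_price)
```

The mean is sensitive to outliers, so for years with very few cars a single unusual listing can move the value a lot.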

With that in hand, we now need to calculate the price drop relative to the same model from the following year (e.g., if the current car is a 2014 5 Series, compare it to the 2015 5 Series and calculate the drop).
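One possible way to express that comparison is a self-merge that lines each model year up with the following one. The sketch below is an assumption about the mechanics, not the notebook's code: the drop is taken as the following year's mean price minus the current one, the proportional drop is expressed relative to the following year's price, and only strictly consecutive years are compared.

```python
import pandas as pd

# Toy mean prices per (model, year); values are illustrative.
mean_price = pd.DataFrame({
    "model": ["5 Series", "5 Series", "M3", "M3"],
    "year": [2014, 2015, 2018, 2019],
    "mean_price": [20000.0, 25000.0, 30000.0, 50000.0],
})

# Shift every row one year back so it lines up with the previous model year.
next_year = mean_price.rename(columns={"mean_price": "next_year_price"})
next_year["year"] = next_year["year"] - 1

# Inner merge keeps only model years that have a following year in the data.
drops = mean_price.merge(next_year, on=["model", "year"], how="inner")

# Positive drop: the older model year is cheaper than the following one.
drops["abs_drop"] = drops["next_year_price"] - drops["mean_price"]
drops["pct_drop"] = drops["abs_drop"] / drops["next_year_price"] * 100

print(drops)
```

With this sign convention a negative value means the older model year is priced above the following one.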

Now we can calculate the biggest absolute and proportional drops. First, we need to check which model had the highest drop and then select that model for further analysis.
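Assuming a table of drops with hypothetical columns `abs_drop` and `pct_drop` (the values below are invented), picking the largest drop of each kind and pulling out that model's history could look like this:

```python
import pandas as pd

# Toy drop table; column names and values are assumptions.
drops = pd.DataFrame({
    "model": ["5 Series", "5 Series", "M3", "M3"],
    "year": [2014, 2015, 2017, 2018],
    "abs_drop": [5000.0, 2000.0, 4000.0, 20000.0],
    "pct_drop": [50.0, 8.0, 10.0, 40.0],
})

# Rows with the largest absolute and the largest proportional drop.
biggest_abs = drops.loc[drops["abs_drop"].idxmax()]
biggest_pct = drops.loc[drops["pct_drop"].idxmax()]

# Full history of the model with the largest absolute drop, for further analysis.
history = drops[drops["model"] == biggest_abs["model"]].sort_values("year")

print(biggest_abs["model"], biggest_abs["year"])
print(biggest_pct["model"], biggest_pct["year"])
print(history)
```

The two measures can point at different cars, as they do in this toy table, so it helps to look at both before choosing what to investigate.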

After calculating it all, the model with the biggest absolute drop was the 2018 M3, with a total of US$ 22,700.50 of market value lost.

Looking at the model's behavior, it is possible to see that even though it lost so much value between the 2019 and 2018 model years, the same model did not have any other drop as high as that one. On the contrary, it gains some market value after that drop, which might indicate a problem with the 2018 model.

With this kind of analysis, the brand could run an investigation into the 2018 model and search for the root cause of the drop in order to avoid the same behavior in future models.

Looking now at the proportional drop, the model that lost the most value was the 5 Series, with a loss of 57.4007%.

As the graph shows, the drop was caused by a gain in market value in the neighboring model year. If we ignore the 2003 price and compare the car's value against all the values from 2012 to 2010, the drop is not that significant, but because the car's value grew in 2003, this massive drop appears.

3. Which model gains the most market value?

Checking whether a specific model gained value follows the same process described for the loss. The difference is that when the price grows, the drop value becomes negative, so instead of checking for the maximum value we need to check for the minimum value.
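Under the same assumed sign convention (positive means drop, negative means gain), the largest gain is simply the most negative row. A short sketch with invented values and hypothetical column names:

```python
import pandas as pd

# Toy drop table; a negative drop means the price went up.
drops = pd.DataFrame({
    "model": ["5 Series", "5 Series", "M3"],
    "year": [2013, 2014, 2018],
    "abs_drop": [-3000.0, 5000.0, 20000.0],
    "pct_drop": [-150.0, 20.0, 40.0],
})

# The minimum of the drop columns identifies the largest gain.
biggest_abs_gain = drops.loc[drops["abs_drop"].idxmin()]
biggest_pct_gain = drops.loc[drops["pct_drop"].idxmin()]

# Flip the sign to report the gain as a positive number.
gain_value = -biggest_abs_gain["abs_drop"]
gain_pct = -biggest_pct_gain["pct_drop"]

print(biggest_abs_gain["model"], biggest_abs_gain["year"], gain_value, gain_pct)
```

If the minimum is still positive, no model gained value at all, so it is worth checking the sign before reporting a "gain".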

After the analysis, it turned out that the same car had both the highest absolute and the highest proportional gain. The model was the 5 Series, which gained US$10,700 of market value from 2014 to 2013, an astonishing 251.76% growth.

Conclusion

During the study, two main lessons were learned.

The first one came during the dominance study. I realized that the overall information is as important as the information from small subsets. While the overall view shows what happened during the whole period, the small subset shows the current situation and tendencies.

For instance, looking at the overall graph, it would be unthinkable that the 1 Series could fight for dominance, but if we look at the last 5 years it is reasonable to conclude that the 1 Series is fighting for it, winning 2 of the last 5 years.

The second lesson came during the drop/gain analysis. As shown, looking only at proportional or absolute values does not always reveal the truth. It’s important to always put the analysis in context to check whether the calculated information makes sense, and also to decide whether further investigation is needed before drawing a conclusion.

With that in mind, it is important to say: learn about your data before jumping to conclusions. Play with your information, make assumptions, and try to prove them by analyzing the data.
