Porsche, there is no substitute.

Using Machine Learning to predict prices of the Porsche 911 (991 generation)

Popkdodge
5 min read · Jun 20, 2020
Porsche 911 991.2 generation

With the introduction of the next generation of the Porsche 911, the previous generation is starting to become a bargain for Porsche enthusiasts. Unless you are a Porsche connoisseur like me, it can be confusing to navigate all of the different types and model ranges, the 911 Carreras in particular. How can you figure out whether a car is priced correctly? Fear not! I have turned to data science to answer this very question. TL;DR: open the Porsche Price Predictor (linked at the end of this post) and enter the vehicle you are looking to buy for a price assessment. But if you are here to learn about the process I undertook, keep reading!

Collecting Data

Since there are no easily accessible datasets of 911 prices out there, I resorted to web scraping my own. I used cars.com as the data source and Python’s Beautiful Soup to scrape the listings.
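To give a feel for what the scraping step looks like, here is a minimal sketch using Beautiful Soup on an inline HTML snippet. The markup, class names, and listings are made up for illustration; they are not the real cars.com page structure, and this is not the exact scraper I ran.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the class names and values below are invented for
# illustration and do not reflect the real cars.com page structure.
SAMPLE_HTML = """
<div class="listing">
  <h2 class="title">2017 Porsche 911 Carrera S</h2>
  <span class="price">$89,995</span>
  <span class="mileage">21,450 mi.</span>
</div>
<div class="listing">
  <h2 class="title">2014 Porsche 911 Carrera 4S</h2>
  <span class="price">$67,500</span>
  <span class="mileage">48,120 mi.</span>
</div>
"""


def parse_listings(html):
    """Return one dict of raw strings per listing card found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.find_all("div", class_="listing"):
        rows.append({
            "title": card.find("h2", class_="title").get_text(strip=True),
            "price": card.find("span", class_="price").get_text(strip=True),
            "mileage": card.find("span", class_="mileage").get_text(strip=True),
        })
    return rows


listings = parse_listings(SAMPLE_HTML)
print(listings)
```

Everything comes back as raw strings at this point, which is why a separate cleaning step is needed.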

cars.com and the unclean data from web scraping.

Cleaning Data

The web scraper aggregates a lot of the data into one column, which can be cleaned and separated into many columns. To include more data, I used the filters on cars.com to find cars by color and transmission. I used my domain knowledge of the Porsche 911 to choose the most valuable features. For example, the Porsche 911 Carrera is the base model, and then there are the S, 4S, GTS, GTS4, T (Touring), and Black Edition. All of them come in multiple colors, with a choice of automatic or manual transmission, and as a cabriolet (convertible) or hardtop. After merging and cleaning all of the data, however, I excluded the Targa 4S and the Targa GTS4 because of the small number of Targas in the US.
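As a rough sketch of the cleaning step, the snippet below splits one aggregated text column into typed fields. The pipe-separated layout and the sample rows are assumptions made for illustration; the real scraped column looked different and needed more handling.

```python
import re

# Hypothetical raw strings in the style of an aggregated scraped column.
RAW_ROWS = [
    "2017 Porsche 911 Carrera S | $89,995 | 21,450 mi. | Manual | Coupe",
    "2014 Porsche 911 Carrera 4S | $67,500 | 48,120 mi. | Automatic | Cabriolet",
]


def to_int(text):
    """Keep only the digits, e.g. '$89,995' -> 89995."""
    return int(re.sub(r"[^\d]", "", text))


def clean_row(raw):
    """Split one aggregated string into separate, typed columns."""
    title, price, mileage, transmission, body = [p.strip() for p in raw.split("|")]
    # The title is 'year make model trim'; the trim may contain spaces.
    year, _make, _model, trim = title.split(" ", 3)
    return {
        "year": int(year),
        "trim": trim,
        "price": to_int(price),
        "mileage": to_int(mileage),
        "transmission": transmission.lower(),
        "body": body.lower(),
    }


cleaned = [clean_row(r) for r in RAW_ROWS]
print(cleaned)
```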

Choosing what type of machine learning to use.

When choosing the right model, I had to take into consideration that this is a supervised regression problem: the target value is known, and it is a continuous numeric value.

Establishing a baseline.

To get the baseline for this problem, I calculated the mean price of the Porsche Carrera for each model year of the 991 generation and measured the mean absolute error (MAE) of those predictions.

Using for loops and MAE to establish a baseline.
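The idea behind the baseline can be sketched in a few lines of plain Python: predict every car’s price as the mean price of its model year, then average the absolute errors. The (year, price) pairs below are made up for illustration and are not from my dataset.

```python
from collections import defaultdict

# Made-up (model year, price) pairs purely for illustration.
cars = [
    (2012, 60000),
    (2012, 64000),
    (2015, 80000),
    (2015, 90000),
    (2015, 85000),
]

# Group prices by model year.
prices_by_year = defaultdict(list)
for year, price in cars:
    prices_by_year[year].append(price)

# The baseline prediction for a car is the mean price of its model year.
mean_by_year = {year: sum(p) / len(p) for year, p in prices_by_year.items()}

# MAE: the average absolute gap between the actual and predicted price.
errors = [abs(price - mean_by_year[year]) for year, price in cars]
baseline_mae = sum(errors) / len(errors)
print(baseline_mae)  # 2800.0
```

Any model worth keeping has to beat this number on the real data.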

Trying out linear models.

MAE Scores for the basic linear model.

A linear model is the go-to when predicting continuous values such as prices. Using scikit-learn’s linear regression model with default hyperparameters, the MAE was $9,245, beating the baseline score.

Building on the linear model, I also tried other approaches, such as ridge regression with cross-validation and a standard scaler, in the hope of improving my scores. The cross-validation MAE for ridge regression was $9,039, an improvement of about $200 over my original MAE. For predicting Porsche prices, I don’t think this is good enough. I would like to bring it down to roughly $5,000–6,000, so I needed to look at other modeling methods such as decision trees and random forest regression.
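For readers who want to see the comparison in code, here is a minimal sketch of scoring a plain linear regression against a scaled ridge regression with cross-validated MAE in scikit-learn. The data is synthetic (year and mileage only), and the alpha values and fold count are my assumptions for the example, so the numbers it prints will not match the scores above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, RidgeCV
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (not the scraped dataset): year and mileage -> price.
rng = np.random.default_rng(0)
n = 200
year = rng.integers(2012, 2020, size=n)
mileage = rng.uniform(1_000, 90_000, size=n)
price = 60_000 + 5_000 * (year - 2012) - 0.3 * mileage + rng.normal(0, 4_000, size=n)

X = np.column_stack([year, mileage])
y = price


def cv_mae(model):
    """Mean absolute error averaged over 5 cross-validation folds."""
    scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error")
    return -scores.mean()


linear_mae = cv_mae(LinearRegression())

# Scale the features first, then let RidgeCV pick alpha from a small list.
ridge = make_pipeline(StandardScaler(), RidgeCV(alphas=[0.1, 1.0, 10.0]))
ridge_mae = cv_mae(ridge)

print(f"Linear MAE: ${linear_mae:,.0f}")
print(f"Ridge MAE:  ${ridge_mae:,.0f}")
```

Putting the scaler inside the pipeline means it is refit on each training fold, so no information from the validation fold leaks into the scaling.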

Improving MAE with Random Forest Regression

By applying Random Forest regression and using cross-validation to tune hyperparameters, I was able to lower my MAE to a more sensible range. The MAE score was $5,752.

MAE scores for Random Forest regression.
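Here is a hedged sketch of what tuning a random forest with cross-validation can look like in scikit-learn. I am using GridSearchCV with a tiny grid on synthetic data purely as an illustration; the parameter grid, fold count, and data are assumptions and not the exact search I ran.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (not the scraped dataset): year and mileage -> price.
rng = np.random.default_rng(0)
n = 150
year = rng.integers(2012, 2020, size=n)
mileage = rng.uniform(1_000, 90_000, size=n)
price = 60_000 + 5_000 * (year - 2012) - 0.3 * mileage + rng.normal(0, 4_000, size=n)

X = np.column_stack([year, mileage])
y = price

# A deliberately small, illustrative grid of hyperparameters to try.
param_grid = {
    "n_estimators": [25, 50],
    "max_depth": [4, None],
}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=3,
    scoring="neg_mean_absolute_error",
)
search.fit(X, y)

# scikit-learn maximizes scores, so the MAE is the negated best score.
best_mae = -search.best_score_
print(search.best_params_)
print(f"Best cross-validated MAE: ${best_mae:,.0f}")
```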

The MAE improved significantly, to a point where I believe it is within acceptable bounds. I used this model to create the original Porsche price predictor. I then tried an XGBRegressor and got a further improved score of $5,061.

Best MAE scores with XGBRegressor.

Conclusion

Prices for the various 991.1-generation Porsche models seem to be in an excellent range, because most of the depreciation has already happened and the cars won’t lose as much value in each following year. The model also shows that once a car passes 60,000 miles it is considered high mileage, and additional miles won’t significantly decrease its value. This model is useful for anyone looking to buy or sell a Porsche 991. It provides a baseline for your car’s value, which is vital information when dealing with a car salesman. So, Porsche enthusiasts, to get the most bang for your buck, I highly recommend using this information to empower yourself as a seller or buyer.

About the author

I am a data science student at Lambda. I am an avid Porsche fanatic and a financial geek.

Connect with me:

Porsche Price Predictor website: https://porsche911.herokuapp.com/

LinkedIn: https://www.linkedin.com/in/sasana-kongjareon-2618281a6/

Email: popkdodge@gmail.com
