f(D) — A Function of Time and Price

Javed Karim
Published in DataFrens.sg
3 min read · Jul 30, 2019
MacRitchie Reservoir. Photo taken on my Lenovo K3 Note.

We are all too familiar with the concept of depreciation. It is basically a function of time and price, and if you are considering it in the context of real estate, you will also include location, amenities and size in your calculations. Your regression analysis may even arrive at negative depreciation, indicating that the property is an appreciating asset, since there have been improvements to the property itself or to its surrounding features.

Now, for the average Joe, the car is definitely a depreciating purchase. It is a means of transportation from point A to point B; for most of us, point A is home and point B is work. Would you fork out a hundred thousand dollars for a car for such a purpose? Keep in mind that public transport is very convenient on this sunny island and cars are really expensive.

A Toyota Corolla that would fetch less than US$20,000 new in the United States costs around US$80,000 (SG$110,000) in Singapore. A top-of-the-range Ferrari or Lamborghini, meanwhile, can cost up to US$1.4 million (SG$2 million) in Singapore — more than triple the pricetag for the same supercar stateside.
www.forbes.com

If the situation requires it, a car will be a necessary investment. Your friendly car salesman will use the vehicle transaction dataset to help you choose the car of your choice, but applying machine learning will be even more awesome. The vehicle transaction dataset, obtained in CSV format, can simply be imported into a pandas DataFrame:

import pandas as pd
df_veh = pd.read_csv(filename, names=col_headers)

And since we’re only interested in the available cars:

# We're only interested in 'Available' vehicles
df_veh = df_veh[df_veh['status'] == 'Available']

Similarly, we may filter by make, model, engine capacity and even date. Of course, it makes better sense to have the price and the engine capacity in integer format:

df_veh = df_veh.astype({'price': 'int32', 'eng_cap': 'int32'})
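The filters by make, engine capacity and date mentioned above could be combined into a single boolean mask. The following is only a sketch on toy data: apart from `eng_cap`, the column names (`make`, `reg_date`) and all values are hypothetical and not taken from the actual dataset.

```python
import pandas as pd

# Toy data; 'make' and 'reg_date' are hypothetical column names.
df_demo = pd.DataFrame({
    "make": ["Toyota", "Honda", "Toyota", "BMW"],
    "eng_cap": [1598, 1496, 1798, 1998],
    "reg_date": ["2015-03-01", "2017-06-15", "2018-01-20", "2016-11-05"],
})
# Parse the date strings so they can be compared as dates.
df_demo["reg_date"] = pd.to_datetime(df_demo["reg_date"])

# Combine conditions with & and wrap each one in parentheses.
mask = (
    (df_demo["make"] == "Toyota")
    & (df_demo["eng_cap"] <= 1800)
    & (df_demo["reg_date"] >= pd.Timestamp("2016-01-01"))
)
df_filtered = df_demo[mask]
print(df_filtered)
```

Building the mask first keeps each condition readable and makes it easy to add or drop a filter later.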

We might want to check for null entries and remove them:

# Check whether any columns have null values
df_veh.columns[df_veh.isnull().any()]
# Remove rows with null values (dropna returns a new dataframe, so reassign it)
df_veh = df_veh.dropna()

If missing mileage data shows up as zeros, one strategy could be to substitute the computed mean:

# Calculate the mean of each numeric column and replace zeros with that mean.
df_veh.replace(0, df_veh.mean(axis=0, numeric_only=True), inplace=True)
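One caveat with the replacement above is that the zeros themselves are included when the mean is computed, which pulls it downward, and zeros in every other column are replaced too. A narrower variant, sketched here on toy data with a hypothetical `mileage` column name, first marks the zeros as missing and then fills only that column:

```python
import pandas as pd

# Toy data; the 'mileage' column name is hypothetical.
df_demo = pd.DataFrame({"mileage": [50000, 0, 70000, 0, 90000]})

# Treat zeros as missing (NaN) so they do not pull the mean down.
mileage = df_demo["mileage"].mask(df_demo["mileage"] == 0)
# Fill the gaps with the mean of the recorded (non-zero) values.
df_demo["mileage"] = mileage.fillna(mileage.mean())
print(df_demo)
```

Whether a zero really means "missing" depends on the dataset, so this is worth checking before imputing.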

It is a good idea to pickle the sanitized data for quick reuse in subsequent steps:

# pickle the vehicle dataframe
df_veh.to_pickle("./df_veh.pkl")
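Loading the pickled dataframe back is the mirror operation, `pd.read_pickle`. The round trip below is a small sketch on toy data written to a temporary directory; the file name and values are made up for illustration.

```python
import os
import tempfile

import pandas as pd

# Toy data standing in for the sanitized vehicle dataframe.
df_demo = pd.DataFrame({"price": [88000, 102000], "eng_cap": [1598, 1998]})

# Write to a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    pkl_path = os.path.join(tmp_dir, "df_demo.pkl")
    df_demo.to_pickle(pkl_path)
    df_restored = pd.read_pickle(pkl_path)

# Values and dtypes come back as they were saved.
same = df_restored.equals(df_demo)
print(same)
```

Pickle files should only be loaded from sources you trust, since unpickling can execute arbitrary code.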

One-hot encoding and label encoding transform the categorical data into binary indicators or integer codes that can serve as input to the linear regression model. Building a linear regression model is essentially a 3-step process:

from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Set up the target and input variables (df is assumed to be the encoded, all-numeric dataframe)
y = df.price
X = df.drop(columns=['price'])
# Hold out 20% of the data for final testing
X, X_test, y, y_test = train_test_split(X, y, test_size=.2, random_state=10)
# Further partition X, y into datasets X_train, y_train (60% of original) and X_val, y_val (20%)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=.25, random_state=3)
# Build the linear and ridge models
lm = LinearRegression()
# Feature scaling for train, val, and test so that we can run our ridge model on each
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train.values)
X_val_scaled = scaler.transform(X_val.values)
X_test_scaled = scaler.transform(X_test.values)
lm_reg = Ridge(alpha=1)
# Validate each
lm.fit(X_train, y_train)
print(f'Linear Regression val R^2: {lm.score(X_val, y_val):.3f}')
lm_reg.fit(X_train_scaled, y_train)
print(f'Ridge Regression val R^2: {lm_reg.score(X_val_scaled, y_val):.3f}')
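The encoding step that prepares the categorical columns for the model is not shown above. One possible way to do it with pandas alone is sketched below; the `make` and `transmission` columns and all values are hypothetical, and the original analysis may well have used different tools.

```python
import pandas as pd

# Toy data; 'make' and 'transmission' are hypothetical column names.
df_demo = pd.DataFrame({
    "make": ["Toyota", "Honda", "Toyota"],
    "transmission": ["Auto", "Manual", "Auto"],
    "price": [88000, 75000, 92000],
})

# Label encoding: map each category to an integer code (sorted order).
df_demo["transmission"] = df_demo["transmission"].astype("category").cat.codes

# One-hot encoding: one 0/1 indicator column per make.
df_encoded = pd.get_dummies(df_demo, columns=["make"], dtype=int)
print(df_encoded)
```

Label encoding implies an ordering between categories, so for a linear model one-hot encoding is usually the safer choice for unordered categories such as make.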

The test partition can now be used to evaluate the fitted model on unseen data, with the R² score serving as an indication of its performance. This has been a good start in delving into the dynamics of machine learning, and it provides a good foundation for building more complex models. So, on to more complex machine learning. In the meantime, to conclude: investing in a Mercedes-Benz will be a good choice, since it is the most popular make in the luxury car segment.
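The final check on the held-out test partition can be sketched as follows. Since the vehicle dataset is not reproduced here, the example runs on synthetic data: the two features (loosely "age" and "mileage"), the coefficients and the noise level are all made up for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the vehicle data: price falls with age and mileage.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))  # hypothetical features: age, mileage
y = 100_000 - 6_000 * X[:, 0] - 2_000 * X[:, 1] + rng.normal(0, 1_000, 200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=10
)

# Fit the scaler on the training data only, then reuse it on the test data.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

model = Ridge(alpha=1)
model.fit(X_train_scaled, y_train)

# score() returns R^2 on data the model has never seen.
test_r2 = model.score(X_test_scaled, y_test)
print(f"Ridge Regression test R^2: {test_r2:.3f}")
```

The test score should be looked at only once, after model choices have been settled on the validation set, so that it remains an honest estimate.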
