Simple Regression Example

K-Fold Validation, Normalization, MAE

Jake Batsuuri
Computronium Blog
6 min read · Sep 25, 2020


Overview

Basics:

  • Goal
  • Input & Output
  • Encoding & Decoding
  • Architecture
  • Regularization
  • Validation

Code:

  • Import
  • Model Definition
  • K-Fold Validation
  • Validation
  • Final Model

Goal

We will try to predict the median house price given 13 different parameters: attributes such as the crime rate, the property tax rate, and the average number of rooms per dwelling.

Input & Output

You can learn more about the data set here and here.

The variables, in order, are listed below; the first 13 are the inputs, and the 14th, MEDV, is the target:

  1. CRIM per capita crime rate by town
  2. ZN proportion of residential land zoned for lots over 25,000 sq.ft.
  3. INDUS proportion of non-retail business acres per town
  4. CHAS Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
  5. NOX nitric oxides concentration (parts per 10 million)
  6. RM average number of rooms per dwelling
  7. AGE proportion of owner-occupied units built prior to 1940
  8. DIS weighted distances to five Boston employment centres
  9. RAD index of accessibility to radial highways
  10. TAX full-value property-tax rate per $10,000
  11. PTRATIO pupil-teacher ratio by town
  12. B 1000(Bk - 0.63)² where Bk is the proportion of blacks by town
  13. LSTAT % lower status of the population
  14. MEDV Median value of owner-occupied homes in $1000's

If number 12 is what I think it is, that's really fucked up. Anyway. The output should be the median price in thousands of dollars. Whereas we used an activation function on the last layer for classification, we use none here, since for regression the output should be an unconstrained real-valued number.

Encoding & Decoding

In this stage, we don't so much encode the data as normalize it. We do this because the attributes have very different ranges, which makes learning more difficult.

So we normalize each attribute, or feature, giving each one a mean of 0 and a standard deviation of 1.
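In code this is a one-liner; a minimal sketch, assuming the features sit in a NumPy array x:

    x_normalized = (x - x.mean(axis=0)) / x.std(axis=0)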

Architecture

We will use a relatively small network of 2 hidden layers, each with 64 units. Two considerations drive this choice: the small size of the data set and the normalization step.

A small data set makes overfitting more likely, and a small network is one way to mitigate overfitting. Normalizing the data, in turn, makes it a bit easier to learn from.

Together, these two choices keep the problem tractable.

For the loss function, we will use the MSE or the Mean Squared Error.
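For n samples, MSE = (1/n) · Σ (y_pred - y_true)², the average of the squared differences between the predictions and the targets; squaring penalizes large errors heavily.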

Regularization

Our primary method of regularization is, again, early stopping. But this time we will track the MAE, or Mean Absolute Error: the mean of the absolute differences between the predictions and the actual prices.
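For n samples, MAE = (1/n) · Σ |y_pred - y_true|. Since the targets are in thousands of dollars, an MAE of 2.6 means the predictions are off by about $2,600 on average.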

Validation

Generally, with a bigger data set, we would partition it into training and validation sets. Here, though, our validation set would end up being too small, and small validation sets give high-variance scores, which isn't great for validation.

So we use a K-Fold validation method instead. We partition the data into k groups, where k is typically 3 to 5, train on k - 1 of the groups, and evaluate on the remaining one. Then we repeat this process k times, changing the evaluation group every time.

During each iteration, we get a validation score. After all the iterations, we average the scores to get a final validation score, much like a GPS receiver averages noisy measurements to reduce error.

Import

Loading the data set returns a tuple of NumPy arrays: (x_train, y_train), (x_test, y_test).

x_train, x_test: NumPy arrays of shape (num_samples, 13) containing the training samples (for x_train) or the test samples (for x_test).

y_train, y_test: NumPy arrays of shape (num_samples,) containing the target scalars. The targets are floats, typically between 10 and 50, representing the home prices in thousands of dollars.

Note that even the mean and standard deviation are computed from the training data alone; we treat the test set as completely external, so as not to pollute our evaluation.
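A minimal sketch of the loading and normalization step, assuming the copy of the data set that ships with Keras (the variable names are my own):

    from tensorflow.keras.datasets import boston_housing

    # The data comes pre-split into 404 training and 102 test samples.
    (train_data, train_targets), (test_data, test_targets) = boston_housing.load_data()

    # Compute the statistics on the training data only, then apply them to
    # both sets, so no information from the test set leaks into training.
    mean = train_data.mean(axis=0)
    std = train_data.std(axis=0)
    train_data = (train_data - mean) / std
    test_data = (test_data - mean) / std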

Model Definition

Here we wrap the model construction in a function, because we will call it k times for the K-Fold validation method.
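A sketch of such a function, assuming the architecture described above; rmsprop is a reasonable default optimizer here, not something the text prescribes:

    from tensorflow.keras import models, layers

    def build_model():
        # Two hidden layers of 64 units each; the output layer is a single
        # linear unit, since we predict one unconstrained real value.
        model = models.Sequential([
            layers.Input(shape=(train_data.shape[1],)),
            layers.Dense(64, activation='relu'),
            layers.Dense(64, activation='relu'),
            layers.Dense(1),
        ])
        # MSE as the loss, MAE as the human-readable metric.
        model.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])
        return model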

K-Fold Validation
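One way to write the loop is sketched below with k = 4; the epoch count and batch size are illustrative, not tuned.

    import numpy as np

    k = 4
    num_val_samples = len(train_data) // k
    num_epochs = 100
    all_scores = []

    for i in range(k):
        # The i-th slice becomes the validation fold.
        val_data = train_data[i * num_val_samples : (i + 1) * num_val_samples]
        val_targets = train_targets[i * num_val_samples : (i + 1) * num_val_samples]

        # The remaining k - 1 slices become this fold's training data.
        partial_train_data = np.concatenate(
            [train_data[: i * num_val_samples],
             train_data[(i + 1) * num_val_samples :]], axis=0)
        partial_train_targets = np.concatenate(
            [train_targets[: i * num_val_samples],
             train_targets[(i + 1) * num_val_samples :]], axis=0)

        # A fresh model for every fold, so no state carries over.
        model = build_model()
        model.fit(partial_train_data, partial_train_targets,
                  epochs=num_epochs, batch_size=16, verbose=0)
        _, val_mae = model.evaluate(val_data, val_targets, verbose=0)
        all_scores.append(val_mae)

    print(all_scores)
    print(np.mean(all_scores))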

The first print shows the validation score for each fold; the second shows the final averaged validation score. An average MAE of about 2.6 means our k trained models are off by about $2,604 on average.

Validation

For each iteration of the K-Fold, we save the per-epoch validation MAE so that we can plot it.
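This only needs the History object that fit already returns; a sketch, assuming the same loop as above (the metric key is 'val_mae' in recent Keras versions, 'val_mean_absolute_error' in some older ones):

    all_mae_histories = []

    # Inside the K-Fold loop, replace the fit/evaluate pair with:
    history = model.fit(partial_train_data, partial_train_targets,
                        validation_data=(val_data, val_targets),
                        epochs=num_epochs, batch_size=16, verbose=0)
    all_mae_histories.append(history.history['val_mae'])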

Here we average the MAE across the folds, epoch by epoch.
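A sketch of that epoch-wise average:

    average_mae_history = [
        np.mean([fold[epoch] for fold in all_mae_histories])
        for epoch in range(num_epochs)
    ]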

Here we drop the first 10 points, because they are way off anyway, and then replace each remaining point with an exponential moving average of the points before it.
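A sketch of the smoothing step; a factor of 0.9 is a common choice:

    def smooth_curve(points, factor=0.9):
        # Exponential moving average: blend each point with the running
        # average of everything before it.
        smoothed = []
        for point in points:
            if smoothed:
                smoothed.append(smoothed[-1] * factor + point * (1 - factor))
            else:
                smoothed.append(point)
        return smoothed

    # Drop the first 10 points, then smooth the rest.
    smooth_mae_history = smooth_curve(average_mae_history[10:])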

From the plot, the global minimum lies somewhere between epochs 0 and 100. To find its exact location, we call min on the Python iterable to get the lowest MAE, then use index to recover the epoch at which it occurs.
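A sketch; the + 10 compensates for the ten points we sliced off before smoothing:

    lowest_mae = min(smooth_mae_history)
    best_epoch = smooth_mae_history.index(lowest_mae) + 10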

Final Model
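A sketch of the final run: train on all of the training data for the best epoch count found above, then evaluate once on the held-out test set.

    model = build_model()
    model.fit(train_data, train_targets,
              epochs=best_epoch, batch_size=16, verbose=0)
    test_mse_score, test_mae_score = model.evaluate(test_data, test_targets)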

Notice that we used the lowest-MAE epoch number to decide when to stop the training. Our test_mae_score comes out to 2.754209280014038, i.e. the final model is off by about $2,754 on average on the test set.

Up Next…

Coming up next is Part II of the backpropagation algorithm. If you would like me to write another article explaining a topic in depth, please leave a comment.

For the table of contents and more content click here.
