Machine Learning Tutorial #3: Evaluation

Topics: Performance Metrics, Commentary

Adam Novotny
Aug 19, 2018 · 3 min read
Machine Learning project overview. Author: Adam Novotny

In this third phase of the series, I will explore the Evaluation part of the ML project. I will reuse some of the code and solutions from the second Training phase. However, it is important to note that the Evaluation phase should be completely separate from training except for using the final model produced in the Training step. Other tutorials in this series: #1 Preprocessing, #2 Training, #3 Evaluation (this article), #4 Prediction. Github code.

Performance Metrics

import pickle

# Load the final model produced in the Training phase
with open("dtree_model.pkl", "rb") as model_file:
    model = pickle.load(model_file)
>>> model
DecisionTreeRegressor(criterion='mse', max_depth=3, max_features=None, max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, min_samples_leaf=1, min_samples_split=5, min_weight_fraction_leaf=0.0, presort=False, random_state=1, splitter='best')

Next, we will load the testing data we created in the Preprocessing part of this tutorial. This step is the primary reason why I keep the Evaluation section separate from Training. I keep the code separate as well to ensure that no information from training leaks into evaluation. To restate, we should not have seen the data used in this section at any point until now.

import pandas as pd

# Held-out test features and targets created in the Preprocessing phase
X = pd.read_csv("X_test.csv", header=0)
y = pd.read_csv("y_test.csv", header=0)
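
One narrow sanity check on the separation described above is to confirm that no test row also appears in the training set. The sketch below is only an illustration: the helper name count_shared_rows and the two tiny DataFrames are my own assumptions, and in practice you would pass in the training and test features saved during Preprocessing. Identical rows are just one form of leakage, so a result of zero does not prove the split is clean.

```python
import pandas as pd


def count_shared_rows(train_df, test_df):
    """Count test rows that also appear, value for value, in the training set."""
    # Drop duplicate training rows so one test row is never counted twice.
    unique_train = train_df.drop_duplicates()
    shared = test_df.merge(unique_train, how="inner", on=list(test_df.columns))
    return len(shared)


# Hypothetical stand-in data: the row (3.0, 0.3) is present in both sets.
train_demo = pd.DataFrame({"f1": [1.0, 2.0, 3.0], "f2": [0.1, 0.2, 0.3]})
test_demo = pd.DataFrame({"f1": [3.0, 4.0], "f2": [0.3, 0.4]})

shared_count = count_shared_rows(train_demo, test_demo)
print(f"Test rows also found in training data: {shared_count}")
```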

At this stage, we may perform additional performance evaluation on top of the Training step. However, I will stick to the metrics used previously: MAE, MSE, R2.
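
For reference, the three metrics are conventionally defined as follows. The notation is mine rather than the original code's: y_i is the true value, \hat{y}_i the model's prediction, \bar{y} the mean of the true values, and n the number of test samples.

```latex
\[
\begin{aligned}
\mathrm{MAE} &= \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right| \\
\mathrm{MSE} &= \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 \\
R^2 &= 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}
\end{aligned}
\]
```

An R² close to 0 means the model explains about as much variance as always predicting the mean of y, and a negative R² means it does worse than that baseline.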

Decision tree MAE, MSE, R2
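
A minimal sketch of how these three numbers could be computed with scikit-learn is shown below. It is not the author's exact code: the evaluate helper is my own, and the synthetic data and the freshly fitted tree are stand-ins so the snippet runs on its own. In the tutorial's setting you would pass the unpickled model together with X and y loaded from the test files.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.tree import DecisionTreeRegressor


def evaluate(model, X, y):
    """Return MAE, MSE and R2 of the model's predictions on (X, y)."""
    predictions = model.predict(X)
    return {
        "MAE": mean_absolute_error(y, predictions),
        "MSE": mean_squared_error(y, predictions),
        "R2": r2_score(y, predictions),
    }


# Stand-in data and model (assumption): in the tutorial, the model comes from
# dtree_model.pkl and X, y come from X_test.csv and y_test.csv.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(200, 4))
y_demo = 0.01 * X_demo[:, 0] + rng.normal(scale=0.01, size=200)
demo_model = DecisionTreeRegressor(max_depth=3, min_samples_split=5, random_state=1)
demo_model.fit(X_demo[:150], y_demo[:150])

# Score only on the 50 rows the stand-in model was not fitted on.
metrics = evaluate(demo_model, X_demo[150:], y_demo[150:])
for name, value in metrics.items():
    print(f"{name}: {value:.6f}")
```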


The key comparison is how well our model performs relative to the Training phase. In the case of models ready for production, I would expect the performance in the Evaluation step to be comparable to that on the testing folds in the Training phase.

Comparing the last test fold from the Training phase (5249 datapoints used to train) with the Evaluation results above:

  • MAE: final Training phase ~10^-2. Evaluation phase ~10^-2
  • MSE: final Training phase ~10^-4. Evaluation phase ~10^-3
  • R²: final Training phase ~0. Evaluation phase ~0
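
The order-of-magnitude comparison in the list above can be sketched in a few lines. The numeric values below are hypothetical placeholders chosen only to match the listed orders of magnitude; they are not the tutorial's actual results, and both helper functions are my own.

```python
import math


def order_of_magnitude(value):
    """Return the base-10 exponent of a positive number, e.g. 0.03 -> -2."""
    return math.floor(math.log10(value))


def compare_phases(training, evaluation):
    """Return the evaluation / training ratio for each shared error metric."""
    return {
        name: evaluation[name] / training[name]
        for name in training
        if name in evaluation
    }


# Hypothetical placeholder values, NOT the tutorial's measured results.
training_metrics = {"MAE": 0.02, "MSE": 0.0005}
evaluation_metrics = {"MAE": 0.03, "MSE": 0.002}

ratios = compare_phases(training_metrics, evaluation_metrics)
for name, ratio in ratios.items():
    train_exp = order_of_magnitude(training_metrics[name])
    eval_exp = order_of_magnitude(evaluation_metrics[name])
    print(f"{name}: training ~10^{train_exp}, evaluation ~10^{eval_exp}, ratio {ratio:.1f}x")
```

A ratio well above 1 for an error metric is the kind of gap that points toward overfitting. R² is left out of the ratio because dividing values near 0 is not meaningful.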

The performance on a dataset the model has never seen before is reasonably similar. Nonetheless, the MSE is roughly an order of magnitude higher in the Evaluation phase, so overfitting is still something to potentially address. If we had a model ready for production from the Training phase, we would be reasonably confident at this stage that it would perform as we expect on out-of-sample data.


Coinmonks is a non-profit Crypto educational publication.
