What do the Machine Learning Training and Validation graphs tell us?

If you’re like me and you’ve looked at a Training and Validation graph and thought it looked good because it was smooth, or heading in a certain direction, but weren’t really sure why, then you’re in the right place.

I know it can get confusing, but every graph is telling a story. And I like stories. Especially ones with happy endings.

In my last article, A Simple Maths Free PyTorch Model Framework, I proposed a simple framework to build models for Classification (Binary, Multi-Class or Multi-Label) and Regression, but, more importantly, a maths-free approach to understanding what it was doing, and why.

In this article I want to extend that a little to help you interpret what your Training and Validation graphs are telling you.

Let’s start off with some simple boilerplate code to set up a reusable model framework.

from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification
import torch
from torch.utils.data import Dataset, DataLoader
import torch.optim as torch_optim
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import matplotlib.pyplot as plt

class MyCustomDataset(Dataset):
    def __init__(self, X, Y, scale=False):
        self.X = torch.from_numpy(X.astype(np.float32))
        self.y = torch.from_numpy(Y.astype(np.int64))

    def __len__(self):
        return len(self.y)

    def __getitem__(self, idx):
        return self.X[idx], self.y[idx]
def get_optimizer(model, lr=0.001, wd=0.0):
    parameters = filter(lambda p: p.requires_grad, model.parameters())
    optim = torch_optim.Adam(parameters, lr=lr, weight_decay=wd)
    return optim
def train_model(model, optim, train_dl, loss_func):
    # Ensure the model is in Training mode
    model.train()
    total = 0
    sum_loss = 0
    for x, y in train_dl:
        batch = y.shape[0]
        # Run the model forward for this batch worth of data
        logits = model(x)
        # Run the loss function. We will decide what this will be when we call our Training Loop
        loss = loss_func(logits, y)
        # The next 3 lines do all the PyTorch back propagation goodness
        optim.zero_grad()
        loss.backward()
        optim.step()
        # Keep a running count of the total number of samples in this epoch
        total += batch
        # And keep a running total of our loss
        sum_loss += batch * (loss.item())
    return sum_loss / total
def train_loop(model, train_dl, valid_dl, epochs, loss_func, lr=0.1, wd=0):
    optim = get_optimizer(model, lr=lr, wd=wd)
    train_loss_list = []
    val_loss_list = []
    acc_list = []
    for i in range(epochs):
        loss = train_model(model, optim, train_dl, loss_func)
        # After training this epoch, keep a list of
        # the loss of each epoch
        train_loss_list.append(loss)
        val, acc = val_loss(model, valid_dl, loss_func)
        # Likewise for the validation loss and accuracy
        val_loss_list.append(val)
        acc_list.append(acc)
        print("training loss: %.5f valid loss: %.5f accuracy: %.5f" % (loss, val, acc))

    return train_loss_list, val_loss_list, acc_list
def val_loss(model, valid_dl, loss_func):
    # Put the model into evaluation mode, not training mode
    model.eval()
    total = 0
    sum_loss = 0
    correct = 0
    # No gradients are needed for validation, so save the memory and compute
    with torch.no_grad():
        for x, y in valid_dl:
            current_batch_size = y.shape[0]
            logits = model(x)
            loss = loss_func(logits, y)
            # The loss bookkeeping is, in essence, the same as in
            # training, so see the comments there
            sum_loss += current_batch_size * (loss.item())
            total += current_batch_size
            # Find out which of the returned predictions is the loudest
            # of them all, and that's our prediction(s)
            preds = logits.argmax(1)
            # Count how many of our predictions are right
            correct += (preds == y).sum().item()
    return sum_loss / total, correct / total
def view_results(train_loss_list, val_loss_list, acc_list):
    plt.rcParams["figure.figsize"] = (15, 5)
    plt.figure()
    epochs = np.arange(0, len(train_loss_list))
    plt.subplot(1, 2, 1)
    # Training loss is averaged over the whole epoch, so plot it
    # half an epoch earlier than the end-of-epoch validation loss
    plt.plot(epochs - 0.5, train_loss_list)
    plt.plot(epochs, val_loss_list)
    plt.title('model loss')
    plt.ylabel('loss')
    plt.xlabel('epoch')
    plt.legend(['train', 'val'], loc='upper left')

    plt.subplot(1, 2, 2)
    plt.plot(acc_list)
    plt.title('accuracy')
    plt.ylabel('accuracy')
    plt.xlabel('epoch')
    plt.legend(['acc'], loc='upper left')
    plt.show()
def get_data_train_and_show(model, batch_size=128, n_samples=10000, n_classes=2, n_features=30, val_size=0.2, epochs=20, lr=0.1, wd=0, break_it=False):
    # We'll make a fictitious dataset, assuming all relevant
    # EDA / Feature Engineering has been done and this is our
    # resultant data
    X, y = make_classification(n_samples=n_samples, n_classes=n_classes, n_features=n_features, n_informative=n_features, n_redundant=0, random_state=1972)

    if break_it:  # Specifically mess up the data
        X = np.random.rand(n_samples, n_features)
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=val_size, random_state=1972)
    train_ds = MyCustomDataset(X_train, y_train)
    valid_ds = MyCustomDataset(X_val, y_val)
    train_dl = DataLoader(train_ds, batch_size=batch_size, shuffle=True)
    valid_dl = DataLoader(valid_ds, batch_size=batch_size, shuffle=False)
    train_loss_list, val_loss_list, acc_list = train_loop(model, train_dl, valid_dl, epochs=epochs, loss_func=F.cross_entropy, lr=lr, wd=wd)
    view_results(train_loss_list, val_loss_list, acc_list)

Please don’t worry too much about the above. It’s pretty much all explained in my previous article if you’re interested. The key thing is that we now have the foundations to create simple models, generate suitable data, train them and see the results, just by calling the function get_data_train_and_show. So let’s start looking at some scenarios.

Scenario 1 — Model seems to learn, but does not perform well on Validation or Accuracy

Regardless of the hyperparameters, the model’s Train loss goes down slowly, but the Val loss does not drop, and the Accuracy does not indicate that it is learning anything.

In this case (binary classification), the accuracy hovers around 50%.

class Scenario_1_Model_1(nn.Module):
    def __init__(self, in_features=30, out_features=2):
        super().__init__()
        self.lin1 = nn.Linear(in_features, out_features)

    def forward(self, x):
        x = self.lin1(x)
        return x

get_data_train_and_show(Scenario_1_Model_1(), lr=0.001, break_it=True)
Scenario 1 — Model seems to learn, but does not perform well on Validation or Accuracy

Scenario 1 — Possible Reason — ‘Not enough information in the data to allow “learning”’

It is possible that the training data does not contain sufficient information to allow the model to ‘learn’.

In this case (in the code, break_it=True replaces the training data with pure random noise), there is nothing of substance for the model to learn.

It is always imperative that the data contains sufficient information to learn from. Remember, EDA and Feature Engineering are key! Models learn what can be learnt; they can’t make up what doesn’t exist.
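If you want a quick sanity check that your features carry any signal at all, one option (my own sketch, not part of the framework above) is scikit-learn’s mutual information estimate, with X and y standing in for your feature matrix and labels:

from sklearn.feature_selection import mutual_info_classif

# A rough sketch: estimate how much information each feature carries
# about the target. Values near zero across the board suggest there is
# simply nothing for a model to learn.
mi = mutual_info_classif(X, y, random_state=1972)
print("per-feature mutual information:", mi.round(3))

On the broken (random) data you would expect every value to come out near zero, whereas the real make_classification data should score well above that.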

Scenario 2 — Train, Val and Accuracy curves all really choppy and not settling

  • Learning Rate = 0.1
  • Batch Size = 128 (default)
class Scenario_2_Model_1(nn.Module):
    def __init__(self, in_features=30, out_features=2):
        super().__init__()
        self.lin1 = nn.Linear(in_features, out_features)

    def forward(self, x):
        x = self.lin1(x)
        return x

get_data_train_and_show(Scenario_2_Model_1(), lr=0.1)
Scenario 2 — Model doesn't seem to want to learn — all graphs choppy

Scenario 2 — Possible Reason — ‘Learning rate is too high’ or ‘batch size too low’

Reducing the learning rate from 0.1 to 0.001 means the loss won’t ‘bounce around’, and will instead decrease smoothly.

get_data_train_and_show(Scenario_2_Model_1(), lr=0.001)
Scenario 2 — With a Reduced Learning Rate

As well as reducing the learning rate, increasing the batch size will also make it smoother.

get_data_train_and_show(Scenario_2_Model_1(), lr=0.001, batch_size=256)
Scenario 2 — With a Reduced Learning Rate and Increased Batch Size
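Another option, if you would rather not hand-tune a single learning rate, is to let a scheduler reduce it for you when the validation loss stops improving. This is a common PyTorch pattern rather than part of the framework above; a sketch:

from torch.optim.lr_scheduler import ReduceLROnPlateau

# A sketch only: start at a higher learning rate and cut it by 10x
# whenever the validation loss has not improved for 3 epochs.
optim = get_optimizer(model, lr=0.1)
scheduler = ReduceLROnPlateau(optim, mode='min', factor=0.1, patience=3)
# ...then inside the epoch loop, after computing the validation loss:
# scheduler.step(val)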

Scenario 3 — Train Loss goes to nearly zero and the Accuracy looks OK, but the Val loss doesn’t drop as you would like, and then it starts to rise

class Scenario_3_Model_1(nn.Module):
    def __init__(self, in_features=30, out_features=2):
        super().__init__()
        self.lin1 = nn.Linear(in_features, 50)
        self.lin2 = nn.Linear(50, 150)
        self.lin3 = nn.Linear(150, 50)
        self.lin4 = nn.Linear(50, out_features)

    def forward(self, x):
        x = F.relu(self.lin1(x))
        x = F.relu(self.lin2(x))
        x = F.relu(self.lin3(x))
        x = self.lin4(x)
        return x

get_data_train_and_show(Scenario_3_Model_1(), lr=0.001)
Scenario 3 — A classic overfitting

Scenario 3 — Possible Reason — ‘It’s overfitting’

The extremely low training loss and the high accuracy, while the validation and training losses drift further and further apart, are all classic indicators of overfitting.

Fundamentally, your model has too much capacity. It is memorizing the training data too well, which of course means it can’t generalize as well to new data. (Note that the model we used, Scenario_3_Model_1, has several layers and quite a high number of parameters.)
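A quick way to see that capacity for yourself (my sketch, not from the article) is to count the trainable parameters:

# Count the trainable parameters in a model
def count_params(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

print(count_params(Scenario_3_Model_1()))  # 16,852 for the deep model above

The simpler model we are about to build comes in at around 1,700 parameters, roughly a tenth of that.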

The first thing we could try is to reduce the complexity of the model.

class Scenario_3_Model_2(nn.Module):
    def __init__(self, in_features=30, out_features=2):
        super().__init__()
        self.lin1 = nn.Linear(in_features, 50)
        self.lin2 = nn.Linear(50, out_features)

    def forward(self, x):
        x = F.relu(self.lin1(x))
        x = self.lin2(x)
        return x

get_data_train_and_show(Scenario_3_Model_2(), lr=0.001)
Scenario 3 — With reduced model complexity

That made it a lot better, but we can introduce a little L2 weight decay regularization to improve it further (well suited to shallower models).

get_data_train_and_show(Scenario_3_Model_2(), lr=0.001, wd=0.02)
Scenario 3 — With reduced model complexity and an L2 Weight Decay
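For intuition, the wd argument is simply passed through get_optimizer to Adam’s weight_decay, which (up to a constant factor) behaves like adding an explicit L2 penalty on the weights to the loss. A hand-rolled sketch of the same idea, reusing the names from train_model:

# Illustrative only: an explicit L2 penalty on the weights is roughly
# what weight_decay does for us inside the optimizer's update step.
l2_strength = 0.02
loss = loss_func(logits, y)
l2_penalty = sum(p.pow(2).sum() for p in model.parameters())
loss = loss + 0.5 * l2_strength * l2_penalty

The penalty grows with the size of the weights, so the optimizer is nudged towards smaller weights, which in turn means a smoother, less overfitted decision surface.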

But we could also have kept the model as deep and large as it was, and introduced dropout instead (better suited to deeper models).

class Scenario_3_Model_3(nn.Module):
    def __init__(self, in_features=30, out_features=2):
        super().__init__()
        self.lin1 = nn.Linear(in_features, 50)
        self.lin2 = nn.Linear(50, 150)
        self.lin3 = nn.Linear(150, 50)
        self.lin4 = nn.Linear(50, out_features)
        self.drops = nn.Dropout(0.4)

    def forward(self, x):
        x = F.relu(self.lin1(x))
        x = self.drops(x)
        x = F.relu(self.lin2(x))
        x = self.drops(x)
        x = F.relu(self.lin3(x))
        x = self.drops(x)
        x = self.lin4(x)
        return x

get_data_train_and_show(Scenario_3_Model_3(), lr=0.001)
Scenario 3 — Still using a deep model, but now with Dropout
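One detail worth calling out: dropout is only active in training mode, which is why train_model calls model.train() and val_loss calls model.eval(). A tiny sketch to see the difference:

# A Dropout layer behaves differently in the two modes
layer = nn.Dropout(0.4)
x = torch.ones(8)
layer.train()
print(layer(x))  # some elements zeroed, survivors scaled by 1/(1-0.4)
layer.eval()
print(layer(x))  # identity: all ones, dropout is switched off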

Scenario 4 — Train and Val behaving, but Accuracy just not getting high

  • Learning Rate = 0.001
  • Batch Size = 128 (default)
  • Number of possible classes = 5
class Scenario_4_Model_1(nn.Module):
    def __init__(self, in_features=30, out_features=2):
        super().__init__()
        self.lin1 = nn.Linear(in_features, 2)
        self.lin2 = nn.Linear(2, out_features)

    def forward(self, x):
        x = F.relu(self.lin1(x))
        x = self.lin2(x)
        return x

get_data_train_and_show(Scenario_4_Model_1(out_features=5), lr=0.001, n_classes=5)
Scenario 4 — Train and Val behaving, but Accuracy just not getting higher than 38%

Scenario 4 — Possible Reason — ‘Not enough capacity to learn’

One of the layers in your model is narrower than the number of classes the model has to output. In this case, a 2-unit layer in the middle when there are 5 possible output classes.

This means the model loses some of the information that distinguishes those classes, because it has to cram everything through a narrower layer, and it struggles to recover that information once the layers widen again.
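To see the squeeze concretely, here is a small sketch (mine, not from the article): every sample gets compressed down to just two numbers before the five-way decision has to be made.

# The bottleneck in Scenario_4_Model_1: 30 features in, only 2 values
# out, yet the next layer must separate 5 classes from those 2 values.
x = torch.randn(4, 30)      # 4 hypothetical samples
bottleneck = nn.Linear(30, 2)
print(bottleneck(x).shape)  # torch.Size([4, 2])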

So make sure no layer is ever narrower than the output size of the model.

class Scenario_4_Model_2(nn.Module):
    def __init__(self, in_features=30, out_features=2):
        super().__init__()
        self.lin1 = nn.Linear(in_features, 50)
        self.lin2 = nn.Linear(50, out_features)

    def forward(self, x):
        x = F.relu(self.lin1(x))
        x = self.lin2(x)
        return x

get_data_train_and_show(Scenario_4_Model_2(out_features=5), lr=0.001, n_classes=5)
Scenario 4 — Ensuring the layers never go smaller than the number of classes, and accuracy up at 88%

Conclusion

And there you have it: examples of different Training, Validation and Accuracy graphs, what they are telling you, and how you can make them better.

I hope this has been helpful.
