Why Loading a Previously Saved Keras Model Yields Different Results: Lessons Learned

Kristina Popova
Feb 3, 2022



Machine learning models are used in production more than ever, and Keras is one of the most popular libraries for building powerful machine learning and deep learning models. However, training these models is often computationally expensive and lengthy, depending on the data at hand and the model architecture; some models take weeks or even months to train. That makes it essential to be able to store our models locally and retrieve them again when we need to make predictions. But what do we do if, for some reason, the saved model does not load properly? I will try to give answers based on my experience.

I will not go into detail on how to use and save Keras models; I'll assume the reader is familiar with the process and skip ahead to how to deal with unexpected model behavior upon loading. Namely, after training a Keras model, stored in a model variable, we would like to save it as it is, so that on the next loading we can skip training and just make predictions. My preferred way of doing this is by saving the model's weights, which are randomly initialized when the model is created and are updated as the model is trained. So I call model.save_weights("model.h5"), and a "model.h5" file gets created containing the weights the model learned. Next, in another session, I recreate a model with an identical architecture and load the trained weights into it with new_model.load_weights("model.h5"). All seems fine. Except, when I call new_model.predict(test_data), I get an accuracy of zero, and I have no idea why.

As it turns out, there are a number of reasons why your model might not make correct predictions. Here I will try to summarize the most common ones and give you tips on how to work around them.

1. First things first, double check your data.

I know it seems way too obvious, but minor oversights can lead to poor performance when reloading your model from disk. If, for instance, you are building language models, make sure that in every new session you:

  • Recheck the order of your class labels. If you are mapping them to numbers, make sure that in every session each class label gets the same number. This can go wrong if you retrieve the labels with list(set()), which returns them in a different order every run, and that will scramble your label predictions in the end.
  • Check your data sets. If you don't already keep your test data in a separate file, make sure your train-test split isn't random; otherwise each time you make predictions you predict on different data, and your prediction accuracy ends up inconsistent. A minimal sketch of both fixes follows this list.
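
As a sketch of both fixes (the labels and data below are made up purely for illustration): sorted() makes the label-to-index mapping deterministic where list(set()) does not, and a fixed random_state pins down the train-test split.

from sklearn.model_selection import train_test_split

labels = ["person", "location", "organization"]

# list(set(...)) carries no ordering guarantee across sessions,
# so the label-to-number mapping can silently change between runs:
label2idx = {label: i for i, label in enumerate(list(set(labels)))}

# sorted() makes the mapping deterministic in every session:
label2idx = {label: i for i, label in enumerate(sorted(set(labels)))}

# Toy data, purely for illustration
X = [[0.1], [0.2], [0.3], [0.4], [0.5]]
y = [0, 1, 0, 1, 0]

# A fixed random_state keeps the train-test split identical across runs:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=42)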

You can, of course, run into other data-related issues, depending on the domain you work in. However, always check for data representation consistency.

2. The metrics issue

Another cause of errors or inconsistent results is the choice of accuracy metric. Usually, when we build a model and save its weights, we do something like this:

from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model

def build_model(max_len, n_tags):
    input_layer = Input(shape=(max_len,))
    output_layer = Dense(n_tags, activation="softmax")(input_layer)
    model = Model(input_layer, output_layer)
    return model

max_len, n_tags = 100, 17  # example values
model = build_model(max_len, n_tags)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(...)  # training data omitted
model.save_weights("model.h5")

If we need to load it in a new session/script, we'd do the following:

from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model

def build_model(max_len, n_tags):
    input_layer = Input(shape=(max_len,))
    output_layer = Dense(n_tags, activation="softmax")(input_layer)
    model = Model(input_layer, output_layer)
    return model

max_len, n_tags = 100, 17  # must match the architecture that was saved
model = build_model(max_len, n_tags)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.load_weights("model.h5")
model.evaluate(...)  # test data omitted

Depending on the specific Keras/Tensorflow version you are using, this might throw an error. The problem occurs when compiling the model with "accuracy" as a metric. Keras recognizes several concrete accuracy metrics ("sparse categorical accuracy", "categorical accuracy", and so on), and which one is appropriate depends on the data you are using. When we set the metric to plain "accuracy", Keras tries to assign one of these specific accuracy types, depending on which one it thinks best fits the data distribution, and it may infer a different one on different runs. The best workaround is to always set the accuracy metric explicitly rather than letting Keras choose for itself. For instance, replace

model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

with

model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["sparse_categorical_accuracy"])

3. Randomness

When re-training a Keras neural network on the same data as before, you'll rarely get the same results twice. This is because Keras initializes the network weights randomly, so on every run the weights start out different and get updated differently during training, which makes identical prediction accuracy across runs unlikely.

If, for whatever reason, you need the weights to be equal prior to training, you can seed the random number generators at the top of your code:

from numpy.random import seed
seed(42)
import tensorflow as tf
tf.random.set_seed(42)  # TF 2.x; on a TF 1.x backend use tf.set_random_seed(42)

The NumPy seed covers the randomness Keras draws from NumPy, while the Tensorflow backend has its own random number generator that needs to be seeded as well. This snippet makes sure that each time you run your code, your neural network weights are initialized identically.
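
As a quick sanity check (a sketch assuming TF 2.x, with a toy Dense model invented for the example), resetting both seeds before building a model should produce identical initial weights every time:

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model

def build_toy_model():
    # A tiny model purely for demonstrating deterministic initialization
    inputs = Input(shape=(4,))
    outputs = Dense(2, activation="softmax")(inputs)
    return Model(inputs, outputs)

np.random.seed(42)
tf.random.set_seed(42)
weights_a = build_toy_model().get_weights()

np.random.seed(42)
tf.random.set_seed(42)
weights_b = build_toy_model().get_weights()

# With both seeds reset, the two freshly built models start out identical
assert all(np.array_equal(a, b) for a, b in zip(weights_a, weights_b))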

4. Watch out for custom layers usage

Keras offers a wide variety of layers (Dense, LSTM, Dropout, BatchNormalization, and many more), but sometimes we want to apply a specific operation to our data within the model and there is no predefined layer for it. In general, Keras offers two ways to do this: the Lambda layer and subclassing the base Layer class. Be very careful with these two, especially if you save your model architecture in json format. The tricky part with a Lambda layer is its serialization limitation: because it is saved by serializing Python bytecode, it can only be loaded in the same environment where it was saved, i.e. it is not portable. When faced with this issue, it is usually recommended to subclass keras.layers.Layer instead, or, rather than saving the entire model, to save only its weights and rebuild the model from scratch.
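
A minimal sketch of the subclassing approach: ScaleLayer below is a hypothetical custom layer (it just multiplies its input by a constant, the kind of small operation Lambda is often used for). Implementing get_config() is what makes it serializable to json, which a Lambda's bytecode is not:

import tensorflow as tf
from tensorflow.keras.layers import Layer

class ScaleLayer(Layer):
    # A hypothetical replacement for Lambda(lambda x: x * factor)
    def __init__(self, factor=2.0, **kwargs):
        super().__init__(**kwargs)
        self.factor = factor

    def call(self, inputs):
        return inputs * self.factor

    def get_config(self):
        # Returning the constructor arguments lets Keras recreate the
        # layer from json, unlike a Lambda with arbitrary Python bytecode
        config = super().get_config()
        config.update({"factor": self.factor})
        return config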

5. Custom objects

Very often you will want to use custom functions that you apply to the data, or functions that compute loss/accuracy, etc. Keras allows for this by letting us pass additional parameters when saving/loading models. Say we want to load a previously saved model that uses a custom loss function we wrote ourselves:

model = load_model("model.h5", custom_objects={"custom_loss": custom_loss})

If we load this model in a new environment, we have to be careful to define our custom_loss function there, as it is not remembered when saving the model: even if we save the entire architecture, only the name of the custom function is stored, and the function body is something we have to provide in addition.
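
As a sketch of the round trip (custom_loss here is a made-up stand-in, plain mean squared error, just to show the shape of it):

import tensorflow as tf
from tensorflow.keras.models import load_model

# The function body must exist in the loading session; the saved file
# only stores its name
def custom_loss(y_true, y_pred):
    return tf.reduce_mean(tf.square(y_true - y_pred))

model = load_model("model.h5", custom_objects={"custom_loss": custom_loss})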

6. Global variables initializer

This one is especially relevant if you are using Tensorflow 1.x as a backend, which you may still need for many applications. When running a tf 1.x session, you need to run tf.global_variables_initializer(), which initializes all variables randomly. A side effect of this is that, when you try to save and reload your model, it might reinitialize all of the weights. You can stop this behavior manually by running:

from keras.backend import manual_variable_initialization
manual_variable_initialization(True)

Summary

This text was a quick summary of the factors that (at least in my experience) most often cause Keras models not to load properly in new environments. Sometimes these issues produce unpredictable results, and in other cases they just throw an error. When and how they occur also depends a great deal on the Python version you are using, as well as your Tensorflow and Keras versions, since some combinations are incompatible and interact in ways that cause unexpected behavior. I hope this short overview gives you an idea of where to start looking when faced with such issues. Cheers!

Sources

  1. https://keras.io/getting_started/faq/#how-can-i-obtain-reproducible-results-using-keras-during-development
  2. https://keras.io/api/layers/core_layers/lambda/
  3. https://www.tensorflow.org/guide/keras/save_and_serialize
  4. https://keras.io/guides/making_new_layers_and_models_via_subclassing/
  5. https://machinelearningmastery.com/reproducible-results-neural-networks-keras/
