Saving and Loading a Trained Machine Learning Model for Inferencing

Peter Woodman
Mar 26, 2023


Prerequisites

  • Machine Learning Basics
  • Python Basics

What happens after a model is trained?

  • Model training is a crucial stage in modeling. It involves feeding an untrained model with sufficient data so that it learns to recognize the patterns in that data. Depending on the training data and the complexity of the model involved, training may even take hours when complex data like images or videos is involved.
  • After a machine learning / deep learning model is trained successfully, the trained model is used for inferencing. In the usual scenario, the trained model is stored in the Python kernel’s memory as an object, so during inferencing we can use this object reference to access the model and get the desired output for the respective inputs.
  • Though this method looks direct and straightforward, there are some challenges when it comes to industrializing it.

Challenges

  • If I restart the Python kernel, the trained model object will be removed from memory, so do I have to train the model again from scratch?
  • If I want to share this model for inferencing with another person, how can I achieve it without involving the training phase?
  • If I want to industrialize this trained model, how can I set it up on a server without involving the training phase?

Solution

  • To avoid retraining the model every time it is invoked, and to remove the bottlenecks discussed above, we have to save the model in a reusable manner.
  • Once the trained model is saved, we can load it back in the required instance and take it forward for inferencing. Saving and loading a Python object this way is commonly called pickling and unpickling.

“Pickling” is the process whereby a Python object hierarchy is converted into a byte stream, and “unpickling” is the inverse operation, whereby a byte stream (from a binary file or bytes-like object) is converted back into an object hierarchy. — Python Documentation

  • Saving the model this way (pickling) is a form of serialization, and loading the saved model back into a variable (unpickling) is the corresponding deserialization.
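To see serialization and deserialization in isolation, here is a minimal sketch that pickles a plain dictionary; any picklable Python object hierarchy works the same way (the keys and values here are purely illustrative):

```python
import pickle

# Any Python object hierarchy can be converted to a byte stream (pickling)
config = {"model": "logistic_regression", "C": 1.0, "classes": [0, 1]}

payload = pickle.dumps(config)    # serialize the object to bytes
restored = pickle.loads(payload)  # deserialize the bytes back to an object

assert restored == config         # the round trip preserves the object
```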

Most Commonly Used Methods

  • Pickling using the Pickle module
  • Pickling using the Joblib module

Steps for using Pickle module

1. Import the pickle module

import pickle

2. Train a model

model.fit(xtrain, ytrain)

3. Save the model using pickle.dump.

with open(picklePath, "wb") as file:
    pickle.dump(model, file)
  • Here, picklePath represents the path to store the generated pickle file
  • Once the pickle file is generated it can be shared and used for inferencing

4. Load the model for inferencing using pickle.load

with open(picklePath, "rb") as file:
    model = pickle.load(file)
  • Here, picklePath represents the path where the pickle file is stored.
  • The model variable has reference to the trained model.

5. Use the loaded model for inferencing

model.predict(data)
  • The loaded model behaves just like the original trained model variable, so we can take it forward the same way to get outputs.
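Putting the pickle steps above together, here is an end-to-end sketch. It assumes scikit-learn is available; the LogisticRegression model, the toy dataset, and picklePath are illustrative stand-ins for your own model, data, and path:

```python
import pickle

from sklearn.linear_model import LogisticRegression

# Toy training data: a stand-in for a real dataset
xtrain = [[0.0], [1.0], [2.0], [3.0]]
ytrain = [0, 0, 1, 1]

model = LogisticRegression()
model.fit(xtrain, ytrain)

# Save (pickle) the trained model; the with-block closes the file for us
picklePath = "model.pkl"
with open(picklePath, "wb") as file:
    pickle.dump(model, file)

# Load (unpickle) it back and use it for inferencing
with open(picklePath, "rb") as file:
    loaded = pickle.load(file)

print(loaded.predict([[0.5], [2.5]]))
```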

Steps for using Joblib module

1. Import the joblib module

import joblib

2. Train a model

model.fit(xtrain, ytrain)

3. Save the model using joblib.dump.

joblib.dump(model, picklePath)
  • Here, picklePath represents the path to store the generated pickle file
  • Once the pickle file is generated it can be shared and used for inferencing
  • Additionally, parameters like compression can be used to reduce the size of the generated pickle file
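As a sketch of the compression option, joblib.dump accepts a compress argument (an integer from 0 to 9, higher meaning more compression) that trades save time for a smaller file. The file names and the dummy object below are placeholders:

```python
import os

import joblib

# A deliberately repetitive object so compression has something to squeeze
obj = {"weights": [0.5] * 100_000}

joblib.dump(obj, "model_raw.pkl")                # no compression
joblib.dump(obj, "model_small.pkl", compress=3)  # zlib compression, level 3

# The compressed file is smaller than the uncompressed one
print(os.path.getsize("model_raw.pkl") > os.path.getsize("model_small.pkl"))
```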

4. Load the model for inferencing using joblib.load

model = joblib.load(picklePath)
  • Here, picklePath represents the path where the pickle file is stored.
  • The model variable has reference to the trained model.

5. Use the loaded model for inferencing

model.predict(data)
  • The loaded model behaves just like the original trained model variable, so we can take it forward the same way to get outputs.
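The joblib steps above can be put together the same way. As before, this is a sketch assuming scikit-learn; the DecisionTreeClassifier, the toy data, and picklePath are illustrative stand-ins:

```python
import joblib
from sklearn.tree import DecisionTreeClassifier

# Toy training data: a stand-in for a real dataset
xtrain = [[0.0], [1.0], [2.0], [3.0]]
ytrain = [0, 0, 1, 1]

model = DecisionTreeClassifier().fit(xtrain, ytrain)

# joblib opens and closes the file itself, so we pass the path directly
picklePath = "model.joblib"
joblib.dump(model, picklePath)

loaded = joblib.load(picklePath)
print(loaded.predict([[0.5], [2.5]]))
```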

Check GitHub for Code — Real Use Case

  • For Pickling — Link
  • For Unpickling — Link

Conclusion

  • In this blog, the methodology for saving and loading a model for inferencing was discussed: storing Python objects as a byte stream (pickling) and the inverse operation of restoring them (unpickling).
  • The commonly used modules for this requirement, Pickle and Joblib, were covered, and the Python implementation steps for both modules were listed with sample code.
