Persist & Reuse Trained Machine Learning Models using Joblib or Pickle in Python

Once a machine learning model has been trained, the built model needs to be reused for classification or regression problems

Harsha Moraliyage
2 min read · May 23, 2020

In the real world, training and testing machine learning models is one of the main phases of the model development life cycle. Training a model may take anywhere from a few seconds to several days, or even months, so it is not practical to retrain a model every time it is needed. Instead, the trained model should be saved and reused.

In this article I will explain how a trained model can be exported and reused in a Python environment. A sample application that uses the joblib library to export a machine learning model and expose it via a Flask application can be found here. There are two main methods available in Python to export a machine learning model and reuse it:

  • Joblib
  • Pickle

Both approaches rely on simple serialization and deserialization: the model is serialized to a file when it is saved and deserialized from that file when it is loaded.
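
To make the serialize/deserialize idea concrete, here is a minimal sketch that uses only the standard library. The `toy_model` dictionary is a hypothetical stand-in for a trained model and is not part of the original example; any picklable Python object goes through the same round trip.

```python
import pickle

# Hypothetical stand-in for a trained model (assumption, not from the article).
toy_model = {"weights": [0.5, -1.2, 3.0], "bias": 0.1}

# Serialization: turn the in-memory object into a byte string.
payload = pickle.dumps(toy_model)

# Deserialization: rebuild an equivalent object from those bytes.
restored = pickle.loads(payload)

print(type(payload).__name__)  # bytes
print(restored == toy_model)   # True: same content
print(restored is toy_model)   # False: a new object was created
```

Saving to a file works the same way, except that the bytes are written to disk instead of being kept in memory.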

Dataset: Boston house price

Algorithm: MLPRegressor from sklearn package

Joblib Approach

Joblib is a Python library that is significantly faster than Pickle on objects that carry large NumPy arrays. It is intended as a replacement for Pickle when saving and loading machine learning models. You may look at the implementation details here.

Joblib offers a simpler API than Pickle.

Joblib supports several compression methods, including gzip, zlib and bz2, which is useful when saving large machine learning models.
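
As a rough illustration of the compression option, the sketch below saves the same array with and without gzip and compares the file sizes. The all-zeros array and the file names are assumptions chosen for the demonstration (zeros compress unusually well); a real model will usually shrink less.

```python
import os
import tempfile

import joblib
import numpy as np

# Highly compressible stand-in for a large model's weights (assumption).
weights = np.zeros((1000, 1000))

with tempfile.TemporaryDirectory() as tmp_dir:
    raw_path = os.path.join(tmp_dir, "weights_raw.pkl")   # hypothetical name
    gz_path = os.path.join(tmp_dir, "weights_gzip.pkl")   # hypothetical name

    joblib.dump(weights, raw_path)                        # default: no compression
    joblib.dump(weights, gz_path, compress=("gzip", 3))   # gzip at level 3

    raw_size = os.path.getsize(raw_path)
    gz_size = os.path.getsize(gz_path)

    # joblib.load detects the compression automatically.
    restored = joblib.load(gz_path)

print(f"uncompressed: {raw_size} bytes, gzip: {gz_size} bytes")
print(np.array_equal(weights, restored))
```

Higher compression levels produce smaller files at the cost of slower saving and loading, so the right setting depends on the model size and how often it is loaded.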

  • Importing data manipulation libraries and joblib
# Data manipulation, modelling and persistence libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
import joblib
  • Loading and preparing dataset
df = pd.read_csv('tp3_boston_data.csv')  # Load the dataset

df_x = df[['crim', 'zn', 'indus', 'chas', 'nox', 'rm', 'age', 'dis', 'rad', 'tax', 'ptratio', 'lstat']]
df_y = df['medv']  # 1-D target; a single-column DataFrame triggers a DataConversionWarning in fit

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(df_x)

df_x_scaled = scaler.transform(df_x)
df_x_scaled = pd.DataFrame(df_x_scaled, columns=df_x.columns)
  • Splitting train and test data
X_train, X_test, Y_train, Y_test = train_test_split(df_x_scaled, df_y, test_size=0.33, random_state=5)
  • Training and building the model
mlp = MLPRegressor(hidden_layer_sizes=(60,), max_iter=1000)  # one hidden layer with 60 units
mlp.fit(X_train, Y_train)
  • Exporting the model to a file using joblib
# Save the trained machine learning model to a file
joblib.dump(mlp, "rf_model.pkl")
  • Loading the machine learning model to predict the test data
model = joblib.load('rf_model.pkl')
y_predict = model.predict(X_test)
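
One thing to keep in mind when reusing the saved model elsewhere: the model above was trained on scaled features, so new data must pass through the same fitted scaler before prediction. A possible way to handle this, sketched here under assumptions, is to bundle the scaler and the regressor in a scikit-learn pipeline and save the pipeline as one file. The synthetic data and the file name `mlp_pipeline.pkl` are hypothetical and only stand in for the housing dataset.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data standing in for the housing dataset (assumption).
rng = np.random.default_rng(5)
X = rng.normal(size=(200, 4))
y = X @ np.array([1.5, -2.0, 0.5, 3.0]) + rng.normal(scale=0.1, size=200)

# Bundle scaling and the regressor so both end up in the same file.
pipeline = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(60,), max_iter=500, random_state=5),
)
pipeline.fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "mlp_pipeline.pkl")  # hypothetical file name
    joblib.dump(pipeline, path)
    restored = joblib.load(path)

# The restored pipeline scales raw inputs itself and reproduces the predictions.
same_predictions = np.allclose(pipeline.predict(X), restored.predict(X))
print(same_predictions)
```

With this layout, the code that loads the file can call `predict` directly on raw feature values without having to recreate the scaler.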

A simple example that saves a machine learning model and exposes it via a Flask API can be found in link1, link2.

Pickle Approach

Pickle is a very fast way to save models that do not contain many large NumPy arrays. This is because the pickle module of the standard library is implemented in C. For simple models, pickle is very fast.

  • Saving model to a file using pickle
# Import pickle module
import pickle

# Save the model to a file in the current working directory
pickle_file_name = "my_model.pkl"

with open(pickle_file_name, 'wb') as file:
    pickle.dump(mlp, file)
  • Load and predict using the saved model from pickle
# Load the model from the saved file
with open(pickle_file_name, 'rb') as file:
    pk_model = pickle.load(file)

# Predict with the model that was loaded from pickle
y_predict = pk_model.predict(X_test)
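
A quick sanity check after loading is to confirm that the restored model predicts the same values as the original. The sketch below does this in memory with `pickle.dumps` and `pickle.loads`; the small synthetic dataset and the reduced network size are assumptions made to keep the example fast, not the settings used above.

```python
import pickle

import numpy as np
from sklearn.neural_network import MLPRegressor

# Small synthetic dataset (assumption) so the example runs quickly.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X.sum(axis=1)

mlp = MLPRegressor(hidden_layer_sizes=(10,), max_iter=300, random_state=0)
mlp.fit(X, y)

# Serialize with the newest protocol supported by this Python version.
blob = pickle.dumps(mlp, protocol=pickle.HIGHEST_PROTOCOL)
pk_model = pickle.loads(blob)

# The restored model carries the same weights, so predictions should agree.
predictions_match = np.allclose(mlp.predict(X), pk_model.predict(X))
print(predictions_match)
```

The same comparison works for a model loaded from a file with `pickle.load`.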

[1] https://github.com/harsha89/ml-model-tutorial

Done!

Harsha Moraliyage

I am a Data Scientist at the Centre For Data Analytics and Cognition, where I design and develop AI, machine learning, and CI/CD solutions