Persist & Reuse Trained Machine Learning Models using Joblib or Pickle in Python

Once a machine learning model is trained, it should be saved so that it can be reused for classification or regression problems

May 23, 2020

In the real world, training and testing machine learning models is one of the main phases of the machine learning development life cycle. Training a model may take anywhere from a few seconds to several days or even months, so it is not practical to retrain it again and again. Instead, the trained model should be saved and reused.

In this article I will explain how a trained model can be exported and reused in a Python environment. A sample application that uses the joblib library to export a machine learning model and expose it via a Flask application can be found here. There are two main methods available in Python to export a machine learning model and reuse it:

Joblib
Pickle

Both approaches use simple serialization and deserialization to save a model to a file and load it back from that file.
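As a minimal illustration of that idea using only the standard library, pickle.dumps turns an object into bytes (serialization) and pickle.loads rebuilds an equal copy (deserialization). The dictionary below is a hypothetical stand-in for a trained model; the file-based dump/load calls used later in this article follow the same pattern.

```python
import pickle

# Hypothetical stand-in for a trained model; any picklable object behaves the same way
model_state = {"weights": [0.5, -1.25, 2.0], "bias": 0.1, "name": "toy-regressor"}

# Serialization: Python object -> bytes
payload = pickle.dumps(model_state)

# Deserialization: bytes -> a new object equal to the original
restored = pickle.loads(payload)

print(type(payload).__name__)   # bytes
print(restored == model_state)  # True  (same content)
print(restored is model_state)  # False (a separate object)
```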

Dataset: Boston house price

Algorithm: MLPRegressor from the scikit-learn (sklearn) package

Joblib Approach

Joblib is a Python library that is significantly faster than pickle when saving and loading objects that contain large NumPy arrays. Joblib is intended as a replacement for pickle for saving and loading machine learning models. You may look at the implementation details here.

Joblib offers a simpler API than pickle: it accepts a file name directly, so there is no need to open and close the file yourself
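To make the API difference concrete, here is a minimal sketch assuming joblib and NumPy are installed; the array and the temporary file names are illustrative only. joblib.dump and joblib.load take a path, while pickle needs a file object opened in binary mode.

```python
import os
import pickle
import tempfile

import joblib
import numpy as np

data = np.arange(10_000, dtype=np.float64).reshape(100, 100)
tmp_dir = tempfile.mkdtemp()

# joblib: pass the file name directly
joblib_path = os.path.join(tmp_dir, "data.joblib")
joblib.dump(data, joblib_path)
from_joblib = joblib.load(joblib_path)

# pickle: open the file yourself, in binary mode, for both writing and reading
pickle_path = os.path.join(tmp_dir, "data.pkl")
with open(pickle_path, "wb") as f:
    pickle.dump(data, f)
with open(pickle_path, "rb") as f:
    from_pickle = pickle.load(f)

print(np.array_equal(from_joblib, data), np.array_equal(from_pickle, data))  # True True
```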

Joblib supports several compression methods, including gzip, zlib and bz2, which are useful when saving large machine learning models
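A minimal sketch of joblib's compress parameter, assuming a deliberately repetitive array (all zeros) so the size difference is obvious; the file names are hypothetical. An integer level uses joblib's default zlib compressor, a (method, level) tuple picks the method explicitly, and joblib.load detects the compression on its own.

```python
import os
import tempfile

import joblib
import numpy as np

# Highly repetitive data, so compression has something to work with
weights = np.zeros((500, 500))
tmp_dir = tempfile.mkdtemp()

plain_path = os.path.join(tmp_dir, "weights.joblib")
zlib_path = os.path.join(tmp_dir, "weights_zlib.joblib")
gzip_path = os.path.join(tmp_dir, "weights.joblib.gz")

joblib.dump(weights, plain_path)                       # no compression
joblib.dump(weights, zlib_path, compress=3)            # integer level: default zlib compressor
joblib.dump(weights, gzip_path, compress=("gzip", 3))  # explicit (method, level)

for path in (plain_path, zlib_path, gzip_path):
    print(os.path.basename(path), os.path.getsize(path), "bytes")

# No compression argument is needed when loading
restored = joblib.load(gzip_path)
print(np.array_equal(restored, weights))  # True
```

Higher levels trade slower saving for smaller files; for real models the gain depends on how compressible the learned weights are.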

Importing data manipulation libraries and joblib

# Data Manipulation libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
import joblib

Loading and preparing the dataset

df = pd.read_csv('tp3_boston_data.csv')  # Load the dataset

# Features and target (median house value)
df_x = df[['crim', 'zn', 'indus', 'chas', 'nox', 'rm', 'age', 'dis', 'rad', 'tax', 'ptratio', 'lstat']]
df_y = df['medv']  # a 1-D Series avoids scikit-learn's column-vector warning when fitting

# Standardize the features to zero mean and unit variance
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(df_x)

df_x_scaled = scaler.transform(df_x)
df_x_scaled = pd.DataFrame(df_x_scaled, columns=df_x.columns)
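The fitted StandardScaler is part of what the model needs at prediction time: any new input must be scaled with the same mean and variance. One option, sketched here as an assumption rather than the approach this article uses, is to wrap the scaler and the regressor in a scikit-learn Pipeline so that a single joblib file holds both. The sketch uses synthetic data from make_regression in place of tp3_boston_data.csv, and the file name pipeline.joblib is hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the Boston data: 12 features, one target
X, y = make_regression(n_samples=300, n_features=12, noise=0.1, random_state=5)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=5)

# Scaler + regressor as one estimator; the scaler is fitted on the training split only
pipeline = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(60,), max_iter=1000, random_state=5),
)
pipeline.fit(X_train, y_train)

# One file now holds both the fitted scaler and the trained network
path = os.path.join(tempfile.mkdtemp(), "pipeline.joblib")
joblib.dump(pipeline, path)

# Raw, unscaled inputs can be passed straight to the reloaded pipeline
restored = joblib.load(path)
print(np.allclose(restored.predict(X_test), pipeline.predict(X_test)))  # True
```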

Split train and test data

X_train, X_test, Y_train, Y_test = train_test_split(df_x_scaled, df_y, test_size = 0.33, random_state = 5)

Training and building the model

# One hidden layer with 60 neurons; the trailing comma makes (60,) a tuple
mlp = MLPRegressor(hidden_layer_sizes=(60,), max_iter=1000)
mlp.fit(X_train, Y_train)

Exporting the model to a file using joblib

#Saving the machine learning model to a file
joblib.dump(mlp, "rf_model.pkl")

Loading the saved model to predict the test data

model = joblib.load('rf_model.pkl')
y_predict = model.predict(X_test)
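A quick sanity check is to confirm that the reloaded model has the same hyperparameters and gives exactly the same predictions as the one that was saved. This is a self-contained sketch assuming synthetic data from make_regression (the Boston CSV is not bundled here) and a hypothetical file name mlp_model.pkl.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_regression
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in data with 12 features, like the article's feature set
X, y = make_regression(n_samples=200, n_features=12, noise=0.1, random_state=0)

mlp = MLPRegressor(hidden_layer_sizes=(60,), max_iter=1000, random_state=0)
mlp.fit(X, y)

path = os.path.join(tempfile.mkdtemp(), "mlp_model.pkl")
joblib.dump(mlp, path)
model = joblib.load(path)

# The reloaded estimator keeps its hyperparameters and learned weights
same_params = model.get_params() == mlp.get_params()
same_predictions = np.array_equal(model.predict(X), mlp.predict(X))
print(same_params, same_predictions)  # True True
```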

Sample code for saving a machine learning model and exposing it via a Flask API can be found at link1, link2.

Pickle Approach

Pickle is very fast at saving models that do not contain many large NumPy arrays, because the pickle module in Python's standard library is implemented in C. For simple models, pickle is therefore a very fast option.
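The speed comparison between pickle and joblib depends on the Python, NumPy and joblib versions and on disk speed, so it is worth measuring on your own objects. This is a rough timing sketch under those assumptions; the objects and file names are illustrative, and no particular winner is claimed.

```python
import os
import pickle
import tempfile
import time

import joblib
import numpy as np


def save_with_pickle(obj, path):
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def save_with_joblib(obj, path):
    joblib.dump(obj, path)


def best_time(save_fn, obj, path, repeat=5):
    """Return the fastest of several wall-clock timings of save_fn(obj, path), in seconds."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        save_fn(obj, path)
        best = min(best, time.perf_counter() - start)
    return best


tmp_dir = tempfile.mkdtemp()
pickle_path = os.path.join(tmp_dir, "obj.pkl")
joblib_path = os.path.join(tmp_dir, "obj.joblib")

# A small object and one dominated by a large NumPy array (about 8 MB)
objects = {
    "small dict": {"alpha": 0.0001, "hidden_layer_sizes": (60,)},
    "large array": np.random.default_rng(0).random((1000, 1000)),
}

results = {}
for label, obj in objects.items():
    t_pickle = best_time(save_with_pickle, obj, pickle_path)
    t_joblib = best_time(save_with_joblib, obj, joblib_path)
    results[label] = (t_pickle, t_joblib)
    print(f"{label}: pickle {t_pickle:.5f}s, joblib {t_joblib:.5f}s")
```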

Saving the model to a file using pickle

# Import the pickle module
import pickle

# Save the model to a file in the current working directory
pickle_file_name = "my_model.pkl"

with open(pickle_file_name, 'wb') as file:
    pickle.dump(mlp, file)
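pickle.dump also accepts a protocol argument. A minimal sketch, assuming a plain dictionary as a stand-in for the trained model and a temporary file path: requesting pickle.HIGHEST_PROTOCOL uses the newest format the running interpreter supports, and pickle.load detects the protocol automatically. The trade-off is that a file written with a newer protocol cannot be read by older Python versions.

```python
import os
import pickle
import tempfile

# Stand-in for the trained model; any picklable object works the same way
model_stub = {"coefs": [[0.1, 0.2], [0.3, 0.4]], "intercepts": [0.05]}

pickle_file_name = os.path.join(tempfile.mkdtemp(), "my_model.pkl")

# Write with the newest protocol this interpreter supports
with open(pickle_file_name, "wb") as file:
    pickle.dump(model_stub, file, protocol=pickle.HIGHEST_PROTOCOL)

# No protocol argument is needed when loading
with open(pickle_file_name, "rb") as file:
    restored = pickle.load(file)

print(pickle.DEFAULT_PROTOCOL, pickle.HIGHEST_PROTOCOL)
print(restored == model_stub)  # True
```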

Loading the saved model with pickle and predicting

# Load the model from the saved file
with open(pickle_file_name, 'rb') as file:
    pk_model = pickle.load(file)

# Predict the test data with the loaded model
y_predict = pk_model.predict(X_test)
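Pickled scikit-learn models are safest to load with the same scikit-learn version that saved them. One way to keep track of this, sketched here as an assumption and not as part of the original approach, is to pickle the model together with sklearn.__version__ and warn on a mismatch when loading. The helpers save_with_metadata and load_with_metadata are hypothetical, and LinearRegression on synthetic data stands in for the MLP to keep the example fast.

```python
import os
import pickle
import tempfile
import warnings

import sklearn
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression


def save_with_metadata(model, path):
    """Hypothetical helper: pickle the model with the scikit-learn version used to train it."""
    bundle = {"model": model, "sklearn_version": sklearn.__version__}
    with open(path, "wb") as file:
        pickle.dump(bundle, file)


def load_with_metadata(path):
    """Hypothetical helper: unpickle the bundle and warn if the scikit-learn version differs."""
    with open(path, "rb") as file:
        bundle = pickle.load(file)
    if bundle["sklearn_version"] != sklearn.__version__:
        warnings.warn(
            f"Model was saved with scikit-learn {bundle['sklearn_version']}, "
            f"but {sklearn.__version__} is installed"
        )
    return bundle["model"]


# Fast stand-in model trained on synthetic data
X, y = make_regression(n_samples=100, n_features=12, noise=0.1, random_state=0)
model = LinearRegression().fit(X, y)

path = os.path.join(tempfile.mkdtemp(), "my_model.pkl")
save_with_metadata(model, path)
pk_model = load_with_metadata(path)
y_predict = pk_model.predict(X)
print(y_predict.shape)  # (100,)
```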

[1] https://github.com/harsha89/ml-model-tutorial

Done!


Written by Harsha Moraliyage