Saving Your Machine Learning Model In Python: pickle.dump()

This article introduces object serialization and deserialization in Python with the pickle library. By the time you finish reading, you will be able to serialize (save) your model as a pickle file, deserialize (open) it in the same or another Python file for later use, and troubleshoot issues you may run into while working with pickle.

Einstein EBEREONWU
7 min read · Dec 28, 2022

Content

  • Basic Introduction to Pickle.
  • Working with Pickle.
  • Common mistakes and their solutions.
  • Conclusion.

Basic Introduction to Pickle.

Pickle is the module in Python's standard library for object serialization and deserialization. Serialization converts an object into a stream of bytes that can be saved to a file, while deserialization is the reverse: it rebuilds the object from those bytes. Because pickle produces bytes, the file must be opened in binary mode, otherwise you will run into an error. Many objects, such as lists, tuples, dictionaries, transformers, and models, can be pickled; this article focuses on serializing and deserializing machine learning models.
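To make the idea concrete before touching any files, here is a minimal sketch of a round trip done entirely in memory with pickle.dumps() and pickle.loads(). The dictionary is only a hypothetical stand-in for a trained model; it is not taken from the original notebook.

```python
import pickle

# Hypothetical stand-in for a trained model: any picklable object works here.
settings = {"name": "demo", "weights": [0.5, -1.2, 3.0], "bias": 0.1}

# Serialize: the object becomes a bytes value.
payload = pickle.dumps(settings)
print(type(payload))  # <class 'bytes'>

# Deserialize: the bytes become an equal, but separate, object again.
restored = pickle.loads(payload)
print(restored == settings)  # True
print(restored is settings)  # False
```

The restored object is a copy with the same content, which is why a model loaded later behaves like the one that was saved.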

Working with Pickle: Model Serialization

To use the pickle library, you first have to import it with the line of code “import pickle”. You can also give it an alias if you wish, in the style “import pickle as pk”, as seen below:

# Import style 1 (Without Alias)
import pickle

# Import style 2 (With Alias)
import pickle as pk

After importing the pickle library in the style of your choice, it’s time to open (or create) a file and dump the object into it. When opening the file for serialization, you must specify the file mode, which is write (w) combined with binary (b). I’ll be discussing two methods for model serialization:

Method 1

To save a file using the “long method”, you do the following:

# Saving model to pickle file
with open("desired-model-file-name.pkl", "wb") as file:  # "file" is the variable holding the opened file object; any name works.
    pickle.dump(model, file)  # dump() writes the object into the opened file as bytes.

The first line opens a file in binary write mode and stores the opened file object in the “file” variable. The second line writes the model object into that file. A bare file name is created in the current working directory, which is usually the folder you run your script or notebook from, but you can always pass a file path of your own to the open function together with the desired-model-file-name.pkl.
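As a rough sketch of saving to a location of your own choosing, the example below builds an explicit path with pathlib. The "models" folder, the temporary directory, and the dictionary standing in for a model are all assumptions made for illustration.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical stand-in for a trained model.
model = {"weights": [0.5, -1.2], "bias": 0.1}

# A bare file name would be resolved against this directory.
print(Path.cwd())

# Build an explicit target folder instead (a temporary one, for this demo only).
models_dir = Path(tempfile.mkdtemp()) / "models"
models_dir.mkdir(parents=True, exist_ok=True)
model_path = models_dir / "desired-model-file-name.pkl"

# open() accepts a Path object as well as a string.
with open(model_path, "wb") as file:
    pickle.dump(model, file)

print(model_path.exists())  # True
```

In a real project you would replace the temporary directory with the folder where you keep your models.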

Method 2

To save a file using the short-hand method, you do the following:

pickle.dump(model, open("desired-model-file-name.pkl", "wb"))

This one-liner writes the same file as the previous method in a single line of code. As a beginner, I advise you to stick with the first method until you have a better understanding of file handling before moving on to the one-liner. The advantage of the first method is that the with statement automatically closes the file for you after saving the model. The one-liner never closes the file explicitly: the file object is not stored in a variable, so you cannot call close() on it, and it is only closed when Python cleans the object up. If you want to open a file without with, store the file object in a variable first so that you can call its close() method after dumping.
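The difference in file closing can be checked directly through the closed attribute of a file object. The sketch below is illustrative only; the variable name handle, the temporary path, and the stand-in model are assumptions.

```python
import pickle
import tempfile
from pathlib import Path

model = {"weights": [0.5, -1.2], "bias": 0.1}  # hypothetical stand-in model
model_path = Path(tempfile.mkdtemp()) / "desired-model-file-name.pkl"

# Method 1: the with statement closes the file when the block ends.
with open(model_path, "wb") as file:
    pickle.dump(model, file)
print(file.closed)  # True

# Without with: keep a reference to the file object so it can be closed.
handle = open(model_path, "wb")
pickle.dump(model, handle)
print(handle.closed)  # False: nothing has closed it yet
handle.close()
print(handle.closed)  # True
```

Closing the file also flushes any buffered bytes to disk, which matters if another program reads the pickle file right away.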

Working with Pickle: Model Deserialization

Deserialization, the reverse of the serialization process, is just as easy. To deserialize a file, you must again specify the file mode, this time read (r) combined with binary (b). Read mode opens an existing file without modifying it; pickle then reads the bytes and converts them back into the original object. I’ll again be discussing two methods.

Method 1

# Opening saved model
with open("desired-model-file-name.pkl", "rb") as file:
    model = pickle.load(file)

# The model has now been deserialized, next is to make use of it as you normally would.
prediction = model.predict([[2,112,68,22,94,34.1,0.315,26]]) # Passing in variables for prediction
print("The result is",prediction[0]) # Printing result

The first line of code opens the file in binary read mode and stores the opened file object in the “file” variable. The second line loads the content of the file, which is the model, and stores it in the “model” variable, which can then be used normally.
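Putting saving and loading together, here is a self-contained sketch of the full round trip. So that it runs without any machine learning library installed, the model is a plain dictionary and predict() is a hypothetical helper written for this example; a real estimator would bring its own predict() method, as in the snippet above.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical stand-in for a trained model.
trained = {"weights": [0.5, -1.0], "bias": 2.0}
model_path = Path(tempfile.mkdtemp()) / "desired-model-file-name.pkl"

# Serialize with write + binary mode.
with open(model_path, "wb") as file:
    pickle.dump(trained, file)

# Deserialize with read + binary mode.
with open(model_path, "rb") as file:
    model = pickle.load(file)


def predict(model, row):
    """Hypothetical linear scorer, used only to show the loaded object works."""
    return sum(w * x for w, x in zip(model["weights"], row)) + model["bias"]


print(predict(model, [4.0, 1.0]))  # 3.0
```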

Method 2

# Open saved model using 2nd method
model_mtd2 = pickle.load(open("desired-model-file-name.pkl", "rb"))

# Now, to make use of the model as you normally would
result = model_mtd2.predict([[3,193,70,31,0,34.9,0.241,25]])
print("The result is", result[0])

Just as before, this one-liner does what the first method does in a single line of code, after which the model can be used as usual. It shares the drawback of the serialization one-liner: the file is not closed explicitly.

Common mistakes and their solutions

Common Mistake 1: Attempting to serialize without specifying byte/binary format.

# Attempting to serialize without specifying byte/binary format.
with open("desired-model-file-name.pkl", "w") as file:
    pickle.dump(model, file)

>>> Output Below
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
c:\User\model-serialization-deserialization\model-serialization-deserialization.ipynb Cell 10 in <cell line: 2>()
      1 # Attempting to serialize without specifying byte/binary format.
      2 with open("desired-model-file-name.pkl", "w") as file:
----> 3     pickle.dump(model, file)

TypeError: write() argument must be str, not bytes

When serializing an object, if you forget to specify the binary mode (b), Python raises the TypeError above: a file opened in text mode expects str, while pickle writes bytes. The simple solution is to include ‘b’ before or after the ‘w’ in the mode string.
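The mistake and its fix can be reproduced side by side. This is a small sketch with a stand-in dictionary in place of a model and a temporary file path, both assumptions for the demo.

```python
import pickle
import tempfile
from pathlib import Path

model = {"weights": [0.5, -1.2], "bias": 0.1}  # hypothetical stand-in model
model_path = Path(tempfile.mkdtemp()) / "desired-model-file-name.pkl"

text_mode_error = None
try:
    # Mistake: "w" opens the file in text mode, which only accepts str.
    with open(model_path, "w") as file:
        pickle.dump(model, file)
except TypeError as error:
    text_mode_error = error
    print("Text mode failed:", error)

# Fix: add "b" so the file accepts the bytes that pickle writes.
with open(model_path, "wb") as file:
    pickle.dump(model, file)

print(pickle.loads(model_path.read_bytes()) == model)  # True
```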

Common Mistake 2: Attempting to serialize with read mode

# Attempting to serialize with read mode
with open("desired-model-file-name.pkl", "rb") as file:
    pickle.dump(model, file)

>>> Output Below
---------------------------------------------------------------------------
UnsupportedOperation                      Traceback (most recent call last)
c:\Users\model-serialization-deserialization\model-serialization-deserialization.ipynb Cell 11 in <cell line: 2>()
      1 # Attempting to serialize with read mode
      2 with open("desired-model-file-name.pkl", "rb") as file:
----> 3     pickle.dump(model, file)

UnsupportedOperation: write

Opening a file in read mode tells Python that you want to get information out of the file, not put information into it. When you then try to write to it, Python raises UnsupportedOperation: write, meaning the file object does not support writing. The fix is to use the write mode ‘w’, which you can place before or after the ‘b’.
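The exception shown above lives in the io module, so it can be caught by name. The following sketch assumes a stand-in dictionary as the model and first creates the file, since read mode requires the file to exist.

```python
import io
import pickle
import tempfile
from pathlib import Path

model = {"weights": [0.5, -1.2], "bias": 0.1}  # hypothetical stand-in model
model_path = Path(tempfile.mkdtemp()) / "desired-model-file-name.pkl"

# Create the file first: opening with "rb" fails if it does not exist.
with open(model_path, "wb") as file:
    pickle.dump(model, file)

read_mode_error = None
try:
    # Mistake: a file opened for reading cannot be written to.
    with open(model_path, "rb") as file:
        pickle.dump(model, file)
except io.UnsupportedOperation as error:
    read_mode_error = error
    print("Read mode failed:", error)

# The failed write did not touch the file, so it still loads correctly.
with open(model_path, "rb") as file:
    restored = pickle.load(file)
print(restored == model)  # True
```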

Common Mistake 3: Attempting to serialize/deserialize without w/r mode

# Attempting to serialize without specifying write mode
with open("desired-model-file-name.pkl", "b") as file:
    pickle.dump(model, file)

>>> Output Below
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
c:\Users\model-serialization-deserialization\model-serialization-deserialization.ipynb Cell 12 in <cell line: 2>()
      1 # Attempting to serialize without specifying write mode
----> 2 with open("desired-model-file-name.pkl", "b") as file:
      3     pickle.dump(model, file)

ValueError: Must have exactly one of create/read/write/append mode and at most one plus

# Attempting to deserialize without specifying read mode
with open("desired-model-file-name.pkl", "b") as file:
    pickle.load(file)

>>> Output Below
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
c:\Users\model-serialization-deserialization\model-serialization-deserialization.ipynb Cell 13 in <cell line: 2>()
      1 # Attempting to deserialize without specifying read mode
----> 2 with open("desired-model-file-name.pkl", "b") as file:
      3     pickle.load(file)

ValueError: Must have exactly one of create/read/write/append mode and at most one plus

Attempting to open a file without specifying an access mode raises a ValueError. The ‘b’ flag only selects binary instead of text; on its own it does not say whether you want to read or write. To fix this issue, add the appropriate mode for the operation you would like to perform (‘r’ to read, ‘w’ to write) either before or after the ‘b’.
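A short sketch of this mistake and its fix follows, again with a stand-in dictionary and a temporary path as assumptions. It also shows that the order of the letters in the mode string does not matter, so "bw" behaves like "wb" and "br" like "rb".

```python
import pickle
import tempfile
from pathlib import Path

model = {"weights": [0.5, -1.2], "bias": 0.1}  # hypothetical stand-in model
model_path = Path(tempfile.mkdtemp()) / "desired-model-file-name.pkl"

missing_mode_error = None
try:
    # Mistake: "b" alone does not say whether to read or write.
    open(model_path, "b")
except ValueError as error:
    missing_mode_error = error
    print("Mode 'b' failed:", error)

# Fix: combine "b" with "w" to write...
with open(model_path, "bw") as file:
    pickle.dump(model, file)

# ...and with "r" to read.
with open(model_path, "br") as file:
    restored = pickle.load(file)

print(restored == model)  # True
```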

Common Mistake 4: Attempting to deserialize a file that does not exist

# Attempting to deserialize a file that does not exist
with open("desired-model-file-name.pkll", "rb") as file:
    pickle.load(file)

>>> Output Below
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
c:\Users\model-serialization-deserialization\model-serialization-deserialization.ipynb Cell 14 in <cell line: 2>()
      1 # Attempting to deserialize a file that does not exist
----> 2 with open("desired-model-file-name.pkll", "rb") as file:
      3     pickle.load(file)

FileNotFoundError: [Errno 2] No such file or directory: 'desired-model-file-name.pkll'

It’s very easy to drop or add a letter or symbol while typing the name of a file. If you get the “No such file or directory” error message, the path you passed does not match any existing file, most often because of a typographical error (here, .pkll instead of .pkl) or because the file sits in a different folder from the current working directory. To fix this, either re-type the file path/name carefully or, better still, copy and paste the exact path/name to avoid any errors.
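One way to make this error easier to diagnose is to check the path before opening it and list the pickle files that do exist in that folder. The load_model() helper below is hypothetical, written for this example rather than taken from the pickle library, and the file content is a stand-in.

```python
import pickle
import tempfile
from pathlib import Path


def load_model(path):
    """Hypothetical helper: load a pickle, or report which .pkl files exist nearby."""
    path = Path(path)
    if not path.is_file():
        folder = path.parent
        nearby = sorted(p.name for p in folder.glob("*.pkl")) if folder.is_dir() else []
        raise FileNotFoundError(f"{path} not found. Pickle files in {folder}: {nearby}")
    with open(path, "rb") as file:
        return pickle.load(file)


# Demo setup: one correctly named file in a temporary folder.
folder = Path(tempfile.mkdtemp())
good_path = folder / "desired-model-file-name.pkl"
good_path.write_bytes(pickle.dumps({"bias": 0.1}))

try:
    load_model(folder / "desired-model-file-name.pkll")  # typo in the extension
except FileNotFoundError as error:
    print(error)

print(load_model(good_path))  # {'bias': 0.1}
```

The error message now names the files that are actually present, which usually makes the typo obvious.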

Conclusion

Pickle is a very powerful Python library that allows users to store and later reopen objects. It lets you keep several versions of your model and, most importantly, makes it easy to integrate your model into applications such as web applications. A saved model can be deserialized in a server-side file; the input values are collected through a form on the front end or any other method, passed into the model as an array after any necessary feature engineering, and the prediction is returned. Gain access to the GitHub repo here.
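As a rough illustration of keeping several versions of a model, the sketch below writes each version to its own file. The model-v<number>.pkl naming scheme, the save_version() and load_version() helpers, and the stand-in dictionaries are all assumptions made for this example.

```python
import pickle
import tempfile
from pathlib import Path

# Temporary folder for the demo; a real project would use a fixed models folder.
models_dir = Path(tempfile.mkdtemp())


def save_version(model, version):
    """Hypothetical helper: save one model version to its own pickle file."""
    path = models_dir / f"model-v{version}.pkl"
    with open(path, "wb") as file:
        pickle.dump(model, file)
    return path


def load_version(version):
    """Hypothetical helper: load the model saved under the given version."""
    with open(models_dir / f"model-v{version}.pkl", "rb") as file:
        return pickle.load(file)


# Stand-ins for two training runs of the same model.
save_version({"weights": [0.5], "bias": 0.0}, 1)
save_version({"weights": [0.7], "bias": 0.1}, 2)

print(sorted(p.name for p in models_dir.glob("model-v*.pkl")))
# ['model-v1.pkl', 'model-v2.pkl']
print(load_version(1))  # {'weights': [0.5], 'bias': 0.0}
```

A server-side file could call a loader like this once at start-up and reuse the loaded model for every request.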

By reading this article, you now know how to store and open your models and various other objects in Python using the pickle library.

You can also read my article on creating full-stack web applications for machine learning model integration using Flask, HTML & CSS here.

That is all for now, see you again next time…

Einstein EBEREONWU

Google Certified Data analyst • DataScientist 💻• ML/AI 🤖• Pythonist.