Pickling in Python

Alison Salerno
Published in The Startup
4 min read · May 25, 2020

Unpickling Pickle

When you think about pickles in the real world, a jar full of small cucumbers or some other vegetable immersed in a vinegary brine probably comes to mind. And if you take a moment to ponder pickles a bit further, a few additional thoughts may follow. For one, pickles are delicious. For another, the process of pickling is all about preserving a food and extending its shelf life for later use.

The pickle module in Python is more or less the same idea: preserving Python objects for later use.

I was first introduced to Pickle when I started diving into Machine Learning algorithms and creating my own models. Pickle allows you to preserve a model that you’ve just trained and tested so that it can predict on new data at a later time. This is particularly useful if your model took hours to train: instead of having to train it again, you can just preserve (pickle) it and then load it later for immediate use! But how exactly does it work, and why?

In technical terms, pickle is a Python module used to serialize an object structure into a byte stream and to deserialize it back from one. Serializing an object means transforming it into a format that can be stored or transmitted; deserializing means recreating the original object from that format. The byte stream contains the information necessary to reconstruct the object in another Python script, provided the classes and functions it refers to can be imported there.
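To make the serialize and deserialize steps concrete, here is a minimal in-memory sketch using pickle.dumps() and pickle.loads(). The settings dictionary is a made-up example object, not something from a real project.

```python
import pickle

# A small nested structure standing in for "an object structure".
settings = {"name": "demo", "scores": [0.91, 0.87], "tags": ("a", "b")}

# Serialize: the object becomes a bytes value.
payload = pickle.dumps(settings)

# Deserialize: a new, equal object is rebuilt from those bytes.
restored = pickle.loads(payload)

print(type(payload).__name__)
print(restored == settings, restored is settings)
```

The restored object is equal to the original but is a separate copy, which is what you would expect when the bytes are loaded in a different script.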

Pickling (serialize) example

# import the dump function from the pickle module
from pickle import dump
# train the model (model, x_train and y_train are assumed to be defined earlier)
model.fit(x_train, y_train)
# pickle (serialize) the model to a file opened for writing in binary mode
dump(model, open('model.pkl', 'wb'))

In the last line of the snippet above, the model object is converted to a binary format and stored. I used dump(), which takes two arguments: the object to pickle and the file object to which it should be saved. In the second argument, I opened the file for writing at the same time, using the open() function. Within open(), the first argument sets the name of the file and the second argument is ‘wb’. The w means that you’ll be writing to the file, and b refers to binary mode, hence “write binary”. This means that the data will be written as bytes.

Now, a new file named model.pkl will appear in the current working directory, which is the same directory as the Python script if you ran the script from there.
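Passing open() straight into dump() leaves it to the interpreter to close the file. A common alternative is a with block, which closes the file as soon as the write finishes. The sketch below assumes a plain dictionary as a stand-in for the trained model and writes to a temporary directory so that it can run anywhere; both are illustrative choices, not part of the original example.

```python
import pickle
import tempfile
from pathlib import Path

# Stand-in for a trained model; any picklable object works here.
model = {"weights": [0.2, 0.8], "bias": -0.1}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "model.pkl"

    # The with block closes the file as soon as dump() has returned.
    with open(path, "wb") as f:
        pickle.dump(model, f)

    file_was_created = path.exists()
    file_size = path.stat().st_size

print(file_was_created, file_size > 0)
```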

Unpickling (deserialize) example

# import the load function from the pickle module
from pickle import load
# deserialize (unpickle) the model to use on a new data set
rfc = load(open('model.pkl', 'rb'))
# new_data is assumed to hold the new samples to predict on
rfc.predict(new_data)

The process of loading (unpickling) the pickled object back into a Python program is similar. Again, the open() function is used, but this time with 'rb'. The r refers to read and b refers to binary mode (“read binary”), so the data is deserialized from the byte stream.
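Putting both halves together, the following sketch writes an object with 'wb' and reads it back with 'rb'. A collections.Counter of word frequencies is used here as a hypothetical stand-in for a trained model, so that the example runs without any machine learning library installed.

```python
import pickle
import tempfile
from collections import Counter
from pathlib import Path

# Stand-in for a trained model: word counts "learned" from some text.
model = Counter("the quick brown fox jumps over the lazy dog the end".split())

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "model.pkl"

    with open(path, "wb") as f:   # "wb": write binary
        pickle.dump(model, f)

    with open(path, "rb") as f:   # "rb": read binary
        restored = pickle.load(f)

# The restored object behaves like the original.
print(restored.most_common(1))
```

With a real model the pattern is the same: call predict() on the restored object instead of most_common().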

Other use cases and when not to use Pickle

According to the Pickle module documentation, the following types can be pickled:

  • None, True, and False
  • Integers, floating point numbers, complex numbers
  • Strings, bytes, bytearrays
  • Tuples, lists, sets, and dictionaries containing only picklable objects
  • Functions defined at the top level of a module (using def, not lambda)
  • Built-in functions defined at the top level of a module
  • Classes that are defined at the top level of a module
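A quick way to check the list above is to round-trip a few sample values. This is only a sketch; the sample values are arbitrary, and the built-in len and dict are used in place of user-defined functions and classes to keep it short.

```python
import pickle

samples = [
    None, True, False,
    42, 3.14, 1 + 2j,
    "text", b"raw", bytearray(b"raw"),
    (1, 2), [1, 2], {1, 2}, {"key": [1, 2]},
]

# Each value should come back equal to what went in.
round_trip_ok = all(pickle.loads(pickle.dumps(value)) == value for value in samples)

# Functions and classes are pickled by reference (module plus name),
# so unpickling returns the very same object rather than a copy.
same_function = pickle.loads(pickle.dumps(len)) is len
same_class = pickle.loads(pickle.dumps(dict)) is dict

print(round_trip_ok, same_function, same_class)
```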

It should also be noted that there are cases where Pickle should not be used. The major one: Pickle’s format is specific to Python, so it should not be used to exchange data with programs written in other languages.

Additionally, generators, lambda functions, and classes defined inside a function cannot be pickled with Pickle. There are some workarounds: for lambda functions, for example, the third-party package dill can be used instead.
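The sketch below shows what these failures look like in practice. The helper can_pickle() is a hypothetical name introduced for this example; it simply reports whether pickle.dumps() raised an error. The exact exception type can vary by Python version, so several are caught.

```python
import pickle

def can_pickle(obj):
    """Return True if pickle.dumps(obj) succeeds, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False

square = lambda x: x * x            # a lambda has no importable name
countdown = (n for n in range(3))   # a generator object

print(can_pickle(square))
print(can_pickle(countdown))
print(can_pickle([1, 2, 3]))
```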

Concluding thoughts

Python’s pickle module is incredibly useful for converting an object hierarchy to a stream of bytes that can be saved to disk or transmitted over a network. However, the warning below, taken from the pickle module documentation, must be kept in mind during deserialization:

“The pickle module is not secure. Only unpickle data you trust. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling. Never unpickle data that could have come from an untrusted source, or that could have been tampered with.”

This is an important warning to keep in mind when unpickling data that you did not pickle yourself. A pickle is not inert data: it contains instructions that the unpickler follows to rebuild objects, and those instructions can import modules and call functions. Nothing checks them for safety beforehand, so it is essential to be cautious about unpickling a file of unknown origin.
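To see why the warning matters, here is a deliberately harmless sketch. The Payload class is hypothetical: its __reduce__ method tells pickle to call len() on load, where a malicious pickle could name a dangerous function instead. The second half follows the "restricting globals" idea from the pickle documentation by overriding Unpickler.find_class(); the NoGlobalsUnpickler name and its refuse-everything policy are assumptions made for this example, and such a filter reduces the risk rather than making untrusted pickles safe.

```python
import io
import pickle

class Payload:
    """Hypothetical object whose pickle asks the loader to call a function."""
    def __reduce__(self):
        # Harmless here; a malicious pickle could name os.system instead.
        return (len, ("attacker-chosen call",))

data = pickle.dumps(Payload())

# Unpickling does not return a Payload: it returns whatever the call returned.
result = pickle.loads(data)
print(type(result).__name__)

class NoGlobalsUnpickler(pickle.Unpickler):
    """Refuses every global lookup, so functions and classes cannot be loaded."""
    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"global {module}.{name} is forbidden")

try:
    NoGlobalsUnpickler(io.BytesIO(data)).load()
    blocked = False
except pickle.UnpicklingError:
    blocked = True

print(blocked)
```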

Python also offers two other modules that allow you to serialize and deserialize objects:

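  1. The marshal module: a lower-level format that Python uses mainly for its own compiled .pyc files; the format is not guaranteed to stay compatible between Python versions.
  2. The json module: works with other languages and is human readable, but handles only basic types such as dictionaries, lists, strings, and numbers.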
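For comparison, here is a minimal sketch of the same round trip with the json module. The record dictionary is made-up example data.

```python
import json

record = {"model": "random_forest", "n_estimators": 100, "scores": [0.91, 0.87]}

text = json.dumps(record, indent=2)   # a str, readable in any text editor
restored = json.loads(text)

print(text)
print(restored == record)
```

Unlike a pickle, the output is plain text that a program in another language can parse, at the cost of supporting far fewer Python types.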

Resources

  1. https://realpython.com/python-pickle-module/
  2. https://www.geeksforgeeks.org/understanding-python-pickling-example/
  3. https://pythontips.com/2013/08/02/what-is-pickle-in-python/
