Pickling in Python
According to the Python documentation, pickling is the process whereby a Python object hierarchy is converted into a byte stream, and un-pickling is the inverse operation, whereby a byte stream (from a binary file or bytes-like object) is converted back into an object hierarchy.
Pickling and un-pickling are alternatively known as “serialization”, “marshalling”, or “flattening”. Also, note that:
Warning: The pickle module is not secure. Only un-pickle data you trust. It is possible to construct malicious pickle data which will execute arbitrary code during un-pickling. Never un-pickle data that could have come from an untrusted source, or that could have been tampered with.
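To make the warning concrete, here is a minimal and deliberately harmless sketch. The Payload class is hypothetical and exists only for illustration; it uses the standard __reduce__ hook to tell pickle which callable to invoke at load time. Here that callable is os.getcwd, but a malicious pickle could name any importable function.

```python
import os
import pickle


class Payload:
    """Hypothetical class whose pickle tells the loader to call a function."""

    def __reduce__(self):
        # pickle stores "call os.getcwd()" instead of the object's state.
        # An attacker could name any importable callable here.
        return (os.getcwd, ())


data = pickle.dumps(Payload())

# Loading runs os.getcwd() and returns its result, not a Payload instance.
result = pickle.loads(data)
print(type(result).__name__, result == os.getcwd())
```

The loader never asks for permission before making the call, which is why the only safe policy is to un-pickle data you trust.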
In this article, let’s dive deep into pickling concepts.
Introduction to serialisation
Serialising an object turns it into a linear stream of bytes. This can be done to save an object to a file or to transmit it to another process. The byte stream can later be deserialised (un-marshalled) to reconstruct the original object.
The most common way to serialise Python objects is called pickling. Python can also use JSON and XML for serialisation.
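As a rough comparison of pickle and JSON, the sketch below serialises the same value with both standard-library modules. The record dictionary is made-up sample data. It shows that pickle produces bytes and restores Python-specific types, while JSON is text-based and has no set or tuple type.

```python
import json
import pickle

# Hypothetical sample data containing a set and a tuple
record = {"name": "example", "tags": {"a", "b"}, "point": (1, 2)}

# pickle produces bytes and restores the set and the tuple unchanged.
blob = pickle.dumps(record)
restored = pickle.loads(blob)
print(type(blob).__name__, restored == record)

# JSON has no set type, so json.dumps rejects the same value.
try:
    json.dumps(record)
    json_ok = True
except TypeError:
    json_ok = False
print("JSON accepted the record:", json_ok)

# JSON also has no tuple type: a tuple comes back as a list.
point_via_json = json.loads(json.dumps({"point": (1, 2)}))
print(point_via_json)
```

In exchange, JSON is human-readable and can be consumed outside Python, so the right choice depends on who needs to read the data.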
Most built-in Python values and objects can be pickled, as can user-defined classes defined at the top level of a module. Recursive and interconnected objects can also be pickled. However, generators, lambda functions, database connections, and threads are a few of the things that cannot be pickled.
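A quick way to check whether a particular object can be pickled is simply to try it. The can_pickle helper below is a hypothetical convenience function, not part of the pickle module. It catches the exception types that pickling failures commonly raise.

```python
import pickle


def can_pickle(obj):
    """Return True if pickle.dumps accepts obj (hypothetical helper)."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, TypeError, AttributeError):
        return False


print(can_pickle({"a": [1, 2, 3]}))          # built-in containers
print(can_pickle(len))                       # built-in function
print(can_pickle(lambda n: n + 1))           # lambda function
print(can_pickle((n for n in range(3))))     # generator
```

On CPython 3, the first two calls should print True and the last two False, matching the list of exceptions above.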
Pickling and un-pickling methods
To pickle or un-pickle objects, you first have to import the pickle module with the import pickle statement. You can then use the following functions:
- pickle.dump(object, file) saves object to file, which must be opened in wb (write binary) mode.
- variable = pickle.load(file) reconstructs the object previously written to file, which must be opened in rb (read binary) mode.
- data = pickle.dumps(object) returns the pickled object as a bytes object, stored here in the data variable.
- object = pickle.loads(data) reconstructs the object previously serialised into the bytes object data.
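The in-memory pair, dumps and loads, can be tried without touching the file system. This is a minimal sketch with made-up sample data. Note that in Python 3 the result of dumps is a bytes object, and that the restored object is an equal but separate copy.

```python
import pickle

original = {"a": 1, "b": [2, 3]}  # hypothetical sample data

data = pickle.dumps(original)   # serialise to a bytes object in memory
copy = pickle.loads(data)       # rebuild an equivalent object from the bytes

print(type(data).__name__)      # the type name of the pickled data
print(copy == original)         # same contents?
print(copy is original)         # same object in memory?
```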
The following figure will help you better understand the functionality of these methods.
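import pickle

# Build a structure with shared and recursive references
x = {'a': 1, 'b': 2}
y = [x, 3, x]
x['c'] = y
print('x {} \nBefore pickling: {}'.format(x, y))

# Write the pickled list to a file opened in binary mode
with open('ptest', 'wb') as fi:
    pickle.dump(y, fi)

# Read it back and rebuild the object
with open('ptest', 'rb') as fi:
    z = pickle.load(fi)
print('After pickling:', z)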
Advantages of using the pickle module:
- Recursive objects (objects containing references to themselves): Pickle keeps track of the objects it has already serialised, so later references to the same object won’t be serialised again.
- Object sharing (references to the same object in different places): This is similar to self-referencing objects; pickle stores the object once, and ensures that all other references point to the master copy. Shared objects remain shared, which can be very important for mutable objects.
- User-defined classes and their instances: The marshal module does not support these at all, but pickle can save and restore class instances transparently. The class definition must be importable and live in the same module as when the object was stored.
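The three points above can be checked with a short round trip. This is a sketch under simple assumptions: the Point class and the container dictionary are hypothetical, and the code is run as a script so that Point is defined at the top level of an importable module.

```python
import pickle


class Point:
    """Hypothetical user-defined class, defined at module top level."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


shared = [1, 2]
container = {"first": shared, "second": shared}   # same list referenced twice
container["self"] = container                     # recursive reference
container["point"] = Point(3, 4)                  # class instance

restored = pickle.loads(pickle.dumps(container))

# Shared objects remain shared: both keys point at one restored list.
print(restored["first"] is restored["second"])
# The recursive reference points back at the restored dict itself.
print(restored["self"] is restored)
# The instance comes back as a Point with its attributes intact.
print(type(restored["point"]).__name__, restored["point"].x, restored["point"].y)
```

Because the shared list is restored as a single object, appending to restored["first"] would also be visible through restored["second"], just as with the original.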
As I said earlier, pickling is not secure. Pickle data can be crafted to execute arbitrary code when it is un-pickled. Make sure that anything you un-pickle comes from a trusted source and has not been tampered with in transit. Until next time, adios!