Serialization and Deserialization Techniques in Python

Asha
Aug 28, 2020

Using pickle, joblib, YAML, JSON, dill …

This post looks at different ways to serialize and deserialize Python objects, including machine learning models, using JSON, pickle, dill, joblib, YAML and manual methods.

What are Serialization and Deserialization?

Serialization is the process of translating data structures or object state into a format that can be stored (for example, in a file or memory buffer) or transmitted and reconstructed later.

In other words, serialization transforms an object into a storable format so that it can later be deserialized, recreating the original object from the serialized data.

There are a few ways to put trained machine learning (ML) models into production. The most common method is to serialize the model using some particular format after training, and deserialize that model in the production environment.

Common formats for serializing and deserializing models include JSON, pickle, joblib, ONNX and PMML. Keras supports HDF5, and dill is an alternative serialization package that generalizes pickle at the cost of performance.

Let's look at a few practical examples using JSON, pickle, joblib and YAML.

JSON Serialization and Deserialization

  • Python objects built from basic types (dict, list, tuple, str, int, float, bool and None) can be serialized into JSON format and back; other objects need custom encoding.
  • JSON strings can also be built directly, without any Python objects, and exchanged as data between any two processes, such as a client and a server.
  • Many REST APIs and web services return data as JSON. Even the error messages from REST APIs are often returned as JSON strings.
  • JSON is a text-based data interchange format.
  • JSON is short for JavaScript Object Notation. Though the name contains "JavaScript", JSON is a language-independent data interchange format.
The json module is part of the Python standard library, so there is nothing to install with pip.
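A minimal sketch of the two core string functions in the json module: json.loads parses a JSON string (the payload below is a made-up example of what a REST API might return) and json.dumps turns a Python object back into a JSON string.

```python
import json

# A JSON string as it might arrive from a REST API (made-up example payload).
response_text = '{"status": "error", "code": 404, "message": "Not found"}'

# Deserialize: JSON string -> Python dict
payload = json.loads(response_text)
print(payload["code"])  # 404

# Serialize: Python dict -> JSON string (indent makes it easier to read)
request_body = json.dumps({"model": "random_forest", "active": True}, indent=2)
print(request_body)
```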

Encoding of Python Objects in JSON

  • A python string is converted into a JSON string.
  • Integers and floating-point numbers are converted into JSON numbers.
  • The Boolean value True is converted into JSON constant true.
  • The Boolean value False is converted into JSON constant false.
  • The value None is converted into JSON constant null.
  • The json module converts a Python dictionary into a JSON object, whereas lists and tuples are converted into JSON arrays.
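The mapping above can be checked with a short sketch. The record below is hypothetical; it writes the record to a temporary file with json.dump, reads it back with json.load, and shows that a tuple comes back as a list, because JSON only has arrays.

```python
import json
import os
import tempfile

record = {
    "name": "iris-classifier",  # str   -> JSON string
    "n_features": 4,            # int   -> JSON number
    "accuracy": 0.96,           # float -> JSON number
    "trained": True,            # True  -> true
    "deprecated": False,        # False -> false
    "notes": None,              # None  -> null
    "classes": ("setosa", "versicolor", "virginica"),  # tuple -> JSON array
}

text = json.dumps(record)
print(text)

# Write to and read back from a file with json.dump / json.load
path = os.path.join(tempfile.mkdtemp(), "record.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(record, f)
with open(path, encoding="utf-8") as f:
    restored = json.load(f)

# The round trip is lossy for tuples: they come back as lists.
print(type(restored["classes"]))  # <class 'list'>
```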

You can find more examples in my GitHub Jupyter notebook.

Pickle Serialization and Deserialization

Pickling is the process whereby a Python object hierarchy is converted into a byte stream (usually not human readable) to be written to a file; this is also known as serialization. Unpickling is the reverse operation, whereby a byte stream is converted back into a working Python object hierarchy.

Like json, the pickle module is part of the Python standard library, so it does not need to be installed with pip.

Pickle is operationally the simplest way to store an object: the pickle module stores Python objects directly in a Python-specific binary format.

The pickle module implements binary protocols for serializing and deserializing a Python object structure. Most Python objects can be pickled so that they can be saved to disk, although some, such as open files, sockets and lambdas, cannot. Pickle first serializes the object and then writes the result to the file. Pickling converts a Python object (list, dict, etc.) into a byte stream, and the idea is that this byte stream contains all the information necessary to reconstruct the object in another Python script.

  • Pickling: It is a process where a Python object hierarchy is converted into a byte stream.
  • Unpickling: It is the inverse of the pickling process, where a byte stream is converted back into an object hierarchy.

Functions provided by the pickle module:

The pickle interface provides four main functions.

dump() − Serializes an object to an open binary file (file-like object).

dumps() − Serializes an object to a bytes object.

load() − Deserializes an object from an open binary file (file-like object).

loads() − Deserializes an object from a bytes object.
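A minimal sketch of all four functions, using a hypothetical ModelConfig dataclass as the user-defined object. Note that pickle files must be opened in binary mode ("wb" and "rb"), and the class must be importable when the data is loaded.

```python
import os
import pickle
import tempfile
from dataclasses import dataclass


@dataclass
class ModelConfig:
    # Hypothetical user-defined class; it must be importable when unpickling.
    name: str
    learning_rate: float
    layers: list


config = ModelConfig("demo", 0.01, [64, 32])

# dumps() / loads(): serialize to and from an in-memory bytes object
blob = pickle.dumps(config)
print(type(blob))  # <class 'bytes'>
same_config = pickle.loads(blob)

# dump() / load(): serialize to and from an open binary file
path = os.path.join(tempfile.mkdtemp(), "config.pkl")
with open(path, "wb") as f:
    pickle.dump(config, f)
with open(path, "rb") as f:
    loaded_config = pickle.load(f)

# True: dataclasses compare field by field
print(loaded_config == config)
```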

Advantages & Disadvantages of pickle

  1. Object sharing: This is similar to self-referencing objects; pickle stores the object once and ensures that all other references point to the master copy. Shared objects remain shared, which can be very important for mutable objects.
  2. Cross-language and cross-version compatibility: pickle is Python-specific, so cross-language compatibility is not guaranteed, and unpickling a file that was pickled with a different version of Python may not always work properly.
  3. User-defined classes and their instances: Python's lower-level marshal module does not support these at all, but pickle can save and restore class instances transparently. The class definition must be importable and live in the same module as when the object was stored.
  4. Recursive objects: Pickle keeps track of the objects it has already serialized, so later references to the same object won't be serialized again.
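Object sharing and recursive objects are easiest to see in a small sketch: after a pickle round trip, two references to one list still point to a single list, and a list that contains itself is rebuilt correctly. For contrast, json refuses the self-referencing list with a ValueError.

```python
import json
import pickle

# Object sharing: two keys refer to the same list object
shared = [1, 2, 3]
data = {"a": shared, "b": shared}
restored = pickle.loads(pickle.dumps(data))
print(restored["a"] is restored["b"])  # True: still one shared list
restored["a"].append(4)
print(restored["b"])  # [1, 2, 3, 4]

# Recursive object: a list that contains itself
loop = [1]
loop.append(loop)
restored_loop = pickle.loads(pickle.dumps(loop))
print(restored_loop[1] is restored_loop)  # True

# json detects the cycle and raises instead of recursing forever
try:
    json.dumps(loop)
except ValueError as exc:
    print("json cannot serialize it:", exc)
```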

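We can pickle objects of data types such as Booleans, integers, floats, complex numbers, strings, tuples, lists, sets, dictionaries, and top-level classes and functions. Classes and functions are pickled by reference to their name, so they must be importable when the data is unpickled.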

Let's look at the major differences between JSON and pickle.

Pickle vs JSON

JSON is a lightweight data-interchange format that is easily readable by humans. Although it was derived from JavaScript, JSON is standardized and language-independent, which is a serious advantage over pickle. It is also more secure: loading JSON only ever produces plain data, whereas unpickling data from an untrusted source can execute arbitrary code.
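To make the security point concrete, here is a harmless sketch (the class name Surprise is hypothetical). Its __reduce__ method tells pickle to rebuild the object by calling len("abc"), showing that unpickling calls whatever importable callable the producer of the bytes chose. A malicious payload could name a far more dangerous function, which is why the Python documentation warns against unpickling untrusted data.

```python
import pickle


class Surprise:
    # __reduce__ tells pickle how to rebuild the object:
    # "call this callable with these arguments".
    def __reduce__(self):
        return (len, ("abc",))


payload = pickle.dumps(Surprise())

# Unpickling does not give back a Surprise instance; it calls len("abc").
result = pickle.loads(payload)
print(result)  # 3
```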

However, if you only need to use Python, then the pickle module is still a good choice for its ease of use and ability to reconstruct complete Python objects.

For machine learning model deployment, pickle is usually a better choice than JSON, especially when working with complex models.

Joblib Serialization and Deserialization

Besides pickle, the standard Python tool for object (de)serialization, there is the joblib library, which offers easy (de)serialization of objects containing large data arrays, and a manual approach for saving and restoring objects to and from JSON (JavaScript Object Notation). None of these approaches is an optimal solution; the right fit should be chosen according to the needs of your project.

joblib is a third-party package, so install it using pip: pip install joblib

The joblib package provides dump and load functions for serializing Python objects, with particular optimizations for large numpy arrays. It is intended to be a drop-in replacement for pickle and can be effective for sklearn models which store lots of data internally, such as random forest or cluster-based classifiers.

Pickle requires a file object to be passed as an argument, while joblib works with both file objects and string filenames. If your model contains large arrays of data, older versions of joblib stored each array in a separate file, while current versions write a single file; either way, the save and restore procedure remains the same. Joblib also supports different compression methods, such as 'zlib', 'gzip' and 'bz2', and different levels of compression.
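A minimal sketch of joblib.dump and joblib.load, assuming joblib and NumPy are installed. The model_state dictionary is a hypothetical stand-in for a model that holds large arrays. It passes plain filenames instead of file objects and uses the compress argument as a (method, level) tuple.

```python
import os
import tempfile

import joblib
import numpy as np

# Hypothetical stand-in for a model that stores large arrays internally
model_state = {
    "weights": np.arange(100_000, dtype=np.float64).reshape(1000, 100),
    "bias": np.zeros(100),
    "name": "demo-model",
}

out_dir = tempfile.mkdtemp()

# joblib accepts a plain filename; no need to open a file object first
plain_path = os.path.join(out_dir, "model.joblib")
joblib.dump(model_state, plain_path)

# Same call with compression: (method, level) with level from 0 to 9
zlib_path = os.path.join(out_dir, "model_zlib.joblib")
joblib.dump(model_state, zlib_path, compress=("zlib", 3))

# load() detects the compression automatically
restored = joblib.load(zlib_path)
print(np.array_equal(restored["weights"], model_state["weights"]))  # True
print(os.path.getsize(plain_path), os.path.getsize(zlib_path))
```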

The biggest drawback of pickle and joblib is compatibility: a model saved with one version of Python or of the modeling library may not load correctly with another.

If we face such compatibility issues, we might need to save and restore the model manually in the form of JSON. See my GitHub repository for the manual way of serializing and deserializing a model.
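A minimal sketch of the manual approach, using a hypothetical SimpleLinearModel class with hypothetical save_json and load_json methods. Only the parameters needed to rebuild the model are written, as plain JSON types, so the file does not depend on the Python or library version. With a real scikit-learn estimator you would copy attributes such as coef_ and intercept_ in a similar way, but which attributes are needed depends on the estimator.

```python
import json
import os
import tempfile


class SimpleLinearModel:
    """Hypothetical minimal linear model used only to illustrate manual saving."""

    def __init__(self, coef, intercept):
        self.coef = list(coef)
        self.intercept = float(intercept)

    def predict(self, x):
        # y = intercept + sum(coef_i * x_i)
        return self.intercept + sum(c * xi for c, xi in zip(self.coef, x))

    def save_json(self, path):
        # Choose exactly which attributes to store, as plain JSON types
        params = {"coef": self.coef, "intercept": self.intercept}
        with open(path, "w", encoding="utf-8") as f:
            json.dump(params, f)

    @classmethod
    def load_json(cls, path):
        with open(path, encoding="utf-8") as f:
            params = json.load(f)
        return cls(params["coef"], params["intercept"])


model = SimpleLinearModel(coef=[0.5, -1.0], intercept=2.0)
path = os.path.join(tempfile.mkdtemp(), "linear_model.json")
model.save_json(path)

restored = SimpleLinearModel.load_json(path)
print(restored.predict([2.0, 1.0]))  # 2.0 + 0.5*2.0 - 1.0*1.0 = 2.0
```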

YAML Serialization and Deserialization

YAML is my favorite format. It is a human-friendly data serialization format. Unlike pickle and JSON, it is not part of the Python standard library, so you need to install it.

Install YAML support (the PyYAML package) using pip: pip install pyyaml
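A minimal sketch with PyYAML, assuming version 5.1 or later (needed for the sort_keys argument). The configuration dictionary is hypothetical. safe_dump and safe_load are used rather than dump and load, because the safe variants only handle standard YAML types and will not construct arbitrary Python objects.

```python
import yaml  # provided by the PyYAML package

# Hypothetical model configuration
config = {
    "model": "random_forest",
    "n_estimators": 100,
    "features": ["age", "bmi", "glucose"],
    "scale_inputs": True,
}

# Serialize: sort_keys=False keeps the dictionary's insertion order
text = yaml.safe_dump(config, sort_keys=False)
print(text)

# Deserialize back into plain Python objects
restored = yaml.safe_load(text)
print(restored == config)  # True
```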

In any case, whenever you want full control over the save and restore process, the best way is to write your own functions manually.

“dill” Serialization and Deserialization

The dill package extends the functionality of pickle by enabling the serialization of a much larger set of Python objects, such as lambdas and closures.
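A minimal sketch, assuming dill is installed (pip install dill). The make_scaler helper is hypothetical; it returns a closure, which the standard pickle module cannot serialize, while dill stores the function's code and captured variables so it can be rebuilt.

```python
import pickle

import dill


def make_scaler(factor):
    # Hypothetical helper: returns a closure that captures `factor`
    return lambda x: x * factor


double = make_scaler(2)

# The standard pickle module rejects local functions and lambdas
pickle_error = None
try:
    pickle.dumps(double)
except (pickle.PicklingError, AttributeError, TypeError) as exc:
    pickle_error = type(exc).__name__
print("pickle failed with:", pickle_error)

# dill serializes the function's code and closure, so it can be rebuilt
blob = dill.dumps(double)
restored = dill.loads(blob)
print(restored(21))  # 42
```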

The manual approach of saving and restoring objects using JSON lets us select exactly the data that needs to be saved, such as the model parameters, coefficients, training data, and anything else we need.

h5 Serialization and Deserialization

Keras is a simple and powerful Python library for deep learning.
Let's see how to save Keras models to disk and load them again using an .h5 file.
In this section, you will see how to save a Keras model to a file and load it again to make predictions, using the Pima Indians Diabetes dataset.

Check my GitHub repository for a demonstration of serializing and deserializing deep learning models.

Thanks for reading :)

Ref: https://github.com/Asha-ai/serialization-and-deserialization/blob/master/README.md
