QUANTRIUM GUIDES

Storing Images into MongoDB using “Blosc” Module

Optimized Storage and Retrieval of Images in MongoDB

Kailash S
Quantrium.ai


In this blog post, I will explain how to store large or multiple images efficiently in a MongoDB document using the Blosc module. With this method, you can preserve both the size and the quality of an image while storing it in and retrieving it from MongoDB. The entire implementation is explained using Python.

The Conventional Way

The conventional way of storing images in MongoDB is to convert the binary image data to Base64 and store the resulting string. Base64 encoding increases the size of the data by roughly one third, because every 3 bytes of input become 4 characters of output. If we store large or multiple images in a single MongoDB document this way, the document may exceed the maximum size MongoDB allows for a document, which is 16 MB.
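The Base64 overhead is easy to measure on any byte string. The sketch below uses random bytes as a stand-in for image data (an assumption made only to keep the example self-contained) and compares the encoded length with the raw length. The helper name base64_overhead is made up for this illustration.

```python
import base64
import os


def base64_overhead(raw: bytes) -> float:
    """Return the Base64-encoded size divided by the raw size."""
    encoded = base64.b64encode(raw)
    return len(encoded) / len(raw)


# Stand-in for 3 MB of binary image data (random bytes, not a real image).
raw = os.urandom(3_000_000)
ratio = base64_overhead(raw)
print(f"raw: {len(raw)} bytes, Base64 size ratio: {ratio:.3f}")
```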

Introduction to Blosc

Blosc is a high performance compressor optimized for binary data. Blosc works well for compressing numerical arrays.

Blosc uses a blocking technique to reduce memory activity as much as possible. The technique divides a dataset into blocks that are small enough to fit in the L1 cache of a modern processor and performs compression and decompression there. Blosc also makes use of the Single Instruction, Multiple Data (SIMD) and multi-threading capabilities of multi-core processors to speed up compression and decompression.

[Figure: SIMD architecture]

python-blosc is a Python package that wraps Blosc. It supports Python 3.6 and higher.

Though blosc can be used on any binary data, in this blog we are going to look at using it to compress numpy arrays, as this serves two purposes:

  • Storing images (in numpy array format) into MongoDB.
  • Storing numpy arrays themselves into MongoDB.

I mention the second purpose separately because MongoDB does not support storing numpy arrays directly. Blosc can therefore be used to store numpy arrays in MongoDB without altering their dimensions or data.

Installing Blosc

To install the python-blosc package using Conda, use the following command:

$ conda install -c conda-forge python-blosc

To install the python-blosc package via pip, use the following command:

$ pip install blosc

NOTE: python-blosc depends on the cmake and scikit-build packages, so they need to be installed before python-blosc. Install them with pip as follows:

$ pip install cmake
$ pip install scikit-build

Compressing Numpy Array using Blosc

First, the image needs to be converted into a numpy array; the array is then compressed with the pack_array function in blosc. The steps are listed below, after the modules that need to be imported:

import numpy as np
from numpy import asarray
from PIL import Image
import blosc

1. Opening the image using Pillow module.

img = Image.open("path/to/image")

2. Converting image into numpy array using asarray function.

image_array = asarray(img)

3. Compressing numpy array using pack_array function of blosc module.

compressed_bytes = blosc.pack_array(image_array)

This function converts the numpy array into compressed bytes, which can be stored directly in a MongoDB document. The compression is lossless, so the image data is left unaltered, and the size of the data is reduced, often considerably.

Decompressing Numpy Array using Blosc

After retrieving the compressed bytes from the MongoDB document, you need to get the image back from them.

1. To perform decompression, use the unpack_array function in the blosc module to convert the compressed bytes back into a numpy array.

decompressed_array = blosc.unpack_array(compressed_bytes)

2. From the numpy array obtained in the previous step, an Image object can be created using the fromarray function.

im = Image.fromarray(decompressed_array)

3. The Image object can be saved as an image file with the desired extension using the save function.

im.save("filename.png")

With these functions, the compressed bytes retrieved from MongoDB are decompressed back into a numpy array and the image is restored. The size and quality of the restored image are the same as those of the image before compression.

Compression and decompression are also very fast, which makes the approach time-efficient as well. The pack_array and unpack_array functions use pickle and unpickle respectively behind the scenes.

The blosc module supports many more compression and decompression options. If you are interested in knowing more about blosc, the python-blosc documentation is the place to look.

I hope you now have a fair idea of blosc and how it can be used to resolve the issue of storing large or multiple images and numpy arrays in a MongoDB document. I will be glad to hear your suggestions and comments on the blog.
