Storing Images into MongoDB using “Blosc” Module
In this blog post, I will explain how to store large or multiple images efficiently in a MongoDB document using the Blosc module. With this method, an image comes back from MongoDB with the same size and quality it had before it was stored. The entire implementation is explained using Python.
The Conventional Way
The conventional way of storing images in MongoDB is to convert the binary image data into Base64 format and store the resulting string. Base64 encoding increases the size of the data by about one third, because every 3 bytes become 4 characters. If we store large or multiple images in a single MongoDB document this way, the document can exceed the maximum allowed size (the maximum size allowed for a MongoDB document is 16 MB).
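The Base64 overhead can be measured directly with the standard library. This is only a small sketch: the random buffer is a stand-in for real image bytes, and the variable names are chosen for illustration.

```python
import base64
import os

# Stand-in for 3,000,000 bytes of binary image data.
raw = os.urandom(3_000_000)

# Base64 maps every 3 input bytes to 4 ASCII characters.
encoded = base64.b64encode(raw)

print(f"raw: {len(raw)} bytes")
print(f"Base64: {len(encoded)} bytes")
print(f"ratio: {len(encoded) / len(raw):.3f}")
```

The encoded form is larger than the input even before any other document fields are added, which is what makes the 16 MB limit easier to hit.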
Introduction to Blosc
Blosc is a high performance compressor optimized for binary data. Blosc works well for compressing numerical arrays.
Blosc uses a blocking technique to reduce activity on the memory bus as much as possible. The blocking technique works by dividing datasets into blocks that are small enough to fit in the L1 cache of modern processors and performing compression and decompression there. It also makes use of the Single Instruction, Multiple Data (SIMD) and multi-threading capabilities present in multi-core processors to speed up the compression/decompression process.
python-blosc is a Python package that wraps Blosc. It supports Python 3.6 and higher.
Though blosc can be used on any binary data, in this blog we are going to look at using blosc to compress numpy arrays, as this serves two purposes:
- Storing images (in numpy array format) into MongoDB.
- Storing numpy arrays themselves into MongoDB.
I mention the second point separately because MongoDB does not support storing numpy arrays directly. So, blosc can be used as a way to store numpy arrays in MongoDB without the arrays being distorted with respect to dimensions or data.
To install the python-blosc package using Conda, use the following command:
$ conda install -c conda-forge python-blosc
To install the python-blosc package via pip, use the following command:
$ pip install blosc
python-blosc is dependent on the cmake and scikit-build packages. So it is necessary to have these packages installed before installing python-blosc. Use the pip command to install cmake and scikit-build as follows.
$ pip install cmake
$ pip install scikit-build
Compressing Numpy Array using Blosc
First, the image needs to be converted into a numpy array. We then use the pack_array function in blosc to compress the numpy array. The steps involved are as follows.
The modules to be imported for the operations illustrated here are:
import blosc
import numpy as np
from numpy import asarray
from PIL import Image
1. Opening the image using Pillow module.
img = Image.open("path/to/image")
2. Converting image into numpy array using asarray function.
image_array = asarray(img)
3. Compressing numpy array using pack_array function of blosc module.
compressed_bytes = blosc.pack_array(image_array)
Executing this function converts the numpy array into compressed bytes, which can be stored directly in a MongoDB document. The compression is lossless, so the image data is left unaltered, and the compressed bytes are usually considerably smaller than the raw numpy array.
Decompressing Numpy Array using Blosc
After retrieving the compressed bytes from the MongoDB document, you need to get the image back from them.
1. To perform decompression, we use the unpack_array function in the blosc module to convert the compressed bytes back to a numpy array.
decompressed_array = blosc.unpack_array(compressed_bytes)
2. From the numpy array that we got in the previous step, Image object can be retrieved using fromarray function.
im = Image.fromarray(decompressed_array)
3. The Image object can be saved as an image file with the desired extension using the save function.
im.save("filename.png")  # PNG is lossless; the quality option only applies to formats such as JPEG
Executing these functions decompresses the retrieved bytes back into a numpy array and restores the image. The size and quality of the image will be the same as before compression.
Compression and decompression also happen very quickly, which makes the approach time-efficient as well. The pack_array and unpack_array methods use pickle and unpickle respectively, behind the scenes.
The blosc module supports many more compression/decompression techniques. If you are interested in knowing more about blosc, the following documentation is a good place to start.
Tutorials - python-blosc 1.9.2 documentation
I hope you now have a fair idea of blosc and how it can be used to resolve the issue of storing large or multiple images and numpy arrays in a MongoDB document. I will be glad to hear your suggestions and comments on the blog.