How good is Quantization in Milvus?

Tim Spann
7 min read · Jun 25, 2024



In this article I will be referring to and running examples against the Milvus 2.4.x branch. Always run the latest release unless you have a reason not to; there are plenty of easy tools to upgrade if you need to.

The latest release as of the time of writing.

Milvus supports many options for quantization, each with multiple parameters you can tweak to optimize for your use case. Determining which one to use involves several factors: what kind of hardware you have, how fast queries need to be, how much memory you have, how many vectors you store, and what recall rate you need.

So the first good thing about quantization in Milvus is that it provides many options based on your requirements, including memory, hardware, GPU availability, recall rate, and speed. Choose your own quantization-based index adventure in vector databases!

First off, make sure you know what Milvus is and does.

Next take a close look at Vector Databases.

Now take a hands-on dive into Scalar Quantization.

Ok, now you are ready for some more details.

Quantization-based Indexes in Milvus for Floating Point Embeddings

IVF_FLAT

  • High-speed query
  • Requires a recall rate as high as possible

This index uses the original vector representation as encoding.

The Inverted File FLAT index performs approximate nearest neighbor (ANN) search by dividing embeddings into several non-intersecting partitions.

Parameters: nlist, nprobe
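As a sketch, the two parameters map onto the index- and search-parameter dictionaries you would pass to pymilvus (the `collection` handle and the "embedding" field name in the comments are assumptions for illustration, not from the article):

```python
# Index-build parameters for IVF_FLAT: nlist is the number of cluster
# units the vectors are partitioned into (1 to 65,536).
ivf_flat_index = {
    "index_type": "IVF_FLAT",
    "metric_type": "L2",          # or "IP" for inner product
    "params": {"nlist": 1024},
}

# Search-time parameter: nprobe is how many of the nlist clusters to
# scan per query; higher nprobe means better recall but slower search.
ivf_flat_search = {"metric_type": "L2", "params": {"nprobe": 16}}

# With pymilvus, assuming a connected Collection named `collection`
# holding a float-vector field called "embedding":
# collection.create_index("embedding", ivf_flat_index)
# collection.search(query_vectors, "embedding", ivf_flat_search, limit=10)
```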

IVF_SQ8 (Scalar Quantization)

  • High-speed query
  • Limited memory resources
  • Accepts minor compromise in recall rate

Scalar quantization involves mapping floating-point numbers representing each vector dimension to integers.
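Mapping each float32 dimension to an int8 cuts vector storage roughly fourfold. A minimal sketch of the index parameters and the back-of-the-envelope memory math (the dimension and vector counts are illustrative, not an exact Milvus memory accounting):

```python
# Index-build parameters for IVF_SQ8; it takes the same nlist parameter
# as IVF_FLAT, with the scalar quantization applied inside each cluster.
ivf_sq8_index = {
    "index_type": "IVF_SQ8",
    "metric_type": "L2",
    "params": {"nlist": 1024},
}

# Rough storage comparison: float32 uses 4 bytes per dimension,
# the SQ8 code uses 1 byte per dimension.
dim, num_vectors = 768, 1_000_000
float_bytes = num_vectors * dim * 4   # raw float32 vectors
sq8_bytes = num_vectors * dim * 1     # int8 codes: ~4x smaller
```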


IVF_PQ (Product Quantization)

  • Very high-speed query
  • Limited memory resources
  • Accepts substantial compromise in recall rate

Product quantization divides vector embeddings into subvectors, performs clustering within each subvector to create centroids, and encodes each subvector with the ID of the nearest centroid. This method creates non-intersecting partitions within subvectors, similar to IVF-FLAT.
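The subvector scheme above shows up in two extra index parameters: `m`, the number of subvectors, and `nbits`, the bits used to encode each subvector's centroid ID. A sketch with illustrative values (not a definitive sizing guide):

```python
# Index-build parameters for IVF_PQ. The vector dimension must be
# divisible by m; each subvector is encoded as an nbits-wide centroid ID.
ivf_pq_index = {
    "index_type": "IVF_PQ",
    "metric_type": "L2",
    "params": {"nlist": 1024, "m": 16, "nbits": 8},
}

# Rough per-vector storage of the PQ codes versus the raw float32 vector:
dim = 768
raw_bytes = dim * 4                      # 3072 bytes of raw floats
m = ivf_pq_index["params"]["m"]
nbits = ivf_pq_index["params"]["nbits"]
code_bytes = m * nbits // 8              # 16 bytes of PQ codes
```

This is why IVF_PQ trades a substantial recall hit for a much smaller footprint: here each vector shrinks from 3072 bytes to 16.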

SCANN (Score-aware quantization loss) — SIMD (Single Instruction, Multiple Data)

  • Very high-speed query
  • Requires a recall rate as high as possible
  • Large memory resources
  • CPU based
  • In-Memory

This index can optionally keep the raw data alongside the index and can use from 1 to 65,536 cluster units. It is most similar to IVF_PQ but utilizes SIMD for parallel computing.
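The raw-data choice and the re-ranking step show up directly in the parameters. A sketch (values are illustrative):

```python
# Index-build parameters for SCANN: with_raw_data controls whether the
# raw vectors are kept in the index for re-ranking; nlist is the
# cluster-unit count, 1 to 65,536 as noted above.
scann_index = {
    "index_type": "SCANN",
    "metric_type": "L2",
    "params": {"nlist": 1024, "with_raw_data": True},
}

# Search-time parameters: nprobe clusters are scanned, and reorder_k
# candidates are re-ranked before the final top-K is returned.
scann_search = {"metric_type": "L2", "params": {"nprobe": 16, "reorder_k": 100}}
```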

Quantization-based Indexes in Milvus with GPUs

GPU_IVF_FLAT

Requires memory equal to the size of the original data.

When conducting searches, you can set the top-K up to 256 for any search against a GPU_IVF_FLAT-indexed collection.

GPU_IVF_PQ

Utilizes a smaller memory footprint, which depends on the compression parameter settings.

When conducting searches, note that you can set the top-K up to 8192 for any search against a GPU_IVF_PQ-indexed collection.
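Side by side, the two GPU index configurations look like their CPU counterparts, with the top-K caps above worth recording next to them (parameter values are illustrative):

```python
# GPU_IVF_FLAT: full-size vectors on the GPU, memory equal to the raw data.
gpu_ivf_flat_index = {
    "index_type": "GPU_IVF_FLAT",
    "metric_type": "L2",
    "params": {"nlist": 1024},
}

# GPU_IVF_PQ: smaller footprint, governed by the m/nbits compression settings.
gpu_ivf_pq_index = {
    "index_type": "GPU_IVF_PQ",
    "metric_type": "L2",
    "params": {"nlist": 1024, "m": 16, "nbits": 8},
}

# Maximum top-K per search, as noted above.
MAX_TOP_K = {"GPU_IVF_FLAT": 256, "GPU_IVF_PQ": 8192}
```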

Quantization-based Indexes in Milvus for Binary Embeddings

“Binary quantization represents a transformative approach to managing and searching vector data within Milvus, offering significant enhancements in both performance and efficiency. By simplifying vector representations into binary codes, this method leverages the speed of bitwise operations, substantially accelerating search operations and reducing computational overhead.” — Mostafa Ibrahim

There are two such indexes, listed below.

BIN_FLAT

  • Suited to relatively small datasets.
  • Requires perfect accuracy.
  • No compression is applied.
  • Guarantees exact search results.

Warning: this is the slowest index on our list.
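Since BIN_FLAT applies no compression, its configuration is minimal, and the interesting part is the data layout: binary embeddings are packed bit arrays, one byte per 8 dimensions. A sketch:

```python
# BIN_FLAT takes no compression parameters: it stores the packed binary
# vectors as-is and performs brute-force, exact search over them.
bin_flat_index = {
    "index_type": "BIN_FLAT",
    "metric_type": "HAMMING",   # or "JACCARD" for binary vectors
    "params": {},
}

# A binary embedding packs 8 dimensions per byte, so a 128-dimensional
# binary vector occupies only 16 bytes.
dim = 128
packed_vector = bytes(dim // 8)   # 16 zero bytes as a placeholder vector
```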

BIN_IVF_FLAT

  • High-speed query
  • Requires a recall rate as high as possible

This is the same as IVF_FLAT but for Binary Embeddings.

This index requires an nlist parameter representing the number of cluster units, from 1 to 65,536. The encoded data stored in each unit is consistent with the original data. Use the nprobe parameter to set the number of units to query. The final parameter, max_empty_result_buckets, lets you stop a query quickly when nothing is returning: it is the maximum number of consecutive buckets not returning any search results. This is a range-search parameter that terminates the search once that many consecutive empty buckets are reached, with values between 1 and 65,535. It defaults to 2, which is probably good for most use cases.
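The three parameters above split into build-time and search-time dictionaries. A minimal sketch with illustrative values:

```python
# Index-build parameters for BIN_IVF_FLAT: nlist cluster units, with a
# binary distance metric rather than L2/IP.
bin_ivf_index = {
    "index_type": "BIN_IVF_FLAT",
    "metric_type": "JACCARD",   # or "HAMMING"
    "params": {"nlist": 1024},
}

# Search-time parameters: nprobe units are scanned, and a range search
# stops early after max_empty_result_buckets consecutive empty buckets
# (defaults to 2, as noted above).
bin_ivf_search = {
    "metric_type": "JACCARD",
    "params": {"nprobe": 32, "max_empty_result_buckets": 2},
}
```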

A great resource for picking your vector index is here:

So how good is quantization in Milvus? As good as it gets, but as with everything in the evolving world of AI, improvements and enhancements are always on the horizon.

Milvus Recent Release — Milvus 2.4.5

On June 18, 2024, Milvus 2.4.5 was released.

Milvus 2.4.5 simplifies sparse, float16, and bfloat16 vector search with auto-indexing, and improves search speed, deletions, and compactions with Bloom filter optimizations. The release also improves data management through faster loading times and support for importing L0 segments. It introduces the sparse HNSW index for efficient high-dimensional sparse data search, enhances the RESTful API with sparse float vector support, and fixes critical bugs for better stability. If you are running a Milvus 2.4.x database, I recommend installing a fresh 2.4.5 in Docker and doing a test run, then preparing to move one cluster over to the 2.4.5 release.

New Features

  • Added RBAC support to Describe/Alter Database API
  • Support added for building HNSW indexes for Sparse Vectors
  • Support for building Disk indexes on binary vectors
  • Support added for Sparse Vector types on RESTful v2 API calls
  • Added a new RESTful API command to stop a component

Bug Fixes

  • Fixed a bug that could cause Milvus to be unable to create AutoIndex on binary and sparse vectors
  • Prevent possible data loss during deletion

https://github.com/milvus-io/milvus/releases/download/v2.4.5/milvus-standalone-docker-compose.yml

https://github.com/milvus-io/milvus/releases/download/v2.4.5/milvus-standalone-docker-compose-gpu.yml

The Future is: Milvus Roadmap

RESOURCES

Let me know in the comments if you liked what you saw, how I can improve, and what I should show next. Thanks, and I hope to see you soon at a Meetup in Princeton, Philadelphia, or New York City, or on YouTube.

Get Milvused!

Read my Newsletter every week!

For more cool Unstructured Data, AI and Vector Database videos check out the Milvus vector database videos here:

https://www.linkedin.com/company/zilliz/

https://www.linkedin.com/in/timothyspann/

https://milvusio.medium.com


Tim Spann

Principal Developer Advocate, Zilliz. Milvus, Attu, Towhee, GenAI, Big Data, IoT, Deep Learning, Streaming, Machine Learning. https://www.datainmotion.dev/