How good is Quantization in Milvus?

Tim Spann
7 min read · Jun 25, 2024



In this article I will be referring to and running examples against the Milvus 2.4.x branch. Always run the latest release unless you have a reason not to; there are plenty of easy tools to upgrade if you need to.

The latest release as of the time of writing.

Milvus supports many options for quantization, each with multiple parameters you can tweak to optimize for your use case. Determining which one to use involves several factors: what kind of hardware you have, how fast queries need to be, how much memory you have, how many vectors you store, and what recall rate you need.

So the first good thing about quantization in Milvus is that it provides many options based on your requirements, including memory, hardware, GPU availability, recall rate, and speed. Choose your own quantization-based index adventure in vector databases!

First off, make sure you know what Milvus is and does.

Next take a close look at Vector Databases.

Now take a hands-on dive into Scalar Quantization.

Ok, now you are ready for some more details.

Quantization-based Indexes in Milvus for Floating Point Embeddings

IVF_FLAT

  • High-speed query
  • Requires a recall rate as high as possible

This index uses the original vector representation as encoding.

The Inverted File FLAT index performs approximate nearest neighbor (ANN) search by dividing embeddings into several non-intersecting partitions.

Parameters: nlist, nprobe
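As a sketch, the two parameters map onto the index- and search-parameter dictionaries you would pass to pymilvus (the `collection` handle and the "embedding" field name in the comments are assumptions for illustration, not from the article):

```python
# Index-build parameters for IVF_FLAT: nlist is the number of cluster
# units the vectors are partitioned into (1 to 65,536).
ivf_flat_index = {
    "index_type": "IVF_FLAT",
    "metric_type": "L2",          # or "IP" for inner product
    "params": {"nlist": 1024},
}

# Search-time parameter: nprobe is how many of the nlist clusters to
# scan per query; higher nprobe means better recall but slower search.
ivf_flat_search = {"metric_type": "L2", "params": {"nprobe": 16}}

# With pymilvus, assuming a connected Collection named `collection`
# holding a float-vector field called "embedding":
# collection.create_index("embedding", ivf_flat_index)
# collection.search(query_vectors, "embedding", ivf_flat_search, limit=10)
```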

IVF_SQ8 (Scalar Quantization)

  • High-speed query
  • Limited memory resources
  • Accepts minor compromise in recall rate

Scalar quantization involves mapping floating-point numbers representing each vector dimension to integers.
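Mapping each float32 dimension to an int8 cuts vector storage roughly fourfold. A minimal sketch of the index parameters and the back-of-the-envelope memory math (the dimension and vector counts are illustrative, not an exact Milvus memory accounting):

```python
# Index-build parameters for IVF_SQ8; it takes the same nlist parameter
# as IVF_FLAT, with the scalar quantization applied inside each cluster.
ivf_sq8_index = {
    "index_type": "IVF_SQ8",
    "metric_type": "L2",
    "params": {"nlist": 1024},
}

# Rough storage comparison: float32 uses 4 bytes per dimension,
# the SQ8 code uses 1 byte per dimension.
dim, num_vectors = 768, 1_000_000
float_bytes = num_vectors * dim * 4   # raw float32 vectors
sq8_bytes = num_vectors * dim * 1     # int8 codes: ~4x smaller
```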


IVF_PQ (Product Quantization)

  • Very high-speed query
  • Limited memory resources
  • Accepts substantial compromise in recall rate

Product quantization divides vector embeddings into subvectors, performs clustering within each subvector to create centroids, and encodes each subvector with the ID of the nearest centroid. This method creates non-intersecting partitions within subvectors, similar to IVF-FLAT.
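The subvector scheme above shows up in two extra index parameters: `m`, the number of subvectors, and `nbits`, the bits used to encode each subvector's centroid ID. A sketch with illustrative values (not a definitive sizing guide):

```python
# Index-build parameters for IVF_PQ. The vector dimension must be
# divisible by m; each subvector is encoded as an nbits-wide centroid ID.
ivf_pq_index = {
    "index_type": "IVF_PQ",
    "metric_type": "L2",
    "params": {"nlist": 1024, "m": 16, "nbits": 8},
}

# Rough per-vector storage of the PQ codes versus the raw float32 vector:
dim = 768
raw_bytes = dim * 4                      # 3072 bytes of raw floats
m = ivf_pq_index["params"]["m"]
nbits = ivf_pq_index["params"]["nbits"]
code_bytes = m * nbits // 8              # 16 bytes of PQ codes
```

This is why IVF_PQ trades a substantial recall hit for a much smaller footprint: here each vector shrinks from 3072 bytes to 16.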

SCANN (Score-aware quantization loss) — SIMD (Single Instruction, Multiple Data)

  • Very high-speed query
  • Requires a recall rate as high as possible
  • Large memory resources
  • CPU based
  • In-Memory

This index can optionally keep the raw data alongside the index and can use from 1 to 65,536 cluster units. It is most similar to IVF_PQ but utilizes SIMD for parallel computing.
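The raw-data choice and the re-ranking step show up directly in the parameters. A sketch (values are illustrative):

```python
# Index-build parameters for SCANN: with_raw_data controls whether the
# raw vectors are kept in the index for re-ranking; nlist is the
# cluster-unit count, 1 to 65,536 as noted above.
scann_index = {
    "index_type": "SCANN",
    "metric_type": "L2",
    "params": {"nlist": 1024, "with_raw_data": True},
}

# Search-time parameters: nprobe clusters are scanned, and reorder_k
# candidates are re-ranked before the final top-K is returned.
scann_search = {"metric_type": "L2", "params": {"nprobe": 16, "reorder_k": 100}}
```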

Quantization-based Indexes in Milvus with GPUs

GPU_IVF_FLAT

Requires memory equal to the size of the original data.

When conducting searches, you can set the top-K up to 256 for any search against a GPU_IVF_FLAT-indexed collection.

GPU_IVF_PQ

Utilizes a smaller memory footprint, which depends on the compression parameter settings.

When conducting searches, note that you can set the top-K up to 8192 for any search against a GPU_IVF_PQ-indexed collection.
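Side by side, the two GPU index configurations look like their CPU counterparts, with the top-K caps above worth recording next to them (parameter values are illustrative):

```python
# GPU_IVF_FLAT: full-size vectors on the GPU, memory equal to the raw data.
gpu_ivf_flat_index = {
    "index_type": "GPU_IVF_FLAT",
    "metric_type": "L2",
    "params": {"nlist": 1024},
}

# GPU_IVF_PQ: smaller footprint, governed by the m/nbits compression settings.
gpu_ivf_pq_index = {
    "index_type": "GPU_IVF_PQ",
    "metric_type": "L2",
    "params": {"nlist": 1024, "m": 16, "nbits": 8},
}

# Maximum top-K per search, as noted above.
MAX_TOP_K = {"GPU_IVF_FLAT": 256, "GPU_IVF_PQ": 8192}
```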

Quantization-based Indexes in Milvus for Binary Embeddings

“Binary quantization represents a transformative approach to managing and searching vector data within Milvus, offering significant enhancements in both performance and efficiency. By simplifying vector representations into binary codes, this method leverages the speed of bitwise operations, substantially accelerating search operations and reducing computational overhead.” — Mostafa Ibrahim

There are two such indexes, listed below.

BIN_FLAT

  • Suited to relatively small datasets.
  • Requires perfect accuracy.
  • No compression is applied.
  • Guarantees exact search results.

Warning: this is the slowest index on our list.
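Since BIN_FLAT applies no compression, its configuration is minimal, and the interesting part is the data layout: binary embeddings are packed bit arrays, one byte per 8 dimensions. A sketch:

```python
# BIN_FLAT takes no compression parameters: it stores the packed binary
# vectors as-is and performs brute-force, exact search over them.
bin_flat_index = {
    "index_type": "BIN_FLAT",
    "metric_type": "HAMMING",   # or "JACCARD" for binary vectors
    "params": {},
}

# A binary embedding packs 8 dimensions per byte, so a 128-dimensional
# binary vector occupies only 16 bytes.
dim = 128
packed_vector = bytes(dim // 8)   # 16 zero bytes as a placeholder vector
```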

BIN_IVF_FLAT

  • High-speed query
  • Requires a recall rate as high as possible

This is the same as IVF_FLAT but for Binary Embeddings.

This index requires an nlist parameter representing the number of cluster units, from 1 to 65,536. The encoded data stored in each unit is consistent with the original data. Use the nprobe parameter to set the number of units to query. The final parameter, max_empty_result_buckets, lets you stop a query quickly when nothing is returning: it is the maximum number of consecutive buckets not returning any search results. This is a range-search parameter that terminates the search once that many consecutive empty buckets are reached, with values between 1 and 65,535. It defaults to 2, which is probably good for most use cases.
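The three parameters above split into build-time and search-time dictionaries. A minimal sketch with illustrative values:

```python
# Index-build parameters for BIN_IVF_FLAT: nlist cluster units, with a
# binary distance metric rather than L2/IP.
bin_ivf_index = {
    "index_type": "BIN_IVF_FLAT",
    "metric_type": "JACCARD",   # or "HAMMING"
    "params": {"nlist": 1024},
}

# Search-time parameters: nprobe units are scanned, and a range search
# stops early after max_empty_result_buckets consecutive empty buckets
# (defaults to 2, as noted above).
bin_ivf_search = {
    "metric_type": "JACCARD",
    "params": {"nprobe": 32, "max_empty_result_buckets": 2},
}
```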

A great resource for picking your vector index is here:

So how good is quantization in Milvus? As good as it gets, but as with everything in the evolving world of AI, improvements and enhancements are always on the horizon.

Milvus Recent Release — Milvus 2.4.5

On June 18, 2024, Milvus 2.4.5 was released.

Milvus 2.4.5 simplifies sparse, float16, and bfloat16 vector search with auto-indexing, and improves search speed, deletions, and compactions with Bloom filter optimizations. The release also improves data management through faster loading times and support for importing L0 segments. It introduces the sparse HNSW index for efficient high-dimensional sparse data search, enhances the RESTful API with sparse float vector support, and fixes critical bugs for better stability. If you are running a Milvus 2.4.x database, I recommend installing a fresh 2.4.5 in Docker and doing a test run, then preparing to move one cluster over to the 2.4.5 release.

New Features

  • Added RBAC support to Describe/Alter Database API
  • Support added for building HNSW indexes for Sparse Vectors
  • Support for building Disk indexes on binary vectors
  • Support added for Sparse Vector types on RESTful v2 API calls
  • Added a new RESTful API command to stop a component

Bug Fixes

  • Fixed a bug that could cause Milvus to be unable to create AutoIndex on binary and sparse vectors
  • Prevent possible data loss during deletion

https://github.com/milvus-io/milvus/releases/download/v2.4.5/milvus-standalone-docker-compose.yml

https://github.com/milvus-io/milvus/releases/download/v2.4.5/milvus-standalone-docker-compose-gpu.yml

The Future is: Milvus Roadmap

RESOURCES

Let me know in the comments if you liked what you saw, how I can improve, and what I should show next. Thanks, and I hope to see you soon at a Meetup in Princeton, Philadelphia, or New York City, or on YouTube.

Get Milvused!

Read my Newsletter every week!

For more cool Unstructured Data, AI and Vector Database videos check out the Milvus vector database videos here:

https://www.linkedin.com/company/zilliz/

https://www.linkedin.com/in/timothyspann/

https://milvusio.medium.com


Tim Spann

Principal Developer Advocate, Zilliz. Milvus, Attu, Towhee, GenAI, Big Data, IoT, Deep Learning, Streaming, Machine Learning. https://www.datainmotion.dev/