Demystifying the Milvus Sizing Tool

Apr 28, 2024

In today’s rapidly evolving data landscape, selecting the optimal configuration for your Milvus deployment is crucial for ensuring efficient performance and resource utilization. With the many options available, choosing the right configuration can feel overwhelming.

Here are 3 crucial points to consider when using the Milvus sizing tool.

Index Selection: Balancing Memory, Disk, Cost, Accuracy, and Speed

Milvus offers various index algorithms (HNSW, FLAT, IVF_FLAT, IVF_SQ8) with trade-offs in memory usage, disk space, cost, speed, and accuracy. HNSW is usually the recommended choice since it balances performance and memory. See the intro to Milvus indexes in the reference links for more details about these indexes.
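To make the memory trade-off concrete, here is a rough back-of-the-envelope sketch. The function name `estimate_index_bytes` and the formulas are assumptions for illustration (float32 vectors, about 2*M four-byte links per vector for HNSW, float32 centroids for IVF); they are not what the sizing tool computes, and they ignore metadata and runtime overhead.

```python
def estimate_index_bytes(index_type, num_vectors, dim, nlist=1024, m=16):
    """Rough in-memory size of an index over float32 vectors, in bytes."""
    raw = num_vectors * dim * 4      # float32 vectors: 4 bytes per dimension
    centroids = nlist * dim * 4      # IVF cluster centroids, stored as float32
    if index_type == "FLAT":
        return raw
    if index_type == "IVF_FLAT":
        return raw + centroids
    if index_type == "IVF_SQ8":
        # 1 byte per dimension after scalar quantization
        return num_vectors * dim + centroids
    if index_type == "HNSW":
        # Assumption: about 2*M neighbor links of 4 bytes each per vector
        return raw + num_vectors * 2 * m * 4
    raise ValueError(f"unsupported index type: {index_type}")


GIB = 1024 ** 3
for name in ("FLAT", "IVF_FLAT", "IVF_SQ8", "HNSW"):
    size = estimate_index_bytes(name, num_vectors=1_000_000, dim=768)
    print(f"{name:9s} ~{size / GIB:.2f} GiB")
```

Under these assumptions HNSW comes out largest and IVF_SQ8 smallest, which matches the ordering the sizing tool is trying to help you reason about.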

HNSW:

Combines two concepts: skip lists and Navigable Small World (NSW) graphs. HNSW builds a hierarchy of NSW graphs. Search starts at the top layer and moves down layer by layer, finding the nearest neighbor in each layer and using it as the entry point for the next. The top layer has the fewest nodes and the bottom layer has the most.
Very fast querying and excellent recall. Requires the most memory per vector, so will likely cost the most.
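Why does the top layer have the fewest nodes? The sketch below uses the level-assignment rule from the original HNSW paper, where each node draws a random top level of floor(-ln(U) / ln(M)). The function names are hypothetical, and this is a simulation of the idea rather than Milvus's actual implementation.

```python
import math
import random


def assign_levels(num_nodes, m=16, seed=0):
    """Draw a top level for each node; higher levels are exponentially rarer."""
    rng = random.Random(seed)
    level_mult = 1.0 / math.log(m)
    # 1 - random() lies in (0, 1], so the logarithm is always defined
    return [int(-math.log(1.0 - rng.random()) * level_mult) for _ in range(num_nodes)]


def nodes_per_layer(levels):
    """A node with top level L is present in every layer from 0 up to L."""
    top = max(levels)
    return [sum(1 for lvl in levels if lvl >= layer) for layer in range(top + 1)]


levels = assign_levels(100_000)
for layer, count in enumerate(nodes_per_layer(levels)):
    print(f"layer {layer}: {count} nodes")
```

Every node lives in layer 0, and each layer above it keeps only a small fraction of the nodes below, which is what lets the search skip across the graph quickly before refining at the bottom.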

FLAT:

100% recall (exhaustive search).
Query speed is very slow (O(n) for data size n), and the index is the same size as the vector data.
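A minimal sketch of what exhaustive search means, in plain Python. The helper names are hypothetical and real implementations are vectorized, but the shape is the same: every query touches all n vectors, which is where both the 100% recall and the O(n) cost come from.

```python
import heapq


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def flat_search(vectors, query, k=3):
    """Exhaustive search: compare the query with all n vectors, keep the k closest."""
    scored = ((squared_l2(vec, query), idx) for idx, vec in enumerate(vectors))
    return [idx for _, idx in heapq.nsmallest(k, scored)]


vectors = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.9, 1.2]]
print(flat_search(vectors, query=[1.0, 1.0], k=2))
```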

IVF_FLAT:

Divides the vector space into nlist clusters; search is conducted only over the nprobe clusters closest to the query, improving search speed compared to FLAT.
Medium-high recall, and medium query speed (slower than HNSW but faster than FLAT).
Requires less memory than HNSW, but slightly more memory than FLAT (it stores the raw vectors plus the cluster centroids).
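The clustering idea can be sketched in a few lines. Here nlist is the number of clusters the data is split into and nprobe is how many of them a query actually scans. This toy version is an assumption-laden illustration: it picks centroids by random sampling, whereas real IVF indexes train them with k-means, and the function names are hypothetical.

```python
import random


def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_ivf(vectors, nlist, seed=0):
    """Assign every vector to its nearest centroid (one inverted list per cluster)."""
    # Assumption: centroids are a random sample of the data, not k-means output
    rng = random.Random(seed)
    centroids = rng.sample(vectors, nlist)
    clusters = [[] for _ in range(nlist)]
    for idx, vec in enumerate(vectors):
        nearest = min(range(nlist), key=lambda c: squared_l2(vec, centroids[c]))
        clusters[nearest].append(idx)
    return centroids, clusters


def ivf_search(vectors, centroids, clusters, query, k, nprobe):
    """Scan only the nprobe clusters whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda c: squared_l2(query, centroids[c]))
    candidates = [idx for c in order[:nprobe] for idx in clusters[c]]
    candidates.sort(key=lambda idx: (squared_l2(vectors[idx], query), idx))
    return candidates[:k]


rng = random.Random(42)
data = [[rng.random(), rng.random()] for _ in range(1000)]
centroids, clusters = build_ivf(data, nlist=16)
query = [0.5, 0.5]
print("nprobe=1 :", ivf_search(data, centroids, clusters, query, k=5, nprobe=1))
print("nprobe=16:", ivf_search(data, centroids, clusters, query, k=5, nprobe=16))
```

With nprobe equal to nlist the search degenerates into an exhaustive scan, so recall is perfect but the speed advantage is gone. Lower nprobe values trade some recall for speed.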

IVF_SQ8:

Utilizes scalar quantization to reduce disk, compute, and memory consumption by 70–75%.
Medium recall, medium-high query speed.
Offers a better option than IVF_FLAT when resources are limited, at the cost of lower accuracy.
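Where does the 70–75% saving come from? Scalar quantization stores each dimension as one byte instead of a four-byte float. The sketch below assumes a single fixed value range (`lo`, `hi`) for every dimension, which is a simplification; the function names are hypothetical and this is not Milvus's encoder.

```python
def sq8_encode(vector, lo, hi):
    """Map each float in [lo, hi] to an integer code in 0..255 (one byte)."""
    scale = (hi - lo) / 255.0
    return bytes(min(255, max(0, round((x - lo) / scale))) for x in vector)


def sq8_decode(codes, lo, hi):
    """Recover approximate floats from the one-byte codes."""
    scale = (hi - lo) / 255.0
    return [lo + c * scale for c in codes]


vector = [-0.82, 0.10, 0.33, 0.97]
codes = sq8_encode(vector, lo=-1.0, hi=1.0)
restored = sq8_decode(codes, lo=-1.0, hi=1.0)

float32_bytes = 4 * len(vector)
print(f"{float32_bytes} bytes as float32 -> {len(codes)} bytes as SQ8")
print("max error:", max(abs(a - b) for a, b in zip(vector, restored)))
```

The small reconstruction error is the source of the lower recall: distances are computed on slightly rounded vectors.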

Besides the most common floating-point indexes listed above, Milvus also supports SCANN (20% faster on CPU than HNSW) as well as binary, sparse, and disk-based indexes. See the Milvus index doc pages for details.

DISKANN is a hybrid disk/memory index, and is a good option if you’re okay with slightly longer latency (~100ms or so) but need to support a lot of vectors with high recall. AUTOINDEX just defaults to HNSW in open source Milvus (or higher-performing proprietary indexes in Zilliz).
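Whichever index you settle on, you usually express the choice as a small parameter dictionary when creating the index. The sketch below follows the `index_type` / `metric_type` / `params` shape that pymilvus accepts, as far as we know; the specific values (`nlist`, `M`, `efConstruction`) are illustrative rather than tuned, and the helper `index_params_for` is hypothetical.

```python
# Illustrative build parameters per index type; values are examples, not recommendations.
INDEX_PARAMS = {
    "FLAT": {"index_type": "FLAT", "metric_type": "L2", "params": {}},
    "IVF_FLAT": {"index_type": "IVF_FLAT", "metric_type": "L2", "params": {"nlist": 1024}},
    "IVF_SQ8": {"index_type": "IVF_SQ8", "metric_type": "L2", "params": {"nlist": 1024}},
    "HNSW": {"index_type": "HNSW", "metric_type": "L2",
             "params": {"M": 16, "efConstruction": 200}},
    "DISKANN": {"index_type": "DISKANN", "metric_type": "L2", "params": {}},
    "AUTOINDEX": {"index_type": "AUTOINDEX", "metric_type": "L2", "params": {}},
}


def index_params_for(index_type):
    """Return a copy of the example parameters for the given index type."""
    if index_type not in INDEX_PARAMS:
        raise ValueError(f"no example parameters for {index_type!r}")
    entry = INDEX_PARAMS[index_type]
    return {**entry, "params": dict(entry["params"])}


print(index_params_for("HNSW"))
```

With the pymilvus `Collection` API, a dictionary like this would typically be passed as the `index_params` argument of `create_index` for your vector field. Check the Milvus index docs for the parameters your version supports.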

GPU_CAGRA is the fastest of the GPU indexes, but it requires an inference card with GDDR memory rather than one with HBM. The other supported GPU indexes are GPU_BRUTE_FORCE, GPU_IVF_FLAT, and GPU_IVF_PQ.

Segment Size and Deployment Configuration

The sizing tool offers three segment sizes (512 MB, 1024 MB, 2048 MB). The default segment size is 512 MB. Fewer, larger segments typically mean faster search, so if you have a large dataset, 2048 MB is usually recommended.

Think of segments as chunks of data; they’re the smallest units in Milvus used for load balancing and enabling distributed search on indexes. Our quick rule of thumb:

For query node sizes of 4GB-8GB, use 512MB segments.
For query nodes of 8GB-16GB, use 1GB segments.
For query nodes >16GB, opt for 2–4GB segment sizes.
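The rule of thumb above can be written as a small helper. This is a sketch under stated assumptions: the function names are hypothetical, the boundary handling (8 GB maps to 512 MB, 16 GB maps to 1024 MB) is our reading of the rule, nodes smaller than 4 GB are treated like the smallest tier, and the largest tier returns 2048 MB because that is the largest size the sizing tool offers.

```python
import math


def recommend_segment_mb(query_node_gb):
    """Map query node memory (GB) to a segment size (MB) using the rule of thumb."""
    if query_node_gb <= 8:
        return 512
    if query_node_gb <= 16:
        return 1024
    return 2048  # assumption: cap at the largest size the sizing tool offers


def estimate_segment_count(data_gb, segment_mb):
    """Approximate number of segments needed to hold data_gb of vector data."""
    return math.ceil(data_gb * 1024 / segment_mb)


for node_gb in (4, 8, 16, 32):
    seg = recommend_segment_mb(node_gb)
    print(f"{node_gb} GB query node -> {seg} MB segments, "
          f"{estimate_segment_count(100, seg)} segments for 100 GB of data")
```

The segment count makes the "fewer, larger segments" point visible: the same dataset splits into a quarter as many segments at 2048 MB as it does at 512 MB.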

Between Pulsar and Kafka, Pulsar is the recommended choice for new projects (greenfield installations) since it has less overhead per topic.

Additional cost and speed configurations are available in the Enterprise version of Zilliz Cloud (see our cloud sizing tool for more information):

Out of Memory (OOM) reduction and compaction optimization for peak performance.
Lazy Load Storage Savings:
Store hot data efficiently with standard compute units (CUs).
Tiered storage CUs for cost-effective storage of rarely accessed (cold) data.

Conclusion

Remember, this is just a starting point! Milvus offers extensive customization options.

The Milvus sizing tool focuses on a single index. If you need different index algorithms for various collections, create separate collections with custom configurations. This might require a more complex deployment setup.

Reference Links

Resource planning: https://docs.zilliz.com/docs/resource-planning

Zilliz cloud pricing calculator: https://zilliz.com/pricing#estimate_your_cost

Intro Milvus indexes: https://thesequence.substack.com/p/guest-post-choosing-the-right-vector

Docs Milvus Indexes: https://milvus.io/docs/index.md

Milvus GPU CAGRA index: https://zilliz.com/blog/Milvus-introduces-GPU-index-CAGRA

Written by Zilliz