Milvus Reference Architectures

Zilliz
4 min read · May 9, 2024

This blog addresses some commonly asked questions regarding Milvus resource allocation based on specific use cases. These questions include:

  • How much CPU and memory resources are needed for Milvus, based on a specific number of users or requests per second (RPS)?
  • How much CPU and memory resources are needed for Milvus, based on different mixes of READ and WRITE?

Understanding Your Workload Characteristics

The first step in allocating resources to Milvus is to understand your workload characteristics. Factors such as the number of users, requests per second, and the mix of reads and writes play a crucial role in determining Milvus's compute and memory requirements.

Below is an example list of Linux package-based reference architectures, where RPS means Requests Per Second:

[Table: reference architecture tiers by number of users and API RPS]

Estimating Resource Requirements

To estimate the resource requirements for Milvus, we need to make a few assumptions:

  1. Reads: Each web request and Git pull is a READ operation.
  2. Writes: Each Git push is considered a WRITE operation.
  3. Volume and ratio of reads/writes: Milvus's read/write volume and ratio are assumed to match the read/write ratio of API calls for a given number of users.
  4. Queries per second (QPS): Milvus QPS must match the API RPS (requests per second) requirement for the same number of users.

We also need to estimate the data size per read/write request. We’ll assume a common GenAI use case:

  • Vector dimension: 1024 floating point numbers
  • Size in bytes per floating point number: 4 bytes
  • Top_k (number of returned vectors): 10 vectors per search request
  • Size of a collection (database table): 1 million vectors per write request
  • Database index type: HNSW

Based on these assumptions, we can do back-of-the-napkin math to estimate the data size per read or write. Each vector takes 1024 * 4 bytes = 4 KB, and each read returns top_k = 10 vectors. With these assumptions:

  • Each Milvus read operation processes around 40 KB of data (10 vectors × 4 KB).
  • Each Milvus write operation is estimated to involve around 4 GB of data (1 million vectors × 4 KB).
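
This napkin math is easy to script. Here is a minimal Python sketch whose constants simply mirror the assumptions above:

```python
# Back-of-the-napkin estimate of the data moved per Milvus read and write,
# using the assumptions above: 1024-d float32 vectors, top_k = 10 per
# search, and a whole 1-million-vector collection per write.
DIM = 1024                      # vector dimension
BYTES_PER_FLOAT = 4             # float32
TOP_K = 10                      # vectors returned per search
VECTORS_PER_WRITE = 1_000_000   # whole-collection insert

vector_bytes = DIM * BYTES_PER_FLOAT             # 4,096 bytes = 4 KB
read_bytes = TOP_K * vector_bytes                # ~40 KB per read
write_bytes = VECTORS_PER_WRITE * vector_bytes   # ~4 GB per write

print(f"per vector: {vector_bytes / 1024:.0f} KB")     # 4 KB
print(f"per read:   {read_bytes / 1024:.0f} KB")       # 40 KB
print(f"per write:  {write_bytes / 1024**3:.1f} GB")   # ~3.8 GiB, i.e. ~4 GB
```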

Milvus offers both insert (create an entirely new collection) and upsert (modify a few rows) functionality (see the blog Milvus insert, upsert, delete for more information). We'll over-estimate each WRITE operation as a whole-collection insert rather than an upsert of just a few rows.
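
As a purely illustrative example of the two write paths, here is a minimal sketch using pymilvus's MilvusClient; the local URI, collection name, and random vectors are assumptions for the demo, not part of the reference architectures:

```python
# Minimal, illustrative sketch of insert vs. upsert in Milvus.
import random
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")  # assumes a local Milvus
client.create_collection("docs", dimension=1024)     # quick-setup collection

rows = [{"id": i, "vector": [random.random() for _ in range(1024)]}
        for i in range(1_000)]

# insert: add brand-new rows (our over-estimated WRITE is a full collection)
client.insert(collection_name="docs", data=rows)

# upsert: overwrite a few existing rows in place
rows[0]["vector"] = [random.random() for _ in range(1024)]
client.upsert(collection_name="docs", data=rows[:1])
```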

Databases need to consider not only data size but also search and insert speed. We will assume the collection is indexed with the popular HNSW index, which has O(log n) search time.
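
To see why the index choice matters at these collection sizes, here is an illustrative comparison of a brute-force scan's n comparisons against the log2(n) growth that HNSW targets (actual HNSW visit counts also depend on build and search parameters such as M and ef):

```python
# Illustrative only: brute-force scan cost vs. the logarithmic growth
# that HNSW's layered graph search targets.
import math

for n in (1_000_000, 10_000_000, 100_000_000):
    print(f"n = {n:>11,}: brute force ~ {n:,} comparisons, "
          f"log2(n) ~ {math.log2(n):.0f}")
```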

With these assumptions, here is our conversion of web-application tiers (users, RPS, reads/writes) into vector database tiers (QPS, data size):

  • Up to 1,000 users = 20 QPS with 1 million vectors
  • Up to 2,000 users = 40 QPS with 1 million vectors
  • Up to 3,000 users = 60 QPS with 1 million vectors
  • Up to 5,000 users = 100 QPS with 2 million vectors
  • Up to 10,000 users = 200 QPS with 4 million vectors
  • Up to 25,000 users = 500 QPS with 10 million vectors
  • Up to 50,000 users = 1000 QPS with 20 million vectors
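
These tiers follow a simple pattern: the target QPS is 2% of the user count. Here is the list encoded as a small lookup; the tier values come from the list above, while the helper itself is just a convenience sketch:

```python
# The users -> (QPS, collection size) tiers above as a lookup table.
TIERS = [  # (max_users, qps, collection size in vectors)
    (1_000, 20, 1_000_000),
    (2_000, 40, 1_000_000),
    (3_000, 60, 1_000_000),
    (5_000, 100, 2_000_000),
    (10_000, 200, 4_000_000),
    (25_000, 500, 10_000_000),
    (50_000, 1_000, 20_000_000),
]

def tier_for(users: int) -> tuple[int, int]:
    """Return (qps, vectors) for the smallest tier covering `users`."""
    for max_users, qps, vectors in TIERS:
        if users <= max_users:
            return qps, vectors
    raise ValueError("above the largest published tier; benchmark individually")

print(tier_for(4_200))  # -> (100, 2000000), i.e. the 'up to 5,000 users' tier
```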

Load Testing and Benchmarking

To ensure the accuracy of our resource estimates, we load-tested and benchmarked the architecture tiers with VectorDBBench. We assumed the default Segment, Partition, Shard, Data node, Query node, and Index node sizes for the Milvus architecture itself.

Due to Milvus's autoscaling capabilities, performance scales linearly with data size and cluster resources. Below is a table showing the recommended Milvus and Zilliz Cloud (the fully managed Milvus) resource sizes for different data capacities and QPS requirements.

The table below shows data capacity in millions of 1024-dimension vectors. Milvus resources are given as a number of CPUs and GB of memory. For cost comparison, we also show Zilliz Cloud resource sizes, given in Compute Units (cu) of either the performance or capacity type.

| Users | Data capacity | QPS benchmarked | RPS required | Milvus resources | Zilliz resources |
| --- | --- | --- | --- | --- | --- |
| 3,000 | 1M 1024-d vectors | 1,200 | 60 | 8 CPU, 32 GB | 1cu-perf |
| 3,000 | 1M 1024-d vectors | 2,400 | 60 | 16 CPU, 64 GB | 2cu-perf |
| 3,000 | 1M 1024-d vectors | 3,600 | 60 | 24 CPU, 96 GB | 4cu-perf |
| 10,000 | 3.7M 1024-d vectors | 360 | 200 | 16 CPU, 64 GB | 2cu-cap |
| 10,000 | 3.7M 1024-d vectors | 700 | 200 | 64 CPU, 256 GB | 4cu-cap |
| 25,000 | 10M 1024-d vectors | 600 | 500 | 196 CPU, 768 GB | 12cu-cap |
| 250,000 | 100M 1024-d vectors | 6,000 | 5,000 | 19,200 CPU, 76,800 GB | 1200cu-cap |

Table of recommended Milvus and Zilliz resource sizes per number of Users/RPS tiers. Milvus scaling is linear with respect to data size and required QPS.
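
If you want to script against these recommendations, the table can be encoded as a simple lookup. The row values below are copied from the table; the recommend helper and its "smallest row covering both users and target QPS" rule are our illustrative assumption, not an official sizing algorithm:

```python
# The recommendation table above as a lookup.
RECOMMENDATIONS = [
    # (max_users, vectors in millions, qps_benchmarked, rps_required, milvus, zilliz)
    (3_000, 1, 1_200, 60, "8 CPU, 32 GB", "1cu-perf"),
    (3_000, 1, 2_400, 60, "16 CPU, 64 GB", "2cu-perf"),
    (3_000, 1, 3_600, 60, "24 CPU, 96 GB", "4cu-perf"),
    (10_000, 3.7, 360, 200, "16 CPU, 64 GB", "2cu-cap"),
    (10_000, 3.7, 700, 200, "64 CPU, 256 GB", "4cu-cap"),
    (25_000, 10, 600, 500, "196 CPU, 768 GB", "12cu-cap"),
    (250_000, 100, 6_000, 5_000, "19,200 CPU, 76,800 GB", "1200cu-cap"),
]

def recommend(users: int, target_qps: int):
    """Return (milvus, zilliz) for the smallest row covering both inputs."""
    for max_users, _m, qps, _rps, milvus, zilliz in RECOMMENDATIONS:
        if users <= max_users and target_qps <= qps:
            return milvus, zilliz
    return None  # beyond the table; benchmark with VectorDBBench instead

print(recommend(3_000, 2_000))  # -> ('16 CPU, 64 GB', '2cu-perf')
```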

As the table above shows, once data size and QPS pass a certain threshold, it can be more cost-effective to run Milvus on Zilliz Cloud than on-premises.

Conclusion

By understanding your workload characteristics, estimating resource requirements based on assumptions, and leveraging load testing and benchmarking tools such as VectorDBBench, you can confidently provision the necessary resources for your Milvus deployment.

Refer to our cluster sizing guide for a deeper dive. Remember, as your workload evolves, it’s essential to regularly review and adjust your resource allocation to maintain peak performance.
