Testing IVF_SQ8 Index in the Milvus Vector Database
How does Milvus, a database for AI, perform?
This article presents test results for the IVF_SQ8 index on the SIFT1B dataset, run on standalone Milvus.
TLDR: When nq < 500, searching with CPU is faster. However, as nq becomes larger, GPU wins over CPU significantly.
Test objective
To compare search time and recall rate as nq and topk vary.
Test metrics
- Query Elapsed Time: Time (in seconds) taken to run a query. The variable that affects query time is nq, the number of queried vectors.
- Recall: The fraction of all relevant instances that were actually retrieved. The variables that affect recall are: a) nq, the number of queried vectors; b) topk, the number of most similar results returned for each query.
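As an illustration of the recall metric above, here is a minimal sketch in Python. It assumes recall is computed by comparing the IDs returned by the index against exact (brute-force) ground-truth IDs for each query; the function name and the sample IDs are made up for the example and do not come from the test scripts.

```python
from typing import Sequence


def recall_at_k(retrieved: Sequence[Sequence[int]],
                ground_truth: Sequence[Sequence[int]],
                topk: int) -> float:
    """Average fraction of the true top-k neighbours found across all queries."""
    if len(retrieved) != len(ground_truth):
        raise ValueError("retrieved and ground_truth must cover the same queries")
    if not retrieved:
        raise ValueError("at least one query is required")
    hits = 0
    for got, expected in zip(retrieved, ground_truth):
        # Count the IDs that appear in both top-k lists for this query.
        hits += len(set(got[:topk]) & set(expected[:topk]))
    return hits / (len(retrieved) * topk)


# Two queries (nq = 2), topk = 3, with made-up vector IDs.
approx = [[1, 2, 9], [4, 5, 6]]   # hypothetical index results
exact = [[1, 2, 3], [4, 5, 6]]    # hypothetical ground truth
print(f"recall = {recall_at_k(approx, exact, topk=3):.3f}")
```

Here the first query misses one of its three true neighbours and the second finds all of them, so five of six expected IDs are retrieved.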
Hardware/software conditions
- Operating System: CentOS Linux release 7.6.1810 (Core)
- CPU: Intel(R) Xeon(R) CPU E5-2678 v3 @ 2.50 GHz
- GPU0: GeForce GTX 1080
- GPU1: GeForce GTX 1080
- Memory: 503 GB
- Docker version: 18.09
- NVIDIA Driver version: 430.34
- Milvus version: 0.5.3
- SDK interface: Python 3.6.8
- pymilvus version: 0.2.5
Parameter setup:
Dataset (SIFT1B)
- Base data: 1,000,000,000 vectors, 128 dimensions
- Data type: hdf5
Table attributes
- nlist: 16384
- metric_type: L2
Query configuration
- nprobe: 32
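To put nlist and nprobe in perspective, the short calculation below estimates how much of the index a single query touches. It is a rough sketch that assumes vectors are spread evenly across the nlist buckets, which real data rarely is, so treat the result as an order-of-magnitude figure.

```python
nlist = 16384                 # number of buckets (table attributes)
nprobe = 32                   # buckets searched per query (query configuration)
num_vectors = 1_000_000_000   # SIFT1B base vectors

# Share of buckets visited by one query.
probe_fraction = nprobe / nlist

# Assumption: every bucket holds the same number of vectors.
vectors_scanned = num_vectors * nprobe // nlist

print(f"{probe_fraction:.4%} of buckets, about {vectors_scanned:,} vectors per query")
```

Under that assumption each query compares against roughly 2 million of the 1 billion vectors, which is why recall depends on how well the probed buckets cover the true neighbours.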
Milvus configuration
- cpu_cache_capacity: 150
- gpu_cache_capacity: 6
- use_blas_threshold: 1100
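The cache settings can be sanity-checked with a back-of-the-envelope size estimate. The sketch below assumes the raw vectors are held as 4-byte floats, that IVF_SQ8 stores one byte per dimension (8-bit scalar quantization), and that cpu_cache_capacity is expressed in GB. It ignores centroids, vector IDs, and other overhead, so the real footprint is somewhat larger.

```python
num_vectors = 1_000_000_000
dim = 128
GB = 10**9

raw_bytes = num_vectors * dim * 4   # float32: 4 bytes per dimension
sq8_bytes = num_vectors * dim * 1   # SQ8 codes: 1 byte per dimension

print(f"raw float32 vectors: {raw_bytes / GB:.0f} GB")
print(f"SQ8-compressed codes: {sq8_bytes / GB:.0f} GB")

cpu_cache_capacity_gb = 150   # from the Milvus configuration; unit assumed to be GB
print("codes fit in CPU cache:", sq8_bytes / GB <= cpu_cache_capacity_gb)
```

On these assumptions the quantized codes take about a quarter of the raw size and fit within the configured CPU cache.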
Other condition
- Whether to restart Milvus after each query: No
Performance test
GPU mode (search_resources: gpu0, gpu1)
When nq is 1000, the query time per 128-dimension vector is around 17 ms in GPU Mode.
CPU mode (search_resources: cpu, gpu0)
When nq is 1000, the query time per 128-dimension vector is about 27 ms in CPU Mode.
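The two figures quoted for nq = 1000 can be turned into a batch time and a speedup with simple arithmetic. This sketch assumes the quoted values are averages per queried vector, so that multiplying by nq gives the total elapsed time for the batch.

```python
nq = 1000
gpu_ms_per_vector = 17   # GPU Mode figure quoted above
cpu_ms_per_vector = 27   # CPU Mode figure quoted above

# Assumption: the per-vector figures are batch averages.
gpu_total_s = nq * gpu_ms_per_vector / 1000
cpu_total_s = nq * cpu_ms_per_vector / 1000
speedup = cpu_ms_per_vector / gpu_ms_per_vector

print(f"GPU Mode: {gpu_total_s:.0f} s, CPU Mode: {cpu_total_s:.0f} s")
print(f"GPU speedup at nq = {nq}: {speedup:.2f}x")
```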
Conclusion
When nq is small, the search time in CPU Mode is much less than that in GPU Mode. However, as nq becomes larger, GPU Mode is significantly faster.
The query elapsed time in GPU Mode consists of two parts: (1) CPU-to-GPU index copy time; (2) nprobe buckets search time. When nq < 500, the CPU-to-GPU index copy time cannot be amortized efficiently, so CPU Mode is the better choice; when nq > 500, GPU Mode is typically faster.
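The amortization argument can be expressed as a toy cost model: GPU time is a fixed copy cost plus a per-query search cost, while CPU time is only a per-query cost. The sketch below is an illustration, not a fit to the measurements. The three timing constants are hypothetical and were chosen so that the break-even point lands near nq = 500 to mirror the text.

```python
def gpu_time(nq: int, copy_s: float, gpu_per_query_s: float) -> float:
    """Fixed CPU-to-GPU copy cost plus per-query search cost."""
    return copy_s + nq * gpu_per_query_s


def cpu_time(nq: int, cpu_per_query_s: float) -> float:
    """No copy cost, but a higher per-query search cost."""
    return nq * cpu_per_query_s


def break_even_nq(copy_s: float, cpu_per_query_s: float,
                  gpu_per_query_s: float) -> float:
    """nq at which both modes take the same time in this model."""
    if cpu_per_query_s <= gpu_per_query_s:
        raise ValueError("GPU never catches up if it is not faster per query")
    return copy_s / (cpu_per_query_s - gpu_per_query_s)


# Hypothetical constants, NOT measured values from this test.
COPY_S = 5.0
CPU_PER_QUERY_S = 0.030
GPU_PER_QUERY_S = 0.020

print(f"break-even nq: {break_even_nq(COPY_S, CPU_PER_QUERY_S, GPU_PER_QUERY_S):.0f}")
for nq in (100, 1000):
    c = cpu_time(nq, CPU_PER_QUERY_S)
    g = gpu_time(nq, COPY_S, GPU_PER_QUERY_S)
    print(f"nq={nq}: CPU {c:.1f} s, GPU {g:.1f} s -> {'CPU' if c < g else 'GPU'} faster")
```

Below the break-even point the fixed copy cost dominates and CPU Mode wins; above it the lower per-query cost of the GPU outweighs the copy.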
Recall test
GPU mode (search_resources: gpu0, gpu1)
CPU mode (search_resources: cpu, gpu0)
Conclusion
In both GPU and CPU modes, as nq increases, the recall gradually stabilizes to over 93%.
For more detailed test results, see the IVF_SQ8 test report.