Pain points in the candidate retrieval system

Jun Xie
3 min read · Jul 25, 2022


This article discusses a list of pain points in the candidate retrieval system. For context, please see a previous article on the overall idea of the candidate retrieval step in a recommendation system.

There are many pain points in building, scaling, and maintaining the candidate retrieval system.

(1) Build the candidate retrieval system

Assume that this step only builds an E2E system to support retrieval over millions of vectors. Here is a list of pain points:

  1. It takes a lot of engineering effort to build multiple components, such as the vector/metadata storage, the index publisher, and the index server. In addition, in medium and large organizations, cross-team effort is needed to coordinate the design and implementation. One example is deciding which retrieval strategy and library to use. For the retrieval strategy, we should use approximate nearest neighbor (ANN) search instead of exact KNN. For the retrieval library, there are many options for ANN search (e.g., Faiss, HNSWLib, Annoy, ScaNN). The final decision can be based on many factors, two of which are recall and speed. ANN Benchmarks can give a high-level comparison across different libraries.
  2. Metadata filtering. Open-source ANN libraries generally don't provide metadata filtering (e.g., restricting the search to similar shoe products only). But this is a very important feature of vector search in a variety of AI applications, such as e-commerce, which means extra work on metadata filtering is needed.
  3. Operational work to deploy to a production environment. One example is regionalization to reduce E2E latency. Assume we want to deploy this system to multiple regions so that user requests in a specific region can be served quickly by that region's infrastructure. In that case, operational work is needed to set up an infra footprint per region.
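To make pain points 1 and 2 concrete, here is a minimal sketch (in NumPy, with made-up toy data) of the exact KNN baseline that ANN libraries approximate, plus the kind of metadata post-filtering layer that typically has to be built on top of them. All function and field names here are illustrative, not from any real library.

```python
import numpy as np

def exact_knn(query, vectors, metadata, k=3, category=None):
    """Brute-force exact nearest-neighbor search with optional
    post-filtering on a metadata field (illustrative sketch only)."""
    # Cosine similarity against every stored vector: exact but O(N),
    # which is the cost ANN libraries (Faiss, HNSWLib, ...) trade
    # recall away to avoid.
    norms = np.linalg.norm(vectors, axis=1) * np.linalg.norm(query)
    scores = vectors @ query / norms
    order = np.argsort(-scores)  # best match first
    results = []
    for idx in order:
        # Post-filtering: the index returns raw neighbors, so any
        # metadata constraint must be applied as an extra layer.
        if category is not None and metadata[idx]["category"] != category:
            continue
        results.append((int(idx), float(scores[idx])))
        if len(results) == k:
            break
    return results

# Toy example: 4 product embeddings, each tagged with a category.
vectors = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
metadata = [{"category": "shoe"}, {"category": "shirt"},
            {"category": "shoe"}, {"category": "shirt"}]
query = np.array([1.0, 0.0])

print(exact_knn(query, vectors, metadata, k=2))
print(exact_knn(query, vectors, metadata, k=2, category="shoe"))
```

Note that post-filtering can return fewer than k useful results when the filter is selective, which is one reason metadata filtering in vector search is harder than it looks.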

(2) Scale the candidate retrieval system

In order to scale the system to handle 100M or billions of vectors, we need to shard the vectors. Here is a list of pain points:

  1. Cost increases accordingly. Before, we only had a single shard; assume that a single shard has 10 instances. With 2 shards, we need 20 instances to support the QPS and the number of vectors. If the cost budget is constrained, then additional engineering effort is needed for cost optimization.
  2. No real-time updates, since the index publisher builds indices in batches. The more vectors there are, the longer the batch indexing takes, which increases the time (SLA) before vector changes show up at serving time.
  3. Custom logic to support multiple indices. Different indices can be used for experiments with different embeddings. The complexity of supporting multiple indices depends on the implementation of the candidate retrieval system.
  4. Global elasticity. Additional work is needed to increase the number of shards. In the worst case, if the capacity can't support the incoming number of vectors, the whole system simply breaks.
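The cost and elasticity points above can be sketched in a few lines. This is an assumed, simplified model (the 10-instances-per-shard figure comes from the example above; the routing scheme and names are hypothetical): cost grows linearly with shard count, and naive modulo routing means that changing the shard count remaps most vectors, which is exactly why adding shards is painful.

```python
import hashlib

INSTANCES_PER_SHARD = 10  # assumed capacity unit from the example above

def fleet_size(num_shards: int) -> int:
    # Cost grows linearly with shards: 1 shard -> 10 instances,
    # 2 shards -> 20 instances, and so on.
    return num_shards * INSTANCES_PER_SHARD

def shard_for(vector_id: str, num_shards: int) -> int:
    # Stable hash (not Python's seeded hash()) so a vector always
    # routes to the same shard for a fixed shard count.
    digest = hashlib.md5(vector_id.encode()).hexdigest()
    return int(digest, 16) % num_shards

print(fleet_size(2))            # 20 instances for two shards
print(shard_for("vec-123", 2))  # deterministic shard assignment
```

With this naive modulo scheme, going from 2 to 3 shards changes the assignment of roughly two-thirds of the vectors, forcing a large re-index; real systems often use consistent hashing to limit that movement when resharding.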

(3) Maintain the system

  1. Observability. A dedicated observability system is needed to monitor the production system and ensure everything is going well. This can also give insight into what delivery performance looks like for specific vectors.
  2. Operations. Typically, the system is built on GKE. More shards mean more operational work, so many tools need to be added to debug, prevent, and fix production issues.

The above is not an exhaustive list of pain points. As we can see, it is hard to evolve the candidate retrieval system over time based on the business requirements.


Jun Xie

Founder and ex-Snap software engineer. I am interested in Machine Learning and Databases. Feel free to drop me an email: xiejuncs@gmail.com