Building a modern ads serving platform with Redisearch

Chih Wei Hsieh
Published in Dcard Tech Blog
6 min read · Jun 7, 2022

At the Dcard ads team, we strive to build the best ad serving platform to deliver tailored ads to our users. In a typical ad request-to-impression round trip, the user’s profile is retrieved first, the list of available ads is then refined to a set of high-quality candidates, and finally the candidates are ranked or reordered by a trained model or by predefined rules. Our ads serving platform needs to be scalable and always available, and its response latency must not suffer even though each request goes through a series of time-consuming data I/O and computation steps.

round trip of an ad request

When we started building our ads serving platform, we ran all our services on a Kubernetes cluster and had an Elasticsearch cluster set up to find the best matches for our users. Behind the scenes, our online ads-user matching was a simple text search problem. Collections of ads were sorted by score, which was calculated by matching features of users against features of ads. Occasionally we saw noticeable performance drops in Elasticsearch and significant resource consumption, caused by the growing number of features and the increasing complexity of the queries. Slow P95 response times, and low availability caused by constant request timeouts during peak traffic, were issues we could hardly ignore.

time spending of an ad request
P95 response time on the peak traffic when serving with Elasticsearch
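
For readers less familiar with the metric, the P95 response time is the latency that 95% of requests stay at or below. The following is a minimal sketch using the nearest-rank method; the function name and the sample latencies are made up for illustration and are not taken from our monitoring stack.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range 1..100")
    ordered = sorted(samples)
    rank = math.ceil(len(ordered) * pct / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds: mostly fast, two slow outliers.
latencies_ms = [20] * 18 + [900] * 2
p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
```

With these made-up samples the median stays at the fast value while the P95 lands on the slow outliers, which is why a tail percentile is a better signal of timeouts under peak traffic than an average.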

Text search by Redisearch

Built on inverted indexes on top of Redis, Redisearch is a relatively new search engine capable of extremely fast document searching and indexing by leveraging in-memory data I/O. It seemed to be a perfect replacement for Elasticsearch, considering our relatively small datasets, the frequent reads and writes, and our low tolerance for slow queries. In a Redisearch document, data are stored in fields of different types. However, values need to be cast manually upon retrieval, because they are kept in memory as compact strings regardless of their user-defined types. The field types include (but are not limited to):

TextField

Records stored in a TextField support sorting and stemming, and are tokenised on special characters. For each term hit, the score, calculated by term frequency–inverse document frequency (tf-idf), is not limited to the requested field. Instead, it reflects the overall frequency of the term across the other fields that belong to the document.

TagField

Unlike TextField, the score calculated for a query on a specific TagField is limited to that field. TagField does not support stemming or sorting, and is tokenised only by a separator, which is a comma by default.

NumericField

NumericField supports basic range filtering. The name is misleading: it is easy to assume that data stored in a NumericField are numbers such as integers or floats, while they are actually stored as strings.
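
To make the field types and the manual casting more concrete, here is a minimal sketch in plain Python. It only assembles the arguments of an FT.CREATE command and casts string values back to their declared types; it does not talk to a Redis server. The key prefix "ad:" and the field names are hypothetical, only the index name comes from this article.

```python
from typing import Dict, List, Union

# Hypothetical schema: field name -> Redisearch field type.
# The field names are assumptions made for this example.
SCHEMA: Dict[str, str] = {
    "title": "TEXT",
    "location": "TAG",
    "tags": "TAG",
    "min_age": "NUMERIC",
}


def build_ft_create(index: str, prefix: str, schema: Dict[str, str]) -> List[str]:
    """Return the argument list of an FT.CREATE command that indexes
    Redis hashes whose keys start with `prefix`."""
    args = ["FT.CREATE", index, "ON", "HASH", "PREFIX", "1", prefix, "SCHEMA"]
    for name, field_type in schema.items():
        args += [name, field_type]
    return args


def cast_document(
    raw: Dict[str, str], schema: Dict[str, str]
) -> Dict[str, Union[str, float, List[str]]]:
    """Cast the string values of a retrieved document to their declared types."""
    doc: Dict[str, Union[str, float, List[str]]] = {}
    for name, value in raw.items():
        field_type = schema.get(name, "TEXT")
        if field_type == "NUMERIC":
            doc[name] = float(value)  # numeric values come back as strings
        elif field_type == "TAG":
            # split on the default tag separator (comma)
            doc[name] = [tag.strip() for tag in value.split(",")]
        else:
            doc[name] = value
    return doc


create_args = build_ft_create("dcard_ad_index", "ad:", SCHEMA)
ad = cast_document(
    {"title": "Summer sale", "tags": "EC, beauty shopping", "min_age": "18"}, SCHEMA
)
```

Keeping the schema in one mapping means the same definition drives both index creation and casting on read, so the two are less likely to drift apart.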

Most properties of our ads data were stored in TagFields, with a few in NumericFields, e.g. a member’s age for filtering. We used query attributes to add extra score on several fields. For demonstration, an example of an ad request for a member could be:

ad query example

Here, the ad contents are stored in an index named “dcard_ad_index”. We want to query qualified ads that either have the id “foo” or “bar”, or have the location label “TW” and the tags label “EC” or “beauty shopping”. For the latter scenario, we add an additional 100 points to ads that have “is_from_dcard” set to 1. Redisearch also supports converting a query string into a human-interpretable form, which is quite handy for examining and debugging a query.
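
As a rough text version of the query described above, the sketch below assembles a comparable query string in plain Python. It is an approximation rather than our exact production query: it assumes that every field involved is a TagField, that the optional operator (~) combined with a $weight query attribute is what adds the extra points, and that the helper names are free to invent. The resulting string would be passed to FT.SEARCH, and checking it with FT.EXPLAIN first is advisable.

```python
import re
from typing import Iterable


def escape_tag(value: str) -> str:
    """Backslash-escape every character that is not a letter, digit or
    underscore, so spaces and punctuation stay part of the tag value."""
    return re.sub(r"([^A-Za-z0-9_])", r"\\\1", value)


def tag_clause(field: str, values: Iterable[str]) -> str:
    """Match documents whose tag field contains any of the given values."""
    return "@%s:{%s}" % (field, " | ".join(escape_tag(v) for v in values))


def build_ad_query(ids, locations, tags, boost_field="is_from_dcard", boost_weight=100):
    """Ads matched by id, OR ads matched by location AND tags,
    with an optional, weighted clause on `boost_field`."""
    by_id = tag_clause("id", ids)
    # Optional clause (~): it does not filter results, it only affects scoring.
    boost = "~(%s) => { $weight: %d; }" % (tag_clause(boost_field, ["1"]), boost_weight)
    by_label = " ".join(
        [tag_clause("location", locations), tag_clause("tags", tags), boost]
    )
    return "(%s) | (%s)" % (by_id, by_label)


query = build_ad_query(["foo", "bar"], ["TW"], ["EC", "beauty shopping"])
```

Note that the space in “beauty shopping” is escaped, since an unescaped space inside a tag value would otherwise be parsed as a token boundary.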

explained query

Extension

Extensions allow users to build custom scorers or query expanders freely. They must be written in C, compiled into dynamic libraries, and loaded at run-time. We wrote a custom scorer as an extension to fill in scoring functions we were missing, such as the random score that Elasticsearch had provided. The custom library was compiled during the image build and loaded when a container was initialised.

Availability and Scalability

To serve our ads in real time, we have Airflow scheduled to periodically update users’ feature profiles and to synchronise the latest ad statuses. At the same time, the ads server needs to run 24/7 and should not be disturbed by these constant data writes. With only one instance available, we would have a single point of failure, with disastrous consequences if that node went down.

Both Redis Sentinel and Redis Cluster are well-supported options for scaling Redis nodes. We favoured the former because it is relatively easy to set up and needs fewer resources to meet its minimum requirements. To illustrate, a Redis Cluster needs at least 3 master nodes, and the usual recommendation is a six-node cluster with 3 masters and 3 replicas (one per master node). It also distributes data across multiple shards, while we only have a relatively small dataset at the moment. We will not revisit this design until the data grows to the point where a basic master-replicas setup is no longer an appropriate solution.

With Redis Sentinel, writes can only be performed on the master node, so we had to expose Redis Sentinel as a service to make the current master easy to identify. In the current setup, three sentinel nodes run in the cluster for node discovery and failover. This gives us the flexibility to adjust the number of Redisearch nodes according to request traffic, and ensures that service availability is not jeopardised when nodes fail.
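
One way to see why three sentinel nodes is a sensible minimum is to write down the two conditions Redis Sentinel needs for a failover: enough sentinels (the configured quorum) must agree that the master is down, and a majority of all sentinels must authorise the failover. The sketch below models only that arithmetic; the quorum value of 2 is an assumption for illustration, not a setting quoted from our configuration.

```python
def can_failover(total_sentinels: int, reachable_sentinels: int, quorum: int) -> bool:
    """Return True if the reachable sentinels can both detect a master
    failure (quorum) and authorise a failover (majority of all sentinels)."""
    majority = total_sentinels // 2 + 1
    return reachable_sentinels >= quorum and reachable_sentinels >= majority


# Three sentinels with an assumed quorum of 2.
all_up = can_failover(total_sentinels=3, reachable_sentinels=3, quorum=2)
one_down = can_failover(total_sentinels=3, reachable_sentinels=2, quorum=2)
two_down = can_failover(total_sentinels=3, reachable_sentinels=1, quorum=2)

# With only two sentinels, losing one already blocks failover,
# because a majority of two is still two.
two_sentinels_one_down = can_failover(total_sentinels=2, reachable_sentinels=1, quorum=1)
```

Under these assumptions, three sentinels tolerate the loss of one sentinel, whereas a two-sentinel setup tolerates none.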

system architecture for ads platform

Alert and Monitoring

Metrics for Redis monitoring are widely supported by open-source tools such as Prometheus. With a Redis exporter running alongside the Redisearch nodes, our Prometheus server can pull the latest status of each node. For data visualisation and alerting, we integrated our services with Grafana, so we can easily monitor the system’s health, review the historical state of each node, and get notified on Slack in case of a critical failure.

Performance comparison

It was a long journey to build a production-ready ads server with Redisearch, and fortunately the result was far from disappointing. Comparing the benchmarks from our load testing suites, we saw a significant performance improvement with the new design. The platform served with Redisearch was almost twice as fast and handled almost double the requests under the same setup. We saw similar results in production after release: the new platform easily handles almost triple the queries at peak traffic and responds about three times faster than before. Nodes are consistently healthy, with low CPU usage under heavy load.

response time comparison, green: p95 served with Elasticsearch, pink: p95 served with Redisearch

Sounds good, but …

Redisearch did an excellent job of resolving our performance bottleneck, but it can be quite challenging in two respects: the technical difficulty of setting up scalable nodes, and its shortcomings in search features. Building and compiling a custom extension in C is inevitable if more sophisticated text analysis, filtering or scoring is required, whereas Elasticsearch may support these out of the box. Elasticsearch’s better documentation and larger community can also make a cluster easier to maintain. In-memory data storage can also be costly for large datasets. However, if deployment complexity and memory cost are not an issue, or if low search latency is a must-have, Redisearch is highly recommended.

I hope you enjoyed this article and find it useful when choosing an appropriate search engine and building a stable and scalable platform. We’re also looking for talented developers to join our team! Besides advertising, we’re now focusing on recommendation, search, and other data-driven and machine learning products to improve our user experience. Join Dcard to build a first-class social network with us!

References

https://redis.io/docs/manual/sentinel/
