Adaptive Throttling of Indexing for Improved Query Responsiveness
How to reactively control indexing rate to preserve query performance for bursty traffic
Search systems generally make use of a datastore for storing information. This datastore is responsible for providing searching capabilities and persisting all updates. At Myntra, we use a Lucene backed inverted index to provide search capability. Solr and Redis clusters are the data stores which power Myntra’s search and product pages respectively. These data stores are under constant read load (serving products to users) and write load (indexing updates to products/documents).
The data stores are queried by different microservices to retrieve documents matching a user query. Also, the index is created and continuously updated by the indexing pipeline. Our indexing pipeline consists of Kafka Queues linked to Apache Storm topologies, which make frequent updates to the documents.
One of the challenges with an index backed system is to scale query performance during indexing. As with real time systems like Search, users start to drop off when response times increase.
The current setup makes use of Solr in cloud mode. This setup suffers from poor read performance during indexing due to costly segment merges and frequent Solr searcher drops.Also, different kinds of updates have different delay tolerance. For instance, inventory and discount updates have high throughput and require stronger consistency guarantees, whereas, catalog updates can tolerate some delays. Hence, we try to control the periods of indexing manually during sale events which is inflexible. A typical complaint is to allow the pricing teams to react faster to changing user, traffic and buying patterns.
The main idea is to have a bias towards read performance as compared to write performance. We set out to develop a throttling system which will reactively throttle the rate of indexing, basis health of the data stores w.r.t. read performance. Also, different kinds of updates will be throttled at different rates.
- The basic idea is to gather read performance metrics from data stores, calculate the permitted rate of updates and then enforce that rate in the indexing system.
- The new system incorporates a throttling engine which polls the data stores periodically to get the latest response time or other similar metrics. In our instance the throttling engine queries Solr to get median response times for the cluster.
- Once we have the read performance metrics, we calculate the permitted rate of updates a given datastore can process without adversely affecting its read performance. This rate is pushed to a permitted rate store at regular intervals.
- The Kafka Spout implementation is enhanced to enable throttling. The various spout instances read the permitted rate from permitted rate store, and regulates the emission of tuples to maintain this rate.
- The permitted rate is allocated on the basis of amount of pending messages and priority for different kinds of updates.
- This ensures that the read performance is not degraded during heavy indexing by delaying the processing of updates. In extreme cases like prolonged periods of very low permitted rate manual intervention is desired. Thus, alerts are setup on indexing rates and response times.
The calculation of permitted rate of updates using the aggregated metrics is an interesting problem. A test setup is used to mimic business as usual(BAU) and high revenue days(HRD) traffic to experiment with various algorithms.
The test setup consists of a Solr machine with an Apache Storm topology continuously writing updates to it. A vertical load test service is used along with a large file of sample user queries to mimic the read traffic. We analyse the performance of querying and indexing for the following algorithms.
1. In-house Limit Algorithm
The In-house Limit Algorithm works on basic mathematical functions which we tuned to suit the current production setup.
- Acceptable average and maximum response times are determined for the system using historical data.
- The current response time is used, along with average and maximum values for the system, to calculate a load factor between 0 and 1.
- This load factor is then converted into a percentage of tuples that should be emitted.
2. AIMD Limit
AIMD Limit algorithm stands for Additive Increase Multiplicative Decrease Limit Algorithm. It calculates a limit by doing an additive increment (
new_limit = prev_limit+1) as long as the response times are under a threshold. Also, it does a multiplicative decrement when the response times exceed the threshold (
new_limit = prve_limit*backoff_ratio where back-off ratio lies between 0.5 and 1.0).
3. Gradient2 Limit
Gradinet2 limit algorithm calculates average response time over a short time window and a long time window. A divergence in these averages is used as an indicator of change in read load on the system. At this point the algorithm aggressively reduces the permitted rate of updates.
4. TCP Vegas Limit
TCP Vegas Limit algorithm is developed from TCP congestion avoidance algorithm that emphasises packet delay, rather than packet loss. The main principle is to estimate a bottleneck queue using change in response times, and continually update the limit to keep the bottleneck queue in control.
It makes use of current response times, response time when system is under no load and limit calculated at previous time step to calculate a bottleneck queue.
This queue size is used along with number of inflight updates, number of failures and other parameters to adjust the limit continuously.
At every time step, the previous limit is adjusted linearly (incremented or decremented by log of current value). This algorithm is highly reactive, as it adjusts the limit at every time step.
All these algorithms are tested to find how they perform for our use case.
Performance Test Results
We share below the performance of various algorithms in the test setup for HRD traffic.
AIMD and Gradient2 both performed poorly in the test, the response times were higher than other algorithms. Also the test for these algorithms had to be stopped early as the rate of indexing was very slow.
TCP Vegas takes less time to process all updates as well as maintains a similar and sometimes lower response times.
On deeper analysis, TCP Vegas performs better than in-house algorithm as it calculates the rate of tuples as compared to percentage of tuples. Also, it reacts to changes in response times more proactively than AIMD Limit, which only reduces the limit when a threshold is crossed. Gradient2 Limit performed worse than TCP Vegas as it is very aggressive in reducing the limit.
Hence, TCP Vegas is selected as the limit calculation algorithm.
The adaptive throttling system is live for Search systems of both Myntra and Jabong. It has reduced the manual monitoring effort on sale days and on-call support during bursty indexing periods like discount expiry. It has also helped us to prioritise inventory and discount updates over other updates.
The new system is able to process 15 GB of messages in an hour under high-revenue-day read load (QPS of ~5k).The performance gains means an improved experience on sale days as customers can see more relevant prices. This is possible due to better targeting via frequent refresh of price. This resulted in an uptick in revenue and an improvement in the conversion funnel. We also see fewer drop-offs on checkout while keeping leakages under control.
We are working towards improving as well as incorporating the adaptive throttling system across various systems at Myntra.