Query Laning in Apache Druid

Kyle Hoondert
May 17, 2022

Introduction

With Apache Druid growing to become the database for modern analytics applications, the kinds of queries developers choose Druid for have grown as well. We can sometimes become too successful for our own good!

Apache Druid often powers mission-critical applications alongside exploratory and reporting workloads. These workloads can have very different quality-of-service expectations, and one workload often needs to be protected from the others. Query laning was created for exactly this use case.

Query laning was introduced in Druid 0.18.0 as a way to protect the Broker service from resource exhaustion and to give critical workloads a configurable quality of service (QoS).

Understanding query laning starts with understanding Druid’s fan-out query architecture.

The Broker process is the entry and exit point for all queries: it receives each query, fans it out to the data processes, and merges the results. Each query is handled by a Jetty HTTP server thread, so the number of configured threads (druid.server.http.numThreads) determines how many queries the Broker can run concurrently. By default, queries that arrive once all threads are busy are queued. The Broker can instead be configured to reject excess queries, which then receive an HTTP 429 (Too Many Requests) response.

Enabling Broker thread protection

The Jetty thread pool has an associated queue (druid.server.http.queueSize), which is unbounded by default and therefore limited only by OS resources. Setting druid.server.http.enableRequestLimit=true or setting druid.query.scheduler.numThreads on the Broker disables Jetty queuing. Once the configured number of threads is in use, the Broker rejects additional queries with an HTTP 429 response.
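To make the difference between queuing and rejecting concrete, here is a toy Python model of a Broker's query slots. It is a minimal sketch under simplifying assumptions, not Druid's implementation: the ToyBroker class and its reject_when_full flag are hypothetical, with reject_when_full=True standing in for enabling druid.server.http.enableRequestLimit or druid.query.scheduler.numThreads.

```python
from collections import deque


class ToyBroker:
    """Hypothetical toy model of Broker query slots (not Druid's actual code)."""

    def __init__(self, num_threads: int, reject_when_full: bool) -> None:
        self.num_threads = num_threads            # stand-in for the thread limit
        self.reject_when_full = reject_when_full  # True ~ request limits enabled
        self.running: set = set()
        self.queue: deque = deque()               # models the unbounded Jetty queue

    def submit(self, query_id: str) -> str:
        if len(self.running) < self.num_threads:
            self.running.add(query_id)
            return "running"
        if self.reject_when_full:
            return "rejected: HTTP 429"
        self.queue.append(query_id)
        return "queued"

    def finish(self, query_id: str) -> None:
        # No pre-emption: a slot frees up only when a running query completes.
        self.running.remove(query_id)
        if self.queue:
            self.running.add(self.queue.popleft())


queueing = ToyBroker(num_threads=2, reject_when_full=False)
protected = ToyBroker(num_threads=2, reject_when_full=True)
print([queueing.submit(q) for q in ("q1", "q2", "q3")])   # ['running', 'running', 'queued']
print([protected.submit(q) for q in ("q1", "q2", "q3")])  # ['running', 'running', 'rejected: HTTP 429']
```

In this model finish() is the only way a slot frees up: a queued query starts only when a running one completes, and a rejected client has to retry on its own.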

This protection can be extended by dividing the available threads into separately limited groups called lanes. Each lane can have its own limit, giving different workloads access to different amounts of resources. With laning, the Broker examines each query and classifies it in order to assign it to a lane.

One important note about query laning: it does not provide pre-emption. Once a query has been accepted, it cannot be interrupted to make room for other queries.

Options for laning

Query laning is configured with druid.query.scheduler.laning.strategy, which accepts three options: none, manual and hilo.

None

The none strategy (the default) does not define any lanes. It suits situations where Broker thread protection is enabled but separate limits for different query workloads are not required.

An example configuration with no laning strategy (Broker runtime.properties):

druid.server.http.numThreads=45
druid.query.scheduler.numThreads=40
druid.server.http.enableRequestLimit=true

Manual

The manual laning strategy can define one or more lanes, and is best suited to cases where external applications can decide the appropriate lane themselves. A query is assigned to a lane with the lane query context parameter. For example:

{
  "query": "SELECT * FROM wikipedia WHERE channel = '#en.wikipedia'",
  "context": {
    "lane": "speedyLane"
  }
}
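A client application might attach the lane when it sends SQL to the Broker. The sketch below uses only the Python standard library; the build_sql_request helper is hypothetical, and the Broker address (localhost on the default Broker port 8082) is an assumption to adjust for your cluster. /druid/v2/sql/ is Druid's SQL HTTP endpoint, which accepts the query and context fields shown above.

```python
import json
import urllib.request
from typing import Optional

# Assumption: a Broker reachable on localhost at the default Broker port.
BROKER_SQL_URL = "http://localhost:8082/druid/v2/sql/"


def build_sql_request(sql: str, lane: Optional[str] = None,
                      url: str = BROKER_SQL_URL) -> urllib.request.Request:
    """Hypothetical helper: build a POST to Druid's SQL API, optionally tagged with a lane."""
    body = {"query": sql}
    if lane is not None:
        body["context"] = {"lane": lane}  # lane chosen by the calling application
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_sql_request(
    "SELECT * FROM wikipedia WHERE channel = '#en.wikipedia'",
    lane="speedyLane",
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# Sending it requires a running Broker:
#   with urllib.request.urlopen(req) as resp:
#       print(resp.read())
# A rejected query surfaces as urllib.error.HTTPError with code 429.
```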

This laning setup is configured with druid.query.scheduler.laning.strategy=manual plus one druid.query.scheduler.laning.lanes.{name} property per lane, which sets the maximum number of queries that can run concurrently in the lane called {name}.

An example of a three-lane configuration with small, medium and large lanes:

druid.server.http.numThreads=45
druid.query.scheduler.numThreads=40
druid.server.http.enableRequestLimit=true
druid.query.scheduler.laning.strategy=manual
druid.query.scheduler.laning.lanes.large=20
druid.query.scheduler.laning.lanes.medium=15
druid.query.scheduler.laning.lanes.small=5
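To see how these limits interact, here is a minimal Python sketch of manual-lane admission. It assumes each lane value caps the queries running in that lane within the overall druid.query.scheduler.numThreads limit, and that a query without a lane counts only against the overall limit. ManualLaneScheduler is a hypothetical class, not Druid's code.

```python
from typing import Dict, Optional


class ManualLaneScheduler:
    """Hypothetical toy model: an overall cap plus per-lane caps (upper bounds, not reservations)."""

    def __init__(self, total: int, lanes: Dict[str, int]) -> None:
        self.total = total                  # ~ druid.query.scheduler.numThreads
        self.lanes = lanes                  # ~ druid.query.scheduler.laning.lanes.{name}
        self.running_total = 0
        self.running_per_lane = {name: 0 for name in lanes}

    def try_acquire(self, lane: Optional[str]) -> bool:
        """Admit a query if capacity allows; False models an HTTP 429 rejection."""
        if self.running_total >= self.total:
            return False                    # overall limit reached
        if lane is not None:                # unknown lane names raise KeyError in this sketch
            if self.running_per_lane[lane] >= self.lanes[lane]:
                return False                # this lane is full
            self.running_per_lane[lane] += 1
        self.running_total += 1
        return True

    def release(self, lane: Optional[str]) -> None:
        # Called when a running query finishes; there is no pre-emption.
        self.running_total -= 1
        if lane is not None:
            self.running_per_lane[lane] -= 1


sched = ManualLaneScheduler(total=40, lanes={"large": 20, "medium": 15, "small": 5})
print([sched.try_acquire("small") for _ in range(6)])  # [True, True, True, True, True, False]
print(sched.try_acquire("large"))                      # True: other lanes are unaffected
```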

High/Low

The hilo laning strategy assigns lanes based on query priority. It automatically places queries with a priority below zero into a low lane. The druid.query.scheduler.laning.maxLowPercent parameter caps the share of threads that low-lane queries can use; this is an upper limit rather than a reservation, so low-priority queries are not guaranteed that capacity. Query priority can be set either with the priority context parameter or with an automated prioritization strategy, discussed in the next article. An example of a query with a manually set low priority:

{
  "query": "SELECT * FROM wikipedia",
  "context": {
    "priority": -10
  }
}

An example of a hilo laning configuration might be:

druid.server.http.numThreads=45
druid.query.scheduler.numThreads=40
druid.server.http.enableRequestLimit=true
druid.query.scheduler.laning.strategy=hilo
druid.query.scheduler.laning.maxLowPercent=20
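The size of the low lane follows from simple arithmetic. The sketch below assumes, as the Druid configuration reference describes, that maxLowPercent is applied to the smaller of druid.server.http.numThreads and druid.query.scheduler.numThreads and rounded up; hilo_low_lane_limit is a hypothetical helper name.

```python
def hilo_low_lane_limit(http_num_threads: int, scheduler_num_threads: int,
                        max_low_percent: int) -> int:
    """Hypothetical helper: max concurrent low-priority queries under hilo laning."""
    if not 1 <= max_low_percent <= 100:
        raise ValueError("maxLowPercent must be an integer from 1 to 100")
    total = min(http_num_threads, scheduler_num_threads)
    # Integer ceiling of total * max_low_percent / 100, avoiding float rounding.
    return (total * max_low_percent + 99) // 100


print(hilo_low_lane_limit(45, 40, 20))  # 8 for the configuration above
print(hilo_low_lane_limit(10, 10, 20))  # 2 with 10 scheduler threads (HTTP threads assumed >= 10)
```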

Some examples of query behaviour using hilo laning, with 10 scheduler threads and maxLowPercent=20:

Example 1:

Although the total number of queries (8) does not exceed the scheduler threads (10), one query is rejected because it would exceed the capacity of the low lane (10 threads × 20% maxLowPercent = 2).

Example 2:

In this example, the number of queries (11) exceeds the scheduler threads (10), so one query is rejected, as expected. The rejected query comes from the low lane, since high-priority queries take precedence.
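Both examples can be reproduced with a small admission model. The exact mix of high- and low-priority queries in each example is not stated, so the workloads below are hypothetical mixes matching the stated counts (8 and 11 queries). simulate_hilo is a toy sketch, not Druid's scheduler: it assumes every query arrives before any finishes, and it never evicts an admitted query, mirroring the lack of pre-emption.

```python
from typing import List, Tuple


def simulate_hilo(priorities: List[int], total: int,
                  max_low_percent: int) -> Tuple[List[int], List[int]]:
    """Hypothetical toy model of hilo admission; returns (accepted, rejected) query indexes."""
    low_limit = (total * max_low_percent + 99) // 100  # rounded-up share of the total
    running = running_low = 0
    accepted, rejected = [], []
    for i, priority in enumerate(priorities):
        is_low = priority < 0  # hilo: priority below zero goes to the low lane
        if running >= total or (is_low and running_low >= low_limit):
            rejected.append(i)  # the Broker would answer HTTP 429
            continue
        running += 1
        if is_low:
            running_low += 1
        accepted.append(i)
    return accepted, rejected


HIGH, LOW = 0, -10
# Example 1 (hypothetical mix): 5 high + 3 low = 8 queries.
print(simulate_hilo([HIGH, LOW, HIGH, LOW, HIGH, LOW, HIGH, HIGH], total=10, max_low_percent=20))
# ([0, 1, 2, 3, 4, 6, 7], [5])
# Example 2 (hypothetical mix): 3 low arriving first, then 8 high = 11 queries.
print(simulate_hilo([LOW] * 3 + [HIGH] * 8, total=10, max_low_percent=20))
# ([0, 1, 3, 4, 5, 6, 7, 8, 9, 10], [2])
```

With this mix the rejected query is a low-lane one regardless of arrival order, because the low-lane cap of 2 leaves room for all 8 high-priority queries within the total of 10.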
