Apache Druid: Query Level Monitoring via Request Logging

Utkarsh Chaturvedi
PlaySimple Engineering
5 min read · Aug 30, 2024

Motivation

Individual query-level monitoring is an integral requirement of any database system. A system that allows logging and monitoring of individual queries is useful for:

  1. Identifying Slow Queries: By monitoring queries, you can pinpoint slow or inefficient queries that might be degrading the performance of the database.
  2. Bottleneck Identification: Monitoring can reveal bottlenecks in the database, such as lock contention or deadlocks caused by certain queries.
  3. Growth Prediction: By analyzing query patterns and their resource usage, you can predict future growth and plan for capacity upgrades.
  4. Query Optimisation: By monitoring, you can identify and optimize frequently run queries, improving overall response times and user experience.

Quick summary

To set up individual query-level monitoring, configure request logging (in this post, via an emitter) and set up an external service to consume the request logs it produces.

Prerequisite knowledge

  • Druid does not provide query-level monitoring directly, but it does give you the tools to set it up using request logging. All Apache Druid services that can serve queries can also log the query requests they process. Request logs contain information on query metrics, including execution time and memory usage. You can use the information in the request logs to monitor query performance, determine bottlenecks, and analyze and improve slow queries.

Learn more about Request Logging here: Request logging | Apache® Druid

  • Emitters are objects that emit metrics, alerts and request logs when configured. You can read more about emitters here.
  • For this example we will use the emitter style of request-level logging. Our Druid deployment is via Helm on a K8s cluster.
  • All Druid emissions follow this structure:
[
  {
    "feed": "metrics",
    "timestamp": "2022-01-06T20:32:06.628Z",
    "service": "druid/broker",
    "host": "localhost:8082",
    "version": "2022.01.0-iap-SNAPSHOT",
    "metric": "sqlQuery/bytes",
    "value": 9351,
    "dataSource": "[wikipedia]",
    "id": "56e8317b-31cc-443d-b109-47f51b21d4c3",
    "nativeQueryIds": "[2b9cbced-11fc-4d78-a58c-c42863dff3c8]",
    "remoteAddress": "127.0.0.1",
    "success": "true"
  },
  {
    "feed": "myRequestLogFeed",
    "timestamp": "2022-01-06T20:32:06.585Z",
    "remoteAddr": "127.0.0.1",
    "service": "druid/broker",
    "sqlQueryContext": {
      "useApproximateCountDistinct": false,
      "sqlQueryId": "56e8317b-31cc-443d-b109-47f51b21d4c3",
      "useApproximateTopN": false,
      "useCache": false,
      "sqlOuterLimit": 101,
      "populateCache": false,
      "nativeQueryIds": "[2b9cbced-11fc-4d78-a58c-c42863dff3c8]"
    },
    "queryStats": {
      "sqlQuery/time": 43,
      "sqlQuery/planningTimeMs": 5,
      "sqlQuery/bytes": 9351,
      "success": true,
      "context": {
        "useApproximateCountDistinct": false,
        "sqlQueryId": "56e8317b-31cc-443d-b109-47f51b21d4c3",
        "useApproximateTopN": false,
        "useCache": false,
        "sqlOuterLimit": 101,
        "populateCache": false,
        "nativeQueryIds": "[2b9cbced-11fc-4d78-a58c-c42863dff3c8]"
      },
      "identity": "allowAll"
    },
    "query": null,
    "host": "localhost:8082",
    "sql": "SELECT * FROM wikipedia WHERE cityName = 'Buenos Aires'"
  },
  {
    "feed": "myRequestLogFeed",
    "timestamp": "2022-01-06T20:32:07.652Z",
    "remoteAddr": "",
    "service": "druid/broker",
    "sqlQueryContext": {},
    "queryStats": {
      "query/time": 16,
      "query/bytes": -1,
      "success": true,
      "identity": "allowAll"
    },
    "query": {
      "queryType": "scan",
      "dataSource": {
        "type": "table",
        "name": "wikipedia"
      },
      "intervals": {
        "type": "intervals",
        "intervals": [
          "-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"
        ]
      },
      "virtualColumns": [
        {
          "type": "expression",
          "name": "v0",
          "expression": "'Buenos Aires'",
          "outputType": "STRING"
        }
      ],
      "resultFormat": "compactedList",
      "batchSize": 20480,
      "limit": 101,
      "filter": {
        "type": "selector",
        "dimension": "cityName",
        "value": "Buenos Aires",
        "extractionFn": null
      },
      "columns": [
        "__time", "added", "channel", "comment", "commentLength",
        "countryIsoCode", "countryName", "deleted", "delta", "deltaBucket",
        "diffUrl", "flags", "isAnonymous", "isMinor", "isNew", "isRobot",
        "isUnpatrolled", "metroCode", "namespace", "page", "regionIsoCode",
        "regionName", "user", "v0"
      ],
      "legacy": false,
      "context": {
        "populateCache": false,
        "queryId": "62e3d373-6e50-41b4-873b-1e56347c2950",
        "sqlOuterLimit": 101,
        "sqlQueryId": "cbb3d519-aee9-4566-8920-dbbeab6269f5",
        "useApproximateCountDistinct": false,
        "useApproximateTopN": false,
        "useCache": false
      },
      "descending": false,
      "granularity": {
        "type": "all"
      }
    },
    "host": "localhost:8082",
    "sql": null
  },
  ...
]

As can be seen above, every emission has a key called “feed”. Metric emissions have the feed value “metrics”, and when you use an emitter for request logging you must also set druid.request.logging.feed to the feed name used for request log emissions (in the sample above, myRequestLogFeed).
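A downstream consumer of these emissions can use the feed value to tell metric events apart from request log events. Below is a minimal, illustrative sketch (the feed name myRequestLogFeed simply matches the sample payload above; use whatever feed you configure):

import json

def split_by_feed(payload: bytes, request_log_feed: str = "myRequestLogFeed"):
    """Split one emitted batch into metric events and request log events."""
    events = json.loads(payload)  # emitters send a JSON array of events
    metrics = [e for e in events if e.get("feed") == "metrics"]
    request_logs = [e for e in events if e.get("feed") == request_log_feed]
    return metrics, request_logs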

Steps to implement query-level monitoring:

  1. Enable request level logging for Druid services in your values file:
    Add the following lines to your values.yaml:

druid_request_logging_type: emitter
druid_request_logging_feed: requestLog   # set this to whatever feed name you want for request logs

By adding these configurations to your services, you will start receiving request-level logs for those services (the keys map to the Druid runtime properties druid.request.logging.type and druid.request.logging.feed). For individual query monitoring, enable request-level logs for the Broker service, since it is the point that emits the request log for the entire query rather than for its subqueries.

2. Set up your Druid emitter:
Druid will now emit request logs in whichever way the emitter is configured. For this example we will use the HTTP emitter for ease of use.

druid_emitter_http_recipientBaseUrl: <URL for your custom server to receive the query monitoring>
druid_emitter_http_flushMillis: "30000"
druid_emitter: http
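With this configuration, the Broker buffers emitted events and POSTs them to the recipient URL as a JSON array roughly every 30 seconds (the flushMillis interval), so the receiving server should expect batched payloads rather than one request per query.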

3. Set up your query monitoring server:
In this step we define the server that receives the request log emissions. You can implement this however you like; the only thing to keep in mind is that your Druid deployment must be able to reach its URL (for instance if you are using private networks).
We will deploy it as a simple Deployment in the same namespace, with a Service to reach the pod. Example files below:

# Server.py
from http.server import BaseHTTPRequestHandler, HTTPServer
import json

class RequestHandler(BaseHTTPRequestHandler):
    def _set_response(self):
        self.send_response(200)
        self.send_header('Content-type', 'text/html')
        self.end_headers()

    def do_GET(self):
        print("Received GET request")
        self._set_response()
        self.wfile.write("GET request received".encode('utf-8'))

    def do_POST(self):
        content_length = int(self.headers['Content-Length'])
        post_data = self.rfile.read(content_length)
        print(f"Received POST request with data: {post_data.decode('utf-8')}")
        self._set_response()
        self.wfile.write("POST request received".encode('utf-8'))

def run(server_class=HTTPServer, handler_class=RequestHandler, port=8080):
    server_address = ('', port)
    httpd = server_class(server_address, handler_class)
    print(f'Starting server on port {port}')
    httpd.serve_forever()

if __name__ == '__main__':
    run()
# Dockerfile
# Use the official Python base image
FROM python:3.9-slim
# Set the working directory in the container
WORKDIR /app
# Copy the current directory contents into the container at /app
COPY . /app
# Expose port 8080
EXPOSE 8080
# Run the server
CMD ["python", "server.py"]
# k8s Deployment file
apiVersion: apps/v1
kind: Deployment
metadata:
  name: python-server-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: python-server
  template:
    metadata:
      labels:
        app: python-server
    spec:
      containers:
        - name: python-server
          image: your-dockerhub-username/python-server:latest
          ports:
            - containerPort: 8080

# k8s service file
apiVersion: v1
kind: Service
metadata:
  name: python-server-service
spec:
  selector:
    app: python-server
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
  type: LoadBalancer
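In practice you will want do_POST to do more than print the raw payload. The sketch below is an illustrative extension (not part of the files above): it parses the emitted JSON array, keeps only request log events, and prints slow SQL queries, using the field names from the sample emission shown earlier. The feed name and the slowness threshold are assumptions to adapt to your setup.

# slow_queries.py -- illustrative extension of the handler above
import json

REQUEST_LOG_FEED = "requestLog"  # must match druid_request_logging_feed
SLOW_QUERY_MS = 1000             # example threshold for a "slow" query

def handle_emission(post_data: bytes):
    events = json.loads(post_data)
    for event in events:
        if event.get("feed") != REQUEST_LOG_FEED:
            continue  # skip metric events and other feeds
        stats = event.get("queryStats", {})
        took_ms = stats.get("sqlQuery/time")
        if took_ms is None:
            continue  # native (sub)query entries report query/time instead
        if took_ms >= SLOW_QUERY_MS:
            print(f"slow query ({took_ms} ms): {event.get('sql')}")

Calling handle_emission(post_data) from do_POST wires this into the server; from there the parsed events can be forwarded to whatever store or dashboard you use.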

What if your Druid Setup is already using an emitter?

  • It is quite possible that your Druid setup is already using an emitter such as the statsd or Prometheus emitter.
  • To keep that working while adding request logging, we need metrics to keep flowing through the preconfigured emitter while request logs are emitted via the HTTP (or logger) emitter.
  • This is where the switching emitter comes in: it initializes multiple emitter modules and routes each event to an emitter based on the event's feed.
  • By specifying the request logging feed with druid.request.logging.feed and using the switching emitter, we can send request logs to an HTTP endpoint by mapping the requestLog feed to the http emitter, as shown below.
# Configure metrics and anything else to go to statsd, and the requestLog feed to go to http
druid_emitter: switching
druid_emitter_switching_emitters: '{"metrics":["statsd"], "requestLog":["http"]}'
druid_emitter_switching_defaultEmitters: '["statsd"]'
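With this configuration, events on the metrics feed go to the statsd emitter, events on the requestLog feed go to the HTTP emitter configured in step 2, and events on any other feed fall back to the default statsd emitter.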

Any caveats?

Because Druid itself issues SQL queries internally against its own data, it was not possible to separate Druid's internal SQL queries from SQL queries issued by actual clients. Hence, when viewing the query request logs, client-issued SQL queries are not readily identifiable.

Conclusion

The above setup allows the query monitoring server you defined to receive request log emissions from Druid. You can then inspect the payload and forward it to an external sink. We have tried to focus on explaining the building blocks while keeping the example implementation flexible for individual needs. We hope this helps teams who are self-hosting open-source Druid.
