snorql — diagnosing databases like a boss

Lokesh Devnani
engineering-udaan
Published in
7 min readMar 2, 2022

Imagine playing with your furry friends on a fine evening after a long productive day at work.

Suddenly your phones rings, you pick up your phone thinking...

“This must be Swiggy, I cannot wait for those burgers to be here!!”

But the world had other plans for you. You receive a couple of slack notifications followed by a call from the escalation team.

  • The site has been loading forever for a large percentage of users
  • Latencies have shot up
  • Timeouts going through the roof and

Now everyone is looking at you to do some wizardry!!

You have a magic toolbox of SQL DMV queries scattered around in your consoles. Struggling to figure out which query is going to tell you the reason for the slowness while you are staring at the computer screen.

The room starts feeling like a bomb diffuse site, with everyone’s eyes on you. Every second feels like an hour as queries are piling up, timeouts are increasing.

With your toolkit — you keep trying to look for the right queries, orchestrating runs, correlating data looking at resource health metrics, logs, active queries, locks, transactions, blocking queries & trying to make sense.

Eventually, it turns out that it was some DDL index operation hogging the CPU & blocking current transactions

You promised that once this headache is resolved, you will clean your SQL cabinet, make investments for improving observability & build systems that make your life easier. Just like 100 other times, you had reminded yourself the same — but this time it is getting to your nerves.

Fuelling Motivation

snorql took off as a project in udaan, when a few of us were just fed up with the countless number of scratch files and notes with nothing but SQL queries.

Each one is slightly different from the other. And yet, all of those SQL queries were written (in most cases, copied 😅) for a specific kind of monitoring / debugging operation on the SQL server.

“Problems are nothing but wake-up calls for creativity” — Gerhard Gschwandtner

At every debugging session due to SQL performance degradation, we realized that we were exchanging queries from our personal repository and then orchestrating some set of queries to identify certain patterns.

If this is your life — I understand your pain

Solution #1

Our first solution was to create a notion page having a collection of all the commonly used SQL queries used for performance monitoring, diagnostics, gotchas, etc.

This certainly did help with the knowledge transfer and making sure everyone has access to commonly used queries, but news flash!! It had some problems -

  • As the number of queries grew, it became increasingly confusing to look for the right query.
  • Requires spawning a console and copy-pasting a bunch of queries from doc to console for each debugging session.
  • The result sets from the queries are not always perfect for quick analysis and often require extensive eyeballing of lots of data. This can be made simpler by post-processing the result set in a high-level programming language like Java.
  • No easy way to provide key actionables/recommendations based on the query output.

Solution #2

snorql — The SQL Diagnostician we all deserve!

Given all the limitations of maintaining a central repository of queries, we decided to build a dashboard through which we can monitor common metrics on demand.

We started with Active Running Queries and then kept adding more.

SQL Diagnostics & Monitoring Dashboard at udaan built using snorql

We could immediately see some patterns that could neatly fit into well-defined interfaces. We started abstracting out metric inputs, outputs, configs, connections & even execution contexts and separating it from concrete implementations of the same.

These abstract classes and interfaces served as a blueprint for all the queries that we added to the dashboard. With that in place, adding a new metric was as easy as writing an implementation of the interface, and you have another executable metric. This led to the inception of snorql.

Inception of snorql

What is snorql anyways?

snorql is an open-source project developed at Udaan aimed at monitoring & diagnosing database-related problems using SQL metrics.

It is an extensible & easy to use framework through which you can achieve all your monitoring and diagnostic goals regardless of what your database stack looks like. It provides a solid skeleton for laying down your SQL server metrics collection & provides flexibility in terms of choosing the database, database library, connection, metrics, configuration & storage engine. With this foundation, you can choose from a set of metrics that are already present or rollout a new implementation of metric that you think is useful.

We sub-divided it into two sub-modules:

1. snorql-framework

This contains a set of interfaces, abstract classes, and the glue logic which serves as the foundation of any metric that needs to be added. This is independent of any specific SQL database, driver, or flavor — and can be used as an underlying metrics framework for any relational database due to its extensive nature. This does not come with any metrics as we wanted to keep it lean without introducing any specific dependencies.

Now, we never want to be in a state where queries are being piled up blocked by certain long-running sessions & having no clue of what is happening to the database. So, we implemented BlockedQueriesMetric using snorql-framework.

The framework gave us a structured approach towards building a metric encapsulating all the details starting from preprocessing, executing, post-processing to deriving inferences all packed together to return an insightful & actionable result.

This was an easy-breezy task as the framework provides the outline for implementing any metric, see here. We added necessary queries & post-processing logic as follows and Voila!

sql-monitoring-conf.json for blocked query metric
Blocked query metric implementation

With blocked queries metric as part of our metric repository, we added a UI on top of it. Never again did we struggle to figure out the blocking queries hierarchy.

Blocked queries visualization powered by snorql

System degradations due to blocking queries went down, and we got a box of doughnuts for this contribution from the developers.

2. snorql-extensions

While you can build your own metrics, we have spent some time packaging some of it we found really useful into a module snorql-extensions.

It is built on top of snorql-framework and contains implementations for commonly used metrics for debugging/monitoring purposes specifically for SQL server.

While SQL server is one of the databases that just works out of the box — querying system views for meta-information or debugging use cases can be a grueling task. Thus sophisticated queries are required for specific use cases.

This extensions library uses highly customized DMV queries and stored procedures to query data from SQL server, uses post-processing for filtering/cleaning the data, and adds recommendations on top of it all to make the insights actionable.

Some of the prominent metrics included are -

Performance Metrics

  • Active Queries
  • Blocked Queries
  • Long-running Queries
  • IndexStats
  • ActiveDDL

Storage Metrics

  • Database Size
  • Database Tables Size
  • Database Index
  • Database Growth
  • Table Size
  • Table Unused Index
  • Persistence Version Store

The Aftermath:

After we built our own savior “snorql”, developers were empowered enough to diagnose SQL issues early on that stopped causing havoc. Awareness to follow best practices increased and we could once again go back to playing with our pets carefree.

We realized that the problem we were able to address using snorql is common across organizations, and decided to open source these modules so that more folks can benefit from our learnings and contributions.

Future of snorql

Our vision is to make snorql the only solution you’ll ever need for monitoring & diagnosis of any relational database. We are consistently researching & analyzing for potential capabilities, features, or ideas that can help developers have better visibility of their SQL database.

We are in a process of shipping a few key capabilities in the upcoming months like:

  • Historical metrics

By laying down a persistence framework for the storage of metrics, we want to extend querying capabilities by storing the result set of a query output periodically and allowing users to revisit past data. This will help in postmortems, triaging issues & finding anomalies.

  • Alerting

Capturing metrics is a ladder on top of which we can add proactive alerting rules. We are currently working on an alerting framework through which alerting rules can be configured based on which certain alerts will be triggered.

If you have ideas, feature requests, or suggestions that can help with this vision — please feel free to raise an issue on GitHub with the appropriate label.

Also, we love community contributions and would encourage folks to contribute in any way possible, you can find the contributing guidelines on our Github repo.

--

--