Using TimescaleDB + Prometheus to monitor and troubleshoot CockroachDB
If you’ve read any of my older posts, you’ll know that I used to work for CockroachDB. I recently moved to join Timescale, an open-source time-series database, so if it’s not obvious, I have a soft spot for open-source databases.
Today, I wanted to share how you can combine four open source technologies (CockroachDB, TimescaleDB, Prometheus, Grafana) to super power your monitoring stack. TimescaleDB just wrote a blog post about this, so I won’t bore you with details. As a TLDR, this stack is cool because:
- Although CockroachDB has a nifty admin UI, the database downsamples and culls those metrics over time. It’s also not optimized for time-series, although it’s great for OLTP.
- CockroachDB comes with a Prometheus metrics exporter. Why not use it? Prometheus has a legit following and a strong community, so you won’t be alone.
- Prometheus doesn’t focus on long-term storage and durability, but it does support remote storage options. Rather than throwing away older data, you can store it in remote storage.
- TimescaleDB has a Prometheus adapter you can use so that writes and reads appear as if they are going to Prometheus, but are persisted to TimescaleDB.
- You can also directly query TimescaleDB using SQL if you want. This means that you don’t have to just store metrics scraped from CockroachDB. You can store other metrics as well and cross-correlate them to analyze your infrastructure (or whatever else you are monitoring).
- Added bonus. Grafana can talk to either Prometheus or TimescaleDB to query metrics. Yay, open source!
Using this stack means that the insights you can gain from your metrics can extend beyond whatever storage period you set for Prometheus metrics. You can go back in time to see how usage has changed month over month for the last year. You can correlate CockroachDB metrics with other infrastructure metrics to analyze why service outages occurred. The sky is unlimited, as long as you have the data stored!
Ok, obviously, I’m not saying that using this stack is the only way you can store long-term Prometheus metrics, but it is an interesting one. I’ll leave the bike-shedding for a later (and un-scheduled date).
Alright, so how does this actually work? Let’s take a look at a diagram.
So, want to get started? There are some nuances, and although I’m not out to write a tutorial, here are some high level steps and resources to get you started.
Quick, rough, not-detailed tutorial
I actually spun up most of these services on separate machines just to prevent resource contention if I eventually scale up the workload.
- Get a server. Install CockroachDB. I cheated and just spun up one node.
- Now, you want to set Prometheus up so that it scrapes from CockroachDB’s metrics exporter (/_status/vars). Follow the steps here.
- At this point, you can decide to either follow the official TimescaleDB docs using Docker or run the tools you need yourself. I was using a bunch of machines that didn’t have Docker installed, so I went with a more hands-on approach, detailed in the following steps:
- Install TimescaleDB. You need to have PostgreSQL 9.6+. I found these instructions helpful.
Pro tip: if you don’t know where your postgresql.conf file is, you can run `SHOW config_file;` in psql.
- Now you need the pg_prometheus extension. Put it on the same machine as your TimescaleDB instance, since it’s another extension you need to load into PostgreSQL.
- You can choose to spin up another server for your Prometheus to TimescaleDB adapter. Since I used multiple machines, I did have to create some flags so that the adapter would write to my TimescaleDB instance. You can view the flags available to you in the command line when you run the adapter.
- Make sure you configure Prometheus to use your TimescaleDB instance as remote storage.
- Finally, Grafana. I set up Grafana to talk to Prometheus, since CockroachDB already offers templates that you can use. Regardless, Prometheus knows to query TimescaleDB through the adapter to serve up data to Grafana. You can also set up Grafana to query TimescaleDB directly. P.S. You can use psql to interact with TimescaleDB directly.
That’s it from me! I know that wasn’t super detailed, but I tried to highlight the top concepts and things you need to do to get started. Please comment and let me know if you have issues. Getting the networking and ports set up can be tricky.
It’s amazing what you can achieve by combining open source software. Cheers!