Highly Available Time-Series Database on Kubernetes

Rudolf Ratusiński
Aug 19 · 4 min read

Introduction

Event-sourced applications, revisions, and metrics can generate enormous amounts of data. Many teams store this data in a relational database, but as soon as the tables grow significantly, performance issues arise. This is where time-series databases shine.

If you’re asking yourself “What’s that? Do I need it?” I recommend reading this fantastic blog post to better understand this topic:
https://blog.timescale.com/what-the-heck-is-time-series-data-and-why-do-i-need-a-time-series-database-dcf3b1b18563/.

After reviewing many options, I decided to go with TimescaleDB, one of the most mature solutions, which can be deployed on your own server because it is a PostgreSQL extension. Why? Mainly because it handles time-series data entirely behind the scenes: you just install a module into PostgreSQL, run a single query to enable it, and bam! PostgreSQL is now a time-series database with exactly the same SQL syntax (plus a few bonus statements).
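To make the "behind the scenes" part a bit more concrete: after `CREATE EXTENSION IF NOT EXISTS timescaledb;`, a regular table becomes a hypertable with `SELECT create_hypertable('conditions', 'time');`. Here `conditions` and `time` are hypothetical table and column names. TimescaleDB then transparently splits the rows into time-based chunks, which are 7 days wide by default in recent versions. The Python sketch below is a simplified model of that routing, not TimescaleDB's implementation. In particular, aligning the chunks to the Unix epoch is an assumption of the model.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone

# Simplified model of time-based chunking; NOT TimescaleDB's actual code.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)  # assumed alignment point
CHUNK_INTERVAL = timedelta(days=7)  # mirrors the documented default chunk width


def chunk_range(ts: datetime, interval: timedelta = CHUNK_INTERVAL) -> tuple[datetime, datetime]:
    """Return the [start, end) time range of the chunk that would hold `ts`."""
    index = (ts - EPOCH) // interval  # timedelta // timedelta -> int (floor)
    start = EPOCH + index * interval
    return start, start + interval


def group_into_chunks(rows):
    """Group (timestamp, value) rows by the chunk their timestamp falls into."""
    chunks: dict[tuple[datetime, datetime], list] = {}
    for ts, value in rows:
        chunks.setdefault(chunk_range(ts), []).append((ts, value))
    return chunks


if __name__ == "__main__":
    utc = timezone.utc
    rows = [
        (datetime(2019, 8, 1, 12, tzinfo=utc), 21.5),
        (datetime(2019, 8, 2, 9, tzinfo=utc), 22.0),
        (datetime(2019, 8, 19, 8, tzinfo=utc), 19.8),
    ]
    for (start, end), members in sorted(group_into_chunks(rows).items()):
        print(start.date(), "->", end.date(), len(members), "row(s)")
```

Because every query and insert is routed to small, time-bounded chunks like this, the tables stay manageable even as the total data set grows. That is the performance benefit the introduction refers to.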

Cloud solution

TimescaleDB also offers a SaaS cloud solution, but like every fully managed service, it comes with a lengthy list of both pros and cons. The pros are obvious if you are a big company with plenty of money. The same is true if you don’t have enough experience managing your own clusters: then this will be your choice. However, it will drain your wallet pretty quickly when you’re just exploring or running a small start-up, mostly because of the very limited “dev” plan.

For the project I was working on, deploying the development Stolon cluster on Kubernetes saved more than 1000 USD per month. This is largely thanks to the flexibility of configuring storage, CPU, and RAM independently of each other in the Helm chart. The SaaS cloud solution, by contrast, requires you to choose one of its predefined RAM/HDD/CPU plans. This forces you to overpay for one or more resources (e.g. RAM and CPU when you only need more HDD storage).
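To illustrate why bundled plans can force you to overpay, here is a small Python sketch. It picks the cheapest predefined plan that covers a workload and compares it with paying for each resource separately. All plan sizes and prices are hypothetical placeholders, not actual Timescale Cloud or GCP pricing. The sketch only shows the shape of the calculation, not the author's real numbers.

```python
from __future__ import annotations

from dataclasses import dataclass

# HYPOTHETICAL prices for illustration only; not real cloud pricing.
PRICE_PER_VCPU = 20.0     # USD per vCPU per month (assumed)
PRICE_PER_GB_RAM = 3.0    # USD per GB of RAM per month (assumed)
PRICE_PER_GB_DISK = 0.2   # USD per GB of disk per month (assumed)


@dataclass(frozen=True)
class Resources:
    vcpu: int
    ram_gb: int
    disk_gb: int

    def covers(self, other: Resources) -> bool:
        """True if these resources are at least as large as `other` in every dimension."""
        return (self.vcpu >= other.vcpu and self.ram_gb >= other.ram_gb
                and self.disk_gb >= other.disk_gb)


# Hypothetical bundled plans: disk only grows together with CPU and RAM.
PLANS = [
    (Resources(2, 8, 100), 150.0),
    (Resources(4, 16, 500), 400.0),
    (Resources(8, 32, 1000), 800.0),
]


def cheapest_plan(need: Resources) -> float | None:
    """Monthly price of the cheapest bundled plan covering `need`, or None if none does."""
    prices = [price for res, price in PLANS if res.covers(need)]
    return min(prices) if prices else None


def a_la_carte(need: Resources) -> float:
    """Monthly price when CPU, RAM and disk are sized independently."""
    return (need.vcpu * PRICE_PER_VCPU + need.ram_gb * PRICE_PER_GB_RAM
            + need.disk_gb * PRICE_PER_GB_DISK)


if __name__ == "__main__":
    # A storage-heavy dev workload: little CPU/RAM, lots of disk.
    need = Resources(vcpu=2, ram_gb=8, disk_gb=900)
    print("bundled plan:", cheapest_plan(need))
    print("a la carte:  ", a_la_carte(need))
```

With these made-up numbers, the storage requirement alone pushes the bundled option up to the largest plan. When resources are priced independently, you pay only for the extra disk.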

It may still fit your needs perfectly, which is why I recommend checking it out. It’s a quality product with a free trial included:

https://www.timescale.com/cloud

Which cluster manager to choose?

Out of the few available solutions, including Crunchy and Patroni, I chose Stolon, a highly available PostgreSQL cluster manager for K8s, which is described as

.. a cloud native PostgreSQL manager for PostgreSQL high availability. It’s cloud native because it’ll let you keep an high available PostgreSQL inside your containers (kubernetes integration) but also on every other kind of infrastructure (cloud IaaS, old style infrastructures etc…)

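Its architecture of Proxy, Keeper, and Sentinel pods is simple, yet robust and easily manageable. Each Keeper manages a PostgreSQL instance, the Sentinels monitor the Keepers and decide which one is the master, and the Proxies route client connections to the current master.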

Stolon PostgreSQL architecture model

It’s for PostgreSQL, but what about TimescaleDB?

Out of the box, Stolon does exactly what it was designed for, but only with vanilla PostgreSQL. Since TimescaleDB is an extension, I came up with the idea of creating a customized Stolon cluster. To make it work, I created a fully functional Docker image, which I am happy to share with you now. It also comes with its Dockerfile, so you can build your own image, customize it, or just read it to better understand how it was done:

https://hub.docker.com/r/rudolfratusinski/timescaledb-stolon

The Docker image compiles the Stolon binaries on top of the TimescaleDB server, so it can then be used in a Helm chart.

The following instructions walk you through running it on your own Kubernetes cluster on Google Cloud Platform. I assume you have a basic understanding of Kubernetes, a GCP account set up, and kubectl, gcloud, and helm installed.

Preparing the Helm chart

I want to keep this example as simple as possible, but I believe that using Helm charts for every Kubernetes deployment is a must. Helm offers a clean and simple way of managing, configuring, and deploying your applications, which is why this project uses it too.

Let’s start by cloning the Helm charts repository. The most important part for us is the stolon chart: https://github.com/helm/charts/tree/master/stable/stolon

Update values.yaml with the following setup:
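As a hedged starting point, the Python sketch below generates override values for the stolon chart that point it at the TimescaleDB-enabled image. It relies on two facts. First, TimescaleDB must be listed in PostgreSQL’s `shared_preload_libraries`. Second, JSON is valid YAML, so Helm should accept the generated file via `-f`. The chart keys (`image`, `pgParameters`, `keeper.replicaCount`, `persistence.size`) are assumptions based on the stable/stolon chart, so verify each one against the values.yaml you cloned. The image tag is a placeholder.

```python
from __future__ import annotations

import json
from pathlib import Path


def stolon_timescale_values(keepers: int = 3, disk: str = "20Gi") -> dict:
    """Build override values for the stolon chart (ASSUMED key names; verify them)."""
    return {
        "image": {
            "repository": "rudolfratusinski/timescaledb-stolon",
            "tag": "latest",  # placeholder tag; pin a specific one in practice
        },
        # TimescaleDB has to be preloaded by PostgreSQL to work.
        "pgParameters": {"shared_preload_libraries": "timescaledb"},
        "keeper": {"replicaCount": keepers},
        "persistence": {"size": disk},  # disk size chosen independently of CPU/RAM
    }


def write_values(path: Path, values: dict) -> None:
    """Write the values as JSON, which Helm can read as YAML."""
    path.write_text(json.dumps(values, indent=2) + "\n")


if __name__ == "__main__":
    print(json.dumps(stolon_timescale_values(), indent=2))
```

For example, `write_values(Path("values-timescale.yaml"), stolon_timescale_values())` produces a file that could then be passed to `helm install -f values-timescale.yaml`. The file name is hypothetical.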

A few notes:

Creating K8s cluster and installing Helm chart

Now we’re ready to create the K8s cluster and deploy the Helm chart. If you want to deploy to an existing cluster, feel free to skip the steps that create the cluster and configure Tiller.

Below I provide all the instructions to get it running and test how it works:
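Below is a minimal Python sketch of that sequence for a Helm 2 setup with Tiller. By default it is a dry run that only prints the commands. The cluster name, zone, node count, release name, values file, and chart path are all hypothetical, so adjust them to your environment. The `gcloud`, `kubectl`, and `helm` subcommands used are standard ones, but this is not a reproduction of the author’s exact steps.

```python
from __future__ import annotations

import shlex
import subprocess

# HYPOTHETICAL names; replace them with your own.
CLUSTER = "timescale-cluster"
ZONE = "europe-west1-b"
RELEASE = "timescaledb"
VALUES_FILE = "values-timescale.yaml"
CHART_PATH = "./charts/stable/stolon"  # assumes helm/charts was cloned into ./charts


def deploy_steps(existing_cluster: bool = False) -> list[list[str]]:
    """Return the commands to run, skipping cluster/Tiller setup for an existing cluster."""
    steps: list[list[str]] = []
    if not existing_cluster:
        steps += [
            ["gcloud", "container", "clusters", "create", CLUSTER,
             "--zone", ZONE, "--num-nodes", "3"],
            # Tiller (Helm 2's server side) needs a service account with cluster rights.
            ["kubectl", "create", "serviceaccount", "tiller", "--namespace", "kube-system"],
            ["kubectl", "create", "clusterrolebinding", "tiller",
             "--clusterrole=cluster-admin", "--serviceaccount=kube-system:tiller"],
            ["helm", "init", "--service-account", "tiller", "--wait"],
        ]
    steps.append(["helm", "install", "--name", RELEASE, "-f", VALUES_FILE, CHART_PATH])
    return steps


def run(steps: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in steps:
        print("$", shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


if __name__ == "__main__":
    run(deploy_steps())  # dry run: only prints the commands
```

Keeping the steps as data makes it easy to review them before calling `run(deploy_steps(), dry_run=False)`. `deploy_steps(existing_cluster=True)` reduces the list to the single `helm install` command.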

After the setup is done, try deleting one of the Keeper pods. If it was the master, the Stolon Sentinels will elect a new master from the remaining Keepers. Meanwhile, Kubernetes recreates the deleted Keeper pod, which replicates all the data again and rejoins the pool of available Keepers, bringing balance back to the cluster.
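The master election can be pictured with the following simplified Python model. It is not Stolon’s sentinel code. The model assumes that the sentinel prefers the most up-to-date healthy standby and, for asynchronous replication, ignores standbys that lag further behind than a configured limit. Stolon’s cluster spec exposes such a limit as `maxStandbyLag`, and the 1 MiB value used here is assumed to match its default. The keeper names and WAL positions are made up.

```python
from __future__ import annotations

from dataclasses import dataclass

MAX_STANDBY_LAG = 1024 * 1024  # bytes; assumed to mirror Stolon's maxStandbyLag default


@dataclass
class Keeper:
    name: str
    healthy: bool
    wal_pos: int  # last reported WAL position in bytes (simplified)


def elect_new_master(standbys: list[Keeper], last_master_pos: int,
                     max_lag: int = MAX_STANDBY_LAG) -> Keeper | None:
    """Pick the most up-to-date healthy standby whose lag is within max_lag."""
    candidates = [k for k in standbys
                  if k.healthy and last_master_pos - k.wal_pos <= max_lag]
    return max(candidates, key=lambda k: k.wal_pos, default=None)


if __name__ == "__main__":
    # Hypothetical scenario: the master pod (keeper-0) has just been deleted.
    standbys = [
        Keeper("keeper-1", healthy=True, wal_pos=5_000_000),
        Keeper("keeper-2", healthy=True, wal_pos=4_990_000),
    ]
    winner = elect_new_master(standbys, last_master_pos=5_000_000)
    print("new master:", winner.name if winner else "none (cluster unavailable)")
```

The lag limit trades availability for durability. A standby that is too far behind is not promoted, because promoting it would silently drop the transactions it never received.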

Please let me know in the comments if you have any feedback or questions.

Thank you for reading and click Clap below if you liked it!
