How to deploy an HA PostgreSQL cluster on Kubernetes

Creating a highly available PostgreSQL cluster has always been a tricky task, and doing it in a cloud environment is especially difficult. I found at least three projects that try to provide HA PostgreSQL solutions for Kubernetes:


Patroni is a template for you to create your own customized, high-availability solution using Python and — for maximum accessibility — a distributed configuration store like ZooKeeper, etcd or Consul. Database engineers, DBAs, DevOps engineers, and SREs who are looking to quickly deploy HA PostgreSQL in the datacenter — or anywhere else — will hopefully find it useful.


Crunchy Container Suite provides Docker containers that enable rapid deployment of PostgreSQL, including administration and monitoring tools. Multiple styles of deploying PostgreSQL clusters are supported.


Stolon is a cloud native PostgreSQL manager for PostgreSQL high availability. It’s cloud native because it’ll let you keep an high available PostgreSQL inside your containers (kubernetes integration) but also on every other kind of infrastructure (cloud IaaS, old style infrastructures etc…)

A nice diagram in the repo and a few guest posts [1] [2] convinced me to try crunchy-containers. But after spending some time with it, I changed my mind.

I don’t want to say that something is broken or badly designed. But to me, it looks like a manually installed PostgreSQL packed into Docker containers. It doesn’t feel like it was made for the cloud.

So I went with stolon. After several rounds of installs and teardowns, I took the StatefulSet example and turned it into a Helm chart.

If you want to read more about stolon, check out the introduction post from its author.

Below, I will go through the installation process and show how to test cluster failover. I assume the cluster was installed using my Helm chart.

Stolon architecture

(taken from the stolon introduction)

Stolon is made up of 3 main components:

  • keeper: it manages a PostgreSQL instance converging to the clusterview provided by the sentinel(s).
  • sentinel: it discovers and monitors keepers and calculates the optimal clusterview.
  • proxy: the client’s access point. It enforces connections to the right PostgreSQL master and forcibly closes connections to unelected masters.

Stolon uses etcd or Consul as the main store for cluster state.
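To make the division of labour concrete, here is a toy, in-memory Python model of how the three components relate. It is a minimal sketch under my own assumptions, not stolon's code: a plain dict stands in for etcd/Consul, keepers only report a healthy flag, and the sentinel uses a deliberately simple rule (keep the current master while it is healthy, otherwise promote the first healthy keeper by name). The class names `Store`, `Sentinel` and `Proxy` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


@dataclass
class Store:
    """Stand-in for etcd/Consul: keeper health reports plus the elected master."""
    keeper_health: Dict[str, bool] = field(default_factory=dict)
    master: Optional[str] = None


class Sentinel:
    """Computes the clusterview; in this toy it only decides who the master is."""

    def __init__(self, store: Store) -> None:
        self.store = store

    def compute_clusterview(self) -> Optional[str]:
        healthy = sorted(k for k, ok in self.store.keeper_health.items() if ok)
        if self.store.master in healthy:
            return self.store.master  # healthy master: no failover needed
        # Simplified rule (assumption, not stolon's): promote the first healthy keeper.
        self.store.master = healthy[0] if healthy else None
        return self.store.master


class Proxy:
    """Client access point: new connections go only to the elected master."""

    def __init__(self, store: Store) -> None:
        self.store = store
        self.open: Dict[str, Set[int]] = {}  # keeper name -> open connection ids
        self._next_id = 0

    def connect(self) -> Tuple[str, int]:
        if self.store.master is None:
            raise ConnectionError("no elected master")
        self._next_id += 1
        self.open.setdefault(self.store.master, set()).add(self._next_id)
        return self.store.master, self._next_id

    def enforce(self) -> int:
        """Forcibly close connections to keepers that are not the elected master."""
        closed = 0
        for keeper in list(self.open):
            if keeper != self.store.master:
                closed += len(self.open.pop(keeper))
        return closed


if __name__ == "__main__":
    store = Store(keeper_health={"keeper0": True, "keeper1": True, "keeper2": True})
    sentinel, proxy = Sentinel(store), Proxy(store)
    print(sentinel.compute_clusterview())  # keeper0
    print(proxy.connect())                 # ('keeper0', 1)
    store.keeper_health["keeper0"] = False  # the master keeper dies
    print(sentinel.compute_clusterview())  # keeper1
    print(proxy.enforce())                 # 1 connection to the old master closed
```

The point of the split is visible even in the toy: keepers never decide who is master, the sentinel does, and the proxy only enforces whatever the current clusterview says.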


During install, a few things will happen.

First of all, a 3-node etcd cluster will be created using a StatefulSet. The stolon-proxy and stolon-sentinel components will also be deployed. A one-time job for cluster initialisation will wait until the etcd cluster becomes available.
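The "wait until etcd is available" step is essentially a poll-with-timeout loop. I am not reproducing the chart's actual job here; this is a generic Python sketch of that pattern, where `probe` is a hypothetical callable you would supply yourself (for example, an HTTP GET against etcd's `/health` endpoint). The clock and sleep functions are injectable so the loop can be tested without actually waiting.

```python
import time
from typing import Callable


def wait_until_available(
    probe: Callable[[], bool],
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call probe() until it returns True or timeout_s elapses.

    Returns True if the dependency became available, False on timeout.
    Exceptions raised by probe() (e.g. connection refused while the etcd
    pods are still starting) are treated as "not ready yet".
    """
    deadline = clock() + timeout_s
    while True:
        try:
            if probe():
                return True
        except Exception:
            pass  # not reachable yet; keep polling
        if clock() + interval_s > deadline:
            return False  # the next attempt would start after the deadline
        sleep(interval_s)
```

Treating probe errors as "not ready" matters here: right after install, the etcd service may exist before any member accepts connections.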

The chart will also create 2 services:

  • stolon-proxy: the service from the official example. It always points to the current master and should be used for writes.
  • stolon-keeper: stolon itself does not currently provide any solution for balancing reads, but Kubernetes services can handle it for us. This one balances reads between all keeper pods.
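The difference between the two services can be sketched as a toy model: the RW service effectively has one backend (the proxy, which forwards to the master), while the RO service spreads new connections over every ready keeper pod. Because stolon-keeper selects all keeper pods, the master also receives a share of reads. Real kube-proxy picks endpoints according to its mode (randomly in iptables mode, for example); the round robin below is my simplification, and the pod names are hypothetical.

```python
import itertools
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class Pod:
    name: str
    ready: bool = True


class RoundRobinService:
    """Toy Service: routes each new connection to the next ready endpoint."""

    def __init__(self, pods: List[Pod]) -> None:
        self.pods = pods
        self._counter: Iterator[int] = itertools.count()

    def endpoints(self) -> List[str]:
        # As with real Endpoints, only ready pods receive traffic.
        return [p.name for p in self.pods if p.ready]

    def route(self) -> str:
        eps = self.endpoints()
        if not eps:
            raise ConnectionError("service has no ready endpoints")
        return eps[next(self._counter) % len(eps)]


if __name__ == "__main__":
    keepers = [Pod("keeper-0"), Pod("keeper-1"), Pod("keeper-2")]
    ro = RoundRobinService(keepers)
    print([ro.route() for _ in range(3)])  # ['keeper-0', 'keeper-1', 'keeper-2']
    keepers[0].ready = False               # keeper-0 fails its readiness check
    print([ro.route() for _ in range(2)])  # only keeper-1 and keeper-2 now
```

The same filtering of unready pods is what keeps new read sessions away from a dead keeper during the failover test.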

Once everything is in the Running state, we can try to connect.

To simplify connecting, we can deploy the services with type NodePort. Use separate terminals to connect to the master and slave services. Throughout this post, I will assume that the stolon-proxy service (RW) is exposed on port 30543 and the stolon-keeper service (RO) on port 30544.

Connect to the master and create a test table.

Connect to the slave and check the data. You can also try to write something to make sure that requests are handled by the slave.
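To check which kind of server a port actually reaches, you can ask PostgreSQL itself: `pg_is_in_recovery()` returns true on a standby and false on the master. This sketch assumes the psycopg2 driver but accepts any DB-API connection, so it can be tested with a fake. The `NODE_IP` environment variable, user and database name are placeholders of mine; the password is expected in the standard `PGPASSWORD` variable that libpq reads. Since stolon-keeper balances across all keeper pods, including the master, repeated connections to port 30544 may report either role.

```python
import os


def server_role(conn) -> str:
    """Return "master" or "replica" for an open DB-API connection to PostgreSQL."""
    cur = conn.cursor()
    try:
        # pg_is_in_recovery() is true while the server runs as a standby.
        cur.execute("SELECT pg_is_in_recovery()")
        (in_recovery,) = cur.fetchone()
    finally:
        cur.close()
    return "replica" if in_recovery else "master"


def main() -> None:
    import psycopg2  # assumption: psycopg2 is installed

    # Placeholder host: any node IP works for a NodePort service.
    host = os.environ.get("NODE_IP", "127.0.0.1")
    for label, port in (("stolon-proxy (RW)", 30543), ("stolon-keeper (RO)", 30544)):
        conn = psycopg2.connect(host=host, port=port, user="postgres", dbname="postgres")
        try:
            print(f"{label}: {server_role(conn)}")
        finally:
            conn.close()


if __name__ == "__main__":
    main()
```

If a write on the RO port succeeds, the session most likely landed on the master keeper; running the check above will tell you which one you got.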

Once everything works, let’s test failover.

Testing failover

1. Simulate master death

This case is covered in the StatefulSet example in the official repository. In short, to simulate the death of the master, we need to delete the keeper StatefulSet (leaving its pods running) and then delete the current master pod. The StatefulSet has to be deleted because otherwise it would recreate the pod before the sentinel notices that the master is dead.
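If you prefer to script this step, here is a sketch using the official Kubernetes Python client. It deletes the keeper StatefulSet with `propagation_policy="Orphan"`, so the keeper pods keep running, and then deletes the master pod. The namespace, StatefulSet name and pod name are hypothetical placeholders (with the chart they depend on the release name), and you still have to find out which pod is the current master yourself. The API objects are passed in so the function can be tested with fakes.

```python
def simulate_master_death(apps_api, core_api, namespace: str,
                          statefulset: str, master_pod: str) -> None:
    """Orphan-delete the keeper StatefulSet, then delete the master keeper pod.

    Orphan propagation removes only the StatefulSet object: the other keeper
    pods keep running, and nothing recreates the deleted master pod before
    the sentinel notices that it is gone.
    """
    apps_api.delete_namespaced_stateful_set(
        name=statefulset, namespace=namespace, propagation_policy="Orphan")
    core_api.delete_namespaced_pod(name=master_pod, namespace=namespace)


def main() -> None:
    from kubernetes import client, config  # assumption: `pip install kubernetes`

    config.load_kube_config()  # uses your current kubectl context
    simulate_master_death(
        client.AppsV1Api(), client.CoreV1Api(),
        namespace="default",                      # hypothetical
        statefulset="my-release-stolon-keeper",   # hypothetical name
        master_pod="my-release-stolon-keeper-0",  # whichever pod is master now
    )


if __name__ == "__main__":
    main()
```

The order matters: deleting the pod first would give the StatefulSet controller a chance to bring it straight back.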

After this, we can see in the sentinel logs that a new master was successfully elected.

Now if we redo the previous command in both psql sessions we should see something like this.

The Kubernetes service removes failed pods from its endpoints and balances requests only between live pods, so new read sessions will be routed to healthy pods.

Afterwards, we need to recreate the StatefulSet. The easiest way to do this is to upgrade the deployed Helm chart.

2. Simulate random pod deaths with chaoskube

Another good way to test the resilience of the cluster is to use chaoskube. Chaoskube is a small service which periodically kills random pods in the cluster.

It can also be deployed using a Helm chart.

This will run chaoskube, which deletes one pod every five minutes. It selects pods with the label release=factual-crocodile but ignores etcd pods.
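What chaoskube does on each interval can be sketched as: list the pods, keep those matching an equality-based label selector, and delete one at random. The Python below is a toy model of that selection, not chaoskube's code. It uses the release=factual-crocodile selector from above; how the chart actually excludes etcd pods isn't shown here, so the `app!=etcd` term and the `app` label values are hypothetical. As in Kubernetes, `key!=value` also matches pods that lack the key entirely.

```python
import random
from typing import Dict, List, Optional, Tuple

Requirement = Tuple[str, str, str]  # (key, operator, value)


def parse_selector(selector: str) -> List[Requirement]:
    """Parse an equality-based selector such as "release=x,app!=etcd"."""
    reqs: List[Requirement] = []
    for term in filter(None, (t.strip() for t in selector.split(","))):
        for op in ("!=", "==", "="):  # check "!=" and "==" before plain "="
            if op in term:
                key, value = term.split(op, 1)
                reqs.append((key.strip(), "!=" if op == "!=" else "=", value.strip()))
                break
        else:
            raise ValueError(f"unsupported selector term: {term!r}")
    return reqs


def matches(labels: Dict[str, str], reqs: List[Requirement]) -> bool:
    for key, op, value in reqs:
        if op == "=" and labels.get(key) != value:
            return False
        if op == "!=" and labels.get(key) == value:
            return False  # a missing key still satisfies "!="
    return True


def pick_victim(pods: Dict[str, Dict[str, str]], selector: str,
                rng: random.Random) -> Optional[str]:
    """Return the name of one random pod matching the selector, or None."""
    reqs = parse_selector(selector)
    candidates = sorted(n for n, labels in pods.items() if matches(labels, reqs))
    return rng.choice(candidates) if candidates else None


if __name__ == "__main__":
    # Hypothetical pods; only the release label value comes from the post.
    pods = {
        "factual-crocodile-etcd-0": {"release": "factual-crocodile", "app": "etcd"},
        "factual-crocodile-stolon-keeper-0": {"release": "factual-crocodile", "app": "keeper"},
        "factual-crocodile-stolon-proxy-abc": {"release": "factual-crocodile", "app": "proxy"},
        "other-app-0": {"release": "other"},
    }
    # chaoskube would repeat this every interval (five minutes here).
    print(pick_victim(pods, "release=factual-crocodile,app!=etcd", random.Random()))
```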

After a few hours of such testing, my cluster was still consistent and running stably.


I’m still trying stolon on my staging servers, but I’m pretty happy with it so far. It really feels like a cloud native service: easily scalable and with good automatic failover.

If you are interested in trying it, check out the official repository and my chart.


Originally published at on March 6, 2017.

Software/DevOps engineer
