DBaaS in 2024: Which PostgreSQL operator for Kubernetes to select for your platform? Part 2 — Bitnami Helm Chart, DIY solutions as Baseline

David Pech
7 min read · Jun 11, 2024


Here’s the previous part of the series describing the general DBaaS approach and considerations.

Let’s start small. Let’s build something just to get a few early adopters running our shiny new platform. They are geeks, they can handle basic Postgres administration, right? Please, not again. After the first month, you’ll have 10–20 custom-made dev Postgres instances running without any governance. After 2 months, someone will say: “Hey, this Postgres thing is so cool, but we need to go to production now (or yesterday). We just need a little help with security, observability, HA, backups, …”

When to consider DIY:

  • I would advise against this except for some very niche experimentation (PG extensions, benchmarking).
  • You are doing your company a professional disservice this way. Just go straight to any “product” instead of baking your own solution.

Problems:

  • Postgres is not a “product”. Postgres is a “Linux kernel” that needs a lot of tooling and experience around it. As with the kernel, you don’t run it bare; you find a “Linux distribution” that solves a lot for you.
  • People still, for some reason, try to find the “simplest to implement” solution possible. With Postgres, that simply does not exist.

Do-It-Yourself solution (please don’t try this!)

It is so strange to me how large the demand is for any DIY template and “simple” solution. Just google around a little bit and consider this: https://www.digitalocean.com/community/tutorials/how-to-deploy-postgres-to-kubernetes-cluster

What a nice step-by-step tutorial, right? Community-driven, on DO, right? In short, you provision a single PVC (volume) and run a Deployment with 3 replicas. You provision a Service pointing at all 3 replicas to connect to your cluster. You can even see that 3 Pods are up and running after a while. All right!

(!!!) The PVC for the Deployment is shared. This means you have just attached a single volume to 3 different processes running on the same machine. Behind the scenes, Postgres has a protection mechanism: it writes a postmaster.pid file recording the process ID of the running server. Guess what: in containers, the main process has PID 1, so every Postgres instance believes the PID file refers to itself and starts anyway. You’ll soon see extremely strange errors, which really mean that the instances overwrite each other’s writes and basically corrupt the data.
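Condensed, the tutorial’s approach boils down to something like this sketch (names are illustrative, not copied from the tutorial); note the single claimName shared by all three replicas:

```yaml
# Anti-pattern sketch: one ReadWriteOnce PVC mounted by every
# replica of a Deployment. Do NOT deploy this.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres            # illustrative name
spec:
  replicas: 3               # three independent postgres processes...
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:16
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: postgres-pvc   # ...all writing to the SAME volume
```

Each container sees its own postgres as PID 1, so the postmaster.pid safeguard cannot detect the other two writers.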

The Service part is also complete nonsense: you need to reach your Primary PG instance for read-write access, and you can’t just pick a random instance out of the cluster. Some instances might be in recovery, etc., so this won’t even work as a read-only approach.
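The Service from the tutorial looks roughly like this (a sketch with illustrative names); the selector matches every pod, so a connection lands on a random instance regardless of whether it is the Primary, a standby, or a pod in recovery:

```yaml
# Sketch: a Service that load-balances across ALL postgres pods.
# Nothing here distinguishes the read-write Primary.
apiVersion: v1
kind: Service
metadata:
  name: postgres          # illustrative name
spec:
  selector:
    app: postgres         # matches Primary and replicas alike
  ports:
    - port: 5432
      targetPort: 5432
```

Operators solve exactly this by labeling the current Primary (e.g. with a role label) and maintaining separate read-write and read-only Services.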

So this approach is completely incorrect. Even if you don’t lose the PVC, you will get corrupted data files just from multiple instances writing to the same data directory.

I’ve done some experiments trying to slightly enhance this. You can check the GitHub repo, but it is still just a scratch repo of how not to do it. The basic idea was to let only one instance run on the PVC at a time and let the others serve as “standbys”, but of course this doesn’t help much.

Bitnami Helm Chart for Postgres

We use Bitnami packages all the time, right? Why don’t we just give it a go?

This is far better than the previous approach, but the moment you open the docs for the first time, you should immediately understand:

  • People use this in production. It took a lot of effort to create and maintain.
  • It’s not going to be easy to deploy unless you know exactly the target setup you want to achieve. Also, some scenarios are not easily possible, so for a larger setup, a lot of in-house automation will probably be required.
  • Cognitive load is extremely high here, given Bitnami’s approach in general: not only the application itself but also the Helm chart is highly customizable, so you basically need to fight on two very different fronts.

Biggest problems:

  • Day-2 operations — switchover is not possible with a single Primary
  • StatefulSet approach
  • Upgrades — not much support for any advanced use case
  • Backups — no traditional battle-proven solution is included, so you can either use SQL directly or script something yourself.
  • Cluster status reporting

But let’s consider this approach as a baseline and do some testing. I’ll compare the results with operator-driven approaches later on to understand the benefit.
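For the baseline below, I would deploy a replicated setup with values along these lines. The key names follow recent versions of the bitnami/postgresql chart, so always double-check them against the chart’s own values.yaml:

```yaml
# values.yaml sketch for bitnami/postgresql (verify keys against the
# chart version you actually install):
architecture: replication        # Primary + read replicas instead of standalone
auth:
  postgresPassword: "change-me"       # pin it so Helm never regenerates it
  replicationPassword: "change-me-too"
readReplicas:
  replicaCount: 2
primary:
  persistence:
    size: 10Gi
```

Installed with something like `helm install pg bitnami/postgresql -f values.yaml` (the repo alias is an assumption; adjust to how you consume Bitnami charts).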

Basic chart construction:

  • Secrets — this is a good counter-example for Helm Charts in general. Can you easily tell from the source which attribute ends up as the password? I can’t. And while working with the chart, it has happened numerous times that I simply lost the password because it was regenerated when I changed some of the Helm values.
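One way around the regenerated-password trap is to keep the credentials in a Secret you own and point the chart at it. The auth.existingSecret and auth.secretKeys keys exist in recent bitnami/postgresql chart versions, but verify them against your version; the Secret name here is hypothetical:

```yaml
# values.yaml fragment: never let Helm generate or rotate the password.
auth:
  existingSecret: my-postgres-credentials     # hypothetical, managed by you
  secretKeys:
    adminPasswordKey: postgres-password         # key inside that Secret
    replicationPasswordKey: replication-password
```

Now `helm upgrade` can change any other value without touching the credentials.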
  • Primary StatefulSet (1 Pod) — you get a separate StatefulSet just for the Primary, but it can of course run only a single instance. Except for some initdb startup scripts, there is not much special about it.
  • Replica StatefulSet — every replica provisions itself with pg_basebackup from the Primary on startup. Is this a good or a bad thing? It’s just not enough for production, I would say. It’s nice for showcasing that the replicas simply come online with the first installation, and if there is no disruption, this might work for a long time. But the replicas are otherwise not managed at all beyond basic Postgres streaming replication. This means, for example, that if you recreate the Primary from scratch, the Replicas are just stuck with the old cluster.
  • Considering the previous point, it is quite difficult to understand the cluster status holistically. Take the ArgoCD overview, for example: everything looks green, and no restarts or problems are apparent, except that beneath the surface, the replicas are completely disconnected.

So the result of the installation is, quite frankly, a working Primary and several Replicas right after the installation. From then on, you are on your own for almost all Day-2 operations. And of course, as we all know, Day-2 operations are the most expensive ones in the long term.

Default base-line for testing

Let’s consider how well Postgres as an installation can perform in this non-ideal setup.

Primary Pod restart means a 2–3 min outage.

Replica Pod restart

  • 1 replica: 1–2 min outage, (!) primary won’t take over
  • 2+ replicas: minimal downtime (reconnect only)

(Switch-over not possible — single Primary only)

PVC deleted or damaged

  • For Replicas — re-provision with pg_basebackup; this is fine unless the DB is large.
  • For the Primary — cluster DEATH and data loss; a complete cluster failure.

Is this bad or not? Let’s talk about restarts — typically, this is not sufficient for a 24/7 web application, but this might be perfectly fine if the application has minimum load during the nights (for maintenance), or for any batch application. For any operator, we should expect values below 10s.

What is alarming in this setup is the minimal automation around backup and recovery. It’s possible to set up a cron backup with pg_dumpall, but that gives you no point-in-time recovery. This is the main reason why it’s not usable unless you have a very small and rarely changing database.
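For completeness, the cron approach mentioned above can be sketched as a Kubernetes CronJob. All names here are illustrative (the chart-created Service and Secret names depend on your release name), and note what it cannot give you: incremental backups or point-in-time recovery.

```yaml
# Sketch: nightly logical dump with pg_dumpall. Illustrative names;
# no WAL archiving, so no point-in-time recovery.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: pg-dump
spec:
  schedule: "0 2 * * *"              # every night at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: dump
              image: postgres:16
              command: ["/bin/sh", "-c"]
              args:
                - pg_dumpall -h pg-postgresql -U postgres > /backup/dump-$(date +%F).sql
              env:
                - name: PGPASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: pg-postgresql        # chart-created Secret (name varies)
                      key: postgres-password
              volumeMounts:
                - name: backup
                  mountPath: /backup
          volumes:
            - name: backup
              persistentVolumeClaim:
                claimName: pg-backup-pvc         # hypothetical backup volume
```

Compare this with pgBackRest or WAL-G, which operator-based solutions typically integrate out of the box.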

Production-ready vs. Platform-ready

The Bitnami chart is a good example of the great difference between “production-ready” and “platform-ready”. Can you run a single Postgres Pod in production for years, given you have some kind of PVC backups? Oh yes, why not? If you have only a single Postgres app in the whole company, no experience, and no SLA on the app, this can certainly make sense, don’t get me wrong. This approach is also easier to maintain than adopting an operator and learning how to manage it. But when building a platform, consider running 20 clusters now and another 50 next year. That would mean building an enormous amount of tooling yourself around the chart. So it’s definitely not what I would call “platform-ready”.

Let’s discuss, if operators can do better than this. Here’s the next part discussing Zalando PG operator.


Written by David Pech

Kubernetes, ArgoCD, AWS, OCI, Postgres fan & Co-founder at Emmy https://www.linkedin.com/in/david-pech
