DBaaS in 2024: Which PostgreSQL operator for Kubernetes to select for your platform? Part 3 — Zalando Operator, Cybertec PG-operator

David Pech


Here’s the previous part of the series describing the Bitnami chart and DIY solutions.

Zalando Operator was originally created on top of the Patroni tool, which is a de facto standard for Postgres HA outside containers. Let's briefly discuss what it is and how it works.

So Patroni is a set of Python wrappers that orchestrate the Postgres lifecycle. It handles a large number of problems: provisioning a PG node, reconciling PG cluster status, switch-over/fail-over, and many more. It uses a "distributed configuration store" (DCS) as the single source of truth (who is currently the primary, etc.). In practice, you need to run a DCS such as etcd next to your regular PG cluster, and this is quite painful outside of K8s because it requires some etcd know-how. The idea is essentially the same as the Kafka project using ZooKeeper as its "DCS" before moving to KRaft and ditching ZooKeeper. Having a DCS was very innovative at the time, because other solutions (repmgr, for example) suffered from split-brain problems with 2 nodes.
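To make the moving parts concrete, a minimal Patroni member configuration outside K8s might look roughly like this (a sketch only; the host names, credentials and paths are illustrative, not from any real setup):

```yaml
scope: demo-cluster          # cluster name, shared by all members
name: pg-node-1              # unique name of this member

etcd3:
  hosts: etcd-host:2379      # the DCS holding the leader key

restapi:
  listen: 0.0.0.0:8008
  connect_address: pg-node-1:8008

postgresql:
  listen: 0.0.0.0:5432
  connect_address: pg-node-1:5432
  data_dir: /var/lib/postgresql/data
  authentication:
    superuser:
      username: postgres
      password: change-me
```

Every member runs essentially the same configuration with its own `name`, and Patroni elects the primary through the leader key stored in etcd.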

Postgres DBAs, in my experience, are typically well acquainted with Patroni; they usually like it very much and trust it. A lot of them have vast experience with it, even with quite complicated setups (multiple levels of replication) or niche error scenarios (data corruption, the pg_rewind tool). Typically, some Ansible playbook is used to bootstrap it all, but it's worth mentioning that most DBAs have some "secret ingredient" in their playbooks, or tune something differently from everyone else.

Because there are many moving parts (PG itself, the backup tooling, Patroni, etcd), the Patroni authors created the Spilo project. It's a Postgres container (a distribution, you could say) bundled with all the tools you need. And it's not just a "pile of tools": there are a lot of scripts to make everything work together, make it easily configurable, etc. Again, in the early Docker days this was a leap forward, as Postgres is notoriously hard to containerize with all the tooling around it.

So the main idea behind the Zalando Operator was to use Spilo and build a relatively "thin" layer above it: a K8s operator that starts the containers. Day to day, you typically use patronictl to control the cluster:
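The everyday patronictl commands look something like this (the cluster and member names are made up for illustration):

```shell
patronictl list                                  # members, roles, timelines, lag
patronictl switchover acid-minimal-cluster       # planned change of the primary
patronictl restart acid-minimal-cluster acid-minimal-cluster-1
patronictl reinit acid-minimal-cluster acid-minimal-cluster-1  # rebuild a broken replica
```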

It is possible to configure patronictl to connect fully remotely, but what I have typically seen is something like this:
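That is, exec-ing into a Pod and running patronictl locally. The operator labels Pods with their Spilo role, so finding the primary is easy (the pod names here are illustrative):

```shell
kubectl get pods -l spilo-role=master            # find the current primary pod
kubectl exec -it acid-minimal-cluster-0 -- patronictl list
```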

So a lot of users are back to the good old "SSH into the Pods" practice. But I think that for a seasoned DBA this is much more convenient than using the tool only remotely; they are used to inspecting a PG installation on disk and typically want to see the contents of all the config files first hand. Even though Patroni can manage this transparently, the practice still prevails. Also, if you use other operator features like backup/restore, you will need this approach for some debugging.

The construction of the Spilo container looks similar to this. I have simplified it quite a bit, especially the fact that there are multiple programs running, but I'd just like to point out that this is "old school": very similar to the days when people tried to cram multiple programs into a single Pod under supervisord.
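Roughly, the process tree inside a Spilo container looks like this (a simplified sketch based on my understanding of the image, not an exact listing; the pod name is illustrative):

```shell
# runsvdir (runit supervisor)
#  |- patroni      talks to the DCS and manages the Postgres lifecycle
#  |- postgres     the actual server, started and stopped by Patroni
#  |- cron         scheduled WAL archiving / base backups (wal-e / wal-g)
kubectl exec -it acid-minimal-cluster-0 -- ps -ef   # inspect it yourself
```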

I am not saying that this construction is wrong, but nowadays you can do much better. The main problems:

  • A single type of StatefulSet: you can't mix different Pod specs (for example, an asymmetric primary/replica setup, or different affinity rules). StatefulSets themselves also have a lot of limitations (there is an excellent talk about a Kafka operator describing this problem).
  • The Spilo container bundles a lot of software, so keeping it up to date means tracking not just PG security fixes, but also Patroni, Python, the backup tools and etcd (note: the etcd from the Spilo container is typically not used in K8s, but a security scanner might still flag its old version).
  • Inside the container, runit's runsv keeps Patroni and some other, less important tools running. So we have the traditional problem of defining liveness and readiness probes: they just can't be tied to a single program, and it's hard to understand what the Pod can report.
  • The whole PG lifecycle happens inside this single Pod. For a replica, that means taking the base backup that creates the replica, regularly serving Postgres as a replica, switching over to primary, etc. The Pod typically still signals that it's alive and ready, so from the K8s observability standpoint there is not much you can do to get insight into what is happening. You will need patronictl at some point.
  • Spilo is mostly configured via ENV variables, which I found quite confusing, as you can also use this approach in the operator CRDs, or go through some "smart abstractions" (for example for backups). At the end of the day, it is also quite difficult to debug problems: you basically need to kubectl exec and read the logs from runsv, or try to guess why a particular combination of ENV variables did not work as intended. Spilo also supports multiple backup tools and multiple object storage providers (S3, Azure, GCP), which makes this even more complicated.
  • Spilo containers use the root user by default for management. To connect to Postgres without a password, you'll need to switch with su - postgres. Modern trends prefer to drop root privileges, which is not the case here.
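The last point in practice (the pod name is illustrative): you land in the container as root and rely on peer authentication after switching users:

```shell
kubectl exec -it acid-minimal-cluster-0 -- bash   # you are root here
su - postgres                                     # peer auth now applies
psql -c 'SELECT pg_is_in_recovery();'             # false on the primary
```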

Let's now consider running the Zalando Operator in a GitOps environment. The operator itself installs nicely; it's a simple Helm chart. The basic CRDs are easy to understand and use. However, the main problem I have encountered is understanding the cluster status. For example, in ArgoCD you can't see what was generated by the CRD (the operator does not set parent-child relations between the CRD and the generated resources):
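For reference, a basic cluster manifest really is this small (adapted from the operator's minimal example; all names are illustrative):

```yaml
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: acid-minimal-cluster
spec:
  teamId: "acid"
  numberOfInstances: 2
  volume:
    size: 1Gi
  users:
    app_user: []        # roles to create in the cluster
  databases:
    app_db: app_user    # database name: owner
  postgresql:
    version: "16"
```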

Some K8s events are emitted on the CRD, which is useful:
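You can read them with the usual tooling (the object name is illustrative):

```shell
kubectl describe postgresql acid-minimal-cluster   # events shown at the bottom
kubectl get events --field-selector involvedObject.name=acid-minimal-cluster
```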

Events have a very limited lifetime, though (several hours). And the .status field of the CRD is just not enough to understand the current state:
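In my experience, the whole status boils down to a single phase field, something like this (an illustrative snippet, not copied from a real cluster):

```yaml
status:
  PostgresClusterStatus: Creating   # or Running / UpdateFailed; no further detail
```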

This, for example, is the status from my tests when the cluster was completely broken: it had basically lost the primary and got stuck in the Creating phase.

Selected Features

There is a simple web UI, but I am generally not a fan of it. Its major use case is to help someone without K8s knowledge construct the CRDs, and there are several options for displaying logs and cluster status (all based on Events etc.). From my perspective, this brings value only for onboarding; typically you will disable some options very early on, and of course at that point you would need Backstage or some other platform-engineering project instead.

Major version upgrades are possible with Spilo. One option is Spilo's in-container upgrade script, which runs pg_upgrade under the hood. This can of course be much faster than reprovisioning the cluster from a backup or via replication, but if pg_upgrade hits any problem, you will need the backup anyway. The other option is to just bump .spec.postgresql.version, which couldn't be much easier, if things go OK:
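Sketched out, the two paths look like this (object names are illustrative, and the in-container script path follows the Spilo docs, so verify it for your image version):

```shell
# Option 1: run Spilo's in-place upgrade script inside the primary pod
kubectl exec -it acid-upgrade-test-0 -- \
  python3 /scripts/inplace_upgrade.py 2   # argument = number of cluster members

# Option 2: bump the version in the manifest and let the operator drive it
kubectl patch postgresql acid-upgrade-test --type=merge \
  -p '{"spec":{"postgresql":{"version":"16"}}}'
```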

Switch-over/fail-over is definitely super mature with Patroni: how it does fencing, how it can leverage pg_rewind and possibly save you from reprovisioning the old primary. Many small but useful features have been built into Patroni over the years.

Summary

It's very important to note that the Zalando Operator is mature, well tested and has run in production for years. According to the following talk from 2021, Zalando alone used it for >1200 production PG clusters, with the biggest cluster at >12 TB.

These numbers speak for themselves. My points are mostly about the fact that this solution is not very easy to operate with K8s-native tooling, and you'll definitely need to build some tooling around patronictl yourself if you aim to build a platform service on top of it.

I currently have around a year of experience operating multiple small clusters (<30 GB DBs) with the Zalando operator, and I would say it works well with minimal maintenance. To be fair, this is only my personal small-scale production experience, and I don't find it conclusive, as there were objectively minimal disruptions from node failures etc. (and that is where the major added value of the operator should show).

Note: Cybertec Support

Just a side note about Cybertec, a large and well-established Postgres consultancy: I saw them promoting their own fork, the Cybertec PG-operator, in December 2023 at PGConf.EU. The fork exists mainly because of different backup/restore tools (Zalando prefers WAL-E/WAL-G, Cybertec likes pgBackRest), and they use the operator for their enterprise offerings. The main logic is similar, but of course we can expect the fork to diverge from the mainline. Since the Czech P2D2 conference in June 2024, there has been an update from Cybertec that they are moving to the CloudNativePG operator (except for their managed offerings). So for me personally, it's unclear whether the Zalando operator will keep a strong Postgres consultancy with an active approach behind it.

Next up is the PGO operator. (Coming soon)


Written by David Pech

Kubernetes, ArgoCD, AWS, OCI, Postgres fan & Co-founder at Emmy https://www.linkedin.com/in/david-pech
