Gracefully Scaling to 10k PostgreSQL Connections for $35/mo, Part Three
We recommend checking out Parts One and Two of this series where we discuss the PostgreSQL pooling concepts available in PgBouncer, the same design that we use to serve our clients’ planet-scale needs here at FutureTech Industries.
In Part Two of this series, we detailed the nuances and caveats of calculating an ideal PgBouncer configuration. When we’re designing planet-scale systems for our clients, we generally try to unify as much software management as possible, from the topmost layers of physical infrastructure to the bottommost layers that schedule and run software. There are a variety of options available to build a mini-stack of your own; we’ve seen and done some incredible things with tools like Salt and GCP Managed Instance Groups.
However, for the sake of simplicity, we prefer having a robust, battle-tested, ready-to-go solution from top to bottom, and for that reason, we’re big fans of Kubernetes for the majority of use cases. Alongside the more clear advantages, it offers effortless node and process scaling, health checks, configurable eviction, and a declarative and eventually consistent model. To that end, the most common questions we get go something like this:
Hi! We’ve made an investment in Kubernetes, and we’ve had a lot of success. How do we avoid rebuilding the wheel for the common parts of our infrastructure?
Working Towards The Solution
That’s a fairly wide question with a variety of factors at play and tools available, but our preference for single-cluster management is Helm, a nautically-themed package manager, capable of installing an isolated group of instances of software with your configuration in just one command.
Charts — package instances in Helm terminology — are YAML-generating Go templates which are applied to the Kubernetes cluster running Tiller by spawning a Job to create resources. Alongside the Stable and Incubator repositories that are readily available, Helm allows registering custom repositories to install software from.
We’re happy to distill all of our knowledge and experience to produce one of our first publicly available Helm charts, to provide the smoothest deployment possible of PgBouncer on Kubernetes. With the modification of a simple config file and an invocation of the
helm command, it’s possible to deploy PgBouncer pointing at a PostgreSQL database of your choosing. If you don’t already have a Kubernetes cluster, try Google Kubernetes Engine, and for a PostgreSQL database, Google Cloud SQL. We’ve used this setup with on-prem/off-prem, managed and unmanaged PostgreSQL and Kubernetes, so providers are mostly irrelevant.
To try out or deploy our chart, a Kubernetes cluster with Helm installed and a PostgreSQL database to connect to are required. Instructions for deploying Helm are here, but in summary, the procedure involves acquiring the Helm CLI binary, authenticating to the cluster in question, and running
helm init with a flag or two, depending on your setup. That’ll install Tiller, Helm’s server-side component which stores metadata for and manages/installs Helm charts. Finish by adding the FutureTech Industries charts repository to your CLI’s search path (you may have done this already for Part One of our HA Factom series; if so, you only need the
helm repo update command):
$ helm repo add fti 'https://futuretechindustriesllc.github.io/charts'
$ helm repo update
Charts read their default configuration from a file called
values.yaml in the chart itself. We can inspect our chart with
helm inspect values fti/pgbouncer and redirect its output to a file for easy editing and tweaking. For this article, we recommend basing off an empty configuration file (
pgbouncer-config.yaml for example) and setting the relevant values, because it’ll allow unspecified settings to change in the future if the chart changes its defaults. Nonetheless, it’s a great way to see what options are available to tune to your use case.
Installation instances are managed through four primary commands, which do precisely what they sound like they should:
helm list, and
helm delete. We’ll expand on deploying with them towards the end, but before we break down installation, it’s worth taking a moment to note one of Helm’s less obvious but more impactful behaviors.
IMPORTANT: In line with Kubernetes’ philosophies, Helm replaces configuration by default. There are options to
helm upgradewhich cause it to merge configurations. It’s tempting to use
--reuse-valueswhen tweaking configuration, but this is a disaster that won’t be obvious until your next chart upgrade. The day someone forgets
helm upgrade, the prod configurations that took months to perfect are lost. For something as critical as PgBouncer, this means a possible service-wide outage while cluster operators rush to rebuild the configuration.
In our opinion, it’s critical to tweak configuration by editing the
.yamlfiles, reapplying them, and storing them in a secure location that only the cluster operators can access, since it’ll store credentials to access the upstream database. For added security, it’s possible to store all of your configuration in your YAML file and pass
--setto prevent storage of sensitive credentials alongside the configuration parameters themselves. There are many other solutions like Hashicorp’s Vault and Bitnami’s SealedSecrets which can be utilized in the same way, but our chart doesn’t currently support these since they depend on non-standard extensions.
Kubernetes Isn’t Magic
Our chart provides a PgBouncer configuration that’s capable of connecting to one upstream database with one user. The application connects using that user’s credentials, and PgBouncer does its multiplexing magic. We support easy configuration of the most frequently used PgBouncer settings, the ability to inject custom configuration we don’t support into
pgbouncer.ini, and liveness and readiness probes, which allow automatic routing to healthy pods and graceful pod restart. Database credentials are stored in Secrets and configurations in ConfigMaps, following Kubernetes best practices.
It’s worth taking a moment to explain probes, as they are absolutely essential to any production deployment on Kubernetes. The benefit of liveness and readiness probes is their ability to detect problems with a Service’s backing pods, route traffic to only healthy instances, and restart pods as needed. A readiness probe detects whether a pod can serve traffic, and a liveness probe triggers a pod restart on failure. They are absolutely essential on upstream proxies and core microservices for which failures will cascade.
Our chart provides probes which run a SQL query routed through the PgBouncer instance itself, to both ensure integrity and proper configuration and prevent bad rollovers when the cluster operator changes configuration values. The configured query is run automatically by Kubernetes multiple times a minute for each replica, so it should be lightweight to avoid additional load on the database itself. Liveness probes may need to be turned off for session pooling, because the application can use all upstream connections for long periods of time, causing it to fail and consequently restarting the pod with connections going through it. Our chart doesn’t currently allow for disabling readiness probes because any deployment of PgBouncer without them is guaranteed to cause potentially user-facing application failures.
Configuring Our Chart
To proceed, start by making the decisions we outlined in Part One, using our sample if you’d like, then decide on a user, query, and database to use for the liveness and readiness probes. The chart uses
select 1; as the default query and falls back to the default username and database if none is specified, which is sufficient in most cases. The chart’s default
transaction, which is the default most applications want. Our readiness probe isn’t optional, but our liveness probe can be turned off by explicitly setting it to
null in the configuration file. It’s likely to be required if your application uses session pooling. The chart exposes all of the important settings Kubernetes itself supports, but has sane defaults for those not mentioned here.
The following is an example of a good configuration:
# The database credentials
host: 192.168.1.2# The number of instances to run.
replicasCount: 2# The maximum number of incoming client requests per instance.
maxClientConnections: 5000# The maximum number of upstream requests per instance.
poolSize: 20# Resources for the pods. For PgBouncer it's critical that these match to provide guaranteed QoS.
# The chart therefore doesn't allow setting requests and limits separately.
memory: 200Mi# The custom node selector.
# By default this is off; see below.
mylabel: myvalue# Set this to LoadBalancer to make PgBouncer available outside the cluster.
There are three important points about the above config. Two of them are simple. First is that the maximum number of upstream connections to the database is
(replicasCount + 1) * poolSize. We add one to account for rollover. Second is that we explicitly don’t allow setting both requests and limits. Kubernetes has a concept of Quality of Service to prioritize pod eviction upon resource contention. It’s important that PgBouncer be Guaranteed, which means that requests and limits must match.
The third important point is the ability to set the nodeSelector. Kubernetes provides a feature called node autoscaling, which allows it to buy and sell servers to match current demand dynamically. When Kubernetes detects that a pod can’t be scheduled on the currently available hardware, it can be configured to reach out to the underlying cloud provider to buy more. Most deployments benefit from this behavior, but the flip side causes a problem for long-running services such as proxies. When Kubernetes detects that the pods on the cluster are underutilizing resources and the cluster is configured for node autoscaling, it gracefully terminates pods on the least-utilized nodes and returns them to the cloud provider.
On the whole, scaling down is desirable and saves money, but it can interrupt proxy services such as PgBouncer with connections still going through them. This results in upstream requests failing because the connection to the database through PgBouncer gets terminated by the scale-down. To fix this, the cluster operator can create an isolated node pool which has node autoscaling disabled, label it uniquely, and set the nodeSelector to only allow PgBouncer to schedule there. In this setup, PgBouncer failures are legitimate because the pods will be unaffected by cluster scaling operations.
There are many configuration options which are far more expansive than what we’ve covered here to allow for maximum flexibility.
helm inspect values fti/pgbouncer displays the configuration base in its entirety, alongside explanatory comments.
With the configuration ready, run the following command:
$ helm install --name my-pgbouncer --namespace my-pgbouncer-namespace fti/pgbouncer -f /path/to/config.yaml
--namespace can be anything. In-cluster, PgBouncer will end up behind a load balancer at
my-pgbouncer-name.my-pgbouncer-namespace.svc.cluster.local, also accessible as
my-pgbouncer-name for applications within the same namespace. If health checks are enabled, they can be used to watch connection requests as well.
To verify the deployment, run the following and wait for the available pods to be non-zero:
$ kubectl get deploy -w my-pgbouncer-namespace
Follow that up with the following to tail logs for a random pod in real time:
$ kubectl logs -n my-pgbouncer-namespace deploy/my-pgbouncer -f
🎉 Congratulations! You can now stably handle even the worst of traffic spikes. 🎉
Though this is the model we use to design and deploy our clients’ configurations (and we’ve scaled it ridiculously big), we advise proving it works with your unique configuration before consideration in production. We’ve had the best luck on Google’s Kubernetes Engine. You can never be too careful.
Thanks to everyone who made this possible! Follow us on Medium for more articles on Kubernetes, PostgreSQL, and planet-scale deployment! There’s tons more to come.
Co-authored by Austin Hicks, Madison Steiner, and the rest of the wonderful team at FutureTech Industries.