Gracefully Scaling to 10k PostgreSQL Connections for $35/mo, Part Three

Alexander H. Black
Jul 12, 2018 · 8 min read

We recommend checking out Parts One and Two of this series where we discuss the PostgreSQL pooling concepts available in PgBouncer, the same design that we use to serve our clients’ planet-scale needs here at FutureTech Industries.

In Part Two of this series, we detailed the nuances and caveats of calculating an ideal PgBouncer configuration. When we’re designing planet-scale systems for our clients, we generally try to unify as much software management as possible, from the topmost layers of physical infrastructure to the bottommost layers that schedule and run software. There are a variety of options available to build a mini-stack of your own; we’ve seen and done some incredible things with tools like Salt and GCP Managed Instance Groups.

However, for the sake of simplicity, we prefer a robust, battle-tested, ready-to-go solution from top to bottom, and for that reason we're big fans of Kubernetes for the majority of use cases. Alongside its clearer advantages, it offers effortless node and process scaling, health checks, configurable eviction, and a declarative, eventually consistent model. To that end, the most common questions we get go something like this:

Hi! We’ve made an investment in Kubernetes, and we’ve had a lot of success. How do we avoid reinventing the wheel for the common parts of our infrastructure?

Working Towards The Solution

Charts — packages in Helm terminology — are YAML-generating Go templates which Tiller applies to the Kubernetes cluster to create resources. Alongside the readily available Stable and Incubator repositories, Helm allows registering custom repositories to install software from.

We’re happy to distill all of our knowledge and experience to produce one of our first publicly available Helm charts, to provide the smoothest deployment possible of PgBouncer on Kubernetes. With the modification of a simple config file and an invocation of the helm command, it’s possible to deploy PgBouncer pointing at a PostgreSQL database of your choosing. If you don’t already have a Kubernetes cluster, try Google Kubernetes Engine, and for a PostgreSQL database, Google Cloud SQL. We’ve used this setup with on-prem/off-prem, managed and unmanaged PostgreSQL and Kubernetes, so providers are mostly irrelevant.

Setting Up

$ helm repo add fti ''
$ helm repo update

Charts read their default configuration from a file called values.yaml inside the chart itself. We can inspect our chart with helm inspect values fti/pgbouncer and redirect its output to a file for easy reference. For this article, we recommend starting from an empty configuration file (pgbouncer-config.yaml, for example) and setting only the relevant values, because that allows unspecified settings to pick up new defaults if the chart changes them. Inspecting the full values file is still a great way to see what options are available to tune for your use case.
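If you'd like the full reference alongside your own file, a quick way to get both (the filenames here are just suggestions):

```shell
# Save the chart's full default values for reference.
helm inspect values fti/pgbouncer > pgbouncer-defaults.yaml
# Start your own config empty and add only the values you care about.
touch pgbouncer-config.yaml
```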

Installation instances are managed through four primary commands, which do precisely what they sound like they should: helm install, helm upgrade, helm list, and helm delete. We’ll expand on deploying with them towards the end, but before we break down installation, it’s worth taking a moment to note one of Helm’s less obvious but more impactful behaviors.

IMPORTANT: In line with Kubernetes’ philosophies, Helm replaces configuration by default. There are options to helm upgrade which cause it to merge configurations instead. It’s tempting to use --set combined with --reuse-values when tweaking configuration, but this is a trap that won’t spring until your next chart upgrade: the day someone forgets to pass --reuse-values to helm upgrade, the prod configuration that took months to perfect is lost. For something as critical as PgBouncer, that can mean a service-wide outage while cluster operators rush to rebuild the configuration.

In our opinion, it’s critical to tweak configuration by editing the .yaml files, reapplying them, and storing them in a secure location that only the cluster operators can access, since they’ll contain credentials for the upstream database. For added security, keep the non-sensitive configuration in your YAML file and pass credentials via --set so they aren’t stored alongside the other parameters. Solutions like HashiCorp’s Vault and Bitnami’s SealedSecrets can serve the same purpose, but our chart doesn’t currently support them since they depend on non-standard extensions.
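As a sketch of what that workflow looks like (the release and file names match the examples later in this post; the password variable is a placeholder):

```shell
# Non-sensitive settings live in the versioned YAML file; credentials are
# passed at apply time so they never land in the stored config.
helm upgrade my-pgbouncer fti/pgbouncer \
  -f /path/to/pgbouncer-config.yaml \
  --set username=myusername \
  --set password="$PGBOUNCER_PASSWORD"
```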

Kubernetes Isn’t Magic

It’s worth taking a moment to explain probes, as they are essential to any production deployment on Kubernetes. Liveness and readiness probes let Kubernetes detect problems with a Service’s backing pods, route traffic only to healthy instances, and restart pods as needed. A readiness probe detects whether a pod can serve traffic, and a liveness probe triggers a pod restart on failure. Both are essential on upstream proxies and core microservices whose failures will cascade.

Our chart provides probes which run a SQL query routed through the PgBouncer instance itself, both to ensure integrity and proper configuration and to prevent bad rollovers when the cluster operator changes configuration values. Kubernetes runs the configured query several times a minute for each replica, so it should be lightweight to avoid adding load on the database itself. Liveness probes may need to be turned off in session-pooling mode: the application can hold all upstream connections for long periods, causing the probe to fail and Kubernetes to restart the pod while connections are still going through it. Our chart doesn’t currently allow disabling readiness probes, because any deployment of PgBouncer without them is all but guaranteed to cause user-facing application failures.
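For a feel of what such a probe can look like in a pod spec, here's a rough sketch (the chart's actual manifest, query, and credential wiring may differ):

```yaml
readinessProbe:
  exec:
    # A query this cheap adds negligible load even when run several
    # times a minute per replica.
    command: ["psql", "-h", "127.0.0.1", "-p", "5432",
              "-U", "myusername", "-c", "SELECT 1;"]
  periodSeconds: 10
```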

Configuring Our Chart

The following is an example of a good configuration:

# The database credentials
username: "myusername"
password: "password123"
# The number of instances to run.
replicasCount: 2
# The maximum number of incoming client requests per instance.
maxClientConnections: 5000
# The maximum number of upstream requests per instance.
poolSize: 20
# Resources for the pods. For PgBouncer it's critical that these match to provide guaranteed QoS.
# The chart therefore doesn't allow setting requests and limits separately.
cpu: 150m
memory: 200Mi
# The custom node selector.
# By default this is off; see below.
nodeSelector:
  mylabel: myvalue
# Set this to LoadBalancer to make PgBouncer available outside the cluster.
serviceType: ClusterIP

There are three important points about the above config. Two of them are simple. First is that the maximum number of upstream connections to the database is (replicasCount + 1) * poolSize. We add one to account for rollover. Second is that we explicitly don’t allow setting requests and limits to different values. Kubernetes has a concept of Quality of Service that prioritizes pods for eviction under resource contention; it’s important that PgBouncer be in the Guaranteed class, which requires requests and limits to match.
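The upstream ceiling is worth sanity-checking whenever you change either value; with the example config above:

```shell
replicas_count=2
pool_size=20
# One extra replica accounts for rollover during a rolling update.
max_upstream=$(( (replicas_count + 1) * pool_size ))
echo "$max_upstream"  # prints 60
```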

The third important point is the ability to set the nodeSelector. Kubernetes provides a feature called node autoscaling, which lets it provision and release nodes to match current demand. When Kubernetes detects that a pod can’t be scheduled on the currently available hardware, it can be configured to reach out to the underlying cloud provider for more. Most deployments benefit from this behavior, but the flip side is a problem for long-running services such as proxies: when Kubernetes detects that the pods on the cluster are underutilizing resources and the cluster is configured for node autoscaling, it gracefully terminates pods on the least-utilized nodes and returns those nodes to the cloud provider.

On the whole, scaling down is desirable and saves money, but it can interrupt proxy services such as PgBouncer with connections still going through them. This results in upstream requests failing because the connection to the database through PgBouncer gets terminated by the scale-down. To fix this, the cluster operator can create an isolated node pool with node autoscaling disabled, label it uniquely, and set the nodeSelector so that PgBouncer schedules only there. In this setup, any PgBouncer failure is a legitimate one, since the pods are unaffected by cluster scaling operations.
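In values terms, pinning PgBouncer to such a pool is just the nodeSelector from the example config, pointed at whatever label the operator gave the dedicated pool (the label below is hypothetical):

```yaml
nodeSelector:
  pool: pgbouncer   # a non-autoscaled pool labeled by the cluster operator
```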

There are many configuration options which are far more expansive than what we’ve covered here to allow for maximum flexibility. helm inspect values fti/pgbouncer displays the configuration base in its entirety, alongside explanatory comments.

The Finale

$ helm install --name my-pgbouncer --namespace my-pgbouncer-namespace fti/pgbouncer -f /path/to/config.yaml

--name and --namespace can be anything. In-cluster, PgBouncer will end up behind a load-balanced Service, accessible by the release name (my-pgbouncer here) for applications within the same namespace. If health checks are enabled, they can be used to watch connection requests as well.

To verify the deployment, run the following and wait for the available pods to be non-zero:

$ kubectl get deploy -n my-pgbouncer-namespace -w

Follow that up with the following to tail logs for a random pod in real time:

$ kubectl logs -n my-pgbouncer-namespace deploy/my-pgbouncer -f

🎉 Congratulations! You can now stably handle even the worst of traffic spikes. 🎉

Though this is the model we use to design and deploy our clients’ configurations (and we’ve scaled it ridiculously big), we advise proving it works with your unique configuration before running it in production. We’ve had the best luck on Google Kubernetes Engine. You can never be too careful.

Thanks to everyone who made this possible! Follow us on Medium for more articles on Kubernetes, PostgreSQL, and planet-scale deployment! There’s tons more to come.

Co-authored by Austin Hicks, Madison Steiner, and the rest of the wonderful team at FutureTech Industries.

FutureTech Industries

A collection of writings by the engineers of tomorrow.

Alexander H. Black

Written by

CEO of FutureTech Industries, former Tinder engineer and Data Scientist with MIT. Always working on honing my tech and political abilities to benefit the world.
