A Practical Guide to using PgCat with AlloyDB

Harinderjit Singh
Google Cloud - Community
15 min read · Aug 27, 2024

Introduction

PgCat is a PostgreSQL connection pooler and proxy which scales PostgreSQL databases beyond a single instance.

Of course, an external connection pool like this should only be used if you cannot use an application-side connection pool, or when the external pooler offers features that nothing else does. PgCat provides several useful features that I have not seen in other poolers.

It supports replicas, load balancing, sharding, failover, and many other features needed for highly available, enterprise-grade PostgreSQL deployments such as GCP’s AlloyDB and Cloud SQL.

PgCat is free and open source, distributed under the MIT license.

Purpose

My fascination with pgcat stems from its powerful synergy with Google Cloud’s managed database services, AlloyDB and Cloud SQL. In my work, I frequently encounter the challenges of optimizing performance and ensuring high availability in these environments. pgcat’s unique capabilities make it a perfect fit for addressing these needs.

One of the most compelling features is its ability to intelligently manage connections to an AlloyDB cluster. You can define the PostgreSQL endpoints (primary and replicas) within pgcat’s configuration. This empowers pgcat to distribute incoming transactions efficiently across these servers, routing read queries to replicas and write queries to the primary. It’s like having a built-in traffic director for your database.

Furthermore, pgcat’s automatic read failover mechanism adds a layer of resilience. If a database connection becomes unresponsive, pgcat seamlessly reconnects to the next available server in the pool. This ensures that your applications remain operational even in the face of unexpected disruptions.

In this blog post, we’ll dive deep into a hands-on experiment, demonstrating how to configure and use pgcat with AlloyDB. You’ll learn how this powerful combination can help you build a highly available, resilient, performant, and scalable database infrastructure.

Using PgCat with multi-regional AlloyDB deployment

Create an AlloyDB Cluster

read -p "region : " REGION
read -p "projectid : " PROJECT_ID
read -p "vpcnet : " DB_VPC_NET
read -p "region2 : " REGION_SEC
read -p "password : " PASSWORD

## create primary cluster
gcloud alloydb clusters create alloydb-cls-$(date +%d%m%Y) --region=${REGION} --password=$PASSWORD --network=$DB_VPC_NET --project=${PROJECT_ID}

## create primary Instance
gcloud beta alloydb instances create alloydb-ins-001 \
--cluster=alloydb-cls-$(date +%d%m%Y) --region=${REGION} \
--instance-type=PRIMARY --cpu-count=2 --availability-type=ZONAL \
--project=${PROJECT_ID} \
--database-flags=pg_stat_statements.track=all,alloydb.iam_authentication=on,alloydb.enable_auto_explain=on,log_lock_waits=on,alloydb.pg_authid_select_role=postgres,auto_explain.log_min_duration=100,auto_explain.log_nested_statements=on,auto_explain.log_buffers=on

## create user on primary Instance
gcloud alloydb users create sbuser --password=$PASSWORD --cluster=alloydb-cls-$(date +%d%m%Y) --type=BUILT_IN --region=${REGION} --project=${PROJECT_ID}

## create primary read pool Instance
gcloud beta alloydb instances create alloydb-ins-rd-001 \
--cluster=alloydb-cls-$(date +%d%m%Y) --region=${REGION} \
--availability-type=ZONAL --project=${PROJECT_ID} \
--instance-type=READ_POOL --read-pool-node-count=1 --cpu-count=2 \
--database-flags=pg_stat_statements.track=all,alloydb.iam_authentication=on,alloydb.enable_auto_explain=on,log_lock_waits=on,alloydb.pg_authid_select_role=postgres,auto_explain.log_min_duration=100,auto_explain.log_nested_statements=on,auto_explain.log_buffers=on

## create a secondary cluster in a different region
gcloud alloydb clusters create-secondary alloydb-cls-$(date +%d%m%Y)-sec --primary-cluster=projects/${PROJECT_ID}/locations/${REGION}/clusters/alloydb-cls-$(date +%d%m%Y) --region=${REGION_SEC} --project=${PROJECT_ID}

## create a secondary Instance
gcloud beta alloydb instances create-secondary alloydb-ins-001-sec --cluster=alloydb-cls-$(date +%d%m%Y)-sec --region=$REGION_SEC --availability-type=ZONAL --project=${PROJECT_ID}

## create a secondary read pool Instance
gcloud beta alloydb instances create alloydb-ins-rd-001-sec \
--cluster=alloydb-cls-$(date +%d%m%Y)-sec --region=${REGION_SEC} \
--availability-type=ZONAL --project=${PROJECT_ID} \
--instance-type=READ_POOL --read-pool-node-count=1 --cpu-count=2 \
--database-flags=pg_stat_statements.track=all,alloydb.iam_authentication=on,alloydb.enable_auto_explain=on,log_lock_waits=on,alloydb.pg_authid_select_role=postgres,auto_explain.log_min_duration=100,auto_explain.log_nested_statements=on,auto_explain.log_buffers=on

These commands will create:

  • A primary AlloyDB cluster with one primary and one readpool Instance in the us-central1 region
  • An AlloyDB user called sbuser in the primary Instance
  • A secondary AlloyDB cluster with one secondary and one readpool Instance in the us-east4 region
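Once the instances are up, you can verify their state and look up the private IPs that replace placeholders like <Primary-Private-IP> in the rest of this post. A minimal sketch using gcloud (the ipAddress format field is my assumption based on the AlloyDB instance resource):

## list instances in the primary cluster and check their state
gcloud alloydb instances list --cluster=alloydb-cls-$(date +%d%m%Y) \
--region=${REGION} --project=${PROJECT_ID}

## fetch the private IP of the primary Instance
gcloud alloydb instances describe alloydb-ins-001 \
--cluster=alloydb-cls-$(date +%d%m%Y) --region=${REGION} \
--project=${PROJECT_ID} --format="value(ipAddress)"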

Setup experiment database

  • Create a database called sbtest with owner sbuser
  • Grant all privileges on the sbtest database and the public schema to sbuser
read -p "password : " PASSWORD
export PGPASSWORD=$PASSWORD

###as postgres user and postgres database
psql -h <Primary-Private-IP> -p 5432 -U postgres postgres -w -c "ALTER USER sbuser CREATEDB;"

###as sbuser user and postgres database
psql -h <Primary-Private-IP> -p 5432 -U sbuser postgres -w -c "CREATE DATABASE sbtest OWNER sbuser;"

###as postgres user and sbtest database
psql -h <Primary-Private-IP> -p 5432 -U postgres sbtest -w <<EOF
GRANT ALL PRIVILEGES ON DATABASE sbtest TO sbuser;
GRANT ALL ON SCHEMA public TO sbuser;
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO sbuser;
EOF
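As a quick sanity check, you can confirm that sbuser can connect to the new database before loading any data:

###as sbuser user and sbtest database
psql -h <Primary-Private-IP> -p 5432 -U sbuser sbtest -w -c "\conninfo"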

Populate data

Connect directly to database to populate some data for pgbench and sysbench

export PGPASSWORD=$PASSWORD
sysbench --db-driver=pgsql --pgsql-port=5432 --pgsql-user=sbuser \
--pgsql-host=<Primary-Private-IP> \
/usr/share/sysbench/oltp_write_only.lua prepare '--tables=10' \
'--table-size=1000000' --threads=20 --time=120 --db-ps-mode=disable

pgbench -h <Primary-Private-IP> -p 5432 -U sbuser -i -s 50 sbtest

You can adjust the configuration, i.e. the --tables and --table-size options, as per your liking.
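After the prepare steps finish, you can verify that the data landed by checking approximate row counts from the standard pg_stat_user_tables view:

psql -h <Primary-Private-IP> -p 5432 -U sbuser sbtest -w \
-c "select relname, n_live_tup from pg_stat_user_tables order by relname;"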

Install and configure Pgcat

  • In this post, you will run pgcat on a GCE VM; you can also use a Managed Instance Group or a GKE deployment

For simplicity, you will create one pgcat container on GCE in us-central1. The connection pooler should ideally be in the same region as the database Instance, and applications/clients in that region will experience better performance.

  • Create a GCE VM in the same VPC network (different subnet) where the pgcat container will be deployed.
read -p "GCE subnet:" GCE_SUBNET
gcloud compute instances create instance-$(date +%d%m%Y) \
--project=${PROJECT_ID} --zone=${REGION}-a --machine-type=e2-medium \
--network-interface=network-tier=PREMIUM,stack-type=IPV4_ONLY,subnet=$GCE_SUBNET \
--provisioning-model=SPOT \
--scopes=https://www.googleapis.com/auth/cloud-platform \
--create-disk=auto-delete=yes,boot=yes,device-name=instance-$(date +%d%m%Y)-230433,image=projects/debian-cloud/global/images/debian-12-bookworm-v20240213,mode=rw,size=10,type=projects/${PROJECT_ID}/zones/${REGION}-a/diskTypes/pd-balanced \
--no-shielded-secure-boot --shielded-vtpm --shielded-integrity-monitoring \
--labels=goog-ec-src=vm_add-gcloud --reservation-affinity=any \
--preemptible --metadata=startup-script='#! /bin/bash
apt update -y
apt -y install golang unzip git'
  • SSH to the GCE VM
  • Install docker and containerd

curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh ./get-docker.sh
sudo systemctl enable --now docker.service
sudo systemctl enable --now containerd.service
echo "alias docker='sudo docker'" >> .bashrc
. .bashrc
  • Create the config file called pgcat.toml
  • All configuration keys are explained in the comments
[general]
# What IP to run on, 0.0.0.0 means accessible from everywhere.
host = "0.0.0.0"
# Port to run on; 6432 is the same default port PgBouncer uses.
port = 6432
# Connecting to that database allows running commands like `SHOW POOLS`, `SHOW DATABASES`, etc..
admin_username = "pgcat"
# Password to access the virtual administrative database
admin_password = "***********"
# Enable/disable server TLS
server_tls = true

# How long to wait before aborting a server connection (ms).
connect_timeout = 5000 # milliseconds

# How long an idle connection with a server is left open (ms).
idle_timeout = 30000 # milliseconds

# How much time to give the health check query to return with a result (ms).
healthcheck_timeout = 100 # milliseconds

# How long to keep connection available for immediate re-use, without running a healthcheck query on it
healthcheck_delay = 3000 # milliseconds

# How much time to give clients during shutdown before forcibly killing client connections (ms).
shutdown_timeout = 6000 # milliseconds

# How long to ban a server if it fails a health check (seconds).
ban_time = 5 # seconds

prepared_statements_cache_size = 500

## Primary cluster pool
[pools.sbtest_usc]

# Pool mode (see PgBouncer docs for more).
# `session` one server connection per connected client
# `transaction` one server connection per client transaction
pool_mode = "transaction"

# If Query Parser is enabled, we'll attempt to parse
# every incoming query to determine if it's a read or a write.
# If it's a read query, we'll direct it to a replica. Otherwise, if it's a write,
# we'll direct it to the primary.
query_parser_enabled = true

# If the query parser is enabled and this setting is enabled, we'll attempt to
# infer the role from the query itself.
query_parser_read_write_splitting = true

# If the query parser is enabled and this setting is enabled, the primary will be part of the pool of databases used for
# load balancing of read queries. Otherwise, the primary will only be used for write
# queries. The primary can always be explicitly selected with our custom protocol.
primary_reads_enabled = true

# If the client doesn't specify, PgCat routes traffic to this role by default.
# `any` round-robin between primary and replicas,
# `replica` round-robin between replicas only without touching the primary,
# `primary` all queries go to the primary unless otherwise specified.
default_role = "any"


[pools.sbtest_usc.users.0]
# Maximum number of server connections that can be established for this user
# The maximum number of connections from a single PgCat process to any database in the cluster
# is the sum of pool_size across all users.
pool_size = 50
username = "sbuser"
password = "********"

[pools.sbtest_usc.shards.0]
database = "sbtest"
# Array of servers in the shard, each server entry is an array of `[host, port, role]`
servers = [
["<Primary-Private-IP>", 5432, "primary"]
,["<Primary-readpool1-Private-IP>", 5432, "replica"]
]

## Secondary cluster pool
[pools.sbtest_use]
pool_mode = "transaction"
default_role = "replica"

[pools.sbtest_use.users.0]
pool_size = 50
username = "sbuser"
password = "********"

[pools.sbtest_use.shards.0]
database = "sbtest"
servers = [
["<Secondary-Private-IP>", 5432, "replica"]
,["<Secondary-readpool1-Private-IP>", 5432, "replica"]
]
  • Change the IPs and passwords as per your environment
  • In the above configuration, pool_mode is “transaction”
  • You can also tweak other configuration settings
  • Here we have configured two pools. The primary pool is “sbtest_usc”, which has a Primary Instance and a readpool Instance, and that’s why the default role is “any”.
  • The other pool is the secondary pool, which is a “replica”-only pool and has two read Instances
  • https://github.com/postgresml/pgcat/blob/main/pgcat.toml is a sample TOML file
  • https://postgresml.org/docs/open-source/pgcat/configuration shows all the configuration knobs available in pgcat. It also describes the usage of every configuration parameter.
  • Once pgcat.toml is ready, proceed to the next step.
  • Start the pgcat container on port 6432 (you can change that)
docker run -d --name pgcat -v $(pwd)/pgcat.toml:/etc/pgcat/pgcat.toml \
-p 6432:6432 ghcr.io/postgresml/pgcat:latest

docker container ls
docker ps -a
docker logs pgcat -f
docker stop pgcat
docker start pgcat
docker logs pgcat -f
  • Review the logs for any issues
  • If no errors are reported, pgcat is listening on port 6432
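If you later edit pgcat.toml, you can reload the configuration without restarting the container; RELOAD is one of pgcat’s admin commands (a minimal sketch):

psql -h 127.0.0.1 -p 6432 -U pgcat pgcat -w -c "RELOAD;"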

PgCat cli

Using psql, you can connect to PgCat with the following command:

export PGPASSWORD=$PASSWORD
psql -h 127.0.0.1 -p 6432 -U pgcat pgcat -w
## then, at the psql prompt:
show help

Note that for application connections, the database name used in the connection string is the pool name, not the actual name of the database in AlloyDB.

Here the virtual “pgcat” admin database is used to query metadata from pgcat.
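For comparison, an application connection goes through a pool name. A minimal sketch, reusing the sbuser credentials and the sbtest_usc pool defined earlier:

export PGPASSWORD=$PASSWORD
## the database name here is the pool name, not the AlloyDB database name
psql -h 127.0.0.1 -p 6432 -U sbuser sbtest_usc -w -c "select 1;"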

You can run show help; after connecting to pgcat and it will display many useful commands.

A few of them are listed below:

psql -h 127.0.0.1 -p 6432 -U pgcat pgcat -w <<EOF
show config;
show databases;
show pools;
show clients;
show servers;
show stats;
EOF

Unfortunately, the admin console doesn’t support filtering the output.
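As a workaround, you can filter the tabular output with standard shell tools, for example:

psql -h 127.0.0.1 -p 6432 -U pgcat pgcat -w -c "show pools" | grep sbtest_usc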

Experimenting with Features

Now you are ready for some hands-on experiments.

You will create some database workload using pgbench and sysbench for each experiment.

Query load balancing

PgCat seamlessly distributes AlloyDB queries across multiple replica servers. Clients connect to a single PgCat instance, which acts as a proxy for the actual AlloyDB database. Behind the scenes, PgCat manages its own pool of connections to the replicas, efficiently balancing the incoming query load.

The queries are evenly distributed to all available servers using one of the three supported load balancing strategies: random, round robin, or least active connections.

  • Random load balancing picks a replica using a random number generator. Random, on average, is the most fair strategy, and we recommend it for most workloads.
  • Round robin counts queries and sends them to replicas in order. Round robin assumes all queries have equal cost and all replicas have equal capacity to service requests.
  • Least active connections picks the replica with the least number of actively running queries. Least active connections assumes queries have different costs and replicas have different capacity, and could improve performance over round robin, by evenly spreading the load across replicas of different sizes.

Since we haven’t configured “load_balancing_mode”, PgCat will use random, which is the default mode.

sysbench

  • We can create a read workload against the “sbtest_use” pool (secondary cluster) using the command below
sysbench --db-driver=pgsql --pgsql-port=6432 --pgsql-user=sbuser \
--pgsql-host=127.0.0.1 /usr/share/sysbench/oltp_read_only.lua run \
'--tables=5' '--table-size=1000000' --threads=30 --time=120 \
--pgsql-db=sbtest_use --db-ps-mode=disable
psql -h 127.0.0.1 -p 6432 -U pgcat pgcat -w -c "\x" -c  "show stats"
The output shows queries are load balanced between Instances
  • Read queries load balance very well between both secondary cluster nodes, per the random load balancing strategy; query_parser_enabled and query_parser_read_write_splitting were not set for this pool.
  • Now, you can create a read workload against the “sbtest_usc” pool (Primary cluster) using the command below and retest.
sysbench --db-driver=pgsql --pgsql-port=6432 --pgsql-user=sbuser \
--pgsql-host=127.0.0.1 /usr/share/sysbench/oltp_read_only.lua run \
'--tables=5' '--table-size=1000000' --threads=30 --time=120 \
--pgsql-db=sbtest_usc --db-ps-mode=disable
psql -h 127.0.0.1 -p 6432 -U pgcat pgcat -w -c "\x" -c  "show stats"
The output shows queries are not load balanced between Instances
  • Read queries don’t load balance between the Primary cluster nodes; all read queries went to the Primary node. That is because query_parser_enabled = true and query_parser_read_write_splitting = true: when query parsing is enabled, SELECTs inside a BEGIN … END transaction block (which sysbench uses) are treated as write transactions by the query parser. A demonstration follows this list.
  • If the parameters query_parser_enabled and query_parser_read_write_splitting were not set, reads would be routed only to replicas unless primary_reads_enabled = true
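You can observe the transaction-block behavior directly with psql; a minimal sketch (compare show stats before and after each command):

## autocommit SELECT: eligible for load balancing to a replica
psql "hostaddr=127.0.0.1 port=6432 user=sbuser dbname=sbtest_usc" \
-c "select count(*) from public.sbtest1;"

## the same SELECT inside a transaction block: routed to the primary
psql "hostaddr=127.0.0.1 port=6432 user=sbuser dbname=sbtest_usc" <<EOF
BEGIN;
select count(*) from public.sbtest1;
COMMIT;
EOF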

Read load balancing works well with session mode as well.

pgbench

  • We can create a read workload against the “sbtest_usc” pool (Primary cluster) using the command below
export PGPASSWORD=$PASSWORD
pgbench -h 127.0.0.1 -p 6432 -U sbuser -c 10 -j 10 -t 20000 sbtest_usc -S -n
  • Read queries load balance well between both Primary cluster nodes, per the random load balancing strategy, whether query_parser_enabled and query_parser_read_write_splitting were set to true or not. That is because pgbench doesn’t use transaction blocks for its read-only queries.

We don’t really need query parsing for the secondary cluster, which is read-only unless it is promoted or a switchover occurs.

Read/write query separation

A typical application reads data much more frequently than writes it. To help scale read workloads, AlloyDB deployments add read pool nodes which can serve SELECT queries.

PgCat is able to inspect queries and determine if the query is a SELECT which, most of the time, will read data, or a write query like an INSERT or UPDATE.

If PgCat is configured with both the primary (or secondary) and corresponding read pool nodes, it will route all read queries to the read pool nodes, while making sure write queries are sent to the primary (unless reads are allowed on primary).

pool_mode = "transaction"
query_parser_enabled = true
query_parser_read_write_splitting = true
primary_reads_enabled = true

Removing read traffic from the primary by setting primary_reads_enabled = false can help scale it beyond its normal capacity, and can also help with high availability, since the primary is typically the most loaded instance in a deployment. This functionality can be seamlessly integrated into your existing applications without requiring any changes to their code.

In the case of a read-write workload, all of the workload (even reads) was routed to the Primary Instance. This is true for both sysbench and pgbench.

sysbench --db-driver=pgsql --pgsql-port=6432 --pgsql-user=sbuser \
--pgsql-host=127.0.0.1 /usr/share/sysbench/oltp_read_write.lua run \
'--tables=5' '--table-size=1000000' --threads=30 --time=120 \
--pgsql-db=sbtest_usc --db-ps-mode=disable



### OR

pgbench -h 127.0.0.1 -p 6432 -U sbuser -c 10 -j 10 -t 20000 sbtest_usc -n
psql -h 127.0.0.1 -p 6432 -U pgcat pgcat -w -c "\x" -c  "show stats"
The output shows queries are routed only to the primary Instance

We already discussed why even the reads executed by sysbench are routed to the Primary Instance instead of the readpool Instance when query_parser_enabled = true and query_parser_read_write_splitting = true.

If you run a custom workload that includes both SELECT (read) and UPDATE (write) queries (like the example below), you’ll see that the query parser efficiently separates and routes them appropriately. Writes will be directed to the Primary instance, while reads will be distributed across both the Primary and read pool nodes.

for x in {1..120}; do psql "sslmode=prefer hostaddr=127.0.0.1 port=6432 user=sbuser dbname=sbtest_usc" << EOF
select count(*) from public.sbtest1 limit 10;
update public.sbtest1 set id=floor(random() * 100000000 + 1)::int where id=$x;
EOF
sleep 1;done
The output shows read queries are routed to both primary and replica, while writes are routed only to the primary

In our last experiment, query parsing worked very well. We observed that read queries were routed to both primary and replica Instances because of primary_reads_enabled = true, while writes were routed only to the Primary Instance.

The query parser does its best to determine where the query should go, but sometimes that’s not possible.

In that case, the client can select which server it wants using this custom SQL syntax:

-- To talk to the primary for the duration of the next transaction:
SET SERVER ROLE TO 'primary';
-- To talk to the replica for the duration of the next transaction:
SET SERVER ROLE TO 'replica';

This method works only in pool_mode = “transaction”.
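A quick way to try this from psql, reusing the sbtest_usc pool:

psql "hostaddr=127.0.0.1 port=6432 user=sbuser dbname=sbtest_usc" <<EOF
SET SERVER ROLE TO 'replica';
select count(*) from public.sbtest1;
EOF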

High availability

PgCat supports health checks and failover for reads. It maintains an internal map of healthy and unavailable nodes, and makes sure queries are only routed to healthy instances.

If a read pool node fails a health check, it is banned from serving additional traffic for a configurable amount of time. This significantly reduces errors in production when instance hardware inevitably fails.

Broken read pool nodes are checked again after the traffic ban expires, and if they continue to fail, are prevented from serving queries. If a read pool node is permanently down, it’s best to remove it from the configuration to avoid any intermittent errors.

The configuration parameters relevant to read failover:

healthcheck_timeout = 100 # milliseconds
healthcheck_delay = 30000 # milliseconds
shutdown_timeout = 6000 # milliseconds
ban_time = 5 # seconds

To test read workload failover, you can create a read-write workload using the script below and, once the script is running, restart the readpool Instance every 30 seconds (a restart command sketch follows the script).

for x in {1..120}; do psql "sslmode=prefer hostaddr=127.0.0.1 port=6432 user=sbuser dbname=sbtest_usc" << EOF
select count(*) from public.sbtest1 limit $x;
select count(*) from public.sbtest3 limit $x;
update public.sbtest1 set id=floor(random() * 100000000 + 1)::int where id=$x;
EOF
sleep 1;done
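To restart the readpool Instance while the script is running, you can use the gcloud restart command (a sketch, reusing the instance names from the creation steps):

gcloud alloydb instances restart alloydb-ins-rd-001 \
--cluster=alloydb-cls-$(date +%d%m%Y) --region=${REGION} --project=${PROJECT_ID}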
psql -h 127.0.0.1 -p 6432 -U pgcat pgcat -w -c "\x" -c  "show stats"
The readpool Instance was restarted 3 times within the test period, and the errors reflect that

Only the connection attempts made at the moment of a restart, or connections already established to the readpool, resulted in user-facing errors. All new connections were routed to the healthy Instance, which in this case was the Primary Instance.

Note the gap in the “connections” metric for the readpool while it was restarted

From the stats and snapshots above, we can see that read workload failover works well.

AlloyDB is very resilient when it comes to Primary Instance high availability, as failovers are transparent.

This feature can be helpful when you need to restart a readpool Instance.

If an Instance is permanently removed, it’s better to remove it from the pool configuration as well.

Sharding

There is another very important feature that can help you scale AlloyDB beyond the maximum supported size of 64 TiB per cluster.

Sharding allows you to horizontally scale database workloads of all kinds, including writes. The data is split evenly into pieces, and each piece is placed on a different server. Query traffic is then split equally between the shards as application usage grows over time.

Since PgCat inspects every query, it’s able to extract the sharding key (typically a table column) from the query and route the query to the right shard.

Both read and write queries are supported, as long as the sharding key is specified. If that’s not the case, PgCat will execute queries against all shards in parallel, combine the results, and return them all as part of the same request.
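When the parser can’t extract the key, pgcat’s documented custom syntax lets the client pin a transaction to a shard explicitly. A minimal sketch (our example pools have a single shard, so this is illustrative only):

psql "hostaddr=127.0.0.1 port=6432 user=sbuser dbname=sbtest_usc" <<EOF
SET SHARDING KEY TO '1234';
select * from public.sbtest1 where id = 1234;
EOF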

We are not testing this feature in this post.

Key Takeaways

In this blog post, we embarked on a journey to explore the powerful synergy between pgcat and Google Cloud’s AlloyDB. Through a hands-on experiment, we witnessed firsthand how pgcat’s intelligent connection management, load balancing, and failover mechanisms can elevate the performance, scalability, and resilience of your AlloyDB deployments.

We delved into pgcat’s configuration, demonstrating how to set up connection pools. We then put pgcat to the test, observing its ability to seamlessly distribute read queries across replicas, ensuring optimal resource utilization. Additionally, we explored its automatic failover capabilities, which safeguard your applications from unexpected disruptions.

All the features we discussed above work just as well with Cloud SQL for PostgreSQL. In the case of Cloud SQL, you would use read replicas (in the same or a different region) instead of secondary clusters and readpools.

Whether you’re seeking to optimize read workloads, enhance high availability, or scale your database infrastructure beyond its limits, pgcat proves to be an invaluable asset. Its seamless integration with AlloyDB empowers you to build a robust and performant database environment that can effortlessly handle the demands of your applications.

As you continue your cloud database journey, remember that pgcat is not just a connection pooler; it’s a strategic tool that can unlock the full potential of your AlloyDB investment. Embrace its capabilities, experiment with its features, and witness the transformative impact it can have on your database infrastructure.
