Weighing the Pros and Cons of PostgreSQL

Observations after working with PostgreSQL for 35+ years and as one of the original Postgres developers

Curt Kolovson
6 min readDec 6, 2021

Curt Kolovson

A Brief History of Postgres and PostgreSQL

As one of the original developers of what is now known as “University Postgres” (I was in the right place at the right time as a PhD grad student), I have great respect and fondness for what the Postgres Project at UC Berkeley set out to accomplish in the mid- to late-1980s, under the brilliant leadership and guidance of Professor Michael R. Stonebraker. For those of you who don’t know of Mike Stonebraker, among his many distinctions is that he is a winner of the prestigious ACM Turing Award in 2014, which is often referred to as the “Nobel Prize of Computing”. Mike is also well-known as the inventor of Ingres, as well as having launched several successful startup companies, such as Ingres Corp, Illustra (acquired by Informix), Vertica (acquired by Hewlett-Packard), and VoltDB — just to name a few.

The original Postgres Project set out to advance the state of the art in database management systems in many ways. Among the chief innovations in Postgres were full extensibility through user-defined data types, operators, and access methods, a sophisticated rules system, a no-overwrite storage system which enabled historical (temporal) data queries and so-called “time-travel”, triggers, functions, and stored procedures — among other innovations. The project continued through the early 1990s. What was known as “University Postgres” was released for general use under the BSD License.

In the mid-1990s, Postgres became an open-source project and was renamed PostgreSQL to reflect its support for SQL. The project continues to be licensed under the free and open-source PostgreSQL License, and is now very widely used. Over the years, many significant enhancements have been made to the software. As of this writing, PostgreSQL version 14.1 is currently available. Each major release has added or enhanced major pieces of functionality. It remains one of the most widely used and successful open-source projects of all time. PostgreSQL is used in many commercial products, and there are several cloud-based database services that are either based on a PostgreSQL implementation, or are compatible with PostgreSQL.

Assessing the Evolution of PostgreSQL

Given it’s over 30 years of development, PostgreSQL is now a rather powerful and full-featured database system. PostgreSQL’s evolution is quite remarkable, given that its new features and enhancements have come from the open-source community. That community deserves a great deal of praise and respect.

However, as PostgreSQL has developed, it reflects the fact that it is the result of an open-source technology project, as opposed to a commercial product. My main issue with what PostgreSQL has become is that it is difficult to configure and maintain in terms of providing high availability (HA) and optimal performance. This observation comes from years of experience involved in the implementation and maintenance of applications that use PostgreSQL as the database. Though many, including myself, have found ways to use PostgreSQL in mission-critical environments, the required skills and level of ingenuity are quite high. It takes a rather high degree of knowledge and sophistication on behalf of DBAs, application architects, and application developers. Adding to this complexity is the need for adding extensions and components to PostgreSQL for things that are not included in the base distribution. This “toolkit” approach is both a strength and a weakness. The strength is that there are many available extensions, some of which have liberal open-source licenses while others are commercial products. The obvious weakness is that as a result of having “too many moving parts”, many users find that using PostgreSQL in combination with some of these extensions requires additional expertise in order to set up and maintain, and each of these components has its own release cycle and need for upgrades, etc.

As a result of all this inherent complexity, PostgreSQL is generally not well-suited for use in embedded applications or so-called “appliances”, whether they are based on virtual machines (VMs) or containers running on platforms like Kubernetes. And yet, due to its popularity and widespread use (largely a result of its liberal open-source license), PostgreSQL is often being used even in these environments. Therein lies its Achilles Heel, in my opinion.

Specifically…

Here is a short list of PostgreSQL components that must be provided by extensions or third-party products:

  • Monitoring tools for uptime and performance
  • Scale-out distributed processing based on partitioning or sharding
  • Connection pooling
  • HA clustering
  • Replication management
  • Disaster recovery (DR) solutions
  • Backup solutions
  • Log archiving

In addition to these “missing” pieces of functionality, PostgreSQL has some inherent shortcomings that have still not been addressed, in spite of its 30+ years of development. Those include:

  • Software architecture based on process-per-connection can lead to inefficient resource utilization
  • Too many tunable parameters, many which are poorly documented, and some with poorly chosen default values
  • Garbage-collection implementation related to its multi-version concurrency control (MVCC) can lead to database bloat and performance problems
  • Query optimizer (“planner”) can make very bad decisions, leading to performance problems that may be hard to diagnose and fix
  • Troubleshooting is sometimes very difficult

PostgreSQL in Name Only

There are many implementations of database systems that are either based on a fork of PostgreSQL, or which claim to be compatible with the PostgreSQL ecosystem of drivers. This is particularly true of the many PostgreSQL-compatible fully-managed DBaaS services. This is not surprising, due to the phenomenal popularity of PostgreSQL and its well-known shortcomings. Many of the PostgreSQL-compatible DBaaS services do not allow users to modify the full set of PostgreSQL tunable parameters such as those contained in postgresql.conf or client authentication based on pg_hba.conf, presumably in order to ensure stability and/or conform to cloud service providers’ client authentication frameworks. Also, some cloud service providers replace the underlying storage engine of open-source PostgreSQL with their own storage-as-a-service offerings. They may also restrict or prevent the use of third-party extensions that one can use with the DBaaS offerings. The ability to tune resources available to the database, such as by making modifications to the underlying Linux platform on which the database is running, may be partially or fully restricted. Finally, because of this proliferation of PostgreSQL forks (whether cloud-based or on-premise), claims of being PostgreSQL-compatible should be taken with a large grain of salt, as there is often a gap between what functionality is actually supported, and a lag between the current version of a given PostgreSQL-compatible product and the latest PostgreSQL open-source version.

For example:

  • AWS Aurora and YugabyteDB claim to be fully compatible with PostgreSQL, their implementation uses the upper levels of PostgreSQL source code, and they have replaced the lower levels (storage and transaction layers). Their focus is on scalability and availability .In some cases, they have modified the upper level PostgreSQL code to work better with their lower level code. That means that whenever there is a new release of PostgreSQL (major or minor version), they have to incorporate the new PostgreSQL source code into their code, without breaking anything.
  • CockroachDB claims to be compatible with PostgreSQL, but they are re-implementing it entirely, so full support is a work-in-progress. Their focus is on supporting geographically distributed data.
  • AWS RDS claims to be compatible with PostgreSQL, but has some restrictions in terms of not exposing all PostgreSQL tunable parameters and not being able to use certain extensions. Also, tweaks to the underlying OS are limited or not possible. From a DBA perspective, the tuning “knobs” are either missing or different. Their focus is on providing a managed PostgreSQL cloud service.

What is interesting about all these database systems is that they are all trying to maintain some degree of PostgreSQL compatibility while adding value through new capabilities. It remains to be seen if these “better-than-PostgreSQL” variants can support demanding, real-world applications — or else have difficulty being as good as PostgreSQL from a performance and stability perspective.

Summary

PostgreSQL is a novel and powerful database system. It has undergone over 30 years of development by a community of devoted and talented software engineers. Due to its liberal open-source license, it has become widely used and has a large, dedicated following.

That said, I have a love-hate relationship with what PostgreSQL has become. Its evolution reminds me of the “Unix Wars” of the late 1980s and early 1990s. There are many flavors and variants of PostgreSQL, as well as a plethora of DBaaS services that claim to be PostgreSQL-compatible. It generally “takes a village” of PostgreSQL experts to set it up and maintain, particularly for large, complex, real-world applications. Many people are using PostgreSQL applications quite successfully, and generally these were built by highly knowledgeable application developers, or else were built and supported by consultants who understand how to implement applications as well as how to deploy and maintain PostgreSQL.

--

--

Curt Kolovson

Part of the original open source PostgreSQL database team under Professor Stonebraker at UC Berkeley. 40+ yrs with DBMSs. Currently Sr Staff Engineer at VMware.