Postgres is Incredibly Awesome

Mark Mathis
Google Cloud - Community
4 min read · Apr 14, 2017

I suspect that a large percentage of users come to Postgres for PostGIS. You get spatial data and operations wrapped up in a familiar SQL syntax, which lets a developer familiar with relational databases quickly start working with spatial data. PostGIS plays well with other popular open source tools such as GDAL (OGR), GeoServer, and QGIS, as well as commercial offerings from ESRI and Safe Software (FME), to name a few. Add the fact that the Django ORM can handle spatial data and operations, and you enable developers to do some serious geospatial processing.
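As a small taste of what that looks like in practice, a radius query in PostGIS is just SQL. The places table and coordinates below are made up for illustration:

CREATE EXTENSION IF NOT EXISTS postgis;

CREATE TABLE places (
    id   serial PRIMARY KEY,
    name text,
    geom geometry(Point, 4326)
);

-- Find places within roughly 1 km of a point; casting to geography makes the distance meters.
SELECT name
FROM places
WHERE ST_DWithin(geom::geography, ST_SetSRID(ST_MakePoint(-106.3, 35.9), 4326)::geography, 1000);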

That being said, there are other features of Postgres that should not be overlooked. In particular: crosstabs, array columns, and built-in JSON support. While Postgres is first and foremost a relational database, all of these features, in their own way, allow the designer to de-normalize the database to some degree.

Crosstabs

The tablefunc module includes various functions that return tables. In particular, the crosstab function:

Produces a “pivot table” containing row names plus N value columns, where N is determined by the row type specified in the calling query.

If that doesn’t make much sense, don’t worry; the official description is terse. Rather than expand on it here, see this blog post for a practical explanation and an excellent helper function.
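For a rough idea of the shape of a crosstab query, here is a minimal sketch. The sales table, its columns, and the output column list are all hypothetical:

CREATE EXTENSION IF NOT EXISTS tablefunc;

-- Pivot per-quarter totals into one row per region.
SELECT *
FROM crosstab(
    $$ SELECT region, quarter, total
       FROM sales
       ORDER BY 1, 2 $$
) AS pivot(region text, q1 numeric, q2 numeric, q3 numeric, q4 numeric);

The two-argument form of crosstab, which takes a second query listing the categories, is the safer choice when some rows are missing values for certain categories.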

Arrays

PostgreSQL allows columns of a table to be defined as variable-length multidimensional arrays. Arrays of any built-in or user-defined base type, enum type, or composite type can be created.

This feature allows the designer to create a column that can store a variable amount of data, something that might otherwise be handled with a many-to-many relationship through additional join tables. The array column can be indexed, so queries that look for overlapping rows can be quite fast. For example, counting all “employees” on the “test” project might look something like this:

SELECT COUNT(*) FROM employees WHERE projects && '{test}';

instead of this:

SELECT COUNT(*)
FROM employees
JOIN employees_projects ON employees.id = employees_projects.employee_id
JOIN projects ON employees_projects.project_id = projects.id
WHERE projects.name = 'test';
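For context, the employees table behind the array version might be defined something like this (the schema is illustrative); the GIN index is what lets the && overlap operator stay fast:

CREATE TABLE employees (
    id       serial PRIMARY KEY,
    name     text NOT NULL,
    projects text[] NOT NULL DEFAULT '{}'
);

CREATE INDEX employees_projects_idx ON employees USING GIN (projects);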

JSON

JSON data types are for storing JSON (JavaScript Object Notation) data, as specified in RFC 7159. Such data can also be stored as text, but the JSON data types have the advantage of enforcing that each stored value is valid according to the JSON rules. There are also assorted JSON-specific functions and operators available for data stored in these data types; see Section 9.15.

JSON columns can also be GIN indexed, which makes “existence and containment” queries quite efficient. One practical application of this is the ability to store “similar” data in the same table. That is, common fields can be extracted into regular columns (with indexes, etc.) with the remainder stored in a catch-all JSON column. In the extreme case, you could make a table with a single column containing a JSON document for each row… like a poor man’s MongoDB?
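A sketch of that “common columns plus catch-all JSON” pattern might look like the following (the table and field names are made up); a GIN index on the jsonb column covers both existence (?) and containment (@>) queries:

CREATE TABLE events (
    id         bigserial PRIMARY KEY,
    created_at timestamptz NOT NULL DEFAULT now(),
    payload    jsonb NOT NULL
);

CREATE INDEX events_payload_idx ON events USING GIN (payload);

-- Containment: events whose payload includes {"type": "login"}.
SELECT count(*) FROM events WHERE payload @> '{"type": "login"}';

-- Existence: events whose payload has a top-level "user_id" key.
SELECT count(*) FROM events WHERE payload ? 'user_id';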

One of the drawbacks to Postgres (or any relational database) is that a centralized database can quickly become a bottleneck in a distributed system. The standard approach to solving this is to set up replication. This gives you read-only replicas that can service incoming requests, as well as a failover replacement in case the master fails or becomes corrupted. Here is a nice tutorial on how to set this up on Google Cloud:
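The primary-side portion of that setup boils down to a handful of settings and a replication role. The sketch below assumes a 9.6-era cluster and glosses over pg_hba.conf and the standby’s recovery configuration, which the tutorial covers in full:

ALTER SYSTEM SET wal_level = 'replica';   -- requires a restart
ALTER SYSTEM SET max_wal_senders = 5;     -- also requires a restart
CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'change-me';
-- The standby is then cloned with pg_basebackup and pointed at the primary via primary_conninfo.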

So, I did that and it was pretty awesome… until the standby started to fall behind. This wasn’t a production system, so “no harm, no foul”. In the end I opted not to expend resources trying to get it to work. Besides getting it configured correctly, we would have needed to insert a connection pooler/load balancer to distribute reads across replicas, develop a failover and restore process, etc. We had the luxury of safely making this decision at this stage of our development, but that is not always the case. This excellent and thoughtful post-mortem from a GitLab outage earlier this year does a great job of illustrating the potential pitfalls of Postgres replication.

One of the big challenges of running a startup is as much deciding what to do as what not to do. This is true at a high level; that is, “What problem are we solving? Who are we solving it for? And how do we measure success?” Focus is important. This is true both for our mission and for our technology choices. We have to find a path from science to product without reinventing the wheel. This is one reason we were pleased to provide a case study for Google Cloud Platform’s launch of Cloud SQL for Postgres.

Cloud SQL for PostgreSQL is a fully managed database solution with flexible backups and automatic storage increases included as standard; high-availability features are “coming soon”. Awesome! Finally, in case you missed it, here’s Google PM Brett Hesterberg and my fellow Descartes Labs co-founder Tim Kelton speaking about Cloud SQL and our latest offering, “GeoVisual Search”, at Google Cloud Next ‘17:
