PostgreSQL: Data is Important

A case for PostgreSQL in a modern web architecture.

Written by Mike Lewis.

Heads up, we’ve moved! If you’d like to continue keeping up with the latest technical content from Square please visit us at our new home https://developer.squareup.com/blog

At our engineering all-hands meeting, I presented a 5-minute lightning talk on the merits of PostgreSQL. In the spirit of being able to search for data, I converted the deck into the following blog post. Enjoy!


PostgreSQL: Data is Important

But Why Postgres?

  • Solves some problems with migrations
  • Full-featured indexes
  • Robust constraints
  • Streaming replication
  • Sophisticated query planner
  • PostGIS
  • Full text search
  • Advanced SQL features

Migration Improvements

No downtime for most DDL migrations. Instantly performs:

Transactional DDL statements (not even Oracle has this)

  • wrap migrations in a SQL transaction
  • roll back everything in migration on a partial failure

Indexes


Constraints

  • CHECK constraints
  • foreign key constraints available for all tables
  • exclusion constraints (new in 9.0)

Benefits of validation in the DB

  • reduces dependencies on monolithic app
  • improves consistency guarantees
  • deeper level of safeguards
  • don’t have to worry about prefetching associations
  • just as easy to test

Streaming Replication


Sophisticated Query Planner

  • cost-based (as opposed rule-based) query optimizer
  • heuristics gathered on tables to optimize better
  • costs of random seek, seq scan, etc. configurable
  • e.g making random seeks on an SSD not a bad thing to the optimizer
  • genetic algorithms to plan complex queries.

Better EXPLAIN

MySQL sample

+----+-------------+---------------------+-------+--------------------------------------+--------------------------------------+---------+------+------+------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------------+-------+--------------------------------------+--------------------------------------+---------+------+------+------------------------------+
| 1 | PRIMARY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Select tables optimized away |
| 2 | DERIVED | bacon_cheeseburgers | range | index_bacon_cheeseburgers_on_user_id | index_bacon_cheeseburgers_on_user_id | 5 | NULL | 1 | Using where |
+----+-------------+---------------------+-------+--------------------------------------+--------------------------------------+---------+------+------+------------------------------+

Postgres Sample

QUERY PLAN                                                              
-----------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=8.31..8.32 rows=1 width=0)
-> GroupAggregate (cost=0.00..8.30 rows=1 width=12)
Filter: ((min(bacon_cheeseburgers.created_at) + '7 days'::interval) < max(bacon_cheeseburgers.created_at))
-> Index Scan using index_bacon_cheeseburgers_on_user_id on bacon_cheeseburgers (cost=0.00..8.27 rows=1 width=12)
Index Cond: ((user_id >= 69) AND (user_id <= 70))

Also… if you have a subquery, postgres won’t execute it


PostGIS (for geo queries)

  • extensible GIS system
  • R tree indexes for built-in geometric data types
  • geometric data types
  • free census data (TIGER) for geocoding (addresses→coordinates)
  • Rails integration (example from railsonpostgresql.com)
  • Restaurant.first(:conditions => [“the_geom && ?”, Polygon.from_coordinates([[[x_min, y_min],
    [x_min, y_max], …]]],
    4269)])

Full Text Search

  • full-featured text indexing
  • dictionaries (languages, stopwords, synonyms)
  • customizable search ranking algorithms
  • mature
  • lighter weight solution than solr/lucene

Advanced SQL Features

SELECT depname, empno, salary, rank() OVER (PARTITION BY depname ORDER BY salary DESC) 
FROM empsalary;

depname | empno | salary | rank
-----------+-------+--------+------
develop | 8 | 6000 | 1
develop | 11 | 5200 | 2
develop | 9 | 4500 | 3
personnel | 2 | 3900 | 1
personnel | 5 | 3500 | 2
sales | 1 | 5000 | 1
sales | 4 | 4800 | 2