PostgreSQL: Data is Important

A case for PostgreSQL in a modern web architecture.

Written by Mike Lewis.

At our engineering all-hands meeting, I presented a 5-minute lightning talk on the merits of PostgreSQL. In the spirit of being able to search for data, I converted the deck into the following blog post. Enjoy!


PostgreSQL: Data is Important

But Why Postgres?

  • Solves some problems with migrations
  • Full-featured indexes
  • Robust constraints
  • Streaming replication
  • Sophisticated query planner
  • PostGIS
  • Full text search
  • Advanced SQL features

Migration Improvements

No downtime for most DDL migrations. Instantly performs:

Transactional DDL statements (not even Oracle has this)

  • wrap migrations in a SQL transaction
  • roll back everything in migration on a partial failure

Indexes


Constraints

  • CHECK constraints
  • foreign key constraints available for all tables
  • exclusion constraints (new in 9.0)

Benefits of validation in the DB

  • reduces dependencies on monolithic app
  • improves consistency guarantees
  • deeper level of safeguards
  • don’t have to worry about prefetching associations
  • just as easy to test

Streaming Replication


Sophisticated Query Planner

  • cost-based (as opposed rule-based) query optimizer
  • heuristics gathered on tables to optimize better
  • costs of random seek, seq scan, etc. configurable
  • e.g making random seeks on an SSD not a bad thing to the optimizer
  • genetic algorithms to plan complex queries.

Better EXPLAIN

MySQL sample

+----+-------------+---------------------+-------+--------------------------------------+--------------------------------------+---------+------+------+------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------------+-------+--------------------------------------+--------------------------------------+---------+------+------+------------------------------+
| 1 | PRIMARY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Select tables optimized away |
| 2 | DERIVED | bacon_cheeseburgers | range | index_bacon_cheeseburgers_on_user_id | index_bacon_cheeseburgers_on_user_id | 5 | NULL | 1 | Using where |
+----+-------------+---------------------+-------+--------------------------------------+--------------------------------------+---------+------+------+------------------------------+

Postgres Sample

QUERY PLAN                                                              
-----------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=8.31..8.32 rows=1 width=0)
-> GroupAggregate (cost=0.00..8.30 rows=1 width=12)
Filter: ((min(bacon_cheeseburgers.created_at) + '7 days'::interval) < max(bacon_cheeseburgers.created_at))
-> Index Scan using index_bacon_cheeseburgers_on_user_id on bacon_cheeseburgers (cost=0.00..8.27 rows=1 width=12)
Index Cond: ((user_id >= 69) AND (user_id <= 70))

Also… if you have a subquery, postgres won’t execute it


PostGIS (for geo queries)

  • extensible GIS system
  • R tree indexes for built-in geometric data types
  • geometric data types
  • free census data (TIGER) for geocoding (addresses→coordinates)
  • Rails integration (example from railsonpostgresql.com)
  • Restaurant.first(:conditions => [“the_geom && ?”, Polygon.from_coordinates([[[x_min, y_min],
    [x_min, y_max], …]]],
    4269)])

Full Text Search

  • full-featured text indexing
  • dictionaries (languages, stopwords, synonyms)
  • customizable search ranking algorithms
  • mature
  • lighter weight solution than solr/lucene

Advanced SQL Features

SELECT depname, empno, salary, rank() OVER (PARTITION BY depname ORDER BY salary DESC) 
FROM empsalary;

depname | empno | salary | rank
-----------+-------+--------+------
develop | 8 | 6000 | 1
develop | 11 | 5200 | 2
develop | 9 | 4500 | 3
personnel | 2 | 3900 | 1
personnel | 5 | 3500 | 2
sales | 1 | 5000 | 1
sales | 4 | 4800 | 2