Easy data warehousing from Heroku Postgres to Redshift

Zeneixe
Carwow Product, Design & Engineering
2 min readJan 11, 2016

We are a data-driven startup. Since the beginning of carwow we’ve made business decisions based on numbers, which were mostly extracted by querying our Postgres database.

Heroku provides one magnificent tool called Dataclips, which we used and abused to quickly share our data extractions, mostly on Google Spreadsheets. To avoid any performance hits on our production applications we always ran our queries on Follower Databases.

As the company grew, we built our business intelligence team which needed to run increasingly heavy queries, and unfortunately we quickly hit the maximum capacity of the more affordable Heroku database plans. Also, Dataclips doesn’t like long running queries.

After some research, we decided to try Redshift. It’s basically a fork of Postgres hosted by Amazon optimised for Business Intelligence and it is extremely cheap and fast.

Using Redshift

The first step for us was to copy all the data we have got on our databases to Redshift. To do this we created a very simple application that runs every night and copies all of our data from Postgres to Redshift. We then forked our own version of the original postgres_to_redshift gem with a few fixes.

We also created our own “Dataclips” service that allows us to connect to any data source supported by ActiveRecord and provides the same reporting features — we’ll probably release this as an open-source Rails engine in the near future.

What’s the benefit?

If you run complex queries with lots of joins or subqueries, and you struggle with performance, I absolutely recommend this solution. We had queries that were taking more than an hour to run on Postgres and were running in under two minutes using Redshift. Bear in mind that some of your queries will require rewriting, especially if they rely on particular features such as series generation or complex subqueries.

Update

If you want to know how we did it, check out this article: https://medium.com/carwow-product-engineering/scheduled-postgres-to-redshift-copy-with-a-simple-heroku-app-c7d9a614e738

Interested in making an Impact? Join the carwow-team!
Feeling social? Connect with us on Twitter and LinkedIn :-)

--

--