Perfect table migrations in Redshift and Hive

Always do your migrations like this

Marcin Tustin
Build and Learn
Dec 16, 2017


Here’s a neat little trick due to David Heller at Alooma: perform any migration in Redshift (or another columnar store) with no need for downtime, table locking, or any other unpleasantness.
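In outline, the pattern is: create the new table, bulk-copy the existing data into it, cut over by renaming, then run a final copy to drain anything that landed in the old table in the meantime. Below is a minimal sketch of that pattern against Redshift, driven from Python with psycopg2; the events table, its columns, the connection details, and the deduplicating join in the final copy are all illustrative assumptions rather than anything prescribed here.

```python
import psycopg2

# Rough sketch only: the connection details, the "events" table, and its
# columns are illustrative assumptions, not taken from this post.
conn = psycopg2.connect(
    host="my-cluster.example.redshift.amazonaws.com",
    port=5439,
    dbname="analytics",
    user="migrator",
    password="...",
)
cur = conn.cursor()

# 1. Create the new table with the desired schema (here, a widened column).
cur.execute("""
    CREATE TABLE events_new (
        event_id   BIGINT,
        user_id    BIGINT,
        event_type VARCHAR(256),
        created_at TIMESTAMP
    )
""")

# 2. Bulk-copy existing rows into the new table while writers keep
#    appending to the old one.
cur.execute("INSERT INTO events_new SELECT * FROM events")
conn.commit()

# 3. Cut over by renaming. The old table is kept, never dropped, so the
#    prior state is always recoverable.
cur.execute("ALTER TABLE events RENAME TO events_old")
cur.execute("ALTER TABLE events_new RENAME TO events")
conn.commit()

# 4. Final copy: drain any rows that landed in the old table between the
#    bulk copy and the rename (the old table acts as a buffer).
cur.execute("""
    INSERT INTO events
    SELECT o.*
    FROM events_old o
    LEFT JOIN events e ON e.event_id = o.event_id
    WHERE e.event_id IS NULL
""")
conn.commit()

cur.close()
conn.close()
```

Because the cutover is just two renames committed together, readers and writers never see a missing table, and the old table stays around until you choose to drop it.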

What makes this perfect is that production never loses the ability to read data or to write events to the right table.

Even in a non-transactional environment, if any current events get sent to the old table, the final copy carries them over. (Note that the old table effectively acts as a buffer, so from the perspective of readers of the new table, events sent to the old table during this procedure arrive with the extra lag of the final copy.)

There is also no risk of data loss, because operating by renaming means you can always recover your prior state.

The only remaining risk of data loss is if your event streaming solution doesn’t buffer events before they are finally committed to the database. In that case, cutting over to the new table is risky if the schemas don’t match; then again, if you can’t buffer events somewhere, you’ll always lose events whenever you have to change your table schema.

If you model all your table migrations on this one, you should enjoy many years of tear-free operations.


Data Engineering and Lean DevOps consultant — email marcin.tustin@gmail.com if you’re thinking about building data systems