GumGum Tech Blog
Published in

GumGum Tech Blog

Time Travel in Snowflake

Background

Purpose

Last month we (the Data Engineering team at GumGum) faced a production error in Snowflake that was quickly debugged and resolved by using Snowflake Time Travel. We were amazed by how clean and easy the reconciliation process was, so we wanted to provide a practical example for using this feature.

Warning: spoiler alert for your favorite time travel movies…

Snowflake Time Travel

Snowflake Time Travel is an exciting feature that allows you to query previous versions of data.

This is a low-cost, low-latency feature because of the unique way Snowflake structures table data. Any time you perform DML commands (INSERT, UPDATE, DELETE, etc.) on your data, the previous Micro-partitions are not deleted; the table drops pointers to those Micro-partitions and creates new Micro-partitions with active pointers. The previous Micro-partitions are still stored for the duration of your retention period. If you want to recover the table state during this retention period, you can simply call AT on the table, and the previous micro-partition/pointer representation will instantly reappear, no data reconstruction or reloading required.

Snowflake Micro-partition selection during Time Travel

The Plot

The Issue

Avengers: Endgame (2019)

The Data Engineering team received a familiar message — ̶”̶T̶h̶a̶n̶o̶s̶ ̶h̶a̶s̶ ̶s̶n̶a̶p̶p̶e̶d̶ ̶h̶a̶l̶f̶ ̶o̶f̶ ̶t̶h̶e̶ ̶u̶n̶i̶v̶e̶r̶s̶e̶ ̶o̶u̶t̶ ̶o̶f̶ ̶e̶x̶i̶s̶t̶e̶n̶c̶e̶”̶ “this dashboard is wrong but was fine on Friday.” The team started the standard procedure — tracing the final representation through to the source tables, asserting business expectations against the current representation, identifying outlier metrics, etc. We eventually identified the likely source problem — duplicates in one of the tables — but could not identify a root-cause for the duplicates.

The Change

The Terminator (1984)

We stepped through datetime segments until we identified ̶S̶a̶r̶a̶h̶ ̶C̶o̶n̶n̶o̶r̶ the timestamp where the problem did not exist ( ‘2021–10–16 13:11:00’). Now that we identified the period of change, we could use that to determine a root cause and solution.

SELECT unique_key_1, unique_key_2, unique_key_3, COUNT(*)
FROM my_table at(timestamp => ‘2021–10–16 13:11:00’::timestamp)
GROUP BY $1, $2, $3
HAVING COUNT(*) > 1

The Root Cause

Back to the Future (1985)

A quick check of our QUERY_HISTORY showed some workflows that improperly triggered 3 batch copies around this time. Now that we had the evidence of these queries and the observed duplicates, we could be confident that our ̶p̶a̶r̶e̶n̶t̶s̶ ̶w̶e̶r̶e̶ ̶m̶e̶a̶n̶t̶ ̶f̶o̶r̶ ̶e̶a̶c̶h̶ ̶o̶t̶h̶e̶r̶ problem was fixable by reverting these batch copies.

SELECT *
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE END_TIME <= ‘2021–10–16 13:30:00’::timestamp
AND END_TIME > ‘2021–10–16 13:00:00’
AND LOWER(QUERY_TEXT) LIKE ‘%my_table%’;

The Fix

Using time-travel again, we f̶r̶e̶e̶d̶ ̶B̶u̶c̶k̶b̶e̶a̶k̶ pulled the correct representation and used it to replace the duplicated representation. We avoided the complicated logic of trying to determine which records in the candidate key group were duplicates and which were originals. Instead we defaulted to the original representation prior to the unexpected copy statements.

Harry Potter and the Prisoner of Azkaban (2004)
-- Create temp 
CREATE TEMP TABLE revert_my_table
AS
SELECT *
FROM my_table at(timestamp => '2021-10-16 13:11:00'::timestamp)
WHERE DAY = '2021-10-11';
-- Delete data
DELETE FROM my_table
WHERE DAY = '2021-10-11';
-- Insert 10-16 records
INSERT INTO my_table
SELECT *
FROM revert_my_table;

Reflection

Costs Considerations

Usually when discussing time-travel, we have to consider the ethical considerations involving the butterfly effect, the logical traps of temporal paradoxes, and the spacetime boundaries of general relativity. Lucky for us, Snowflake time-travel mostly avoids these concerns. The only significant concern is storage cost.

Storage costs are usually cheap, and in all of our cases it will be cheaper to enable time-travel than paying the warehouse cost to unload file-based backups. However, there are some cases where time-travel may not be the right answer, especially if your tables refresh consistently.

Summary

I was surprised at how handy this feature was. This is an excellent production debugging and resolution tool that requires near-zero effort to set-up and use. We were able to quickly find and resolve our issue with minimal overhead. I am excited to see how time-travel helps us resolve future production issues quickly, easily, and confidently.

We’re always looking for new talent! View jobs.

Follow us: Facebook | Twitter | Linkedin | Instagram

--

--

--

We’re hiring! Check out https://gumgum.com/engineering

Recommended from Medium

Beating submissive heart

How to optimize your website for search engines

Python vs. JavaScript: Which Language You Should Learn and Why

How To Use ERP Software For Getting Most Out Of Your Investment?

Building a Sales Commission Application…

Symfonos:2

Deploying your portfolio website without being a sound developer using a free domain; a…

#100DaysofCode — Day[3] (4)

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Brendan Frick

Brendan Frick

More from Medium

Data Security: JSON & Dynamic Data Masking (Part 2 of 2)

Data Governance using Azure Purview

Change Data Capture in Snowflake

Snowflake — Handling Tasks in a Layered Access Control Model & Managed Schema