One of the trickiest parts of data engineering is working with timestamps, timezones, and Daylight Saving Time. Too often I’ve seen IoT projects, in particular, stumble on this core challenge.
While UTC time can help keep things clean, it is not enough on its own.
If you’re considering storing timestamp data in Snowflake, you’ll want to keep reading to save yourself from the heartache of a misused DATEADD or an automatic UTC conversion thanks to your current session timezone.
What Is a Timestamp?
For the purposes of this discussion, a timestamp is a temporal value representing a single point in time, possibly in a given timezone.
January 1st, 2019 would not be a timestamp, but 12AM on January 1st, 2019 would be.
Because there are so many different ways to write a date and time (12AM 1/1/19 vs. 00:00 2019/01/01), timestamps are often represented in a standard form. In Snowflake, the default output format is:
YYYY-MM-DD HH24:MI:SS.FF3 TZHTZM
which looks like this for March 10th, 2019 in US Central Time:

2019-03-10 00:00:00.000 -0600
This isn’t strictly standard ISO 8601, but I think it looks a bit more readable to humans (I assume that’s why Snowflake uses it as a default). This can be changed by altering your Snowflake account parameters if you’re a true purist.
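For those purists, a sketch of what that parameter change might look like. The format string here is just an illustration of a stricter ISO 8601 style, and account-level changes typically require elevated privileges:

```sql
-- Switch TIMESTAMP_TZ output to a stricter ISO 8601 style for the account
ALTER ACCOUNT SET TIMESTAMP_TZ_OUTPUT_FORMAT = 'YYYY-MM-DD"T"HH24:MI:SS.FF3TZH:TZM';

-- Or change it just for your current session
ALTER SESSION SET TIMESTAMP_TZ_OUTPUT_FORMAT = 'YYYY-MM-DD"T"HH24:MI:SS.FF3TZH:TZM';
```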
On the ingestion side, Snowflake accepts any suitably formatted timestamp string (in single quotes) wherever a timestamp is needed and will try to parse it. If it has trouble, you can give it a hint by providing an explicit input format.
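A quick sketch of both behaviors (the literal values here are just examples):

```sql
-- Snowflake parses a well-formed timestamp string on its own
SELECT '2019-03-10 00:00:00 -0600'::TIMESTAMP_TZ;

-- For ambiguous or unusual layouts, supply an explicit format hint
SELECT TO_TIMESTAMP_TZ('10.03.2019 00:00:00 -0600',
                       'DD.MM.YYYY HH24:MI:SS TZHTZM');
```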
Timestamp Datatypes in Snowflake
There are three different timestamp types in Snowflake:
TIMESTAMP_NTZ is the datatype for timestamps without a timezone (ntz = no time zone). This is also referred to as “walltime,” as it is the time you would get by looking at a random clock on the wall and writing it down.

TIMESTAMP_TZ is the datatype for timestamps with timezones (tz = time zone). Annoyingly, Snowflake does not store the timezone associated with this timestamp. Instead, it stores only the UTC offset that timezone had at that moment. This may sound pedantic, but it has consequences we’ll explore later.

TIMESTAMP_LTZ is the dangerous datatype for timestamps rendered in Snowflake’s current session timezone (ltz = local time zone). It’s a nice idea in theory: every consumer would automatically work with timestamps in their local timezone (assuming they set that session parameter, which they probably aren’t). In practice, any SQL code interacting with TIMESTAMP_LTZ datatypes depends on session state and is therefore non-deterministic, so buckle up tight when using this bad boy.
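A sketch of why TIMESTAMP_LTZ results depend on session state. The rendered offsets in the comments assume 2019 US DST rules (the spring-forward happened at 2AM local time on March 10th that year):

```sql
-- The same instant renders differently as the session timezone changes
ALTER SESSION SET TIMEZONE = 'America/Chicago';
SELECT '2019-03-10 06:00:00 +0000'::TIMESTAMP_LTZ;
-- renders as 2019-03-10 00:00:00.000 -0600

ALTER SESSION SET TIMEZONE = 'America/New_York';
SELECT '2019-03-10 06:00:00 +0000'::TIMESTAMP_LTZ;
-- renders as 2019-03-10 01:00:00.000 -0500
```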
Which Datatype Should I Use?
In all, your safest bet is usually to go with TIMESTAMP_TZ for storing your data. I suggest you go one step further and store the timezone name in an adjacent column (America/Chicago, for example) if you’re working with any time data that needs careful handling of DST (Daylight Saving Time) transitions.
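A minimal sketch of that layout (the table and column names are hypothetical). This is also where a misused DATEADD bites: since TIMESTAMP_TZ only stores a UTC offset, date arithmetic keeps that stale offset across a DST boundary, and the adjacent timezone column lets you re-anchor the result:

```sql
-- Hypothetical events table: the timestamp plus its IANA timezone name
CREATE TABLE events (
    event_id    NUMBER,
    occurred_at TIMESTAMP_TZ,
    tz_name     VARCHAR      -- e.g. 'America/Chicago'
);

-- A naive DATEADD across the spring-forward boundary keeps the old
-- offset; re-rendering with the stored timezone name corrects it.
SELECT CONVERT_TIMEZONE(tz_name, DATEADD(day, 1, occurred_at))
FROM events;
```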
If you’re really trying to squeeze out wasteful storage, you could also store your timestamps as TIMESTAMP_NTZ values, provided you convert to UTC first and don’t really care about DST effects. These values are about 6 bytes smaller per row than their ‘timezoned’ siblings. If you do this, however, make sure you’ll never need that timezone.
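A sketch of that normalize-to-UTC step (the literal value is just an example):

```sql
-- Normalize to UTC, then drop the offset before storing as TIMESTAMP_NTZ
SELECT CONVERT_TIMEZONE('UTC', '2019-03-10 00:00:00 -0600'::TIMESTAMP_TZ)
           ::TIMESTAMP_NTZ;
-- yields the walltime 2019-03-10 06:00:00.000
```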
My personal preference is TIMESTAMP_NTZ values with timezone data easily accessible in either the same table or in a dimension table associated with the event source identifier. This works for me because I rarely worry about converting timestamps back and forth between timezones. I don’t often care about DST effects, but I’m too paranoid to commit to losing the timezone data.
I wish there were an easy, go-to piece of advice that fully covers how to store timestamps, but there are just too many different usage patterns for one approach to address everything.
If Snowflake ever decides to store actual timezone information along with timestamps instead of just UTC offsets, I’d say use TIMESTAMP_TZ all the time. But until then, don’t panic!
Randy On Snowflake
Below are a few of my other recent Snowflake stories:
5 Things I Wish I Knew Before Learning Snowflake
Hashmap has been focusing on delivering the simplest, most cost-efficient data solutions for the better part of a…
Artisanal Data Pipelines | Best Practices for Snowflake External Staging in AWS and Azure
Ingestion is the lifeblood of your Snowflake Cloud Data Warehouse. Make sure that you’re spending less and doing more…
30 Second Snowflake Cloud Data Warehouse Cheat Sheet
At Hashmap, we’re constantly looking for ways to provide our clients with a simpler and quicker path to business value…
Feel free to share on other channels, and be sure to keep up with all new content from Hashmap here.
Randy Pitcher is a Cloud and Data Engineer (and OKC-based Regional Technical Expert) with Hashmap, providing Data, Cloud, IoT, and AI/ML solutions and consulting expertise across industries with a group of innovative technologists and domain experts accelerating high-value business outcomes for our customers.
Be sure to connect with Randy on LinkedIn and reach out for more perspectives and insight into accelerating your data-driven business outcomes or to schedule a hands-on workshop to help you go from Zero to Snowflake.