Why You Should Always Store Your Redshift Date Time Data In UTC
Tricks and tips when working with timezones in your data warehouse
There is nearly nothing in this world more frustrating that working with timezones. To add insult to injury, the world decided that we should support not just multiple timezones but specific rules in certain regions about time of day, aka daylight savings time. Now, of course, this isn’t a standardized rule across the world, or even across the country, but it’s granularly specific by region of which areas do and do not support Daylight Savings Time.
If you’ve ever worked with timezones and day light savings within your data you’ll know it’s quite a pain to deal with, especially when it comes to handling multiple differences across multiple sources.
In our case we are ingesting a lot of data that is really time specifically important, especially when it comes to cohort analysis and reconciliation work. Having the right data show up for the day you query is critically important.
I’ve learned over the past couple of months a few tricks to handling timezones when it comes to a standard query to your redshift cluster, or even supporting DST in your BI tool, like tableau.
For any developer or data engineer with far more experience than I this is an obvious thing, but if you are just getting started it might seem like a lot of extra work to handle 1 or 2 or even 3 timezone conversions for your data sources. I also have to admit, though I recognize that this is the best way to handle timezone support we do not always follow it perfectly, but are making some changes and optimizations moving forward.
If you are putting data into your data warehouse, and that data contains a date time field you should store it in UTC, which is the standard across all servers and computers. Once it’s stored in UTC you can convert it on an ETL to another table or within whatever application you are pulling that data into.
Oftentimes, just as important is store with your date time field in a separate column is the timezone conversion column. Then you can dynamically populate that field with the appropriate timezone to convert the date time to without having to store it somewhere else.
It might look something like this:
Then it’s super easy to write queries or calls to make the transition to display your data in the correct timezone. It’s a bit more work upfront but saves time in the long run.
Regardless, if you know all your date time fields are stored in UTC you’ll need to know a couple queries and functions to be able to easily convert them to the right timezone in whatever application you may be using the data.
Let’s start with redshift. This is pretty straightforward but I learned it recently and it’s been super helpful in writing quick queries to return the correct data.
The first is:
cast(datetime as date)
When writing a query to redshift this will simply take your date time field and make it a date field, so if you are like me and want to look at a specific days worth of data and don’t want to write out
where datetime > = ‘2018–01–15 04:44:54’
Then you can just do this instead
where cast(datetime as date) > = ‘2018–01–15’
It’s super simple and helpful, casting of strings is all built in natively and is easy to look up and use in your queries.
To make it more useful though and actually deal with timezones you might consider using this time_zone function AWS supports. All the documentation can be found here: https://docs.aws.amazon.com/redshift/latest/dg/CONVERT_TIMEZONE.html
Their “convert_timezone” function supports a bunch of different options of how you can convert and “cast” your query to return data back in the right format. They support natively DST and all timezones, this query below would return datetime in the timezone of Denver, which would be daylight savings.
Additionally, if you want to cast the conversion to be a standard date instead of date time you can layer those together and that works nicely.
cast(convert_timezone(‘America/Denver’, datetime) as date)
So to tie that all together, here’s an example of full query I was using today:
select * from schema.table where certain_id = ‘829’ and cast(convert_timezone(‘America/Denver’, datetime) as date) = ‘2018–01–13’ and revenue > 0 and status = ‘approved’;
It’s super nice to find out that they already natively support daylight savings and date functionality without having to write external scripts to handle things, great job amazon.
If you are using Tableau or something similar and need to do some date time conversion this is what’s worked best for me…
Step 1, make a “TZAdjusted_Date” field
//Adjust UTC DB time to MST
Step 2, make a new datetime calculation
DATETIME(//Adjust Date to Daylight Savings Time if the Date falls within that period. DST starts on the 2nd Sunday of March
// and ends on the the 1st Sunday in Nov.
IF [TZAdjusted_Date] >= DATEADD(‘hour’, 2, (IF DATEPART(‘weekday’, DATEADD(‘month’, 2, DATETRUNC(‘year’, [TZAdjusted_Date]))) = 1
// check to see if March started on a Sunday by changing the date to Jan. 1 and adding 2 months
THEN DATEADD(‘month’, 2, DATETRUNC(‘year’, [TZAdjusted_Date])) + 7
//If yes, then change the date to March 8 by adding 7
ELSE DATETRUNC(‘week’, DATEADD(‘month’, 2, DATETRUNC(‘year’, [TZAdjusted_Date])) + 13)
//Otherwise change the day to 13 days after the beginning of the week that March 1 is in
END)) //set the date to the 2nd Sunday in March and then add 2 hours to the date to make it 2am on the 1st day of DST
//Same as above except check for 2am on the 1st Sunday in November
[TZAdjusted_Date] <= DATEADD(‘hour’, 2, (IF DATEPART(‘weekday’, DATEADD(‘month’, 10, DATETRUNC(‘year’, [TZAdjusted_Date]))) = 1
THEN DATEADD(‘month’, 10, DATETRUNC(‘year’, [TZAdjusted_Date])) + 7
ELSE DATETRUNC(‘week’, DATEADD(‘month’, 10, DATETRUNC(‘year’, [TZAdjusted_Date])) + 6)
THEN DATEADD(‘hour’, 1, [TZAdjusted_Date]) //Date >= (2am on 2nd Sun of March so Spring forward 1 hour
ELSE [TZAdjusted_Date] //Date <= (2am on 1st Sun of Nov so Fall back 1 hour (meaning don’t adjust)
Yikes right? Yeah if you think I’m smart for writing that, don’t. I didn’t, I just found someone online way smarter than I and I can’t find where that article was to give them credit. (It’s a tableau community forum, and a life saver)
Moral of this whole story, UTC time is great and is by far the best way to store your data in your data warehouse. It requires a little bit of modification when you pull the data out but it is much simpler to standardize across your data systems. Hopefully this was a little bit helpful to make dealing with dates and times a bit easier.