Extract. Transform. Read.
This Month Is The Worst For Data Engineering
Understand the physical constraints, like time, that impact data pipelines during certain times of the calendar year.
The following short read is an excerpt from my weekly newsletter, Extract. Transform. Read., sent to 2,000+ aspiring data professionals. If you enjoy this snippet, you can sign up and receive your free project ideation guide.
From 2014 to 2017 I lived in Phoenix, Arizona and enjoyed the state’s best resident privilege: no daylight saving time (DST). If you’re unaware (and if you live in almost any other US state, you’re probably very aware), March 9th marked the start of daylight saving time, when we spring forward an hour.
If you think this messes up your microwave and oven clocks, just wait until you check on your data pipelines. Even though data teams are very aware of DST, this isn’t always something we account for when building and scaling pipelines.
To build DST-resistant pipelines, you need to set your schedule parameters for daylight time rather than standard time. And even if you think your builds are properly calibrated before breaking for the weekend, I’d still remind the team that the DST change is coming and, if possible, designate an on-call position to respond to issues that shouldn’t wait until the next weekday.
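One way to see why the schedule shifts is to convert the same local run time to UTC on either side of the change. The snippet below is a minimal sketch, not a definitive implementation: it assumes Python 3.9+ with the standard-library zoneinfo module and an available time zone database, and the helper names local_run_to_utc and is_skipped_by_dst are hypothetical ones I made up for illustration.

```python
from datetime import date, datetime, time, timezone
from zoneinfo import ZoneInfo


def local_run_to_utc(run_date: date, run_time: time, tz_name: str) -> datetime:
    """Convert a local wall-clock run time to UTC (hypothetical helper)."""
    local_dt = datetime.combine(run_date, run_time, tzinfo=ZoneInfo(tz_name))
    return local_dt.astimezone(timezone.utc)


def is_skipped_by_dst(run_date: date, run_time: time, tz_name: str) -> bool:
    """Return True if the local time falls in the hour skipped by spring forward.

    A wall-clock time that does not exist will not survive a round trip
    through UTC, so the naive (tz-stripped) values differ.
    """
    tz = ZoneInfo(tz_name)
    local_dt = datetime.combine(run_date, run_time, tzinfo=tz)
    round_trip = local_dt.astimezone(timezone.utc).astimezone(tz)
    return round_trip.replace(tzinfo=None) != local_dt.replace(tzinfo=None)


# A 6:00 AM New York job lands at a different UTC hour after March 9, 2025.
before = local_run_to_utc(date(2025, 3, 8), time(6, 0), "America/New_York")
after = local_run_to_utc(date(2025, 3, 10), time(6, 0), "America/New_York")
print(before.hour, after.hour)

# A 2:30 AM job on the changeover date targets a time that never happens.
print(is_skipped_by_dst(date(2025, 3, 9), time(2, 30), "America/New_York"))
```

If your scheduler runs on UTC, a check like this can help you spot jobs whose local start time will drift by an hour, or jobs pinned to the skipped hour on the changeover date. Phoenix, true to form, stays put.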
In addition to DST, a less frequent problem is creating schedules and variables that account for leap years. While you could be like one engineer I know and tell yourself it’s a “future me” problem, I’d recommend creating logic to check for instances of that extra February day in a given year. You can also use the .day attribute of a date object from Python’s datetime module to output the correct day.
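Here is a rough sketch of what that leap year logic could look like using only the standard library. The function names last_day_of_february and same_day_previous_year are hypothetical, and falling back to February 28 is an assumption on my part; your pipeline may prefer March 1.

```python
import calendar
from datetime import date


def last_day_of_february(year: int) -> date:
    """Return Feb 28 or Feb 29, depending on whether the year is a leap year."""
    # monthrange returns (weekday of the 1st, number of days in the month)
    days_in_february = calendar.monthrange(year, 2)[1]
    return date(year, 2, days_in_february)


def same_day_previous_year(d: date) -> date:
    """Shift a date back one year, handling Feb 29 (hypothetical helper)."""
    try:
        return d.replace(year=d.year - 1)
    except ValueError:
        # Feb 29 has no counterpart in a non-leap year; assume Feb 28 is wanted
        return d.replace(year=d.year - 1, day=28)


print(calendar.isleap(2024), calendar.isleap(2025))
print(last_day_of_february(2024).day, last_day_of_february(2025).day)
print(same_day_previous_year(date(2024, 2, 29)))
```

Letting calendar and date do the counting means you never hard-code 28 or 29 yourself.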
Much more common than either DST or leap years is what I call the “31 problem.” This is when you want to isolate date attributes for the previous day but your logic falls out of step because some months have 31 days.
For instance, say you need to create a file string that is supposed to say “March 31” but your date logic hasn’t accounted for the extra day in March, so your output becomes “April 31”, a date that doesn’t exist.
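To make the failure concrete, here is one way the bug can creep in, along with one possible guard. This is a sketch under my own assumptions: I'm assuming the job runs on April 1 and labels a file for the previous day, and both function names are hypothetical. It is not necessarily the fix described in the longer article.

```python
from datetime import date, timedelta


def naive_label(run_date: date) -> str:
    """Buggy pattern: month comes from today, day comes from yesterday."""
    previous_day = run_date - timedelta(days=1)
    return f"{run_date:%B} {previous_day.day}"


def previous_day_label(run_date: date) -> str:
    """Safer pattern: take both month and day from the same date object."""
    previous_day = run_date - timedelta(days=1)
    return f"{previous_day:%B} {previous_day.day}"


run_date = date(2025, 4, 1)
print(naive_label(run_date))         # April 31
print(previous_day_label(run_date))  # March 31
```

The point is to do the date arithmetic once with timedelta, then pull every attribute from that single result instead of mixing pieces of two different dates.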
To learn how to solve this problem and for more in-depth analyses of date issues (including code snippets), I encourage you to read “Why Your Data Pipelines Will Fail On These 10 Days Every Year (And What To Do About It)”.
Happy DST and thanks for ingesting,
-Zach Quinn