Execute A Multi-CSV Backfill From Google Cloud Storage In 50 Seconds

Leverage the Python Google Cloud Storage and BigQuery APIs to bulk download, transform and upload CSV files in < 1 minute.

Zach Quinn
Pipeline: Your Data Engineering Resource

Currently job searching? Give yourself an edge by developing a personal project using my free 5-page project ideation guide.

My facial muscles still ache from the 3 years I spent plastering on a fake smile to work in hospitality. While I have fond memories of that period (which mostly consist of friends and me doing anything but the job), it’s a role I largely dreaded. Truthfully, I’m not built to interact with the general public for long stretches. Needless to say, I’ve found my data engineering job to be a better fit for my analytical mind and increasingly hermit-like personality.

This doesn’t mean, however, that there aren’t days and tasks I’d rather not endure. For my first year as a baby (junior) engineer, I got assigned (read: stuck with) a lot of grunt work: writing documentation, auditing table metadata by hand (until I eventually automated the task) and, worst of all, conducting time-consuming and mind-numbing backfills.

I don’t intend to imply that backfilling missing data is merely a grunt task. In fact, it’s critical to continually maintain both real-time (or near real-time) data and historical data to support time-series analyses.
