A lot has changed in the data world in 5 years — my journey from rags to beautiful data pipelines

Gabriel Freeman
5 min read · Jun 2, 2022


A humble beginning

The year is 2017, and I am in my first full-time job out of college, on the marketing sciences team at a large media agency. Our team was responsible for providing reporting to clients so they could monitor the effectiveness of the advertising dollars they were spending. Within our specialized 4-person team, I was the most junior, so I was bestowed the honor of the most tedious part of the workflow: updating the data each day. The process was not particularly hard (downloading some 15+ files from various advertising platforms and copy/pasting the data into our database), but it was time-consuming, and the manual process increased the likelihood of data errors. So why didn't we automate this part of the workflow through API calls to the various data sources? At the end of the day, the media agency had two options: hire a data engineer or hire me. And the cold hard truth is that, at the time, you could have hired 5 of me for the price of one data engineer. Anyway, the point is that the data operations at my first job were inefficient, and the result was far too little time spent analyzing data and most of our time spent updating files, wrangling data, and building reports.

The work at my first job out of college was repetitive and somewhat mind-numbing

Learning the ropes

The year is now 2019, and I am hired as the business intelligence manager at a small but growing D2C (direct-to-consumer) startup. As the lone soldier of the data team (and the first data hire), I am tasked with doing the same thing I did at the agency, but with fewer resources. My prompt: create a set of reports the team can check each day to determine whether our marketing dollars are being well spent. This sounded easy enough in my head, but in reality, replicating the agency process at the startup failed me quickly. You see, the stakeholders at the startup actually looked at the reports this time. And these stakeholders (who also happened to be the executives of the company) asked me to analyze the data. Without the help of my 3 former comrades, there was not enough time in the day for me to do everything: updating the data, building the reports, and providing insights. What followed over the next few months was a journey of exploration into automation tools, with the goal of providing timely and insightful reporting to a growing number of stakeholders with increasingly challenging requests. The culmination of this journey, which was far from linear, was a robust data system that was automated, reliable, and insightful (and that provided the foundation for our startup to scale to over $60 million in annual sales in 2020). And the cherry on top: the tools I was using were relatively cheap compared to the tools at the agency. The moment I realized I was on to something came on the first morning of a vacation I took some 6 months into the job: the data was updating by itself, and no one bothered me while I lay peacefully on a beach.

From rags to beautiful data pipelines

The year is 2022, and I am now helping businesses automate their reporting in as little as 8 weeks. Not to mention, I am also providing detailed insights and analysis to help them drive decision-making and planning. Quite the transformation from where I started at the agency! So what happened in the data industry over the last 5 years that made this all possible? The answer lies in 4 tools plus a little SQL magic. Before I dig in, let me quickly say that there are lots of data tools now! This is one of the reasons analysts can be much more efficient than they were 5 years ago. The following 4 tools are the ones I have converged on because they have proved reliable and cost-effective for the businesses I work with.

The data stack

The first tool in the data stack is Stitch, a piece of software that automatically moves data from its original source to a data warehouse each day. This no-code tool (really, it requires no coding skills) costs $100 per month for the enterprise plan (a bit more for companies with bigger datasets).

Google BigQuery is the data warehouse where the data is automatically sent each morning. A data warehouse is exactly what it sounds like: a place where data from many different sources lives. Within BigQuery, you can apply SQL logic to transform the data and make it more useful. The tool is both sophisticated and cheap (~$50/month for most businesses I have worked with). Not to mention, BigQuery can scale to accommodate your data needs as they grow.
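To make that concrete, here is a minimal sketch of the kind of SQL transform I mean: combining daily ad spend from two platforms into a single reporting table inside BigQuery. The dataset, table, and column names (stitch_facebook_ads, spend_by_day, and so on) are hypothetical placeholders; the actual names depend on what Stitch loads for your sources.

    -- Hypothetical example: combine daily ad spend from two platforms
    -- into one reporting table. Stitch typically lands each source in
    -- its own dataset; all names below are placeholders.
    CREATE OR REPLACE TABLE reporting.daily_marketing_spend AS
    SELECT
      date,
      'facebook' AS platform,
      campaign_name,
      SUM(spend) AS spend_usd
    FROM stitch_facebook_ads.spend_by_day
    GROUP BY date, campaign_name
    UNION ALL
    SELECT
      date,
      'google' AS platform,
      campaign_name,
      SUM(cost) AS spend_usd
    FROM stitch_google_ads.campaign_performance
    GROUP BY date, campaign_name;

A query like this can be saved as a scheduled query in BigQuery so the reporting table rebuilds itself each morning after Stitch finishes loading.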

Lastly, the tools I use to visualize data are Google Data Studio (free with a Google account) and Connected Sheets (available on Google Workspace Enterprise accounts). Both tools connect directly to the data warehouse, so your reports update each day.
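In practice, I point the visualization tools at a small reporting view rather than at raw tables, so the dashboards always read clean, pre-aggregated data. Here is a sketch, again with hypothetical names, building on the reporting table from the earlier example:

    -- Hypothetical reporting view for the dashboards to read.
    -- Data Studio and Connected Sheets query the view directly, so
    -- every report reflects the latest data in the underlying table.
    CREATE OR REPLACE VIEW reporting.spend_last_30_days AS
    SELECT
      date,
      platform,
      SUM(spend_usd) AS spend_usd
    FROM reporting.daily_marketing_spend
    WHERE date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
    GROUP BY date, platform;

Keeping the aggregation logic in a view means that when a stakeholder asks for a new cut of the data, I change one SQL definition and every connected report picks it up automatically.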

A simple yet powerful set of tools

Using these tools, any company can spin up an automated data system quickly, without investing huge sums in human capital and software fees. And the big unlock of an automated reporting system is that your team can now run A/B tests, make data-driven decisions in real time, and ultimately build the foundation for your company to scale.

The next chapter

The last 5 years have been transformative for my professional development, and I credit a lot of that to the tools that are making life a heck of a lot easier for analysts. I feel grateful for the journey that got me here, and I am excited to see where this industry goes over the next 5 years! If anyone is interested in learning more about my process, I love helping people get set up with these tools. You can hire me through my business, or you can book a free consultation, where I can help you get these tools set up on your own if you want to give it a go yourself!

