Accumulate Stepping Stones

What building a data pipeline taught me about ambition

Josh Embree
4 min read · Apr 15, 2024
Photo by Lisa Baker on Unsplash

Five years ago I did a customer analysis for a direct-to-consumer brand. They wanted to know things they couldn’t get through the standard analytics in Shopify, so I downloaded their data, wrote some Python code in a notebook, and made a few plots. They liked it and asked if I could run that for them every month. I said yes but thought, “hell, I could run it every day if I didn’t have to manually download the data and could run my code somewhere more reliable than my laptop.” And with that, my first data pipeline project began.

I knew Shopify had an API so I could, in theory, pull the data programmatically through that. I was also trying to learn AWS so I figured I could deploy my code there after I got it working locally. So if I could “just” write the code to pull data from Shopify, store it in S3, run my analysis, and schedule it all somewhere in AWS, I’d be in business. Then I thought, “maybe other companies would want the same thing and I could just sell it to them without much extra work.” The rate of “if” and “just” in my thinking might as well have been a big orange “CONSTRUCTION AHEAD” sign.

Step One: figure out how to send a request to the Shopify API and get orders data back. Not knowing anything about REST APIs, this took longer than I’m willing to admit even today. I also stubbornly refused to use the Shopify Python library that makes this straightforward. Instead, I used plain old requests and figured out how to construct the URL to send the right request and get back what looked like orders.

Step Two: learn how the API works to get all of the orders between two timestamps. This seemed quick and easy since I was able to send the right request and get back what looked right after only a few tries. Then I realized I was getting the exact same number of orders for every 24-hour period requested. So I got to learn all about pagination. Of course, the API won’t just send you ALL orders, that would be crazy. You get up to a certain number in your first request, and if there are more, the response includes an indicator telling you to request the next “page.” You keep sending requests until you actually get everything you wanted.
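Here is a minimal sketch of that pagination loop in Python. It is not my original code and it doesn't talk to Shopify at all: `fetch_all`, `make_fake_api`, and the cursor format are all made-up names for illustration, and the "API" is an in-memory fake that hands back one page of orders plus a cursor pointing at the next page (or `None` when there is nothing left).

```python
from typing import Callable, Dict, List, Optional, Tuple

# One page of results plus a cursor for the next page (None when done).
Page = Tuple[List[Dict], Optional[str]]


def fetch_all(fetch_page: Callable[[Optional[str]], Page]) -> List[Dict]:
    """Keep requesting pages until the API stops handing back a cursor."""
    orders: List[Dict] = []
    cursor: Optional[str] = None
    while True:
        page, cursor = fetch_page(cursor)
        orders.extend(page)
        if cursor is None:
            return orders


def make_fake_api(total: int, page_size: int) -> Callable[[Optional[str]], Page]:
    """Hypothetical in-memory stand-in for a paginated orders endpoint."""
    data = [{"id": i} for i in range(total)]

    def fetch_page(cursor: Optional[str]) -> Page:
        # The cursor here is just the starting index encoded as a string.
        start = int(cursor) if cursor else 0
        end = start + page_size
        next_cursor = str(end) if end < total else None
        return data[start:end], next_cursor

    return fetch_page


if __name__ == "__main__":
    api = make_fake_api(total=120, page_size=50)
    first_page, _ = api(None)
    print(len(first_page))      # a single request only returns one page
    print(len(fetch_all(api)))  # following the cursor returns every order
```

The first request alone returns the same capped number of orders no matter how many exist, which is exactly the symptom I was staring at. Only the loop that follows the cursor gets the full set.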

Steps three through some number I don’t want to remember included figuring out API rate limiting, IAM policies, credential management, AWS Lambda, S3 encryption, proper error logging, web app hosting, and Step Functions orchestration. At each step, I naively assumed the rest of the steps would be easy once I got through just this one last thing. Some steps were quicker than others, but most of them took what felt like ages and required learning something I didn’t even know existed before.
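To give a flavor of one of those steps, here is a rough sketch of a common way to cope with rate limiting: retry the request with exponentially growing pauses. This is an illustration, not what my pipeline actually ran. `RateLimited` and `call_with_backoff` are hypothetical names, and I'm assuming the request function raises `RateLimited` whenever the API tells you to slow down.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error raised when the API says to slow down."""


def call_with_backoff(
    request: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `request`, doubling the wait after each rate-limit error."""
    for attempt in range(max_attempts):
        try:
            return request()
        except RateLimited:
            # Out of attempts: let the caller see the error.
            if attempt == max_attempts - 1:
                raise
            # Waits are base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
    raise ValueError("max_attempts must be at least 1")
```

Passing `sleep` in as a parameter is a small design choice that lets you test the retry logic without actually waiting.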

After about six months of chipping away at this during nights and weekends, I had a backend pipeline that would pull data from Shopify every night, aggregate it, transform it, and store final tables in S3 as CSVs. I also had a front end running on an EC2 instance where users could view dashboards and charts showing that data as it updated regularly. I was proud of it but had you told me what would be required when I first had the idea, there isn’t a chance in hell I would’ve started.

My ignorance of how hard things could be provided the mental/emotional headroom to keep going. I was able to accumulate stepping stones without feeling anxious or overwhelmed by all the work that was left to do after each step. It also kept me from doing something stupid like calculating the return on my time invested. Had I done that at any point along the way, I would’ve quit immediately.

With the benefit of time and hindsight, my return on time invested was actually very high in the long run. I didn’t end up selling that tool to other companies but I ended up leveraging the skills to get a better job and it opened doors to opportunities I couldn’t have imagined five years ago. My ambition for building something of value kept me working long enough to actually make myself more valuable. The work did more for me than I did for it.

Now that I know how difficult and complicated these efforts can be, I catch myself shying away from ambitious projects. I do a quick expected value calculation on the time I’ll likely put in relative to the immediate return and flinch a little. But then I remember I’m not Warren Buffett making a value judgement on shares of Apple. I’m just a guy trying to make stuff work. It’s ok to give a little extra time in the short run to accumulate stepping stones that may lead somewhere amazing in the long run.

I don’t want to be someone who only does things that make sense. I want to be someone who does things because they’re awesome and figures out how to make enough of them make sense. I’m going to practice accumulating more stepping stones and letting go of the expected value math I can’t seem to avoid doing in my head. If you’ve read this far, I know you have the courage to do the same.

Keep those capable hands dirty.
Josh

PS — If you can handle a little machine learning math, these AI researchers wrote a great book for general audiences about the myth of the objective and accumulating stepping stones: Why Greatness Cannot Be Planned
