The Simple Reason Your Data Pipelines Will Never Be 100% Automated
Even AI solutions can’t fix gaps in vendor infrastructure and the ultimate data engineering constraint — time.
Want to create your own portfolio-worthy data automation project? Learn how with my free project guide.
Data Engineering: A Waiting Game
One of the worst waits I’ve endured was on a boat filled to capacity, idling in a monsoon. If that doesn’t sound bad enough, I was the captain. When you’re at the helm, it’s moments like these that make you question why anyone romanticizes boats or bodies of water. Of course, most boats offer an efficient, comfortable way to travel in ideal conditions. But unlike cars and even most airplanes, which can climb above or dip below storms, watercraft and their passengers can do only one thing when they encounter rough weather: wait it out.
Unfortunately, the same holds for data engineering: no matter how technically perfect and resilient your data pipelines are, there are aspects of ingestion even the best data teams can’t predict or out-engineer. Two of the most significant “out of our control” challenges I’ve encountered are:
- Vendor-related or “upstream” failures; all we can do is open a ticket and wait for a response
- Unpredictable load times
And while I could write a nice anthology series on the first challenge, I try to be a friend of all data vendors and API…