How to measure data-drivenness
I’ve spoken with several product managers recently who struggled to come up with an immediate use of the Astronomer ETL platform for their company. They were looking for “things on fire” to address.
In computing, Extract, Transform and Load (ETL) refers to a process in database usage and especially in data… (en.wikipedia.org)
As companies seek to become data-driven, their people must become more savvy with ETL concepts; it’s how modern companies put their data to use.
Watch 50 seconds of this video (7:49–8:39).
As of six months ago, Airbnb was processing 8,000 daily ETL tasks, orchestrated on a cluster of computers that plays air traffic controller. It’s an interesting measurement, and it made me wonder: how many companies actually know their number?
Also interesting is that authoring pipelines is not centralized inside Airbnb. Data engineers, data science, engineering, growth, and engagement people all write their own workflows!
So you want to be data-driven, right? Put your data assets to work for you?
ETL jobs/data pipelines actively put your data assets (internal data, third party data, open data) to work. It stands to reason that the more data processing you do, the more data-driven you likely are.
Could Daily ETL Tasks (“DETLT”) be an effective indicator for data-drivenness of an organization?
Thought experiment: If you wanted to boost your company’s DETLT, what data processing jobs would you do to “get your numbers up?” Assuming you’re creating valuable data and sharing it with stakeholders, wouldn’t this have a big positive effect on the business?
Most of us do not know how many data pipeline tasks our company runs each day; nor do we have any sense of how that metric has trended historically.
Could be an opportunity.
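To make the idea concrete, here is a minimal sketch of how you might compute DETLT and its day-over-day trend. It assumes you can export a log of completed task runs as (task name, finish time) pairs; the function names, the log format, and the sample rows are all made up for illustration and are not how Astronomer or any particular scheduler reports this number.

```python
from collections import Counter
from datetime import date, datetime


def daily_etl_tasks(task_runs):
    """Count completed task runs per calendar day (the post's "DETLT").

    task_runs: iterable of (task_name, finished_at) pairs. This shape is an
    assumption for the sketch, not a real scheduler's export format.
    """
    counts = Counter(finished_at.date() for _name, finished_at in task_runs)
    return dict(sorted(counts.items()))


def day_over_day_change(detlt):
    """Difference between each day's count and the previous recorded day's."""
    days = list(detlt)
    return {cur: detlt[cur] - detlt[prev] for prev, cur in zip(days, days[1:])}


# Hypothetical task-run log, invented purely to exercise the functions.
task_runs = [
    ("load_users", datetime(2024, 1, 1, 1, 0)),
    ("load_orders", datetime(2024, 1, 1, 2, 30)),
    ("load_users", datetime(2024, 1, 2, 1, 0)),
    ("load_orders", datetime(2024, 1, 2, 2, 30)),
    ("build_report", datetime(2024, 1, 2, 3, 15)),
]

detlt = daily_etl_tasks(task_runs)
trend = day_over_day_change(detlt)

for day, count in detlt.items():
    print(day, count, trend.get(day, "n/a"))
```

Even a count this crude, kept over a few months, would tell you whether your number is going up, flat, or quietly shrinking.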
P.S. We are tracking this number in Astronomer, though we haven’t yet put it front-and-center in our customers’ dashboards.