‘DAG’, what’s a ‘DAG’? Why is it cool?

DataKitchen
Published in data-ops
2 min read · Mar 1, 2017

Today in my favorite nerd publication, Hacker News, there was a thread about tools to run a Directed Acyclic Graph (DAG) on data.

The article talks about a number of open source DAG-runner data management tools: Airflow, Luigi, etc. These tools are great for doing data work on large data sets, and they evolved out of the need to run large data processing jobs at big consumer internet companies. Here’s what I wrote in response:

[Bias Alert: I’m Head Chef of DataKitchen]. Our perspective is that the DAG abstraction should apply not only to data engineering but to the whole analytic process: data engineering, data science, and data visualization. Analytic teams love to work with their favorite tools: Python, SQL, ETL, Jupyter, R, Tableau, Alteryx, etc. The question is how you get those diverse teams and tools to work together to deliver quickly, with high quality and reusable components.

We’ve identified seven steps taken from DevOps, CI, Agile and Lean Manufacturing (https://www.datakitchen.io/platform.html#sevensteps) that you can start to apply today.

The challenge is that producing complete production analytics involves many separate DAGs (plus code and configuration), each embedded in one of the tools the team has selected. So what is really needed is a “DAG of DAGs” that encompasses the whole analytic tool chain.
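
To make the “DAG of DAGs” idea concrete, here is a minimal sketch in Python. It is only an illustration, not how DataKitchen or any of the tools above actually work: the stage names, step names, and the `run_order` and `flatten` helpers are all made up for this example. Each tool's workflow is a small DAG, and an outer DAG decides the order in which those workflows run.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_order(dag):
    """Return one valid execution order for a DAG.

    The DAG is a dict mapping each node to the set of nodes it depends on.
    TopologicalSorter raises graphlib.CycleError if the graph has a cycle.
    """
    return list(TopologicalSorter(dag).static_order())


# Hypothetical inner DAGs: one per team/tool, each with its own steps.
inner_dags = {
    "data_engineering": {"extract": set(), "clean": {"extract"}, "load": {"clean"}},
    "data_science": {"features": set(), "train": {"features"}, "score": {"train"}},
    "visualization": {"refresh_extract": set(), "publish_dashboard": {"refresh_extract"}},
}

# Hypothetical outer DAG: the nodes are the inner DAGs themselves.
outer_dag = {
    "data_engineering": set(),
    "data_science": {"data_engineering"},
    "visualization": {"data_science"},
}


def flatten(outer, inner):
    """Expand the DAG of DAGs into a single ordered list of steps."""
    plan = []
    for stage in run_order(outer):          # order the tool-level workflows
        for step in run_order(inner[stage]):  # then order the steps inside each
            plan.append(f"{stage}.{step}")
    return plan


plan = flatten(outer_dag, inner_dags)
for step in plan:
    print(step)
```

In a real tool chain each inner step would call out to the tool that owns it (a SQL job, a notebook, a dashboard refresh) rather than just being a name in a list, but the ordering logic is the same at both levels.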
