Paul DeVos
Sep 1, 2018 · 2 min read

Igor,

I have just finished reading your three-article series. I’m really enjoying it, as I’ve only moved to DevOps as a primary workflow in the past few months, and the number of tools, as well as where one ends and another begins (end-to-end? config only? etc.), is quite daunting. And if that wasn’t enough, there’s a “stack” for each language, it seems! Gulp!

That said, I’m trying to take what you present and interpret it in a data warehouse context: is there much difference, if any, in the tools and in what ‘state’ you have to preserve? I’m not sure whether you’ve worked much with the data warehouse side, especially around the main orchestrator node (an EC2 instance in my case), which is responsible for all automation, scheduling, and monitoring of ETL processes.

I’ve previously built this node with bash scripts, but it definitely becomes mutable (not ideal). This scenario may not be much different from an application in production, but compared with an app in development there seems to be much more emphasis on maintaining all the data. Maintaining the full ETL load history, updated ETL scripts (Airflow DAGs), new credentials (e.g. SSH), and other ‘config’-like things is critical.

So I’m trying to figure out the best “templates”, methods, and CI/CD setup (in this context), both for adding new processes over time and for recovering when a service (e.g. Airflow, Kubernetes) fails.

Thus I have three questions I’m hoping to learn more about, whether through your articles or a direct answer here:

  1. Would you also architect this orchestrator box with Terraform? I started with Ansible because my primary language is Python, but it seems it may not be a great choice, as it can’t preserve the state of the software unless, if I understand this correctly, you have vendor support, e.g. Ansible Tower.
  2. What are the best options for maintaining software services and their state on boxes, if that makes sense? For example: make sure Airflow is always running and, if it is not, re-pull from GitHub, restore the credentials, etc., reconnect to the metadata database to pick up where the loads left off, and kick it off again.
  3. For your “home base”, would you recommend using your local machine (macOS in my case), or should I create, say, a Linux VM (with Vagrant) and build everything from that? I don’t know whether that introduces ‘tricky’ SSH-style communication issues from “home” to the orchestration box.
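
To make question 2 more concrete, here is a rough Python sketch of the kind of check-and-recover step I have in mind. It is only an illustration under assumptions: the systemd unit name `airflow-scheduler`, the DAG checkout path `/opt/airflow/dags`, and the helper `ensure_running` are all hypothetical placeholders, and restoring credentials is left out entirely. I don’t know whether this is how it should actually be done.

```python
import subprocess
from typing import Callable, List, Sequence

# Hypothetical names: the unit "airflow-scheduler" and the path
# "/opt/airflow/dags" are placeholders, not a known-good setup.
HEALTH_CHECK: List[str] = ["systemctl", "is-active", "--quiet", "airflow-scheduler"]
RECOVERY_STEPS: List[List[str]] = [
    ["git", "-C", "/opt/airflow/dags", "pull", "--ff-only"],  # re-pull the DAGs
    ["systemctl", "restart", "airflow-scheduler"],            # kick it off again
]

Runner = Callable[[Sequence[str]], int]


def run(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code (0 means success)."""
    return subprocess.run(list(cmd), check=False).returncode


def ensure_running(
    runner: Runner = run,
    health_check: Sequence[str] = HEALTH_CHECK,
    recovery_steps: Sequence[Sequence[str]] = RECOVERY_STEPS,
) -> List[List[str]]:
    """Return the recovery commands that were executed (empty if healthy)."""
    if runner(health_check) == 0:
        return []  # service is up, nothing to do
    executed: List[List[str]] = []
    for step in recovery_steps:
        code = runner(step)
        executed.append(list(step))
        if code != 0:
            # Stop at the first failing step rather than restarting blindly.
            raise RuntimeError(f"recovery step failed ({code}): {' '.join(step)}")
    return executed


if __name__ == "__main__":
    print(ensure_running())
```

Something like cron could call this every few minutes. The command runner is passed in as a parameter so the logic can be tried out with a fake runner, without touching a real box.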
