We actually haven’t been using Airflow Connections with this new approach! We do the following instead:
1. Create a service account in Google Cloud Platform with the appropriate permissions.
2. Create a Kubernetes Secret using that service account’s JSON keyfile (a sketch of this step follows the list).
3. Pass the Kubernetes Secret name to the…
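For illustration, here’s a minimal sketch of step 2 using the official `kubernetes` Python client. The secret name (`gcp-sa-key`), namespace, and keyfile path are placeholders, not our actual values:

```python
# Minimal sketch: turn a GCP service account keyfile into a Kubernetes
# Secret. Names, namespace, and paths below are illustrative placeholders.
import base64

from kubernetes import client, config

config.load_kube_config()  # use load_incluster_config() when running in-cluster

with open("keyfile.json", "rb") as f:
    # Secret `data` values must be base64-encoded strings
    keyfile_b64 = base64.b64encode(f.read()).decode("utf-8")

secret = client.V1Secret(
    metadata=client.V1ObjectMeta(name="gcp-sa-key"),
    type="Opaque",
    data={"keyfile.json": keyfile_b64},
)
client.CoreV1Api().create_namespaced_secret(namespace="default", body=secret)
```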
Interesting! I hadn’t read up on Apache Oozie before; I’ll take a look.
The Airflow Kubernetes Operator was created by Bloomberg. The main difference is that our Operator spins up arbitrary Kubernetes resources given a Kubernetes yaml, while Bloomberg’s Kubernetes Operator creates specific…
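For a concrete sense of what “spins up arbitrary resources given a yaml” means mechanically, here’s a rough sketch using the kubernetes Python client. This is the general pattern, not our Operator’s actual code:

```python
# Rough sketch of applying an arbitrary Kubernetes yaml from Python —
# the general pattern, not our Operator's actual implementation.
from kubernetes import client, config, utils

config.load_kube_config()
api_client = client.ApiClient()

# create_from_yaml creates whatever resources the manifest defines,
# so the same code path handles Jobs, Deployments, Services, etc.
utils.create_from_yaml(api_client, "job.yaml")
```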
I’ve heard other good things about plugins since writing this post; we’ll look into it!
However, one of the main benefits of pushing people to use our KubernetesOperator is that they don’t have to write any custom Airflow code. Not only does this simplify developer onboarding to Airflow (all…
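As a hypothetical sketch of what that looks like for a DAG author, the task definition reduces to pointing at a yaml. The operator name and import path here are made-up stand-ins, not our internal operator’s real names:

```python
# Hypothetical sketch: a DAG whose only "custom" content is a pointer to a
# Kubernetes Job yaml. KubernetesJobOperator and its import path are assumed
# stand-ins for our internal operator.
from datetime import datetime

from airflow import DAG
from our_plugins.kubernetes_job_operator import KubernetesJobOperator  # assumed

with DAG("example_pipeline", start_date=datetime(2019, 1, 1), schedule_interval="@daily") as dag:
    run_job = KubernetesJobOperator(
        task_id="run_transform",
        yaml_path="jobs/transform.yaml",  # team-owned Job manifest
    )
```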
We had that problem as well! We actually re-implemented XComs internally to make passing data between discrete steps more efficient. I just let the main developer on that code know that he should write a blog post as well!
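For readers who haven’t used them: vanilla XComs push and pull values through Airflow’s metadata database, roughly like the Airflow 1.x-style sketch below. That’s the mechanism we swapped out; I won’t speculate here on the internals of our replacement:

```python
# Vanilla Airflow XComs (1.x style): values round-trip through the
# metadata database, which is the stock mechanism being replaced.
from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator

def produce(**context):
    context["ti"].xcom_push(key="row_count", value=42)

def consume(**context):
    count = context["ti"].xcom_pull(task_ids="produce", key="row_count")
    print(f"upstream produced {count} rows")

with DAG("xcom_example", start_date=datetime(2019, 1, 1), schedule_interval=None) as dag:
    t1 = PythonOperator(task_id="produce", python_callable=produce, provide_context=True)
    t2 = PythonOperator(task_id="consume", python_callable=consume, provide_context=True)
    t1 >> t2
```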
The BigQuery Operator is fairly simple! However, Operators vary in complexity, and we’ve still found bugs in seemingly simple ones.
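For reference, this is the BigQuery Operator’s basic shape (Airflow 1.10-era contrib import; newer versions ship it in the Google provider package). The project, dataset, table names, and query are illustrative:

```python
# Minimal BigQueryOperator usage, Airflow 1.10-era contrib import.
# Project/dataset/table names and the query are illustrative.
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.bigquery_operator import BigQueryOperator

with DAG("bq_example", start_date=datetime(2019, 1, 1), schedule_interval=None) as dag:
    aggregate = BigQueryOperator(
        task_id="aggregate_events",
        sql="SELECT user_id, COUNT(*) AS n FROM `my-project.my_dataset.events` GROUP BY user_id",
        destination_dataset_table="my-project.my_dataset.event_counts",
        write_disposition="WRITE_TRUNCATE",
        use_legacy_sql=False,
    )
```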
For example, a prior version of the DataFlow Operator took a service account (meant to execute the work) as a parameter and then didn’t use it at all! This meant that work was being…
The Jobs are written as typical Kubernetes Job yamls (although we do add a couple extra parameters to every yaml to make this work): https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/.
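For anyone who wants the shape of such a Job without leaving Python, here’s a minimal run-to-completion Job built with the kubernetes client. All names and the image are placeholders, and the couple of extra parameters we add aren’t shown:

```python
# Minimal run-to-completion Kubernetes Job via the Python client —
# equivalent in spirit to the Job yamls linked above. All names and the
# image are placeholders; the extra parameters we add are omitted.
from kubernetes import client, config

config.load_kube_config()

job = client.V1Job(
    metadata=client.V1ObjectMeta(name="example-job"),
    spec=client.V1JobSpec(
        backoff_limit=2,  # retry the pod up to twice on failure
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(
                restart_policy="Never",  # Jobs require Never or OnFailure
                containers=[
                    client.V1Container(
                        name="worker",
                        image="gcr.io/my-project/worker:latest",
                        command=["python", "run.py"],
                    )
                ],
            )
        ),
    ),
)
client.BatchV1Api().create_namespaced_job(namespace="default", body=job)
```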