Data Pipelines with Python (draft)
Let’s build an end-to-end data pipeline with:
- Luigi and Airflow
- Apache Spark
- Property-based testing with Hypothesis
I’ll also point to some useful resources along the way, like the 12-factor app and Docker containers. This is a relatively intermediate post, beginner-level for data engineers. The GitHub repo with all the code can be found here:
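Before diving into the tools, it helps to have the core idea in mind: a pipeline is a graph of tasks where each task runs only after the tasks it depends on have finished. That is exactly what Luigi models with `requires()`/`output()`/`run()`. Below is a rough stdlib-only sketch of that idea, not Luigi’s actual API; all class and function names here are illustrative:

```python
# A toy dependency-driven pipeline, in the spirit of Luigi.
# Each task declares its requirements; build() runs requirements
# first, and each task at most once. Names are illustrative.

class Task:
    requires = []              # tasks that must run before this one

    def __init__(self):
        self.done = False

    def run(self):
        raise NotImplementedError


def build(task, seen=None):
    """Run `task` and its requirements, depth-first, each at most once."""
    seen = set() if seen is None else seen
    if task in seen:
        return
    seen.add(task)
    for dep in task.requires:
        build(dep, seen)
    task.run()
    task.done = True


class Extract(Task):
    def run(self):
        # stand-in for reading from a real source (file, DB, API)
        self.records = [{"user": "a", "clicks": 3}, {"user": "b", "clicks": 5}]


class Transform(Task):
    def __init__(self, extract):
        super().__init__()
        self.requires = [extract]
        self.extract = extract

    def run(self):
        self.total = sum(r["clicks"] for r in self.extract.records)


extract = Extract()
transform = Transform(extract)
build(transform)
print(transform.total)  # 8
```

Real Luigi adds the parts this sketch omits: targets on disk so completed tasks are skipped on re-runs, parameters, and a scheduler; Airflow covers similar ground with DAGs defined in Python.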
My First 5 Minutes on a Server is a good starting point for getting the basics of security right when setting up a new server.
Celery is an asynchronous task queue/job queue based on distributed message passing. It is focused on real-time operation, but supports scheduling as well.
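Running Celery for real requires a message broker (e.g. Redis or RabbitMQ), so here is a stdlib-only sketch of the underlying idea instead: producers enqueue work, and a background worker consumes it asynchronously. None of these names are Celery’s API; they just illustrate the pattern:

```python
import queue
import threading

# A toy asynchronous task queue: a worker thread pulls (function, args)
# pairs off a queue and records their results. Stdlib only.

tasks = queue.Queue()
results = []

def worker():
    while True:
        func, args = tasks.get()
        if func is None:            # sentinel: shut the worker down
            tasks.task_done()
            break
        results.append(func(*args))
        tasks.task_done()

threading.Thread(target=worker, daemon=True).start()

# enqueue a couple of tasks, then the shutdown sentinel
tasks.put((lambda x, y: x + y, (2, 3)))
tasks.put((str.upper, ("pipeline",)))
tasks.put((None, ()))

tasks.join()                        # block until everything is processed
print(results)  # [5, 'PIPELINE']
```

In actual Celery you would decorate a function with `@app.task` and call `add.delay(2, 3)`; the broker carries the message to a worker process, possibly on another machine, which is what makes it distributed rather than just threaded.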
A few supplemental links and useful resources: