Run GitHub Actions faster with cache for pipenv and docker build
Recently we have been creating more PRs and noticed a lot of redundant steps (env setup before triggering checks, etc.). Found out you can…
Jun 29, 2022
Use pyspark locally with docker
For data that doesn’t fit into memory, Spark is often a recommended solution, since it can use map-reduce to work with data in a…
Jun 29, 2022
Use SSH key during docker build without embedding the key via ssh-agent
Imagine working in a company, and they have a super cool internal module! The module works great, except that it is a private module, which…
Jun 27, 2022
What SQL can’t do for data engineering
I often hear people ask “if you can do data engineering with SQL, then what’s the point of learning spark or python?”
Jun 27, 2022
Secrets Management with SOPS, AWS SSM and Terraform
At Baania, we use SOPS to check encrypted secrets into git repos. This solves the problem of plaintext credentials in version control. However, say…
Jun 27, 2022