17 Things I Wish I’d Done Earlier as a Data Engineer
I don’t like to dwell on the past, and a lot of the time it is essential to make mistakes yourself. And yet, I find myself recommending many practices to new data engineers.
Because I felt the pain personally, and it wasn’t fun. Early in my job, I spent two months writing a testing framework for your good old ETL tool, Pentaho Data Integration. A colleague even built a “diffing tool” because the data pipelines inside PDI don’t version well.
All of it, only to realize two years later (a long time!) that we could’ve switched to a tool with good support for versioning and tests.
I'm not saying I regret it, but I think these … might benefit you…
- I was lucky enough to come into the job already able to write some Python. Or so I thought. Boy, was I wrong; I didn’t know anything about programming. Learn to program early on, including testing and all that not-fun stuff.
- Write tests for your data pipelines. The story above isn’t about lost time; it’s really about how much better our life as a team got once we introduced tests into our monster ETL monolith. It’s as simple as this: if you have a test, you know what you broke.
- And yes, if your ETL or EL or whatever data pipeline tool does not support testing or versioning, strongly…
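
To make the testing advice above a bit more concrete, here is a minimal sketch of what a test for one pipeline step could look like in Python. Everything in it is made up for illustration: `normalize_orders`, the `order_id` and `amount` fields, and the cleaning rules are assumptions, not anything from the PDI pipelines in the story. The point is only the shape: a small, pure transform function plus a test that pins down its behavior.

```python
from decimal import Decimal


def normalize_orders(rows):
    """Hypothetical transform step: drop rows without an order_id,
    trim the id, and convert the amount string to integer cents."""
    cleaned = []
    for row in rows:
        if not row.get("order_id"):
            continue  # skip records we cannot identify
        cleaned.append({
            "order_id": str(row["order_id"]).strip(),
            # Decimal avoids float rounding surprises on money values
            "amount_cents": int(Decimal(row["amount"]) * 100),
        })
    return cleaned


def test_normalize_orders_drops_rows_without_id():
    rows = [
        {"order_id": None, "amount": "5.00"},     # should be dropped
        {"order_id": " A1 ", "amount": "19.99"},  # should be cleaned
    ]
    assert normalize_orders(rows) == [{"order_id": "A1", "amount_cents": 1999}]


def test_normalize_orders_handles_empty_input():
    assert normalize_orders([]) == []
```

A test runner such as pytest would pick up the `test_` functions automatically; calling them by hand works too. If someone later changes how amounts are parsed, the first test fails and tells you exactly what broke.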