17 Things I Wish I’d Done Earlier as a Data Engineer

Sven Balnojan
Published in Geek Culture
4 min read · Sep 1, 2023


The author — in front of his laptop doing some heavy data engineering. Photo by Catherine Heath on Unsplash

I don’t like to dwell on the past, and a lot of the time it is essential to make mistakes yourself. And yet, I find myself recommending many practices to new data engineers.

I recommend them because I felt the pain personally, and it wasn’t fun. Early in my job, I spent two months writing a testing framework for that good old ETL tool, Pentaho Data Integration (PDI). A colleague even built a “diffing tool” because the data pipelines inside PDI don’t version well.

We did all of that, only to realize two years later (a long time!) that we could have switched to a tool with good support for versioning and tests.

I'm not saying I regret it, but I think these … might benefit you…

- I was lucky enough to come into the job already able to write some Python. Or so I thought. Boy, was I wrong; I didn’t know anything about programming. Learn to program early on, including testing and all that not-fun stuff.

- Write tests for your data pipelines. The story above isn’t about lost time; it’s really about how much better our life as a team got once we introduced tests into our monster ETL-monolith. It’s as simple as this: If you have a test, you know what you broke.

- And yes, if your ETL or EL or whatever data pipeline tool does not support testing or versioning, strongly…
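
To make the point about pipeline tests concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the `deduplicate_orders` step, the column names, and the sample rows are hypothetical and do not come from the PDI pipelines described above. The idea is only that a small transformation, written as a function, can be pinned down by a test that tells you what you broke.

```python
def deduplicate_orders(rows):
    """Hypothetical transform step: keep the most recent record per order_id."""
    latest = {}
    for row in rows:
        current = latest.get(row["order_id"])
        # ISO-formatted date strings compare correctly as plain strings.
        if current is None or row["updated_at"] > current["updated_at"]:
            latest[row["order_id"]] = row
    # Sort so the output order is deterministic and easy to assert on.
    return sorted(latest.values(), key=lambda r: r["order_id"])


def test_deduplicate_orders_keeps_latest_record():
    # Illustrative sample data, not real pipeline output.
    rows = [
        {"order_id": 1, "updated_at": "2023-01-01", "status": "open"},
        {"order_id": 1, "updated_at": "2023-01-03", "status": "shipped"},
        {"order_id": 2, "updated_at": "2023-01-02", "status": "open"},
    ]
    result = deduplicate_orders(rows)
    assert [r["order_id"] for r in result] == [1, 2]
    assert result[0]["status"] == "shipped"
```

Saved in a file whose name starts with `test_`, this can be run with a test runner such as pytest, which collects functions named `test_*`. If someone later changes the comparison and the step starts keeping the oldest record instead, the test fails and points at the exact behavior that broke.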


Sven Balnojan

Head of Marketing @ Arch | Data PM | “Data Mesh in Action” | Join my free data newsletters at http://thdpth.com/ and https://svenbalnojan.gumroad.com/l/oivjd