The rise and future of data engineering — what’s it all about?

I still remember when, not that many years ago, it was common to refer to a data scientist as a “unicorn”. The expectation was a super full-stack engineer and mathematician who could also understand every business problem. Over the past two years, however, as we’ve passed the peak of the AI/ML hype, we’ve witnessed the rapid rise of the data engineer. Dice’s 2020 Tech Job Report cites data engineering as the fastest-growing job in tech in 2020, growing by a staggering 50%.

Data Engineer is the fastest-growing job in tech (Source: Dice 2020 Tech Job Report)

The rise and evolution of the data engineer

Today, cloud data warehouses (Snowflake, Amazon Redshift and Google BigQuery) and lakehouses (Databricks) make it possible to store massive amounts of data in a way that is useful, is not completely cost-prohibitive, and doesn’t require an army of highly technical people to maintain. In other words, after all these years, it is finally possible to store and process Big Data.

Everything is trending towards a bright future for data engineering

The next evolution of the data engineer

But what is data engineering today?

Data engineering today = data pipelines?

Data engineers come in different shapes and colors

Data Science Hierarchy of Needs (Source: Monica Rogati)

A real-world example from a fast-growing scaleup with a modern data team

A high-level view of how the three roles have similarities and differences in their focus (image source Oda)

Being a data engineer = shiny new tools?

Example Data Stacks (simplified) we often see at Validio

Does data engineering exist for the sake of data science?

Image source Marijn Markus

Final thoughts

I should note that I’m not a data engineer myself. This post and its observations are based on numerous discussions I’ve had with data teams, from fast-growing startups and scaleups to large publicly traded companies.



