The role of data engineer in a modern (wannabe) data-driven company

Korina Kuhar
Published in Nexi eCom
6 min read · Aug 10, 2023
Working with data, Craiyon. (2023). AI-Generated Image: Data Engineer. Retrieved from [craiyon.com]

Data driven. Data engineering. It almost feels like the topic needs no introduction at all. These are buzzwords we hear so often, not only within the tech community but also in everyday work. Having a data team seems to be the new norm, much like having an HR or IT department. In fact, there is hardly a company left that has not established a data team, or is not in the process of doing so, with the goal to “fix their data problems”, or “help the company become data driven”, or… you get the point.

So, who works in those data teams? Well, data engineers, of course. And with that we have yet another buzzword: data engineers, advertised as one of the most sought-after professions today, with demand still on the rise. With demand this high, one must wonder about the supply. Is there a recipe for becoming a data engineer? What specific challenges do they address and resolve?

Even though this could be a post about the data tools and tech stack used by data engineers, I choose not to discuss the technology, as it changes so fast that we may have new tools available before I finish writing this post (well, almost!).

As everyone seems to be determined to recruit these elusive and highly skilled (not my words) people, I would like to shift the discussion to the role itself: its current shape, how it is evolving, and why it is so difficult to increase the pool of qualified candidates.

The why

Let us take a step back a few years, to when everyone was talking about data science and machine learning, at the beginning of what we now dare to call a data revolution (or at least a large-scale data movement). Among other things, the movement exposed a major flaw: companies did not have their data in order. Of course, there were different scenarios, and some companies were in a better place than others (think traditional data warehouses). The need for data is not new; what is new is the need for large-scale data reliability. Suddenly, it was not enough to extract data from a database to Excel and create a report. The new requirement was: all the data, all the time, highly reliable and... don’t bankrupt the company while doing so. In other words: big data.

But how does one get to that higher level in data evolution? One part is the new technology that enables it; the other is the “data experts”, the ones whose job is to supply the business with fresh, correct, and reliable data. What do we call those experts? Data... engineers?

The data evolution gave rise to a new profession in order to fill a void. Because the evolution was fast and many companies picked up on it quickly, there was no time to set up a corresponding degree to supply us with these highly desirable experts. Instead, the initial data teams were a beautiful blend of “converted” enterprise data warehouse engineers and some newcomers, fresh entrants to the job market ready to put their IT/maths/physics background to use and give it a go in this new field.

One can say we gathered a group of individuals capable of recognizing and addressing the specific problem, either through possessing the required skills or being able to swiftly acquire them.

While the scarcity of qualified candidates can be the consequence of the lack of formal education, it is not the sole reason. Data engineering is still in its early stages as a profession, and due to its rapid evolution and elusive nature, it is not as enticing as some other IT specializations.

A higher level of clarity regarding the nature of the job and the required qualifications is needed to help us attract prospective data engineers.

Now, there is a common misconception that data engineering is the equivalent of using the latest languages, platforms and techniques: Python, SQL, Databricks, Snowflake, dbt, Airflow, Fivetran, Airbyte … or any others. I could also bore you trying to explain how it’s all about building pipelines, scalability, setting up data platforms, etc. None of the above is wrong, but it answers the wrong question: “the how” instead of “the what”.

I like the quote from Oliver Molander: “Data engineering is not, and never has been, about any particular technology.”

Allow me to try and explain what it IS about.

The job

Data detective, DALL-E2. (OpenAI © 2015–2023). AI-Generated Image, Retrieved from [OpenAI]

“We are building a data team!” What we really mean is, “let us hire some data engineers, lock them up in the tower, and in a year we will be data-driven”. Slightly exaggerated, but not far from the truth. Also, famous last words (LOL): if you have one (data) problem now, you will have two in a year!

Although data engineers spend a significant amount of time programming, building data capabilities is not a lonely programming task but a detective task. Yes, that’s correct: data engineers are simply detectives who code. It is about the people, not the technology. Data engineers must understand how the business works, which data is stored and processed and where, and which problem they are helping to solve. In addition, extensive knowledge of the IT infrastructure is needed to implement a solution.

Think about it in the following way: when I was working in a bank, we were taught that we were all ‘risk officers’, because we all managed risk in our daily work. I dare to say the same for data: we are all data officers. Every single employee who uses, processes, or creates data is a data officer. They have the knowledge and context to turn data into usable information. Do you see why it is a detective game? Someone needs to gather all the ‘clues’, connect them together, and only then will the story (the data) make sense. Get it wrong, and the story is wrong, the data product is wrong, and your conclusions will be wrong.

That is why we talk about so-called data products: data solutions designed by data engineers to help solve business challenges. Before designing a data product, the data ‘detectives’ need to investigate how the business works, and why and when certain data is used.

The future

Lately there has been a rise in teasers suggesting the “downfall of the data engineers”, even more so since the introduction of AI-powered chatbots, which seem to “have the answer to everything”. The way I see it, we are looking at the next evolution of the role rather than its downfall. As the field evolves, new roles will start branching out from data engineering, such as the “analytics engineer” (coined by the dbt Labs team). Another observation is the rising popularity of user-friendly, graphical tools that enable business analysts to extract insights from data.

Does this mean we will soon witness the end of an era? Not at all. As long as organizations keep having an appetite for new data products, there will be a place for data engineers. However, we might witness a clearer separation of responsibilities between business analysts and engineers and, eventually, as the processes become more streamlined, less detective work and more implementation. As far as the new technologies go, data engineers are not threatened by them; in fact, we embrace them! Try thinking of it this way: equipping your kitchen with modern new appliances will not make you a chef. However, giving a cool new kitchen to an experienced chef will help them cook faster, more efficiently and for more people at the same time. Just as the new kitchen will not replace the core skill of cooking, and the invention of the microwave oven did not result in the downfall of chefs, the new data tools will not replace the need for skilled data engineers.

Confused chef and a microwave oven, DALL-E2. (OpenAI © 2015–2023). AI-Generated Image, Retrieved from [OpenAI]
