Does Data Engineering Generate Business Value?

Quick answer: NO. But I have to explain that.

Neylson Crepalde
MLOps.community
5 min read · Jan 27, 2022


If you are somewhat familiar with data projects, you have probably noticed that today we broadly have three main roles on data teams:

  • A Data Engineer, responsible for data acquisition, processing, and availability;
  • A Data Analyst, responsible for connecting business problems with historical data analysis. They are concerned with understanding "what happened";
  • A Data Scientist, responsible for hypothesis testing and predicting the future.

It is also well known that most of a data team’s time is spent on data cleansing, transforming, and structuring. That is the main reason why data engineers are so sought after. Given the volume of data organizations deal with, the vast number of different sources and structures, and the incredible velocity with which this data is expected to be delivered, data engineering tasks are not trivial at all. They are complex, time-consuming, and demand highly skilled professionals (who, by the way, are hard to find these days).

I have worked with several data teams on various projects in different industries over the past few years. During this time, I have come to recognize some challenges regarding data engineering. One of them is that business users usually have a hard time perceiving the value of data engineering deliverables. What business value does a data pipeline (one of the main deliverables created by data engineers) generate by itself? It is a tricky question.

Some words on the matter

Photo by Tom Hermans on Unsplash

After some research on the topic, two Medium posts caught my attention. The first one is by Quantum Black's team, on the role and importance of data engineering. After discussing four archetypes of data engineering maturity among organizations, they state that

In all of these archetypes, data engineering plays a critical role; it is often the make or break factor in organisations achieving their North Star (highest level) in analytics.

Further on, on the importance of data engineers, they state that

Data engineers can “unlock” data science and analytics in an organisation, as well as build well curated, accessible data foundations.

In the second post, Lewis Gavin describes data engineers as “Data Science enablers”. He argues that 60% to 70% of a data team’s time is non-productive, spent on data cleansing, structuring, ETL processes, and so on. This work should be absorbed by data engineers, specialists in the matter, so that analysts and scientists can focus on their primary tasks. To him,

To get to the moon faster you don’t need more astronauts. You need people to build the rocket that will let the astronaut do their job.

Thus more data engineers are needed than scientists or analysts. On leaving data processing tasks to data engineers, he also argues that

Not only will this improve the efficiency of your Data Science team but their output too. Your data is an asset and should be treated as such. Having an engineer build reliable, scalable and repeatable practices into your data platform is essential for any company looking to use analytics for growth.

Searching for an answer

Photo by LinkedIn Sales Solutions on Unsplash

To understand the business value generated by data engineering and how it is perceived by business users, my team and I interviewed several engineers of different seniority levels, product managers, agile coaches, data project sponsors, and tech executives within and outside our organization. This is what we found:

Data engineers, in general, state that their tasks generate a lot of value. Indeed, they make it possible to access data, to integrate different systems or data sources, and to process data consistently and regularly. Even so, it is common ground that these benefits are hard for business users to perceive, especially because the main interface they use to deal with data is usually a spreadsheet, a dashboard, or something else that engineers did not make.

Usually, when projects have a more technical approach (aiming to reduce data processing time, reduce the number of calculation errors in an analytical dataset, or even optimize cloud resource costs), the value of data engineering is easier to perceive. This also happens when data pipelines are connected to automated decisions, opening the possibility of what Prof. Claudio Lucio calls “actionable insights”. The idea comes from an old term used by Gartner, Business Activity Monitoring (BAM), and it emphasizes that data projects are intrinsically related to decision making, not only to dashboards. In this sense, the data engineering phase offers many opportunities to support better decision making for users. Examples include automated alerts based on a customer's last purchase (simple alerts, not recommendation systems), the most purchased item in the last 6 months, or just a warning that some equipment has behaved strangely. The sky is the limit.
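To make the idea of simple alerts more concrete, here is a minimal Python sketch of two of them: a "customer has not purchased in a while" alert and the most purchased item in the last 6 months. Everything in it is an assumption made for illustration (the `Purchase` record, the function names, the 90-day idle threshold, and the 180-day window); it is not taken from any real pipeline.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Purchase:
    # Hypothetical record shape; a real pipeline would read this from a table.
    customer_id: str
    item: str
    purchased_on: date


def most_purchased_item(purchases, today, window_days=180):
    """Return the most purchased item in the last `window_days`, or None."""
    cutoff = today - timedelta(days=window_days)
    counts = Counter(p.item for p in purchases if p.purchased_on >= cutoff)
    return counts.most_common(1)[0][0] if counts else None


def lapsed_customer_alerts(purchases, today, max_idle_days=90):
    """Return one alert message per customer whose last purchase is too old."""
    last_purchase = {}
    for p in purchases:
        previous = last_purchase.get(p.customer_id)
        if previous is None or p.purchased_on > previous:
            last_purchase[p.customer_id] = p.purchased_on
    return [
        f"Customer {cid} has not purchased in {(today - last).days} days"
        for cid, last in sorted(last_purchase.items())
        if (today - last).days > max_idle_days
    ]


if __name__ == "__main__":
    # Illustrative data only.
    sample = [
        Purchase("c1", "coffee", date(2022, 1, 10)),
        Purchase("c2", "coffee", date(2021, 9, 15)),
        Purchase("c3", "tea", date(2021, 12, 20)),
    ]
    today = date(2022, 1, 27)
    print(most_purchased_item(sample, today))
    for alert in lapsed_customer_alerts(sample, today):
        print(alert)
```

The point of the sketch is that such alerts are plain rules evaluated on data the pipeline already produces, so they could be delivered during the engineering phase without any modeling work.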

Last words

Once, I heard from a tech executive: “If I could, I would extinguish the data engineering job. Why? Data engineers are expensive, really hard to find, their tasks take a lot of time, and the value only comes after”.

Although the statement may sound a bit too radical, it illustrates the point: quality-assured data is the main deliverable provided by data engineers. Data by itself will not fix a problem, improve a process, or help focus attention on a valuable target, but it enables all these possibilities.

There is no data science or data intelligence whatsoever if you don't have trusted data. So,

Data engineering is a necessary condition for generating business value but not a sufficient condition.

In other words, data engineering does not generate business value by itself. Nevertheless, you simply cannot generate data value without engineering.

Prof. Claudio Lucio contributed to this text.

Neylson Crepalde
MLOps.community

Senior Generative AI Strategist @ AWS | PhD | Professor