Behind the Scenes: The Role of Data Engineers in Big Data

Chisom Nnamani
Towards Data Engineering
2 min read · Sep 22, 2023

When we hear about data engineering, we usually hear about big data too, and this comes with the common belief that Data Engineers work with large amounts of data throughout their careers.

In this article, we’ll walk through what Data Engineering really is, what Data Engineers do, and the role they play in big data.

Designed by Author

What is Data Engineering, and what exactly do Data Engineers do?

Before we understand Data Engineering, we need to understand the four steps through which data flows within an organization:

• We collect and ingest data from web traffic, surveys, and media consumption.

• We clean the data by finding missing or duplicate values and converting the data into an organized format.

• We explore it, visualize it, build dashboards to track changes, or compare two sets of data.

• We experiment and predict: for example, we run experiments to evaluate which article title gets the most hits, or we build predictive models to forecast stock prices.
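To make the cleaning step a bit more concrete, here is a minimal sketch in plain Python. The field names (`id`, `page`, `seconds`) and the sample rows are made up for illustration; a real pipeline would more likely use a dedicated library or a database, but the idea is the same: skip rows with missing values, drop duplicates, and convert everything into one organized format.

```python
def clean_records(raw_rows):
    """Drop incomplete and duplicate rows, then normalize the field types."""
    required_fields = ("id", "page", "seconds")  # hypothetical schema
    seen_ids = set()
    cleaned = []
    for row in raw_rows:
        # Missing values: skip rows where an expected field is absent or empty.
        if any(row.get(field) in (None, "") for field in required_fields):
            continue
        row_id = int(row["id"])
        # Duplicates: keep only the first row seen for each id.
        if row_id in seen_ids:
            continue
        seen_ids.add(row_id)
        # Organized format: consistent types and tidy text.
        cleaned.append({
            "id": row_id,
            "page": str(row["page"]).strip().lower(),
            "seconds": float(row["seconds"]),
        })
    return cleaned


raw = [
    {"id": "1", "page": " Home ", "seconds": "12.5"},
    {"id": "1", "page": " Home ", "seconds": "12.5"},  # duplicate row
    {"id": "2", "page": "", "seconds": "3"},           # missing page
    {"id": "3", "page": "Pricing", "seconds": "40"},
]
cleaned_rows = clean_records(raw)
print(cleaned_rows)
```

Only the rows with ids 1 and 3 survive, each with a numeric `id`, a lowercase `page`, and `seconds` as a float.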

Now, Data Engineers are responsible for the first step of the process: ingesting the collected data and storing it. This is a great responsibility, because they lay the groundwork for data analysts, data scientists, and machine learning engineers. If the data is scattered, corrupted, or difficult to access, there’s not much to prepare, explore, or experiment with. 🤷🏽‍♀️

And that’s exactly why companies and enterprises need Data Engineers; their job is to deliver the correct data, in the right form, to the right people, as efficiently as possible.

So when it comes to their responsibilities, Data Engineers:

  • Ingest data from different sources
  • Optimize databases for analysis
  • Remove corrupted data
  • Develop, construct, test, and maintain data architectures
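As a rough illustration of the first and third responsibilities, the sketch below ingests records from two different sources (a CSV export and a JSON payload), drops the corrupted ones, and stores the rest in a single table. The sample data, the `page_views` table, and the function names are all assumptions made for this example; it uses an in-memory SQLite database only so that it runs anywhere.

```python
import csv
import io
import json
import sqlite3

# Two hypothetical sources holding the same kind of record in different formats.
CSV_SOURCE = "id,page\n1,home\n2,pricing\n"
JSON_SOURCE = '[{"id": 3, "page": "blog"}, {"id": 4, "page": null}]'


def ingest(csv_text, json_text):
    """Read both sources and return rows in one common (id, page) shape."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        rows.append((int(record["id"]), record["page"]))
    for record in json.loads(json_text):
        rows.append((int(record["id"]), record["page"]))
    return rows


def store(rows):
    """Drop corrupted rows (no page value) and load the rest into SQLite."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE page_views (id INTEGER PRIMARY KEY, page TEXT NOT NULL)"
    )
    valid_rows = [row for row in rows if row[1]]
    conn.executemany("INSERT INTO page_views (id, page) VALUES (?, ?)", valid_rows)
    conn.commit()
    return conn


conn = store(ingest(CSV_SOURCE, JSON_SOURCE))
stored = conn.execute("SELECT id, page FROM page_views ORDER BY id").fetchall()
print(stored)
```

Once the data sits in one table with a known structure, analysts and data scientists can query it without caring which source each row came from.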

Now, on to Data Engineers and big data.

With the advent of big data, the demand for data engineers has increased.

Big data can be defined as data so large that you have to think about how to deal with its size because it’s difficult to process using traditional data management methods.
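One way to picture "thinking about size" is the difference between loading everything into memory at once and processing records one at a time. The sketch below is a toy illustration of the second approach, with a made-up log of page names standing in for a file that could be far too large to load whole. Real big data systems typically go further and spread the work across many machines, which this example does not attempt.

```python
import io
from collections import Counter


def read_events(lines):
    """Yield one non-empty event at a time instead of building a full list."""
    for line in lines:
        line = line.strip()
        if line:
            yield line


def count_pages(lines):
    """Aggregate as we go, so memory grows with distinct pages, not total rows."""
    counts = Counter()
    for page in read_events(lines):
        counts[page] += 1
    return counts


# A small in-memory stand-in for a very large log file (hypothetical data).
log = io.StringIO("home\npricing\nhome\n\nblog\nhome\n")
page_counts = count_pages(log)
print(page_counts.most_common(1))
```

The same `count_pages` function works unchanged on an open file object, because Python reads files lazily, line by line.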

Common sources of big data include sensor and device data, social media data, enterprise data, and VoIP (voice over IP) data.

Big data is commonly characterized by the five Vs:

• Volume (the quantity of data points)

• Variety (the type and nature of the data: text, image, video, and audio)

• Velocity (how fast the data is generated and processed)

• Veracity (how trustworthy the sources are)

• Value (how actionable the data is)

When working with big data, Data Engineers need to take all five into account to manage large and complex datasets effectively and extract meaningful insights from them.

This video by AltexSoft explains what Data Engineering is in a succinct and fun way, and I am sure you will enjoy it!

Connect with me on LinkedIn and follow me on X (formerly Twitter). 🥳
