Data Engineering and Why It Is Trending Now

Saurabh
FalabellaTechnology
7 min read · Sep 23, 2021

As we all know, data is at the centre of every business today. Data is the fuel that drives companies, and no organization can function without it. Huge amounts of data are generated every second from business transactions, sales figures, customer logs, stakeholders and associated devices. All this data piles up into huge data sets that are referred to as Big Data. Together, this data provides a comprehensive view of the business.

This data needs to be analysed to enhance decision making. However, companies encounter several Big Data challenges along the way. These include data quality, storage, a lack of data professionals, data validation, and accumulating data from different sources. Data analysis is challenging because the data is managed by different technologies and stored in various structures.

Companies use data to answer questions about many different aspects of their business, such as:

a. Identifying customers and building a 360-degree view of them

b. What’s a new customer worth?

c. How can I improve customer experience?

d. What are the fastest-growing product lines?

e. Detecting and remediating security vulnerabilities before incidents occur

Companies of all sizes have huge amounts of disparate data to comb through to answer critical business questions.

What is Data Engineering?

So what is data engineering, and where does it fit in the Big Data world? Here is my answer:

The key to understanding data engineering lies in the “engineering” part. Engineers design and build things. “Data” engineers design and build pipelines that transform and transport data so that, by the time it reaches end users, it is in a highly usable state. These pipelines must take data from many disparate sources and collect it into a single warehouse that represents the data uniformly as a single source of truth.
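To make the pipeline idea concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the two sources (a CSV export and a JSON feed of orders), their field names, and the `orders` table are all hypothetical, and an in-memory SQLite database stands in for the warehouse. A real pipeline would use production connectors and a proper warehouse, but the extract, transform and load steps would look broadly similar.

```python
import csv
import io
import json
import sqlite3

# Hypothetical raw inputs from two disparate sources (illustrative data only).
CSV_SOURCE = "order_id,customer,amount_usd\n1,alice,10.50\n2,bob,20.00\n"
JSON_SOURCE = '[{"id": 3, "customerName": "Carol", "total": "7.25"}]'


def extract_csv(text):
    """Read rows from a CSV export as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def extract_json(text):
    """Read rows from a JSON feed as dictionaries."""
    return json.loads(text)


def transform(csv_rows, json_rows):
    """Map both source schemas onto one uniform (order_id, customer, amount) shape."""
    unified = []
    for row in csv_rows:
        unified.append(
            (int(row["order_id"]), row["customer"].lower(), float(row["amount_usd"]))
        )
    for row in json_rows:
        unified.append(
            (int(row["id"]), row["customerName"].lower(), float(row["total"]))
        )
    return unified


def load(rows, conn):
    """Write the unified rows into a single warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(conn):
    """Run extract, transform and load in sequence; return the unified rows."""
    rows = transform(extract_csv(CSV_SOURCE), extract_json(JSON_SOURCE))
    load(rows, conn)
    return rows


if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")  # stand-in for a real warehouse
    run_pipeline(warehouse)
    for record in warehouse.execute("SELECT * FROM orders ORDER BY order_id"):
        print(record)
```

Whatever the source format, every record ends up in the same table with the same column names and types, which is what lets analysts treat the warehouse as a single source of truth.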

How did Data Engineering Come About?

Many would say that data engineering as a profession has been around for well over a decade, maybe two, ever since databases, Microsoft SQL Server and ETL came to be. Some would say ever since IBM popularised database management systems in the 1970s.
With the rise of the internet in the 1990s and 2000s, “big data” came to be. Yet the DBAs, SQL developers and IT professionals working in the field were not labeled “data engineers” at that time.

So why the new job title?
Let’s summarise by saying that a series of major technological changes drove up the volume, variety, and velocity of big data. Around 2011, the term “data engineer” started to crop up in the circles of new data-driven companies such as Facebook and Airbnb. Sitting on mountains of potentially valuable real-time data, software engineers at these companies needed to develop tools to handle all that data quickly and correctly.

The term “data engineering” evolved to describe a role that moved away from using traditional ETL tools and developed its own tools to handle the increasing volumes of data. As big data grew, “data engineering” came to describe a kind of software engineering that focused deeply on data — data infrastructure, data warehousing, data mining, data modeling, data crunching, and metadata management.

Who is a Data Engineer? The Birth of the Data Engineer

Data engineers make raw data usable and accessible to other data professionals. Organizations hold many kinds of data, and it is the responsibility of data engineers to make it consistent so that data analysts and data scientists can all work from the same data. If data scientists and analysts are pilots, then data engineers are the plane-builders. Without the latter, the former can’t perform their tasks.

Before the Big Data craze, data teams were composed of BI and ETL developers. Typical BI/ETL developer activities involved moving data sets from source to destination (ETL) and building web-hosted dashboards with that data (BI). Specialised technologies existed for each of those activities, with the knowledge concentrated within the IT department. Apart from that, however, BI and ETL development had very little to do with software engineering, a discipline that was maturing rapidly at the beginning of the century.

As data volumes grew and interest in data analytics increased over the past ten years, new technologies were invented. Some of them died and others became widely adopted, which in turn changed the demand for skills and the structure of teams. As modern BI tools allowed analysts and business people to create dashboards with minimal support from IT teams, data engineering became a new discipline, applying software engineering principles to ETL development using a new set of tools. This was the era in which the data engineer was born in the Big Data world.

Why does the world need Data Engineers, and why is it trending now?

“The need for more complex, code-based ETL and changing data modeling drove the demand for data engineering.”

One could almost be forgiven for thinking that data engineering is a relatively new buzzword, spawned at almost the same time as the data scientist. While data engineering is definitely a byproduct of the data science discipline, it is nowhere near as new as the data scientist role.

  1. Data Engineering Is Not A New Phenomenon:
    The first wave of data engineers worked on Apache Hadoop and ran data wrangling jobs at leading tech companies such as Yahoo, Google and Facebook. Comingore cites how, by 2010, big companies had rapidly adopted Hadoop, moving data engineering from niche to mainstream. This led to the rise of the modern-day data engineer in enterprises and also drove a division into two roles: (a) someone who could work on the data processing system (cleaning and organising datasets), and (b) someone who could mine the datasets for patterns and insights.
  2. Companies Turn Data Fabulous, Give Rise To Data Engineering:
    So, what pushed data engineering into prominence today? Besides the mainstream adoption of Hadoop, the resurgence of data engineering can also be attributed to the rise of new-age tech companies such as LinkedIn, Airbnb, Netflix, Spotify and Uber, and in India Flipkart, Ola, InMobi, Paytm and BigBasket, among others, that are at the forefront of developing cutting-edge data-driven products.
  3. The rise of cloud:
    Cloud has finally reached a tipping point where even sectors such as finance and government, which have historically shied away from it, are embracing it. In the last 4 years alone, the market for cloud computing has doubled from ~$114B to ~$236B. Amazon Web Services has led the market over the past several years (currently at 33% market share), but Microsoft Azure (13%) and Google Cloud Platform (6%) are catching up.
  4. The expansion of open source:
    Data engineering used to be dominated by closed-source, proprietary tools. Now we are seeing a growth of open source tools and, in many cases, a preference for these tools in data organizations. Open source libraries such as Spark and TensorFlow have become widespread, and many organisations are seeking to minimise vendor or product lock-in. This was a driving factor in open sourcing QuantumBlack’s very own Python library, Kedro.
  5. The growth of data in scale:
    Companies simply have more data at their disposal now than ever before, which makes it more important for data engineers to understand how to scale. More than 90% of the world’s data was created in the last few years. Data engineers need proficiency in tools that help them quickly organize and assess this massive amount of data.

Today these enterprises are plowing more money into their data processing systems with the aim of uncovering insights from petabytes of data that would give them an edge over their competitors. The role has evolved from simply handling large-scale data processing and preparing data for analysis to adapting new technology to handle both big and streaming data.
In a way, just as a data scientist is crucial for driving business strategy, a data engineer is required for preparing data and making it ready for analysis. In other words, the two roles are interdependent. This explains why a 2016 survey identified data cleaning as the most time-consuming task and argued that companies should free up their data science teams so that they can spend 79% more time on analysis. In short, data engineers are a crucial asset for getting more value out of data.

The market has seen a surge in demand for data scientists in the past several years, and almost all universities and colleges now offer some kind of data science course or program. Data engineers, however, are usually harder to train and source, because a program needs to be very practical and hands-on and there is not much theory to teach. The open source communities are also pushing out new tools and platforms on a regular basis, which makes teaching data engineering challenging because materials need to be updated rapidly to keep up with the latest trends. At Falabella, we have heard many hiring managers and recruiting agencies say that while the demand for data engineers is great, data engineering talent is even harder to find than data science talent.

Conclusion:
According to a recent survey by Stitch Data, 50% of the world’s data engineers reside in the US. India ranks second, with 11.96% of data engineering talent. With the rise of new-age tech companies, data engineering has grown in size and visibility, since enterprises know that the real value of data can only be realised with a robust data infrastructure and architecture. No wonder enterprises today are scrambling to find the best talent in this hot and buzzing field.

If you are interested in joining our data engineering team, please contact me.
