Data Engineering 101

60+ Years of History in Data Processing

Data Engineering Beyond Hype #1

Saikat Dutta
CodeX

Image by WikiImages from Pixabay

This is a light read on the history of data analysis and processing over the years.

Still, a light read can offer a lot of perspective on why and how things were done a certain way, and on how far we have come since we started.

Let’s try to dig in through 60+ years.

The History:

The earliest records of storing or processing data are cave paintings and inscriptions, used to record best practices and dangers.

Computation may have started with the invention of zero and the decimal system.

One of the earliest computational devices was the abacus, whose earliest known forms are usually traced to ancient Mesopotamia.

Napier, Pascal, and Leibniz made significant contributions to data processing and computation.

Then, in 1834, Charles Babbage began designing the mechanical Analytical Engine, a forerunner of the modern computer.

The machine was never completed, largely because of funding problems and the limits of mechanical engineering at the time.

However, the concepts he introduced, such as input, storage, processor, and output, form the basis of modern computer design.

Manual Data Processing

Manual data processing continued for a long time. From 1850 to 1880, US census data was processed manually using tallying.

As the volume of data grew, a punch-card-based tabulating system was introduced for the 1890 census.

Herman Hollerith's Tabulating Machine Company was formed, which later became part of what is now IBM.

Slowly, data processing moved to computation on mechanical and electronic computers.

Electronic Data Processing

With the invention of the first modern computers, such as the Mark I, ENIAC, and UNIVAC, electronic data processing received a huge boost.

Modern Data Processing

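Modern data processing started with the invention of the DBMS in the 1960s, followed by the relational model in 1970 and commercial relational DBMSs that became mainstream in the 1980s.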

SQL was invented in 1974. That completely changed Data Analysis for the better.

In parallel, spreadsheets became popular too, starting with VisiCalc and Lotus 1-2-3, with MS Excel subsequently taking the lead in market share.

Excel became synonymous with any kind of Data Analysis.

Data Warehouse Systems

Data warehouses became popular in the 1980s and were used to ingest data from disparate sources and enable fast analytics.

They came with indexes, schemas, transactions, and so on.

They became extremely popular and are still in use today for analytical workloads.

They are very fast at data analytics and aggregation workloads because they are designed specifically for that use case.
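
To make the idea of an aggregation workload concrete, here is a minimal sketch in Python. It uses the standard-library sqlite3 module purely as a stand-in; a real data warehouse is a different kind of engine. The table names (dim_product, fact_sales), the index, and the sample rows are hypothetical and exist only for illustration.

```python
import sqlite3


def build_demo_warehouse() -> sqlite3.Connection:
    """Create a tiny in-memory fact/dimension schema (hypothetical names)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dim_product (
            product_id INTEGER PRIMARY KEY,
            category   TEXT NOT NULL
        );
        CREATE TABLE fact_sales (
            sale_id    INTEGER PRIMARY KEY,
            product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
            amount     REAL NOT NULL
        );
        -- An index on the join key, as a warehouse schema typically would have.
        CREATE INDEX idx_sales_product ON fact_sales(product_id);
        """
    )
    # Made-up sample data, for illustration only.
    conn.executemany(
        "INSERT INTO dim_product VALUES (?, ?)",
        [(1, "books"), (2, "games")],
    )
    conn.executemany(
        "INSERT INTO fact_sales (product_id, amount) VALUES (?, ?)",
        [(1, 10.0), (1, 15.0), (2, 40.0)],
    )
    conn.commit()
    return conn


def revenue_by_category(conn: sqlite3.Connection) -> list:
    """Join facts to their dimension and aggregate revenue per category."""
    return conn.execute(
        """
        SELECT p.category, SUM(s.amount) AS revenue
        FROM fact_sales AS s
        JOIN dim_product AS p ON p.product_id = s.product_id
        GROUP BY p.category
        ORDER BY p.category
        """
    ).fetchall()


if __name__ == "__main__":
    for category, revenue in revenue_by_category(build_demo_warehouse()):
        print(category, revenue)
```

The shape of the query is the point: data from different sources lands in a fixed schema, and analysts ask for sums and counts grouped by some dimension.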

Data Lake Systems

However, data saw a massive increase in volume and speed with the rise of the Internet and the World Wide Web.

It continued to grow exponentially with social media, sensors, IoT, etc.

As data grew, data warehouses became a constraint.

They were limited in their capacity to store huge volumes of data, had a high cost of processing, and offered little support for machine learning and data science use cases.

In the 2010s, commodity hardware was used to store data using HDFS and other distributed file systems and object stores.

Big Data processing became more common.

Data Lake architecture became popular, with distributed storage and processing built on huge clusters of commodity hardware.

Cloud Data Warehouse

After 2015, cloud data warehouses gained popularity and became the norm.

People came to understand the limitations of data lakes, in particular their lack of ACID guarantees.

Data Analysts also loved the BI systems on top of Data Warehouses.

Hence data warehouses came back, albeit on the cloud, with a more distributed architecture and scalable hardware.

They were still limited when it came to unstructured and very large data sets.

Data Engineering Boom

In the 2020s, new data architectures like Data Fabric, Data Mesh, and Data Lakehouse are becoming more and more popular.

Dedicated links (e.g. Azure Synapse Link) and data virtualization are making it possible to run analytical workloads on top of the source data itself, using a replicated or virtualized copy, with no explicit data copy step.

Data Governance, Data Catalogs, DataOps, and Metadata Activation are the new buzzwords in the data world.

Just as in the 2010s, I think the 2020s will be another decade in which data-related technologies see massive change. Data engineering is already one of the fastest-growing career paths.

I think there are a lot of great possibilities just waiting to be explored by Data Engineers.

If you are not one already, NOW is the best time to become a Data Engineer.

See you again next week.

