Data Engineering

Structured, Semi-structured, and Unstructured Data

A very quick introduction to the concepts of data in the context of a Data Warehouse, Data Lake, and Data Lakehouse

Angelica Lo Duca
Published in syntax-error · 1 min read · Apr 22, 2024

There are three main data types:

  • Structured data,
  • Semi-structured data,
  • Unstructured data.

The following table shows these data types and briefly describes the typical formats, their pros and cons, and some practical examples.

Image by Author
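
To make the distinction concrete, here is a minimal Python sketch that loads one tiny sample of each data type. The sample values and variable names are hypothetical and serve only as an illustration; CSV, JSON, and free text stand in for the three categories.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same fields (hypothetical sample).
csv_text = "id,name,age\n1,Alice,30\n2,Bob,25\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: self-describing keys, but records may differ in shape.
json_text = '[{"id": 1, "name": "Alice", "tags": ["admin"]}, {"id": 2, "name": "Bob"}]'
records = json.loads(json_text)

# Unstructured: free text with no predefined fields.
note = "Alice called on Monday to ask about her invoice."

print(rows[0]["name"])             # column access works for every row
print(records[1].get("tags", []))  # optional field, so a default is needed
print(len(note.split()))           # only generic text operations apply
```

Note how the structured rows can be queried by column name directly, the semi-structured records need a fallback for missing keys, and the unstructured note offers nothing to query without further processing.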

There are different types of storage systems, including:

  • A data warehouse is a central repository that contains only structured data and is used for reporting and analysis.
  • A data lake stores structured, semi-structured, and unstructured data and provides ways to consume or process it. Because it ingests data in raw form, a data lake can hold both historical and real-time data in a single raw storage system.
  • A data lakehouse is a data lake augmented with a transaction layer on top. In practice, a data lakehouse modifies the existing data in the data lake, following data warehouse semantics.
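
The difference between the first two systems can be sketched in a few lines of Python. This is only a toy illustration under loud assumptions: an in-memory SQLite table stands in for a data warehouse, a temporary directory stands in for a data lake, and the table, file names, and values are all hypothetical.

```python
import json
import sqlite3
import tempfile
from pathlib import Path

# Warehouse stand-in (assumption): an in-memory SQLite table with a fixed schema.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount REAL NOT NULL)")
with warehouse:  # the connection commits on success and rolls back on error
    warehouse.execute("INSERT INTO sales (id, amount) VALUES (?, ?)", (1, 19.99))

# A row that does not fit the schema is rejected.
try:
    with warehouse:
        warehouse.execute("INSERT INTO sales (id, amount) VALUES (?, ?)", (2, None))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Lake stand-in (assumption): a directory that accepts raw files of any kind.
with tempfile.TemporaryDirectory() as lake_dir:
    lake = Path(lake_dir)
    (lake / "sales.csv").write_text("id,amount\n1,19.99\n")                    # structured
    (lake / "event.json").write_text(json.dumps({"id": 1, "clicks": [3, 5]}))  # semi-structured
    (lake / "note.txt").write_text("Customer asked about a refund.")           # unstructured
    lake_files = sorted(p.name for p in lake.iterdir())

sales_count = warehouse.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
print(rejected, sales_count, lake_files)
```

Loosely speaking, a data lakehouse aims to bring the kind of transactional guarantee shown in the `with warehouse:` block to the files stored in the lake.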

Thanks for reading, and see you next time!
