Navigating the Data Landscape: Structured, Semi-Structured, and Unstructured Data Storage and Retrieval

Chandrashekar M
Plumbers Of Data Science
Sep 19, 2023

Let’s explore the three types of data (structured, semi-structured, and unstructured): how each is stored, examples of databases and storage systems that handle it, and how it is typically fetched, often through APIs.

1. Structured Data:

  • Description: Structured data is highly organized and follows a predefined schema. It is typically stored in relational databases.
  • Storage: Structured data is stored in tables with rows and columns. Each column has a specific data type (e.g., integer, string, date).
  • Tools: MySQL, PostgreSQL, Oracle Database.
  • Fetching: Structured data is typically fetched using SQL queries, which allow for precise retrieval of data based on predefined criteria.
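
To make the idea of precise, criteria-based retrieval concrete, here is a minimal sketch using Python's built-in `sqlite3` module as a stand-in for a relational database such as MySQL or PostgreSQL. The `customers` table, its columns, and the sample rows are assumptions made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for a relational store.
conn = sqlite3.connect(":memory:")

# The schema is declared up front: every column has a name and a type.
conn.execute(
    "CREATE TABLE customers ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, signup_date TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO customers (id, name, signup_date) VALUES (?, ?, ?)",
    [
        (1, "Asha", "2023-01-15"),
        (2, "Ben", "2023-06-02"),
        (3, "Chen", "2023-09-19"),
    ],
)


def customers_since(connection, cutoff_date):
    """Return (id, name) rows for customers who signed up on or after cutoff_date."""
    cursor = connection.execute(
        "SELECT id, name FROM customers WHERE signup_date >= ? ORDER BY id",
        (cutoff_date,),  # parameterized query; ISO dates compare correctly as text
    )
    return cursor.fetchall()


recent = customers_since(conn, "2023-06-01")
print(recent)
```

The query names exactly the columns and rows it wants, which is possible only because the schema is known in advance.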

2. Semi-Structured Data:

  • Description: Semi-structured data doesn’t adhere to a rigid schema but has some structure. It’s often represented in formats like JSON or XML.
  • Storage: NoSQL databases, document databases, or JSON storage in relational databases are used for semi-structured data.
  • Tools: MongoDB (NoSQL), CouchDB (NoSQL), PostgreSQL (JSONB).
  • Fetching: APIs are commonly used to fetch semi-structured data. For example, RESTful APIs return JSON data, which can be parsed and processed.
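
As a rough illustration of parsing an API response, the sketch below uses Python's standard `json` module on a hard-coded string. The payload, its field names, and the `extract_emails` helper are hypothetical; in practice the string would come from an HTTP call to a real endpoint.

```python
import json

# Hypothetical response body from a REST endpoint (sample data, not a real API).
# Note that the two records do not share the same set of fields.
response_body = """
[
  {"id": 1, "name": "Asha", "contact": {"email": "asha@example.com"}},
  {"id": 2, "name": "Ben", "tags": ["trial"]}
]
"""


def extract_emails(body):
    """Parse a JSON array of records and map each id to its email, or None if absent."""
    records = json.loads(body)
    return {
        record["id"]: record.get("contact", {}).get("email")
        for record in records
    }


emails = extract_emails(response_body)
print(emails)
```

Because the schema is not rigid, the code has to tolerate missing fields, which is why it uses `.get()` with defaults instead of direct key access.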

3. Unstructured Data:

  • Description: Unstructured data lacks a predefined structure or schema. It can include text, images, audio and video.
  • Storage: Unstructured data is usually stored in object stores or distributed file systems that hold binary or text content in its raw format, often with only minimal metadata.
  • Tools: Amazon S3 (for object storage), Hadoop HDFS (for big data storage).
  • Fetching: Unstructured data can be fetched through various methods. For text data, web scraping or text extraction tools are common.
    For media files, direct file access or content delivery networks (CDNs) may be used.
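
For the text extraction case, here is a minimal sketch built on `html.parser` from the Python standard library. The `TextExtractor` class, the `extract_text` helper, and the sample HTML are illustrative assumptions; a real scraper would also have to download the page and respect the site's terms of use.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text from HTML, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html):
    """Return the visible text of an HTML document as one space-joined string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Made-up page used only for demonstration.
sample_html = (
    "<html><head><style>p {color: red;}</style></head>"
    "<body><h1>Report</h1><p>Sales rose in Q3.</p></body></html>"
)
print(extract_text(sample_html))
```

The result is plain text with no schema at all, so any further structure (entities, sentiment, topics) has to be derived downstream.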

Fetching Data:

  • APIs (Application Programming Interfaces): APIs are a common way to fetch data from various sources. RESTful APIs are prevalent for web services, returning data in JSON or XML formats.
  • Web Scraping: Typically used when no API is available, web scraping extracts information directly from web pages. The HTML it retrieves is semi-structured, while the text extracted from it is often unstructured.
  • Data Ingestion: Data pipelines or ETL (extract, transform, load) processes are used to fetch, transform, and load data from various sources into databases.
    Tools like Apache NiFi or Talend are used for this purpose.
  • Direct Access: For unstructured data stored in file systems or object stores, direct access via file paths or URLs may be used.
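
The extract, transform, and load steps of a data ingestion pipeline can be sketched in a few lines of plain Python. This is not how Apache NiFi or Talend work internally; it is only a toy illustration of the pattern, and the `payments` table, the record fields, and the three helper functions are assumptions invented for the example.

```python
import json
import sqlite3

# Hypothetical raw records, as they might arrive from an API or a file.
raw_json = (
    '[{"id": "1", "amount": "19.99", "currency": "usd"},'
    ' {"id": "2", "amount": "5", "currency": "EUR"},'
    ' {"id": "3", "amount": "n/a", "currency": "usd"}]'
)


def extract(body):
    """Extract: parse the raw JSON text into a list of dicts."""
    return json.loads(body)


def transform(records):
    """Transform: cast fields to proper types and drop records with a non-numeric amount."""
    cleaned = []
    for record in records:
        try:
            amount = float(record["amount"])
        except ValueError:
            continue  # skip rows that cannot be parsed
        cleaned.append((int(record["id"]), amount, record["currency"].upper()))
    return cleaned


def load(connection, rows):
    """Load: write the cleaned rows into a relational table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS payments ("
        "id INTEGER PRIMARY KEY, amount REAL, currency TEXT)"
    )
    connection.executemany("INSERT INTO payments VALUES (?, ?, ?)", rows)
    connection.commit()


etl_conn = sqlite3.connect(":memory:")
load(etl_conn, transform(extract(raw_json)))
loaded_rows = etl_conn.execute(
    "SELECT id, amount, currency FROM payments ORDER BY id"
).fetchall()
print(loaded_rows)
```

Semi-structured input goes in, and typed, structured rows come out; the malformed third record is dropped during the transform step.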

The choice of storage and retrieval method depends on the nature of the data and the specific use case.
In modern data architectures, a combination of structured, semi-structured, and unstructured data is often integrated to provide a comprehensive view of an organization’s data landscape.
This diversity of data types and sources is what makes data engineering and management both challenging and exciting.
