Columnar Storage Explained

Emre Aydın
2 min read · Dec 24, 2023


Traditional Row-Based Storage vs Columnar Storage

Traditional Row-Based Storage

  • Description: Data is stored row by row. For a table with columns (A, B, C, D), the values of A, B, C, and D that belong to one row are stored next to each other, followed by the values of the next row.
  • Use Cases: Efficient for transactional operations such as inserting, updating, or deleting rows, because all the data of a specific row can be read or written in one place.

Columnar Storage

  • Description: Data is stored column by column rather than row by row. In the same example table, all values of column A are stored together, then all values of column B, and so on.
  • Use Cases: Particularly efficient for analytical queries that scan many rows but access only a few columns.
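
To make the difference concrete, here is a minimal sketch of both layouts for the (A, B, C, D) table. The sample values and the names `rows`, `row_layout`, and `column_layout` are made up for illustration, and a flat Python list stands in for the bytes on disk.

```python
# Hypothetical table with columns (A, B, C, D) and three rows.
rows = [
    (1, "x", 10.0, True),
    (2, "y", 20.0, False),
    (3, "z", 30.0, True),
]

# Row-based layout: all values of one row sit next to each other.
row_layout = [value for row in rows for value in row]

# Columnar layout: all values of one column sit next to each other.
column_layout = [value for column in zip(*rows) for value in column]

print(row_layout)     # A1, B1, C1, D1, A2, B2, ...
print(column_layout)  # A1, A2, A3, B1, B2, B3, ...
```

Both lists hold the same twelve values. Only the order changes, and that order decides which values end up next to each other on storage.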

Advantages of Columnar Storage

  • Improved I/O Efficiency: A query reads only the columns it references, which reduces the amount of data read from storage.
  • Better Data Compression: The values in a column share one data type and often repeat, so they compress more efficiently than the mixed values of a row.
  • Faster Aggregation and Analytics: Operations such as SUM, COUNT, and AVG on big datasets scan one contiguous column instead of stepping through entire rows.
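
The three advantages can be sketched on a toy table. This is a simplified illustration, not how a real engine works: the table, its column names, and the idea of counting individual values are assumptions, since real systems read pages or blocks and use more elaborate encodings than the run-length encoding shown here.

```python
from itertools import groupby

# Hypothetical sales table: (order_id, country, amount).
rows = [
    (1, "TR", 120),
    (2, "TR", 80),
    (3, "TR", 50),
    (4, "DE", 200),
    (5, "DE", 40),
]

# Columnar layout: one list per column.
columns = {
    "order_id": [r[0] for r in rows],
    "country": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}

# I/O: SUM(amount) on row storage passes over every value of every row,
# while the columnar layout touches only the `amount` column.
values_read_row = sum(len(r) for r in rows)
values_read_columnar = len(columns["amount"])

# Aggregation: the sum runs over a single contiguous list.
total_amount = sum(columns["amount"])

# Compression: run-length encode the repetitive `country` column.
country_rle = [(value, len(list(run))) for value, run in groupby(columns["country"])]

print(values_read_row, values_read_columnar)  # 15 5
print(total_amount)                           # 490
print(country_rle)                            # [('TR', 3), ('DE', 2)]
```

In this sketch the columnar read touches a third of the values, and the five `country` entries shrink to two (value, count) pairs. The gap grows with the number of columns the query does not need.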

Common Use Cases

  • Data Warehousing and Business Intelligence: Handling large volumes of data and complex analytical queries.
  • Big Data Applications: Used in ecosystems such as Apache Hadoop and Apache Spark for processing large datasets.

Columnar storage optimizes reading, aggregating, and analyzing large datasets. It is most useful when queries touch only a subset of a table's columns.
