SSTables and LSM Trees

Published in

The Developer’s Diary

5 min readJan 2, 2024

Exploring Advanced Database Storage: SSTables and LSM Trees

Welcome to Part 2 of our Database Storage series. Previously, we delved into hash indexing and its role in handling key-value data efficiently. However, we also identified key limitations: unsorted data and limited support for range queries. In this segment, we focus on SSTables (Sorted String Tables) and LSM Trees (Log-Structured Merge-Trees), advanced structures designed to overcome these challenges. These technologies not only enhance data organization but also significantly improve query performance. Join us as we explore how SSTables and LSM Trees redefine data retrieval in modern databases.

This article series draws inspiration from the renowned O’Reilly’s book on distributed systems, specifically the insights on storage engines and retrieval mechanisms.

From Book : Designing Distributed Systems: Patterns and Paradigms for Scalable, Reliable Services

SSTables (Sorted String Tables) and LSM Trees (Log-Structured Merge-Trees)

SSTables (Sorted String Tables):

What They Are: Data files storing key-value pairs in sorted order.
Main Benefit: Efficient for range queries due to their sorted nature.
Example: Think of an SSTable as a sorted list in a notebook, where you’ve written down names and phone numbers in alphabetical order for quick lookups.

LSM Trees (Log-Structured Merge-Trees):

What They Are: A data structure that uses layers, including SSTables, to manage data.
Write Optimization: Ideal for write-heavy environments. Data is first written to a fast in-memory layer and later moved to SSTables on disk.
Example: Imagine a secretary (the in-memory layer) taking quick notes (writes) throughout the day. At the end of the day, she organizes these notes into sorted files (SSTables) in a filing cabinet (disk).

Key Difference:

SSTables are like individual sorted files, while LSM Trees are the entire filing system managing these files, along with a process to handle and organize incoming data.
SSTables ensure efficient read operations, especially for ranges, while LSM Trees excel in managing high volumes of write operations, using SSTables as part of their layered structure.

Constructing and managing SSTables or LSM Trees

Creating SSTables or LSM Trees involves a process that efficiently manages data writes and reads, balancing in-memory operations with disk storage. Here’s how it typically works:

In-Memory Data Handling:

New write requests are added to an in-memory balanced tree structure, commonly known as a memtable. This can be implemented using efficient tree data structures like AVL or Red-Black trees.
The memtable maintains the data in sorted order, leveraging the ease of sorting in memory compared to disk

2. Writing to Disk as SSTables:

Once the memtable reaches a certain size, it’s written out to disk as an SSTable. This process benefits from the memtable’s already sorted nature, allowing for efficient creation of the SSTable.
The new SSTable becomes the most recent segment of the database.

3. Handling Concurrent Writes:

While one memtable is being written to disk, new write requests can be directed to a new memtable. This ensures continuous and uninterrupted write operations.

4. Reading Data:

Sparse Indexing: The in-memory index contains entries for a subset of keys, with each entry pointing to an offset within an SSTable. Because data in SSTables is sorted, a sparse index allows us to efficiently narrow down the location of any key.
Rapid Scanning: Since SSTables are sorted and the index is sparse, we can quickly scan a small portion of the SSTable to find the exact key. For example, if our index contains an offset for every 1KB of data, we can rapidly scan this 1KB segment to locate the desired key.
Example: Suppose you’re looking for the key “ci” in an SSTable, but the sparse index doesn’t have an exact offset for “ci”. However, it has offsets for “ce” and “ck”. Knowing that “ci” falls between “ce” and “ck”, the system only needs to scan the SSTable segment between these two points to find “ci”.

5. Merging and Compacting:

Periodically, a background process runs to merge and compact SSTable segments. This involves combining segments, discarding redundant data (like older versions of a key), and keeping only the most recent updates.
This compaction process helps in managing disk space and maintaining efficient read performance.

Performance Optimization

While LSM Trees offer efficient write operations and manageable disk space usage, they do face a challenge in read performance.

Challenge with Non-Existent Keys: A notable issue with LSM Trees is the inefficiency in searching for keys that do not exist in the database. The process involves checking the memtable and then sequentially searching through each SSTable segment, which can be time-consuming.

To optimize this, a concept known as a Bloom filter can be employed. Bloom filters efficiently indicate whether a key is likely to be absent in a segment, thus potentially avoiding unnecessary searches. This can significantly speed up lookups for non-existent keys.

Note: The detailed mechanics of Bloom filters and their integration into LSM Trees will be covered in a separate article.

Compaction Strategies in LSM Trees

Compaction is a crucial process in LSM Trees, involving the merging and reorganizing of SSTable segments. The strategy for compaction significantly impacts the database’s performance. Two common strategies are size-tiered and leveled compaction.

Size-Tiered Compaction:

How it Works: In size-tiered compaction, SSTables are merged based on their sizes. Smaller SSTables are merged first, gradually moving towards larger ones. This approach is efficient in write-heavy environments.
Example: Imagine a stack of papers on a desk, where each paper represents an SSTable. With size-tiered compaction, you’d first combine smaller, similarly-sized stacks of paper before dealing with the larger ones. This method keeps the merging process manageable and efficient.

2. Leveled Compaction (LevelDB Style):

How it Works: Leveled compaction divides SSTables into different levels. Each level has a size limit, and once an SSTable in a lower level reaches a certain size, it’s merged into a higher level. This method is beneficial for read-heavy workloads as it minimizes the number of SSTables to search through.
Example: Consider a filing system with multiple drawers, each representing a level. When a file (SSTable) in a lower drawer gets too full, its contents are organized and moved to the next drawer up. This keeps the number of files in each drawer (level) manageable, making it easier to locate specific documents.

Choosing a Strategy

The choice between size-tiered and leveled compaction depends on the specific workload and performance requirements of the database. Size-tiered is generally better for write-heavy workloads, while leveled compaction suits read-heavy scenarios with frequent read queries.

Limitations and Looking Forward

However, it’s important to recognize that LSM Trees are not without limitations. For instance, the compaction process can be resource-intensive, and read performance can be affected depending on the number of SSTable layers. These aspects, among others, are crucial considerations when designing a database storage system.

In our next part, we’ll delve into another cornerstone of database storage: B-Trees. B-Trees offer a different approach to data organization and retrieval, balancing the needs of read and write operations in a unique way. We’ll explore how B-Trees function, their advantages, and how they compare to LSM Trees, providing a comprehensive understanding of these fundamental database structures.

Stay tuned for our continued journey into the intricacies of database storage systems.