LSM Trees: the Go-To Data Structure for Databases, Search Engines, and More

Ankit Dwivedi
6 min readMay 6, 2023

--

I dive into the fascinating world of LSM Trees and how they revolutionize the way large amounts of data are stored and retrieved. 🌳💡

What is a LSM tree?

Overview of LSM Tree design

LSM tree, short for Log-Structured-Merge Tree, is a clever algorithm design that helps us store massive amounts of data without making us wait forever to write it. It stores data in memory first, which is lightning fast. But since we can’t keep everything in memory, the LSM Tree periodically flushes data to disk.

But here’s where it gets even cooler! It organizes the data into layers of sorted structures, each with more and more compressed data. The top layer is the fastest to access but the least compressed. As the data piles up, it’s compressed and written into a new layer called an SSTable (Sorted String Table). We can decide how many layers we need and how much compression to apply, based on the type of data we’re dealing with. This trick helps us minimize disk I/O operations and make our data storage super efficient! In this blog we will go through the read and write operations in detail.

How write works in LSM Tree?

Let’s dive deeper into how writing works in an LSM Tree.

When you want to write data to the LSM Tree the first stop is the in-memory layer of the LSM Tree, which is like the top part of the tree and is super fast as well, because it’s stored in memory!

However, keeping all the data in memory indefinitely is not practical, so the LSM tree periodically flushes the data from the in-memory layer to disk.

But here’s the clever part: before the data is flushed to disk and organized into SSTables, it is also written to the Write-Ahead Log (WAL).

The Write-Ahead Log acts as a backup, ensuring the durability of the data. It serves as a log of all the changes made to the database, acting as a safety net in case of system failures or crashes.

Write flow in LSM tree

So, when a write operation occurs,

  • The data is first written to the in-memory layer for speedy performance.
  • Simultaneously, the changes are recorded in the Write-Ahead Log to ensure data integrity.
  • Then, at regular intervals or when the in-memory layer reaches a certain threshold, the data is flushed to disk and organized into SSTables.

SSTables, or Sorted String Tables, are essentially just sorted key-value pairs that are written to disk.

One of the best things about SSTables is that they’re incredibly efficient for searching and reading data. As you accumulate more data, more levels of SSTables are created, with each layer being more compressed than the previous one. The number of levels and amount of compression can vary depending on the use case, but the general idea is to keep compressing the data as you go down the layers.

Now, you might be wondering why the LSM Tree uses this approach.

The answer is simple: it’s great at minimizing disk I/O operations. By writing to disk periodically and in larger batches, you’re reducing the number of times the disk has to spin up and down to access data.

How is data read from LSM tree?

When you want to read data from an LSM tree, the process begins with a user query. You’re searching for a specific piece of information, and the LSM tree jumps into action to find it for you.

The first place the LSM tree checks is the in-memory layer. Remember, this layer is super fast to access because it’s stored in memory. So, if the data you’re looking for happens to be in this layer, hooray! The LSM tree quickly retrieves it, and you get your desired information without any delay.

But, what if the data you’re searching for is not in the in-memory layer? Well, don’t worry, the LSM tree has a plan for that too! It moves on to the next step, which involves searching the on-disk layer, where the data is stored in what we call SSTables.

Now, this is where the bloom filter comes into play. Picture the bloom filter as a clever little assistant that helps the LSM tree narrow down its search. Before diving into each SSTable, the LSM tree consults the bloom filter to see if the data you’re looking for might exist in a particular SSTable. The bloom filter gives a probabilistic answer — it either says “the data might exist” or “the data definitely doesn’t exist.”

Read flow in LSM tree

If the bloom filter indicates that the data might exist in a specific SSTable, the LSM tree jumps into action again and starts searching that SSTable. It scans the sorted key-value pairs within the SSTable until it either finds the data you’re looking for (yay!) or realizes it’s not there.

On the other hand, if the bloom filter confidently declares that the data definitely doesn’t exist in a particular SSTable, the LSM tree skips that SSTable and moves on to the next one. It’s like the bloom filter acts as a reliable guide, showing the LSM tree which SSTables are worth exploring and which can be skipped, saving time and effort.

And that’s how reading works in an LSM tree! The combination of checking the in-memory layer, leveraging the bloom filter, and searching the on-disk SSTables allows for efficient and speedy retrieval of data.

Diverse range of LSM use cases.

use cases of LSM tree
  1. NoSQL databases One of the primary use cases of LSM trees is in NoSQL databases. These databases are designed to handle large amounts of unstructured or semi-structured data, and the LSM tree architecture aligns perfectly with their requirements. LSM trees offer excellent write performance, which is crucial for managing the high data ingestion rates typically encountered in NoSQL databases. The ability to efficiently handle write-intensive workloads makes LSM trees an ideal choice for storing and managing the vast amounts of data these databases handle.
  2. Time series databases are another area where LSM trees shine. Time series data is characterized by its timestamped nature, where data points are associated with specific time intervals. LSM trees provide efficient storage and retrieval of time series data, thanks to their sorted structure. This allows for quick access to data points based on timestamps, enabling efficient analysis and querying of time-based data.
  3. Search engines, LSM trees play a vital role in facilitating fast and accurate search operations. Search engines need to index and retrieve vast amounts of data quickly to provide relevant search results. LSM trees, with their ability to handle large datasets and provide efficient read operations, are a natural fit for search engine architectures. They allow for speedy retrieval of indexed data, making search queries lightning-fast and providing a seamless user experience.
  4. Log systems LSM trees also find their place in log systems, such as those used for real-time event streaming or log processing. In log systems, data needs to be written in an append-only manner, preserving the order of events. LSM trees excel in this scenario, as they offer efficient write operations by sequentially appending new data to the in-memory layer. The write-ahead log (WAL) ensures durability and recovery in case of system failures, further enhancing the reliability of log systems.

So, whether it’s handling massive amounts of data in NoSQL databases, managing time series data efficiently, powering lightning-fast search engines, or enabling reliable log systems, LSM trees have established themselves as a go-to choice in various domains. Their unique characteristics, such as efficient write performance, scalability, and data compression, make them a valuable asset in today’s data-intensive applications.

In case you have any doubts or suggestions leave them in the comments :)

References

--

--

Ankit Dwivedi

Engineer at Stripe | ex- Google | Building largescale systems