PhoenixDB: Empowering Scalability and Fault Tolerance with Erlang and LevelDB

Romil Choudhary Rc
Published in redbus India Blog
6 min read · Dec 15, 2023

Introduction

A key-value store is a crucial part of many applications, and for many use cases we want a low-cost, fast database that stores data in compressed form. This was our motivation at redBus for using LevelDB. We have built a key-value store on top of it that supports logging through Folsom, has built-in sharding, and automatically cleans up expired data.

What is LevelDB?

LevelDB is an open-source, high-performance key-value storage library developed by Google. It is designed to provide a reliable, efficient, and lightweight solution for storing and retrieving key-value pairs. It stores its data as a set of files inside a single directory and can use Google’s Snappy for compression.

We wanted to scale the features provided by LevelDB. Because a LevelDB database lives in a single directory that only one process can open at a time, a single instance can easily become a bottleneck, so we handle sharding in our application using Erlang.
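The idea of application-level sharding can be sketched in a few lines. This is only an illustration in Python, not the PhoenixDB Erlang code: the class name `ShardedStore` and its methods are hypothetical, and each in-memory dict stands in for what would be a separate LevelDB instance. The shard is chosen by an explicit shard name supplied by the caller.

```python
from collections import defaultdict


class ShardedStore:
    """Toy model: one independent key-value map per shard name."""

    def __init__(self):
        # Assumption: each dict stands in for a separate LevelDB instance.
        self._shards = defaultdict(dict)

    def put(self, shard, key, value):
        # Writes create the shard lazily on first use.
        self._shards[shard][key] = value

    def get(self, shard, key, default=None):
        # Check membership first so a read miss does not create an empty shard.
        if shard not in self._shards:
            return default
        return self._shards[shard].get(key, default)

    def delete(self, shard, key):
        if shard in self._shards:
            self._shards[shard].pop(key, None)

    def shard_names(self):
        return sorted(self._shards)
```

Because every shard is independent, the same key can exist in two shards with different values, and load on one shard does not touch the data of another.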

Key features of LevelDB are:

  1. Sorted Order: LevelDB maintains data in sorted order based on keys. This allows for efficient range queries, where you can retrieve a range of keys in a sorted manner.
  2. Atomic Batch Operations: LevelDB supports atomic batch write operations, allowing you to group multiple write operations into a single atomic batch. This ensures that either all the operations in the batch succeed, or none of them do.
  3. Snapshots: LevelDB provides snapshots for reads. A read issued against a snapshot sees a consistent view of the database as of the moment the snapshot was taken, even while other writes are in progress. This helps in providing a consistent view of the data.
  4. Compaction: LevelDB uses a background process called compaction to efficiently manage and reclaim storage space. It merges overlapping key ranges and discards obsolete data, helping to maintain optimal performance and reduce storage usage.
  5. Write-Ahead Logging: LevelDB uses a write-ahead log to ensure durability. Before any data modification is performed, it is first written to a log on disk. This ensures that modifications are recoverable in case of a crash or unexpected shutdown.
  6. Portability: LevelDB is designed to be portable and can be used on various platforms. It provides APIs for C++ as well as bindings for other programming languages.
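Two of the features above, sorted keys with range queries and all-or-nothing batch writes, can be modelled with a small in-memory sketch. This is a toy written for illustration, not LevelDB and not one of its bindings: the class `SortedKV`, its method names, and the tuple format of batch operations are all assumptions made here.

```python
import bisect


class SortedKV:
    """Toy in-memory model of a sorted key-value store (not LevelDB itself)."""

    def __init__(self):
        self._keys = []  # keys kept in sorted order
        self._data = {}  # key -> value

    def put(self, key, value):
        if key not in self._data:
            bisect.insort(self._keys, key)
        self._data[key] = value

    def delete(self, key):
        if key in self._data:
            del self._data[key]
            self._keys.pop(bisect.bisect_left(self._keys, key))

    def range(self, start, stop):
        """Return (key, value) pairs with start <= key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(k, self._data[k]) for k in self._keys[lo:hi]]

    def write_batch(self, ops):
        """Apply ("put", k, v) / ("delete", k) operations all-or-nothing."""
        # Validate every operation first so a bad one leaves the store untouched.
        for op in ops:
            if op[0] == "put" and len(op) == 3:
                continue
            if op[0] == "delete" and len(op) == 2:
                continue
            raise ValueError(f"bad operation: {op!r}")
        for op in ops:
            if op[0] == "put":
                self.put(op[1], op[2])
            else:
                self.delete(op[1])
```

The sketch only guards a batch against malformed operations. A real storage engine also has to keep a batch atomic across crashes, which is where the write-ahead log comes in.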

LevelDB has been used in various projects, and it served as the foundation for other storage systems, including RocksDB, which began as a fork of LevelDB. It is also the default state database in Hyperledger Fabric and one of the supported backends for Riak.

What is Erlang?

Erlang is a programming language designed for building scalable and fault-tolerant systems, particularly in the context of concurrent and distributed computing. It was developed by Ericsson, a Swedish telecommunications company, in the late 1980s. The language has gained prominence for its unique features that make it well-suited for building robust and highly available systems.

Here are some key features and characteristics of Erlang:

  1. Concurrency-Oriented: Erlang is built around the concept of lightweight, concurrent processes. These processes are independent units of execution with their own memory space, enabling massive concurrency. Erlang’s concurrency model is based on message passing between processes, allowing for communication and coordination.
  2. Fault Tolerance: Erlang places a strong emphasis on fault tolerance. It allows for the isolation of processes, and if one process fails, it does not affect the others. Processes can be monitored, and supervision trees can be built to handle errors gracefully, making Erlang well-suited for building robust and reliable systems.
  3. Hot Code Swapping: One of the unique features of Erlang is the ability to perform hot code swapping, allowing new code to be loaded into a running system without disrupting its operation. This is particularly useful in applications that require high availability and continuous uptime.
  4. Functional Programming: Erlang is a functional programming language, emphasizing immutability, pure functions, and declarative programming. This functional paradigm makes it well-suited for expressing complex concurrent and distributed systems.
  5. Distributed Computing: Erlang is designed for distributed computing from the ground up. It includes facilities for transparent communication between processes running on different nodes, allowing for the creation of distributed and scalable systems.
  6. Telecom Heritage: Erlang was initially developed by Ericsson for telecommunication systems. Its design was influenced by the need for highly reliable, fault-tolerant, and scalable systems in the telecommunications industry.
  7. Pattern Matching: Erlang’s pattern-matching capabilities make it expressive and concise. Pattern matching is used extensively in functions and case statements, contributing to readable and maintainable code.
  8. Garbage Collection: Erlang includes a sophisticated garbage collector that automatically manages memory, making it easier for developers to focus on application logic rather than manual memory management.
  9. Open Source: Erlang is open source and has a permissive license, making it freely available for use and modification.
  10. Ecosystem: Erlang has a rich ecosystem with libraries, frameworks, and tools that support various application domains, including web development, distributed systems, and telecommunications.
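The message-passing style described in point 1 can be approximated outside Erlang. The sketch below uses a Python thread with a queue as its mailbox; it is an analogy only, since Python threads share memory while Erlang processes do not, and the names `counter_process`, `start_counter`, and `read_counter` are invented for this example.

```python
import queue
import threading


def counter_process(mailbox):
    """Own a private counter and change it only in response to messages."""
    count = 0
    while True:
        message = mailbox.get()
        if message[0] == "incr":
            count += message[1]
        elif message[0] == "get":
            message[1].put(count)  # reply on the sender's own queue
        elif message[0] == "stop":
            return


def start_counter():
    """Start the worker and return its mailbox and thread handle."""
    mailbox = queue.Queue()
    thread = threading.Thread(target=counter_process, args=(mailbox,), daemon=True)
    thread.start()
    return mailbox, thread


def read_counter(mailbox):
    """Ask the worker for its current count and wait for the reply."""
    reply = queue.Queue()
    mailbox.put(("get", reply))
    return reply.get(timeout=5)
```

The counter is never touched directly by the caller. All interaction goes through the mailbox, and messages are handled one at a time in the order they arrive.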

Why LevelDB with Erlang?

  1. Key-Value Store: Erlang’s data model is inherently well-suited for key-value stores, and LevelDB is a key-value storage engine. This alignment makes it easier to integrate LevelDB into Erlang applications seamlessly.
  2. Concurrency: Erlang is designed for concurrent and distributed programming, and LevelDB is safe for concurrent use within the process that has the database open. This is crucial in systems where many Erlang processes access the database simultaneously. Erlang’s lightweight processes and message-passing model complement LevelDB’s ability to handle concurrent read and write operations.
  3. Sorted Order and Range Queries: LevelDB maintains data in sorted order based on keys. This is advantageous when you need to perform range queries or retrieve keys in a specific order, aligning with Erlang’s pattern-matching capabilities and functional programming paradigm.
  4. Atomically Batched Writes: LevelDB supports atomic batch write operations, allowing you to group multiple write operations into a single atomic batch. This can be beneficial when you need to perform multiple writes atomically, which is in line with Erlang’s emphasis on fault-tolerance and reliability.
  5. Portability: LevelDB provides C++ APIs, and Erlang can interface with C and C++ libraries, for example through NIFs or ports. This makes it straightforward to integrate LevelDB into Erlang applications, allowing developers to take advantage of LevelDB’s features while benefiting from Erlang’s strengths.
  6. Community and Ecosystem: Both Erlang and LevelDB have active communities and are used in various projects. This means that there may be community support, libraries, and resources available when integrating LevelDB with Erlang.

Initial Setup Steps:

1. sudo apt-get install build-essential
2. sudo apt-get install libssl-dev
3. Download and install Erlang/OTP 24: https://stackoverflow.com/questions/44685813/how-do-i-install-a-specific-version-of-erlang-otp
4. Install snappy using: sudo apt-get install libsnappy-dev
5. Clone the repository: https://github.com/RomilChoudhary1/PhoenixDB

Test the Application Locally

# Ensure the folder /opt/phoenix_datastore/leveldb-data already exists
Run: sudo ./rebar3 as test shell

Building & Release

1) Pull the latest changes into your repository
2) Inside the code repository, run: make deb-package
3) Navigate to the Debian package and run: sudo dpkg -i phoenix-0.1.0_x86_64.deb
4) Run: sudo systemctl daemon-reload
5) Start or restart the app using: sudo service phoenix start (or sudo service phoenix restart)

Important Notes:

1) By default, expired data is cleaned up every hour for each shard; the interval can be changed in the phoenix_cleanup_server file.
2) Only GET, POST and DELETE APIs are exposed, but various other methods are already implemented, so new APIs for those functionalities can be added easily.
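To make the expiry behaviour concrete, here is a minimal sketch of a store with per-key expiry and a periodic sweep. It is written in Python for illustration and is not the logic in phoenix_cleanup_server, which is not shown in this article. The class `ExpiringStore`, its methods, and the injectable `clock` parameter are assumptions made for this example.

```python
import time


class ExpiringStore:
    """Toy key-value store with optional per-key expiry in seconds."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so the sweep can be tested without sleeping
        self._data = {}      # key -> (value, expires_at or None)

    def put(self, key, value, exp=None):
        expires_at = None if exp is None else self._clock() + exp
        self._data[key] = (value, expires_at)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and expires_at <= self._clock():
            return None  # expired, but not yet removed by a sweep
        return value

    def cleanup(self):
        """Remove expired entries and return how many were removed."""
        now = self._clock()
        expired = [
            key
            for key, (_, expires_at) in self._data.items()
            if expires_at is not None and expires_at <= now
        ]
        for key in expired:
            del self._data[key]
        return len(expired)
```

In a running service, `cleanup` would be called on a timer (once an hour per shard with the default above). Reads hide expired keys in the meantime, so the sweep interval affects disk usage rather than correctness.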

API DOCUMENTATION:

1) Add element to store:
curl --location 'localhost:8001/phoenix/shard/k5?exp=50' \
--header 'Content-Type: application/json' \
--data '1234'

2) Get element from store:
curl --location --request GET 'localhost:8001/phoenix/shard/k5' \
--data ''

3) Delete element from store:
curl --location --request DELETE 'localhost:8001/phoenix/shard/k5' \
--data ''

Here, ‘shard’ in the URL is the shard (partition) key and ‘k5’ is the key name. The expiry is set in seconds using the query parameter exp.
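The same three calls can be prepared from Python using only the standard library. This is a hedged sketch: the host, port, path layout, and the exp parameter are taken from the curl examples above, while the helper names `build_url` and `build_request` are hypothetical, and URL-encoding the shard and key is an assumption about what the server accepts.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "http://localhost:8001/phoenix"  # host and port from the curl examples


def build_url(shard, key, exp=None):
    """Build /phoenix/<shard>/<key>, with ?exp=<seconds> when an expiry is given."""
    url = f"{BASE_URL}/{quote(shard, safe='')}/{quote(key, safe='')}"
    if exp is not None:
        url += "?" + urlencode({"exp": exp})
    return url


def build_request(method, shard, key, value=None, exp=None):
    """Prepare (but do not send) a request for the given HTTP method."""
    data = None if value is None else value.encode("utf-8")
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return Request(build_url(shard, key, exp), data=data, headers=headers, method=method)
```

A prepared request can then be sent with `urllib.request.urlopen(build_request("POST", "shard", "k5", "1234", exp=50))`, assuming the service is running locally on port 8001.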

Wrapping up

In conclusion, the integration of LevelDB and Erlang to build a key-value store results in a powerful and flexible system with several notable advantages.

For our use case, where large JSON objects from different vendors had to be stored on disk in compressed format, we recorded latency figures on an m5a.large machine (2 vCPUs, 8 GB memory) with request counts of up to 341K RPM.

We saw that combining Erlang’s strengths in concurrency and fault tolerance with LevelDB’s efficient key-value storage yields a robust and scalable key-value store suitable for a variety of applications, ranging from real-time systems to distributed databases.

Thanks for reading this article, have a great day!
