Unlocking the Power of JunoDB: PayPal’s Key-Value Store Goes Open-Source
Today we are delighted to share JunoDB as an open-source project on Github, allowing others to benefit from our efforts to have an extremely scalable, secure and highly available NoSQL infrastructure.
JunoDB is a distributed key-value store that plays a critical role in powering PayPal’s diverse range of applications. Virtually every core back-end service at PayPal relies on JunoDB, from login to risk to final transaction processing. With JunoDB, applications can efficiently store and cache data for fast access and load reduction on relational databases and other services. However, JunoDB is not just another NoSQL solution. It was built specifically to address the unique needs of PayPal, delivering security, consistency, and high availability with low latency, all while scaling to handle hundreds of thousands of connections. While other NoSQL solutions may perform well in certain use-cases, JunoDB is unmatched when it comes to meeting PayPal’s extreme scale, security, and availability needs. From the ground up, JunoDB was designed to be cost-effective, ensuring that PayPal can maintain its high standards of quality and operational excellence while keeping costs manageable.
JunoDB has been a critical part of PayPal’s infrastructure, evolving through multiple generations to become the highly reliable, consistent, secure, scalable and performant distributed key-value store that it is today. It started as a single-threaded C++ program but has since been rewritten in Golang to be highly concurrent and multi-core friendly. JunoDB has also evolved from an in-memory short TTL (Time To Live) data store to a persistent data store that supports long TTLs, providing improved data security via on disk encryption and TLS in transit by default. JunoDB’s journey has also involved quick scaling out through data redistribution, enabling it to handle the ever-increasing volume of requests. Today, JunoDB with six 9’s of system availability powers almost all PayPal applications, handling 350 billion daily requests.
“JunoDB plays an important role in powering applications and back-end services across PayPal. By making JunoDB available today on GitHub, we hope to enable the industry to leverage our highly scalable, secure and available NoSQL infrastructure.” — Archie Deskus, EVP and Chief Information Officer, PayPal
Common Use Cases
Below are the most prevalent use cases of JunoDB, starting with the simplest and moving on to more complicated ones. Some use cases may be present/implemented in combination over a single user call within PayPal.
- Caching is a well-known and popular pattern, and JunoDB is often used as a temporary cache to store data that doesn’t change often, ranging from a few seconds to a few days. This can include user preferences, account details, API responses, access tokens, and more. Using JunoDB for caching helps reduce calls to expensive databases and downstream services across all domains, making it a versatile solution. JunoDB has been used to effectively replace legacy caching use cases.
- Idempotency is another popular pattern where we use JunoDB as a short TTL, high available store to ensure that an operation is idempotent and remove duplicate processing. As an example, JunoDB is used to ensure that we do not reprocess payments during retries or resend messages from the notification platforms. In the distributed locking variation, JunoDB is used to ensure that only one process is executing a required operation.
- Counters are a newer and smaller set of use cases for JunoDB — as an example, we have used JunoDB to provide a limits type counter when certain resources are unavailable. This enables PayPal to be available and compliant.
- SoR While JunoDB is not considered a Permanent SoR (System of Record), we do use JunoDB for a limited set of long term (multi-year) SoR needs.
- Latency bridging using JunoDB’s quick inter-cluster replication helps address replication delays in the Oracle processing, enabling near-instant, consistent reads everywhere. This “latency bridging technique” ensures high availability, reliability, and read scaling for multi data center active-active processing for account/user creation and payments processing.
JunoDB Architecture: A High-Level Overview
We’ll first provide you with a very high-level overview of the JunoDB architecture, laying the foundation for a deeper understanding of the system. As we progress, we will delve into the various aspects of the design that support scalability, availability, performance, and security, revealing how JunoDB addresses each of these critical concerns.
The JunoDB architecture is a highly reliable and scalable solution designed with simplicity, scalability, security, and adaptability in mind. It is based on a proxy-based design that enables linear horizontal connection scaling and simplifies the development process by removing complexity from the client library and keeping complex logic and configuration out of applications. JunoDB uses consistent hashing to partition data and minimize data movement when clusters are expanded or shrunk. To achieve zero downtime, JunoDB uses within-data center and cross-data center replication, and it ensures data consistency with a quorum-based protocol and two-phase commit.
Security is a top priority, with TLS support and payload encryption protecting data both over the wire and at rest. Finally, the pluggable storage engine design enables easy upgrades to new storage technologies, ensuring that JunoDB remains adaptable and can evolve over time to meet changing needs and requirements.
JunoDB comprises three key components that work together seamlessly:
- JunoDB client library resides in applications and provides an API that allows for easy storage, retrieval, and updating of application data through the JunoDB proxy. The JunoDB thin client library is implemented in several programming languages, such as Java, Golang, C++, Node and Python, making it easy to integrate with applications written in different programming languages.
- JunoDB proxy instances are driven by a load balancer and accept client requests and replication traffic from other sites. Each proxy connects to all JunoDB storage server instances and forwards each request to a group of storage server instances based on the shard mapping maintained in ETCD, the data store which saves JunoDB cluster configurations.
- JunoDB storage server instances accept operation requests from proxy and store data in memory or persistent storage using RocksDB. Each storage server instance is responsible for a set of shards, ensuring smooth and efficient data storage and management.
Over a decade ago, in order to support PayPal’s sustained rapid growth in active customers and payment rates, the company transitioned to a horizontally scaled micro-services architecture at the application layer. However, a drawback of the micro-services architecture is the increased number of persistent inbound connections to key-value stores. Since no commercial or open-source solutions were available to handle the required scale out-of-the-box, we developed our own solution to adopt a horizontal scaling strategy for key-value stores, aligning with the scaling of the application tier. JunoDB addresses two primary scaling needs in distributed key-value stores:
- Accommodating a growing number of client connections to ensure seamless access and system responsiveness.
- Handling data growth and ensuring efficient read and write throughput as data volume and access rates increase.
Scaling for Client Connections
As previously mentioned, a proxy-based architecture has been chosen to facilitate horizontal connection scaling. In this setup, clients are lightweight, eliminating the need to establish connections with all storage nodes. Should client connections ever reach their limits, additional proxies can simply be added. While this method may marginally increase latency, it offers a highly effective solution for scaling out.
Scaling for Data Volume and Access Throughput
As data size grows, it is essential to distribute data across multiple storage nodes or servers to ensure efficient storage and retrieval through data partitioning schemes. JunoDB leverages consistent hashing for effectively partitioning the fixed number of partitions (shards), which are then assigned to physical storage nodes using a shard map. When the number of nodes in the cluster changes due to additions or removals, only a minimal number of shards require reassignment to different storage nodes. Moreover, we have introduced micro-shards within each shard, serving as the building blocks for data redistribution. The total number of shards should be sufficiently large and remain constant throughout the cluster’s lifetime. In production, we typically employ 1,024 shards. The shard map is pre-generated and stored in ETCD. Changes to the shard map will trigger a data redistribution process. Our efficient data redistribution process enables quick incremental scaling of a JunoDB cluster to accommodate traffic growth. Currently, a large JunoDB cluster could comprise over 200 storage nodes, processing over 100 billion requests daily.
Maintaining high availability is crucial for PayPal in upholding its reputation as a reliable and secure payment platform. However, unforeseen system outages can occur due to various reasons such as software bugs, hardware failures (e.g., disk or network component failures), power outages, or even human error. In some cases, these failures can lead to data loss, slow system response times, or complete system unavailability, which can negatively impact users’ experience and trust in the platform. To mitigate these challenges, JunoDB employs a combination of within-data center and cross-data center data replication as well as failover strategies to achieve six 9s of system availability.
Replication within cluster
Within a cluster, JunoDB storage nodes are logically organized into a grid, where each column represents a zone, and each row signifies a storage group. Data are partitioned into shards and assigned to storage groups. Within a storage group, each shard is synchronously replicated across various zones based on the quorum protocol. In a bare metal environment, each zone resides in a separate physical rack, without shared power or switch connections. In cloud environments, zones correspond to multiple availability zones within the same region.
A quorum-based protocol is used to reach consensus on a value within the storage group. To ensure data consistency, it is crucial to adhere to two key rules.
- First, the sum of the read quorum (R) and write quorum (W) must be greater than the number of zones (N): W+R > N. This ensures that the read quorum includes at least one node containing the most recent version of the data.
- Second, the write quorum must be more than half of the number of zones: W > N/2, which prevents two concurrent write operations on the same key. Typically, in production, PayPal utilizes a configuration with 5 zones, a read quorum of 3, and a write quorum of 3.
In the event of a node malfunction, JunoDB’s failover process is both automatic and instantaneous, eliminating the need for leader re-election or data redistribution. Proxies can detect node failure through a lost connection or a timed-out read, typically configured with a read timeout of 100 milliseconds or less. JunoDB can withstand multiple node failures in a cluster, as long as no more than two breakdowns occur within the same row/group. Additionally, this design allows for taking an entire zone offline for maintenance purposes, such as software or OS upgrades, without causing downtime. This robust architecture enables an exceptionally high level of availability, ensuring uninterrupted service even during node failures.
Cross-data center replication between clusters
To further enhance the resilience and reliability of the JunoDB system, cross-data center replication is also implemented by asynchronously replicating data between proxies of each cluster across different data centers. This strategy ensures that even in the event of a catastrophic failure at one data center, the system can continue to operate without downtime. By incorporating cross-data center replication alongside the existing within-cluster replication approach, JunoDB strengthens its high availability and fault tolerance, guaranteeing uninterrupted service and a seamless user experience for PayPal.
Performance at Scale
JunoDB delivers high performance at scale, handling the most demanding workloads while maintaining single-digit millisecond response times and a seamless user experience. Furthermore, JunoDB allows applications to achieve linear scalability without sacrificing performance, providing high throughput and low latencies. This allows applications to focus on addressing business problems without worrying about scalability. The following benchmark results demonstrate JunoDB’s ability to scale with large number of persistent connections and high throughput in a 4-node cluster with 2/3 read and 1/3 write workload.
For more information on the specific testing methodologies and machine configurations used in the benchmark, please refer to the complete benchmark report available in the JunoDB Repository.
As one of the most trusted brands in the world, security is of paramount importance at PayPal. The JunoDB system has been designed to secure data both in transit and at rest. To maintain data security during transmission, TLS is enabled between the client and proxy, as well as between proxies located in different data centers for data replication. Payload encryption is performed either at the client or proxy level (outside the storage server) to prevent multiple encryptions of the same data. Ideally, encryption should occur at the client side; however, if not executed by the client, the proxy will detect this through a metadata flag and carry out the encryption. All data received by the storage server and stored in the storage engine are encrypted to maintain security at rest. A key management module is employed to manage certificates for TLS, sessions, and the distribution of encryption keys as well as to facilitate key rotation.
From a Successful Past to a Bright Future
This post outlines our motivations for developing JunoDB, its business value for PayPal, and its key features. In summary, JunoDB is a distributed key-value store that plays a crucial role in supporting various PayPal applications, providing efficient data storage and caching for fast access and database load reduction. What sets JunoDB apart is that it was purpose-built to meet PayPal’s unique requirements, delivering security, consistency, and high availability with low latency while handling a large number of connections and high throughput. Additionally, JunoDB is cost-effective, allowing PayPal to maintain high standards while managing costs.
Be on the lookout for future articles that will delve deeper into JunoDB’s design and implementation. To gain an understanding of how to utilize JunoDB effectively, we invite you to explore our server setup (manual build, docker build) demo video and the accompanying client-building tutorial video.
Having matured for over a decade within PayPal’s production environment and evolved through multiple revisions and rewrites, JunoDB is now accessible to the broader community as an Apache 2-licensed project. We encourage you to check out our guidelines on contributing and help us further enhance this powerful product.
Looking ahead, we are excited to announce the upcoming features on the horizon, including JunoDB clients in Golang, as well as a JunoDB operator for Kubernetes. Stay tuned for these developments and more.
A special thanks goes out to all the folks who helped developed JunoDB over the years, namely the folks below:
Yaping Shi, Xuetao Li, Kamlakar Singh, Mukundan Narayanan, Varun Sankar, Eric Leu, Vera Cai, Yuan Yu, Joseph Stanislas, Neetish Pathak, Mark Lippincott, Kunal Somani, Dwain Theobald, Lalitha Natraj, Nitish Tripathi, Vivek Reddy Kotha, Nishant Vyas
Our sincere thanks to Vivek Reddy Kotha, John Kanagaraj, Vera Cai, Bala Natarajan, Varun Sankar, Michael Lee, Paul Staats and Mani Iyer for their insights, use cases, performance benchmarks, feedback, and comments.