ScalarDBに関する論文がPVLDB (VDLB’23)にアクセプトされました

Hiroyuki Yamada
Scalar Engineering (JA)
4 min readOct 10, 2023

“ScalarDB: Universal Transaction Manager for Polystores” という論文がデータ工学およびデータベース分野のトップジャーナル(国際会議)であるPVLDB (VLDB’23) にアクセプトされました。この論文では、トランザクションマネージャ製品であるScalarDBにおいて、既存の研究をどのように拡張し、どのような技術的選択をし、現実世界のプロダクトとして作り上げる上での種々の課題をどう解決しかたについて論述しています。このエントリでは論文のAbstractとIntroductionを紹介します。

Abstract

This paper presents ScalarDB, a universal transaction manager that achieves distributed transactions across multiple disparate databases. ScalarDB provides a database-agnostic transaction manager on top of its database abstraction; thus, it achieves transactions spanning various databases without depending on the transactional capability of underlying databases. ScalarDB is based on several research works and extended to provide a strong correctness guarantee (i.e., strict serializability), further performance optimizations, and several critical mechanisms for productization. In this paper, we describe the design and implementation of ScalarDB. We also present evaluation results showing that ScalarDB achieves database-spanning transactions with reasonable performance and near-linear scalability without sacrificing correctness. Finally, we share some case studies and lessons learned while building and running ScalarDB.

One size does not fit all is becoming common sense in data management systems. Major cloud vendors offer several purpose-built database products to meet various users’ needs. For instance, Amazon AWS offers more than 10 database products, such as Aurora (relational), DynamoDB (key-value), Neptune (graph), and QLDB (ledger). Unsurprisingly, there are many cases where an application uses multiple databases to provide its services.

A microservice architecture accelerates the trend of managing multiple (potentially disparate) databases. Each microservice of a single application is encouraged to use an isolated database, which is selected based on the service’s use cases and the developers’ experiences for better maintainability and productivity. This architectural style is likely to make an application have different kinds of databases or multiple database instances of the same database.

Managing multiple disparate databases is not uncommon in enterprise systems as well. An enterprise comprises several organizations, departments, and business units to support agile business operations. This leads to siloed information systems; different organizations manage different applications at disparate locations, and the applications use different databases.

Obviously, a federation of multiple disparate databases is attractive for such applications to mitigate the complexity of dealing with the databases separately. Federated database systems and multi-database systems have been explored since around the 1990s to address the goal. Due to the increasing deployments of multiple disparate databases, there are also new demands, such as supporting various query notations, providing all the functionalities of underlying databases, and providing distributed transactions across multiple databases that do not support the same transaction model. The database community recently named such new federated database systems polystores to distinguish them from previous federated database systems.

In this paper, we present a universal transaction manager for polystores called ScalarDB, which achieves distributed transactions across multiple disparate databases. Specifically, ScalarDB provides a database-agnostic transaction manager on top of its database abstraction; thus, it achieves transactions spanning multiple disparate databases without depending on the transactional capability of underlying databases. ScalarDB is based on several research works and extended to provide a strong correctness guarantee (i.e., strict serializability), further performance optimizations, and several critical mechanisms for productization.

ScalarDB has been built to meet several key design goals: database agnosticism as a primary goal, strong correctness, reasonable performance, high scalability, and high availability. Specifically, ScalarDB provides a database-agnostic property and can run transactions not only on major relational database systems such as MySQL, MariaDB, PostgreSQL, Oracle Database, and Microsoft SQL Server but also NoSQL databases such as Apache Cassandra, Amazon DynamoDB, and Azure Cosmos DB while addressing the other design goals. Therefore, for example, it achieves scalable, strict serializable ACID transactions spanning PostgreSQL and Amazon DynamoDB, which cannot be easily realized with existing solutions.

Existing off-the-shelf solutions aiming to achieve distributed transactions over multiple disparate databases do not fully match our design goals. Oracle Tuxedo, Atomikos, and Seata (XA mode) are middleware that manage distributed transactions over multiple databases based on X/Open XA, which is a standard specification for allowing multiple independent resources to participate in a single and distributed transaction by using the two-phase commit (2PC) protocol. Although the two-phase commit protocol based on XA can work on XA-compliant databases, such as major relational databases, it cannot run transactions on other databases, such as NoSQL databases and recent distributed SQL engines that are not XA compliant. Making databases XA compliant is not necessarily straightforward or impractical due to the rigid XA specification. Several frameworks help users to implement other approaches, such as Try-Confirm/Cancel (TCC) and Saga, to run distributed transactions over multiple databases in a non-strict and lightweight way. These approaches could work on a wider range of databases. However, they only guarantee eventual consistency and weaker isolation than serializable because they realize an application-level transaction by using multiple database-level transactions; the applications must deal with transaction anomalies by themselves.

ScalarDB is a production-grade system that has been used for real-world applications. It is also cloud-agnostic and designed for cloud-native applications. ScalarDB is provided as a Docker container and can easily be deployed to various environments such as Kubernetes. The source code for the core components of ScalarDB is available on GitHub under the Apache 2.0 License.

This paper makes the following contributions:

  • We describe the design and implementation of ScalarDB, a universal transaction manager that achieves database-agnostic and database-spanning transactions. Specifically, we describe how ScalarDB has incorporated and extended previous research efforts to build a practical and cloud-native product.
  • We present evaluation results showing that ScalarDB achieves database-spanning transactions with reasonable performance and near-linear scalability. We also show ScalarDB’s database-agnostic property by evaluating ScalarDB on several database systems.
  • We share a couple of case studies of ScalarDB. We also share lessons learned not only from the experiences of building ScalarDB over the last five years but also from our production experiences.

Summary

このエントリでは、PVLDB (VLDB’23) にアクセプトされたScalarDB論文のAbstractとIntroductionを紹介しました。ご興味がある方はぜひ論文を読んでみてください。

--

--

Hiroyuki Yamada
Scalar Engineering (JA)

CTO of Scalar, Inc. Passionate about parallel and distributed data management systems.