Distributed transactions spanning multiple databases with Scalar DB (Part 1)

Toshihiro Suzuki
Scalar Engineering
Published in
4 min readMay 23, 2022

--

As we explained in Scalar: A Reliable Data Management Company, Scalar DB addresses a transaction consistency challenge that arises when multiple separated databases need to interact with each other. This article explains how Scalar DB addresses the challenge.

We provide the following two features for distributed transactions spanning multiple databases in Scalar DB:

  • Multi-storage Transactions
  • Two-phase commit Transactions

In this article, we first discuss Multi-storage Transactions (Part 1). And we are going to explain Two-phase commit Transactions in the following article (Part 2).

Overview of Multi-storage Transactions

Multi-storage Transactions is a feature that enables transactions to span multiple databases/storages for a single service/application.

As we explained in Scalar DB: Universal transaction manager, Scalar DB uses a client-coordinated transaction protocol called “Consensus Commit”. Therefore, it can execute ACID transactions without depending on the local transaction capabilities of individual databases/storages. So, with Scalar DB, we can not only execute ACID transactions on non-ACID distributed databases/storages, but also execute ACID transactions that span multiple databases/storages.

In addition, Scalar DB provides a database abstraction and a database-agnostic ACID transaction manager on top of the abstraction. Therefore, the transaction manager does not even know what database systems it interacts with. With this, we can naturally execute transactions across heterogeneous databases/storages in Multi-storage Transactions.

The protocol’s details in Multi-storage Transactions is the same as described in the article. We just apply the protocol to the records in multiple databases/storages in Multi-storage Transactions.

XA is another approach that enables transactions to span multiple databases. Although many relational databases support XA, NoSQLs such as Cassandra, Amazon DynamoDB, and Azure Cosmos DB don’t support it. By contrast, Scalar DB’s transactions can span various kinds of databases because of the database-agnostic approach. Scalar DB currently supports Cassandra, Amazon DynamoDB, Azure Cosmos DB, and JDBC databases (MySQL, PostgreSQL, Oracle Database, Microsoft SQL Server, Amazon Aurora).

Sample Application

We will explain how to use Multi-storage Transactions, using the same EC application we created in the previous article. Since this article assumes that you have read the previous article, we recommend that you read the article before reading this one.

You can get all the source code and configuration files on the following page. Please take a look if you want to run the application.

https://github.com/scalar-labs/scalardb-samples/tree/main/multi-storage-transaction-sample

Schema

The schema for the sample application is as follows:

ER diagram for the schema

The schema is almost the same as in the previous article. The difference is that the customers table is created in the customer namespace, and the orders table, the statements table, and the items table are created in the order namespace.

Transactions

The transactions are also the same as the ones in the previous article:

  1. Getting customer information
  2. Placing an order. An order is paid by a credit card. It first checks if the amount of the money of the order exceeds the credit limit. If the check passes, it records order histories and updates the credit_total
  3. Repayment. It reduces the amount of credit_total

Getting Started with Multi-storage Transactions

This section explains the steps to create the sample application.

Step1. Create a configuration file

In Multi-storage Transactions, you need to configure storage settings for each storage and which table operations are mapped to which storage.

database.properties

This configuration defines two storages, cassandra and mysql, in the scalar.db.multi_storage.storages property. And the storage settings of each of them is configured in the scalar.db.multi_storage.storages.cassandra.* properties and the scalar.db.multi_storage.storages.mysql.* properties respectively. The scalar.db.multi_storage.namespace_mapping property defines the mapping between namespaces and storages, in this case, operations for the tables in the customer namespace are mapped to the mysql storage, and operations for the tables in the order namespace are mapped to the cassandra storage. Note that it also defines that operations for the tables in the coordinator namespace are mapped to the cassandra storage. The tables in the coordinator namespace are created automatically and used in Consensus Commit. And the scalar.db.multi_storage.default_storage property defines the default storage that is used if a specified table doesn’t have any table mapping. In this case, if a specified table doesn’t have any table mapping, operations for the table are mapped to the cassandra storage.

Please see below for the details of Multi-storage Transaction configurations: https://github.com/scalar-labs/scalardb/blob/master/docs/multi-storage-transactions.md

Step2. Create the schema

The way to create the schema is the same as in the previous article: using a tool called Schema Loader.

The schema file is as follows. It’s also almost the same as the one in the previous article except for the namespace names.

schema.json

Then, you create the schema with the following command:

# java -jar scalardb-schema-loader-3.6.0.jar — config database.properties — schema-file schema.json — coordinator — replication-factor 1

This command creates the customer.customers table in MySQL, and the order.orders, order.statements, and order.items tables in Cassandra.

Step3. Implement the transactions

The transaction implementation will be almost the same (except for the namespace names) as in the previous article, so we will skip the explanation. If you would like to know more details, please read the previous article or see the following:

https://github.com/scalar-labs/scalardb-samples/tree/main/multi-storage-transaction-sample

Summary

This article explained distributed transactions spanning multiple databases with Scalar DB: it gave an overview of Multi-storage Transactions and its usage with the sample applications.

In the next article, we would like to explain Two-phase commit Transactions, the other feature of distributed transactions spanning multiple databases in Scalar DB.

--

--