ScalarDB Benchmark with TPC-C

Published in

Scalar Engineering

13 min readDec 6, 2022

This blog post introduces how to benchmark the performance of ScalarDB using TPC-C, which is an industry-standard benchmark for online transaction processing (OLTP). (Japanese version is here.)

The benchmark tool used here is implemented as a ScalarDB application program, which generates TPC-C workloads based on the specification. This post also introduces how real-world complicated workloads and applications like TPC-C can be implemented using ScalarDB’s data model and interfaces and what are the key design considerations. So, we hope that those considerations will be helpful for you when developing ScalarDB applications.

Overview of TPC-C

TPC-C is an industry-standard benchmark for OLTP databases. TPC-C models a warehouse-centric order processing application, and the database used in the TPC-C benchmark consists of nine tables, such as Warehouse, Customer, Order, Item, and so on (See the below ER diagram cited from TPC-C Standard Specification Revision 5.11). Except for the item table, each record is populated on a per-warehouse basis, and the number of warehouses can be configurable as a scale factor.

TPC-C has five transaction types: NewOrder, Payment, OrderStatus, Delivery, and StockLevel. The request rate of each transaction is defined in the specification, and almost 90% of transactions are NewOrder and Payment, which are write-intensive. TPC-C transactions mostly access a single (local) warehouse, but about 10% of transactions interact with another (remote) warehouse.

How to benchmark ScalarDB with TPC-C

The benchmark tools for ScalarDB are provided in the following repository. Currently, we provide TPC-C and YCSB as available workloads.

ScalarDB Benchmarks

First, download the latest archive of Kepie from here, which is a simple yet general framework for performing end-to-end testing since ScalarDB benchmark tools depend on Kelpie.

Then, build benchmark tools. Prepare an environment to run Java (OpenJDK 8 or higher) and execute the below command.

./gradlew shadowJar

Then, define the tables for the benchmark using ScalarDB Schema Loader. Get the latest schema loader here and execute it with “tpcc-schema.json”.

java -jar scalardb-schema-loader-<version>.jar --config /path/to/scalardb.properties -f tpcc-schema.json --coordinator

Then, prepare a configuration file for the benchmark, which describes your preferred scale factor, the number of worker threads for loading and executing, and so on. A sample configuration file, “tpcc-benchmark-config.toml” is in the root directory of the benchmark repository.

The scale factor can be configured by num_warehouse in the [tpcc_config] section, and the number of worker threads for loading can be configured by load_concurrency in the same section. The number of worker threads for executing can be configured by concurrency in the [common] section. For other options, refer to the README.

Now, you are all set. Execute the following Kelpie command with the prepared configuration file to run the benchmark. If successfully done, you can get throughput and average latency.

${kelpie}/bin/kelpie --config your_config.toml

Note that ${kelpie} is a Kelpie directory, which is extracted from the archive you downloaded above. You can specify the --only-pre option to only execute the initial data loading or the --except-pre option to skip the data loading and run the benchmark using existing data

TPC-C benchmark implementation

Then, we describe some design considerations to implement the TPC-C benchmark with ScalarDB.

Schema Design

One of the most important points is how the TPC-C table schema is represented in the ScalarDB data model.

Many queries in TPC-C transactions are point queries that access a single record by specifying a primary key. Therefore, if you specify each table’s primary key column(s) as partition keys in ScalarDB, you can write such point queries of TPC-C using the get/put/delete interfaces of ScalarDB. In addition, columns and data types can basically be mapped in ScalarDB as they are.

For some TPC-C tables, scan queries are issued to get multiple records at once. ScalaDB, which employs a multidimensional key-value model, can handle these queries more intuitively and simply than the naive key-value model. To scan records efficiently, it is important to properly separate those tables’ composite primary key into the partition key and clustering key.

For example, the Order-Status transaction must retrieve an order’s details (order lines) by specifying the Warehouse ID, District ID, and Order ID. The Stock-Level transaction also requires retrieving the last 20 order lines for a district of a warehouse (by specifying Warehouse ID, District ID).

The primary key of the order-line table is (Warehouse ID, District ID, Order ID, Order-Line Number). If we assume only Order-Status transactions and specify (Warehouse ID, District ID, Order ID) as the partition key, and Order-Line Number as the clustering key, the Stock-Level transaction would have to issue a total of 20 scans to retrieve the details for each order. We thus use (Warehouse ID, District ID) as the partition key, and the rest as the clustering key and enable both transactions to query records efficiently by using scan filters.

Secondary Index

Some scan queries for TPC-C transactions require a secondary index. These are the following two cases.

Scan for Customer ID by specifying (Warehouse ID, District ID, Customer Last Name)
Scan for Order ID by specifying (Warehouse ID, District ID,
Customer ID)

ScalarDB currently supports the secondary indexing feature for a single column only. Therefore, the TPC-C implementation in scalardb-benchmarks uses an index column to store a single string that is encoded from the values of the target index keys.

If you do not use ScalarDB’s secondary index due to limitations of the underlying database system or other reasons, there is another way to achieve the same functionality as secondary indexes using regular tables. If you want to use this alternative in scalardb-benchmarks, you can specify “ — use-table-index”. However, it is disabled by default because it increases the number of reads and updates of records, which affects performance.

Join and aggregate queries

TPC-C is a benchmark for OLTP systems, but some queries require simple aggregation via COUNT() or SUM() and join. Although the ScalarDB data access API does not provide aggregate or join queries, these queries can be easily implemented by the application (without sophisticated optimizers and execution engines).

Specifically, the TPC-C implementation of scalardb-benchmarks has application-side aggregations in the following two cases.

Delivery transaction calculates the total amount for an order (aggregation)
Stock-Level transaction counts the number of items specified in the most recent order whose stock in the warehouse is below a threshold (join and aggregation)

Summary

This post introduced how to evaluate the performance of ScalarDB using TPC-C, a standard benchmark for OLTP systems, and the implementation of the TPC-C benchmark using ScalarDB. For details on usage and implementation, see README and code in the following repository.

ScalarDB Benchmarks

ScalarDB Benchmark with TPC-C

Overview of TPC-C

How to benchmark ScalarDB with TPC-C

TPC-C benchmark implementation

Summary

Written by Jun Nemoto