Use go-ycsb to benchmark different databases (1)

siddontang
Jan 13, 2019

Yahoo! Cloud Serving Benchmark (YCSB) is a well-known benchmark framework, written in Java, for benchmarking different databases.

Benchmark Tiers

YCSB focuses on two aspects: performance and scaling.

For performance, YCSB mainly focuses on latency. There is a tradeoff between latency and throughput: on fixed hardware, latency increases as the workload grows, because of contention for disk, CPU, and other resources. So we need to know how many machines are required to meet the customer’s latency and throughput requirements. Of course, the fewer machines a system needs to meet them, the better its performance.

To benchmark performance, YCSB uses a common approach inspired by Wisconsin Sizeup: keep the hardware constant and increase the request rate until the system hits a bottleneck and becomes overloaded.
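The sizeup idea can be illustrated with a toy model. The sketch below is not how YCSB measures anything: it assumes a hypothetical server that behaves like a textbook M/M/1 queue (mean latency 1/(mu - lambda)), and the capacity, step size, and latency target are made-up numbers. It only shows the shape of the procedure: hold capacity fixed, raise the offered load, and stop when latency no longer meets the target.

```go
package main

import (
	"fmt"
	"math"
)

// meanLatency returns the mean response time in seconds of a hypothetical
// M/M/1 server with service rate mu (requests/s) under offered load lambda.
// This is a queueing model, not a measurement of a real database.
func meanLatency(lambda, mu float64) float64 {
	if lambda >= mu {
		return math.Inf(1) // overloaded: the queue grows without bound
	}
	return 1 / (mu - lambda)
}

// sizeup keeps the hardware (mu) constant, raises the offered load in fixed
// steps, and returns the highest load whose latency still meets the target.
func sizeup(mu, step, targetLatency float64) float64 {
	best := 0.0
	for lambda := step; meanLatency(lambda, mu) <= targetLatency; lambda += step {
		best = lambda
	}
	return best
}

func main() {
	const mu = 1000.0 // assumed capacity: 1000 requests/s
	for _, lambda := range []float64{100, 500, 900, 990} {
		fmt.Printf("load=%4.0f ops/s  latency=%.1f ms\n", lambda, meanLatency(lambda, mu)*1000)
	}
	fmt.Printf("highest load under 10 ms: %.0f ops/s\n", sizeup(mu, 50, 0.010))
}
```

In a real run the latency comes from measurements rather than a formula, but the loop is the same: the interesting number is the load at which latency crosses the target.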

For benchmarking scaling, one way is “scale up”: the number of machines, the data size, and the workload are all increased proportionally, and the latency should stay the same. The other way is “elastic speedup”: after new servers are added, the latency should decrease if the system is elastic.

Hello go-ycsb

At first, we wanted to use YCSB to benchmark our database TiKV, but at that time there was no Java client for TiKV, so we couldn’t. Luckily, after investigating YCSB, we found it easy to port to Go, so we developed go-ycsb.

Building go-ycsb is easy. First install Go (version >= 1.11), then:

git clone https://github.com/pingcap/go-ycsb.git
cd go-ycsb
make

The go-ycsb binary is installed in the ./bin directory.

If you are familiar with YCSB, you can use go-ycsb in nearly the same way: first load the data, then run the different workloads.

Workload

You need to define a workload to run YCSB. In the workload, you can define the proportion of operations (Insert/Update/Read/Scan), the data size, the request distribution (Uniform, Zipfian, Latest, Multinomial), etc., to benchmark the system in many dimensions.
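To see why the request distribution matters, it helps to compare how often individual keys are hit. The sketch below uses only the Go standard library and is not go-ycsb's generator: rand.NewZipf requires an exponent s > 1, so the value 1.1 here is an assumption chosen for illustration, and the sample and countKeys helpers are hypothetical.

```go
package main

import (
	"fmt"
	"math/rand"
)

// countKeys draws n keys from next and counts how often each key is chosen.
func countKeys(n, keys int, next func() int) []int {
	counts := make([]int, keys)
	for i := 0; i < n; i++ {
		counts[next()]++
	}
	return counts
}

// sample returns per-key hit counts for a uniform and a Zipf-like chooser
// over the key range [0, keys).
func sample(n, keys int, seed int64) (uniform, zipf []int) {
	r := rand.New(rand.NewSource(seed))
	// s=1.1 and v=1 are assumptions for this sketch, not YCSB's constants.
	z := rand.NewZipf(r, 1.1, 1, uint64(keys-1))
	uniform = countKeys(n, keys, func() int { return r.Intn(keys) })
	zipf = countKeys(n, keys, func() int { return int(z.Uint64()) })
	return uniform, zipf
}

func main() {
	uniform, zipf := sample(10000, 1000, 1)
	fmt.Println("hits on key 0, uniform:", uniform[0])
	fmt.Println("hits on key 0, zipf:   ", zipf[0])
}
```

Under the uniform chooser every key gets roughly the same share of requests, while the Zipf-like chooser concentrates a large share on a few hot keys, which stresses caching and contention very differently.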

YCSB provides some default workloads. For example, the default workload A is:

recordcount=1000
operationcount=1000
workload=core
readallfields=true
readproportion=0.5
updateproportion=0.5
scanproportion=0
insertproportion=0
requestdistribution=zipfian
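A workload file is just a list of key=value properties. As a rough illustration, the sketch below parses the workload A listing with a hand-written parser and checks that the four operation proportions add up to 1. The parseProps and proportionSum helpers are hypothetical and are not go-ycsb's own property loader, and the sum-to-1 check is a sanity check assumed to be useful here, not something YCSB is known to enforce.

```go
package main

import (
	"bufio"
	"fmt"
	"math"
	"strconv"
	"strings"
)

// workloadA is the workload A listing from this article.
const workloadA = `recordcount=1000
operationcount=1000
workload=core
readallfields=true
readproportion=0.5
updateproportion=0.5
scanproportion=0
insertproportion=0
requestdistribution=zipfian`

// parseProps reads key=value lines, skipping blank lines and # comments.
func parseProps(text string) map[string]string {
	props := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		kv := strings.SplitN(line, "=", 2)
		if len(kv) == 2 {
			props[strings.TrimSpace(kv[0])] = strings.TrimSpace(kv[1])
		}
	}
	return props
}

// proportionSum adds up the four operation proportions and reports an error
// if one is missing or malformed, or if they do not sum to 1.
func proportionSum(props map[string]string) (float64, error) {
	sum := 0.0
	for _, op := range []string{"read", "update", "scan", "insert"} {
		v, err := strconv.ParseFloat(props[op+"proportion"], 64)
		if err != nil {
			return 0, fmt.Errorf("%sproportion: %v", op, err)
		}
		sum += v
	}
	if math.Abs(sum-1) > 1e-9 {
		return sum, fmt.Errorf("proportions sum to %g, want 1", sum)
	}
	return sum, nil
}

func main() {
	props := parseProps(workloadA)
	sum, err := proportionSum(props)
	fmt.Println(props["requestdistribution"], sum, err)
}
```

Workload A is therefore an update-heavy mix: half reads and half updates, no scans or inserts, with keys chosen by a Zipfian distribution.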

Use go-ycsb

We will show you how to use go-ycsb to benchmark MySQL. First start MySQL:

docker run --name mysql -e MYSQL_ALLOW_EMPTY_PASSWORD=yes -p 3306:3306 -d mysql:5.7

Here we start a MySQL 5.7 server listening on port 3306. We also need to log in to the MySQL server and create a database named test.

Then we load data, inserting 1000 rows with 2 concurrent threads:

./bin/go-ycsb load mysql -p mysql.host=127.0.0.1 -p mysql.port=3306 -p mysql.user=root -p mysql.db=test -p recordcount=1000 -p threadcount=2
***************** properties *****************
"recordcount"="1000"
"threadcount"="2"
"silence"="false"
"dotransactions"="false"
"mysql.host"="127.0.0.1"
"mysql.port"="3306"
"mysql.user"="root"
"mysql.db"="test"
**********************************************
Run finished, takes 2.663707279s
INSERT - Takes(s): 2.7, Count: 1000, OPS: 376.8, Avg(us): 5269, Min(us): 3071, Max(us): 64465, 95th(us): 8000, 99th(us): 18000

YCSB will create a usertable table, with 1 primary key and 10 fields:

mysql> desc usertable;
+----------+--------------+------+-----+---------+-------+
| Field    | Type         | Null | Key | Default | Extra |
+----------+--------------+------+-----+---------+-------+
| YCSB_KEY | varchar(64)  | NO   | PRI | NULL    |       |
| FIELD0   | varchar(100) | YES  |     | NULL    |       |
| FIELD1   | varchar(100) | YES  |     | NULL    |       |
| FIELD2   | varchar(100) | YES  |     | NULL    |       |
| FIELD3   | varchar(100) | YES  |     | NULL    |       |
| FIELD4   | varchar(100) | YES  |     | NULL    |       |
| FIELD5   | varchar(100) | YES  |     | NULL    |       |
| FIELD6   | varchar(100) | YES  |     | NULL    |       |
| FIELD7   | varchar(100) | YES  |     | NULL    |       |
| FIELD8   | varchar(100) | YES  |     | NULL    |       |
| FIELD9   | varchar(100) | YES  |     | NULL    |       |
+----------+--------------+------+-----+---------+-------+

We can check whether the load is successful or not:

mysql> select count(*) from usertable;
+----------+
| count(*) |
+----------+
|     1000 |
+----------+

Then we run workload A:

./bin/go-ycsb run mysql -P workloads/workloada -p mysql.host=127.0.0.1 -p mysql.port=3306 -p mysql.user=root -p mysql.db=test -p recordcount=1000 -p threadcount=2
***************** properties *****************
"requestdistribution"="zipfian"
"mysql.host"="127.0.0.1"
"readproportion"="0.5"
"recordcount"="1000"
"mysql.port"="3306"
"updateproportion"="0.5"
"mysql.db"="test"
"insertproportion"="0"
"workload"="core"
"dotransactions"="true"
"scanproportion"="0"
"operationcount"="1000"
"readallfields"="true"
"threadcount"="2"
"mysql.user"="root"
**********************************************
Run finished, takes 2.166502174s
READ - Takes(s): 2.1, Count: 482, OPS: 226.2, Avg(us): 1004, Min(us): 475, Max(us): 6142, 95th(us): 2000, 99th(us): 3000
UPDATE - Takes(s): 2.1, Count: 518, OPS: 242.4, Avg(us): 7304, Min(us): 3475, Max(us): 35581, 95th(us): 12000, 99th(us): 20000
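When comparing several runs, it can be handy to pull the numbers out of these summary lines. The sketch below is a hypothetical helper, not part of go-ycsb, and it assumes the exact "OP - Name: value, Name: value" layout shown in this article; other versions of the tool may print a different format.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSummary splits one summary line of the form
// "OP - Name: value, Name: value, ..." into the operation name and its metrics.
func parseSummary(line string) (string, map[string]float64, error) {
	parts := strings.SplitN(line, " - ", 2)
	if len(parts) != 2 {
		return "", nil, fmt.Errorf("no operation prefix in %q", line)
	}
	metrics := make(map[string]float64)
	for _, field := range strings.Split(parts[1], ", ") {
		kv := strings.SplitN(field, ": ", 2)
		if len(kv) != 2 {
			return "", nil, fmt.Errorf("malformed field %q", field)
		}
		v, err := strconv.ParseFloat(strings.TrimSpace(kv[1]), 64)
		if err != nil {
			return "", nil, err
		}
		metrics[strings.TrimSpace(kv[0])] = v
	}
	return strings.TrimSpace(parts[0]), metrics, nil
}

func main() {
	lines := []string{
		"READ - Takes(s): 2.1, Count: 482, OPS: 226.2, Avg(us): 1004, Min(us): 475, Max(us): 6142, 95th(us): 2000, 99th(us): 3000",
		"UPDATE - Takes(s): 2.1, Count: 518, OPS: 242.4, Avg(us): 7304, Min(us): 3475, Max(us): 35581, 95th(us): 12000, 99th(us): 20000",
	}
	for _, line := range lines {
		op, m, err := parseSummary(line)
		if err != nil {
			fmt.Println("skip:", err)
			continue
		}
		fmt.Printf("%s: ops=%.1f avg=%.0fus p99=%.0fus\n", op, m["OPS"], m["Avg(us)"], m["99th(us)"])
	}
}
```

In this run, updates are several times slower than reads on average (7304 us against 1004 us), which is the kind of comparison the parsed metrics make easy to automate.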

Benchmark your Database

If you want to benchmark your own database with go-ycsb, you only need to implement the DB and DBCreator interfaces.

The DB interface is below:

type DB interface {
    Close() error
    InitThread(ctx context.Context, threadID int, threadCount int) context.Context
    CleanupThread(ctx context.Context)
    Read(ctx context.Context, table string, key string, fields []string) (map[string][]byte, error)
    Scan(ctx context.Context, table string, startKey string, count int, fields []string) ([]map[string][]byte, error)
    Update(ctx context.Context, table string, key string, values map[string][]byte) error
    Insert(ctx context.Context, table string, key string, values map[string][]byte) error
    Delete(ctx context.Context, table string, key string) error
}

The Read, Scan, Update, Insert, and Delete functions are straightforward. The ones that need attention are InitThread and CleanupThread. In go-ycsb, a DB is used by multiple threads (goroutines) and must be thread-safe. Sometimes you need thread-local variables, so you can set them up at the start of the thread in InitThread and clean them up at the end of the thread in CleanupThread.

Let’s use MySQL as an example:
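To make the shape of a backend concrete, here is a minimal sketch of an in-memory store whose methods carry the same signatures as the DB interface above. Everything in it is hypothetical (the memDB type, the pick helper, treating Update as an upsert, and treating an empty fields list as "all fields"); it is meant to show how little is needed, not how any real go-ycsb backend is written.

```go
package main

import (
	"context"
	"fmt"
	"sort"
	"sync"
)

type row = map[string][]byte

// memDB is a hypothetical backend. A single mutex makes it safe to share
// between goroutines, which go-ycsb requires of every DB.
type memDB struct {
	mu     sync.RWMutex
	tables map[string]map[string]row // table -> key -> field -> value
}

func newMemDB() *memDB { return &memDB{tables: map[string]map[string]row{}} }

func (db *memDB) Close() error                                                 { return nil }
func (db *memDB) InitThread(ctx context.Context, _ int, _ int) context.Context { return ctx }
func (db *memDB) CleanupThread(_ context.Context)                              {}

// pick copies the requested fields; an empty list selects every field
// (an assumption made for this sketch).
func pick(r row, fields []string) row {
	if len(fields) == 0 {
		for f := range r {
			fields = append(fields, f)
		}
	}
	out := make(row, len(fields))
	for _, f := range fields {
		if v, ok := r[f]; ok {
			out[f] = v
		}
	}
	return out
}

func (db *memDB) Read(_ context.Context, table, key string, fields []string) (row, error) {
	db.mu.RLock()
	defer db.mu.RUnlock()
	r, ok := db.tables[table][key]
	if !ok {
		return nil, fmt.Errorf("key %q not found", key)
	}
	return pick(r, fields), nil
}

func (db *memDB) Scan(_ context.Context, table, startKey string, count int, fields []string) ([]row, error) {
	db.mu.RLock()
	defer db.mu.RUnlock()
	var keys []string
	for k := range db.tables[table] {
		if k >= startKey {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	var rows []row
	for i := 0; i < len(keys) && i < count; i++ {
		rows = append(rows, pick(db.tables[table][keys[i]], fields))
	}
	return rows, nil
}

// Insert creates the row if needed and merges the given fields into it.
func (db *memDB) Insert(_ context.Context, table, key string, values row) error {
	db.mu.Lock()
	defer db.mu.Unlock()
	if db.tables[table] == nil {
		db.tables[table] = make(map[string]row)
	}
	if db.tables[table][key] == nil {
		db.tables[table][key] = make(row)
	}
	for f, v := range values {
		db.tables[table][key][f] = v
	}
	return nil
}

// Update is treated as an upsert here, so it shares Insert's merge logic.
func (db *memDB) Update(ctx context.Context, table, key string, values row) error {
	return db.Insert(ctx, table, key, values)
}

func (db *memDB) Delete(_ context.Context, table, key string) error {
	db.mu.Lock()
	defer db.mu.Unlock()
	delete(db.tables[table], key)
	return nil
}

func main() {
	db := newMemDB()
	db.Insert(context.Background(), "usertable", "user1", row{"FIELD0": []byte("a")})
	r, err := db.Read(context.Background(), "usertable", "user1", nil)
	fmt.Println(len(r), err)
}
```

Because this store keeps no per-goroutine state, InitThread and CleanupThread are no-ops; a backend with connections or prepared statements is where those two hooks start to matter.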

// InitThread creates a per-thread statement cache and attaches it to the context.
func (db *mysqlDB) InitThread(ctx context.Context, _ int, _ int) context.Context {
    state := &mysqlState{
        stmtCache: make(map[string]*sql.Stmt),
    }

    return context.WithValue(ctx, stateKey, state)
}

// CleanupThread closes every statement cached by this thread.
func (db *mysqlDB) CleanupThread(ctx context.Context) {
    state := ctx.Value(stateKey).(*mysqlState)

    for _, stmt := range state.stmtCache {
        stmt.Close()
    }
}

At the start of every thread, we create a statement cache and bind it to the context, so that prepared statements can be reused later within that thread. At the end of the thread, we close the cached statements.
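The same pattern can be shown without a database. In the sketch below, every name (threadState, initThread, doOp, cleanupThread, runWorkers) is hypothetical and only mimics the role of the corresponding hook; it is not go-ycsb's worker loop. Each goroutine gets its own state through the context, uses it without locking, and hands back a result during cleanup.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// stateKey is an unexported key type, so no other package can collide with it.
type stateKey struct{}

// threadState is hypothetical per-worker state. Only its owning goroutine
// touches it, so it needs no lock.
type threadState struct {
	id  int
	ops int
}

// initThread mimics InitThread: it attaches fresh state to the context.
func initThread(ctx context.Context, threadID int) context.Context {
	return context.WithValue(ctx, stateKey{}, &threadState{id: threadID})
}

// doOp stands in for Read/Update/...: it fetches the state from the context.
func doOp(ctx context.Context) {
	ctx.Value(stateKey{}).(*threadState).ops++
}

// cleanupThread mimics CleanupThread: it is the last chance to use the state.
func cleanupThread(ctx context.Context, totals []int) {
	state := ctx.Value(stateKey{}).(*threadState)
	totals[state.id] = state.ops // each worker writes only its own slot
}

// runWorkers starts threadCount goroutines, each with its own state.
func runWorkers(threadCount, opsPerThread int) []int {
	totals := make([]int, threadCount)
	var wg sync.WaitGroup
	for id := 0; id < threadCount; id++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			ctx := initThread(context.Background(), id)
			defer cleanupThread(ctx, totals)
			for i := 0; i < opsPerThread; i++ {
				doOp(ctx)
			}
		}(id)
	}
	wg.Wait()
	return totals
}

func main() {
	fmt.Println(runWorkers(2, 500))
}
```

Carrying the state in the context keeps the DB value itself free of per-thread fields, so a single DB instance can be shared by all workers.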

The DBCreator interface is below:

type DBCreator interface {
    Create(p *properties.Properties) (DB, error)
}

You create the DB from the current properties. You also need to register the creator with YCSB under a unique name. For example, for MySQL, we use:

func init() {
    ycsb.RegisterDBCreator("mysql", mysqlCreator{})
}

Then we import the MySQL package in main.go:

import _ "github.com/pingcap/go-ycsb/db/mysql"
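The registration step follows a common Go plug-in pattern: each backend package registers itself in init, and the main program only needs a blank import to make the backend selectable by name. The sketch below reimplements that pattern in one file to show how the pieces fit. It is an assumption-laden illustration, not go-ycsb's code: registerCreator, lookupCreator, the noop backend, and the noop.host property are hypothetical, the DB interface is trimmed to one method, and a plain map stands in for the properties type.

```go
package main

import "fmt"

// DB is trimmed to a single method to keep the sketch short.
type DB interface {
	Close() error
}

// DBCreator builds a DB from properties; a plain map stands in for
// *properties.Properties here.
type DBCreator interface {
	Create(props map[string]string) (DB, error)
}

// creators maps a backend name to its creator.
var creators = map[string]DBCreator{}

// registerCreator adds a creator under a unique name and panics on
// duplicates, so a naming clash is caught at program start.
func registerCreator(name string, c DBCreator) {
	if _, dup := creators[name]; dup {
		panic("duplicate DB creator: " + name)
	}
	creators[name] = c
}

// lookupCreator returns the creator for name, or nil if none is registered.
func lookupCreator(name string) DBCreator {
	return creators[name]
}

// noopDB is a hypothetical backend that only remembers its host.
type noopDB struct{ host string }

func (noopDB) Close() error { return nil }

type noopCreator struct{}

// Create reads the hypothetical noop.host property, with a default.
func (noopCreator) Create(props map[string]string) (DB, error) {
	host, ok := props["noop.host"]
	if !ok {
		host = "127.0.0.1"
	}
	return noopDB{host: host}, nil
}

// init runs when the package is loaded, which is why a blank import of a
// backend package is enough to make it available.
func init() {
	registerCreator("noop", noopCreator{})
}

func main() {
	creator := lookupCreator("noop")
	if creator == nil {
		fmt.Println("unknown database")
		return
	}
	db, err := creator.Create(map[string]string{"noop.host": "10.0.0.1"})
	fmt.Println(db, err)
}
```

The main program never names a backend type directly; it asks the registry for a creator by the name given on the command line, which is what keeps adding a new database down to one package and one import line.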

Epilogue

go-ycsb now supports many databases; you can see them in the go-ycsb repository. We would very much appreciate it if you added your database to it. If you have any questions, feel free to create an issue and let us know.

In the next article, we will show you more examples of benchmarking different databases.
