Tarantool/Vinyl: 200K transactions per second on a disk-based database

As you may know, guys from Tarantool’s team (https://tarantool.org) have recently launched a brand new storage engine named Vinyl.

Here is the announcement: https://twitter.com/rtsisyk/status/781471650097356800

And here is a funny picture for the release. I love it!


The difference between Vinyl and the current storage engine, named Memtx, is that with Memtx your dataset is limited by the size of your RAM, whilst with Vinyl you can store up to 100 times more data on disk than fits in your RAM.

You may ask me: Ok, but how is it different from traditional databases like MySQL (InnoDB), Oracle, Postgres? The main difference is that Vinyl is heavily write optimized. And here is the big reason why your database should be heavily write optimized: you can always handle your read workload by caching, but you can’t do that with your write workload (unless you want to lose your changes after a server reboot)!

Here is a table summarizing all of the above:

From the technology standpoint, the difference between Vinyl and InnoDB or other similar database engines is that the guys from Tarantool adopted the log-structured merge tree (LSM tree) technique instead of a B+ tree. Honestly, Vinyl is neither the only LSM-driven engine nor the first one: there are also LevelDB by Google and RocksDB by Facebook. Now Vinyl is around here too. :)
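To get a feel for why an LSM tree is write-optimized, here is a toy sketch of the idea in Lua (purely illustrative, not Vinyl’s actual code; all names are made up): writes go into a cheap in-memory buffer (the “memtable”), which is periodically dumped as a sorted run; real engines write those runs to disk sequentially, which is much faster than the random in-place updates of a B+ tree. Reads check the memtable first and then the runs, newest first.

```lua
-- Toy LSM sketch (illustration only, not Vinyl's real implementation).
local lsm = { memtable = {}, runs = {}, limit = 4 }

function lsm:put(key, value)
  self.memtable[key] = value                 -- a write is just an in-memory insert
  local count = 0
  for _ in pairs(self.memtable) do count = count + 1 end  -- O(n) count, fine for a toy
  if count >= self.limit then self:flush() end
end

function lsm:flush()
  -- Dump the memtable as a new run; a real engine writes this to disk
  -- sequentially, which is what makes LSM trees write-optimized.
  local run = {}
  for k, v in pairs(self.memtable) do run[k] = v end
  table.insert(self.runs, run)
  self.memtable = {}
end

function lsm:get(key)
  if self.memtable[key] ~= nil then return self.memtable[key] end
  for i = #self.runs, 1, -1 do               -- check the newest run first
    if self.runs[i][key] ~= nil then return self.runs[i][key] end
  end
  return nil  -- a real engine would use Bloom filters to skip most runs
end
```

Note the trade-off this sketch also exposes: a read may have to look through several runs, which is exactly why LSM engines tend to pay in read latency what they gain in write throughput.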

So, I did benchmarks for Vinyl. And the results were fantastic! First, I took my Tarantool benchmark (for Memtx), which demonstrated one million transactions per second on a single CPU core at AWS: https://gist.github.com/danikin/a5ddc6fe0cedc6257853. Then I changed it slightly to work with Vinyl:

In that file, I replaced the body of the function bootstrap() with this:

box.schema.space.create('example', { engine = 'vinyl' })
box.space.example:create_index('primary', { type = 'TREE' })
box.schema.user.grant('guest', 'read,write,execute', 'universe')

The main difference here is that the engine of the space is now 'vinyl', whereas it used to be the default, 'memtx'.
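Everything else about working with the space stays the same: once it is created with the vinyl engine, reads and writes use the usual Tarantool API. A minimal usage sketch for the Tarantool console (the tuple values here are made up for illustration):

```lua
-- Writes and reads against a vinyl-backed space use the same API as Memtx.
box.space.example:replace{1, 'hello'}  -- insert or overwrite the tuple with key 1
box.space.example:replace{2, 'world'}
local t = box.space.example:get{1}     -- point lookup by primary key
print(t[2])                            -- second field of the tuple: 'hello'
```

This is exactly why switching the benchmark over was a one-line change: only the engine option in create() differs.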

Then I ran the test and got the result like this:

[ec2-user@ip-172-31-20-244 tar_test]$ ./tar_test write 1 10000000
Requests per second: 252592, Responses per second: 250376, Pending requests: 36014, Latency: 143.839665 ms
Requests per second: 208137, Responses per second: 208296, Pending requests: 35855, Latency: 172.134847 ms
Requests per second: 208136, Responses per second: 208296, Pending requests: 35695, Latency: 171.366709 ms
Requests per second: 210157, Responses per second: 208822, Pending requests: 37030, Latency: 177.328059 ms
Requests per second: 242489, Responses per second: 243012, Pending requests: 36507, Latency: 150.227149 ms
Requests per second: 230365, Responses per second: 231824, Pending requests: 35048, Latency: 151.183657 ms
Requests per second: 268759, Responses per second: 268402, Pending requests: 35405, Latency: 131.910343 ms
Requests per second: 270779, Responses per second: 271800, Pending requests: 34384, Latency: 126.504783 ms
Requests per second: 214199, Responses per second: 213172, Pending requests: 35411, Latency: 166.114687 ms
Requests per second: 260675, Responses per second: 260896, Pending requests: 35190, Latency: 134.881332 ms
[ec2-user@ip-172-31-20-244 tar_test]$ ./tar_test read 5 10000000
Requests per second: 21627, Responses per second: 21537, Pending requests: 94700, Latency: 4397.084088 ms
Requests per second: 20633, Responses per second: 21882, Pending requests: 93451, Latency: 4270.679097 ms
Requests per second: 23117, Responses per second: 21498, Pending requests: 95070, Latency: 4422.271839 ms
Requests per second: 24610, Responses per second: 21921, Pending requests: 97759, Latency: 4459.604945 ms
Requests per second: 20385, Responses per second: 21498, Pending requests: 96646, Latency: 4495.580984 ms

As you can see, there are some problems with latency on read requests, but on the bright side the throughput is tremendous: 200K RPS for the write workload and 20K RPS for the read workload! And that is for a real disk-based database (not in-memory).

One funny thing about the 20K RPS read workload: it was CPU-bound in this test. Why is that? Because Vinyl reads data in 64K blocks, so at 20K RPS we have around 1.2GB of memory being copied per second, which should be too much for a single CPU core :-) Another observation is that this benchmark is not good enough for the read workload, because it turns out that Vinyl fully leverages the cache in this case. On the bright side, 20K RPS for a read workload, even from the disk cache, on a single CPU core seems to be quite good. But of course, the main result that this benchmark shows is all about the write workload.
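The back-of-the-envelope arithmetic above is easy to check (a sketch; the 64K block size is the figure stated in the text):

```lua
-- Rough memory-copy throughput: read requests per second times block size.
local rps = 20000                -- observed read requests per second
local block = 64 * 1024          -- Vinyl reads data in 64K blocks
local bytes_per_sec = rps * block
print(bytes_per_sec / 2^30)      -- roughly 1.22 GB copied per second
```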

That said, you can really achieve 200K transactions per second on a disk database with Vinyl.