Presto with Kubernetes and S3 — Benchmark

Yifeng Jiang
Published in The Startup · Apr 29, 2020 · 5 min read

In the first part of this blog, I described how to deploy a Presto cluster with Kubernetes and configure it to access data in S3. With Presto on Kubernetes, data in FlashBlade S3, and PSO providing FlashArray as a service to Kubernetes pods, I was able to quickly spin multiple Presto clusters up and down at any time.
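
As a quick recap of part one: Presto reads S3 data through its Hive connector, whose catalog properties point at the FlashBlade S3 endpoint. A minimal sketch of such a catalog file is shown below; the metastore URI, endpoint and keys are placeholders, not my exact configuration.

# Sketch of a Hive catalog file (e.g. etc/catalog/hive.properties); all values are placeholders
connector.name=hive-hadoop2
hive.metastore.uri=thrift://hive-metastore:9083
hive.s3.endpoint=http://192.168.170.12
hive.s3.aws-access-key=ACCESS_KEY
hive.s3.aws-secret-key=SECRET_KEY
hive.s3.path-style-access=true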

In this second part, I will describe a TPC-DS benchmark we ran in that environment. One of the reasons we ran the benchmark was to see how storage performance affects Presto response time, so we ran the same benchmark on both FlashBlade and another S3-compatible object store (HyperStore).

Benchmark Environment

My benchmark environment looks like this:

Benchmark environment

Presto cluster:

  • 1 coordinator, 5~8 workers, running on Kubernetes
  • 8 vCores, 32 GB RAM per pod
  • Kubernetes runs on 3 bare metal servers, each with 16 cores and 256 GB RAM

Object storage:

  • FlashBlade: 15 blades (a blade is like a small storage server)
  • HyperStore: 3 bare metal servers

TPC-DS benchmark:

  • 1TB data (sf1000), ORC format
  • Selected 16 small to mid-size queries

Test client: JMeter JDBC Request, single user
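
In the JMeter test plan, the JDBC Connection Configuration points at the Presto coordinator. The values below are illustrative only (host, port and schema are placeholders), and the driver class depends on the Presto distribution in use.

Database URL:      jdbc:presto://presto-coordinator:8080/hive/tpcds_sf1000_orc
JDBC Driver class: com.facebook.presto.jdbc.PrestoDriver (or io.prestosql.jdbc.PrestoDriver)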

Benchmark Result and Observation

The result indicates that storage performance plays an important role in Presto query response time. In total runtime, Presto with FlashBlade was 2x faster than with HyperStore: with 9 Presto nodes on FlashBlade, all 16 queries completed in 15 minutes, while the same number of Presto nodes on HyperStore took 31 minutes.

Presto benchmark result

My intention here is not a raw storage performance comparison, as Presto CPU becomes the bottleneck at some point.

Observation — Presto with FlashBlade

FlashBlade delivered 2.97 GB/s peak throughput with less than 5 ms latency to the 9 Presto nodes running on the Kubernetes cluster. A 15-blade FlashBlade system supports up to 15 GB/s read throughput, so my FlashBlade environment could likely support up to 5x more Presto nodes (15 GB/s ÷ 2.97 GB/s ≈ 5).

FlashBlade performance

During testing, Kubernetes cluster CPU utilisation was around 60~70%.

Kubernetes cluster CPU utilisation, with FlashBlade

Observation — Presto with HyperStore

HyperStore delivered 560 MB/s peak throughput to the 9 Presto nodes. Storage CPU was 85% busy, so this is close to the maximum throughput my HyperStore setup can deliver. Adding more servers to HyperStore would speed up Presto and push the Presto nodes to higher CPU utilisation.

HyperStore performance

With HyperStore, the Kubernetes cluster was running at 15~60% CPU utilisation. This indicates Presto could run faster if HyperStore could deliver higher I/O throughput.

Kubernetes cluster CPU utilisation, with HyperStore

Conclusion

Storage performance can make a big difference in the query response time of a big data SQL engine like Presto.

Object stores are often considered slow, and this is typically the case, especially on public cloud platforms. A local storage cache, with the master data sitting in a remote object store, has therefore become a popular architecture in many modern query engines. However, cache management for large data sets can be complex and resource-consuming.

What if your object store is as fast as local storage? A fast object store like FlashBlade S3 delivers high-throughput I/O at consistently low latency to Presto. Storage performance becomes more important as query engines embrace architectures that separate compute and storage, and fast object storage also has the potential to simplify or even eliminate the local cache.

Keep reading if you are interested in how the benchmark was executed.

Executing the Benchmark

My benchmark execution flow looks like this:

  1. Generate TPC-DS data in ORC format in FlashBlade S3.
  2. Copy the ORC data from FlashBlade to HyperStore.
  3. Create and analyze TPC-DS tables.
  4. Create and execute JMeter test plan.

Creating TPC-DS Schema and Data

I use the Presto TPCDS Connector to generate the 1TB test data.

CREATE SCHEMA hive.tpcds_sf1000_orc WITH (location = 's3://bucket/tpcds_sf1000_orc.db');

CREATE TABLE hive.tpcds_sf1000_orc.store_sales WITH (format = 'orc') AS SELECT * FROM tpcds.sf1000.store_sales;

This creates a TPC-DS schema pointing to S3 so that its table data will be put in my bucket by default. The script also creates tables in the schema and generates 1TB of data in ORC format.
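
A script that does this can simply repeat the CREATE TABLE ... AS SELECT statement for every TPC-DS table. A minimal sketch of such a loop with the Presto CLI is below; the table list is abbreviated and the coordinator address is a placeholder.

# Hypothetical helper: repeat the CTAS statement for each TPC-DS table
# (table list abbreviated; coordinator address is a placeholder)
for TABLE in store_sales store_returns catalog_sales web_sales item date_dim customer; do
  presto --server http://presto-coordinator:8080 --catalog hive --execute "
    CREATE TABLE hive.tpcds_sf1000_orc.${TABLE} WITH (format = 'orc') AS
    SELECT * FROM tpcds.sf1000.${TABLE};"
done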

Copying Data Between Multiple Object Stores

To copy the data from FlashBlade S3 to HyperStore, I needed a tool that supports per-bucket configuration for the buckets in FlashBlade and HyperStore. I chose Spark DistCp, which is a Spark re-implementation of Hadoop DistCp. Spark DistCp uses Hadoop S3A for S3 I/O, and S3A supports per-bucket settings like the below:

"spark.hadoop.fs.s3a.bucket.mybucket1.endpoint": "192.168.170.12"
"spark.hadoop.fs.s3a.bucket.mybucket1.access.key": "ACCESS_KEY_1"
"spark.hadoop.fs.s3a.bucket.mybucket1.secret.key": "SECRET_KEY_1"
"spark.hadoop.fs.s3a.bucket.mybucket2.endpoint": "192.168.170.22"
"spark.hadoop.fs.s3a.bucket.mybucket2.access.key": "ACCESS_KEY_2"
"spark.hadoop.fs.s3a.bucket.mybucket2.secret.key": "SECRET_KEY_2"

In the above settings, mybucket1 is a bucket on FlashBlade, while mybucket2 is on HyperStore. With this, I can run the distributed copy job:

# spark-distcp-assembly.jar stands for the SparkDistCp application jar
./spark-submit --class com.coxautodata.SparkDistCP spark-distcp-assembly.jar s3a://mybucket1/tpcds_sf1000_orc.db s3a://mybucket2

My Spark DistCp job also runs on Kubernetes.
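
For reference, a spark-submit against Kubernetes might look like the sketch below. The API server address, container image and jar path are placeholders, and the per-bucket access and secret keys are passed as additional --conf options in the same way as the endpoints.

# A sketch of submitting the copy job to Kubernetes; all bracketed values are placeholders
./spark-submit \
  --master k8s://https://<kubernetes-api-server>:6443 \
  --deploy-mode cluster \
  --name spark-distcp \
  --class com.coxautodata.SparkDistCP \
  --conf spark.executor.instances=4 \
  --conf spark.kubernetes.container.image=<spark-image-with-s3a-support> \
  --conf spark.hadoop.fs.s3a.bucket.mybucket1.endpoint=192.168.170.12 \
  --conf spark.hadoop.fs.s3a.bucket.mybucket2.endpoint=192.168.170.22 \
  local:///opt/spark/jars/spark-distcp-assembly.jar \
  s3a://mybucket1/tpcds_sf1000_orc.db s3a://mybucket2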

Analyzing Tables

Table statistics are important for Presto and many other databases to optimize query plans using techniques like cost-based optimization.

In my test, this is especially important because my tables on HyperStore had zero statistics, as they were created after the DistCp job. For each of the tables, run the below:

ANALYZE $SCHEMA.$TABLE;
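
A small loop over the schema's tables can do this with the Presto CLI. The sketch below is a hypothetical helper; the coordinator address is a placeholder.

# Hypothetical helper: analyze every table in the schema via the Presto CLI
SCHEMA=tpcds_sf1000_orc
for TABLE in $(presto --server http://presto-coordinator:8080 --catalog hive \
    --schema ${SCHEMA} --execute "SHOW TABLES;" --output-format TSV); do
  presto --server http://presto-coordinator:8080 --catalog hive \
    --execute "ANALYZE ${SCHEMA}.${TABLE};"
done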

Create and Execute JMeter Test Plan

Apache JMeter is a handy tool to load test functional behavior and measure performance. It comes with a simple GUI to build and run test plans, and it also supports running test plans from the CLI.

Testing all 99 TPC-DS queries was not my goal, so I simply picked 16 small to mid-size queries from Presto's TPC-DS benchmark repository, created my test plan in the JMeter GUI and ran the plan from the CLI.

# -n: non-GUI (CLI) mode, -t: test plan, -l: results log, -e -o: generate the HTML report into ./reports
bin/jmeter \
-n -t jmeter-tpcds.jmx \
-l reports/tpcds-log.jtl \
-e -o reports

After the run, JMeter collects the metrics and generates a report like the one below.

JMeter TPC-DS Report — 9 Presto nodes with Kubernetes on FlashBlade

As always, all the code for this benchmark is available in my GitHub repository.
