Benchmark: Spark SQL vs. Presto
Feb 17, 2017
Cluster Setup:
Presto:
- Presto 0.152 (latest)
- 1 c3.xlarge node as coordinator; no work is scheduled on the coordinator
- 3 c3.2xlarge nodes as workers
- 8 vCPUs, 15 GB memory per worker node
- Max query memory per node: 9 GB
- Hive metastore and thrift server running on coordinator node
Spark:
- Spark 1.6.1 with default parameters
- 1 c3.xlarge node as master
- 3 c3.2xlarge nodes as workers
- 8 vCPUs, 15 GB memory per worker node
Tuning applied to Presto:
- distributed-joins-enabled=false
- optimizer.processing-optimization=columnar_dictionary
- hive.parquet-optimized-reader.enabled=true
- hive.parquet-predicate-pushdown.enabled=true
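For anyone reproducing this setup, here is a rough sketch of where these properties would normally go. It assumes the standard Presto layout, where the two engine-level properties live in etc/config.properties and the two hive.* properties live in the Hive catalog file. The file paths and the catalog name "hive" are assumptions and are not stated in this post.

```python
# Sketch: group the tuning properties by the Presto file they would normally live in.
# ASSUMPTION: file paths and the catalog name "hive" follow the usual Presto layout;
# the post itself does not say where the properties were set.
PRESTO_TUNING = {
    "etc/config.properties": {
        "distributed-joins-enabled": "false",
        "optimizer.processing-optimization": "columnar_dictionary",
    },
    "etc/catalog/hive.properties": {
        "hive.parquet-optimized-reader.enabled": "true",
        "hive.parquet-predicate-pushdown.enabled": "true",
    },
}


def render_properties(props):
    """Render a dict as Java-style key=value lines, one property per line."""
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"


def render_all(tuning=PRESTO_TUNING):
    """Return a mapping of file path -> rendered file content."""
    return {path: render_properties(props) for path, props in tuning.items()}


if __name__ == "__main__":
    for path, text in render_all().items():
        print(f"# {path}")
        print(text)
```

Running the script prints the two property files so they can be compared against the settings on the coordinator and the workers.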
Benchmark result:
I don’t know why Presto performs so poorly when joining large data sets.
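One thing that may be worth checking is the distributed-joins-enabled=false setting listed above. With distributed joins turned off, Presto uses broadcast joins, so the right-hand (build) side of the join has to fit in memory on every worker instead of being split across them. The sketch below is only back-of-the-envelope arithmetic. The 12 GB build-side size is hypothetical (this post gives no table sizes), it treats the 9 GB figure as the per-node query memory limit, and it ignores hash-table overhead and data skew.

```python
# Rough estimate of build-side memory per worker for a hash join.
# ASSUMPTIONS: the 9 GB figure is the per-node query memory limit,
# and the build-side size below is hypothetical, not a measured value.
PER_NODE_LIMIT_GB = 9.0
WORKERS = 3


def build_memory_per_worker_gb(build_side_gb, workers, distributed):
    """Memory each worker needs for the build side of the join.

    Broadcast join: every worker holds the whole build side.
    Distributed join: the build side is hash-partitioned across workers.
    """
    if distributed:
        return build_side_gb / workers
    return build_side_gb


def fits(build_side_gb, workers, distributed, limit_gb=PER_NODE_LIMIT_GB):
    """True if the estimated per-worker build memory is within the limit."""
    return build_memory_per_worker_gb(build_side_gb, workers, distributed) <= limit_gb


if __name__ == "__main__":
    hypothetical_build_side_gb = 12.0
    for distributed in (False, True):
        needed = build_memory_per_worker_gb(hypothetical_build_side_gb, WORKERS, distributed)
        mode = "distributed" if distributed else "broadcast"
        verdict = "fits" if fits(hypothetical_build_side_gb, WORKERS, distributed) else "exceeds limit"
        print(f"{mode}: {needed:.1f} GB per worker ({verdict})")
```

This is not a diagnosis of the results here, only a way to see whether the join's build side could plausibly be hitting the per-node limit under broadcast joins.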