Benchmark: Spark SQL vs. Presto

Hao Gao
Feb 17, 2017 · 1 min read

Cluster Setup:

Presto:

  • Presto 0.152 (latest)
  • 1 c3.xlarge node as coordinator; no work is scheduled on the coordinator
  • 3 c3.2xlarge nodes as workers
  • 8 vCPUs, 15GB mem per worker node
  • Max query memory per node: 9GB
  • Hive metastore and thrift server running on coordinator node

Spark:

  • Spark 1.6.1 with default params
  • 1 c3.xlarge node as master
  • 3 c3.2xlarge nodes as workers
  • 8 vCPUs, 15GB mem per worker node

Tuning made on Presto:

  • distributed-joins-enabled=false
  • optimizer.processing-optimization=columnar_dictionary
  • hive.parquet-optimized-reader.enabled=true
  • hive.parquet-predicate-pushdown.enabled=true
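
For anyone reproducing this setup, here is a minimal sketch of how the settings above might be split across Presto's configuration files. The file paths (etc/config.properties and etc/catalog/hive.properties) and the catalog name "hive" are assumptions based on a standard Presto layout, not something stated in this post. The helper render_properties is hypothetical and only formats the key=value lines.

```python
# Sketch only: groups the tuning settings by the Presto file they would
# usually live in. The file paths are assumptions, not taken from the post.
PRESTO_SETTINGS = {
    # Engine-wide settings (assumed location: etc/config.properties)
    "etc/config.properties": {
        "distributed-joins-enabled": "false",
        "optimizer.processing-optimization": "columnar_dictionary",
    },
    # Hive connector settings (assumed location: etc/catalog/hive.properties)
    "etc/catalog/hive.properties": {
        "hive.parquet-optimized-reader.enabled": "true",
        "hive.parquet-predicate-pushdown.enabled": "true",
    },
}


def render_properties(settings):
    """Return one key=value line per setting, in insertion order."""
    return "\n".join(f"{key}={value}" for key, value in settings.items())


if __name__ == "__main__":
    # Print each file's contents under a comment naming the assumed path.
    for path, settings in PRESTO_SETTINGS.items():
        print(f"# {path}")
        print(render_properties(settings))
        print()
```

Running the script prints the lines to paste into each file. The same properties would need to be present on the coordinator and on every worker.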

Benchmark result:

I don’t know why Presto performs so poorly when joining large data sets.
