Hadoop Noob

Elephant trainers

Presto

Hao Gao in Hadoop Noob

Nov 3, 2017

Presto In Production

So what’s Presto

“Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes.”

Challenges

Hao Gao in Hadoop Noob

Jul 13, 2017

Presto Parquet Reader

Recently I have been working on making all our warehouse data queryable by Presto. We have lots of data in Parquet format, and our batch data pipelines are all Spark jobs. They are normal ETL jobs: data flows into Kafka, then through Spark/Flink, and is finally persisted on S3.

Hao Gao in Hadoop Noob

May 19, 2017

Query Kafka on Presto

Recently I have been working on a new data pipeline. It needs to consume data from Kafka, apply some transformations, and then persist the data on HDFS. Once the pipeline was finished, I needed to start integration testing on the staging cluster. When some records are missing on HDFS, I need to figure out…

Hao Gao in Hadoop Noob

Feb 17, 2017

Benchmark: Spark SQL VS Presto

Cluster Setup:

Presto:

Presto 0.152 (latest)
1 c3.xlarge node as the coordinator. No work is scheduled on the coordinator
3 c3.2xlarge nodes as workers
