Benchmark: Spark SQL VS Presto
- Presto 0.152 (latest)
- 1 c3.xlarge node as coordinator, with no work scheduled on it
- 3 c3.2xlarge nodes as workers
Recently I have been working on a new data pipeline. It needs to consume data from Kafka, apply some transformations, and then persist the data to HDFS. Once the pipeline was finished, I needed to start integration tests on the staging cluster. When some records are missing on HDFS, I need to figure out…
After I added protobuf and Avro decoders to Presto, I can now query my Kafka cluster through Presto. This has saved me a lot of time debugging data issues in my data pipelines. Basically, if I don't see the data in Kafka, I don't need to debug my downstream pipelines.
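As a rough illustration of that workflow, here is a minimal Python sketch. It only builds the SQL text for a lookup against a Kafka topic and compares two lists of offsets. The catalog name `kafka`, the schema `default`, the topic name `events`, and both helper functions are assumptions made up for this example. The `_partition_id`, `_partition_offset`, and `_message` columns are the internal columns exposed by the Presto Kafka connector. How the query is submitted (CLI, JDBC, or a client library) is left out on purpose.

```python
def build_kafka_lookup_query(topic, partition_id, start_offset, end_offset,
                             catalog="kafka", schema="default"):
    """Return Presto SQL that lists the messages in one partition's offset range.

    The catalog and schema defaults are assumptions; adjust them to your setup.
    """
    # Convert to int so only plain integers are interpolated into the SQL text.
    partition_id = int(partition_id)
    start_offset = int(start_offset)
    end_offset = int(end_offset)
    if start_offset > end_offset:
        raise ValueError("start_offset must be <= end_offset")
    return (
        f'SELECT _partition_id, _partition_offset, _message '
        f'FROM {catalog}.{schema}."{topic}" '
        f'WHERE _partition_id = {partition_id} '
        f'AND _partition_offset BETWEEN {start_offset} AND {end_offset} '
        f'ORDER BY _partition_offset'
    )


def missing_offsets(kafka_offsets, persisted_offsets):
    """Offsets present in Kafka but absent downstream, in ascending order."""
    return sorted(set(kafka_offsets) - set(persisted_offsets))


if __name__ == "__main__":
    # "events" is a hypothetical topic name.
    print(build_kafka_lookup_query("events", 0, 100, 200))
    print(missing_offsets([100, 101, 102, 103], [100, 102]))
```

If `missing_offsets` returns an empty list for the range in question, the records did reach the downstream store and the problem is elsewhere. If the offsets never show up in the Kafka query at all, the producer side is the place to look.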