Presto on Google Cloud Platform

Ayush Bilala
Walmart Global Tech Blog
Dec 23, 2019
Source: Presto, Google Cloud

Presto is one of the fastest-growing distributed SQL query engines, and it uses an architecture similar to that of a classic massively parallel processing (MPP) database management system.

Presto can be deployed on many different platforms, in the cloud or on premises; the technology is truly platform agnostic. Combining Presto with GCP's cloud computing services means you can add resources as your workload grows and keep getting useful insights from your data.

This article walks through setting up a 3-node Presto cluster with automated autoscaling on Google Cloud Platform (GCP), using Google Compute Engine, managed instance groups, and the autoscaler.

Disclaimer: The Presto configuration used here is not suitable for a production setup. Follow this article for a quick development environment setup only.

Environment

  1. Provision 3 virtual machines: one for the Presto coordinator and two for Presto workers
  2. Set up Presto on all the machines
  3. Create a Presto worker image and instance template
  4. Create a managed instance group for the Presto workers
  5. Set an autoscaling policy
  6. Validate the setup

Let’s start with 3 VMs

We need at least 3 virtual machines: one for the Presto coordinator and two for the Presto workers.
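The original VM creation commands are not shown here. The following is a minimal sketch that builds the equivalent `gcloud compute instances create` calls for the three machines. The zone, machine type, image family, and VM names are assumptions. By default (`dry_run=True`) it prints the commands so you can review them before running anything.

```python
import shlex
import subprocess

# Hypothetical values: adjust zone, machine type, and image to your project.
ZONE = "us-central1-a"
MACHINE_TYPE = "n1-standard-4"
IMAGE_FAMILY = "debian-11"
IMAGE_PROJECT = "debian-cloud"
VM_NAMES = ["presto-coordinator", "presto-worker-1", "presto-worker-2"]


def create_vm_command(name: str) -> list[str]:
    """Build the gcloud argument list that creates one Compute Engine VM."""
    return [
        "gcloud", "compute", "instances", "create", name,
        f"--zone={ZONE}",
        f"--machine-type={MACHINE_TYPE}",
        f"--image-family={IMAGE_FAMILY}",
        f"--image-project={IMAGE_PROJECT}",
    ]


def create_vms(dry_run: bool = True) -> list[list[str]]:
    """Print (dry run) or execute the create command for every VM."""
    commands = [create_vm_command(name) for name in VM_NAMES]
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))  # review before running for real
        else:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    create_vms(dry_run=True)
```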

Install Java

You need to install Java on both the coordinator and the workers. Presto requires at least Java 8, though I recommend Java 11 for its better garbage collection (GC).

How do you install Java? There are plenty of guides online for setting up Java; follow one and get it installed.
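To confirm the install on each VM, you can check the major version that `java -version` reports. The following helper is my own and is not part of Presto. Java 8 and earlier report versions like `1.8.0_232`, while Java 9 and later put the major version first (`11.0.5`), so both schemes need handling.

```python
import re
import subprocess

MIN_JAVA = 8  # minimum Java version stated for Presto in this article


def java_major_version(version_text: str) -> int:
    """Extract the major version from `java -version` output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_text)
    if match is None:
        raise ValueError(f"unrecognized version output: {version_text!r}")
    first, second = int(match.group(1)), match.group(2)
    if first == 1 and second is not None:
        return int(second)  # legacy "1.x" numbering used up to Java 8
    return first


def installed_java_version() -> int:
    # `java -version` writes its banner to stderr, not stdout.
    result = subprocess.run(["java", "-version"], capture_output=True, text=True, check=True)
    return java_major_version(result.stderr)
```

On each VM, `installed_java_version() >= MIN_JAVA` should be true before you continue.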

Next, Configure the Presto Coordinator
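The coordinator's configuration files are not reproduced here. The following sketch follows the standard Presto deployment layout. It writes `etc/node.properties`, `etc/jvm.config`, `etc/config.properties`, and the JMX catalog file that the CLI section queries later. The install path, heap size, memory limits, data directory, and the `presto-coordinator` host name are all assumptions to adjust for your VM.

```python
from pathlib import Path

# Hypothetical values: install path, coordinator host name, and sizing.
PRESTO_HOME = Path("presto-server")
COORDINATOR_HOST = "presto-coordinator"

COORDINATOR_FILES = {
    "etc/node.properties": (
        "node.environment=dev\n"
        "node.id=presto-coordinator\n"
        "node.data-dir=/var/presto/data\n"
    ),
    "etc/jvm.config": (
        "-server\n"
        "-Xmx8G\n"
        "-XX:+UseG1GC\n"
        "-XX:+ExitOnOutOfMemoryError\n"
    ),
    "etc/config.properties": (
        "coordinator=true\n"
        "node-scheduler.include-coordinator=false\n"  # leave query work to workers
        "http-server.http.port=8080\n"
        "query.max-memory=5GB\n"
        "query.max-memory-per-node=1GB\n"
        "discovery-server.enabled=true\n"  # workers register with this node
        f"discovery.uri=http://{COORDINATOR_HOST}:8080\n"
    ),
    "etc/catalog/jmx.properties": "connector.name=jmx\n",
}


def write_config(home: Path, files: dict[str, str]) -> list[Path]:
    """Write each config file under the Presto install directory."""
    written = []
    for relative, content in files.items():
        path = home / relative
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(content)
        written.append(path)
    return written
```

Running `write_config(PRESTO_HOME, COORDINATOR_FILES)` on the coordinator VM lays out the files before the first start.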

Start the Coordinator

At this point, your Presto coordinator is all set. Run the Presto launcher script and check the Presto UI at http://{coordinator_ip}:8080
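`bin/launcher start` runs Presto as a background daemon, while `bin/launcher run` keeps it in the foreground, which is handy for watching logs. The following sketch starts the coordinator and polls its `/v1/info` endpoint until the server reports that it is a coordinator that has finished starting. The install path and URL are assumptions. The `starting` field may vary between Presto versions, so the helper treats a missing field as ready.

```python
import json
import subprocess
import time
import urllib.request

# Hypothetical coordinator address; replace with your VM's IP.
COORDINATOR_URL = "http://localhost:8080"


def coordinator_ready(info: dict) -> bool:
    """True once /v1/info reports a coordinator that is no longer starting."""
    return bool(info.get("coordinator", False)) and not info.get("starting", False)


def start_and_wait(presto_home: str = "presto-server", timeout_s: float = 120.0) -> dict:
    subprocess.run([f"{presto_home}/bin/launcher", "start"], check=True)
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{COORDINATOR_URL}/v1/info", timeout=5) as resp:
                info = json.load(resp)
            if coordinator_ready(info):
                return info
        except OSError:
            pass  # HTTP server not accepting connections yet
        time.sleep(2)
    raise TimeoutError("Presto coordinator did not become ready in time")
```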

Presto UI: http://{coordinator_ip}:8080

Hey! Aren’t we missing something?

As you can see in the Presto UI, the Active Workers count is 0: we have not set up the Presto workers yet. Let’s set them up next.

Steps 1 and 2 are the same as in the coordinator setup instructions. Step 3 is what makes a worker different from the coordinator.
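The worker configuration files are not shown here either. In a standard Presto setup, a worker differs from the coordinator mainly in two places. First, `etc/config.properties` sets `coordinator=false`, runs no embedded discovery server, and points `discovery.uri` at the coordinator. Second, each worker needs its own unique `node.id`. This sketch assumes `jvm.config` and the catalog files are copied from the coordinator unchanged, and that `node.environment` matches it. Deriving `node.id` from the host name is my own assumption. It matters once workers are cloned from one image, so the function would need to run at boot (for example, from a startup script) rather than be baked into the image.

```python
import socket
from pathlib import Path

# Hypothetical: the coordinator VM's name, resolvable via GCE internal DNS.
COORDINATOR_HOST = "presto-coordinator"


def node_id_from_hostname(hostname: str) -> str:
    """Use the short host name as node.id (unique per VM in this setup)."""
    return hostname.split(".")[0]


def worker_files(node_id: str) -> dict[str, str]:
    """The worker files that differ from the coordinator's."""
    return {
        "etc/node.properties": (
            "node.environment=dev\n"  # must match the coordinator
            f"node.id={node_id}\n"    # must be unique per node
            "node.data-dir=/var/presto/data\n"
        ),
        "etc/config.properties": (
            "coordinator=false\n"
            "http-server.http.port=8080\n"
            "query.max-memory=5GB\n"
            "query.max-memory-per-node=1GB\n"
            f"discovery.uri=http://{COORDINATOR_HOST}:8080\n"  # register with coordinator
        ),
    }


def write_worker_config(presto_home: Path) -> None:
    node_id = node_id_from_hostname(socket.gethostname())
    for relative, content in worker_files(node_id).items():
        path = presto_home / relative
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(content)
```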

Next, Create a Managed Instance Group for Workers
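A managed instance group is created from an instance template, and here the template is built from an image of a configured worker's disk. The following sketch prints the three `gcloud` calls in order. The resource names, zone, machine type, and group size are assumptions. `--force` allows imaging a disk attached to a running VM, but stopping the VM first gives a more consistent image.

```python
import shlex

# Hypothetical names; the source disk belongs to one configured worker VM.
ZONE = "us-central1-a"
SOURCE_DISK = "presto-worker-1"
IMAGE = "presto-worker-image"
TEMPLATE = "presto-worker-template"
GROUP = "presto-workers"


def mig_commands(machine_type: str = "n1-standard-4", size: int = 2) -> list[list[str]]:
    """gcloud calls for: worker image, then instance template, then managed group."""
    return [
        ["gcloud", "compute", "images", "create", IMAGE,
         f"--source-disk={SOURCE_DISK}", f"--source-disk-zone={ZONE}", "--force"],
        ["gcloud", "compute", "instance-templates", "create", TEMPLATE,
         f"--machine-type={machine_type}", f"--image={IMAGE}"],
        ["gcloud", "compute", "instance-groups", "managed", "create", GROUP,
         f"--zone={ZONE}", "--base-instance-name=presto-worker",
         f"--template={TEMPLATE}", f"--size={size}"],
    ]


if __name__ == "__main__":
    for cmd in mig_commands():
        print(shlex.join(cmd))  # review, then run each command in order
```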

At this point, if you refresh the Presto UI, this is what you’ll see:

Presto UI: http://{coordinator_ip}:8080

Voila! We have a Presto cluster ready.

Let’s continue and set up autoscaling

Autoscaling helps your applications gracefully handle increases in traffic and reduce costs when the need for resources is lower.

With the autoscaling policy for this setup, GCP adds up to 10 instances to your instance group when load increases (upscaling) and deletes instances when demand drops (downscaling).
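The autoscaling policy itself is not reproduced here. The following sketch uses `gcloud compute instance-groups managed set-autoscaling`. Only the 10-instance ceiling comes from this article. The group name, zone, minimum of 2 replicas, 60% CPU target, and 90-second cool-down are assumptions.

```python
import shlex


def autoscaling_command(group: str = "presto-workers", zone: str = "us-central1-a",
                        min_replicas: int = 2, max_replicas: int = 10,
                        target_cpu: float = 0.6, cool_down_s: int = 90) -> list[str]:
    """Build the gcloud call that attaches a CPU-based autoscaler to the group."""
    if not 0.0 < target_cpu <= 1.0:
        raise ValueError("target CPU utilization must be in (0, 1]")
    if min_replicas > max_replicas:
        raise ValueError("min_replicas cannot exceed max_replicas")
    return [
        "gcloud", "compute", "instance-groups", "managed", "set-autoscaling", group,
        f"--zone={zone}",
        f"--min-num-replicas={min_replicas}",
        f"--max-num-replicas={max_replicas}",
        f"--target-cpu-utilization={target_cpu}",
        f"--cool-down-period={cool_down_s}",
    ]


if __name__ == "__main__":
    print(shlex.join(autoscaling_command()))
```

CPU utilization is a simple scaling signal. Keep in mind that downscaling deletes VMs without going through Presto's graceful shutdown, so queries with tasks on a removed worker may fail.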

What about the Presto CLI?

You can use the Presto CLI to query data through Presto. Since we configured the JMX catalog, you can query JMX metrics right away.
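The CLI ships as the `presto-cli` executable JAR, usually renamed to `presto`. The following sketch builds a non-interactive invocation with `--server`, `--catalog`, `--schema`, and `--execute`. The query is the runtime example from the Presto JMX connector documentation. The CLI path and server address are assumptions.

```python
import subprocess

# Hypothetical CLI path and coordinator address.
PRESTO_CLI = "./presto"
SERVER = "localhost:8080"

# Example query from the Presto JMX connector documentation.
JMX_RUNTIME_QUERY = 'SELECT node, vmname, vmversion FROM jmx.current."java.lang:type=runtime"'


def cli_command(sql: str, catalog: str = "jmx", schema: str = "current") -> list[str]:
    """Non-interactive Presto CLI invocation: run one statement, then exit."""
    return [PRESTO_CLI, "--server", SERVER, "--catalog", catalog,
            "--schema", schema, "--execute", sql]


def run_cli(sql: str, **session: str) -> str:
    """Run the statement and return the CLI's stdout."""
    result = subprocess.run(cli_command(sql, **session),
                            capture_output=True, text=True, check=True)
    return result.stdout
```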

Presto CLI

If you want to connect to Hive, which you most likely will, configure a Hive catalog and restart the cluster.
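A Hive catalog is one more properties file under `etc/catalog/`, and the file name (`hive`) becomes the catalog name. The following sketch assumes a Hive metastore reachable at the hypothetical `hive-metastore:9083`. `hive-hadoop2` is the connector name Presto used at the time of writing; newer Trino releases call it `hive`. The file has to exist on the coordinator and on every worker (so the worker image needs it too) before you restart with `bin/launcher restart`.

```python
from pathlib import Path

# Hypothetical metastore endpoint; 9083 is the usual Hive metastore Thrift port.
METASTORE_URI = "thrift://hive-metastore:9083"


def hive_catalog(metastore_uri: str = METASTORE_URI) -> str:
    """Contents of etc/catalog/hive.properties."""
    return (
        "connector.name=hive-hadoop2\n"
        f"hive.metastore.uri={metastore_uri}\n"
    )


def install_hive_catalog(presto_home: Path) -> Path:
    """Write the Hive catalog file under a Presto install directory."""
    path = presto_home / "etc" / "catalog" / "hive.properties"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(hive_catalog())
    return path
```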

Now that the Hive catalog is configured, let’s query Hive data from Presto CLI.

Presto CLI

CLI is not the only option!

Presto gives you the freedom to use whichever BI tools work best for you: Superset, Tableau, Power BI, Zeppelin, Jupyter notebooks, and the list goes on...
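Most of these tools reach Presto through its JDBC driver or client libraries, which speak Presto's HTTP protocol. The client POSTs the SQL to `/v1/statement`, then follows `nextUri` until it disappears, collecting `data` rows along the way. The following sketch does this with only the Python standard library, as you might in a Jupyter notebook. The coordinator URL and user name are assumptions. The `X-Presto-*` header names are the ones Presto used at the time of writing; Trino later renamed them to `X-Trino-*`.

```python
import json
import urllib.request

# Hypothetical coordinator address.
COORDINATOR_URL = "http://localhost:8080"


def statement_request(sql: str, user: str, catalog: str, schema: str) -> urllib.request.Request:
    """Build the POST /v1/statement request; X-Presto-* headers set the session."""
    return urllib.request.Request(
        f"{COORDINATOR_URL}/v1/statement",
        data=sql.encode("utf-8"),
        headers={
            "X-Presto-User": user,
            "X-Presto-Catalog": catalog,
            "X-Presto-Schema": schema,
        },
        method="POST",
    )


def run_query(sql: str, user: str = "analyst",
              catalog: str = "hive", schema: str = "default") -> list[list]:
    """Submit a query, then follow nextUri until the result set is complete."""
    rows: list[list] = []
    request = statement_request(sql, user, catalog, schema)
    while request is not None:
        with urllib.request.urlopen(request) as resp:
            payload = json.load(resp)
        if "error" in payload:
            raise RuntimeError(payload["error"].get("message", "query failed"))
        rows.extend(payload.get("data", []))
        next_uri = payload.get("nextUri")
        # Later pages are plain GETs on the URI the server handed back.
        request = (urllib.request.Request(next_uri, headers={"X-Presto-User": user})
                   if next_uri else None)
    return rows
```

In a notebook, `rows = run_query("SHOW TABLES")` would return the table names of `hive.default` as a list of single-element rows.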

That’s it! You now have an autoscaling Presto cluster on GCP for all your analytical needs.

Stay tuned for more insights into Presto on GCP; I am planning more posts like this one.
