How to Install Presto Over a Cluster
This post describes the steps to follow to install Facebook’s Presto on your cluster.
Although the Presto documentation describes a simple enough installation procedure, I found some of the points vague and potentially confusing. So here's a modified version.
Prerequisites
Presto requires you to have the following configured:
- A working Hadoop installation: you can set one up by following the steps mentioned here.
- Hive: required for running the Hive server in later steps and for modifying the database. Presto can only be used for running queries, not for creating tables; for that purpose, a Hive installation is required. You can install it by following the steps mentioned here.
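Before going further, it is worth confirming that the prerequisite commands are actually on the PATH of every node. A minimal sketch (the launcher names `hadoop` and `hive` are the usual ones; adjust if your installation differs):

```shell
# Sketch: check that the prerequisite commands are available, collecting
# the results into $report instead of exiting so you can see all misses.
report=""
for cmd in hadoop hive java; do
    if command -v "$cmd" >/dev/null 2>&1; then
        report="$report $cmd:found"
    else
        report="$report $cmd:missing"
    fi
done
echo "$report"
```

Run this on the master and each slave; anything reported as missing should be installed before continuing.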
The installation part is described in three major parts:
1. Deploying Presto
2. Discovery Service
3. Command Line Interface
Deploying Presto
For the sake of clarity, we'll refer to the directory used to install Presto as the home directory.
Start by downloading the Presto tarball on the master as well as the slaves, and extract it.
cd home
wget http://central.maven.org/maven2/com/facebook/presto/presto-server/0.60/presto-server-0.60.tar.gz
tar zxvf presto-server-0.60.tar.gz
This will create a directory presto-server-0.60. Let's call this the Presto directory. Now create a data directory inside the Presto directory; Presto needs it for storing logs, local metadata, etc. Also create a metastore directory, required for storing metadata.
cd presto-server-0.60
mkdir data
mkdir metastore
Configuring Presto
Create a directory inside the Presto directory named etc. This will be used to hold all your configuration files.
cd presto-server-0.60
mkdir etc
Following files have to be created inside the etc directory.
node.properties
This file is typically created by the deployment system when Presto is first installed. The following is a minimal etc/node.properties (same for master and slaves):
node.environment=production
node.id=ffffffff-ffff-ffff-ffff-ffffffffffff
node.data-dir=/home/presto-server-0.60/data
Note: the node.id can be obtained by running the uuid command on your system. Don't change it once you've configured it in this file.
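The steps above can be scripted per node. A sketch, with the uuid source as an assumption (/proc/sys/kernel/random/uuid on Linux, falling back to uuidgen), and the data-dir path matching the layout used in this post:

```shell
# Sketch: generate a node.id once and write etc/node.properties.
# Run this from inside the Presto directory on each node.
mkdir -p etc
NODE_ID=$(cat /proc/sys/kernel/random/uuid 2>/dev/null || uuidgen)
cat > etc/node.properties <<EOF
node.environment=production
node.id=$NODE_ID
node.data-dir=/home/presto-server-0.60/data
EOF
```

Since node.id must stay stable across restarts, run this only once per node rather than on every deploy.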
jvm.config
The following provides a good starting point for creating etc/jvm.config:
-server
-Xmx16G
-XX:+UseConcMarkSweepGC
-XX:+ExplicitGCInvokesConcurrent
-XX:+CMSClassUnloadingEnabled
-XX:+AggressiveOpts
-XX:+HeapDumpOnOutOfMemoryError
-XX:OnOutOfMemoryError=kill -9 %p
-XX:PermSize=150M
-XX:MaxPermSize=150M
-XX:ReservedCodeCacheSize=150M
-Xbootclasspath/p:/home/presto-server-0.60/lib/floatingdecimal-0.1.jar
config.properties
The following is a configuration for the master.
coordinator=true
datasources=jmx
http-server.http.port=8080
presto-metastore.db.type=h2
presto-metastore.db.filename=/home/presto-server-0.60/metastore
task.max-memory=1GB
discovery-server.enabled=true
discovery.uri=http://<master>:8080
and this is the configuration for the workers.
coordinator=false
datasources=jmx,hive
http-server.http.port=8080
presto-metastore.db.type=h2
presto-metastore.db.filename=/home/presto-server-0.60/metastore
task.max-memory=1GB
discovery.uri=http://<master>:8080
where <master> is the IP of the master being used here.
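Since every worker gets the same file apart from the master's address, the worker config can be generated with a small script. A sketch, where MASTER_IP is a placeholder for your coordinator's IP and the script is run from each worker's etc directory:

```shell
# Sketch: render the worker config.properties with the master's IP
# substituted in. MASTER_IP is a hypothetical value; replace it.
MASTER_IP=10.0.0.1
cat > config.properties <<EOF
coordinator=false
datasources=jmx,hive
http-server.http.port=8080
presto-metastore.db.type=h2
presto-metastore.db.filename=/home/presto-server-0.60/metastore
task.max-memory=1GB
discovery.uri=http://$MASTER_IP:8080
EOF
```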
log.properties
Add this single line inside the log.properties file:
com.facebook.presto=DEBUG
catalog.properties
Catalogs are registered by creating a catalog properties file in the etc/catalog directory. For example, create etc/catalog/jmx.properties with the following contents to mount the jmx connector as the jmx catalog:
connector.name=jmx
Presto includes Hive connectors for multiple versions of Hadoop:
- hive-hadoop1: Apache Hadoop 1.x
- hive-hadoop2: Apache Hadoop 2.x
- hive-cdh4: Cloudera CDH4
Create etc/catalog/hive.properties with the following contents to mount the hive-cdh4 connector as the hive catalog, replacing hive-cdh4 with the proper connector for your version of Hadoop and example.net:9083 with the correct host and port for your Hive metastore Thrift service:
connector.name=hive-cdh4
hive.metastore.uri=thrift://<master>:10000
You can have as many catalogs as you need, so if you have additional Hive clusters, simply add another properties file to etc/catalog with a different name (making sure it ends in .properties).
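For example, a second Hive cluster could be mounted as a catalog named hive2. This is only a sketch; the catalog name and metastore host below are hypothetical:

```shell
# Sketch: mount a second Hive cluster as the "hive2" catalog.
# "second-master:9083" is a placeholder for that cluster's metastore.
mkdir -p etc/catalog
cat > etc/catalog/hive2.properties <<EOF
connector.name=hive-cdh4
hive.metastore.uri=thrift://second-master:9083
EOF
```

The catalog then becomes addressable from queries by its file name, e.g. hive2.default.some_table.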
Running Presto
To run presto as a foreground process, run
bin/launcher run
You can also run it as a daemon (in which case logs are written to files under the data directory rather than to stdout/stderr):
bin/launcher start
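The launcher also has stop and status subcommands for managing the daemon. To confirm the server actually came up, a quick liveness check can be run against the coordinator's HTTP port; a sketch, where the URL is an assumption (substitute your master's address for localhost):

```shell
# Sketch: liveness check against the coordinator's HTTP port after
# startup. PRESTO_URL is a placeholder matching the configs above.
PRESTO_URL=http://localhost:8080
if curl -sf "$PRESTO_URL" >/dev/null 2>&1; then
    status=up
else
    status=down
fi
echo "presto is $status"
```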
Discovery Service
Presto uses the Discovery service to find all the nodes in the cluster. Every Presto instance will register itself with the Discovery service on startup.
Discovery is configured and run the same way as Presto. Download discovery-server-1.16.tar.gz, unpack it to create the installation directory, create the data directory, then configure it to run on a different port than Presto.
cd home
wget http://central.maven.org/maven2/io/airlift/discovery/discovery-server/1.16/discovery-server-1.16.tar.gz
tar zxvf discovery-server-1.16.tar.gz
cd discovery-server-1.16
Again, create a data directory for the Discovery server; it is referenced below in its node.properties.
mkdir data
Configuring Discovery
As with presto, create an etc directory inside the discovery server directory to hold the following files:
node.properties
node.environment=production
node.id=ffffffff-ffff-ffff-ffff-ffffffffffff
node.data-dir=/home/discovery-server-1.16/data
jvm.config
-server
-Xmx1G
-XX:+UseConcMarkSweepGC
-XX:+ExplicitGCInvokesConcurrent
-XX:+AggressiveOpts
-XX:+HeapDumpOnOutOfMemoryError
-XX:OnOutOfMemoryError=kill -9 %p
config.properties
http-server.http.port=8411
Running Discovery
Run Discovery the same way as Presto, either as a foreground process:
bin/launcher run
or as a daemon:
bin/launcher start
Command Line Interface
To access the Presto CLI, you first need to start the Hive Thrift server. Go to the Hive directory and run the following:
bin/hive --service hiveserver -p 10000
Now, download presto-cli-0.60-executable.jar, rename it to presto, make it executable, then run it:
wget http://central.maven.org/maven2/com/facebook/presto/presto-cli/0.60/presto-cli-0.60-executable.jar
mv presto-cli-0.60-executable.jar presto
chmod +x presto
./presto --server <master>:8080 --catalog hive --schema default
This will start a Presto CLI for you to run your queries.
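The CLI can also run a single query non-interactively via its --execute flag, which is handy for scripting. A sketch of a small helper, where MASTER is an assumption standing in for your coordinator's IP:

```shell
# Sketch: helper wrapping the CLI's --execute flag for one-shot queries.
# MASTER is a hypothetical value; set it to your coordinator's address.
MASTER=10.0.0.1
run_query() {
    ./presto --server "$MASTER:8080" --catalog hive --schema default \
        --execute "$1"
}
# usage: run_query "SHOW TABLES;"
```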