2.1. Deploying Presto
Installing Presto
Download the Presto server tarball, presto-server-0.229.tar.gz, and unpack it. The tarball will contain a single top-level directory, presto-server-0.229, which we will call the installation directory.
Presto needs a data directory for storing logs, etc. We recommend creating a data directory outside of the installation directory, which allows it to be easily preserved when upgrading Presto.
Configuring Presto
Create an etc directory inside the installation directory. This will hold the following configuration:
- Node Properties: environmental configuration specific to each node
- JVM Config: command line options for the Java Virtual Machine
- Config Properties: configuration for the Presto server
- Catalog Properties: configuration for Connectors (data sources)
Node Properties
The node properties file, etc/node.properties, contains configuration specific to each node. A node is a single installed instance of Presto on a machine. This file is typically created by the deployment system when Presto is first installed. The following is a minimal etc/node.properties:
- node.environment=production
- node.id=ffffffff-ffff-ffff-ffff-ffffffffffff
- node.data-dir=/var/presto/data
The above properties are described below:
- node.environment: The name of the environment. All Presto nodes in a cluster must have the same environment name.
- node.id: The unique identifier for this installation of Presto. This must be unique for every node. This identifier should remain consistent across reboots or upgrades of Presto. If running multiple installations of Presto on a single machine (i.e. multiple nodes on the same machine), each installation must have a unique identifier.
- node.data-dir: The location (filesystem path) of the data directory. Presto will store logs and other data here.
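As a rough illustration of how a deployment script might create this file, the following sketch writes a minimal etc/node.properties with a freshly generated node.id. The PRESTO_HOME variable and its fallback to a scratch directory are assumptions made for this example and are not part of Presto. The file is only written when it does not already exist, so the identifier stays the same across reboots and upgrades.

```shell
#!/usr/bin/env bash
set -eu

# Assumption: PRESTO_HOME points at the installation directory.
# When it is unset, a scratch directory is used so the sketch can be tried safely.
PRESTO_HOME="${PRESTO_HOME:-$(mktemp -d)}"
NODE_PROPERTIES="$PRESTO_HOME/etc/node.properties"

mkdir -p "$PRESTO_HOME/etc"

# Write the file only once so that node.id does not change on later runs.
if [ ! -f "$NODE_PROPERTIES" ]; then
    # uuidgen is not installed everywhere; fall back to the Linux kernel's UUID source.
    node_id="$(uuidgen 2>/dev/null || cat /proc/sys/kernel/random/uuid)"
    cat > "$NODE_PROPERTIES" <<EOF
node.environment=production
node.id=$node_id
node.data-dir=/var/presto/data
EOF
fi

cat "$NODE_PROPERTIES"
```

Running the script a second time with the same PRESTO_HOME leaves the existing file, and therefore the existing node.id, untouched.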
JVM Config
The JVM config file, etc/jvm.config, contains a list of command line options used for launching the Java Virtual Machine. The format of the file is a list of options, one per line. These options are not interpreted by the shell, so options containing spaces or other special characters should not be quoted.
The following provides a good starting point for creating etc/jvm.config:
- -server
- -Xmx16G
- -XX:+UseG1GC
- -XX:G1HeapRegionSize=32M
- -XX:+UseGCOverheadLimit
- -XX:+ExplicitGCInvokesConcurrent
- -XX:+HeapDumpOnOutOfMemoryError
- -XX:+ExitOnOutOfMemoryError
Because an OutOfMemoryError will typically leave the JVM in an inconsistent state, we write a heap dump (for debugging) and forcibly terminate the process when this occurs.
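One possible way to create the file from a script is sketched below. It writes the options above using a quoted heredoc, so the shell does not expand or alter anything, and then applies a simple heuristic check for lines that contain quote characters or do not start with a dash. PRESTO_HOME, its scratch-directory fallback, and the check itself are assumptions of this sketch rather than anything Presto provides.

```shell
#!/usr/bin/env bash
set -eu

# Assumption: PRESTO_HOME is the installation directory (scratch directory if unset).
PRESTO_HOME="${PRESTO_HOME:-$(mktemp -d)}"
JVM_CONFIG="$PRESTO_HOME/etc/jvm.config"

mkdir -p "$PRESTO_HOME/etc"

# The quoted delimiter ('EOF') disables shell expansion inside the heredoc,
# so every option reaches the file exactly as typed, one per line.
cat > "$JVM_CONFIG" <<'EOF'
-server
-Xmx16G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError
EOF

# Heuristic only: list lines that contain quotes or do not begin with "-".
suspicious="$(grep -n -e "[\"']" -e '^[^-]' "$JVM_CONFIG" || true)"
if [ -n "$suspicious" ]; then
    echo "Check these lines in $JVM_CONFIG:" >&2
    echo "$suspicious" >&2
fi
```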
Config Properties
The config properties file, etc/config.properties, contains the configuration for the Presto server. Every Presto server can function as both a coordinator and a worker, but dedicating a single machine to only perform coordination work provides the best performance on larger clusters.
The following is a minimal configuration for the coordinator:
- coordinator=true
- node-scheduler.include-coordinator=false
- http-server.http.port=8080
- query.max-memory=50GB
- query.max-memory-per-node=1GB
- query.max-total-memory-per-node=2GB
- discovery-server.enabled=true
- discovery.uri=http://example.net:8080
And this is a minimal configuration for the workers:
- coordinator=false
- http-server.http.port=8080
- query.max-memory=50GB
- query.max-memory-per-node=1GB
- query.max-total-memory-per-node=2GB
- discovery.uri=http://example.net:8080
Alternatively, if you are setting up a single machine for testing that will function as both a coordinator and worker, use this configuration:
- coordinator=true
- node-scheduler.include-coordinator=true
- http-server.http.port=8080
- query.max-memory=5GB
- query.max-memory-per-node=1GB
- query.max-total-memory-per-node=2GB
- discovery-server.enabled=true
- discovery.uri=http://example.net:8080
These properties require some explanation:
- coordinator: Allow this Presto instance to function as a coordinator (accept queries from clients and manage query execution).
- node-scheduler.include-coordinator: Allow scheduling work on the coordinator. For larger clusters, processing work on the coordinator can impact query performance because the machine’s resources are not available for the critical task of scheduling, managing and monitoring query execution.
- http-server.http.port: Specifies the port for the HTTP server. Presto uses HTTP for all communication, internal and external.
- query.max-memory: The maximum amount of distributed memory that a query may use.
- query.max-memory-per-node: The maximum amount of user memory that a query may use on any one machine.
- query.max-total-memory-per-node: The maximum amount of user and system memory that a query may use on any one machine, where system memory is the memory used during execution by readers, writers, and network buffers, etc.
- discovery-server.enabled: Presto uses the Discovery service to find all the nodes in the cluster. Every Presto instance will register itself with the Discovery service on startup. In order to simplify deployment and avoid running an additional service, the Presto coordinator can run an embedded version of the Discovery service. It shares the HTTP server with Presto and thus uses the same port.
- discovery.uri: The URI to the Discovery server. Because we have enabled the embedded version of Discovery in the Presto coordinator, this should be the URI of the Presto coordinator. Replace example.net:8080 to match the host and port of the Presto coordinator. This URI must not end in a slash.
You may also wish to set the following properties:
- jmx.rmiregistry.port: Specifies the port for the JMX RMI registry. JMX clients should connect to this port.
- jmx.rmiserver.port: Specifies the port for the JMX RMI server. Presto exports many metrics that are useful for monitoring via JMX.
See also Resource Groups.
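Since the coordinator and worker configurations differ in only a few properties, a deployment script could generate etc/config.properties from a role setting. The following is a minimal sketch of that idea. PRESTO_HOME, PRESTO_ROLE and DISCOVERY_URI are hypothetical environment variables introduced for this example; the property values are the ones from the minimal configurations above.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical knobs for this sketch (not defined by Presto):
PRESTO_HOME="${PRESTO_HOME:-$(mktemp -d)}"                 # installation directory
PRESTO_ROLE="${PRESTO_ROLE:-worker}"                       # "coordinator" or "worker"
DISCOVERY_URI="${DISCOVERY_URI:-http://example.net:8080}"  # URI of the coordinator
CONFIG_PROPERTIES="$PRESTO_HOME/etc/config.properties"

mkdir -p "$PRESTO_HOME/etc"

# discovery.uri must not end in a slash, so strip one if present.
DISCOVERY_URI="${DISCOVERY_URI%/}"

# Properties that differ between the two roles.
case "$PRESTO_ROLE" in
    coordinator)
        role_lines="coordinator=true
node-scheduler.include-coordinator=false
discovery-server.enabled=true"
        ;;
    worker)
        role_lines="coordinator=false"
        ;;
    *)
        echo "Unknown PRESTO_ROLE: $PRESTO_ROLE" >&2
        exit 1
        ;;
esac

# Properties shared by both roles follow the role-specific ones.
cat > "$CONFIG_PROPERTIES" <<EOF
$role_lines
http-server.http.port=8080
query.max-memory=50GB
query.max-memory-per-node=1GB
query.max-total-memory-per-node=2GB
discovery.uri=$DISCOVERY_URI
EOF

cat "$CONFIG_PROPERTIES"
```

For example, running the script with PRESTO_ROLE=coordinator on the coordinator machine and with the default on every other machine would produce the two minimal configurations shown earlier, up to the order of the properties.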
Log Levels
The optional log levels file, etc/log.properties, allows setting the minimum log level for named logger hierarchies. Every logger has a name, which is typically the fully qualified name of the class that uses the logger. Loggers have a hierarchy based on the dots in the name (like Java packages). For example, consider the following log levels file:
- com.facebook.presto=INFO
This would set the minimum level to INFO for both com.facebook.presto.server and com.facebook.presto.hive. The default minimum level is INFO (thus the above example does not actually change anything). There are four levels: DEBUG, INFO, WARN and ERROR.
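To make the hierarchy more concrete, the sketch below writes a log levels file that keeps com.facebook.presto at INFO and lowers the minimum level to DEBUG for the com.facebook.presto.hive subtree only. It then checks that every entry uses one of the four levels listed above. That the more specific entry applies to loggers beneath it is our reading of the hierarchy described in this section; PRESTO_HOME and its scratch-directory fallback are assumptions of the example.

```shell
#!/usr/bin/env bash
set -eu

# Assumption: PRESTO_HOME is the installation directory (scratch directory if unset).
PRESTO_HOME="${PRESTO_HOME:-$(mktemp -d)}"
LOG_PROPERTIES="$PRESTO_HOME/etc/log.properties"

mkdir -p "$PRESTO_HOME/etc"

# The second entry is more specific and only affects loggers under com.facebook.presto.hive.
cat > "$LOG_PROPERTIES" <<'EOF'
com.facebook.presto=INFO
com.facebook.presto.hive=DEBUG
EOF

# Verify that each line has the form <logger>=<level> with a supported level.
while IFS='=' read -r logger level; do
    case "$level" in
        DEBUG|INFO|WARN|ERROR)
            echo "ok: $logger -> $level"
            ;;
        *)
            echo "unsupported level for $logger: $level" >&2
            exit 1
            ;;
    esac
done < "$LOG_PROPERTIES"
```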
Catalog Properties
Presto accesses data via connectors, which are mounted in catalogs. The connector provides all of the schemas and tables inside of the catalog. For example, the Hive connector maps each Hive database to a schema, so if the Hive connector is mounted as the hive catalog, and Hive contains a table clicks in database web, that table would be accessed in Presto as hive.web.clicks.
Catalogs are registered by creating a catalog properties file in the etc/catalog directory. For example, create etc/catalog/jmx.properties with the following contents to mount the jmx connector as the jmx catalog:
- connector.name=jmx
See Connectors for more information about configuring connectors.
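The relationship between file names and catalogs can be sketched with a short script. It creates the jmx catalog file described above and then lists each file in etc/catalog together with the connector it names. The list_catalogs helper, PRESTO_HOME, and the scratch-directory fallback are hypothetical and exist only for this example.

```shell
#!/usr/bin/env bash
set -eu

# Assumption: PRESTO_HOME is the installation directory (scratch directory if unset).
PRESTO_HOME="${PRESTO_HOME:-$(mktemp -d)}"
CATALOG_DIR="$PRESTO_HOME/etc/catalog"

mkdir -p "$CATALOG_DIR"

# Register the jmx catalog: the file name (minus .properties) is the catalog name.
printf 'connector.name=jmx\n' > "$CATALOG_DIR/jmx.properties"

# Hypothetical helper: print one line per catalog file with its connector name.
list_catalogs() {
    for file in "$CATALOG_DIR"/*.properties; do
        [ -f "$file" ] || continue
        catalog="$(basename "$file" .properties)"
        connector="$(sed -n 's/^connector\.name=//p' "$file")"
        echo "catalog=$catalog connector=$connector"
    done
}

list_catalogs
```

A second catalog would be added the same way, by placing another properties file with its own connector.name next to jmx.properties.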
Running Presto
The installation directory contains the launcher script in bin/launcher. Presto can be started as a daemon by running the following:
- bin/launcher start
Alternatively, it can be run in the foreground, with the logs and other output being written to stdout/stderr (both streams should be captured if using a supervision system like daemontools):
- bin/launcher run
Run the launcher with --help to see the supported commands and command line options. In particular, the --verbose option is very useful for debugging the installation.
After launching, you can find the log files in var/log:
- launcher.log: This log is created by the launcher and is connected to the stdout and stderr streams of the server. It will contain a few log messages that occur while the server logging is being initialized and any errors or diagnostics produced by the JVM.
- server.log: This is the main log file used by Presto. It will typically contain the relevant information if the server fails during initialization. It is automatically rotated and compressed.
- http-request.log: This is the HTTP request log which contains every HTTP request received by the server. It is automatically rotated and compressed.
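When debugging a launch, a quick check of which log files exist can be a useful first step. The sketch below reports whether each of the three files is present. It assumes that var/log is located inside the data directory configured as node.data-dir; PRESTO_DATA_DIR and the check_logs helper are hypothetical names used only in this example.

```shell
#!/usr/bin/env bash
set -eu

# Assumption: PRESTO_DATA_DIR is the data directory (node.data-dir).
# A scratch directory is used when it is unset, in which case all files are missing.
PRESTO_DATA_DIR="${PRESTO_DATA_DIR:-$(mktemp -d)}"

# Hypothetical helper: report the presence of each expected log file.
check_logs() {
    for name in launcher.log server.log http-request.log; do
        if [ -f "$1/var/log/$name" ]; then
            echo "$name: present"
        else
            echo "$name: missing"
        fi
    done
}

check_logs "$PRESTO_DATA_DIR"
```

If server.log is missing after a start attempt, launcher.log is the natural place to look next, since it captures output produced before server logging is initialized.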