4.3. Properties Reference

4.3. Properties Reference

This section describes the most important config properties thatmay be used to tune Presto or alter its behavior when required.

General Properties

join-distribution-type

Type: string
Allowed values: AUTOMATIC, PARTITIONED, BROADCAST
Default value: PARTITIONED
The type of distributed join to use. When set to PARTITIONED, presto willuse hash distributed joins. When set to BROADCAST, it will broadcast theright table to all nodes in the cluster that have data from the left table.Partitioned joins require redistributing both tables using a hash of the join key.This can be slower (sometimes substantially) than broadcast joins, but allows muchlarger joins. In particular broadcast joins will be faster if the right table ismuch smaller than the left. However, broadcast joins require that the tables on the rightside of the join after filtering fit in memory on each node, whereas distributed joinsonly need to fit in distributed memory across all nodes. When set to AUTOMATIC,Presto will make a cost based decision as to which distribution type is optimal.It will also consider switching the left and right inputs to the join. In AUTOMATICmode, Presto will default to hash distributed joins if no cost could be computed, such as ifthe tables do not have statistics. This can also be specified on a per-query basis usingthe join_distribution_type session property.

redistribute-writes

Type: boolean
Default value: true
This property enables redistribution of data before writing. This caneliminate the performance impact of data skew when writing by hashing itacross nodes in the cluster. It can be disabled when it is known that theoutput data set is not skewed in order to avoid the overhead of hashing andredistributing all the data across the network. This can also be specifiedon a per-query basis using the redistribute_writes session property.

Memory Management Properties

query.max-memory-per-node

Type: data size
Default value: JVM max memory * 0.1
This is the max amount of user memory a query can use on a worker.User memory is allocated during execution for things that are directlyattributable to or controllable by a user query. For example, memory usedby the hash tables built during execution, memory used during sorting, etc.When the user memory allocation of a query on any worker hits this limitit will be killed.

query.max-total-memory-per-node

Type: data size
Default value: JVM max memory * 0.3
This is the max amount of user and system memory a query can use on a worker.System memory is allocated during execution for things that are not directlyattributable to or controllable by a user query. For example, memory allocatedby the readers, writers, network buffers, etc. When the sum of the user andsystem memory allocated by a query on any worker hits this limit it will be killed.The value of query.max-total-memory-per-node must be greater thanquery.max-memory-per-node.

query.max-memory

Type: data size
Default value: 20GB
This is the max amount of user memory a query can use across the entire cluster.User memory is allocated during execution for things that are directlyattributable to or controllable by a user query. For example, memory usedby the hash tables built during execution, memory used during sorting, etc.When the user memory allocation of a query across all workers hits this limitit will be killed.

query.max-total-memory

Type: data size
Default value: query.max-total-memory * 2
This is the max amount of user and system memory a query can use across the entire cluster.System memory is allocated during execution for things that are not directlyattributable to or controllable by a user query. For example, memory allocatedby the readers, writers, network buffers, etc. When the sum of the user andsystem memory allocated by a query across all workers hits this limit it will bekilled. The value of query.max-total-memory must be greater thanquery.max-memory.

memory.heap-headroom-per-node

Type: data size
Default value: JVM max memory * 0.3
This is the amount of memory set aside as headroom/buffer in the JVM heapfor allocations that are not tracked by Presto.

Spilling Properties

experimental.spill-enabled

Type: boolean
Default value: false
Try spilling memory to disk to avoid exceeding memory limits for the query.
Spilling works by offloading memory to disk. This process can allow a query with a large memoryfootprint to pass at the cost of slower execution times. Currently, spilling is supported only foraggregations and joins (inner and outer), so this property will not reduce memory usage required forwindow functions, sorting and other join types.
Be aware that this is an experimental feature and should be used with care.
This config property can be overridden by the spill_enabled session property.

experimental.spiller-spill-path

Type: string
No default value. Must be set when spilling is enabled
Directory where spilled content will be written. It can be a comma separatedlist to spill simultaneously to multiple directories, which helps to utilizemultiple drives installed in the system.
It is not recommended to spill to system drives. Most importantly, do not spillto the drive on which the JVM logs are written, as disk overutilization mightcause JVM to pause for lengthy periods, causing queries to fail.

experimental.spiller-max-used-space-threshold

Type: double
Default value: 0.9
If disk space usage ratio of a given spill path is above this threshold,this spill path will not be eligible for spilling.

experimental.spiller-threads

Type: integer
Default value: 4
Number of spiller threads. Increase this value if the default is not ableto saturate the underlying spilling device (for example, when using RAID).

experimental.max-spill-per-node

Type: data size
Default value: 100 GB
Max spill space to be used by all queries on a single node.

experimental.query-max-spill-per-node

Type: data size
Default value: 100 GB
Max spill space to be used by a single query on a single node.

experimental.aggregation-operator-unspill-memory-limit

Type: data size
Default value: 4 MB
Limit for memory used for unspilling a single aggregation operator instance.

experimental.spill-compression-enabled

Type: boolean
Default value: false
Enables data compression for pages spilled to disk

experimental.spill-encryption-enabled

Type: boolean
Default value: false
Enables using a randomly generated secret key (per spill file) to encrypt and decryptdata spilled to disk

Exchange Properties

Exchanges transfer data between Presto nodes for different stages ofa query. Adjusting these properties may help to resolve inter-nodecommunication issues or improve network utilization.

exchange.client-threads

Type: integer
Minimum value: 1
Default value: 25
Number of threads used by exchange clients to fetch data from other Prestonodes. A higher value can improve performance for large clusters or clusterswith very high concurrency, but excessively high values may cause a dropin performance due to context switches and additional memory usage.

exchange.concurrent-request-multiplier

Type: integer
Minimum value: 1
Default value: 3
Multiplier determining the number of concurrent requests relative toavailable buffer memory. The maximum number of requests is determinedusing a heuristic of the number of clients that can fit into availablebuffer space based on average buffer usage per request times thismultiplier. For example, with an exchange.max-buffer-size of 32 MBand 20 MB already used and average size per request being 2MB,the maximum number of clients ismultiplier ((32MB - 20MB) / 2MB) = multiplier 6. Tuning thisvalue adjusts the heuristic, which may increase concurrency and improvenetwork utilization.

exchange.max-buffer-size

Type: data size
Default value: 32MB
Size of buffer in the exchange client that holds data fetched from othernodes before it is processed. A larger buffer can increase networkthroughput for larger clusters and thus decrease query processing time,but will reduce the amount of memory available for other usages.

exchange.max-response-size

Type: data size
Minimum value: 1MB
Default value: 16MB
Maximum size of a response returned from an exchange request. The responsewill be placed in the exchange client buffer which is shared across allconcurrent requests for the exchange.
Increasing the value may improve network throughput if there is highlatency. Decreasing the value may improve query performance for largeclusters as it reduces skew due to the exchange client buffer holdingresponses for more tasks (rather than hold more data from fewer tasks).

sink.max-buffer-size

Type: data size
Default value: 32MB
Output buffer size for task data that is waiting to be pulled by upstreamtasks. If the task output is hash partitioned, then the buffer will beshared across all of the partitioned consumers. Increasing this value mayimprove network throughput for data transferred between stages if thenetwork has high latency or if there are many nodes in the cluster.

Task Properties

task.concurrency

Type: integer
Restrictions: must be a power of two
Default value: 16
Default local concurrency for parallel operators such as joins and aggregations.This value should be adjusted up or down based on the query concurrency and workerresource utilization. Lower values are better for clusters that run many queriesconcurrently because the cluster will already be utilized by all the runningqueries, so adding more concurrency will result in slow downs due to contextswitching and other overhead. Higher values are better for clusters that only runone or a few queries at a time. This can also be specified on a per-query basisusing the task_concurrency session property.

task.http-response-threads

Type: integer
Minimum value: 1
Default value: 100
Maximum number of threads that may be created to handle HTTP responses. Threads arecreated on demand and are cleaned up when idle, thus there is no overhead to a largevalue if the number of requests to be handled is small. More threads may be helpfulon clusters with a high number of concurrent queries, or on clusters with hundredsor thousands of workers.

task.http-timeout-threads

Type: integer
Minimum value: 1
Default value: 3
Number of threads used to handle timeouts when generating HTTP responses. This valueshould be increased if all the threads are frequently in use. This can be monitoredvia the com.facebook.presto.server:name=AsyncHttpExecutionMBean:TimeoutExecutorJMX object. If ActiveCount is always the same as PoolSize, increase thenumber of threads.

task.info-update-interval

Type: duration
Minimum value: 1ms
Maximum value: 10s
Default value: 3s
Controls staleness of task information, which is used in scheduling. Larger valuescan reduce coordinator CPU load, but may result in suboptimal split scheduling.

task.max-partial-aggregation-memory

Type: data size
Default value: 16MB
Maximum size of partial aggregation results for distributed aggregations. Increasing thisvalue can result in less network transfer and lower CPU utilization by allowing moregroups to be kept locally before being flushed, at the cost of additional memory usage.

task.max-worker-threads

Type: integer
Default value: Node CPUs * 2
Sets the number of threads used by workers to process splits. Increasing this numbercan improve throughput if worker CPU utilization is low and all the threads are in use,but will cause increased heap space usage. Setting the value too high may cause a dropin performance due to a context switching. The number of active threads is availablevia the RunningSplits property of thecom.facebook.presto.execution.executor:name=TaskExecutor.RunningSplits JXM object.

task.min-drivers

Type: integer
Default value: task.max-worker-threads * 2
The target number of running leaf splits on a worker. This is a minimum value becauseeach leaf task is guaranteed at least 3 running splits. Non-leaf tasks are alsoguaranteed to run in order to prevent deadlocks. A lower value may improve responsivenessfor new tasks, but can result in underutilized resources. A higher value can increaseresource utilization, but uses additional memory.

task.writer-count

Type: integer
Restrictions: must be a power of two
Default value: 1
The number of concurrent writer threads per worker per query. Increasing this value mayincrease write speed, especially when a query is not I/O bound and can take advantageof additional CPU for parallel writes (some connectors can be bottlenecked on CPU whenwriting due to compression or other factors). Setting this too high may cause the clusterto become overloaded due to excessive resource utilization. This can also be specified ona per-query basis using the task_writer_count session property.

Node Scheduler Properties

node-scheduler.max-splits-per-node

Type: integer
Default value: 100
The target value for the total number of splits that can be running foreach worker node.
Using a higher value is recommended if queries are submitted in large batches(e.g., running a large group of reports periodically) or for connectors thatproduce many splits that complete quickly. Increasing this value may improvequery latency by ensuring that the workers have enough splits to keep themfully utilized.
Setting this too high will waste memory and may result in lower performancedue to splits not being balanced across workers. Ideally, it should be setsuch that there is always at least one split waiting to be processed, butnot higher.

node-scheduler.max-pending-splits-per-task

Type: integer
Default value: 10
The number of outstanding splits that can be queued for each worker nodefor a single stage of a query, even when the node is already at the limit fortotal number of splits. Allowing a minimum number of splits per stage isrequired to prevent starvation and deadlocks.
This value must be smaller than node-scheduler.max-splits-per-node,will usually be increased for the same reasons, and has similar drawbacksif set too high.

node-scheduler.min-candidates

Type: integer
Minimum value: 1
Default value: 10
The minimum number of candidate nodes that will be evaluated by thenode scheduler when choosing the target node for a split. Settingthis value too low may prevent splits from being properly balancedacross all worker nodes. Setting it too high may increase querylatency and increase CPU usage on the coordinator.

node-scheduler.network-topology

Type: string
Allowed values: legacy, flat
Default value: legacy
Sets the network topology to use when scheduling splits. legacy will ignorethe topology when scheduling splits. flat will try to schedule splits on the hostwhere the data is located by reserving 50% of the work queue for local splits.It is recommended to use flat for clusters where distributed storage runs onthe same nodes as Presto workers.

Optimizer Properties

optimizer.dictionary-aggregation

Type: boolean
Default value: false
Enables optimization for aggregations on dictionaries. This can also be specifiedon a per-query basis using the dictionary_aggregation session property.

optimizer.optimize-hash-generation

Type: boolean
Default value: true
Compute hash codes for distribution, joins, and aggregations early during execution,allowing result to be shared between operations later in the query. This can reduceCPU usage by avoiding computing the same hash multiple times, but at the cost ofadditional network transfer for the hashes. In most cases it will decrease overallquery processing time. This can also be specified on a per-query basis using theoptimize_hash_generation session property.
It is often helpful to disable this property when using EXPLAIN in orderto make the query plan easier to read.

optimizer.optimize-metadata-queries

Type: boolean
Default value: false
Enable optimization of some aggregations by using values that are stored as metadata.This allows Presto to execute some simple queries in constant time. Currently, thisoptimization applies to max, min and approx_distinct of partitionkeys and other aggregation insensitive to the cardinality of the input (includingDISTINCT aggregates). Using this may speed up some queries significantly.
The main drawback is that it can produce incorrect results if the connector returnspartition keys for partitions that have no rows. In particular, the Hive connectorcan return empty partitions if they were created by other systems (Presto cannotcreate them).

optimizer.optimize-single-distinct

Type: boolean
Default value: true
The single distinct optimization will try to replace multiple DISTINCT clauseswith a single GROUP BY clause, which can be substantially faster to execute.

optimizer.push-aggregation-through-join

Type: boolean
Default value: true
When an aggregation is above an outer join and all columns from the outer side of the joinare in the grouping clause, the aggregation is pushed below the outer join. This optimizationis particularly useful for correlated scalar subqueries, which get rewritten to an aggregationover an outer join. For example:
SELECT * FROM item i    WHERE i.i_current_price > (        SELECT AVG(j.i_current_price) FROM item j            WHERE i.i_category = j.i_category);
Enabling this optimization can substantially speed up queries by reducingthe amount of data that needs to be processed by the join. However, it may slow down somequeries that have very selective joins. This can also be specified on a per-query basis usingthe push_aggregation_through_join session property.

optimizer.push-table-write-through-union

Type: boolean
Default value: true
Parallelize writes when using UNION ALL in queries that write data. This improves thespeed of writing output tables in UNION ALL queries because these writes do not requireadditional synchronization when collecting results. Enabling this optimization can improveUNION ALL speed when write speed is not yet saturated. However, it may slow down queriesin an already heavily loaded system. This can also be specified on a per-query basisusing the push_table_write_through_union session property.

optimizer.join-reordering-strategy

Type: string
Allowed values: AUTOMATIC, ELIMINATE_CROSS_JOINS, NONE
Default value: ELIMINATE_CROSS_JOINS
The join reordering strategy to use. NONE maintains the order the tables are listed in thequery. ELIMINATE_CROSS_JOINS reorders joins to eliminate cross joins where possible andotherwise maintains the original query order. When reordering joins it also strives to maintain theoriginal table order as much as possible. AUTOMATIC enumerates possible orders and usesstatistics-based cost estimation to determine the least cost order. If stats are not available or iffor any reason a cost could not be computed, the ELIMINATE_CROSS_JOINS strategy is used. This canalso be specified on a per-query basis using the join_reordering_strategy session property.

optimizer.max-reordered-joins

Type: integer
Default value: 9
When optimizer.join-reordering-strategy is set to cost-based, this property determines the maximumnumber of joins that can be reordered at once.
Warning
The number of possible join orders scales factorially with the number of relations,so increasing this value can cause serious performance issues.

Regular Expression Function Properties

The following properties allow tuning the Regular Expression Functions.

regex-library

Type: string
Allowed values: JONI, RE2J
Default value: JONI
Which library to use for regular expression functions.JONI is generally faster for common usage, but can require exponentialtime for certain expression patterns. RE2J uses a different algorithmwhich guarantees linear time, but is often slower.

re2j.dfa-states-limit

Type: integer
Minimum value: 2
Default value: 2147483647
The maximum number of states to use when RE2J builds the fastbut potentially memory intensive deterministic finite automaton (DFA)for regular expression matching. If the limit is reached, RE2J will fallback to the algorithm that uses the slower, but less memory intensivenon-deterministic finite automaton (NFA). Decreasing this value decreases themaximum memory footprint of a regular expression search at the cost of speed.

re2j.dfa-retries

Type: integer
Minimum value: 0
Default value: 5
The number of times that RE2J will retry the DFA algorithm whenit reaches a states limit before using the slower, but less memoryintensive NFA algorithm for all future inputs for that search. If hitting thelimit for a given input row is likely to be an outlier, you want to be ableto process subsequent rows using the faster DFA algorithm. If you are likelyto hit the limit on matches for subsequent rows as well, you want to use thecorrect algorithm from the beginning so as not to waste time and resources.The more rows you are processing, the larger this value should be.