NUM_NODES Query Option
Limit the number of nodes that process a query, typically during debugging.
Type: numeric
Allowed values: Only accepts the values 0 (meaning all nodes) or 1 (meaning all work is done on the coordinator node).
Default: 0
Usage notes:
If you are diagnosing a problem that you suspect is caused by the timing of distributed query processing, you can set NUM_NODES=1 to verify whether the problem still occurs when all the work is done on a single node.
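As an illustration of this debugging workflow, the following sketch runs a query on the coordinator only and then restores the default. The table name sample_table is hypothetical; substitute the query you are investigating.

```sql
-- Force all work for subsequent statements onto the coordinator node.
SET NUM_NODES=1;

-- Hypothetical query under investigation; sample_table is a placeholder.
SELECT COUNT(*) FROM sample_table;

-- Restore the default (0 = use all nodes) so later queries are distributed again.
SET NUM_NODES=0;
```

If the problem disappears with NUM_NODES=1 and returns with NUM_NODES=0, that suggests (but does not prove) that distributed execution is involved.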
You might set the NUM_NODES option to 1 briefly, during INSERT or CREATE TABLE AS SELECT statements. Normally, those statements produce one or more data files per data node. If the write operation involves small amounts of data, a Parquet table, and/or a partitioned table, the default behavior could produce many small files when intuitively you might expect only a single output file. SET NUM_NODES=1 turns off the “distributed” aspect of the write operation, making it more likely to produce only one or a few data files.
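A minimal sketch of this pattern for a small write follows. The table and column names (sales, sales_by_region, region) are assumptions made for illustration only, and the number of files actually produced still depends on the data and on the table's partitioning.

```sql
-- Write from a single node so that a small result is less likely
-- to be split into many small files.
SET NUM_NODES=1;

-- Hypothetical tables and columns, used only for illustration.
CREATE TABLE sales_by_region STORED AS PARQUET AS
  SELECT region, COUNT(*) AS order_count
  FROM sales
  GROUP BY region;

-- Return to distributed execution immediately after the write.
SET NUM_NODES=0;
```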
Warning:
Because this option concentrates the work for a statement on a single host, it could cause problems due to high resource usage on that host or contention with other Impala statements. Symptoms could include queries running slowly, exceeding the memory limit, or appearing to hang. Use it only in a single-user development/test environment; do not use it in a production environment or in a cluster with a high-concurrency, high-volume, or performance-critical workload.