SCHEDULE_RANDOM_REPLICA Query Option (Impala 2.5 or higher only)
The SCHEDULE_RANDOM_REPLICA query option fine-tunes the scheduling algorithm that decides which host processes each HDFS data block or Kudu tablet, to reduce the chance of CPU hotspots.
By default, Impala estimates how much work each host has done for the query, and selects the host that has the lowest workload. This algorithm is intended to reduce CPU hotspots arising when the same host is selected to process multiple data blocks or tablets. Use the SCHEDULE_RANDOM_REPLICA query option if hotspots still arise for some combinations of queries and data layout.
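The default selection rule described above can be pictured with a small sketch. This is an illustration only, not Impala's scheduler code: the function name assign_blocks, the use of assigned bytes as the workload estimate, and the random_tie_break flag are all hypothetical. The flag models one assumed way a randomized choice could differ from the default, namely by picking randomly among equally loaded replica hosts instead of always taking the first one.

```python
import random
from typing import Dict, List, Optional, Sequence


def assign_blocks(
    block_replicas: Sequence[Sequence[str]],
    block_sizes: Sequence[int],
    random_tie_break: bool = False,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Pick one host per block, always preferring the least-loaded replica host.

    block_replicas[i] lists the hosts holding a replica of block i (must be
    non-empty). Workload is approximated by the bytes already assigned to a
    host for this query (an assumption made for this sketch).
    """
    rng = rng or random.Random()
    assigned_bytes: Dict[str, int] = {}
    chosen: List[str] = []
    for replicas, size in zip(block_replicas, block_sizes):
        # Find the lowest workload among the hosts that hold this block.
        lowest = min(assigned_bytes.get(h, 0) for h in replicas)
        candidates = [h for h in replicas if assigned_bytes.get(h, 0) == lowest]
        # Hypothetical tie-breaking: first listed host, or a random one.
        host = rng.choice(candidates) if random_tie_break else candidates[0]
        assigned_bytes[host] = assigned_bytes.get(host, 0) + size
        chosen.append(host)
    return chosen


if __name__ == "__main__":
    replicas = [["host1", "host2"], ["host1", "host3"], ["host2", "host3"]]
    sizes = [128, 128, 128]
    print(assign_blocks(replicas, sizes))
    print(assign_blocks(replicas, sizes, random_tie_break=True))
```

In this model, deterministic tie-breaking can favor whichever host happens to be listed first across many queries, which is the kind of residual hotspot a randomized choice is meant to spread out.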
The SCHEDULE_RANDOM_REPLICA query option applies only to tables and partitions that are not enabled for HDFS caching.
Type: Boolean; recognized values are 1 and 0, or true and false; any other value is interpreted as false
Default: false
Added in: Impala 2.5.0
Related information:
Using HDFS Caching with Impala (Impala 2.1 or higher only), Avoiding CPU Hotspots for HDFS Cached Data, REPLICA_PREFERENCE Query Option (Impala 2.7 or higher only)