Scaling Limits and Guidelines
This topic lists the scalability limitations in Impala. For a given functional feature, respect these limitations to achieve optimal scalability and performance. For example, although you might be able to create a table with 2000 columns, you will experience performance problems when querying that table. This topic does not cover functional limitations in Impala.
Unless noted otherwise, the limits were tested and certified.
The limits noted as “generally safe” are not certified, but are recommended as generally safe. A safe range is not a hard limit, because unforeseen errors or problems in your particular environment can affect the range.
Deployment Limits
- Number of Impalad Executors
  - 80 nodes in Impala 2.11 and lower
  - 150 nodes in Impala 2.12 and higher
- Number of Impalad Coordinators: 1 coordinator for at most every 50 executors
- Number of Impala clusters per deployment
  - 1 Impala cluster in Impala 3.1 and lower
  - Multiple clusters in Impala 3.2 and higher are generally safe.
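The executor ceilings and the coordinator ratio above reduce to simple arithmetic. The following is a minimal Python sketch for sizing checks; the helper names `min_coordinators` and `max_executors` are hypothetical and not part of Impala, and passing the Impala version as a `(major, minor)` tuple is an assumption made for illustration.

```python
import math

# From the guideline above: 1 coordinator for at most every 50 executors.
EXECUTORS_PER_COORDINATOR = 50


def min_coordinators(executors: int) -> int:
    """Smallest coordinator count that keeps each coordinator at 50 executors or fewer."""
    if executors < 0:
        raise ValueError("executors must be non-negative")
    # Assumption: a cluster always needs at least one coordinator.
    return max(1, math.ceil(executors / EXECUTORS_PER_COORDINATOR))


def max_executors(version: tuple) -> int:
    """Tested executor limit for an Impala version given as (major, minor)."""
    # 150 nodes in Impala 2.12 and higher, 80 nodes in 2.11 and lower.
    return 150 if version >= (2, 12) else 80


if __name__ == "__main__":
    planned = 120
    print("coordinators needed:", min_coordinators(planned))
    print("within tested limit:", planned <= max_executors((2, 12)))
```

For example, a plan for 120 executors on Impala 2.12 stays within the tested limit and calls for at least three coordinators under this ratio.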
Data Storage Limits
There are no hard limits for the following, but you will experience gradual performance degradation as you increase these numbers.
- Number of databases
- Number of tables - total, per database
- Number of partitions - total, per table
- Number of files - total, per table, per table per partition
- Number of views - total, per database
- Number of user-defined functions - total, per database
- Parquet
  - Number of columns per row group
  - Number of row groups per block
  - Number of HDFS blocks per file
Schema Design Limits
- Number of columns
  - 300 for Kudu tables. See Kudu Usage Limitations for more information.
  - 1000 for other types of tables
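A schema review script could flag tables that exceed these column counts before they are created. The sketch below is illustrative only: the function names and the `table_type` string convention are hypothetical, and the numbers are the limits listed above.

```python
# Column-count limits from this section.
KUDU_MAX_COLUMNS = 300
DEFAULT_MAX_COLUMNS = 1000


def column_limit(table_type: str) -> int:
    """Return the column limit for a table type ("kudu" or anything else)."""
    return KUDU_MAX_COLUMNS if table_type.lower() == "kudu" else DEFAULT_MAX_COLUMNS


def within_column_limit(num_columns: int, table_type: str) -> bool:
    """True if a table with num_columns columns stays within the listed limit."""
    return num_columns <= column_limit(table_type)


if __name__ == "__main__":
    print(within_column_limit(350, "kudu"))     # exceeds the Kudu limit
    print(within_column_limit(350, "parquet"))  # within the limit for other tables
```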
Security Limits
- Number of roles: 10,000 for Ranger
Query Limits - Compile Time
- Maximum number of columns in a query, included in a SELECT list, INSERT, and in an expression: no limit
- Number of tables referenced: no limit
- Number of plan nodes: no limit
- Number of plan fragments: no limit
- Depth of expression tree: 1000 hard limit
- Width of expression tree: 10,000 hard limit
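Tools that generate SQL can check an expression against the depth and width limits above before submitting it. The sketch below rests on stated assumptions: expressions are modeled as nested tuples of the form `(operator, child, ...)` with plain strings as leaves, depth is counted as the number of nodes on the longest root-to-leaf path, and width is taken to be the largest number of direct children of any node. How Impala counts these internally is not specified in this topic, so treat the result as an estimate.

```python
MAX_EXPR_DEPTH = 1000     # hard limit listed above
MAX_EXPR_WIDTH = 10_000   # hard limit listed above


def depth_and_width(expr):
    """Return (depth, width) for a nested-tuple expression.

    Uses an explicit stack rather than recursion so that very deep
    expressions do not hit Python's own recursion limit.
    """
    max_depth = 0
    max_width = 0
    stack = [(expr, 1)]
    while stack:
        node, depth = stack.pop()
        max_depth = max(max_depth, depth)
        if isinstance(node, tuple):
            children = node[1:]  # node[0] is the operator name
            max_width = max(max_width, len(children))
            stack.extend((child, depth + 1) for child in children)
    return max_depth, max_width


def within_expr_limits(expr) -> bool:
    """True if the expression stays within both listed limits."""
    depth, width = depth_and_width(expr)
    return depth <= MAX_EXPR_DEPTH and width <= MAX_EXPR_WIDTH


if __name__ == "__main__":
    small = ("+", "a", ("*", "b", "c"))
    print(depth_and_width(small), within_expr_limits(small))
```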
Query Limits - Runtime
- Codegen
  - Very deeply nested expressions within queries can exceed internal Impala limits, leading to excessive memory usage. Setting the query option disable_codegen=true may reduce the impact, at a cost of longer query runtime.
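One way to apply the disable_codegen option from a client program is to issue a SET statement before the affected query. The sketch below assumes a Python DB-API (PEP 249) cursor that is already connected to Impala through a client library, and that the library passes SET statements through to Impala unchanged; the helper name `run_without_codegen` is hypothetical.

```python
def run_without_codegen(cursor, sql):
    """Run one query with codegen disabled and return all of its rows.

    `cursor` is assumed to be a DB-API cursor connected to Impala.
    """
    # Disable codegen for the session behind this cursor, then run the query.
    cursor.execute("SET disable_codegen=true")
    cursor.execute(sql)
    return cursor.fetchall()
```

The option is expected to stay in effect for the rest of the session, so later queries on the same connection also run without codegen unless it is set back to false.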