- Known Issues and Workarounds in Impala
- Impala Known Issues: Startup
- Impala Known Issues: Performance
- Impala Known Issues: JDBC and ODBC Drivers
- Impala Known Issues: Security
- Impala Known Issues: Resources
- Impala Known Issues: Correctness
- Impala Known Issues: Interoperability
- Queries Stuck on Failed HDFS Calls and not Timing out
- DESCRIBE FORMATTED gives error on Avro table
- Configuration needed for Flume to be compatible with Impala
- Avro Scanner fails to parse some schemas
- Impala BE cannot parse Avro schema that contains a trailing semi-colon
- Incorrect results with basic predicate on CHAR typed column
- Tables and databases sharing same name can cause query failures
- Impala Known Issues: Limitations
- Impala Known Issues: Miscellaneous
- Impala Known Issues: Crashes and Hangs
Known Issues and Workarounds in Impala
The following sections describe known issues and workarounds in Impala, as of the current production release. This page summarizes the most serious or frequently encountered issues in the current release, to help you make planning decisions about installing and upgrading. Any workarounds are listed here. The bug links take you to the Impala issues site, where you can see the diagnosis and whether a fix is in the pipeline.
Note: The online issue tracking system for Impala contains comprehensive information and is updated in real time. To verify whether an issue you are experiencing has already been reported, or which release an issue is fixed in, search on the issues.apache.org JIRA tracker.
For issues fixed in various Impala releases, see Fixed Issues in Apache Impala.
Impala Known Issues: Startup
These issues can prevent one or more Impala-related daemons from starting properly.
Impala requires FQDN from hostname command on Kerberized clusters
Impala retrieves the host name with the gethostname() system call when constructing the Kerberos principal. Depending on the network configuration, this function might not return the fully qualified domain name (FQDN). If the daemons cannot determine the FQDN, Impala does not start on a Kerberized cluster.
Workaround: To test whether a host is affected, check whether the output of the hostname command includes the FQDN. On hosts where hostname returns only the short name, pass the command-line flag --hostname=fully_qualified_domain_name in the startup options of all Impala-related daemons.
Apache Issue: IMPALA-4978
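The check described in the workaround can be sketched in Python. This is only an illustration, not part of Impala: the helper name looks_fully_qualified is hypothetical, and the test (at least two non-empty dot-separated labels) is a heuristic assumption rather than the logic Impala itself applies. On Linux, socket.gethostname() typically reflects the same value that the gethostname() system call returns.

```python
import socket


def looks_fully_qualified(name: str) -> bool:
    """Heuristic (assumption): an FQDN has at least two non-empty dot-separated labels."""
    labels = name.rstrip(".").split(".")
    return len(labels) >= 2 and all(labels)


if __name__ == "__main__":
    # Report whether this host's name looks like a short name or an FQDN.
    name = socket.gethostname()
    if looks_fully_qualified(name):
        print(f"{name}: looks fully qualified")
    else:
        print(f"{name}: short name; consider passing --hostname to the daemons")
```

Running the script on each host gives a quick indication of which hosts may need the --hostname flag.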
Impala Known Issues: Performance
These issues involve the performance of operations such as queries or DDL statements.
Metadata operations block read-only operations on unrelated tables
Metadata operations that change the state of a table, such as COMPUTE STATS or ALTER TABLE ... RECOVER PARTITIONS, may delay metadata propagation for unrelated, unloaded tables when that propagation is triggered by statements such as DESCRIBE or SELECT queries.
Apache Issue: IMPALA-6671
Impala Known Issues: JDBC and ODBC Drivers
These issues affect applications that use the JDBC or ODBC APIs, such as business intelligence tools or custom-written applications in languages such as Java or C++.
ImpalaODBC: Can not get the value in the SQLGetData(m-x th column) after the SQLBindCol(m th column)
If the ODBC SQLGetData function is called on a series of columns, the calls must follow the same order as the columns. For example, if data is fetched from column 2 and then column 1, the SQLGetData call for column 1 returns NULL.
Apache Issue: IMPALA-1792
Workaround: Fetch columns in the same order they are defined in the table.
Impala Known Issues: Security
These issues are related to security features, such as Kerberos authentication, Sentry authorization, encryption, auditing, and redaction.
Impala does not support Heimdal Kerberos
Heimdal Kerberos is not supported in Impala.
Apache Issue: IMPALA-7072
Affected Versions: All versions of Impala
Impala does not allow the use of insecure clusters with public IPs
Starting in Impala 2.12, Impala by default allows unencrypted or unauthenticated connections only from trusted subnets: 127.0.0.0/8, 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, and 169.254.0.0/16. Unencrypted or unauthenticated connections from publicly routable IPs are rejected.
The trusted subnets can be configured using the --trusted_subnets flag. Set it to '0.0.0.0/0' to allow unauthenticated connections from all remote IP addresses. However, if network access is not otherwise restricted by a firewall, malicious users may be able to gain unauthorized access.
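To see which client addresses fall inside the default trusted subnets, the following Python sketch uses the standard ipaddress module. It only illustrates the subnet arithmetic and is not how Impala implements the check; the names DEFAULT_TRUSTED_SUBNETS and is_trusted are hypothetical.

```python
import ipaddress

# Default trusted subnets as listed in this section.
DEFAULT_TRUSTED_SUBNETS = [
    "127.0.0.0/8",
    "10.0.0.0/8",
    "172.16.0.0/12",
    "192.168.0.0/16",
    "169.254.0.0/16",
]


def is_trusted(client_ip: str, subnets=DEFAULT_TRUSTED_SUBNETS) -> bool:
    """Return True if client_ip lies inside any of the given subnets."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(subnet) for subnet in subnets)


if __name__ == "__main__":
    for ip in ("10.1.2.3", "172.32.0.1", "8.8.8.8"):
        print(ip, is_trusted(ip))
```

Note that 172.16.0.0/12 covers only 172.16.0.0 through 172.31.255.255, so an address such as 172.32.0.1 counts as publicly routable here.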
Impala Known Issues: Resources
These issues involve memory or disk usage, including out-of-memory conditions, the spill-to-disk feature, and resource management features.
Configuration to prevent crashes caused by thread resource limits
Impala could encounter a serious error due to resource usage under very high concurrency. The error message is similar to:
F0629 08:20:02.956413 29088 llvm-codegen.cc:111] LLVM hit fatal error: Unable to allocate section memory!
terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::thread_resource_error> >'
Apache Issue: IMPALA-5605
Severity: High
Workaround: To prevent such errors, configure each host running an impalad daemon with the following settings:
echo 2000000 > /proc/sys/kernel/threads-max
echo 2000000 > /proc/sys/kernel/pid_max
echo 8000000 > /proc/sys/vm/max_map_count
Add the following lines in /etc/security/limits.conf:
impala soft nproc 262144
impala hard nproc 262144
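A host can be checked against the kernel settings above with a short Python sketch. It assumes a Linux host where the /proc/sys files are readable; the helper names read_kernel_value and below_recommended are hypothetical and not part of any Impala tooling. The limits.conf entries are not checked.

```python
from pathlib import Path

# Recommended values taken from the workaround above.
RECOMMENDED = {
    "/proc/sys/kernel/threads-max": 2000000,
    "/proc/sys/kernel/pid_max": 2000000,
    "/proc/sys/vm/max_map_count": 8000000,
}


def read_kernel_value(path: str) -> int:
    """Read a single integer setting from a /proc/sys file."""
    return int(Path(path).read_text().strip())


def below_recommended(read=read_kernel_value) -> dict:
    """Return {path: (current, recommended)} for settings lower than recommended."""
    low = {}
    for path, wanted in RECOMMENDED.items():
        current = read(path)
        if current < wanted:
            low[path] = (current, wanted)
    return low
```

Calling below_recommended() on an impalad host returns an empty dictionary when all three kernel settings already meet the recommended values.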
Breakpad minidumps can be very large when the thread count is high
The size of the breakpad minidump files grows linearly with the number of threads. By default, each thread adds 8 KB to the minidump size. Minidump files could consume significant disk space when the daemons have a high number of threads.
Workaround: Add --minidump_size_limit_hint_kb=size to set a soft upper limit on the size of each minidump file. If the minidump file would exceed that limit, Impala reduces the amount of information for each thread from 8 KB to 2 KB. (Full thread information is captured for the first 20 threads, then 2 KB per thread after that.) The minidump file can still grow larger than the “hinted” size. For example, if you have 10,000 threads, the minidump file can be more than 20 MB.
Apache Issue: IMPALA-3509
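The size figures above can be worked through with a small Python sketch. It counts only the per-thread portion of a minidump (8 KB per thread by default; 8 KB for the first 20 threads and 2 KB for the rest when the reduced format applies) and ignores any other data in the file, so treat the result as a rough lower bound. The function name thread_portion_kb is hypothetical.

```python
FULL_KB = 8        # per-thread size by default
REDUCED_KB = 2     # per-thread size once the size hint is exceeded
FULL_THREADS = 20  # threads that keep full information in the reduced format


def thread_portion_kb(threads: int, reduced: bool) -> int:
    """Estimate the per-thread portion of a minidump, in KB."""
    if not reduced:
        return threads * FULL_KB
    full = min(threads, FULL_THREADS)
    return full * FULL_KB + (threads - full) * REDUCED_KB


if __name__ == "__main__":
    print(thread_portion_kb(10000, reduced=False))  # 80000
    print(thread_portion_kb(10000, reduced=True))   # 20120
```

For 10,000 threads the reduced format still needs 20 × 8 KB + 9,980 × 2 KB = 20,120 KB, which is why the file can exceed 20 MB regardless of the hint.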
Process mem limit does not account for the JVM’s memory usage
Some memory allocated by the JVM used internally by Impala is not counted against the memory limit for the impalad daemon.
Workaround: To monitor overall memory usage, use the top command, or add the memory figures in the Impala web UI /memz tab to JVM memory usage shown on the /metrics tab.
Apache Issue: IMPALA-691
Impala Known Issues: Correctness
These issues can cause incorrect or unexpected results from queries. They typically only arise in very specific circumstances.
Incorrect result due to constant evaluation in query with outer join
An OUTER JOIN query could omit some expected result rows due to a constant such as FALSE in another join clause. For example:
explain SELECT 1 FROM alltypestiny a1
INNER JOIN alltypesagg a2 ON a1.smallint_col = a2.year AND false
RIGHT JOIN alltypes a3 ON a1.year = a1.bigint_col;
+---------------------------------------------------------+
| Explain String |
+---------------------------------------------------------+
| Estimated Per-Host Requirements: Memory=1.00KB VCores=1 |
| |
| 00:EMPTYSET |
+---------------------------------------------------------+
Apache Issue: IMPALA-3094
Severity: High
% escaping does not work correctly when occurs at the end in a LIKE clause
If the final character in the RHS argument of a LIKE operator is an escaped \% character, it does not match a final % character of the LHS argument.
Apache Issue: IMPALA-2422
Crash: impala::Coordinator::ValidateCollectionSlots
A query could encounter a serious error if it includes multiple nested levels of INNER JOIN clauses involving subqueries.
Apache Issue: IMPALA-2603
Impala Known Issues: Interoperability
These issues affect the ability to interchange data between Impala and other database systems. They cover areas such as data types and file formats.
Queries Stuck on Failed HDFS Calls and not Timing out
In Impala 3.2 and higher, if the following error appears multiple times in a short duration while running a query, the connection between the impalad and the HDFS NameNode is in a bad state, and the impalad must be restarted:
"hdfsOpenFile() for <filename> at backend <hostname:port> failed to finish before the <hdfs_operation_timeout_sec> second timeout "
In Impala 3.1 and lower, the same issue would cause Impala to wait for a long time or hang without showing the above error message.
Apache Issue: HADOOP-15720
Affected Versions: All versions of Impala
Workaround: Restart the impalad that is in the bad state.
DESCRIBE FORMATTED gives error on Avro table
This issue can occur either on old Avro tables (created prior to Hive 1.1) or when changing the Avro schema file by adding or removing columns. Columns added to the schema file will not show up in the output of the DESCRIBE FORMATTED command. Removing columns from the schema file will trigger a NullPointerException.
As a workaround, you can use the output of SHOW CREATE TABLE to drop and recreate the table. This will populate the Hive metastore database with the correct column definitions.
Warning:
Only use this for external tables, or Impala will remove the data files. In case of an internal table, set it to external first:
ALTER TABLE table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
(The part in parentheses is case sensitive.) Make sure to pick the right choice between internal and external when recreating the table. See Overview of Impala Tables for the differences between internal and external tables.
Severity: High
Configuration needed for Flume to be compatible with Impala
For compatibility with Impala, the value for the Flume HDFS Sink hdfs.writeFormat must be set to Text, rather than its default value of Writable. The hdfs.writeFormat setting must be changed to Text before creating data files with Flume; otherwise, those files cannot be read by either Impala or Hive.
Resolution: This information has been requested to be added to the upstream Flume documentation.
Avro Scanner fails to parse some schemas
The default value in an Avro schema must match the type of the first union type. For example, if the default value is null, then the first type in the UNION must be "null".
Apache Issue: IMPALA-635
Workaround: Swap the order of the types in the union in the schema specification. For example, use ["null", "string"] instead of ["string", "null"]. Note that the files written with the problematic schema must be rewritten with the new schema because Avro files have embedded schemas.
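The reordering described in the workaround can be sketched in Python with the standard json module. This is a minimal illustration under stated assumptions: the function name null_first is hypothetical, only the top-level fields of a record schema are examined (nested records are not handled), and existing data files still need to be rewritten separately.

```python
import json


def null_first(schema: dict) -> dict:
    """Return a copy of a record schema where unions with a null default list "null" first."""
    fixed = json.loads(json.dumps(schema))  # deep copy via a JSON round trip
    for field in fixed.get("fields", []):
        ftype = field.get("type")
        if (
            isinstance(ftype, list)
            and "default" in field
            and field["default"] is None
            and "null" in ftype
            and ftype[0] != "null"
        ):
            # Move "null" to the front and keep the other types in order.
            field["type"] = ["null"] + [t for t in ftype if t != "null"]
    return fixed


if __name__ == "__main__":
    schema = {
        "type": "record",
        "name": "example",
        "fields": [{"name": "a", "type": ["string", "null"], "default": None}],
    }
    print(json.dumps(null_first(schema), indent=2))
```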
Impala BE cannot parse Avro schema that contains a trailing semi-colon
If an Avro table has a schema definition with a trailing semicolon, Impala encounters an error when the table is queried.
Apache Issue: IMPALA-1024
Workaround: Remove the trailing semicolon from the Avro schema.
Incorrect results with basic predicate on CHAR typed column
When comparing a CHAR column value to a string literal, the literal value is not blank-padded and so the comparison might fail when it should match.
Apache Issue: IMPALA-1652
Workaround: Use the RPAD() function to blank-pad literals compared with CHAR columns to the expected length.
Tables and databases sharing same name can cause query failures
A table and a database that share the same name can cause a query failure if the table is not readable by Impala, for example, because the table was created in Hive in the Open CSV Serde format. The following exception is returned:
CAUSED BY: TableLoadingException: Unrecognized table type for table
Apache Issue: IMPALA-8953
Workaround: Do not create databases and tables with the same names.
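Existing name clashes can be found by comparing the list of database names with the table names in each database. The Python sketch below assumes those lists have already been collected (for example, from SHOW DATABASES and SHOW TABLES output) and that names are compared case-insensitively; the function name shared_names is hypothetical.

```python
def shared_names(databases, tables_by_db):
    """Return names used both as a database name and as a table name."""
    db_names = {d.lower() for d in databases}
    table_names = {t.lower() for tables in tables_by_db.values() for t in tables}
    return sorted(db_names & table_names)


if __name__ == "__main__":
    databases = ["default", "sales"]
    tables_by_db = {"default": ["sales", "orders"], "sales": ["items"]}
    print(shared_names(databases, tables_by_db))  # ['sales']
```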
Impala Known Issues: Limitations
These issues are current limitations of Impala that require evaluation as you plan how to integrate Impala into your data management workflow.
Set limits on size of expression trees
Very deeply nested expressions within queries can exceed internal Impala limits, leading to excessive memory usage.
Apache Issue: IMPALA-4551
Severity: High
Workaround: Avoid queries with extremely large expression trees. Setting the query option disable_codegen=true may reduce the impact, at a cost of longer query runtime.
Impala does not support running on clusters with federated namespaces
Impala does not support running on clusters with federated namespaces. The impalad process will not start on a node running such a filesystem based on the org.apache.hadoop.fs.viewfs.ViewFs class.
Apache Issue: IMPALA-77
Anticipated Resolution: Limitation
Workaround: Use standard HDFS on all Impala nodes.
Impala Known Issues: Miscellaneous
These issues do not fall into one of the above categories or have not been categorized yet.
A failed CTAS does not drop the table if the insert fails
If a CREATE TABLE AS SELECT operation successfully creates the target table but an error occurs while querying the source table or copying the data, the new table is left behind rather than being dropped.
Apache Issue: IMPALA-2005
Workaround: Drop the new table manually after a failed CREATE TABLE AS SELECT.
Casting scenarios with invalid/inconsistent results
Using a CAST() function to convert large literal values to smaller types, or to convert special values such as NaN or Inf, produces values not consistent with other database systems. This could lead to unexpected results from queries.
Apache Issue: IMPALA-1821
Impala should tolerate bad locale settings
If the LC_* environment variables specify an unsupported locale, Impala does not start.
Apache Issue: IMPALA-532
Workaround: Add LC_ALL="C" to the environment settings for both the Impala daemon and the Statestore daemon. See Modifying Impala Startup Options for details about modifying these environment settings.
Resolution: Fixing this issue would require an upgrade to Boost 1.47 in the Impala distribution.
Log Level 3 Not Recommended for Impala
The extensive logging produced by log level 3 can cause serious performance overhead and capacity issues.
Workaround: Reduce the log level to its default value of 1, that is, GLOG_v=1. See Setting Logging Levels for details about the effects of setting different logging levels.
Impala Known Issues: Crashes and Hangs
These issues can cause Impala to quit or become unresponsive.
Unable to view large catalog objects in catalogd Web UI
In the catalogd Web UI, you can list metadata objects and view their details. These details are accessed via a link and printed to a string formatted using Thrift’s DebugProtocol. Printing large objects (> 1 GB) in the Web UI can crash catalogd.
Apache Issue: IMPALA-6841