SQL-based ingestion known issues

This page describes SQL-based batch ingestion using the druid-multi-stage-query extension, new in Druid 24.0. Refer to the ingestion methods table to determine which ingestion method is right for you.

Multi-stage query task runtime

  • Fault tolerance is partially implemented. Workers get relaunched when they are killed unexpectedly. The controller does not get relaunched if it is killed unexpectedly.

  • Worker task stage outputs are stored in the working directory given by druid.indexer.task.baseDir. Stages that generate a large amount of output data may exhaust all available disk space. In this case, the query fails with an UnknownError with a message including “No space left on device”.
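If stage outputs may outgrow the default task directory, one mitigation is to point the task working directory at a volume with more free space. A minimal sketch of the relevant runtime property, set on the service that runs the tasks; the path here is illustrative:

```properties
# Task working directory; stage outputs are written beneath this path.
# /mnt/large-volume/druid/task is a hypothetical mount with ample disk.
druid.indexer.task.baseDir=/mnt/large-volume/druid/task
```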

SELECT Statement

  • SELECT from a Druid datasource does not include unpublished real-time data.

  • GROUPING SETS and UNION ALL are not implemented. Queries using these features return a QueryNotSupported error.

  • For some COUNT DISTINCT queries, you’ll encounter a QueryNotSupported error that includes Must not have 'subtotalsSpec' as one of its causes. This occurs because the planner attempts to use GROUPING SETS, which are not implemented.
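A common trigger is a query that computes more than one exact COUNT(DISTINCT ...) at once, which the planner may rewrite using GROUPING SETS. A sketch of the query shape, with illustrative table and column names; whether it fails can depend on query context settings:

```sql
-- May fail with "Must not have 'subtotalsSpec'" if the planner
-- rewrites the two exact distinct counts via GROUPING SETS.
SELECT
  COUNT(DISTINCT user_id),
  COUNT(DISTINCT session_id)
FROM clickstream
```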

  • The numeric varieties of the EARLIEST and LATEST aggregators do not work properly. Attempting to use the numeric varieties of these aggregators leads to an error like java.lang.ClassCastException: class java.lang.Double cannot be cast to class org.apache.druid.collections.SerializablePair. The string varieties, however, do work properly.
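One possible workaround is to fall back to the string variety by casting the numeric column. A sketch, assuming an illustrative datasource named events with a numeric column price:

```sql
-- Numeric variety: may fail with the ClassCastException above.
-- SELECT LATEST(price) FROM events

-- String variety works; the second argument caps bytes per value.
SELECT LATEST(CAST(price AS VARCHAR), 128) FROM events
```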

INSERT and REPLACE Statements

  • INSERT and REPLACE statements with column lists, like INSERT INTO tbl (a, b, c) SELECT ..., are not implemented.

  • INSERT ... SELECT and REPLACE ... SELECT insert columns from the SELECT statement based on column name. This differs from SQL standard behavior, where columns are inserted based on position.
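For example, in the following sketch the output columns land in the target datasource by name, so the aliases must match the target column names regardless of their position in the SELECT list (datasource and column names are illustrative):

```sql
-- Each alias below determines which column of "pageviews" receives
-- the value; reordering the SELECT list would not change the result.
INSERT INTO pageviews
SELECT
  __time,
  cityName AS city,
  added AS added_count
FROM wikipedia_source
PARTITIONED BY DAY
```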

  • INSERT and REPLACE do not support all options available in ingestion specs, including the createBitmapIndex and multiValueHandling dimension properties, and the indexSpec tuningConfig property.

EXTERN Function

  • The schemaless dimensions feature is not available. All columns and their types must be specified explicitly using the signature parameter of the EXTERN function.

  • EXTERN with input sources that match large numbers of files may exhaust available memory on the controller task.

  • EXTERN refers to external files. Use FROM to access Druid input sources.
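Putting these together, a minimal EXTERN read with a fully explicit signature might look like the following sketch; the input URI and column names are illustrative. Because schemaless dimensions are unavailable, every column you want to read must appear in the signature:

```sql
-- EXTERN takes the input source, the input format, and the signature,
-- each as a JSON string.
SELECT *
FROM TABLE(
  EXTERN(
    '{"type": "http", "uris": ["https://example.com/data.json.gz"]}',
    '{"type": "json"}',
    '[{"name": "timestamp", "type": "string"}, {"name": "page", "type": "string"}]'
  )
)
LIMIT 10
```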