SQL-based ingestion known issues

This page describes SQL-based batch ingestion using the druid-multi-stage-query extension, new in Druid 24.0. Refer to the ingestion methods table to determine which ingestion method is right for you.

Multi-stage query task runtime

  • Fault tolerance is partially implemented. Workers get relaunched when they are killed unexpectedly. The controller does not get relaunched if it is killed unexpectedly.

  • Worker task stage outputs are stored in the working directory given by druid.indexer.task.baseDir. Stages that generate a large amount of output data may exhaust all available disk space. In this case, the query fails with an UnknownError with a message including “No space left on device”.

SELECT Statement

  • GROUPING SETS is not implemented. Queries that use GROUPING SETS return a QueryNotSupported error.

INSERT and REPLACE Statements

  • INSERT and REPLACE statements with column lists, like INSERT INTO tbl (a, b, c) SELECT ..., are not implemented.

  • INSERT ... SELECT and REPLACE ... SELECT insert columns from the SELECT statement based on column name. This differs from SQL standard behavior, where columns are inserted based on position.

  • INSERT and REPLACE do not support all options available in ingestion specs. Unsupported options include the createBitmapIndex and multiValueHandling dimension properties and the indexSpec tuningConfig property.
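
As a rough sketch of the name-based mapping described in this list, the following query aliases each SELECT expression to the name of the target column instead of using a column list. The datasources tbl and src_tbl and the source columns x, y, and z are hypothetical and only serve as an illustration.

```sql
-- Hypothetical datasources and columns, for illustration only.
-- Not implemented: INSERT INTO "tbl" ("a", "b", "c") SELECT ...
-- Instead, alias each expression to the name of the target column.
INSERT INTO "tbl"
SELECT
  "__time",
  "x" AS "a",
  "y" AS "b",
  "z" AS "c"
FROM "src_tbl"
PARTITIONED BY DAY
```

Because columns are matched by name, reordering the expressions in the SELECT list should not change which target column each value lands in.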

EXTERN Function

  • The schemaless dimensions feature is not available. All columns and their types must be specified explicitly using the signature parameter of the EXTERN function.

  • EXTERN with input sources that match large numbers of files may exhaust available memory on the controller task.

  • EXTERN refers to external files. Use FROM to access Druid input sources.
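
The following is a minimal sketch of an EXTERN call that declares every column and its type in the signature parameter, as the first item in this list requires. The URI, the target datasource example_table, and the column names are assumptions made for illustration.

```sql
-- Hypothetical URI, datasource, and column names, for illustration only.
-- The third EXTERN argument is the signature: it names every column and its type.
INSERT INTO "example_table"
SELECT
  TIME_PARSE("timestamp") AS "__time",
  "page",
  "added"
FROM TABLE(
  EXTERN(
    '{"type": "http", "uris": ["https://example.com/data.json"]}',
    '{"type": "json"}',
    '[{"name": "timestamp", "type": "string"}, {"name": "page", "type": "string"}, {"name": "added", "type": "long"}]'
  )
)
PARTITIONED BY DAY
```

Columns that are present in the input but missing from the signature are not read, since the schema is not discovered automatically.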

WINDOW Function

  • A window cannot contain more than 100,000 elements.
  • To avoid leafOperators in the MSQ task engine, window functions have an extra scan stage after the window stage in cases where the native engine has a non-empty leafOperator.

Automatic compaction

The following known issues and limitations affect automatic compaction with the MSQ task engine:

  • The metricSpec field is only supported for certain aggregators. For more information, see Supported aggregators.
  • Only dynamic and range-based partitioning are supported.
  • Set rollup to true if and only if metricSpec is neither empty nor null.
  • You can only partition on string dimensions. However, multi-valued string dimensions are not supported.
  • The maxTotalRows config is not supported in DynamicPartitionsSpec. Use maxRowsPerSegment instead.
  • Segments can only be sorted with __time as the first column.