SQL-based ingestion known issues
This page describes SQL-based batch ingestion using the druid-multi-stage-query extension, new in Druid 24.0. Refer to the ingestion methods table to determine which ingestion method is right for you.
Multi-stage query task runtime
Fault tolerance is partially implemented. Workers get relaunched when they are killed unexpectedly. The controller does not get relaunched if it is killed unexpectedly.
Worker task stage outputs are stored in the working directory given by druid.indexer.task.baseDir. Stages that generate a large amount of output data may exhaust all available disk space. In this case, the query fails with an UnknownError with a message including “No space left on device”.
SELECT Statement
GROUPING SETS are not implemented. Queries that use this feature return a QueryNotSupported error.
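As a rough illustration of the limitation above, the first query below has the shape that returns a QueryNotSupported error, and the second shows one possible alternative: a separate plain GROUP BY query for each grouping. The datasource wikipedia and the columns channel and countryName are hypothetical names chosen for the example, and each statement is meant to be submitted as its own query.

```sql
-- Hypothetical datasource and column names, for illustration only.

-- Query 1: uses GROUPING SETS, so it is expected to fail with QueryNotSupported.
SELECT "channel", "countryName", COUNT(*) AS "cnt"
FROM "wikipedia"
GROUP BY GROUPING SETS (("channel"), ("countryName"))

-- Query 2 (submitted separately): a plain GROUP BY covering one of the groupings.
SELECT "channel", COUNT(*) AS "cnt"
FROM "wikipedia"
GROUP BY "channel"
```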
INSERT and REPLACE Statements
INSERT and REPLACE statements with column lists, like INSERT INTO tbl (a, b, c) SELECT ..., are not implemented.
INSERT ... SELECT and REPLACE ... SELECT insert columns from the SELECT statement based on column name. This differs from SQL standard behavior, where columns are inserted based on position.
INSERT and REPLACE do not support all options available in ingestion specs, including the createBitmapIndex and multiValueHandling dimension properties, and the indexSpec tuningConfig property.
Queries using EXTERN to export data sometimes do not contain all the results. Certain rows may be missing from the created files.
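Because columns are matched by name, the aliases in the SELECT list determine the column names in the target datasource, and their order in the SELECT list does not matter. The following is a minimal sketch, not a definitive pattern. It assumes a hypothetical source datasource src with columns ts (an ISO 8601 string), page, and user_id, and a hypothetical target datasource tbl.

```sql
-- Hypothetical names: src, tbl, ts, page, user_id, page_name.
-- No column list follows INSERT INTO tbl. Each SELECT expression is stored
-- under its alias, regardless of its position in the SELECT list.
INSERT INTO "tbl"
SELECT
  TIME_PARSE("ts") AS "__time",
  "page" AS "page_name",
  "user_id" AS "user_id"
FROM "src"
PARTITIONED BY DAY
```

Reordering the three expressions in the SELECT list would leave the resulting column names unchanged, since the aliases alone determine them.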
EXTERN Function
The schemaless dimensions feature is not available. All columns and their types must be specified explicitly using the signature parameter of the EXTERN function.
EXTERN with input sources that match large numbers of files may exhaust available memory on the controller task.
EXTERN refers to external files. Use FROM to access druid input sources.
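A minimal sketch of an EXTERN call with an explicit signature follows. The URI, the target datasource tbl, and the column names are placeholders. The input is assumed to be JSON with an ISO 8601 string field named timestamp.

```sql
-- Placeholder URI, datasource, and column names, for illustration only.
-- EXTERN takes three JSON strings: the input source, the input format, and
-- the signature. The signature lists every column and its type explicitly,
-- since schemaless dimension discovery is not available.
INSERT INTO "tbl"
SELECT
  TIME_PARSE("timestamp") AS "__time",
  "page",
  "added"
FROM TABLE(
  EXTERN(
    '{"type": "http", "uris": ["https://example.com/data.json.gz"]}',
    '{"type": "json"}',
    '[{"name": "timestamp", "type": "string"}, {"name": "page", "type": "string"}, {"name": "added", "type": "long"}]'
  )
)
PARTITIONED BY DAY
```

To read from an existing Druid datasource instead of external files, name the datasource directly in the FROM clause, without EXTERN.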