Apache Parquet Extension
This Apache Druid module extends Druid's Hadoop-based indexing to ingest data directly from offline Apache Parquet files.
Note: The parquet-avro parser for Apache Hadoop-based indexing requires two extensions, because druid-parquet-extensions depends on the druid-avro-extensions module. Be sure to include both.
The druid-parquet-extensions module provides the Parquet input format, the Parquet Hadoop parser, and, together with druid-avro-extensions, the Parquet Avro Hadoop parser. The Parquet input format is for native batch ingestion; the two parsers are for Hadoop batch ingestion. See the corresponding documentation for details.
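As a rough illustration of the native batch case, the sketch below builds the ioConfig portion of a parallel indexing spec that selects the Parquet input format. It is a minimal sketch, not a complete ingestion spec: the helper name parquet_io_config, the local input source, and the /tmp/parquet-data directory are assumptions made for this example, and a real spec also needs a dataSchema and tuningConfig.

```python
import json


def parquet_io_config(base_dir, file_filter="*.parquet", binary_as_string=False):
    """Build an illustrative ioConfig for native batch ingestion of Parquet files.

    The helper name and its arguments are hypothetical; only the resulting
    JSON structure is meant to mirror a Druid ioConfig.
    """
    return {
        "type": "index_parallel",
        # Assumption: files are read from a local directory on the worker.
        "inputSource": {
            "type": "local",
            "baseDir": base_dir,
            "filter": file_filter,
        },
        # Selects the Parquet input format provided by druid-parquet-extensions.
        "inputFormat": {
            "type": "parquet",
            "binaryAsString": binary_as_string,
        },
    }


if __name__ == "__main__":
    # Hypothetical directory, used only to render the example JSON.
    print(json.dumps(parquet_io_config("/tmp/parquet-data"), indent=2))
```

The printed JSON can be placed under spec.ioConfig in an ingestion task. The Hadoop parsers are configured differently, so this sketch does not apply to Hadoop batch ingestion.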