S3-compatible

S3 extension

This extension allows you to do two things: read data from objects stored in S3, and write segments to deep storage in S3.

To use this Apache Druid extension, include druid-s3-extensions in the extensions load list.
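As a minimal sketch, the load list entry in a common runtime.properties file might look like the following. It assumes no other extensions are loaded; in practice, keep any extensions already in your list.

```properties
# Load the S3 extension. If the list already contains other extensions,
# append "druid-s3-extensions" to it rather than replacing it.
druid.extensions.loadList=["druid-s3-extensions"]
```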

Reading data from S3

The S3 input source is supported by the Parallel task to read objects directly from S3. If you use the Hadoop task, you can read data from S3 by specifying the S3 paths in your inputSpec.

To read objects from S3, you must first configure how Druid connects to S3.

Deep Storage

S3-compatible deep storage means either AWS S3 or a compatible service like Google Storage which exposes the same API as S3.

S3 deep storage needs to be explicitly enabled by setting druid.storage.type=s3. Only after setting the storage type to S3 will any of the settings below take effect.

To configure this extension for deep storage in S3, first configure how to connect to S3. Then set the additional configuration that is specific to deep storage.

Deep storage specific configuration

| Property | Description | Default |
| --- | --- | --- |
| druid.storage.bucket | Bucket to store in. | Must be set. |
| druid.storage.baseKey | A prefix string that will be prepended to the object names for the segments published to S3 deep storage. | Must be set. |
| druid.storage.type | Global deep storage provider. Must be set to s3 to make use of this extension. | Must be set (likely s3). |
| druid.storage.archiveBucket | S3 bucket name for archiving when running the archive task. | none |
| druid.storage.archiveBaseKey | S3 object key prefix for archiving. | none |
| druid.storage.disableAcl | Boolean flag to disable ACL. If this is set to false, full control is granted to the bucket owner. This may require setting additional permissions. See S3 permissions settings. | false |
| druid.storage.useS3aSchema | If true, use the "s3a" filesystem when using Hadoop-based ingestion. If false, the "s3n" filesystem will be used. Only affects Hadoop-based ingestion. | false |
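The required deep storage properties above could be combined in runtime.properties roughly as follows. The bucket name and key prefix are hypothetical placeholders, not defaults; substitute your own values.

```properties
# Enable S3 deep storage; the other druid.storage.* settings only
# take effect once the storage type is s3.
druid.storage.type=s3

# Hypothetical bucket and prefix; replace with your own values.
druid.storage.bucket=my-druid-deep-storage
druid.storage.baseKey=druid/segments

# Optional: skip setting object ACLs. Leaving this at the default (false)
# additionally requires the ACL permissions described under
# "S3 permissions settings".
druid.storage.disableAcl=true
```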

Configuration

S3 authentication methods

Druid uses the following credentials provider chain to connect to your S3 bucket (whether a deep storage bucket or a source bucket). Note: You can override the default credentials provider chain for connecting to the source bucket by specifying an access key and secret key using the Properties Object parameters in the ingestion spec.

| Order | Type | Details |
| --- | --- | --- |
| 1 | Druid config file | Based on your runtime.properties if it contains values druid.s3.accessKey and druid.s3.secretKey |
| 2 | Custom properties file | Based on a custom properties file where you can supply sessionToken, accessKey and secretKey values. This file is provided to Druid through the druid.s3.fileSessionCredentials property |
| 3 | Environment variables | Based on environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY |
| 4 | Java system properties | Based on JVM properties aws.accessKeyId and aws.secretKey |
| 5 | Profile information | Based on credentials you may have on your Druid instance (generally in ~/.aws/credentials) |
| 6 | ECS container credentials | Based on environment variables available on AWS ECS (AWS_CONTAINER_CREDENTIALS_RELATIVE_URI or AWS_CONTAINER_CREDENTIALS_FULL_URI) as described in the EC2ContainerCredentialsProviderWrapper documentation |
| 7 | Instance profile information | Based on the instance profile you may have attached to your Druid instance |

You can find more information about these authentication methods here.
Note: Order is important, because it indicates the precedence of the authentication methods.
For example, if you want to use instance profile information, you must not set druid.s3.accessKey and druid.s3.secretKey in your Druid runtime.properties, and none of the other methods that rank above it may supply credentials either.
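To illustrate the first two entries in the chain, here is a sketch of both options. The key values and the file path are hypothetical placeholders. Use one option or the other: if both are present, the keys in runtime.properties take precedence.

```properties
# Option 1 (order 1): static keys in runtime.properties.
# Placeholder values; never commit real keys to version control.
druid.s3.accessKey=YOUR_ACCESS_KEY
druid.s3.secretKey=YOUR_SECRET_KEY

# Option 2 (order 2): point Druid at a separate credentials file.
# The path below is a hypothetical example.
druid.s3.fileSessionCredentials=/etc/druid/s3-session.properties

# Contents of the referenced file, one key=value pair per line:
#   sessionToken=YOUR_SESSION_TOKEN
#   accessKey=YOUR_ACCESS_KEY
#   secretKey=YOUR_SECRET_KEY
```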

S3 permissions settings

s3:GetObject and s3:PutObject are required for pushing segments to and loading segments from S3. If druid.storage.disableAcl is set to false, then s3:GetBucketAcl and s3:PutObjectAcl are additionally required to set ACLs for objects.

AWS region

The AWS SDK requires that the target region be specified. Two ways of doing this are by using the JVM system property aws.region or the environment variable AWS_REGION.

As an example, to set the region to ‘us-east-1’ through system properties:

  • Add -Daws.region=us-east-1 to the jvm.config file for all Druid services.
  • Add -Daws.region=us-east-1 to druid.indexer.runner.javaOpts in Middle Manager configuration so that the property will be passed to Peon (worker) processes.
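As a sketch of the second step, the Middle Manager runtime.properties entry might look like this. It assumes you have no other Peon JVM options; if druid.indexer.runner.javaOpts is already set in your deployment, append the flag to the existing value rather than replacing it.

```properties
# Middle Manager runtime.properties: pass the region to Peon processes.
# Keep any JVM options you already set here and add the region flag.
druid.indexer.runner.javaOpts=-Daws.region=us-east-1
```

The jvm.config change from the first step is the same flag, -Daws.region=us-east-1, on a line of its own.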

Connecting to S3 configuration

| Property | Description | Default |
| --- | --- | --- |
| druid.s3.accessKey | S3 access key. See S3 authentication methods for more details | Can be omitted according to authentication methods chosen. |
| druid.s3.secretKey | S3 secret key. See S3 authentication methods for more details | Can be omitted according to authentication methods chosen. |
| druid.s3.fileSessionCredentials | Path to properties file containing sessionToken, accessKey and secretKey value. One key/value pair per line (format key=value). See S3 authentication methods for more details | Can be omitted according to authentication methods chosen. |
| druid.s3.protocol | Communication protocol type to use when sending requests to AWS. http or https can be used. This configuration would be ignored if druid.s3.endpoint.url is filled with a URL with a different protocol. | https |
| druid.s3.disableChunkedEncoding | Disables chunked encoding. See AWS document for details. | false |
| druid.s3.enablePathStyleAccess | Enables path style access. See AWS document for details. | false |
| druid.s3.forceGlobalBucketAccessEnabled | Enables global bucket access. See AWS document for details. | false |
| druid.s3.endpoint.url | Service endpoint either with or without the protocol. | None |
| druid.s3.endpoint.signingRegion | Region to use for SigV4 signing of requests (e.g. us-west-1). | None |
| druid.s3.proxy.host | Proxy host to connect through. | None |
| druid.s3.proxy.port | Port on the proxy host to connect through. | None |
| druid.s3.proxy.username | User name to use when connecting through a proxy. | None |
| druid.s3.proxy.password | Password to use when connecting through a proxy. | None |
| druid.storage.sse.type | Server-side encryption type. Should be one of s3, kms, and custom. See the below Server-side encryption section for more details. | None |
| druid.storage.sse.kms.keyId | AWS KMS key ID. This is used only when druid.storage.sse.type is kms and can be empty to use the default key ID. | None |
| druid.storage.sse.custom.base64EncodedKey | Base64-encoded key. Should be specified if druid.storage.sse.type is custom. | None |
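As an illustration of how several of these properties fit together, the sketch below points Druid at a hypothetical S3-compatible service and enables KMS server-side encryption. The endpoint URL, region, and key ID are placeholders chosen for this example. Whether a given service needs path-style access depends on that service, so treat that setting as an assumption.

```properties
# Hypothetical S3-compatible endpoint. The protocol in the URL takes
# precedence over druid.s3.protocol if the two differ.
druid.s3.endpoint.url=https://s3.example.internal:9000
druid.s3.endpoint.signingRegion=us-west-1

# Assumption: this service requires path-style access. Omit this line
# if your service supports the default addressing style.
druid.s3.enablePathStyleAccess=true

# Server-side encryption with AWS KMS. Leave keyId unset to use the
# default key ID; the value below is a placeholder.
druid.storage.sse.type=kms
druid.storage.sse.kms.keyId=YOUR_KMS_KEY_ID
```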

Server-side encryption

You can enable server-side encryption by setting druid.storage.sse.type to a supported type of server-side encryption. The current supported types are: