s3

Overview

This is a source plugin that reads events from Amazon Simple Storage Service (Amazon S3) objects.

Option | Required | Type | Description
--- | --- | --- | ---
notification_type | Yes | String | Must be sqs.
compression | No | String | The compression algorithm to apply: none, gzip, or automatic. Default is none.
codec | Yes | Codec | The codec to apply. Must be newline, json, or csv.
sqs | Yes | sqs | The Amazon Simple Queue Service (Amazon SQS) configuration. See sqs for details.
aws | Yes | aws | The AWS configuration. See aws for details.
on_error | No | String | Determines how to handle errors in Amazon SQS. Can be either retain_messages or delete_messages. If retain_messages, then Data Prepper will leave the message in the SQS queue and try again. This is recommended for dead-letter queues. If delete_messages, then Data Prepper will delete failed messages. Default is retain_messages.
buffer_timeout | No | Duration | The timeout for writing events to the Data Prepper buffer. Any events that the S3 Source cannot write to the buffer in this time will be discarded. Default is 10 seconds.
records_to_accumulate | No | Integer | The number of messages that accumulate before writing to the buffer. Default is 100.
metadata_root_key | No | String | Base key for adding S3 metadata to each Event. The metadata includes the key and bucket for each S3 object. Defaults to s3/.
disable_bucket_ownership_validation | No | Boolean | If true, then the S3 Source will not attempt to validate that the bucket is owned by the expected account. The only expected account is the same account that owns the SQS queue. Defaults to false.

sqs

The following options configure the use of Amazon SQS in the S3 Source plugin.

Option | Required | Type | Description
--- | --- | --- | ---
queue_url | Yes | String | The URL of the Amazon SQS queue from which messages are received.
maximum_messages | No | Integer | The maximum number of messages to receive from the SQS queue in any single request. Default is 10.
visibility_timeout | No | Duration | The visibility timeout to apply to messages read from the SQS queue. This should be set to the amount of time that Data Prepper may take to read all the S3 objects in a batch. Default is 30s.
wait_time | No | Duration | The time to wait for long polling on the SQS API. Default is 20s.
poll_delay | No | Duration | A delay to place between reading and processing a batch of SQS messages and making a subsequent request. Default is 0s.

aws

Option | Required | Type | Description
--- | --- | --- | ---
region | No | String | The AWS Region to use for credentials. Defaults to standard SDK behavior to determine the Region.
sts_role_arn | No | String | The AWS Security Token Service (AWS STS) role to assume for requests to Amazon SQS and Amazon S3. Defaults to null, which will use the standard SDK behavior for credentials.
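
The s3, sqs, and aws options above combine into a single source block. The following is a minimal sketch, not a definitive configuration: the pipeline name, queue URL, account ID, role ARN, and the stdout sink are hypothetical placeholders, and the nested `newline:` form of the codec is an assumption about how a Codec value is written.

```yaml
# Hypothetical pipeline name; replace with your own.
s3-log-pipeline:
  source:
    s3:
      notification_type: sqs          # required; must be sqs
      compression: gzip               # none (default), gzip, or automatic
      codec:
        newline:                      # assumed syntax; newline, json, or csv
      on_error: retain_messages       # default; leaves failed messages in the queue
      sqs:
        # Placeholder queue URL and account ID.
        queue_url: "https://sqs.us-east-1.amazonaws.com/123456789012/my-s3-events"
        maximum_messages: 10          # default
        visibility_timeout: 30s       # default; size to the time needed to read a batch
        wait_time: 20s                # default long-polling wait
      aws:
        region: "us-east-1"
        # Placeholder role ARN; omit to use the standard SDK credential behavior.
        sts_role_arn: "arn:aws:iam::123456789012:role/my-data-prepper-role"
  sink:
    - stdout:                         # assumed sink, used here only for illustration
```

Options left out of the block, such as buffer_timeout and records_to_accumulate, fall back to the defaults listed in the tables.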

file

Source for flat file input.

Option | Required | Type | Description
--- | --- | --- | ---
path | Yes | String | Path to the input file (e.g. logs/my-log.log).
format | No | String | Format of each line in the file. Valid options are json or plain. Default is plain.
record_type | No | String | The record type to store. Valid options are string or event. Default is string. If you would like to use the file source for log analytics use cases like grok, set this option to event.
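
As an illustration of the file options, the sketch below reads plain-text lines and stores them as events so that a downstream processor such as grok can work on them. The pipeline name and the stdout sink are hypothetical and not part of this section.

```yaml
# Hypothetical pipeline name.
file-log-pipeline:
  source:
    file:
      path: logs/my-log.log   # required; example path from the table above
      format: plain           # default; use json for one JSON object per line
      record_type: event      # default is string; event suits log analytics
  sink:
    - stdout:                 # assumed sink, used here only for illustration
```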

pipeline

Source for reading from another pipeline.

Option | Required | Type | Description
--- | --- | --- | ---
name | Yes | String | Name of the pipeline to read from.
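
A pipeline source is typically paired with an upstream pipeline that writes to it. The sketch below assumes two hypothetical pipelines, entry-pipeline and second-pipeline, and assumes that a pipeline sink (not described in this section) is what forwards events from the first to the second.

```yaml
# Upstream pipeline: reads console input and forwards it.
entry-pipeline:
  source:
    stdin:                        # console input; takes no options
  sink:
    - pipeline:
        name: "second-pipeline"   # assumed pipeline sink pointing at the downstream pipeline

# Downstream pipeline: reads from the upstream pipeline by name.
second-pipeline:
  source:
    pipeline:
      name: "entry-pipeline"      # required; the pipeline to read from
  sink:
    - stdout:                     # assumed sink, used here only for illustration
```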

stdin

Source for console input. It can be useful for testing and has no options.