Pipelines
To use Data Prepper, you define pipelines in a configuration YAML file. Each pipeline is a combination of a source, a buffer, zero or more preppers, and one or more sinks. For example:
simple-sample-pipeline:
workers: 2 # the number of workers
delay: 5000 # in milliseconds, how long workers wait between read attempts
source:
random:
buffer:
bounded_blocking:
buffer_size: 1024 # max number of records the buffer accepts
batch_size: 256 # max number of records the buffer drains after each read
processor:
- string_converter:
upper_case: true
sink:
- stdout:
Sources define where your data comes from. In this case, the source is a random UUID generator (
random
).Buffers store data as it passes through the pipeline.
By default, Data Prepper uses its one and only buffer, the
bounded_blocking
buffer, so you can omit this section unless you developed a custom buffer or need to tune the buffer settings.Preppers perform some action on your data: filter, transform, enrich, etc.
You can have multiple preppers, which run sequentially from top to bottom, not in parallel. The
string_converter
prepper transform the strings by making them uppercase.Sinks define where your data goes. In this case, the sink is stdout.
Examples
This section provides some pipeline examples that you can use to start creating your own pipelines. For more information, see Data Prepper configuration reference guide.
The Data Prepper repository has several sample applications to help you get started.
Log ingestion pipeline
The following example demonstrates how to use HTTP source and Grok prepper plugins to process unstructured log data.
log-pipeline:
source:
http:
ssl: false
processor:
- grok:
match:
log: [ "%{COMMONAPACHELOG}" ]
sink:
- opensearch:
hosts: [ "https://opensearch:9200" ]
insecure: true
username: admin
password: admin
index: apache_logs
This example uses weak security. We strongly recommend securing all plugins which open external ports in production environments.
Trace Analytics pipeline
The following example demonstrates how to build a pipeline that supports the Trace Analytics OpenSearch Dashboards plugin. This pipeline takes data from the OpenTelemetry Collector and uses two other pipelines as sinks. These two separate pipelines index trace and the service map documents for the dashboard plugin.
entry-pipeline:
delay: "100"
source:
otel_trace_source:
ssl: false
sink:
- pipeline:
name: "raw-pipeline"
- pipeline:
name: "service-map-pipeline"
raw-pipeline:
source:
pipeline:
name: "entry-pipeline"
prepper:
- otel_trace_raw_prepper:
sink:
- opensearch:
hosts: ["https://localhost:9200"]
insecure: true
username: admin
password: admin
trace_analytics_raw: true
service-map-pipeline:
delay: "100"
source:
pipeline:
name: "entry-pipeline"
prepper:
- service_map_stateful:
sink:
- opensearch:
hosts: ["https://localhost:9200"]
insecure: true
username: admin
password: admin
trace_analytics_service_map: true
Migrating from Logstash
Data Prepper supports Logstash configuration files for a limited set of plugins. Simply use the logstash config to run Data Prepper.
docker run --name data-prepper \
-v /full/path/to/logstash.conf:/usr/share/data-prepper/pipelines.conf \
opensearchproject/opensearch-data-prepper:latest
This feature is limited by feature parity of Data Prepper. As of Data Prepper 1.2 release, the following plugins from the Logstash configuration are supported:
- HTTP Input plugin
- Grok Filter plugin
- Elasticsearch Output plugin
- Amazon Elasticsearch Output plugin
Configure the Data Prepper server
Data Prepper itself provides administrative HTTP endpoints such as /list
to list pipelines and /metrics/prometheus
to provide Prometheus-compatible metrics data. The port that has these endpoints has a TLS configuration and is specified by a separate YAML file. By default, these endpoints are secured by Data Prepper docker images. We strongly recommend providing your own configuration file for securing production environments. Here is an example data-prepper-config.yaml
:
ssl: true
keyStoreFilePath: "/usr/share/data-prepper/keystore.jks"
keyStorePassword: "password"
privateKeyPassword: "other_password"
serverPort: 1234
To configure the Data Prepper server, run Data Prepper with the additional yaml file.
docker run --name data-prepper -v /full/path/to/pipelines.yaml:/usr/share/data-prepper/pipelines.yaml \
/full/path/to/data-prepper-config.yaml:/usr/share/data-prepper/data-prepper-config.yaml \
opensearchproject/opensearch-data-prepper:latest