Data streams - Set up a data stream - 《Elasticsearch v7.9 Reference》

Set up a data stream

To set up a data stream, follow these steps:

Check the prerequisites.
Optional: Configure an ILM lifecycle policy for a data stream.
Create an index template for a data stream.
Create a data stream.
Get information about a data stream to verify it exists.
Secure a data stream.

After you set up a data stream, you can use the data stream for indexing, searches, and other supported operations.

If you no longer need a data stream, you can delete it along with its backing indices.

Prerequisites

Elasticsearch data streams are intended for time series data only. Each document indexed to a data stream must contain the @timestamp field. This field must be mapped as a date or date_nanos field data type.
Data streams are best suited for time-based, append-only use cases. If you frequently need to update or delete existing documents, we recommend using an index alias and an index template instead.
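
As a rough illustration of the @timestamp requirement, the following Python sketch checks a document on the client side before it is sent. The function name has_valid_timestamp is hypothetical, and the check assumes timestamps are ISO 8601 strings like the ones used in the examples on this page. It does not cover other formats that a date mapping may accept.

```python
from datetime import datetime


def has_valid_timestamp(doc: dict) -> bool:
    """Return True if doc has an ISO 8601 string in its @timestamp field.

    Assumption: timestamps look like "2020-12-06T11:04:05.000Z".
    Other formats permitted by a custom mapping are not handled here.
    """
    value = doc.get("@timestamp")
    if not isinstance(value, str):
        return False
    # Older Python versions reject a trailing "Z" in fromisoformat(),
    # so rewrite it as an explicit UTC offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    try:
        datetime.fromisoformat(value)
    except ValueError:
        return False
    return True
```

A check like this only catches missing or malformed values early. Elasticsearch still validates the field against the mapping when the document is indexed.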

Optional: Configure an ILM lifecycle policy for a data stream

You can use index lifecycle management (ILM) to automatically manage a data stream’s backing indices. For example, you could use ILM to:

Spin up a new write index for the data stream when the current one reaches a certain size or age.
Move older backing indices to slower, less expensive hardware.
Delete stale backing indices to enforce data retention standards.

To use ILM with a data stream, you must configure a lifecycle policy. This lifecycle policy should contain the automated actions to take on backing indices and the triggers for such actions.

While optional, we recommend using ILM to manage the backing indices associated with a data stream.

You can create the policy through the Kibana UI. In Kibana, open the menu and go to Stack Management > Index Lifecycle Policies.

You can also create a policy using the create lifecycle policy API.

The following request configures the my-data-stream-policy lifecycle policy. The policy uses the rollover action to create a new write index for the data stream when the current one reaches 25GB in size. The policy also deletes backing indices 30 days after their rollover.

PUT /_ilm/policy/my-data-stream-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "25GB"
          }
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
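
If you prefer to build this request from a script, the same body could be assembled as in the Python sketch below. It uses only the standard library. The ES_URL value and the helper names build_ilm_policy and put_json are assumptions for illustration, and the sketch presumes a local cluster without security enabled. The request is prepared but not sent.

```python
import json
import urllib.request

# Assumption: a local cluster reachable without authentication.
ES_URL = "http://localhost:9200"


def build_ilm_policy(max_size: str, delete_after: str) -> dict:
    """Build a policy body that rolls over at max_size and deletes
    backing indices once delete_after has passed since rollover."""
    return {
        "policy": {
            "phases": {
                "hot": {"actions": {"rollover": {"max_size": max_size}}},
                "delete": {"min_age": delete_after, "actions": {"delete": {}}},
            }
        }
    }


def put_json(path: str, body: dict) -> urllib.request.Request:
    """Prepare, but do not send, a PUT request with a JSON body."""
    return urllib.request.Request(
        url=ES_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = put_json(
    "/_ilm/policy/my-data-stream-policy",
    build_ilm_policy(max_size="25GB", delete_after="30d"),
)
# To send the request: urllib.request.urlopen(request)
```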

Create an index template for a data stream

A data stream uses an index template to configure its backing indices. A template for a data stream must specify:

One or more index patterns that match the name of the stream.
The mappings and settings for the stream’s backing indices.
That the template is used exclusively for data streams.
A priority for the template.

Elasticsearch has built-in index templates for the metrics-*-* and logs-*-* index patterns, each with a priority of 100. Elastic Agent uses these templates to create data streams. If you use Elastic Agent, assign your index templates a priority lower than 100 to avoid overriding the built-in templates.

Otherwise, to avoid accidentally applying the built-in templates, use a non-overlapping index pattern or assign templates with an overlapping pattern a priority higher than 100.

For example, if you don’t use Elastic Agent and want to create a template for the logs-* index pattern, assign your template a priority of 200. This ensures your template is applied instead of the built-in template for logs-*-*.
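
The effect of the priority value can be modeled with a short Python sketch. This is a simplified model, not how Elasticsearch is implemented: it gives each template a single pattern, approximates wildcard matching with fnmatchcase, and uses made-up template names (builtin-logs and my-logs).

```python
from fnmatch import fnmatchcase
from typing import Optional

# (name, index pattern, priority). The first entry stands in for the
# built-in logs template. Both names are illustrative only.
TEMPLATES = [
    ("builtin-logs", "logs-*-*", 100),
    ("my-logs", "logs-*", 200),
]


def winning_template(target: str, templates=TEMPLATES) -> Optional[str]:
    """Return the name of the highest-priority template whose pattern
    matches target, or None if no template matches."""
    matches = [
        (priority, name)
        for name, pattern, priority in templates
        if fnmatchcase(target, pattern)
    ]
    return max(matches)[1] if matches else None
```

In this model, a target such as logs-nginx-default matches both patterns, and my-logs is selected because 200 is higher than 100. Lowering the priority of my-logs below 100 would let the built-in template apply instead.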

Every document indexed to a data stream must have a @timestamp field. This field can be mapped as a date or date_nanos field data type by the stream’s index template. This mapping can include other mapping parameters, such as format. If the template does not specify a mapping, the @timestamp field is mapped as a date field with default options.

We recommend using ILM to manage a data stream’s backing indices. Specify the name of the lifecycle policy with the index.lifecycle.name setting.

We recommend you carefully consider which mappings and settings to include in this template before creating a data stream. Later changes to the mappings or settings of a stream’s backing indices may require reindexing. See Change mappings and settings for a data stream.

You can create an index template through the Kibana UI:

From Kibana, open the menu and go to Stack Management > Index Management.
In the Index Templates tab, click Create template.
In the Create template wizard, use the Data stream toggle to indicate the template is used exclusively for data streams.

You can also create a template using the put index template API. The template must include a data_stream object with an empty body ({ }). This object indicates the template is used exclusively for data streams.

The following request configures the my-data-stream-template index template. Because no field mapping is specified, the @timestamp field uses the date field data type by default.

PUT /_index_template/my-data-stream-template
{
  "index_patterns": [ "my-data-stream*" ],
  "data_stream": { },
  "priority": 200,
  "template": {
    "settings": {
      "index.lifecycle.name": "my-data-stream-policy"
    }
  }
}

Alternatively, the following template maps @timestamp as a date_nanos field.

PUT /_index_template/my-data-stream-template
{
  "index_patterns": [ "my-data-stream*" ],
  "data_stream": { },
  "priority": 200,
  "template": {
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date_nanos" }
      }
    },
    "settings": {
      "index.lifecycle.name": "my-data-stream-policy"
    }
  }
}

You can include other supported mapping parameters in the @timestamp field mapping.
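
Both template variants differ only in the optional @timestamp mapping, so a script could generate them from one helper. The Python sketch below is one way to do that. The function name build_data_stream_template and its parameters are hypothetical, and the result is only the request body for the put index template API, not a call to it.

```python
from typing import Optional


def build_data_stream_template(
    pattern: str,
    policy: str,
    priority: int = 200,
    timestamp_type: Optional[str] = None,
) -> dict:
    """Build an index template body for a data stream.

    If timestamp_type is None, no mapping is included, so @timestamp
    falls back to the default date mapping.
    """
    template = {"settings": {"index.lifecycle.name": policy}}
    if timestamp_type is not None:
        if timestamp_type not in ("date", "date_nanos"):
            raise ValueError("@timestamp must be mapped as date or date_nanos")
        template["mappings"] = {
            "properties": {"@timestamp": {"type": timestamp_type}}
        }
    return {
        "index_patterns": [pattern],
        # The empty object marks the template as data-stream only.
        "data_stream": {},
        "priority": priority,
        "template": template,
    }
```

Calling build_data_stream_template("my-data-stream*", "my-data-stream-policy") yields the first body shown above. Passing timestamp_type="date_nanos" yields the second.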

You cannot delete an index template that’s in use by a data stream. This would prevent the data stream from creating new backing indices.

Create a data stream

You can create a data stream in one of two ways: index a document to a matching target, or create the stream manually.

Index documents to create a data stream

You can automatically create a data stream using an indexing request. Submit an indexing request to a target matching the index pattern defined in the template’s index_patterns property.

If the indexing request’s target doesn’t exist, Elasticsearch creates the data stream and uses the target name as the name for the stream.

Data streams support only specific types of indexing requests. See Add documents to a data stream.

The following index API request targets my-data-stream, which matches the index pattern for my-data-stream-template. Because no existing index or data stream uses this name, this request creates the my-data-stream data stream and indexes the document to it.

POST /my-data-stream/_doc/
{
  "@timestamp": "2020-12-06T11:04:05.000Z",
  "user": {
    "id": "vlb44hny"
  },
  "message": "Login attempt failed"
}

The API returns the following response. Note the _index property contains .ds-my-data-stream-000001, indicating the document was indexed to the write index of the new data stream.

{
  "_index": ".ds-my-data-stream-000001",
  "_id": "qecQmXIBT4jB8tq1nG0j",
  "_type": "_doc",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 0,
  "_primary_term": 1
}
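
A client can confirm that a document landed in a data stream by inspecting the _index value in the response. The Python sketch below assumes the backing index naming seen in the examples on this page, .ds-<data-stream>-<generation>, with the generation zero-padded to six digits. The function names are hypothetical.

```python
import re


def backing_index_name(stream: str, generation: int) -> str:
    """Name of a backing index, assuming the .ds-<stream>-<generation>
    pattern with a six-digit, zero-padded generation."""
    return f".ds-{stream}-{generation:06d}"


def indexed_to_stream(response: dict, stream: str) -> bool:
    """Return True if the index API response names a backing index
    of the given data stream."""
    pattern = rf"\.ds-{re.escape(stream)}-\d{{6,}}"
    return re.fullmatch(pattern, response.get("_index", "")) is not None
```

Matching the full name, rather than only a prefix, keeps a stream such as my-data-stream-alt from being mistaken for my-data-stream.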

Manually create a data stream

You can use the create data stream API to manually create a data stream. The name of the data stream must match the index pattern defined in the template’s index_patterns property.

The following create data stream request targets my-data-stream-alt, which matches the index pattern for my-data-stream-template. Because no existing index or data stream uses this name, this request creates the my-data-stream-alt data stream.

PUT /_data_stream/my-data-stream-alt

Get information about a data stream

To view information about a data stream in Kibana, open the menu and go to Stack Management > Index Management. In the Data Streams tab, click a data stream’s name to view information about the stream.

You can also use the get data stream API to retrieve the following information about one or more data streams:

The current backing indices, which are returned as an array. The last item in the array contains information about the stream’s current write index.
The current generation
The data stream’s health status
The index template used to create the stream’s backing indices
The current ILM lifecycle policy in the stream’s matching index template

The following get data stream API request retrieves information about my-data-stream.

GET /_data_stream/my-data-stream

The API returns the following response. Note the indices property contains an array of the stream’s current backing indices. The last item in this array contains information about the stream’s write index, .ds-my-data-stream-000002.

{
  "data_streams": [
    {
      "name": "my-data-stream",
      "timestamp_field": {
        "name": "@timestamp"
      },
      "indices": [
        {
          "index_name": ".ds-my-data-stream-000001",
          "index_uuid": "krR78LfvTOe6gr5dj2_1xQ"
        },
        {
          "index_name": ".ds-my-data-stream-000002",
          "index_uuid": "C6LWyNJHQWmA08aQGvqRkA"
        }
      ],
      "generation": 2,
      "status": "GREEN",
      "template": "my-data-stream-template",
      "ilm_policy": "my-data-stream-policy"
    }
  ]
}
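
Because the write index is always the last entry in the indices array, a script can read it directly from a parsed response. The following Python sketch shows the idea. The function name current_write_index is hypothetical, and sample_response is a trimmed copy of the response above.

```python
def current_write_index(response: dict, stream: str) -> str:
    """Return the name of the write index for the given data stream,
    taken from the last item of its indices array."""
    for entry in response["data_streams"]:
        if entry["name"] == stream:
            return entry["indices"][-1]["index_name"]
    raise KeyError(f"data stream not found in response: {stream}")


# Trimmed copy of the get data stream API response shown above.
sample_response = {
    "data_streams": [
        {
            "name": "my-data-stream",
            "indices": [
                {"index_name": ".ds-my-data-stream-000001"},
                {"index_name": ".ds-my-data-stream-000002"},
            ],
            "generation": 2,
        }
    ]
}

write_index = current_write_index(sample_response, "my-data-stream")
```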

Secure a data stream

You can use Elasticsearch security features to control access to a data stream and its data. See Data stream privileges.

Delete a data stream

You can use the Kibana UI to delete a data stream and its backing indices. In Kibana, open the menu and go to Stack Management > Index Management. In the Data Streams tab, click the trash can icon to delete a stream and its backing indices.

You can also use the delete data stream API to delete a data stream. The following delete data stream API request deletes my-data-stream. This request also deletes the stream’s backing indices and any data they contain.

DELETE /_data_stream/my-data-stream