Tutorial: Create a data stream with a lifecycle

To create a data stream with a built-in lifecycle, follow these steps:

  1. Create an index template
  2. Create a data stream
  3. Retrieve lifecycle information

Create an index template

A data stream requires a matching index template. You configure the data stream lifecycle by setting the lifecycle field in the index template, in the same way as you set mappings and index settings. You can define an index template that sets a lifecycle as follows:

  • Include the data_stream object to enable data streams.
  • Define the lifecycle in the template section or include a composable template that defines the lifecycle.
  • Use a priority higher than 200 to avoid collisions with built-in templates. See Avoid index pattern collisions.
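
The three requirements above can be sketched as a small local check on a template body before it is sent. This is only an illustrative sketch: `build_lifecycle_template` and `check_template` are hypothetical helper names, not part of any Elasticsearch client, and the check cannot see whether a referenced component template really defines a lifecycle.

```python
# Hypothetical helpers for illustration; they only inspect a plain dict.
RECOMMENDED_MIN_PRIORITY = 200  # the tutorial recommends a priority higher than 200


def build_lifecycle_template(patterns, retention, priority=500):
    """Build a template body with a data stream and an inline lifecycle."""
    return {
        "index_patterns": list(patterns),
        "data_stream": {},
        "priority": priority,
        "template": {"lifecycle": {"data_retention": retention}},
    }


def check_template(body):
    """Return a list of problems found against the three requirements."""
    problems = []
    if "data_stream" not in body:
        problems.append("missing data_stream object")
    has_inline_lifecycle = "lifecycle" in body.get("template", {})
    # Assumption: a non-empty composed_of list may supply the lifecycle.
    has_components = bool(body.get("composed_of"))
    if not (has_inline_lifecycle or has_components):
        problems.append("no lifecycle in template and no component templates")
    if body.get("priority", 0) <= RECOMMENDED_MIN_PRIORITY:
        problems.append("priority is not higher than 200")
    return problems


body = build_lifecycle_template(["my-data-stream*"], "7d")
print(check_template(body))
```

An empty list means the body satisfies all three points; the same dict can then be passed as the request body of the create index template API.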

You can create the template with the create index template API. The client examples assume an already configured client object.

  Python:

    resp = client.indices.put_index_template(
        name="my-index-template",
        index_patterns=[
            "my-data-stream*"
        ],
        data_stream={},
        priority=500,
        template={
            "lifecycle": {
                "data_retention": "7d"
            }
        },
        meta={
            "description": "Template with data stream lifecycle"
        },
    )
    print(resp)

  Ruby:

    response = client.indices.put_index_template(
      name: 'my-index-template',
      body: {
        index_patterns: [
          'my-data-stream*'
        ],
        data_stream: {},
        priority: 500,
        template: {
          lifecycle: {
            data_retention: '7d'
          }
        },
        _meta: {
          description: 'Template with data stream lifecycle'
        }
      }
    )
    puts response

  JavaScript:

    const response = await client.indices.putIndexTemplate({
      name: "my-index-template",
      index_patterns: ["my-data-stream*"],
      data_stream: {},
      priority: 500,
      template: {
        lifecycle: {
          data_retention: "7d",
        },
      },
      _meta: {
        description: "Template with data stream lifecycle",
      },
    });
    console.log(response);

  Console:

    PUT _index_template/my-index-template
    {
      "index_patterns": ["my-data-stream*"],
      "data_stream": { },
      "priority": 500,
      "template": {
        "lifecycle": {
          "data_retention": "7d"
        }
      },
      "_meta": {
        "description": "Template with data stream lifecycle"
      }
    }

Create a data stream

You can create a data stream in two ways:

  1. By manually creating the stream using the create data stream API. The stream’s name must still match one of your template’s index patterns.

    Python:

      resp = client.indices.create_data_stream(
          name="my-data-stream",
      )
      print(resp)

    Ruby:

      response = client.indices.create_data_stream(
        name: 'my-data-stream'
      )
      puts response

    JavaScript:

      const response = await client.indices.createDataStream({
        name: "my-data-stream",
      });
      console.log(response);

    Console:

      PUT _data_stream/my-data-stream

  2. By sending indexing requests that target the stream’s name. This name must match one of your index template’s index patterns.

    Python:

      resp = client.bulk(
          index="my-data-stream",
          operations=[
              {
                  "create": {}
              },
              {
                  "@timestamp": "2099-05-06T16:21:15.000Z",
                  "message": "192.0.2.42 - - [06/May/2099:16:21:15 +0000] \"GET /images/bg.jpg HTTP/1.0\" 200 24736"
              },
              {
                  "create": {}
              },
              {
                  "@timestamp": "2099-05-06T16:25:42.000Z",
                  "message": "192.0.2.255 - - [06/May/2099:16:25:42 +0000] \"GET /favicon.ico HTTP/1.0\" 200 3638"
              }
          ],
      )
      print(resp)

    Ruby:

      response = client.bulk(
        index: 'my-data-stream',
        body: [
          {
            create: {}
          },
          {
            "@timestamp": '2099-05-06T16:21:15.000Z',
            message: '192.0.2.42 - - [06/May/2099:16:21:15 +0000] "GET /images/bg.jpg HTTP/1.0" 200 24736'
          },
          {
            create: {}
          },
          {
            "@timestamp": '2099-05-06T16:25:42.000Z',
            message: '192.0.2.255 - - [06/May/2099:16:25:42 +0000] "GET /favicon.ico HTTP/1.0" 200 3638'
          }
        ]
      )
      puts response

    JavaScript:

      const response = await client.bulk({
        index: "my-data-stream",
        operations: [
          {
            create: {},
          },
          {
            "@timestamp": "2099-05-06T16:21:15.000Z",
            message:
              '192.0.2.42 - - [06/May/2099:16:21:15 +0000] "GET /images/bg.jpg HTTP/1.0" 200 24736',
          },
          {
            create: {},
          },
          {
            "@timestamp": "2099-05-06T16:25:42.000Z",
            message:
              '192.0.2.255 - - [06/May/2099:16:25:42 +0000] "GET /favicon.ico HTTP/1.0" 200 3638',
          },
        ],
      });
      console.log(response);

    Console:

      PUT my-data-stream/_bulk
      { "create":{ } }
      { "@timestamp": "2099-05-06T16:21:15.000Z", "message": "192.0.2.42 - - [06/May/2099:16:21:15 +0000] \"GET /images/bg.jpg HTTP/1.0\" 200 24736" }
      { "create":{ } }
      { "@timestamp": "2099-05-06T16:25:42.000Z", "message": "192.0.2.255 - - [06/May/2099:16:25:42 +0000] \"GET /favicon.ico HTTP/1.0\" 200 3638" }
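
Both approaches depend on the stream name matching one of the template's index patterns. As a rough local sanity check, the match can be approximated by treating `*` as the only wildcard. This is a simplified sketch: `matches_index_pattern` and `matching_patterns` are hypothetical helpers, and Elasticsearch itself decides which template applies, also taking priority into account.

```python
import re


def matches_index_pattern(name: str, pattern: str) -> bool:
    """Approximate an index pattern match: `*` matches any run of characters."""
    # Escape every literal part, then join the parts with a regex wildcard.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, name) is not None


def matching_patterns(name, patterns):
    """Return the patterns from the list that the given name matches."""
    return [p for p in patterns if matches_index_pattern(name, p)]


template_patterns = ["my-data-stream*"]
for candidate in ("my-data-stream", "my-data-stream-eu", "other-stream"):
    print(candidate, bool(matching_patterns(candidate, template_patterns)))
```

A name that matches no pattern would not pick up the lifecycle defined in the template from the first step.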

Retrieve lifecycle information

You can use the get data stream lifecycle API to see the data stream lifecycle of your data stream and the explain data stream lifecycle API to see the exact state of each backing index.

  Python:

    resp = client.indices.get_data_lifecycle(
        name="my-data-stream",
    )
    print(resp)

  Ruby:

    response = client.indices.get_data_lifecycle(
      name: 'my-data-stream'
    )
    puts response

  JavaScript:

    const response = await client.indices.getDataLifecycle({
      name: "my-data-stream",
    });
    console.log(response);

  Console:

    GET _data_stream/my-data-stream/_lifecycle

The result will look like this:

    {
      "data_streams": [
        {
          "name": "my-data-stream",
          "lifecycle": {
            "enabled": true,
            "data_retention": "7d",
            "effective_retention": "7d",
            "retention_determined_by": "data_stream_configuration"
          }
        }
      ],
      "global_retention": {}
    }

  • name: The name of your data stream.
  • enabled: Shows whether the data stream lifecycle is enabled for this data stream.
  • data_retention: The retention period of the data indexed in this data stream, as configured by the user.
  • effective_retention: The retention period that the data stream lifecycle will apply. Here, the data in this data stream is kept for at least 7 days. After that, Elasticsearch can delete it at its own discretion.
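
To act on these fields in a script, the response can be reduced to a mapping from stream name to effective retention. The following is a minimal sketch that works on the sample response shown above as a plain dict. `parse_time_value` and `effective_retention` are hypothetical helpers, and the parser deliberately handles only the `d`, `h`, `m`, `s` and `ms` suffixes.

```python
import re
from datetime import timedelta

# Supported suffixes and the matching timedelta keyword (a deliberate subset).
_UNIT_TO_KWARG = {
    "d": "days",
    "h": "hours",
    "m": "minutes",
    "s": "seconds",
    "ms": "milliseconds",
}


def parse_time_value(value: str) -> timedelta:
    """Convert a value such as "7d" into a timedelta."""
    match = re.fullmatch(r"(\d+)(ms|d|h|m|s)", value)
    if match is None:
        raise ValueError(f"unsupported time value: {value!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNIT_TO_KWARG[unit]: int(amount)})


def effective_retention(response: dict) -> dict:
    """Map each data stream name to its effective retention, when present."""
    return {
        ds["name"]: parse_time_value(ds["lifecycle"]["effective_retention"])
        for ds in response["data_streams"]
        if "effective_retention" in ds.get("lifecycle", {})
    }


# The sample response from this tutorial, written as a Python dict.
sample_response = {
    "data_streams": [
        {
            "name": "my-data-stream",
            "lifecycle": {
                "enabled": True,
                "data_retention": "7d",
                "effective_retention": "7d",
                "retention_determined_by": "data_stream_configuration",
            },
        }
    ],
    "global_retention": {},
}

print(effective_retention(sample_response))
```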

To see how the data stream lifecycle is applied to individual backing indices, use the explain data stream lifecycle API:

  Python:

    resp = client.indices.explain_data_lifecycle(
        index=".ds-my-data-stream-*",
    )
    print(resp)

  Ruby:

    response = client.indices.explain_data_lifecycle(
      index: '.ds-my-data-stream-*'
    )
    puts response

  JavaScript:

    const response = await client.indices.explainDataLifecycle({
      index: ".ds-my-data-stream-*",
    });
    console.log(response);

  Console:

    GET .ds-my-data-stream-*/_lifecycle/explain

The result will look like this:

    {
      "indices": {
        ".ds-my-data-stream-2023.04.19-000001": {
          "index": ".ds-my-data-stream-2023.04.19-000001",
          "managed_by_lifecycle": true,
          "index_creation_date_millis": 1681918009501,
          "time_since_index_creation": "1.6m",
          "lifecycle": {
            "enabled": true,
            "data_retention": "7d"
          }
        }
      }
    }

  • index: The name of the backing index.
  • managed_by_lifecycle: Shows whether the index is managed by the built-in data stream lifecycle.
  • time_since_index_creation: The time since the index was created.
  • lifecycle: The lifecycle configuration that is applied to this backing index.
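
For a quick overview of many backing indices, the explain response can be flattened into one row per index. This is a minimal sketch over the sample response above as a plain dict. `summarize_explain` is a hypothetical helper, and it assumes `index_creation_date_millis` is a Unix epoch timestamp in milliseconds.

```python
from datetime import datetime, timezone


def summarize_explain(response: dict) -> list:
    """Return one summary row per backing index in an explain response."""
    rows = []
    for name, info in response["indices"].items():
        millis = info.get("index_creation_date_millis")
        # Assumption: the value is epoch milliseconds; convert to UTC.
        created = (
            datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
            if millis is not None
            else None
        )
        rows.append(
            {
                "index": name,
                "managed": info.get("managed_by_lifecycle", False),
                "created": created,
                "data_retention": info.get("lifecycle", {}).get("data_retention"),
            }
        )
    return rows


# The sample response from this tutorial, written as a Python dict.
sample_explain = {
    "indices": {
        ".ds-my-data-stream-2023.04.19-000001": {
            "index": ".ds-my-data-stream-2023.04.19-000001",
            "managed_by_lifecycle": True,
            "index_creation_date_millis": 1681918009501,
            "time_since_index_creation": "1.6m",
            "lifecycle": {"enabled": True, "data_retention": "7d"},
        }
    }
}

for row in summarize_explain(sample_explain):
    print(row["index"], row["managed"], row["created"], row["data_retention"])
```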