Split processor

Introduced 2.17

The split processor splits a string field into an array of substrings based on a specified delimiter.

Request body fields

The following table lists all available request fields.

Field	Data type	Description
`field`	String	The field containing the string to be split. Required.
`separator`	String	The delimiter used to split the string. Specify either a single separator character or a regular expression pattern. Required.
`preserve_trailing`	Boolean	If set to `true`, preserves empty trailing fields (for example, `""`) in the resulting array. If set to `false`, then empty trailing fields are removed from the resulting array. Optional. Default is `false`.
`target_field`	String	The field in which the array of substrings is stored. If not specified, then the field is updated in place. Optional.
`tag`	String	The processor's identifier. Optional.
`description`	String	A description of the processor. Optional.
`ignore_failure`	Boolean	If `true`, then OpenSearch ignores any failure of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is `false`.
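
The interaction between `separator`, `preserve_trailing`, and `target_field` can be approximated outside OpenSearch. The following Python sketch is illustrative only: `split_field` is a hypothetical helper, not part of any OpenSearch client, and it relies on Python's `re` module, whose regular expression syntax and edge-case behavior (for example, for empty input strings) may differ from what OpenSearch does internally.

```python
import re


def split_field(doc, field, separator, target_field=None, preserve_trailing=False):
    """Approximate the split processor on a single document (a dict).

    Hypothetical helper for illustration; not an OpenSearch API.
    """
    # Treat the separator as a regular expression pattern.
    parts = re.split(separator, doc[field])
    if not preserve_trailing:
        # Drop empty strings from the end of the array only.
        while parts and parts[-1] == "":
            parts.pop()
    result = dict(doc)
    # Without a target field, the source field itself receives the array.
    result[target_field or field] = parts
    return result


doc = {"message": "ingest, search, visualize, and analyze data"}
print(split_field(doc, "message", ", ", target_field="split_message"))

trailing = {"tags": "a,b,,"}
print(split_field(trailing, "tags", ","))                          # trailing empties removed
print(split_field(trailing, "tags", ",", preserve_trailing=True))  # trailing empties kept
```

In this sketch, `"a,b,,"` yields two elements by default and four elements when `preserve_trailing` is `True`, which mirrors the behavior described in the table.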

Example

The following example demonstrates using a search pipeline with a split processor.

Setup

Create an index named `my_index` and index a document containing the field `message`:

POST /my_index/_doc/1
{
  "message": "ingest, search, visualize, and analyze data",
  "visibility": "public"
}

Creating a search pipeline

The following request creates a search pipeline with a `split` response processor that splits the `message` field and stores the results in the `split_message` field:

PUT /_search/pipeline/my_pipeline
{
  "response_processors": [
    {
      "split": {
        "field": "message",
        "separator": ", ",
        "target_field": "split_message"
      }
    }
  ]
}
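
If you generate pipeline definitions programmatically, the request body can be assembled before it is sent. The following Python sketch only builds and prints the JSON body. `build_split_pipeline` is a hypothetical helper name, and sending the body to `PUT /_search/pipeline/my_pipeline` is left to whichever HTTP client you use.

```python
import json


def build_split_pipeline(field, separator, target_field=None, preserve_trailing=None):
    """Build a search pipeline body with one split response processor.

    Hypothetical helper for illustration; it only assembles a dict.
    """
    processor = {"field": field, "separator": separator}
    # Optional settings are added only when explicitly provided.
    if target_field is not None:
        processor["target_field"] = target_field
    if preserve_trailing is not None:
        processor["preserve_trailing"] = preserve_trailing
    return {"response_processors": [{"split": processor}]}


body = build_split_pipeline("message", ", ", target_field="split_message")
print(json.dumps(body, indent=2))
```

Leaving unset options out of the body lets the processor fall back to its documented defaults.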

Using a search pipeline

Search for documents in `my_index` without a search pipeline:

GET /my_index/_search

The response contains the field `message`:

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "my_index",
        "_id": "1",
        "_score": 1,
        "_source": {
          "message": "ingest, search, visualize, and analyze data",
          "visibility": "public"
        }
      }
    ]
  }
}

To search with a pipeline, specify the pipeline name in the `search_pipeline` query parameter:

GET /my_index/_search?search_pipeline=my_pipeline

The `message` field is split and the results are stored in the `split_message` field:

{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "my_index",
        "_id": "1",
        "_score": 1,
        "_source": {
          "visibility": "public",
          "message": "ingest, search, visualize, and analyze data",
          "split_message": [
            "ingest",
            "search",
            "visualize",
            "and analyze data"
          ]
        }
      }
    ]
  }
}
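
Because the pipeline is selected per request through a query parameter, client code can switch it on or off when it builds the search URL. The following Python sketch shows one way to do that. `search_url` is a hypothetical helper, and `http://localhost:9200` is an assumed cluster address rather than a value from this page.

```python
from urllib.parse import urlencode


def search_url(base_url, index, pipeline=None):
    """Build a _search URL, optionally naming a search pipeline.

    Hypothetical helper; base_url is an assumed cluster address.
    """
    url = f"{base_url}/{index}/_search"
    if pipeline:
        # The pipeline is chosen with the search_pipeline query parameter.
        url += "?" + urlencode({"search_pipeline": pipeline})
    return url


print(search_url("http://localhost:9200", "my_index"))
print(search_url("http://localhost:9200", "my_index", pipeline="my_pipeline"))
```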

You can also use the `fields` option to retrieve specific fields from a document:

POST /my_index/_search?pretty&search_pipeline=my_pipeline
{
  "fields": ["visibility", "message"]
}

In the response, the `message` field is split and the results are stored in the `split_message` field, which appears in both the `_source` and `fields` sections:

{
  "took": 7,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "my_index",
        "_id": "1",
        "_score": 1,
        "_source": {
          "visibility": "public",
          "message": "ingest, search, visualize, and analyze data",
          "split_message": [
            "ingest",
            "search",
            "visualize",
            "and analyze data"
          ]
        },
        "fields": {
          "visibility": [
            "public"
          ],
          "message": [
            "ingest, search, visualize, and analyze data"
          ],
          "split_message": [
            "ingest",
            "search",
            "visualize",
            "and analyze data"
          ]
        }
      }
    ]
  }
}
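
When reading this response from client code, note that the two sections shape values differently: `_source` keeps each value as stored, while `fields` returns every value as an array. The following Python sketch illustrates the difference using a trimmed copy of the hit shown above. The variable name `hit` is only for illustration.

```python
# Trimmed copy of the single hit from the example response.
hit = {
    "_id": "1",
    "_source": {
        "visibility": "public",
        "message": "ingest, search, visualize, and analyze data",
        "split_message": ["ingest", "search", "visualize", "and analyze data"],
    },
    "fields": {
        "visibility": ["public"],
        "message": ["ingest, search, visualize, and analyze data"],
        "split_message": ["ingest", "search", "visualize", "and analyze data"],
    },
}

# _source keeps the stored shape: a string stays a string.
print(hit["_source"]["visibility"])

# fields returns a list even for a single value.
print(hit["fields"]["visibility"])

# The split result is a list of substrings in both sections.
print(hit["fields"]["split_message"] == hit["_source"]["split_message"])
```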