Split processor

Introduced 2.17

The split processor splits a string field into an array of substrings based on a specified delimiter.

Request body fields

The following table lists all available request fields.

Field | Data type | Description
:--- | :--- | :---
field | String | The field containing the string to be split. Required.
separator | String | The delimiter used to split the string. Specify either a single separator character or a regular expression pattern. Required.
preserve_trailing | Boolean | If set to true, preserves empty trailing fields (for example, "") in the resulting array. If set to false, then empty trailing fields are removed from the resulting array. Default is false.
target_field | String | The field in which the array of substrings is stored. If not specified, then the field is updated in place.
tag | String | The processor's identifier.
description | String | A description of the processor.
ignore_failure | Boolean | If true, then OpenSearch ignores any failure of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is false.
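
To make the separator and preserve_trailing options more concrete, the following Python sketch approximates the behavior described in the table. It is not the OpenSearch implementation: the function name split_field is hypothetical, Python's regular expression syntax differs from the one OpenSearch uses, and edge cases (such as an empty input string or a separator pattern containing capture groups) may behave differently.

```python
import re


def split_field(value, separator, preserve_trailing=False):
    """Approximate the split processor on a single string (illustrative only)."""
    # Treat the separator as a regular expression; a plain delimiter such as
    # ", " is also a valid pattern.
    parts = re.split(separator, value)
    if not preserve_trailing:
        # Drop empty strings at the end of the array, as the table describes
        # for preserve_trailing=false.
        while parts and parts[-1] == "":
            parts.pop()
    return parts


print(split_field("a,b,,", ","))
print(split_field("a,b,,", ",", preserve_trailing=True))
```

Under these assumptions, the first call drops the two empty trailing entries and the second call keeps them.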

Example

The following example demonstrates using a search pipeline with a split processor.

Setup

Create an index named my_index and index a document containing the field message:

  POST /my_index/_doc/1
  {
    "message": "ingest, search, visualize, and analyze data",
    "visibility": "public"
  }

Creating a search pipeline

The following request creates a search pipeline with a split response processor that splits the message field and stores the results in the split_message field:

  PUT /_search/pipeline/my_pipeline
  {
    "response_processors": [
      {
        "split": {
          "field": "message",
          "separator": ", ",
          "target_field": "split_message"
        }
      }
    ]
  }
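
If you generate pipeline definitions from a script, the request body above can be assembled programmatically. The following Python sketch builds the same JSON body. The helper name build_split_pipeline is hypothetical, and the sketch only constructs the body; sending it to the PUT /_search/pipeline/my_pipeline endpoint is left to whichever HTTP client you use.

```python
import json


def build_split_pipeline(field, separator, target_field=None, preserve_trailing=None):
    """Build a search pipeline body with one split response processor."""
    split_config = {"field": field, "separator": separator}
    # Optional settings are included only when set, so that OpenSearch
    # applies its own defaults otherwise.
    if target_field is not None:
        split_config["target_field"] = target_field
    if preserve_trailing is not None:
        split_config["preserve_trailing"] = preserve_trailing
    return {"response_processors": [{"split": split_config}]}


pipeline_body = build_split_pipeline("message", ", ", target_field="split_message")
print(json.dumps(pipeline_body, indent=2))
```

The printed JSON matches the body of the request shown above.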

Using a search pipeline

Search for documents in my_index without a search pipeline:

  GET /my_index/_search

The response contains the field message:

Response

  {
    "took": 3,
    "timed_out": false,
    "_shards": {
      "total": 1,
      "successful": 1,
      "skipped": 0,
      "failed": 0
    },
    "hits": {
      "total": {
        "value": 1,
        "relation": "eq"
      },
      "max_score": 1,
      "hits": [
        {
          "_index": "my_index",
          "_id": "1",
          "_score": 1,
          "_source": {
            "message": "ingest, search, visualize, and analyze data",
            "visibility": "public"
          }
        }
      ]
    }
  }

To search with a pipeline, specify the pipeline name in the search_pipeline query parameter:

  GET /my_index/_search?search_pipeline=my_pipeline

The message field is split, and the results are stored in the split_message field:

Response

  {
    "took": 6,
    "timed_out": false,
    "_shards": {
      "total": 1,
      "successful": 1,
      "skipped": 0,
      "failed": 0
    },
    "hits": {
      "total": {
        "value": 1,
        "relation": "eq"
      },
      "max_score": 1,
      "hits": [
        {
          "_index": "my_index",
          "_id": "1",
          "_score": 1,
          "_source": {
            "visibility": "public",
            "message": "ingest, search, visualize, and analyze data",
            "split_message": [
              "ingest",
              "search",
              "visualize",
              "and analyze data"
            ]
          }
        }
      ]
    }
  }
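
The difference between the two responses can be modeled as a transformation applied to each hit before the response is returned. The following Python sketch illustrates that idea on a trimmed copy of the response. It is an illustration only, not OpenSearch code: the function name apply_split is hypothetical, and the sketch ignores preserve_trailing and ignore_failure.

```python
import copy
import re


def apply_split(response, field, separator, target_field=None):
    """Return a copy of a search response with `field` split in every hit."""
    result = copy.deepcopy(response)  # leave the input response untouched
    for hit in result["hits"]["hits"]:
        source = hit.get("_source", {})
        value = source.get(field)
        if isinstance(value, str):
            # Write to target_field if given; otherwise update in place.
            source[target_field or field] = re.split(separator, value)
    return result


raw_response = {
    "hits": {
        "hits": [
            {
                "_index": "my_index",
                "_id": "1",
                "_source": {
                    "message": "ingest, search, visualize, and analyze data",
                    "visibility": "public",
                },
            }
        ]
    }
}

processed = apply_split(raw_response, "message", ", ", "split_message")
print(processed["hits"]["hits"][0]["_source"]["split_message"])
# prints ['ingest', 'search', 'visualize', 'and analyze data']
```

In this sketch the input response is copied rather than modified, which mirrors the examples above: a search without the pipeline returns the document without a split_message field.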

You can also use the fields option to retrieve specific fields from a document:

  POST /my_index/_search?pretty&search_pipeline=my_pipeline
  {
    "fields": ["visibility", "message"]
  }

In the response, the message field is split, and the results are stored in the split_message field, which appears in both _source and fields:

Response

  {
    "took": 7,
    "timed_out": false,
    "_shards": {
      "total": 1,
      "successful": 1,
      "skipped": 0,
      "failed": 0
    },
    "hits": {
      "total": {
        "value": 1,
        "relation": "eq"
      },
      "max_score": 1,
      "hits": [
        {
          "_index": "my_index",
          "_id": "1",
          "_score": 1,
          "_source": {
            "visibility": "public",
            "message": "ingest, search, visualize, and analyze data",
            "split_message": [
              "ingest",
              "search",
              "visualize",
              "and analyze data"
            ]
          },
          "fields": {
            "visibility": [
              "public"
            ],
            "message": [
              "ingest, search, visualize, and analyze data"
            ],
            "split_message": [
              "ingest",
              "search",
              "visualize",
              "and analyze data"
            ]
          }
        }
      ]
    }
  }