Collapse processor

The collapse response processor discards hits that have the same value for a particular field as a previous document in the result set. This is similar to passing the collapse parameter in a search request, but the response processor is applied to the response after fetching from all shards. The collapse response processor may be used in conjunction with the rescore search request parameter or may be applied after a reranking response processor.
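
For comparison, the following is a minimal sketch of the request-level collapse parameter mentioned above. It assumes the my_index index and color field created in the example later on this page, and it assumes that color was dynamically mapped as text with a keyword subfield, so the request collapses on color.keyword (the collapse parameter accepts only keyword or numeric fields):

```json
POST /my_index/_search
{
  "size": 3,
  "collapse": {
    "field": "color.keyword"
  }
}
```

Here the collapsing is part of the search itself, whereas the response processor only sees the hits that have already been fetched and merged.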

Because the collapse response processor discards hits from a result set that already contains at most size hits, a search that uses it will likely return fewer than size results. To increase the likelihood of returning size hits, use the oversample request processor together with the truncate_hits response processor.
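
As a sketch of that combination, the following pipeline oversamples the request, collapses the larger result set, and then truncates it back to the requested size. The pipeline name oversampling_collapse_pipeline and the sample_factor value of 3 are illustrative choices, and my_index and color are assumed to be the index and field created in the example later on this page:

```json
PUT /_search/pipeline/oversampling_collapse_pipeline
{
  "request_processors": [
    {
      "oversample": {
        "sample_factor": 3
      }
    }
  ],
  "response_processors": [
    {
      "collapse": {
        "field": "color"
      }
    },
    {
      "truncate_hits": {
        "description": "Truncates back to the original size requested before oversampling."
      }
    }
  ]
}

POST /my_index/_search?search_pipeline=oversampling_collapse_pipeline
{
  "size": 3
}
```

With size set to 3 and a sample_factor of 3, the search should fetch up to 9 hits before collapsing. The truncate_hits processor then trims whatever remains back to at most 3 hits.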

Request fields

The following table lists all request fields.

| Field | Data type | Description |
| :--- | :--- | :--- |
| field | String | The field whose value will be read from each returned search hit. Only the first hit for each given field value will be returned in the search response. Required. |
| context_prefix | String | May be used to read the original_size variable from a specific scope in order to avoid collisions. Optional. |
| tag | String | The processor’s identifier. Optional. |
| description | String | A description of the processor. Optional. |
| ignore_failure | Boolean | If true, OpenSearch ignores any failure of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is false. |
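
A hypothetical pipeline that sets the optional fields alongside the required field might look like the following. The pipeline name and the tag and description values are made up for illustration:

```json
PUT /_search/pipeline/collapse_pipeline_tagged
{
  "response_processors": [
    {
      "collapse": {
        "field": "color",
        "tag": "collapse_by_color",
        "description": "Keeps only the first hit for each color value.",
        "ignore_failure": true
      }
    }
  ]
}
```

Setting ignore_failure to true here means that a failure in the collapse step would not stop any later processors in the pipeline from running.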

Example

The following example demonstrates using a search pipeline with a collapse processor.

Setup

Create many documents containing a field to use for collapsing:

    POST /_bulk
    { "create": { "_index": "my_index", "_id": 1 } }
    { "title": "document 1", "color": "blue" }
    { "create": { "_index": "my_index", "_id": 2 } }
    { "title": "document 2", "color": "blue" }
    { "create": { "_index": "my_index", "_id": 3 } }
    { "title": "document 3", "color": "red" }
    { "create": { "_index": "my_index", "_id": 4 } }
    { "title": "document 4", "color": "red" }
    { "create": { "_index": "my_index", "_id": 5 } }
    { "title": "document 5", "color": "yellow" }
    { "create": { "_index": "my_index", "_id": 6 } }
    { "title": "document 6", "color": "yellow" }
    { "create": { "_index": "my_index", "_id": 7 } }
    { "title": "document 7", "color": "orange" }
    { "create": { "_index": "my_index", "_id": 8 } }
    { "title": "document 8", "color": "orange" }
    { "create": { "_index": "my_index", "_id": 9 } }
    { "title": "document 9", "color": "green" }
    { "create": { "_index": "my_index", "_id": 10 } }
    { "title": "document 10", "color": "green" }

Create a pipeline that only collapses on the color field:

    PUT /_search/pipeline/collapse_pipeline
    {
      "response_processors": [
        {
          "collapse": {
            "field": "color"
          }
        }
      ]
    }

Using a search pipeline

In this example, you request the top three documents before collapsing on the color field. Because the first two documents have the same color, the second one is discarded, and the request returns the first and third documents:

    POST /my_index/_search?search_pipeline=collapse_pipeline
    {
      "size": 3
    }

Response

    {
      "took": 2,
      "timed_out": false,
      "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
      },
      "hits": {
        "total": {
          "value": 10,
          "relation": "eq"
        },
        "max_score": 1.0,
        "hits": [
          {
            "_index": "my_index",
            "_id": "1",
            "_score": 1.0,
            "_source": {
              "title": "document 1",
              "color": "blue"
            }
          },
          {
            "_index": "my_index",
            "_id": "3",
            "_score": 1.0,
            "_source": {
              "title": "document 3",
              "color": "red"
            }
          }
        ]
      },
      "profile": {
        "shards": []
      }
    }
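
To experiment without registering a named pipeline, the collapse processor can also be supplied inline as a temporary pipeline in the request body. The following is a minimal sketch, assuming the same my_index data and an OpenSearch version that supports the search_pipeline object in a search request:

```json
POST /my_index/_search
{
  "size": 3,
  "search_pipeline": {
    "response_processors": [
      {
        "collapse": {
          "field": "color"
        }
      }
    ]
  }
}
```

A temporary pipeline applies only to the request that contains it, so it is a convenient way to try a different field value before creating or updating a named pipeline.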