Collapse processor
Introduced 2.12
The collapse response processor discards any hit whose value for a specified field matches that of an earlier hit in the result set. This is similar to passing the collapse parameter in a search request, but the response processor is applied to the response after hits have been fetched from all shards. The collapse response processor can be used together with the rescore search request parameter or applied after a reranking response processor.
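The first-hit-per-value rule described above can be sketched locally in a few lines. This is a minimal Python simulation, not OpenSearch's implementation. The function name collapse_hits is hypothetical, and the sketch makes these assumptions:

- hits arrive already in ranked order;
- each hit is a dict shaped like a search hit, with the field read from _source;
- a hit that lacks the field is treated as having the value None.

```python
from typing import Any


def collapse_hits(hits: list[dict[str, Any]], field: str) -> list[dict[str, Any]]:
    """Keep only the first hit for each distinct value of `field`.

    Local simulation of the documented behavior, not OpenSearch source code.
    Assumptions: `hits` is already in ranked order, each hit is a dict shaped
    like a search hit, the field is read from `_source`, and a missing field
    is treated as the value None.
    """
    seen: set[Any] = set()
    kept: list[dict[str, Any]] = []
    for hit in hits:
        value = hit["_source"].get(field)  # None if the field is absent (assumption)
        if value in seen:
            continue  # an earlier hit already had this value, so discard this one
        seen.add(value)
        kept.append(hit)
    return kept


hits = [
    {"_id": "1", "_source": {"color": "blue"}},
    {"_id": "2", "_source": {"color": "blue"}},
    {"_id": "3", "_source": {"color": "red"}},
]
print([h["_id"] for h in collapse_hits(hits, "color")])  # ['1', '3']
```

Because only later duplicates are dropped, the relative order of the surviving hits is unchanged.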
Using the collapse response processor will likely result in fewer than size results being returned, because hits are discarded from a set whose size is already less than or equal to size. To increase the likelihood of returning size hits, use the oversample request processor together with a truncate_hits response processor placed after the collapse processor.
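The following sketch models why oversampling helps. It is a hypothetical simulation of the combined effect of the three processors, not their actual code. The names oversample_collapse_truncate and its sample_factor multiplier are illustrative. The ranked list of hits is assumed to be given, and rounding the oversampled size up is an assumption of this sketch. The colors reuse the ten documents from the setup in the Example section.

```python
import math
from typing import Any


def collapse_hits(hits: list[dict[str, Any]], field: str) -> list[dict[str, Any]]:
    """Keep the first hit for each distinct value of `field` (hits in ranked order)."""
    seen: set[Any] = set()
    kept = []
    for hit in hits:
        value = hit["_source"].get(field)
        if value not in seen:
            seen.add(value)
            kept.append(hit)
    return kept


def oversample_collapse_truncate(
    ranked_hits: list[dict[str, Any]], field: str, size: int, sample_factor: float
) -> list[dict[str, Any]]:
    """Hypothetical simulation of oversample -> collapse -> truncate_hits."""
    # Oversample (request side): fetch more than `size` hits.
    # Rounding up is an assumption of this sketch.
    fetched = ranked_hits[: math.ceil(size * sample_factor)]
    # Collapse (response side): may discard hits, leaving fewer than were fetched.
    collapsed = collapse_hits(fetched, field)
    # Truncate (response side): trim back to the originally requested size.
    return collapsed[:size]


colors = ["blue", "blue", "red", "red", "yellow", "yellow", "orange", "orange", "green", "green"]
ranked = [{"_id": str(i), "_source": {"color": c}} for i, c in enumerate(colors, start=1)]

for factor in (1.0, 2.0):
    ids = [h["_id"] for h in oversample_collapse_truncate(ranked, "color", size=3, sample_factor=factor)]
    print(factor, ids)
# 1.0 ['1', '3']
# 2.0 ['1', '3', '5']
```

With a factor of 1.0, collapsing leaves only two of the three requested hits. Doubling the number of fetched hits leaves enough distinct colors for truncation to return exactly three.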
Request body fields
The following table lists all request fields.
Field | Data type | Description |
---|---|---|
field | String | The field whose value will be read from each returned search hit. Only the first hit for each given field value will be returned in the search response. Required. |
context_prefix | String | May be used to read the original_size variable from a specific scope in order to avoid collisions. Optional. |
tag | String | The processor’s identifier. Optional. |
description | String | A description of the processor. Optional. |
ignore_failure | Boolean | If true, OpenSearch ignores any failure of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is false. |
Example
The following example demonstrates using a search pipeline with a collapse processor.
Setup
Create 10 documents, each containing a color field to use for collapsing:
POST /_bulk
{ "create":{"_index":"my_index","_id":1}}
{ "title" : "document 1", "color":"blue" }
{ "create":{"_index":"my_index","_id":2}}
{ "title" : "document 2", "color":"blue" }
{ "create":{"_index":"my_index","_id":3}}
{ "title" : "document 3", "color":"red" }
{ "create":{"_index":"my_index","_id":4}}
{ "title" : "document 4", "color":"red" }
{ "create":{"_index":"my_index","_id":5}}
{ "title" : "document 5", "color":"yellow" }
{ "create":{"_index":"my_index","_id":6}}
{ "title" : "document 6", "color":"yellow" }
{ "create":{"_index":"my_index","_id":7}}
{ "title" : "document 7", "color":"orange" }
{ "create":{"_index":"my_index","_id":8}}
{ "title" : "document 8", "color":"orange" }
{ "create":{"_index":"my_index","_id":9}}
{ "title" : "document 9", "color":"green" }
{ "create":{"_index":"my_index","_id":10}}
{ "title" : "document 10", "color":"green" }
Create a pipeline containing a single collapse processor that collapses on the color field:
PUT /_search/pipeline/collapse_pipeline
{
  "response_processors": [
    {
      "collapse" : {
        "field": "color"
      }
    }
  ]
}
Using a search pipeline
In this example, you request the top three documents before collapsing on the color field. Because the first two documents have the same color, the second one is discarded, and the request returns the first and third documents:
POST /my_index/_search?search_pipeline=collapse_pipeline
{
  "size": 3
}
Response
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my_index",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "title" : "document 1",
          "color" : "blue"
        }
      },
      {
        "_index" : "my_index",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "title" : "document 3",
          "color" : "red"
        }
      }
    ]
  },
  "profile" : {
    "shards" : [ ]
  }
}
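To inspect a response like the one above programmatically, the sketch below does three things. It reads the collapse field from each returned hit, checks that no value repeats, and compares the hit count with the requested size. The response string is an abbreviated copy of the one above with metadata omitted. The helper name collapsed_field_values is hypothetical.

```python
import json
from typing import Any


def collapsed_field_values(response: dict[str, Any], field: str) -> list[Any]:
    """Return the value of `field` from each hit in a search response body."""
    return [hit["_source"].get(field) for hit in response["hits"]["hits"]]


# Abbreviated copy of the response shown above (metadata fields omitted).
response_body = """
{
  "hits": {
    "total": {"value": 10, "relation": "eq"},
    "hits": [
      {"_id": "1", "_source": {"title": "document 1", "color": "blue"}},
      {"_id": "3", "_source": {"title": "document 3", "color": "red"}}
    ]
  }
}
"""

response = json.loads(response_body)
colors = collapsed_field_values(response, "color")
requested_size = 3

# After collapsing, each color appears at most once.
assert len(colors) == len(set(colors))
# Fewer hits than the requested size came back, because document 2 was discarded.
print(f"{len(colors)} of {requested_size} requested hits returned: {colors}")
```

Note that hits.total.value in the response above still reports 10. In this example, the collapse processor removes hits from the returned list but does not change the reported total.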