Truncate hits processor
Introduced 2.12
The truncate_hits
response processor discards returned search hits after a given hit count is reached. The truncate_hits
processor is designed to work with the oversample request processor but may be used on its own.
The target_size
parameter (which specifies where to truncate) is optional. If it is not specified, then OpenSearch uses the original_size
variable set by the oversample
processor (if available).
The following is a common usage pattern:
- Add the
oversample
processor to a request pipeline to fetch a larger set of results. - In the response pipeline, apply a reranking processor (which may promote results from beyond the originally requested top N) or the
collapse
processor (which may discard results after deduplication). - Apply the
truncate
processor to return (at most) the originally requested number of hits.
Request fields
The following table lists all request fields.
Field | Data type | Description |
---|---|---|
target_size | Integer | The maximum number of search hits to return (>=0). If not specified, the processor will try to read the original_size variable and will fail if it is not available. Optional. |
context_prefix | String | May be used to read the original_size variable from a specific scope in order to avoid collisions. Optional. |
tag | String | The processor’s identifier. Optional. |
description | String | A description of the processor. Optional. |
ignore_failure | Boolean | If true , OpenSearch ignores any failure of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is false . |
Example
The following example demonstrates using a search pipeline with a truncate
processor.
Setup
Create an index named my_index
containing many documents:
POST /_bulk
{ "create":{"_index":"my_index","_id":1}}
{ "doc": { "title" : "document 1" }}
{ "create":{"_index":"my_index","_id":2}}
{ "doc": { "title" : "document 2" }}
{ "create":{"_index":"my_index","_id":3}}
{ "doc": { "title" : "document 3" }}
{ "create":{"_index":"my_index","_id":4}}
{ "doc": { "title" : "document 4" }}
{ "create":{"_index":"my_index","_id":5}}
{ "doc": { "title" : "document 5" }}
{ "create":{"_index":"my_index","_id":6}}
{ "doc": { "title" : "document 6" }}
{ "create":{"_index":"my_index","_id":7}}
{ "doc": { "title" : "document 7" }}
{ "create":{"_index":"my_index","_id":8}}
{ "doc": { "title" : "document 8" }}
{ "create":{"_index":"my_index","_id":9}}
{ "doc": { "title" : "document 9" }}
{ "create":{"_index":"my_index","_id":10}}
{ "doc": { "title" : "document 10" }}
copy
Creating a search pipeline
The following request creates a search pipeline named my_pipeline
with a truncate_hits
response processor that discards hits after the first five:
PUT /_search/pipeline/my_pipeline
{
"response_processors": [
{
"truncate_hits" : {
"tag" : "truncate_1",
"description" : "This processor will discard results after the first 5.",
"target_size" : 5
}
}
]
}
copy
Using a search pipeline
Search for documents in my_index
without a search pipeline:
POST /my_index/_search
{
"size": 8
}
copy
The response contains eight hits:
Response
{
"took" : 13,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "my_index",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 1"
}
}
},
{
"_index" : "my_index",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 2"
}
}
},
{
"_index" : "my_index",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 3"
}
}
},
{
"_index" : "my_index",
"_id" : "4",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 4"
}
}
},
{
"_index" : "my_index",
"_id" : "5",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 5"
}
}
},
{
"_index" : "my_index",
"_id" : "6",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 6"
}
}
},
{
"_index" : "my_index",
"_id" : "7",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 7"
}
}
},
{
"_index" : "my_index",
"_id" : "8",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 8"
}
}
}
]
}
}
To search with a pipeline, specify the pipeline name in the search_pipeline
query parameter:
POST /my_index/_search?search_pipeline=my_pipeline
{
"size": 8
}
copy
The response contains only 5 hits, even though 8 were requested and 10 were available:
Response
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "my_index",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 1"
}
}
},
{
"_index" : "my_index",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 2"
}
}
},
{
"_index" : "my_index",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 3"
}
}
},
{
"_index" : "my_index",
"_id" : "4",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 4"
}
}
},
{
"_index" : "my_index",
"_id" : "5",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 5"
}
}
}
]
}
}
Oversample, collapse, and truncate hits
The following is a more realistic example in which you use oversample
to request many candidate documents, use collapse
to remove documents that duplicate a particular field (to get more diverse results), and then use truncate
to return the originally requested document count (to avoid returning a large result payload from the cluster).
Setup
Create many documents containing a field that you’ll use for collapsing:
POST /_bulk
{ "create":{"_index":"my_index","_id":1}}
{ "title" : "document 1", "color":"blue" }
{ "create":{"_index":"my_index","_id":2}}
{ "title" : "document 2", "color":"blue" }
{ "create":{"_index":"my_index","_id":3}}
{ "title" : "document 3", "color":"red" }
{ "create":{"_index":"my_index","_id":4}}
{ "title" : "document 4", "color":"red" }
{ "create":{"_index":"my_index","_id":5}}
{ "title" : "document 5", "color":"yellow" }
{ "create":{"_index":"my_index","_id":6}}
{ "title" : "document 6", "color":"yellow" }
{ "create":{"_index":"my_index","_id":7}}
{ "title" : "document 7", "color":"orange" }
{ "create":{"_index":"my_index","_id":8}}
{ "title" : "document 8", "color":"orange" }
{ "create":{"_index":"my_index","_id":9}}
{ "title" : "document 9", "color":"green" }
{ "create":{"_index":"my_index","_id":10}}
{ "title" : "document 10", "color":"green" }
copy
Create a pipeline that collapses only on the color
field:
PUT /_search/pipeline/collapse_pipeline
{
"response_processors": [
{
"collapse" : {
"field": "color"
}
}
]
}
copy
Create another pipeline that oversamples, collapses, and then truncates results:
PUT /_search/pipeline/oversampling_collapse_pipeline
{
"request_processors": [
{
"oversample": {
"sample_factor": 3
}
}
],
"response_processors": [
{
"collapse" : {
"field": "color"
}
},
{
"truncate_hits": {
"description": "Truncates back to the original size before oversample increased it."
}
}
]
}
copy
Collapse without oversample
In this example, you request the top three documents before collapsing on the color
field. Because the first two documents have the same color
, the second one is discarded, and the request returns the first and third documents:
POST /my_index/_search?search_pipeline=collapse_pipeline
{
"size": 3
}
copy
Response
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "my_index",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"title" : "document 1",
"color" : "blue"
}
},
{
"_index" : "my_index",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"title" : "document 3",
"color" : "red"
}
}
]
},
"profile" : {
"shards" : [ ]
}
}
Oversample, collapse, and truncate
Now you will use the oversampling_collapse_pipeline
, which requests the top 9 documents (multiplying the size by 3), deduplicates by color
, and then returns the top 3 hits:
POST /my_index/_search?search_pipeline=oversampling_collapse_pipeline
{
"size": 3
}
copy
Response
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "my_index",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"title" : "document 1",
"color" : "blue"
}
},
{
"_index" : "my_index",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"title" : "document 3",
"color" : "red"
}
},
{
"_index" : "my_index",
"_id" : "5",
"_score" : 1.0,
"_source" : {
"title" : "document 5",
"color" : "yellow"
}
}
]
},
"profile" : {
"shards" : [ ]
}
}