Oversample processor

Introduced 2.12

The oversample request processor multiplies the size parameter of the search request by a specified sample_factor (>= 1.0), saving the original value in the original_size pipeline variable. The oversample processor is designed to work with the truncate_hits response processor but may be used on its own.
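As a rough illustration of the request-side behavior described above, the following Python sketch scales `size` and stores the original value. The function name `apply_oversample` and the plain `context` dictionary are hypothetical stand-ins for the processor and the pipeline variables, not OpenSearch APIs. Rounding up is assumed from the example on this page, where 5 * 1.5 = 7.5 becomes 8.

```python
import math


def apply_oversample(request, sample_factor, context):
    """Return a copy of `request` with `size` scaled by `sample_factor`.

    `context` is a hypothetical stand-in for the pipeline variable store.
    """
    if sample_factor < 1.0:
        raise ValueError("sample_factor must be >= 1.0")
    original_size = request["size"]
    # Keep the caller's value so that a later step can read it back.
    context["original_size"] = original_size
    # Assumption: fractional results are rounded up (7.5 -> 8).
    return {**request, "size": math.ceil(original_size * sample_factor)}


context = {}
oversampled = apply_oversample({"size": 5}, 1.5, context)
print(oversampled, context)
```

The original request dictionary is left unmodified, so the caller can still compare the requested and oversampled sizes.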

Request body fields

The following table lists all request fields.

| Field | Data type | Description |
| --- | --- | --- |
| sample_factor | Float | The multiplicative factor (>= 1.0) that will be applied to the size parameter before processing the search request. Required. |
| context_prefix | String | May be used to scope the original_size variable in order to avoid collisions. Optional. |
| tag | String | The processor’s identifier. Optional. |
| description | String | A description of the processor. Optional. |
| ignore_failure | Boolean | If true, OpenSearch ignores any failure of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is false. |
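The fields above can be assembled and checked on the client side before a pipeline is created. The following Python sketch is one way to do that. The helper name `oversample_processor` is hypothetical, and the validation only mirrors the constraints listed in the table; OpenSearch performs its own validation when the pipeline is created.

```python
import json


def oversample_processor(sample_factor, context_prefix=None, tag=None,
                         description=None, ignore_failure=False):
    """Build the JSON-ready body of one oversample request processor."""
    if isinstance(sample_factor, bool) or not isinstance(sample_factor, (int, float)):
        raise TypeError("sample_factor must be a number")
    if sample_factor < 1.0:
        raise ValueError("sample_factor must be >= 1.0")
    config = {"sample_factor": float(sample_factor)}  # the only required field
    optional = {
        "context_prefix": context_prefix,
        "tag": tag,
        "description": description,
    }
    # Include optional fields only when the caller supplied them.
    for name, value in optional.items():
        if value is not None:
            config[name] = value
    if ignore_failure:  # false is the default, so it is omitted
        config["ignore_failure"] = True
    return {"oversample": config}


pipeline_body = {
    "request_processors": [oversample_processor(1.5, tag="oversample_1")]
}
print(json.dumps(pipeline_body, indent=2))
```

The printed JSON can be used as the body of a search pipeline creation request.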

Example

The following example demonstrates using a search pipeline with an oversample processor.

Setup

Create an index named my_index and add ten documents to it:

  POST /_bulk
  { "create":{"_index":"my_index","_id":1}}
  { "doc": { "title" : "document 1" }}
  { "create":{"_index":"my_index","_id":2}}
  { "doc": { "title" : "document 2" }}
  { "create":{"_index":"my_index","_id":3}}
  { "doc": { "title" : "document 3" }}
  { "create":{"_index":"my_index","_id":4}}
  { "doc": { "title" : "document 4" }}
  { "create":{"_index":"my_index","_id":5}}
  { "doc": { "title" : "document 5" }}
  { "create":{"_index":"my_index","_id":6}}
  { "doc": { "title" : "document 6" }}
  { "create":{"_index":"my_index","_id":7}}
  { "doc": { "title" : "document 7" }}
  { "create":{"_index":"my_index","_id":8}}
  { "doc": { "title" : "document 8" }}
  { "create":{"_index":"my_index","_id":9}}
  { "doc": { "title" : "document 9" }}
  { "create":{"_index":"my_index","_id":10}}
  { "doc": { "title" : "document 10" }}


Creating a search pipeline

The following request creates a search pipeline named my_pipeline with an oversample request processor that requests 50% more hits than specified in size:

  PUT /_search/pipeline/my_pipeline
  {
    "request_processors": [
      {
        "oversample" : {
          "tag" : "oversample_1",
          "description" : "This processor will multiply `size` by 1.5.",
          "sample_factor" : 1.5
        }
      }
    ]
  }


Using a search pipeline

Search for documents in my_index without a search pipeline:

  POST /my_index/_search
  {
    "size": 5
  }


The response contains five hits:

Response

  {
    "took" : 3,
    "timed_out" : false,
    "_shards" : {
      "total" : 1,
      "successful" : 1,
      "skipped" : 0,
      "failed" : 0
    },
    "hits" : {
      "total" : {
        "value" : 10,
        "relation" : "eq"
      },
      "max_score" : 1.0,
      "hits" : [
        {
          "_index" : "my_index",
          "_id" : "1",
          "_score" : 1.0,
          "_source" : {
            "doc" : {
              "title" : "document 1"
            }
          }
        },
        {
          "_index" : "my_index",
          "_id" : "2",
          "_score" : 1.0,
          "_source" : {
            "doc" : {
              "title" : "document 2"
            }
          }
        },
        {
          "_index" : "my_index",
          "_id" : "3",
          "_score" : 1.0,
          "_source" : {
            "doc" : {
              "title" : "document 3"
            }
          }
        },
        {
          "_index" : "my_index",
          "_id" : "4",
          "_score" : 1.0,
          "_source" : {
            "doc" : {
              "title" : "document 4"
            }
          }
        },
        {
          "_index" : "my_index",
          "_id" : "5",
          "_score" : 1.0,
          "_source" : {
            "doc" : {
              "title" : "document 5"
            }
          }
        }
      ]
    }
  }

To search with a pipeline, specify the pipeline name in the search_pipeline query parameter:

  POST /my_index/_search?search_pipeline=my_pipeline
  {
    "size": 5
  }


The response contains eight hits (5 * 1.5 = 7.5, rounded up to 8):

Response

  {
    "took" : 13,
    "timed_out" : false,
    "_shards" : {
      "total" : 1,
      "successful" : 1,
      "skipped" : 0,
      "failed" : 0
    },
    "hits" : {
      "total" : {
        "value" : 10,
        "relation" : "eq"
      },
      "max_score" : 1.0,
      "hits" : [
        {
          "_index" : "my_index",
          "_id" : "1",
          "_score" : 1.0,
          "_source" : {
            "doc" : {
              "title" : "document 1"
            }
          }
        },
        {
          "_index" : "my_index",
          "_id" : "2",
          "_score" : 1.0,
          "_source" : {
            "doc" : {
              "title" : "document 2"
            }
          }
        },
        {
          "_index" : "my_index",
          "_id" : "3",
          "_score" : 1.0,
          "_source" : {
            "doc" : {
              "title" : "document 3"
            }
          }
        },
        {
          "_index" : "my_index",
          "_id" : "4",
          "_score" : 1.0,
          "_source" : {
            "doc" : {
              "title" : "document 4"
            }
          }
        },
        {
          "_index" : "my_index",
          "_id" : "5",
          "_score" : 1.0,
          "_source" : {
            "doc" : {
              "title" : "document 5"
            }
          }
        },
        {
          "_index" : "my_index",
          "_id" : "6",
          "_score" : 1.0,
          "_source" : {
            "doc" : {
              "title" : "document 6"
            }
          }
        },
        {
          "_index" : "my_index",
          "_id" : "7",
          "_score" : 1.0,
          "_source" : {
            "doc" : {
              "title" : "document 7"
            }
          }
        },
        {
          "_index" : "my_index",
          "_id" : "8",
          "_score" : 1.0,
          "_source" : {
            "doc" : {
              "title" : "document 8"
            }
          }
        }
      ]
    }
  }
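
The two searches above can be reproduced locally with a small Python sketch that needs no cluster. Everything here is a hypothetical stand-in: `search` simply returns the first `size` documents in insertion order, and the optional truncation step assumes that a paired truncate_hits response processor trims the hits back to the saved original_size value.

```python
import math

# Ten stand-in documents, mirroring the IDs and titles indexed in the setup step.
documents = [{"_id": str(i), "title": f"document {i}"} for i in range(1, 11)]


def search(docs, size):
    """Stand-in for a search: return the first `size` documents."""
    return docs[:size]


def search_with_oversample(docs, size, sample_factor, truncate=False):
    """Simulate a pipeline that oversamples and optionally truncates hits."""
    context = {"original_size": size}  # value saved by the request processor
    hits = search(docs, math.ceil(size * sample_factor))
    if truncate:
        # Assumption: the response side trims hits back to original_size.
        hits = hits[: context["original_size"]]
    return hits


plain = search(documents, 5)
oversampled = search_with_oversample(documents, 5, 1.5)
truncated = search_with_oversample(documents, 5, 1.5, truncate=True)
print(len(plain), len(oversampled), len(truncated))
```

In a real pipeline, other response processors would typically run between the oversample and truncation steps, for example to rerank or filter the larger candidate set.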