Diversified sampler

The diversified_sampler aggregation lets you reduce bias in the distribution of the sample pool by deduplicating documents that contain the same value for a given field. It does so by using the field and max_docs_per_value settings, which together limit the number of documents collected on a shard for each value of the provided field. The max_docs_per_value setting is optional and specifies the maximum number of documents collected per field value. Its default value is 1.
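
The effect of field and max_docs_per_value can be illustrated with a small simulation. The following Python sketch is a simplified model of what happens on a single shard, not the engine's actual implementation: it assumes the documents arrive already ordered best-first and keeps at most max_docs_per_value documents for each distinct value. The diversify function and the shard_docs sample data are hypothetical names introduced only for this illustration.

```python
from collections import defaultdict


def diversify(docs, field, shard_size, max_docs_per_value=1):
    """Simplified, hypothetical model of diversified sampling on one shard.

    docs is assumed to be ordered best-first. Documents that lack the
    field are grouped under the value None in this sketch.
    """
    kept_per_value = defaultdict(int)
    sample = []
    for doc in docs:
        if len(sample) >= shard_size:
            break  # the per-shard cap has been reached
        value = doc.get(field)
        if kept_per_value[value] >= max_docs_per_value:
            continue  # this value is already represented often enough
        kept_per_value[value] += 1
        sample.append(doc)
    return sample


# Hypothetical documents on one shard.
shard_docs = [
    {"response": "200", "agent": "A"},
    {"response": "200", "agent": "B"},
    {"response": "404", "agent": "A"},
    {"response": "200", "agent": "C"},
    {"response": "503", "agent": "B"},
]

sample = diversify(shard_docs, "response", shard_size=1000)
print([doc["response"] for doc in sample])
```

With the default of one document per value, only one of the three "200" documents is kept in this model, so a frequent value cannot dominate the sample.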

As with the sampler aggregation, you can use the shard_size setting to control the maximum number of documents collected on any one shard, as shown in the following example:

GET opensearch_dashboards_sample_data_logs/_search
{
  "size": 0,
  "aggs": {
    "sample": {
      "diversified_sampler": {
        "shard_size": 1000,
        "field": "response.keyword"
      },
      "aggs": {
        "terms": {
          "terms": {
            "field": "agent.keyword"
          }
        }
      }
    }
  }
}
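
If you assemble the request in a script, the same body can be built as a plain dictionary. The following is a minimal sketch: the helper name diversified_sampler_query is hypothetical, and it sets max_docs_per_value explicitly even though the setting is optional. Sending the body to a cluster is left out because it depends on the client you use.

```python
import json


def diversified_sampler_query(field, terms_field, shard_size, max_docs_per_value=1):
    """Build a search body with a diversified_sampler aggregation.

    Hypothetical helper for illustration; it only constructs the body
    and does not contact a cluster.
    """
    return {
        "size": 0,  # return aggregation results only, no hits
        "aggs": {
            "sample": {
                "diversified_sampler": {
                    "shard_size": shard_size,
                    "field": field,
                    "max_docs_per_value": max_docs_per_value,
                },
                "aggs": {
                    "terms": {"terms": {"field": terms_field}},
                },
            }
        },
    }


body = diversified_sampler_query(
    "response.keyword", "agent.keyword", shard_size=1000, max_docs_per_value=2
)
print(json.dumps(body, indent=2))
```

Raising max_docs_per_value above 1, as in this usage, allows more documents per field value into the sample while still capping any single value.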

Example response

  ...
  "aggregations" : {
    "sample" : {
      "doc_count" : 3,
      "terms" : {
        "doc_count_error_upper_bound" : 0,
        "sum_other_doc_count" : 0,
        "buckets" : [
          {
            "key" : "Mozilla/5.0 (X11; Linux x86_64; rv:6.0a1) Gecko/20110421 Firefox/6.0a1",
            "doc_count" : 2
          },
          {
            "key" : "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)",
            "doc_count" : 1
          }
        ]
      }
    }
  }
}
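
To read such a response in a script, you can pull the sample size and the per-term counts out of the aggregations object. The following sketch uses a dictionary that reproduces only the aggregations portion of the response above; the elided top-level fields are omitted, and the summarize_sample helper is a hypothetical name introduced for this illustration.

```python
# Only the "aggregations" portion of the example response is reproduced.
response = {
    "aggregations": {
        "sample": {
            "doc_count": 3,
            "terms": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                    {
                        "key": "Mozilla/5.0 (X11; Linux x86_64; rv:6.0a1) Gecko/20110421 Firefox/6.0a1",
                        "doc_count": 2,
                    },
                    {
                        "key": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)",
                        "doc_count": 1,
                    },
                ],
            },
        }
    }
}


def summarize_sample(resp, sampler_name="sample", terms_name="terms"):
    """Return (sample size, {term: count}, documents covered by the buckets)."""
    sample = resp["aggregations"][sampler_name]
    terms = sample[terms_name]
    counts = {bucket["key"]: bucket["doc_count"] for bucket in terms["buckets"]}
    covered = sum(counts.values()) + terms["sum_other_doc_count"]
    return sample["doc_count"], counts, covered


total, counts, covered = summarize_sample(response)
print(total, covered)
for agent, count in counts.items():
    print(count, agent)
```

In this example the bucket counts add up to the sampled doc_count. That need not hold in general, for instance when sampled documents lack the terms field or hold several values for it.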