_doc_count field

Bucket aggregations always return a field named doc_count that shows the number of documents aggregated and partitioned into each bucket. The computation is simple: doc_count is incremented by 1 for every document collected in a bucket.

While this simple approach is effective when computing aggregations over individual documents, it fails to accurately represent documents that store pre-aggregated data (such as histogram or aggregate_metric_double fields), because one summary field may represent multiple documents.

To allow for correct computation of the number of documents when working with pre-aggregated data, we have introduced a metadata field type named _doc_count. _doc_count must always be a positive integer representing the number of documents aggregated in a single summary field.

When field _doc_count is added to a document, all bucket aggregations will respect its value and increment the bucket doc_count by the value of the field. If a document does not contain any _doc_count field, _doc_count = 1 is implied by default.

  • A _doc_count field can only store a single positive integer per document. Nested arrays are not allowed.
  • If a document contains no _doc_count fields, aggregators will increment by 1, which is the default behavior.
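The counting rule described above can be sketched in a few lines of Python. This is only an illustration of the rule, not Elasticsearch's implementation; the function name bucket_doc_counts, the service field, and the sample documents are hypothetical.

```python
from collections import Counter


def bucket_doc_counts(docs, key_field):
    """Count documents per bucket key, honoring an optional _doc_count."""
    counts = Counter()
    for doc in docs:
        # A document without _doc_count contributes 1, the default.
        counts[doc[key_field]] += doc.get("_doc_count", 1)
    return dict(counts)


# Hypothetical documents: two pre-aggregated summaries and one ordinary document.
docs = [
    {"service": "api", "_doc_count": 45},
    {"service": "api"},
    {"service": "web", "_doc_count": 62},
]

# Naive counting treats every stored document as exactly one document.
naive = dict(Counter(doc["service"] for doc in docs))
weighted = bucket_doc_counts(docs, "service")

print(naive)     # {'api': 2, 'web': 1}
print(weighted)  # {'api': 46, 'web': 62}
```

The naive count reports how many summaries were stored, while the weighted count reports how many original documents those summaries stand for.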

Example

The create index API request below creates a new index with these field mappings:

  • my_histogram, a histogram field used to store percentile data
  • my_text, a keyword field used to store a title for the histogram

    PUT my_index
    {
      "mappings" : {
        "properties" : {
          "my_histogram" : {
            "type" : "histogram"
          },
          "my_text" : {
            "type" : "keyword"
          }
        }
      }
    }

The following index API requests store pre-aggregated data for two histograms: histogram_1 and histogram_2.

    PUT my_index/_doc/1
    {
      "my_text" : "histogram_1",
      "my_histogram" : {
        "values" : [0.1, 0.2, 0.3, 0.4, 0.5],
        "counts" : [3, 7, 23, 12, 6]
      },
      "_doc_count": 45
    }

    PUT my_index/_doc/2
    {
      "my_text" : "histogram_2",
      "my_histogram" : {
        "values" : [0.1, 0.25, 0.35, 0.4, 0.45, 0.5],
        "counts" : [8, 17, 8, 7, 6, 2]
      },
      "_doc_count": 62
    }

Field _doc_count must be a positive integer storing the number of documents aggregated to produce each histogram.
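A client could check this constraint before sending a document. The helper below is a minimal sketch of such a pre-check, assuming the rule is exactly "one positive integer per document"; the name is_valid_doc_count is hypothetical, and the sketch does not describe how Elasticsearch itself reports an invalid value.

```python
def is_valid_doc_count(value):
    """Return True if value is a single positive integer."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        return False
    return value > 0


# Values taken from the example, followed by a few invalid candidates.
for candidate in (45, 62, 0, -3, 1.5, [45, 62]):
    print(candidate, is_valid_doc_count(candidate))
```

Zero, negative numbers, fractional values, and arrays all fail the check, while the example's values of 45 and 62 pass.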

If we run the following terms aggregation on my_index:

    GET /my_index/_search
    {
      "aggs" : {
        "histogram_titles" : {
          "terms" : { "field" : "my_text" }
        }
      }
    }

We will get the following response:

    {
      ...
      "aggregations" : {
        "histogram_titles" : {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 0,
          "buckets" : [
            {
              "key" : "histogram_2",
              "doc_count" : 62
            },
            {
              "key" : "histogram_1",
              "doc_count" : 45
            }
          ]
        }
      }
    }
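To see where these numbers come from, the buckets can be reproduced locally. The Python sketch below is a simulation under two assumptions: each document adds its _doc_count (or 1) to its bucket, and buckets are returned in descending doc_count order, which is the default order of the terms aggregation. The function name terms_buckets is hypothetical.

```python
from collections import Counter

# The two documents from the example. The histogram data is omitted because
# this terms aggregation only reads my_text and _doc_count.
docs = [
    {"my_text": "histogram_1", "_doc_count": 45},
    {"my_text": "histogram_2", "_doc_count": 62},
]


def terms_buckets(docs, field):
    """Simulate terms buckets whose doc_count honors _doc_count."""
    counts = Counter()
    for doc in docs:
        counts[doc[field]] += doc.get("_doc_count", 1)
    # most_common() sorts by count, highest first.
    return [{"key": key, "doc_count": n} for key, n in counts.most_common()]


print(terms_buckets(docs, "my_text"))
# [{'key': 'histogram_2', 'doc_count': 62}, {'key': 'histogram_1', 'doc_count': 45}]
```

Without the _doc_count values, both buckets would report a doc_count of 1, even though each stored document summarizes many original documents.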