Filters aggregation

A multi-bucket aggregation where each bucket contains the documents that match a query.

Example:

  PUT /logs/_bulk?refresh
  { "index" : { "_id" : 1 } }
  { "body" : "warning: page could not be rendered" }
  { "index" : { "_id" : 2 } }
  { "body" : "authentication error" }
  { "index" : { "_id" : 3 } }
  { "body" : "warning: connection timed out" }
  GET logs/_search
  {
    "size": 0,
    "aggs" : {
      "messages" : {
        "filters" : {
          "filters" : {
            "errors" :   { "match" : { "body" : "error"   }},
            "warnings" : { "match" : { "body" : "warning" }}
          }
        }
      }
    }
  }

In the above example, we analyze log messages. The aggregation builds two collections (buckets) of log messages: one for all messages containing an error, and another for all messages containing a warning.
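To make the bucketing concrete, here is a minimal Python sketch that models the named-filters request above in memory. It is an illustration under stated assumptions, not how Elasticsearch executes the aggregation. The helpers match and filters_agg are hypothetical, and match approximates the match query with case-insensitive whole-word matching, whereas Elasticsearch analyzes the body field with its configured analyzer. Each filter is checked against every document independently, so a document matching several filters would be counted in each of their buckets.

```python
# Simplified in-memory model of a filters aggregation with named filters.
# Assumption: "match" is approximated by case-insensitive whole-word matching;
# real Elasticsearch analyzes the field with its configured analyzer.
import re
from typing import Callable, Dict, List

Doc = Dict[str, str]
Predicate = Callable[[Doc], bool]


def match(field: str, term: str) -> Predicate:
    """Hypothetical helper: true when `term` occurs as a word in `field`."""
    def pred(doc: Doc) -> bool:
        words = re.findall(r"\w+", doc.get(field, "").lower())
        return term.lower() in words
    return pred


def filters_agg(docs: List[Doc], filters: Dict[str, Predicate]) -> Dict[str, Dict[str, int]]:
    """One bucket per named filter; each filter is checked against every document."""
    return {
        name: {"doc_count": sum(1 for doc in docs if pred(doc))}
        for name, pred in filters.items()
    }


# The three documents indexed by the bulk request above.
logs = [
    {"body": "warning: page could not be rendered"},
    {"body": "authentication error"},
    {"body": "warning: connection timed out"},
]

buckets = filters_agg(logs, {
    "errors": match("body", "error"),
    "warnings": match("body", "warning"),
})
print(buckets)  # {'errors': {'doc_count': 1}, 'warnings': {'doc_count': 2}}
```

The counts agree with the doc_count values in the response that follows.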

Response:

  {
    "took": 9,
    "timed_out": false,
    "_shards": ...,
    "hits": ...,
    "aggregations": {
      "messages": {
        "buckets": {
          "errors": {
            "doc_count": 1
          },
          "warnings": {
            "doc_count": 2
          }
        }
      }
    }
  }

Anonymous filters

The filters field can also be provided as an array of filters, as in the following request:

  GET logs/_search
  {
    "size": 0,
    "aggs" : {
      "messages" : {
        "filters" : {
          "filters" : [
            { "match" : { "body" : "error"   }},
            { "match" : { "body" : "warning" }}
          ]
        }
      }
    }
  }

The filtered buckets are returned in the same order as provided in the request. The response for this example would be:

  {
    "took": 4,
    "timed_out": false,
    "_shards": ...,
    "hits": ...,
    "aggregations": {
      "messages": {
        "buckets": [
          {
            "doc_count": 1
          },
          {
            "doc_count": 2
          }
        ]
      }
    }
  }
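The ordering guarantee can be illustrated with a small Python sketch: a list of filters goes in and a list of buckets comes out, position for position. This is a simplified in-memory model, not Elasticsearch's implementation. The helpers match and anonymous_filters_agg are hypothetical, and match again approximates the match query with case-insensitive whole-word matching rather than the field's analyzer.

```python
# Simplified model of anonymous filters: a list of filters in, a list of
# buckets out, in the same order. "match" is approximated by case-insensitive
# whole-word matching, not Elasticsearch's analyzer.
import re
from typing import Callable, Dict, List

Doc = Dict[str, str]
Predicate = Callable[[Doc], bool]


def match(field: str, term: str) -> Predicate:
    """Hypothetical helper: true when `term` occurs as a word in `field`."""
    def pred(doc: Doc) -> bool:
        return term.lower() in re.findall(r"\w+", doc.get(field, "").lower())
    return pred


def anonymous_filters_agg(docs: List[Doc], filters: List[Predicate]) -> List[Dict[str, int]]:
    # Iterating over the list keeps bucket i aligned with filter i.
    return [{"doc_count": sum(1 for doc in docs if pred(doc))} for pred in filters]


logs = [
    {"body": "warning: page could not be rendered"},
    {"body": "authentication error"},
    {"body": "warning: connection timed out"},
]

print(anonymous_filters_agg(logs, [match("body", "error"), match("body", "warning")]))
# [{'doc_count': 1}, {'doc_count': 2}]
print(anonymous_filters_agg(logs, [match("body", "warning"), match("body", "error")]))
# [{'doc_count': 2}, {'doc_count': 1}]
```

Because anonymous buckets carry no names, their position is the only way to tell them apart, which is why they follow the order of the request.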

Other bucket

The other_bucket parameter can be set to add a bucket to the response that contains all documents that do not match any of the given filters. It accepts the following values:

false

Does not compute the other bucket.

true

Returns the other bucket either as a bucket named _other_ (by default) if named filters are being used, or as the last bucket if anonymous filters are being used.

The other_bucket_key parameter can be used to set the key for the other bucket to a value other than the default _other_. Setting this parameter will implicitly set the other_bucket parameter to true.
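The interaction between the two parameters can be summarized in a short Python sketch. The function resolve_other_bucket is hypothetical and is not part of any Elasticsearch client. The sketch also assumes that the other bucket stays off unless it is requested, and it covers only named filters, since with anonymous filters the other bucket is simply returned last.

```python
# Sketch of how other_bucket and other_bucket_key interact for named filters.
# resolve_other_bucket is a hypothetical helper, not an Elasticsearch API.
# Assumption: the other bucket is off unless requested.
from typing import Optional, Tuple

DEFAULT_OTHER_KEY = "_other_"


def resolve_other_bucket(other_bucket: bool = False,
                         other_bucket_key: Optional[str] = None) -> Tuple[bool, str]:
    """Return (enabled, key) for the other bucket of a named-filters aggregation."""
    if other_bucket_key is not None:
        # Supplying a key implicitly turns the other bucket on.
        return True, other_bucket_key
    return other_bucket, DEFAULT_OTHER_KEY


print(resolve_other_bucket())                                   # (False, '_other_')
print(resolve_other_bucket(other_bucket=True))                  # (True, '_other_')
print(resolve_other_bucket(other_bucket_key="other_messages"))  # (True, 'other_messages')
```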

The following snippet shows a request in which the other bucket is named other_messages. It first indexes a fourth document that matches neither filter.

  PUT logs/_doc/4?refresh
  {
    "body": "info: user Bob logged out"
  }
  GET logs/_search
  {
    "size": 0,
    "aggs" : {
      "messages" : {
        "filters" : {
          "other_bucket_key": "other_messages",
          "filters" : {
            "errors" :   { "match" : { "body" : "error"   }},
            "warnings" : { "match" : { "body" : "warning" }}
          }
        }
      }
    }
  }

The response would be something like the following:

  {
    "took": 3,
    "timed_out": false,
    "_shards": ...,
    "hits": ...,
    "aggregations": {
      "messages": {
        "buckets": {
          "errors": {
            "doc_count": 1
          },
          "warnings": {
            "doc_count": 2
          },
          "other_messages": {
            "doc_count": 1
          }
        }
      }
    }
  }
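The other bucket's count can be reproduced with a simplified in-memory Python model over the four indexed documents: a document lands in the other bucket only if it matches none of the filters. As in the earlier sketches, the helpers match and filters_agg_with_other are hypothetical, and match approximates the match query with case-insensitive whole-word matching instead of Elasticsearch's analysis.

```python
# Simplified model of named filters plus an other bucket.
# "match" is approximated by case-insensitive whole-word matching; real
# Elasticsearch uses the field's analyzer.
import re
from typing import Callable, Dict, List

Doc = Dict[str, str]
Predicate = Callable[[Doc], bool]


def match(field: str, term: str) -> Predicate:
    """Hypothetical helper: true when `term` occurs as a word in `field`."""
    def pred(doc: Doc) -> bool:
        return term.lower() in re.findall(r"\w+", doc.get(field, "").lower())
    return pred


def filters_agg_with_other(docs: List[Doc], filters: Dict[str, Predicate],
                           other_key: str = "_other_") -> Dict[str, Dict[str, int]]:
    buckets = {
        name: {"doc_count": sum(1 for doc in docs if pred(doc))}
        for name, pred in filters.items()
    }
    # The other bucket counts documents that match none of the filters.
    buckets[other_key] = {
        "doc_count": sum(1 for doc in docs if not any(p(doc) for p in filters.values()))
    }
    return buckets


logs = [
    {"body": "warning: page could not be rendered"},
    {"body": "authentication error"},
    {"body": "warning: connection timed out"},
    {"body": "info: user Bob logged out"},
]

result = filters_agg_with_other(
    logs,
    {"errors": match("body", "error"), "warnings": match("body", "warning")},
    other_key="other_messages",
)
print(result)
# {'errors': {'doc_count': 1}, 'warnings': {'doc_count': 2}, 'other_messages': {'doc_count': 1}}
```

Only the "info: user Bob logged out" document fails both filters, which gives other_messages its count of 1.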