Adjacency matrix aggregation

A bucket aggregation returning a form of adjacency matrix. The request provides a collection of named filter expressions, similar to the filters aggregation request. Each bucket in the response represents a non-empty cell in the matrix of intersecting filters.

Given filters named A, B and C the response would return buckets with the following names:

       A     B     C
  A    A     A&B   A&C
  B          B     B&C
  C                C

The intersecting buckets, e.g. A&C, are labelled using a combination of the two filter names with a default separator of &. Note that the response does not also include a C&A bucket, as this would be the same set of documents as A&C. The matrix is symmetric, so we only return half of it. To do this, we sort the filter name strings and always use the lower of a pair as the value to the left of the separator.
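The naming rule can be modelled in a few lines of Python. This is an illustrative sketch only, not Elasticsearch's implementation, and the function name `matrix_keys` is hypothetical. It lists every candidate cell of the upper half of the matrix; the real response additionally drops cells that match no documents.

```python
def matrix_keys(filter_names, separator="&"):
    """Return bucket keys for the diagonal and upper half of the matrix.

    Hypothetical helper: models the naming rule only, not Elasticsearch internals.
    """
    names = sorted(filter_names)
    keys = []
    for i, name in enumerate(names):
        keys.append(name)  # diagonal cell: the filter on its own
        for other in names[i + 1:]:
            # 'name' sorts before 'other', so it goes left of the separator
            keys.append(f"{name}{separator}{other}")
    return keys


print(matrix_keys(["C", "A", "B"]))
# ['A', 'A&B', 'A&C', 'B', 'B&C', 'C']
```

Sorting first is what guarantees that a pair is only ever emitted once, as A&C and never as C&A.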

Example

The following interactions aggregation uses adjacency_matrix to determine which groups of individuals exchanged emails.

  PUT emails/_bulk?refresh
  { "index" : { "_id" : 1 } }
  { "accounts" : ["hillary", "sidney"]}
  { "index" : { "_id" : 2 } }
  { "accounts" : ["hillary", "donald"]}
  { "index" : { "_id" : 3 } }
  { "accounts" : ["vladimir", "donald"]}

  GET emails/_search
  {
    "size": 0,
    "aggs" : {
      "interactions" : {
        "adjacency_matrix" : {
          "filters" : {
            "grpA" : { "terms" : { "accounts" : ["hillary", "sidney"] }},
            "grpB" : { "terms" : { "accounts" : ["donald", "mitt"] }},
            "grpC" : { "terms" : { "accounts" : ["vladimir", "nigel"] }}
          }
        }
      }
    }
  }

The response contains buckets with document counts for each filter and combination of filters. Buckets with no matching documents are excluded from the response.

  {
    "took": 9,
    "timed_out": false,
    "_shards": ...,
    "hits": ...,
    "aggregations": {
      "interactions": {
        "buckets": [
          {
            "key":"grpA",
            "doc_count": 2
          },
          {
            "key":"grpA&grpB",
            "doc_count": 1
          },
          {
            "key":"grpB",
            "doc_count": 2
          },
          {
            "key":"grpB&grpC",
            "doc_count": 1
          },
          {
            "key":"grpC",
            "doc_count": 1
          }
        ]
      }
    }
  }
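To see where these counts come from, the sketch below recomputes them in plain Python from the three example documents. It assumes each terms filter matches a document when any of its listed values appears in accounts (ignoring text analysis), and it drops cells with a count of zero, which is why there is no grpA&grpC bucket. The names `DOCS`, `FILTERS` and `adjacency_counts` are hypothetical.

```python
# The three documents indexed by the _bulk request in the example.
DOCS = [
    {"accounts": ["hillary", "sidney"]},
    {"accounts": ["hillary", "donald"]},
    {"accounts": ["vladimir", "donald"]},
]

# Each filter is modelled as a terms filter: match if any listed account appears.
FILTERS = {
    "grpA": {"hillary", "sidney"},
    "grpB": {"donald", "mitt"},
    "grpC": {"vladimir", "nigel"},
}


def adjacency_counts(docs, filters, separator="&"):
    """Count documents per filter and per pair of filters, skipping empty cells."""
    # For each filter, the set of document positions it matches.
    matches = {
        name: {i for i, doc in enumerate(docs) if terms & set(doc["accounts"])}
        for name, terms in filters.items()
    }
    names = sorted(matches)
    counts = {}
    for i, a in enumerate(names):
        cells = [(a, matches[a])]
        cells += [(f"{a}{separator}{b}", matches[a] & matches[b]) for b in names[i + 1:]]
        for key, hits in cells:
            if hits:  # buckets with no matching documents are excluded
                counts[key] = len(hits)
    return counts


print(adjacency_counts(DOCS, FILTERS))
# {'grpA': 2, 'grpA&grpB': 1, 'grpB': 2, 'grpB&grpC': 1, 'grpC': 1}
```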

Parameters

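filters

(Required, object) Filters used to create buckets. Each key is a filter name and each value is the Query DSL object used to match documents for that filter.

separator

(Optional, string) Separator used to concatenate filter names. Defaults to &. This parameter is set alongside filters, not inside it.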
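Because bucket keys are built by joining filter names, a separator that also appears inside a filter name makes keys ambiguous. The sketch below shows, as a Python dict, a request body that sets separator next to filters inside adjacency_matrix, plus a hypothetical `split_key` helper that recovers filter names from a bucket key. The `|` separator and the reduced filter set are illustrative assumptions.

```python
# Request body for the example search with a custom separator.
# "separator" sits next to "filters", not inside it.
request_body = {
    "size": 0,
    "aggs": {
        "interactions": {
            "adjacency_matrix": {
                "separator": "|",
                "filters": {
                    "grpA": {"terms": {"accounts": ["hillary", "sidney"]}},
                    "grpB": {"terms": {"accounts": ["donald", "mitt"]}},
                },
            }
        }
    },
}


def split_key(key, filter_names, separator):
    """Hypothetical helper: return the filter names encoded in a bucket key."""
    # Refuse to split if any filter name contains the separator (ambiguous keys).
    if any(separator in name for name in filter_names):
        raise ValueError(f"separator {separator!r} appears in a filter name")
    return key.split(separator)


names = request_body["aggs"]["interactions"]["adjacency_matrix"]["filters"]
print(split_key("grpA|grpB", names, "|"))  # ['grpA', 'grpB']
```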

Response body

key

(string) Filters for the bucket. If the bucket uses multiple filters, filter names are concatenated using a separator.

doc_count

(integer) Number of documents matching the bucket’s filters.

Usage

On its own, this aggregation can provide all of the data required to create an undirected weighted graph. When used with child aggregations such as a date_histogram, however, the results can provide the additional levels of data required to perform dynamic network analysis, where examining interactions over time becomes important.
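As a sketch of the graph use case, the hypothetical `to_graph` function below turns the buckets array from the example response into node weights (single-filter buckets) and undirected weighted edges (intersection buckets). It assumes the default & separator and filter names that do not contain it.

```python
def to_graph(buckets, separator="&"):
    """Split adjacency_matrix buckets into node weights and weighted edges."""
    nodes, edges = {}, {}
    for bucket in buckets:
        parts = bucket["key"].split(separator)
        if len(parts) == 1:
            nodes[parts[0]] = bucket["doc_count"]  # diagonal cell: one filter
        else:
            # Keys already put the lower name first, so the tuple is a stable edge id.
            edges[tuple(parts)] = bucket["doc_count"]
    return nodes, edges


# Buckets copied from the example response.
buckets = [
    {"key": "grpA", "doc_count": 2},
    {"key": "grpA&grpB", "doc_count": 1},
    {"key": "grpB", "doc_count": 2},
    {"key": "grpB&grpC", "doc_count": 1},
    {"key": "grpC", "doc_count": 1},
]
nodes, edges = to_graph(buckets)
print(nodes)  # {'grpA': 2, 'grpB': 2, 'grpC': 1}
print(edges)  # {('grpA', 'grpB'): 1, ('grpB', 'grpC'): 1}
```

Each edge weight is the number of documents shared by two groups, which is the "weighted" part of the undirected weighted graph.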

Filter limits

For N filters, the matrix of buckets produced can contain roughly N²/2 buckets, so a default maximum of 100 filters is imposed. You can change this limit using the index.max_adjacency_matrix_filters index-level setting. Note that this setting is deprecated and will be replaced with indices.query.bool.max_clause_count in 8.0+.
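The bucket arithmetic behind the limit can be checked directly. With N filters there are at most N single-filter buckets plus N(N-1)/2 pairwise buckets, N(N+1)/2 in total, which approaches N²/2 as N grows. The function name `max_buckets` is hypothetical.

```python
def max_buckets(n):
    """Upper bound on adjacency_matrix buckets for n filters (diagonal + upper half)."""
    singles = n               # one bucket per filter
    pairs = n * (n - 1) // 2  # one bucket per unordered pair of filters
    return singles + pairs    # equals n * (n + 1) // 2


print(max_buckets(3))    # 6, matching the A, B and C example
print(max_buckets(100))  # 5050 at the default limit of 100 filters
```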