Bucket correlation aggregation
Bucket correlation aggregation
This functionality is in technical preview and may be changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features.
A sibling pipeline aggregation which executes a correlation function on the configured sibling multi-bucket aggregation.
Parameters
buckets_path
(Required, string) Path to the buckets that contain one set of values to correlate. For syntax, see buckets_path Syntax.
function
(Required, object) The correlation function to execute.
Properties of function
count_correlation
(Required*, object) The configuration to calculate a count correlation. This function is designed for determining the correlation of a term value and a given metric. Consequently, it needs to meet the following requirements.
- The
buckets_path
must point to a_count
metric. - The total count of all the
bucket_path
count values must be less than or equal toindicator.doc_count
. - When utilizing this function, an initial calculation to gather the required
indicator
values is required.
Properties of
count_correlation
indicator
(Required, object) The indicator with which to correlate the configured
bucket_path
values.Properties of
indicator
doc_count
(Required, integer) The total number of documents that initially created the
expectations
. It’s required to be greater than or equal to the sum of all values in thebuckets_path
as this is the originating superset of data to which the term values are correlated.expectations
(Required, array) An array of numbers with which to correlate the configured
bucket_path
values. The length of this value must always equal the number of buckets returned by thebucket_path
.fractions
(Optional, array) An array of fractions to use when averaging and calculating variance. This should be used if the pre-calculated data and the
buckets_path
have known gaps. The length offractions
, if provided, must equalexpectations
.
- The
Syntax
A bucket_correlation
aggregation looks like this in isolation:
{
"bucket_correlation": {
"buckets_path": "range_values>_count",
"function": {
"count_correlation": {
"indicator": {
"expectations": [...],
"doc_count": 10000
}
}
}
}
}
The buckets containing the values to correlate against. | |
The correlation function definition. |
Example
The following snippet correlates the individual terms in the field version
with the latency
metric. Not shown is the pre-calculation of the latency
indicator values, which was done utilizing the percentiles aggregation.
This example is only using the 10s percentiles.
POST correlate_latency/_search?size=0&filter_path=aggregations
{
"aggs": {
"buckets": {
"terms": {
"field": "version",
"size": 2
},
"aggs": {
"latency_ranges": {
"range": {
"field": "latency",
"ranges": [
{ "to": 0.0 },
{ "from": 0, "to": 105 },
{ "from": 105, "to": 225 },
{ "from": 225, "to": 445 },
{ "from": 445, "to": 665 },
{ "from": 665, "to": 885 },
{ "from": 885, "to": 1115 },
{ "from": 1115, "to": 1335 },
{ "from": 1335, "to": 1555 },
{ "from": 1555, "to": 1775 },
{ "from": 1775 }
]
}
},
"bucket_correlation": {
"bucket_correlation": {
"buckets_path": "latency_ranges>_count",
"function": {
"count_correlation": {
"indicator": {
"expectations": [0, 52.5, 165, 335, 555, 775, 1000, 1225, 1445, 1665, 1775],
"doc_count": 200
}
}
}
}
}
}
}
}
}
The term buckets containing a range aggregation and the bucket correlation aggregation. Both are utilized to calculate the correlation of the term values with the latency. | |
The range aggregation on the latency field. The ranges were created referencing the percentiles of the latency field. | |
The bucket correlation aggregation that calculates the correlation of the number of term values within each range and the previously calculated indicator values. |
And the following may be the response:
{
"aggregations" : {
"buckets" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "1.0",
"doc_count" : 100,
"latency_ranges" : {
"buckets" : [
{
"key" : "*-0.0",
"to" : 0.0,
"doc_count" : 0
},
{
"key" : "0.0-105.0",
"from" : 0.0,
"to" : 105.0,
"doc_count" : 1
},
{
"key" : "105.0-225.0",
"from" : 105.0,
"to" : 225.0,
"doc_count" : 9
},
{
"key" : "225.0-445.0",
"from" : 225.0,
"to" : 445.0,
"doc_count" : 0
},
{
"key" : "445.0-665.0",
"from" : 445.0,
"to" : 665.0,
"doc_count" : 0
},
{
"key" : "665.0-885.0",
"from" : 665.0,
"to" : 885.0,
"doc_count" : 0
},
{
"key" : "885.0-1115.0",
"from" : 885.0,
"to" : 1115.0,
"doc_count" : 10
},
{
"key" : "1115.0-1335.0",
"from" : 1115.0,
"to" : 1335.0,
"doc_count" : 20
},
{
"key" : "1335.0-1555.0",
"from" : 1335.0,
"to" : 1555.0,
"doc_count" : 20
},
{
"key" : "1555.0-1775.0",
"from" : 1555.0,
"to" : 1775.0,
"doc_count" : 20
},
{
"key" : "1775.0-*",
"from" : 1775.0,
"doc_count" : 20
}
]
},
"bucket_correlation" : {
"value" : 0.8402398981360937
}
},
{
"key" : "2.0",
"doc_count" : 100,
"latency_ranges" : {
"buckets" : [
{
"key" : "*-0.0",
"to" : 0.0,
"doc_count" : 0
},
{
"key" : "0.0-105.0",
"from" : 0.0,
"to" : 105.0,
"doc_count" : 19
},
{
"key" : "105.0-225.0",
"from" : 105.0,
"to" : 225.0,
"doc_count" : 11
},
{
"key" : "225.0-445.0",
"from" : 225.0,
"to" : 445.0,
"doc_count" : 20
},
{
"key" : "445.0-665.0",
"from" : 445.0,
"to" : 665.0,
"doc_count" : 20
},
{
"key" : "665.0-885.0",
"from" : 665.0,
"to" : 885.0,
"doc_count" : 20
},
{
"key" : "885.0-1115.0",
"from" : 885.0,
"to" : 1115.0,
"doc_count" : 10
},
{
"key" : "1115.0-1335.0",
"from" : 1115.0,
"to" : 1335.0,
"doc_count" : 0
},
{
"key" : "1335.0-1555.0",
"from" : 1335.0,
"to" : 1555.0,
"doc_count" : 0
},
{
"key" : "1555.0-1775.0",
"from" : 1555.0,
"to" : 1775.0,
"doc_count" : 0
},
{
"key" : "1775.0-*",
"from" : 1775.0,
"doc_count" : 0
}
]
},
"bucket_correlation" : {
"value" : -0.5759855613334943
}
}
]
}
}
}