Weighted average

The weighted_avg aggregation calculates the weighted average of numeric values across documents. This is useful when you want to calculate an average but weight some data points more heavily than others.

Weighted average calculation

The weighted average is calculated as (sum of value * weight) / (sum of weights).

Parameters

When using the weighted_avg aggregation, you must define the following parameters:

  • value: The field or script used to obtain the average numeric values
  • weight: The field or script used to obtain the weight for each value

Optionally, you can specify the following parameters:

  • format: A numeric format to apply to the output value
  • value_type: A type hint for the values when using scripts or unmapped fields

For the value or weight, you can specify the following parameters:

  • field: The document field to use
  • missing: A value or weight to use if the field is missing

Using the aggregation

Follow these steps to use the weighted_avg aggregation:

1. Create an index and index some documents

  1. PUT /products
  2. POST /products/_doc/1
  3. {
  4. "name": "Product A",
  5. "rating": 4,
  6. "num_reviews": 100
  7. }
  8. POST /products/_doc/2
  9. {
  10. "name": "Product B",
  11. "rating": 5,
  12. "num_reviews": 20
  13. }
  14. POST /products/_doc/3
  15. {
  16. "name": "Product C",
  17. "rating": 3,
  18. "num_reviews": 50
  19. }

copy

2. Run the weighted_avg aggregation

  1. GET /products/_search
  2. {
  3. "size": 0,
  4. "aggs": {
  5. "weighted_rating": {
  6. "weighted_avg": {
  7. "value": {
  8. "field": "rating"
  9. },
  10. "weight": {
  11. "field": "num_reviews"
  12. }
  13. }
  14. }
  15. }
  16. }

copy

Handling missing values

The missing parameter allows you to specify default values for documents missing the value field or the weight field instead of excluding them from the calculation.

The following is an example of this behavior. First, create an index and add sample documents. This example includes five documents with different combinations of missing values for the rating and num_reviews fields:

  1. PUT /products
  2. {
  3. "mappings": {
  4. "properties": {
  5. "name": {
  6. "type": "text"
  7. },
  8. "rating": {
  9. "type": "double"
  10. },
  11. "num_reviews": {
  12. "type": "integer"
  13. }
  14. }
  15. }
  16. }
  17. POST /_bulk
  18. { "index": { "_index": "products" } }
  19. { "name": "Product A", "rating": 4.5, "num_reviews": 100 }
  20. { "index": { "_index": "products" } }
  21. { "name": "Product B", "rating": 3.8, "num_reviews": 50 }
  22. { "index": { "_index": "products" } }
  23. { "name": "Product C", "rating": null, "num_reviews": 20 }
  24. { "index": { "_index": "products" } }
  25. { "name": "Product D", "rating": 4.2, "num_reviews": null }
  26. { "index": { "_index": "products" } }
  27. { "name": "Product E", "rating": null, "num_reviews": null }

copy

Next, run the following weighted_avg aggregation:

  1. GET /products/_search
  2. {
  3. "size": 0,
  4. "aggs": {
  5. "weighted_rating": {
  6. "weighted_avg": {
  7. "value": {
  8. "field": "rating"
  9. },
  10. "weight": {
  11. "field": "num_reviews"
  12. }
  13. }
  14. }
  15. }
  16. }

copy

In the response, you can see that the missing values for Product E were completely ignored in the calculation.