Geocentroid

The OpenSearch geo_centroid aggregation calculates the weighted geographic center (centroid) of a set of spatial data points. It is a metric aggregation that operates on geo_point fields and returns the centroid as a latitude-longitude pair.

Using the aggregation

Follow these steps to use the geo_centroid aggregation:

1. Create an index with a geo_point field

First, create an index with a field of type geo_point. This field stores the geographic coordinates you want to analyze. For example, the following request creates an index called restaurants with a location field of type geo_point:

    PUT /restaurants
    {
      "mappings": {
        "properties": {
          "name": {
            "type": "text"
          },
          "location": {
            "type": "geo_point"
          }
        }
      }
    }

2. Index documents with spatial data

Next, index the documents that contain the spatial data points you want to analyze. Each document must include the geo_point field with its coordinates; in the string format used here, latitude comes first, followed by longitude. For example, use the following request:

    POST /restaurants/_bulk?refresh
    {"index": {"_id": 1}}
    {"name": "Cafe Delish", "location": "40.7128, -74.0059"}
    {"index": {"_id": 2}}
    {"name": "Tasty Bites", "location": "51.5074, -0.1278"}
    {"index": {"_id": 3}}
    {"name": "Sushi Palace", "location": "48.8566, 2.3522"}
    {"index": {"_id": 4}}
    {"name": "Burger Joint", "location": "34.0522, -118.2437"}
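
The bulk body is newline-delimited JSON: one action line followed by one source line per document, ending with a newline. When the documents come from application code, a small helper can assemble that body. The following Python sketch is illustrative only: `build_bulk_body` and `RESTAURANTS` are hypothetical names, and sending the body to the cluster is left to whichever HTTP client you use.

```python
import json

# Sample documents copied from the bulk request above.
RESTAURANTS = [
    {"name": "Cafe Delish", "location": "40.7128, -74.0059"},
    {"name": "Tasty Bites", "location": "51.5074, -0.1278"},
    {"name": "Sushi Palace", "location": "48.8566, 2.3522"},
    {"name": "Burger Joint", "location": "34.0522, -118.2437"},
]


def build_bulk_body(docs, first_id=1):
    """Return an NDJSON string with an action line and a source line per document.

    Hypothetical helper: document IDs are assigned sequentially from first_id.
    """
    lines = []
    for doc_id, doc in enumerate(docs, start=first_id):
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        lines.append(json.dumps(doc))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body(RESTAURANTS)
print(body)
```

The resulting string can be sent as the body of the `POST /restaurants/_bulk?refresh` request shown above.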

3. Run the geo_centroid aggregation

To calculate the centroid location across all documents, run a search with the geo_centroid aggregation on the geo_point field. For example, use the following request:

    GET /restaurants/_search
    {
      "size": 0,
      "aggs": {
        "centroid": {
          "geo_centroid": {
            "field": "location"
          }
        }
      }
    }

The response includes a centroid object. Its location property holds the lat and lon of the weighted centroid of all indexed data points, and its count property holds the number of points used, as shown in the following example:

    "aggregations": {
      "centroid": {
        "location": {
          "lat": 43.78224998130463,
          "lon": -47.506300045643
        },
        "count": 4
      }
    }
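
The values in this response are consistent with a plain arithmetic mean of the four indexed latitudes and longitudes. The following Python sketch reproduces that arithmetic on the client side as a sanity check. It is not OpenSearch's implementation, and `mean_centroid` is a hypothetical helper name. The last digits differ slightly from the response, presumably because of how coordinates are encoded in the index.

```python
# The four sample points indexed in step 2, as (lat, lon) tuples.
POINTS = [
    (40.7128, -74.0059),
    (51.5074, -0.1278),
    (48.8566, 2.3522),
    (34.0522, -118.2437),
]


def mean_centroid(points):
    """Average latitudes and longitudes independently.

    Assumption: a simple unweighted mean in degrees, which ignores
    wrap-around at the antimeridian (+/-180 degrees longitude).
    """
    if not points:
        raise ValueError("at least one point is required")
    lat = sum(p[0] for p in points) / len(points)
    lon = sum(p[1] for p in points) / len(points)
    return lat, lon


lat, lon = mean_centroid(POINTS)
print(f"lat={lat:.3f}, lon={lon:.3f}")
```

Comparing the printed values with the lat and lon in the response is a quick way to confirm that all four documents were indexed as expected.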

4. Nest under other aggregations (optional)

You can also nest the geo_centroid aggregation under bucket aggregations, such as terms, to calculate the centroid for subsets of your data. For example, assuming your documents also contain a city field with a keyword subfield (the sample documents in step 2 do not include one), the following request finds the centroid location for each city:

    GET /restaurants/_search
    {
      "size": 0,
      "aggs": {
        "cities": {
          "terms": {
            "field": "city.keyword"
          },
          "aggs": {
            "centroid": {
              "geo_centroid": {
                "field": "location"
              }
            }
          }
        }
      }
    }

This returns a centroid location for each city bucket, allowing you to analyze the geographic center of data points in different cities.
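
When requests are generated from code, the flat and nested forms differ only in whether the geo_centroid aggregation is wrapped in a bucket aggregation. The following Python sketch builds both request bodies as dictionaries. The function name `centroid_request` and its parameters are hypothetical, and the `city.keyword` field is assumed to exist in your index.

```python
import json


def centroid_request(geo_field, terms_field=None):
    """Build a search body with a geo_centroid aggregation.

    If terms_field is given, the centroid is nested under a terms
    aggregation so that one centroid is returned per bucket.
    """
    centroid = {"centroid": {"geo_centroid": {"field": geo_field}}}
    if terms_field is None:
        aggs = centroid
    else:
        aggs = {"cities": {"terms": {"field": terms_field}, "aggs": centroid}}
    # "size": 0 suppresses search hits so that only aggregations are returned.
    return {"size": 0, "aggs": aggs}


flat_body = centroid_request("location")
nested_body = centroid_request("location", terms_field="city.keyword")
print(json.dumps(nested_body, indent=2))
```

The printed JSON matches the body of the nested request above and can be sent to `GET /restaurants/_search` with any HTTP client.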

Using geo_centroid with the geohash_grid aggregation

The geohash_grid aggregation partitions geospatial data into buckets based on geohash prefixes.

When a document contains multiple geopoint values in a field, the geohash_grid aggregation assigns the document to every bucket that contains at least one of its geopoints, even if the document's other geopoints lie outside that bucket's boundaries. This differs from how a single geopoint is treated: it is counted only in the bucket whose boundaries contain it.

When you nest the geo_centroid aggregation under the geohash_grid aggregation, each centroid is calculated using all geopoints in a bucket, including those that may be outside the bucket boundaries. This can result in centroid locations that fall outside the geographic area represented by the bucket.
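
To make this behavior concrete, the following Python sketch emulates it on the client side for the two multi-valued sample documents used in the example in this section. It is an illustration under stated assumptions, not OpenSearch's implementation: `geohash` is a plain reimplementation of standard geohash encoding, `grid_centroids` is a hypothetical helper, and the centroid is taken to be the simple mean of the coordinates.

```python
from collections import defaultdict

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

# The two multi-valued sample documents, as lists of (lat, lon) tuples.
DOCS = {
    "Point A": [(40.7128, -74.0059), (51.5074, -0.1278)],
    "Point B": [(48.8566, 2.3522), (34.0522, -118.2437)],
}


def geohash(lat, lon, precision):
    """Encode a point as a geohash string of the given length."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars, value, bits = [], 0, 0
    use_lon = True  # geohash bits alternate, starting with longitude
    while len(chars) < precision:
        rng, coord = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid  # keep the upper half of the interval
        else:
            value <<= 1
            rng[1] = mid  # keep the lower half of the interval
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[value])
            value, bits = 0, 0
    return "".join(chars)


def grid_centroids(docs, precision):
    """Map each geohash bucket to (centroid_lat, centroid_lon, point_count)."""
    bucket_points = defaultdict(list)
    for points in docs.values():
        # A document joins every bucket that contains at least one of its points...
        for key in {geohash(lat, lon, precision) for lat, lon in points}:
            # ...and contributes all of its points to each of those buckets.
            bucket_points[key].extend(points)
    result = {}
    for key, pts in bucket_points.items():
        lat = sum(p[0] for p in pts) / len(pts)
        lon = sum(p[1] for p in pts) / len(pts)
        result[key] = (lat, lon, len(pts))
    return result


centroids = grid_centroids(DOCS, precision=3)
for key, (lat, lon, count) in sorted(centroids.items()):
    print(key, round(lat, 4), round(lon, 4), count)
```

Each document lands in two buckets and contributes both of its points to each, so the two buckets that share a document also share a centroid, and that centroid lies between the two cities rather than inside either geohash cell.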

Example

In this example, the geohash_grid aggregation with a precision of 3 creates buckets based on geohash prefixes of length 3. Because each document has multiple geopoints, a document may be assigned to multiple buckets, even if some of its geopoints fall outside a given bucket's boundaries.

The geo_centroid subaggregation calculates the centroid for each bucket using all geopoints assigned to that bucket, including those outside the bucket boundaries. This means that the resulting centroid locations may not necessarily lie within the geographic area represented by the corresponding geohash bucket.

First, create an index and index documents containing multiple geopoints:

    PUT /locations
    {
      "mappings": {
        "properties": {
          "name": {
            "type": "text"
          },
          "coordinates": {
            "type": "geo_point"
          }
        }
      }
    }

    POST /locations/_bulk?refresh
    {"index": {"_id": 1}}
    {"name": "Point A", "coordinates": ["40.7128, -74.0059", "51.5074, -0.1278"]}
    {"index": {"_id": 2}}
    {"name": "Point B", "coordinates": ["48.8566, 2.3522", "34.0522, -118.2437"]}

Then, run geohash_grid with the geo_centroid subaggregation:

    GET /locations/_search
    {
      "size": 0,
      "aggs": {
        "grid": {
          "geohash_grid": {
            "field": "coordinates",
            "precision": 3
          },
          "aggs": {
            "centroid": {
              "geo_centroid": {
                "field": "coordinates"
              }
            }
          }
        }
      }
    }

Response

    {
      "took": 26,
      "timed_out": false,
      "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
      },
      "hits": {
        "total": {
          "value": 2,
          "relation": "eq"
        },
        "max_score": null,
        "hits": []
      },
      "aggregations": {
        "grid": {
          "buckets": [
            {
              "key": "u09",
              "doc_count": 1,
              "centroid": {
                "location": {
                  "lat": 41.45439997315407,
                  "lon": -57.945750039070845
                },
                "count": 2
              }
            },
            {
              "key": "gcp",
              "doc_count": 1,
              "centroid": {
                "location": {
                  "lat": 46.11009998945519,
                  "lon": -37.06685005221516
                },
                "count": 2
              }
            },
            {
              "key": "dr5",
              "doc_count": 1,
              "centroid": {
                "location": {
                  "lat": 46.11009998945519,
                  "lon": -37.06685005221516
                },
                "count": 2
              }
            },
            {
              "key": "9q5",
              "doc_count": 1,
              "centroid": {
                "location": {
                  "lat": 41.45439997315407,
                  "lon": -57.945750039070845
                },
                "count": 2
              }
            }
          ]
        }
      }
    }
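
In this response, each bucket has a doc_count of 1 but a centroid count of 2, and the buckets pair up with identical centroids. The following Python sketch shows one way to pull the per-bucket centroids out of such a response and group the bucket keys that share a centroid. The helper names `centroids_by_bucket` and `keys_sharing_centroid` are hypothetical, and `RESPONSE` is a trimmed copy of the aggregations section above.

```python
# Trimmed copy of the "aggregations" section of the response above.
RESPONSE = {
    "aggregations": {
        "grid": {
            "buckets": [
                {"key": "u09", "doc_count": 1, "centroid": {
                    "location": {"lat": 41.45439997315407, "lon": -57.945750039070845},
                    "count": 2}},
                {"key": "gcp", "doc_count": 1, "centroid": {
                    "location": {"lat": 46.11009998945519, "lon": -37.06685005221516},
                    "count": 2}},
                {"key": "dr5", "doc_count": 1, "centroid": {
                    "location": {"lat": 46.11009998945519, "lon": -37.06685005221516},
                    "count": 2}},
                {"key": "9q5", "doc_count": 1, "centroid": {
                    "location": {"lat": 41.45439997315407, "lon": -57.945750039070845},
                    "count": 2}},
            ]
        }
    }
}


def centroids_by_bucket(response, grid_name="grid", centroid_name="centroid"):
    """Return {bucket_key: (lat, lon, count)} from a parsed search response."""
    result = {}
    for bucket in response["aggregations"][grid_name]["buckets"]:
        centroid = bucket[centroid_name]
        location = centroid["location"]
        result[bucket["key"]] = (location["lat"], location["lon"], centroid["count"])
    return result


def keys_sharing_centroid(centroids):
    """Group bucket keys whose centroid coordinates are exactly equal."""
    groups = {}
    for key, (lat, lon, _count) in centroids.items():
        groups.setdefault((lat, lon), []).append(key)
    return sorted(sorted(keys) for keys in groups.values())


centroids = centroids_by_bucket(RESPONSE)
pairs = keys_sharing_centroid(centroids)
print(pairs)
```

Buckets that are grouped together here were populated by the same multi-valued document, which is why their centroids fall outside the geohash cells named by their keys.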
