Star-tree field type

This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, join the discussion on the OpenSearch forum.

A star-tree index precomputes aggregations, accelerating the performance of aggregation queries. If a star-tree index is configured as part of an index mapping, the star-tree index is created and maintained as data is ingested in real time.

OpenSearch will automatically use the star-tree index to optimize aggregations if the queried fields are part of star-tree index dimension fields and the aggregations are on star-tree index metric fields. No changes are required in the query syntax or the request parameters.

For more information, see Star-tree index.

Prerequisites

To use a star-tree index, follow the instructions in Enabling a star-tree index.

Examples

The following examples show how to use a star-tree index.

Star-tree index mappings

Define star-tree index mappings in the composite section in mappings.

The following example API request creates a corresponding star-tree index named request_aggs. To compute metric aggregations for the request_size and latency fields with queries on the port and status fields, configure the following mappings:

  PUT logs
  {
    "settings": {
      "index.number_of_shards": 1,
      "index.number_of_replicas": 0,
      "index.composite_index": true
    },
    "mappings": {
      "composite": {
        "request_aggs": {
          "type": "star_tree",
          "config": {
            "max_leaf_docs": 10000,
            "skip_star_node_creation_for_dimensions": [
              "port"
            ],
            "ordered_dimensions": [
              {
                "name": "status"
              },
              {
                "name": "port"
              }
            ],
            "metrics": [
              {
                "name": "request_size",
                "stats": [
                  "sum",
                  "value_count",
                  "min",
                  "max"
                ]
              },
              {
                "name": "latency",
                "stats": [
                  "sum",
                  "value_count",
                  "min",
                  "max"
                ]
              }
            ]
          }
        }
      },
      "properties": {
        "status": {
          "type": "integer"
        },
        "port": {
          "type": "integer"
        },
        "request_size": {
          "type": "integer"
        },
        "latency": {
          "type": "scaled_float",
          "scaling_factor": 10
        }
      }
    }
  }
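
As a rough illustration of the kind of request this mapping is meant to accelerate, the following Python sketch builds a search body that filters on the status dimension and sums the request_size metric. The helper build_search_body and the generated aggregation name are hypothetical and used only for illustration. Whether a given request is actually served from the star-tree index depends on the supported queries and aggregations in your OpenSearch version.

```python
import json


def build_search_body(dimension, value, metric_field, stat="sum"):
    """Build a search body with a term filter on a dimension field
    and a single metric aggregation on a metric field."""
    return {
        "size": 0,  # return aggregation results only, no hits
        "query": {"term": {dimension: value}},
        "aggs": {f"{stat}_{metric_field}": {stat: {"field": metric_field}}},
    }


# Filter on a dimension (status) and aggregate a metric (request_size).
body = build_search_body("status", 200, "request_size")
print("POST logs/_search")
print(json.dumps(body, indent=2))
```

The request uses ordinary search syntax. Nothing in the body refers to the star-tree index itself.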

Star-tree index configuration options

You can customize your star-tree implementation using the following config options in the mappings section. These options cannot be modified without reindexing.

Parameter | Description
ordered_dimensions | A list of fields based on which metrics will be aggregated in a star-tree index. Required.
metrics | A list of metric fields required in order to perform aggregations. Required.
max_leaf_docs | The maximum number of star-tree documents that a leaf node can point to. After the maximum number of documents is reached, child nodes will be created based on the unique value of the next field in ordered_dimensions (if any). Default is 10000. A lower value will use more storage but result in faster query performance. Conversely, a higher value will use less storage but result in slower query performance. For more information, see Star-tree indexing structure.
skip_star_node_creation_for_dimensions | A list of dimensions for which a star-tree index will skip star node creation. Skipping star node creation for a dimension reduces storage size at the expense of query performance. By default, star node creation is not skipped for any dimension. For more information about star nodes, see Star-tree indexing structure.

Ordered dimensions

The ordered_dimensions parameter contains fields based on which metrics will be aggregated in a star-tree index. The star-tree index will be selected for querying only if all the fields in the query are part of the ordered_dimensions.

When using the ordered_dimensions parameter, follow these best practices:

  • The order of dimensions matters. Define the dimensions in order from the highest cardinality to the lowest cardinality for efficient storage and query pruning.
  • Avoid using high-cardinality fields as dimensions. High-cardinality fields adversely affect storage space, indexing throughput, and query performance.
  • Currently, fields supported by the ordered_dimensions parameter are all numeric field types, with the exception of unsigned_long. For more information, see GitHub issue #15231.
  • Support for other field types, such as keyword and ip, will be added in future versions. For more information, see GitHub issue #16232.
  • A minimum of 2 and a maximum of 10 dimensions are supported per star-tree index.
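
The ordering and count rules above can be sketched as a small Python helper. This is a minimal sketch, not part of OpenSearch: the function build_ordered_dimensions and the cardinality estimates are hypothetical, and you would need to obtain real cardinality figures from your own data.

```python
MIN_DIMENSIONS = 2
MAX_DIMENSIONS = 10


def build_ordered_dimensions(cardinalities):
    """Return an ordered_dimensions list sorted from the highest to the
    lowest estimated cardinality, enforcing the 2 to 10 dimension limit."""
    count = len(cardinalities)
    if not MIN_DIMENSIONS <= count <= MAX_DIMENSIONS:
        raise ValueError(
            f"expected {MIN_DIMENSIONS} to {MAX_DIMENSIONS} dimensions, got {count}"
        )
    ranked = sorted(cardinalities.items(), key=lambda item: item[1], reverse=True)
    return [{"name": name} for name, _ in ranked]


# Hypothetical estimates: many distinct status codes, few distinct ports.
estimated_cardinalities = {"port": 8, "status": 40}
ordered = build_ordered_dimensions(estimated_cardinalities)
print(ordered)
```

With these assumed estimates, status is listed before port, which matches the order used in the example mapping.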

The ordered_dimensions parameter supports the following property.

Parameter | Required/Optional | Description
name | Required | The name of the field. The field name should be present in the properties section as part of the index mapping. Ensure that the doc_values setting is enabled for any associated fields.

Metrics

Configure any metric fields on which you need to perform aggregations. Metrics are required as part of a star-tree index configuration.

When using metrics, follow these best practices:

  • Currently, fields supported by metrics are all numeric field types, with the exception of unsigned_long. For more information, see GitHub issue #15231.
  • Supported metric aggregations include Min, Max, Sum, Avg, and Value_count.
    • Avg is a derived metric based on Sum and Value_count. It is not indexed but is instead derived when a query is run. The remaining base metrics are indexed.
  • A maximum of 100 base metrics are supported per star-tree index.
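
To illustrate why Avg does not need to be stored, the following Python sketch derives an average from precomputed Sum and Value_count values. The function derive_avg and the sample numbers are hypothetical. The sketch shows only the arithmetic, not how OpenSearch implements it internally.

```python
def derive_avg(sum_value, value_count):
    """Derive an average from precomputed sum and value_count.
    Returns None when no values were counted."""
    if value_count == 0:
        return None
    return sum_value / value_count


# Hypothetical precomputed base metrics for one field.
precomputed = {"sum": 4500.0, "value_count": 3}
avg = derive_avg(precomputed["sum"], precomputed["value_count"])
print(avg)  # 1500.0
```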

If Min, Max, Sum, and Value_count are defined as metrics for each field, then up to 25 such fields can be configured, as shown in the following example:

  {
    "metrics": [
      {
        "name": "field1",
        "stats": [
          "sum",
          "value_count",
          "min",
          "max"
        ]
      },
      ...,
      ...,
      {
        "name": "field25",
        "stats": [
          "sum",
          "value_count",
          "min",
          "max"
        ]
      }
    ]
  }
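
The arithmetic behind the 25-field figure (4 base metrics × 25 fields = 100) can be checked with a short Python sketch. The helper count_base_metrics is hypothetical. It assumes that omitted stats fall back to Sum and Value_count, and that Avg, being derived, does not count toward the limit.

```python
MAX_BASE_METRICS = 100
DEFAULT_STATS = ("sum", "value_count")  # used when "stats" is omitted


def count_base_metrics(metrics):
    """Count base metrics across all metric fields.
    Assumption: avg is derived and is not counted as a base metric."""
    total = 0
    for metric in metrics:
        stats = metric.get("stats", DEFAULT_STATS)
        total += len({stat for stat in stats if stat != "avg"})
    return total


# 25 fields, each with the four base metrics.
metrics = [
    {"name": f"field{i}", "stats": ["sum", "value_count", "min", "max"]}
    for i in range(1, 26)
]
total = count_base_metrics(metrics)
print(total, total <= MAX_BASE_METRICS)  # 100 True
```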

Properties

The metrics parameter supports the following properties.

Parameter | Required/Optional | Description
name | Required | The name of the field. The field name should be present in the properties section as part of the index mapping. Ensure that the doc_values setting is enabled for any associated fields.
stats | Optional | A list of metric aggregations computed for each field. You can choose between Min, Max, Sum, Avg, and Value_count. Default is Sum and Value_count. Avg is a derived metric statistic that will automatically be supported in queries if Sum and Value_count are present as part of metric stats.

Supported queries and aggregations

For more information about supported queries and aggregations, see Supported queries and aggregations for a star-tree index.