Post-aggregations

Post-aggregations - 图1info

Apache Druid supports two query languages: Druid SQL and native queries. This document describes the native language. For information about functions available in SQL, refer to the SQL documentation.

Post-aggregations are specifications of processing that should happen on aggregated values as they come out of Apache Druid. If you include a post aggregation as part of a query, make sure to include all aggregators the post-aggregator requires.

There are several post-aggregators available.

Arithmetic post-aggregator

The arithmetic post-aggregator applies the provided function to the given fields from left to right. The fields can be aggregators or other post aggregators.

PropertyDescriptionRequired
typeMust be “arithmetic”.Yes
nameOutput name of the post-aggregationYes
fnSupported functions are +, -, *, /, pow and quotientYes
fieldsList of post-aggregator specs which define inputs to the fnYes
orderingIf no ordering (or null) is specified, the default floating point ordering is used. numericFirst ordering always returns finite values first, followed by NaN, and infinite values last.No

Note:

  • / division always returns 0 if dividing by0, regardless of the numerator.
  • quotient division behaves like regular floating point division
  • Arithmetic post-aggregators always use floating point arithmetic.

Example:

  1. {
  2. "type" : "arithmetic",
  3. "name" : "mult",
  4. "fn" : "*",
  5. "fields": [
  6. {"type": "fieldAccess", "fieldName": "someAgg"},
  7. {"type": "fieldAccess", "fieldName": "someOtherAgg"}
  8. ]
  9. }

Field accessor post-aggregators

These post-aggregators return the value produced by the specified dimension or aggregator.

PropertyDescriptionRequired
typeMust be “fieldAccess” or “finalizingFieldAccess”. Use type “fieldAccess” to return the raw aggregation object, or use type “finalizingFieldAccess” to return a finalized value, such as an estimated cardinality.Yes
nameOutput name of the post-aggregationYes if defined as a standalone post-aggregation, but may be omitted if used inline to some other post-aggregator in a fields list
fieldNameThe output name of the dimension or aggregator to referenceYes

Example:

  1. { "type" : "fieldAccess", "name": "someField", "fieldName" : "someAggregator" }

or

  1. { "type" : "finalizingFieldAccess", "name": "someFinalizedField", "fieldName" : "someAggregator" }

Constant post-aggregator

The constant post-aggregator always returns the specified value.

PropertyDescriptionRequired
typeMust be “constant”Yes
nameOutput name of the post-aggregationYes
valueThe constant valueYes

Example:

  1. { "type" : "constant", "name" : "someConstant", "value" : 1234 }

Expression post-aggregator

The expression post-aggregator is defined using a Druid expression.

PropertyDescriptionRequired
typeMust be “expression”Yes
nameOutput name of the post-aggregationYes
expressionNative Druid expression to compute, may refer to any dimension or aggregator output namesYes
orderingIf no ordering (or null) is specified, the “natural” ordering is used. numericFirst ordering always returns finite values first, followed by NaN, and infinite values last. If the expression produces array or complex types, specify ordering as null and use outputType instead to use the correct type native ordering.No
outputTypeOutput type is optional, and can be any native Druid type: LONG, FLOAT, DOUBLE, STRING, ARRAY types (e.g. ARRAY<LONG>), or COMPLEX types (e.g. COMPLEX<json>). If not specified, the output type will be inferred from the expression. If specified and ordering is null, the type native ordering will be used for sorting values. If the expression produces array or complex types, this value must be non-null to ensure the correct ordering is used. If outputType does not match the actual output type of the expression, the value will be attempted to coerced to the specified type, possibly failing if coercion is not possible.No

Example:

  1. {
  2. "type": "expression",
  3. "name": "someExpression",
  4. "expression": "someAgg + someOtherAgg",
  5. "ordering": null,
  6. "outputType": "LONG"
  7. }

Greatest / Least post-aggregators

doubleGreatest and longGreatest computes the maximum of all fields and Double.NEGATIVE_INFINITY. doubleLeast and longLeast computes the minimum of all fields and Double.POSITIVE_INFINITY.

PropertyDescriptionRequired
typeMust be “doubleGreatest”, “doubleLeast”, “longGreatest”, or “longLeast”.Yes
nameOutput name of the post-aggregationYes
fieldsList of post-aggregator specs which define inputs to the greatest or least functionYes

The difference between the doubleMax aggregator and the doubleGreatest post-aggregator is that doubleMax returns the highest value of all rows for one specific column while doubleGreatest returns the highest value of multiple columns in one row. These are similar to the SQL MAX and GREATEST functions.

Example:

  1. {
  2. "type" : "doubleGreatest",
  3. "name" : "theGreatest",
  4. "fields": [
  5. { "type": "fieldAccess", "fieldName": "someAgg" },
  6. { "type": "fieldAccess", "fieldName": "someOtherAgg" }
  7. ]
  8. }

JavaScript post-aggregator

Applies the provided JavaScript function to the given fields. Fields are passed as arguments to the JavaScript function in the given order.

PropertyDescriptionRequired
typeMust be “javascript”Yes
nameOutput name of the post-aggregationYes
fieldNamesList of input dimension or aggregator output namesYes
functionString javascript function which accepts fieldNames as argumentsYes

Example:

  1. {
  2. "type": "javascript",
  3. "name": "someJavascript",
  4. "fieldNames" : ["someAgg", "someOtherAgg"],
  5. "function": "function(someAgg, someOtherAgg) { return 100 * Math.abs(someAgg) / someOtherAgg;"
  6. }

Post-aggregations - 图2info

JavaScript-based functionality is disabled by default. Please refer to the Druid JavaScript programming guide for guidelines about using Druid’s JavaScript functionality, including instructions on how to enable it.

HyperUnique Cardinality post-aggregator

The hyperUniqueCardinality post aggregator is used to wrap a hyperUnique object such that it can be used in post aggregations.

PropertyDescriptionRequired
typeMust be “hyperUniqueCardinality”Yes
nameOutput name of the post-aggregationYes
fieldNameThe output name of a hyperUnique aggregatorYes
  1. {
  2. "type" : "hyperUniqueCardinality",
  3. "name": "someCardinality",
  4. "fieldName" : "someHyperunique"
  5. }

It can be used in a sample calculation as so:

  1. {
  2. ...
  3. "aggregations" : [{
  4. {"type" : "count", "name" : "rows"},
  5. {"type" : "hyperUnique", "name" : "unique_users", "fieldName" : "uniques"}
  6. }],
  7. "postAggregations" : [{
  8. "type" : "arithmetic",
  9. "name" : "average_users_per_row",
  10. "fn" : "/",
  11. "fields" : [
  12. { "type" : "hyperUniqueCardinality", "fieldName" : "unique_users" },
  13. { "type" : "fieldAccess", "name" : "rows", "fieldName" : "rows" }
  14. ]
  15. }]
  16. ...

This post-aggregator will inherit the rounding behavior of the aggregator it references. Note that this inheritance is only effective if you directly reference an aggregator. Going through another post-aggregator, for example, will cause the user-specified rounding behavior to get lost and default to “no rounding”.

Example Usage

In this example, let’s calculate a simple percentage using post aggregators. Let’s imagine our data set has a metric called “total”.

The format of the query JSON is as follows:

  1. {
  2. ...
  3. "aggregations" : [
  4. { "type" : "count", "name" : "rows" },
  5. { "type" : "doubleSum", "name" : "tot", "fieldName" : "total" }
  6. ],
  7. "postAggregations" : [{
  8. "type" : "arithmetic",
  9. "name" : "average",
  10. "fn" : "/",
  11. "fields" : [
  12. { "type" : "fieldAccess", "name" : "tot", "fieldName" : "tot" },
  13. { "type" : "fieldAccess", "name" : "rows", "fieldName" : "rows" }
  14. ]
  15. }]
  16. ...
  17. }
  1. {
  2. ...
  3. "aggregations" : [
  4. { "type" : "doubleSum", "name" : "tot", "fieldName" : "total" },
  5. { "type" : "doubleSum", "name" : "part", "fieldName" : "part" }
  6. ],
  7. "postAggregations" : [{
  8. "type" : "arithmetic",
  9. "name" : "part_percentage",
  10. "fn" : "*",
  11. "fields" : [
  12. { "type" : "arithmetic",
  13. "name" : "ratio",
  14. "fn" : "/",
  15. "fields" : [
  16. { "type" : "fieldAccess", "name" : "part", "fieldName" : "part" },
  17. { "type" : "fieldAccess", "name" : "tot", "fieldName" : "tot" }
  18. ]
  19. },
  20. { "type" : "constant", "name": "const", "value" : 100 }
  21. ]
  22. }]
  23. ...
  24. }

The same could be computed using an expression post-aggregator:

  1. {
  2. ...
  3. "aggregations" : [
  4. { "type" : "doubleSum", "name" : "tot", "fieldName" : "total" },
  5. { "type" : "doubleSum", "name" : "part", "fieldName" : "part" }
  6. ],
  7. "postAggregations" : [{
  8. "type" : "expression",
  9. "name" : "part_percentage",
  10. "expression" : "100 * (part / tot)"
  11. }]
  12. ...
  13. }