Aggregate Functions

Aggregate Functions

Functions for computing a single result from a set of input values. Elasticsearch SQL supports aggregate functions only alongside grouping (implicit or explicit).

General Purpose

AVG

Synopsis:

  1. AVG(numeric_field)

Input:

numeric field. If this field contains only null values, the function returns null. Otherwise, the function ignores null values in this field.

Output: double numeric value

Description: Returns the Average (arithmetic mean) of input values.

  1. SELECT AVG(salary) AS avg FROM emp;
  2. avg
  3. ---------------
  4. 48248.55
  1. SELECT AVG(salary / 12.0) AS avg FROM emp;
  2. avg
  3. ---------------
  4. 4020.7125

COUNT

Synopsis:

  1. COUNT(expression)

Input:

a field name, wildcard () or any numeric value. For COUNT() or COUNT(<literal>), all values are considered, including null or missing ones. For COUNT(<field_name>), null values are not considered.

Output: numeric value

Description: Returns the total number (count) of input values.

  1. SELECT COUNT(*) AS count FROM emp;
  2. count
  3. ---------------
  4. 100

COUNT(ALL)

Synopsis:

  1. COUNT(ALL field_name)

Input:

a field name. If this field contains only null values, the function returns null. Otherwise, the function ignores null values in this field.

Output: numeric value

Description: Returns the total number (count) of all non-null input values. COUNT(<field_name>) and COUNT(ALL <field_name>) are equivalent.

  1. SELECT COUNT(ALL last_name) AS count_all, COUNT(DISTINCT last_name) count_distinct FROM emp;
  2. count_all | count_distinct
  3. ---------------+------------------
  4. 100 |96
  1. SELECT COUNT(ALL CASE WHEN languages IS NULL THEN -1 ELSE languages END) AS count_all, COUNT(DISTINCT CASE WHEN languages IS NULL THEN -1 ELSE languages END) count_distinct FROM emp;
  2. count_all | count_distinct
  3. ---------------+---------------
  4. 100 |6

COUNT(DISTINCT)

Synopsis:

  1. COUNT(DISTINCT field_name)

Input:

a field name

Output: numeric value. If this field contains only null values, the function returns null. Otherwise, the function ignores null values in this field.

Description: Returns the total number of distinct non-null values in input values.

  1. SELECT COUNT(DISTINCT hire_date) unique_hires, COUNT(hire_date) AS hires FROM emp;
  2. unique_hires | hires
  3. ----------------+---------------
  4. 99 |100
  1. SELECT COUNT(DISTINCT DATE_TRUNC('YEAR', hire_date)) unique_hires, COUNT(DATE_TRUNC('YEAR', hire_date)) AS hires FROM emp;
  2. unique_hires | hires
  3. ---------------+---------------
  4. 14 |100

FIRST/FIRST_VALUE

Synopsis:

  1. FIRST(
  2. field_name
  3. [, ordering_field_name])

Input:

target field for the aggregation

optional field used for ordering

Output: same type as the input

Description: Returns the first non-null value (if such exists) of the field_name input column sorted by the ordering_field_name column. If ordering_field_name is not provided, only the field_name column is used for the sorting. E.g.:

ab

100

1

200

1

1

2

2

2

10

null

20

null

null

null

  1. SELECT FIRST(a) FROM t

will result in:

FIRST(a)

1

and

  1. SELECT FIRST(a, b) FROM t

will result in:

FIRST(a, b)

100

  1. SELECT FIRST(first_name) FROM emp;
  2. FIRST(first_name)
  3. --------------------
  4. Alejandro
  1. SELECT gender, FIRST(first_name) FROM emp GROUP BY gender ORDER BY gender;
  2. gender | FIRST(first_name)
  3. ------------+--------------------
  4. null | Berni
  5. F | Alejandro
  6. M | Amabile
  1. SELECT FIRST(first_name, birth_date) FROM emp;
  2. FIRST(first_name, birth_date)
  3. --------------------------------
  4. Remzi
  1. SELECT gender, FIRST(first_name, birth_date) FROM emp GROUP BY gender ORDER BY gender;
  2. gender | FIRST(first_name, birth_date)
  3. --------------+--------------------------------
  4. null | Lillian
  5. F | Sumant
  6. M | Remzi

FIRST_VALUE is a name alias and can be used instead of FIRST, e.g.:

  1. SELECT gender, FIRST_VALUE(first_name, birth_date) FROM emp GROUP BY gender ORDER BY gender;
  2. gender | FIRST_VALUE(first_name, birth_date)
  3. --------------+--------------------------------------
  4. null | Lillian
  5. F | Sumant
  6. M | Remzi
  1. SELECT gender, FIRST_VALUE(SUBSTRING(first_name, 2, 6), birth_date) AS "first" FROM emp GROUP BY gender ORDER BY gender;
  2. gender | first
  3. ---------------+---------------
  4. null |illian
  5. F |umant
  6. M |emzi

FIRST cannot be used in a HAVING clause.

FIRST cannot be used with columns of type text unless the field is also saved as a keyword.

LAST/LAST_VALUE

Synopsis:

  1. LAST(
  2. field_name
  3. [, ordering_field_name])

Input:

target field for the aggregation

optional field used for ordering

Output: same type as the input

Description: It’s the inverse of FIRST/FIRST_VALUE. Returns the last non-null value (if such exists) of the field_name input column sorted descending by the ordering_field_name column. If ordering_field_name is not provided, only the field_name column is used for the sorting. E.g.:

ab

10

1

20

1

1

2

2

2

100

null

200

null

null

null

  1. SELECT LAST(a) FROM t

will result in:

LAST(a)

200

and

  1. SELECT LAST(a, b) FROM t

will result in:

LAST(a, b)

2

  1. SELECT LAST(first_name) FROM emp;
  2. LAST(first_name)
  3. -------------------
  4. Zvonko
  1. SELECT gender, LAST(first_name) FROM emp GROUP BY gender ORDER BY gender;
  2. gender | LAST(first_name)
  3. ------------+-------------------
  4. null | Patricio
  5. F | Xinglin
  6. M | Zvonko
  1. SELECT LAST(first_name, birth_date) FROM emp;
  2. LAST(first_name, birth_date)
  3. -------------------------------
  4. Hilari
  1. SELECT gender, LAST(first_name, birth_date) FROM emp GROUP BY gender ORDER BY gender;
  2. gender | LAST(first_name, birth_date)
  3. -----------+-------------------------------
  4. null | Eberhardt
  5. F | Valdiodio
  6. M | Hilari

LAST_VALUE is a name alias and can be used instead of LAST, e.g.:

  1. SELECT gender, LAST_VALUE(first_name, birth_date) FROM emp GROUP BY gender ORDER BY gender;
  2. gender | LAST_VALUE(first_name, birth_date)
  3. -----------+-------------------------------------
  4. null | Eberhardt
  5. F | Valdiodio
  6. M | Hilari
  1. SELECT gender, LAST_VALUE(SUBSTRING(first_name, 3, 8), birth_date) AS "last" FROM emp GROUP BY gender ORDER BY gender;
  2. gender | last
  3. ---------------+---------------
  4. null |erhardt
  5. F |ldiodio
  6. M |lari

LAST cannot be used in HAVING clause.

LAST cannot be used with columns of type text unless the field is also saved as a keyword.

MAX

Synopsis:

  1. MAX(field_name)

Input:

a numeric field. If this field contains only null values, the function returns null. Otherwise, the function ignores null values in this field.

Output: same type as the input

Description: Returns the maximum value across input values in the field field_name.

  1. SELECT MAX(salary) AS max FROM emp;
  2. max
  3. ---------------
  4. 74999
  1. SELECT MAX(ABS(salary / -12.0)) AS max FROM emp;
  2. max
  3. -----------------
  4. 6249.916666666667

MAX on a field of type text or keyword is translated into LAST/LAST_VALUE and therefore, it cannot be used in HAVING clause.

MIN

Synopsis:

  1. MIN(field_name)

Input:

a numeric field. If this field contains only null values, the function returns null. Otherwise, the function ignores null values in this field.

Output: same type as the input

Description: Returns the minimum value across input values in the field field_name.

  1. SELECT MIN(salary) AS min FROM emp;
  2. min
  3. ---------------
  4. 25324

MIN on a field of type text or keyword is translated into FIRST/FIRST_VALUE and therefore, it cannot be used in HAVING clause.

SUM

Synopsis:

  1. SUM(field_name)

Input:

a numeric field. If this field contains only null values, the function returns null. Otherwise, the function ignores null values in this field.

Output: bigint for integer input, double for floating points

Description: Returns the sum of input values in the field field_name.

  1. SELECT SUM(salary) AS sum FROM emp;
  2. sum
  3. ---------------
  4. 4824855
  1. SELECT ROUND(SUM(salary / 12.0), 1) AS sum FROM emp;
  2. sum
  3. ---------------
  4. 402071.3

Statistics

KURTOSIS

Synopsis:

  1. KURTOSIS(field_name)

Input:

a numeric field. If this field contains only null values, the function returns null. Otherwise, the function ignores null values in this field.

Output: double numeric value

Description:

Quantify the shape of the distribution of input values in the field field_name.

  1. SELECT MIN(salary) AS min, MAX(salary) AS max, KURTOSIS(salary) AS k FROM emp;
  2. min | max | k
  3. ---------------+---------------+------------------
  4. 25324 |74999 |2.0444718929142986

KURTOSIS cannot be used on top of scalar functions or operators but only directly on a field. So, for example, the following is not allowed and an error is returned:

  1. SELECT KURTOSIS(salary / 12.0), gender FROM emp GROUP BY gender

MAD

Synopsis:

  1. MAD(field_name)

Input:

a numeric field. If this field contains only null values, the function returns null. Otherwise, the function ignores null values in this field.

Output: double numeric value

Description:

Measure the variability of the input values in the field field_name.

  1. SELECT MIN(salary) AS min, MAX(salary) AS max, AVG(salary) AS avg, MAD(salary) AS mad FROM emp;
  2. min | max | avg | mad
  3. ---------------+---------------+---------------+---------------
  4. 25324 |74999 |48248.55 |10096.5
  1. SELECT MIN(salary / 12.0) AS min, MAX(salary / 12.0) AS max, AVG(salary/ 12.0) AS avg, MAD(salary / 12.0) AS mad FROM emp;
  2. min | max | avg | mad
  3. ------------------+-----------------+---------------+-----------------
  4. 2110.3333333333335|6249.916666666667|4020.7125 |841.3750000000002

PERCENTILE

Synopsis:

  1. PERCENTILE(
  2. field_name,
  3. percentile[,
  4. method[,
  5. method_parameter]])

Input:

a numeric field. If this field contains only null values, the function returns null. Otherwise, the function ignores null values in this field.

a numeric expression (must be a constant and not based on a field). If null, the function returns null.

optional string literal for the percentile algorithm. Possible values: tdigest or hdr. Defaults to tdigest.

optional numeric literal that configures the percentile algorithm. Configures compression for tdigest or number_of_significant_value_digits for hdr. The default is the same as that of the backing algorithm.

Output: double numeric value

Description:

Returns the nth percentile (represented by numeric_exp parameter) of input values in the field field_name.

  1. SELECT languages, PERCENTILE(salary, 95) AS "95th" FROM emp
  2. GROUP BY languages;
  3. languages | 95th
  4. ---------------+-----------------
  5. null |74482.4
  6. 1 |71122.8
  7. 2 |70271.4
  8. 3 |71926.0
  9. 4 |69352.15
  10. 5 |56371.0
  1. SELECT languages, PERCENTILE(salary / 12.0, 95) AS "95th" FROM emp
  2. GROUP BY languages;
  3. languages | 95th
  4. ---------------+------------------
  5. null |6206.866666666667
  6. 1 |5926.9
  7. 2 |5855.949999999999
  8. 3 |5993.833333333333
  9. 4 |5779.345833333333
  10. 5 |4697.583333333333
  1. SELECT
  2. languages,
  3. PERCENTILE(salary, 97.3, 'tdigest', 100.0) AS "97.3_TDigest",
  4. PERCENTILE(salary, 97.3, 'hdr', 3) AS "97.3_HDR"
  5. FROM emp
  6. GROUP BY languages;
  7. languages | 97.3_TDigest | 97.3_HDR
  8. ---------------+-----------------+---------------
  9. null |74720.036 |74992.0
  10. 1 |72316.132 |73712.0
  11. 2 |71792.436 |69936.0
  12. 3 |73326.23999999999|74992.0
  13. 4 |71753.281 |74608.0
  14. 5 |61176.16000000001|56368.0

PERCENTILE_RANK

Synopsis:

  1. PERCENTILE_RANK(
  2. field_name,
  3. value[,
  4. method[,
  5. method_parameter]])

Input:

a numeric field. If this field contains only null values, the function returns null. Otherwise, the function ignores null values in this field.

a numeric expression (must be a constant and not based on a field). If null, the function returns null.

optional string literal for the percentile algorithm. Possible values: tdigest or hdr. Defaults to tdigest.

optional numeric literal that configures the percentile algorithm. Configures compression for tdigest or number_of_significant_value_digits for hdr. The default is the same as that of the backing algorithm.

Output: double numeric value

Description:

Returns the nth percentile rank (represented by numeric_exp parameter) of input values in the field field_name.

  1. SELECT languages, PERCENTILE_RANK(salary, 65000) AS rank FROM emp GROUP BY languages;
  2. languages | rank
  3. ---------------+-----------------
  4. null |73.65766569962062
  5. 1 |73.7291625157734
  6. 2 |88.88005607010643
  7. 3 |79.43662623295829
  8. 4 |85.70446389643493
  9. 5 |96.79075152940749
  1. SELECT languages, PERCENTILE_RANK(salary/12, 5000) AS rank FROM emp GROUP BY languages;
  2. languages | rank
  3. ---------------+------------------
  4. null |66.91240875912409
  5. 1 |66.70766707667076
  6. 2 |84.13266895048271
  7. 3 |61.052992625621684
  8. 4 |76.55646443990001
  9. 5 |94.00696864111498
  1. SELECT
  2. languages,
  3. ROUND(PERCENTILE_RANK(salary, 65000, 'tdigest', 100.0), 2) AS "rank_TDigest",
  4. ROUND(PERCENTILE_RANK(salary, 65000, 'hdr', 3), 2) AS "rank_HDR"
  5. FROM emp
  6. GROUP BY languages;
  7. languages | rank_TDigest | rank_HDR
  8. ---------------+---------------+---------------
  9. null |73.66 |80.0
  10. 1 |73.73 |73.33
  11. 2 |88.88 |89.47
  12. 3 |79.44 |76.47
  13. 4 |85.7 |83.33
  14. 5 |96.79 |95.24

SKEWNESS

Synopsis:

  1. SKEWNESS(field_name)

Input:

a numeric field. If this field contains only null values, the function returns null. Otherwise, the function ignores null values in this field.

Output: double numeric value

Description:

Quantify the asymmetric distribution of input values in the field field_name.

  1. SELECT MIN(salary) AS min, MAX(salary) AS max, SKEWNESS(salary) AS s FROM emp;
  2. min | max | s
  3. ---------------+---------------+------------------
  4. 25324 |74999 |0.2707722118423227

SKEWNESS cannot be used on top of scalar functions but only directly on a field. So, for example, the following is not allowed and an error is returned:

  1. SELECT SKEWNESS(ROUND(salary / 12.0, 2), gender FROM emp GROUP BY gender

STDDEV_POP

Synopsis:

  1. STDDEV_POP(field_name)

Input:

a numeric field. If this field contains only null values, the function returns null. Otherwise, the function ignores null values in this field.

Output: double numeric value

Description:

Returns the population standard deviation of input values in the field field_name.

  1. SELECT MIN(salary) AS min, MAX(salary) AS max, STDDEV_POP(salary) AS stddev FROM emp;
  2. min | max | stddev
  3. ---------------+---------------+------------------
  4. 25324 |74999 |13765.125502787832
  1. SELECT MIN(salary / 12.0) AS min, MAX(salary / 12.0) AS max, STDDEV_POP(salary / 12.0) AS stddev FROM emp;
  2. min | max | stddev
  3. ------------------+-----------------+-----------------
  4. 2110.3333333333335|6249.916666666667|1147.093791898986

STDDEV_SAMP

Synopsis:

  1. STDDEV_SAMP(field_name)

Input:

a numeric field. If this field contains only null values, the function returns null. Otherwise, the function ignores null values in this field.

Output: double numeric value

Description:

Returns the sample standard deviation of input values in the field field_name.

  1. SELECT MIN(salary) AS min, MAX(salary) AS max, STDDEV_SAMP(salary) AS stddev FROM emp;
  2. min | max | stddev
  3. ---------------+---------------+------------------
  4. 25324 |74999 |13834.471662090747
  1. SELECT MIN(salary / 12.0) AS min, MAX(salary / 12.0) AS max, STDDEV_SAMP(salary / 12.0) AS stddev FROM emp;
  2. min | max | stddev
  3. ------------------+-----------------+-----------------
  4. 2110.3333333333335|6249.916666666667|1152.872638507562

SUM_OF_SQUARES

Synopsis:

  1. SUM_OF_SQUARES(field_name)

Input:

a numeric field. If this field contains only null values, the function returns null. Otherwise, the function ignores null values in this field.

Output: double numeric value

Description:

Returns the sum of squares of input values in the field field_name.

  1. SELECT MIN(salary) AS min, MAX(salary) AS max, SUM_OF_SQUARES(salary) AS sumsq
  2. FROM emp;
  3. min | max | sumsq
  4. ---------------+---------------+----------------
  5. 25324 |74999 |2.51740125721E11
  1. SELECT MIN(salary / 24.0) AS min, MAX(salary / 24.0) AS max, SUM_OF_SQUARES(salary / 24.0) AS sumsq FROM emp;
  2. min | max | sumsq
  3. ------------------+------------------+-------------------
  4. 1055.1666666666667|3124.9583333333335|4.370488293767361E8

VAR_POP

Synopsis:

  1. VAR_POP(field_name)

Input:

a numeric field. If this field contains only null values, the function returns null. Otherwise, the function ignores null values in this field.

Output: double numeric value

Description:

Returns the population variance of input values in the field field_name.

  1. SELECT MIN(salary) AS min, MAX(salary) AS max, VAR_POP(salary) AS varpop FROM emp;
  2. min | max | varpop
  3. ---------------+---------------+----------------
  4. 25324 |74999 |1.894786801075E8
  1. SELECT MIN(salary / 24.0) AS min, MAX(salary / 24.0) AS max, VAR_POP(salary / 24.0) AS varpop FROM emp;
  2. min | max | varpop
  3. ------------------+------------------+------------------
  4. 1055.1666666666667|3124.9583333333335|328956.04185329855

VAR_SAMP

Synopsis:

  1. VAR_SAMP(field_name)

Input:

a numeric field. If this field contains only null values, the function returns null. Otherwise, the function ignores null values in this field.

Output: double numeric value

Description:

Returns the sample variance of input values in the field field_name.

  1. SELECT MIN(salary) AS min, MAX(salary) AS max, VAR_SAMP(salary) AS varsamp FROM emp;
  2. min | max | varsamp
  3. ---------------+---------------+----------------
  4. 25324 |74999 |1.913926061691E8
  1. SELECT MIN(salary / 24.0) AS min, MAX(salary / 24.0) AS max, VAR_SAMP(salary / 24.0) AS varsamp FROM emp;
  2. min | max | varsamp
  3. ------------------+------------------+----------------
  4. 1055.1666666666667|3124.9583333333335|332278.830154847