stats_agg() (two variables) functions

stats_agg() (two variables) functions

Introduction

Perform linear regression analysis, for example to calculate correlation coefficient and covariance, on two-dimensional data. You can also calculate common statistics, such as average and standard deviation, on each dimension separately. These functions are similar to the PostgreSQL statistical aggregates, but they include more features and are easier to use in continuous aggregates and window functions. The linear regressions are based on the standard least-squares fitting method.

These functions work on two-dimensional data. To work with one-dimensional data, for example to calculate the average and standard deviation of a single variable, see the one-dimensional stats_agg functions.

stats_agg() (one variable)

Two-step aggregation

Hide content

This group of functions uses the two-step aggregation pattern.

Rather than calculating the final result in one step, you first create an intermediate aggregate by using the aggregate function.

Then, use any of the accessors on the intermediate aggregate to calculate a final result. You can also roll up multiple intermediate aggregates with the rollup functions.

The two-step aggregation pattern has several advantages:

More efficient because multiple accessors can reuse the same aggregate
Easier to reason about performance, because aggregation is separate from final computation
Easier to understand when calculations can be rolled up into larger intervals, especially in window functions and continuous aggregates
Can perform retrospective analysis even when underlying data is dropped, because the intermediate aggregate stores extra information not available in the final result

To learn more, see the blog post on two-step aggregates.

Functions in this group

Aggregate

stats_agg (two variables)

Aggregate data into an intermediate statistical aggregate form for further calculation

Accessor

average_y, average_x

Calculate the average from a two-dimensional statistical aggregate for the dimension specified

corr

Calculate the correlation coefficient from a two-dimensional statistical aggregate

covariance

Calculate the covariance from a two-dimensional statistical aggregate

determination_coeff

Calculate the determination coefficient from a two-dimensional statistical aggregate

intercept

Calculate the intercept from a two-dimensional statistical aggregate

kurtosis_y, kurtosis_x

Calculate the kurtosis from a two-dimensional statistical aggregate for the dimension specified

num_vals

Calculate the number of values in a two-dimensional statistical aggregate

skewness_y, skewness_x

Calculate the skewness from a two-dimensional statistical aggregate for the dimension specified

slope

Calculate the slope from a two-dimensional statistical aggregate

stddev_y, stddev_x

Calculate the standard deviation from a two-dimensional statistical aggregate for the dimension specified

sum_y, sum_x

Calculate the sum from a two-dimensional statistical aggregate for the dimension specified

variance_y, variance_x

Calculate the variance from a two-dimensional statistical aggregate for the dimension specified

x_intercept

Calculate the x-intercept from a two-dimensional statistical aggregate

Rollup

rolling

Combine multiple two-dimensional statistical aggregates to calculate rolling window aggregates

rollup

Combine multiple two-dimensional statistical aggregates

Function details

stats_agg() (two variables)

Stabilized in Toolkit v1.3.0

Hide content

stats_agg(

    y DOUBLE PRECISION,
    x DOUBLE PRECISION

) RETURNS StatsSummary2D

This is the first step for performing any statistical aggregate calculations on two-dimensional data. Use stats_agg to create an intermediate aggregate (StatsSummary2D) from your data. This intermediate form can then be used by one or more accessors in this group to compute the final results. Optionally, multiple such intermediate aggregate objects can be combined using rollup() or rolling() before an accessor is applied.

Required arguments

Name	Type	Description
`y, x`	`DOUBLE PRECISION`	The variables to use for the statistical aggregate.

Returns

Column	Type	Description
`stats_agg`	`StatsSummary2D`	The statistical aggregate, containing data about the variables in an intermediate form. Pass the aggregate to accessor functions in the statistical aggregates API to perform final calculations. Or, pass the aggregate to rollup functions to combine multiple statistical aggregates into larger aggregates.

Examples

Create a two-dimensional statistical aggregate from columns x and y in table foo:


SELECT stats_agg(y, x) FROM foo;

average_y(), average_x()

Stabilized in Toolkit v1.3.0

Hide content

average_y(

  summary StatsSummary 2D

) RETURNS DOUBLE PRECISION

average_x(

  summary StatsSummary 2D

) RETURNS DOUBLE PRECISION

Calculate the average from a two-dimensional aggregate for the given dimension. For example, average_y() calculates the average for all the values of the y variable, independent of the values of the x variable.

Required arguments

Name	Type	Description
`summary`	`StatsSummary2D`	The statistical aggregate produced by a `stats_agg` call

Returns

Column	Type	Description
`average_y`, `average_x`	`DOUBLE PRECISION`	The average of the values in the statistical aggregate

Examples

Calculate the average of the integers from 0 to 100:


SELECT average_x(stats_agg(y, x))
  FROM generate_series(1, 5) y,
       generate_series(0, 100) x;
average
-----------
50

corr()

Stabilized in Toolkit v1.3.0

Hide content

corr(

  summary StatsSummary2D

) RETURNS DOUBLE PRECISION

Calculate the correlation coefficient from a two-dimensional statistical aggregate. The calculation uses the standard least-squares fitting for linear regression.

Required arguments

Name	Type	Description
`summary`	`StatsSummary2D`	The statistical aggregate produced by a `stats_agg` call

Returns

Column	Type	Description
`corr`	`DOUBLE PRECISION`	The correlation coefficient of the least-squares fit line

Examples

Calculate the correlation coefficient of independent variable y and dependent variable x for each 15-minute time bucket:


SELECT
  id,
  time_bucket('15 min'::interval, ts) AS bucket,
  corr(stats_agg(y, x)) AS summary
FROM foo
GROUP BY id, time_bucket('15 min'::interval, ts)

covariance()

Stabilized in Toolkit v1.3.0

Hide content

covariance(

    summary StatsSummary2D,
    [ method TEXT ]

) RETURNS DOUBLE PRECISION

Calculate the covariance from a two-dimensional statistical aggregate. The calculation uses the standard least-squares fitting for linear regression.

Required arguments

Name	Type	Description
`summary`	`StatsSummary2D`	The statistical aggregate produced by a `stats_agg` call

Optional arguments

Name	Type	Description
`method`	`TEXT`	The method used for calculating the covariance. The two options are `population` and `sample`, which can be abbreviated to `pop` or `samp`. Defaults to `sample`.

Returns

Column	Type	Description
`covariance`	`DOUBLE PRECISION`	The covariance of the least-squares fit line

Examples

Calculate the covariance from a two-dimensional statistical aggregate:


SELECT covariance(
    stats_agg(y, x),
    'population'
) FROM foo;

Calculate the covariance with an implicit argument of population:


SELECT covariance(
    stats_agg(y, x)
) FROM foo;

Calculate the covariance of independent variable y and dependent variable x for each 15-minute time bucket:


SELECT
    id,
    time_bucket('15 min'::interval, ts) AS bucket,
    covariance(stats_agg(y, x)) AS summary
FROM foo
GROUP BY id, time_bucket('15 min'::interval, ts)

determination_coeff()

Stabilized in Toolkit v1.3.0

Hide content

determination_coeff(

    summary StatsSummary2D

) RETURNS DOUBLE PRECISION

Calculate the determination coefficient from a two-dimensional statistical aggregate. The calculation uses the standard least-squares fitting for linear regression.

Required arguments

Name	Type	Description
`summary`	`StatsSummary2D`	The statistical aggregate produced by a `stats_agg` call

Returns

Column	Type	Description
`determination_coeff`	`DOUBLE PRECISION`	The determination coefficient of the least-squares fit line

Examples

Calculate the determination coefficient of independent variable y and dependent variable x for each 15-minute time bucket:


SELECT
    id,
    time_bucket('15 min'::interval, ts) AS bucket,
    determination_coeff(stats_agg(y, x)) AS summary
FROM foo
GROUP BY id, time_bucket('15 min'::interval, ts)

intercept()

Stabilized in Toolkit v1.3.0

Hide content

intercept(

    summary StatsSummary2D

) RETURNS DOUBLE PRECISION

Calculate the y intercept from a two-dimensional statistical aggregate. The calculation uses the standard least-squares fitting for linear regression.

Required arguments

Name	Type	Description
`summary`	`StatsSummary2D`	The statistical aggregate produced by a `stats_agg` call

Returns

Column	Type	Description
`intercept`	`DOUBLE PRECISION`	The y intercept of the least-squares fit line

Examples

Calculate the y intercept from independent variable y and dependent variable x for each 15-minute time bucket:


SELECT
    id,
    time_bucket('15 min'::interval, ts) AS bucket,
    intercept(stats_agg(y, x)) AS summary
FROM foo
GROUP BY id, time_bucket('15 min'::interval, ts)

kurtosis_y(), kurtosis_x()

Stabilized in Toolkit v1.3.0

Hide content

kurtosis_y(

  summary StatsSummary2D,
  [ method TEXT ]

) RETURNS DOUBLE PRECISION

kurtosis_x(

  summary StatsSummary2D,
  [ method TEXT ]

) RETURNS DOUBLE PRECISION

Calculate the kurtosis from a two-dimensional statistical aggregate for the given dimension. For example, kurtosis_y() calculates the kurtosis for all the values of the y variable, independent of values of the x variable. The kurtosis is the fourth statistical moment. It is a measure of “tailedness” of a data distribution compared to a normal distribution.

Required arguments

Name	Type	Description
`summary`	`StatsSummary2D`	The statistical aggregate produced by a `stats_agg` call

Optional arguments

Name	Type	Description
`method`	`TEXT`	The method used for calculating the kurtosis. The two options are `population` and `sample`, which can be abbreviated to `pop` or `samp`. Defaults to `sample`.

Returns

Column	Type	Description
`kurtosis_y`, `kurtosis_x`	`DOUBLE PRECISION`	The kurtosis of the values in the statistical aggregate

Examples

Calculate the kurtosis of a sample containing the integers from 0 to 100:


SELECT kurtosis_y(stats_agg(data, data))
  FROM generate_series(0, 100) data;
kurtosis_y
----------
1.78195

num_vals()

Hide content

num_vals(

  summary StatsSummary2D

) RETURNS BIGINT

Calculate the number of values contained in a two-dimensional statistical aggregate.

Required arguments

Name	Type	Description
`summary`	`StatsSummary2D`	The statistical aggregate produced by a `stats_agg` call

Returns

Column	Type	Description
`num_vals`	`DOUBLE PRECISION`	The number of values in the statistical aggregate

Examples

Calculate the number of values from 1 to 5, and from 0 to 100, inclusive:


SELECT num_vals(stats_agg(y, x))
  FROM generate_series(1, 5) y,
       generate_series(0, 100) x;
num_vals
--------
505

skewness_y(), skewness_x()

Stabilized in Toolkit v1.3.0

Hide content

skewness_y(

  summary StatsSummary2D,
  [ method TEXT ]

) RETURNS DOUBLE PRECISION

skewness_x(

  summary StatsSummary2D,
  [ method TEXT ]

) RETURNS DOUBLE PRECISION

Calculate the skewness from a two-dimensional statistical aggregate for the given dimension. For example, skewness_y() calculates the skewness for all the values of the y variable, independent of values of the x variable. The skewness is the third statistical moment. It is a measure of asymmetry in a data distribution.

Required arguments

Name	Type	Description
`summary`	`StatsSummary2D`	The statistical aggregate produced by a `stats_agg` call

Optional arguments

Name	Type	Description
`method`	`TEXT`	The method used for calculating the skewness. The two options are `population` and `sample`, which can be abbreviated to `pop` or `samp`. Defaults to `sample`.

Returns

Column	Type	Description
`skewness_y`, `skewness_x`	`DOUBLE PRECISION`	The skewness of the values in the statistical aggregate

Examples

Calculate the skewness of a sample containing the integers from 0 to 100:


SELECT skewness_x(stats_agg(data, data))
  FROM generate_series(0, 100) data;
skewness_x
----------
0

slope()

Stabilized in Toolkit v1.3.0

Hide content

slope(

    summary StatsSummary2D

) RETURNS DOUBLE PRECISION

Calculate the slope of the linear fitting line from a two-dimensional statistical aggregate. The calculation uses the standard least-squares fitting for linear regression.

Required arguments

Name	Type	Description
`summary`	`StatsSummary2D`	The statistical aggregate produced by a `stats_agg` call

Returns

Column	Type	Description
`slope`	`DOUBLE PRECISION`	The slope of the least-squares fit line

Examples

Calculate the slope from a two-dimensional statistical aggregate:


SELECT slope(
    stats_agg(y, x)
) FROM foo;

Calculate the slope from independent variable y and dependent variable x for each 15-minute time bucket:


SELECT
    id,
    time_bucket('15 min'::interval, ts) AS bucket,
    slope(stats_agg(y, x)) AS summary
FROM foo
GROUP BY id, time_bucket('15 min'::interval, ts)

stddev_y(), stddev_x()

Stabilized in Toolkit v1.3.0

Hide content

stddev_y(

  summary StatsSummary2D,
  [ method TEXT ]

) RETURNS DOUBLE PRECISION

stddev_x(summary

    StatsSummary2D,
    [ method TEXT ]

) RETURNS DOUBLE PRECISION

Calculate the standard deviation from a two-dimensional statistical aggregate for the given dimension. For example, stddev_y() calculates the skewness for all the values of the y variable, independent of values of the x variable.

Required arguments

Name	Type	Description
`summary`	`StatsSummary2D`	The statistical aggregate produced by a `stats_agg` call

Optional arguments

Name	Type	Description
`method`	`TEXT`	The method used for calculating the standard deviation. The two options are `population` and `sample`, which can be abbreviated to `pop` or `samp`. Defaults to `sample`.

Returns

Column	Type	Description
`stddev_y`, `stddev_x`	`DOUBLE PRECISION`	The standard deviation of the values in the statistical aggregate

Examples

Calculate the standard deviation of a sample containing the integers from 0 to 100:


SELECT stddev_y(stats_agg(data, data))
  FROM generate_series(0, 100) data;
stddev_y
--------
29.3002

sum_y(), sum_x()

Stabilized in Toolkit v1.3.0

Hide content

sum_y(

  summary StatsSummary2D

) RETURNS DOUBLE PRECISION

sum_x(

  summary StatsSummary2D

) RETURNS DOUBLE PRECISION

Calculate the sum from a two-dimensional statistical aggregate for the given dimension. For example, sum_y() calculates the skewness for all the values of the y variable, independent of values of the x variable.

Required arguments

Name	Type	Description
`summary`	`StatsSummary2D`	The statistical aggregate produced by a `stats_agg` call

Returns

Column	Type	Description
`sum`	`DOUBLE PRECISION`	The sum of the values in the statistical aggregate

Examples

Calculate the sum of the numbers from 0 to 100:


SELECT sum_y(stats_agg(data, data))
  FROM generate_series(0, 100) data;
sum_y
-----
5050

variance_y(), variance_x()

Stabilized in Toolkit v1.3.0

Hide content

variance_y(

  summary StatsSummary2D,
  [ method TEXT ]

) RETURNS DOUBLE PRECISION

variance_x(summary

    StatsSummary2D,
    [ method TEXT ]

) RETURNS DOUBLE PRECISION

Calculate the variance from a two-dimensional statistical aggregate for the given dimension. For example, variance_y() calculates the skewness for all the values of the y variable, independent of values of the x variable.

Required arguments

Name	Type	Description
`summary`	`StatsSummary2D`	The statistical aggregate produced by a `stats_agg` call

Optional arguments

Name	Type	Description
`method`	`TEXT`	The method used for calculating the standard deviation. The two options are `population` and `sample`, which can be abbreviated to `pop` or `samp`. Defaults to `sample`.

Returns

Column	Type	Description
`variance`	`DOUBLE PRECISION`	The variance of the values in the statistical aggregate

Examples

Calculate the variance of a sample containing the integers from 0 to 100:


SELECT variance_y(stats_agg(data, data))
FROM generate_series(0, 100) data;
variance_y
----------
858.5

x_intercept()

Stabilized in Toolkit v1.3.0

Hide content

x_intercept(

  summary StatsSummary2D

) RETURNS DOUBLE PRECISION

Calculate the x intercept from a two-dimensional statistical aggregate. The calculation uses the standard least-squares fitting for linear regression.

Required arguments

Name	Type	Description
`summary`	`StatsSummary2D`	The statistical aggregate produced by a `stats_agg` call

Returns

Column	Type	Description
`intercept`	`DOUBLE PRECISION`	The x intercept of the least-squares fit line

Examples

Calculate the x intercept from independent variable y and dependent variable x for each 15-minute time bucket:


SELECT
    id,
    time_bucket('15 min'::interval, ts) AS bucket,
    x_intercept(stats_agg(y, x)) AS summary
FROM foo
GROUP BY id, time_bucket('15 min'::interval, ts)

rolling()

Stabilized in Toolkit v1.3.0

Hide content

rolling(

    ss StatsSummary2D

) RETURNS StatsSummary2D

Combine multiple intermediate two-dimensional statistical aggregate (StatsSummary2D) objects into a single StatsSummary2D object. It is optimized for use in a window function context for computing tumbling window statistical aggregates.

This is especially useful for computing tumbling window aggregates from a continuous aggregate. It can be orders of magnitude faster because it uses inverse transition and combine functions, with the possibility that bigger floating point errors can occur in unusual scenarios.

For re-aggregation in a non-window function context, such as combining hourly buckets into daily buckets, see rollup().

Required arguments

Name	Type	Description
`summary`	`StatsSummary2D`	The statistical aggregate produced by a `stats_agg` call

Returns

Column	Type	Description
`rolling`	`StatsSummary2D`	A new statistical aggregate produced by combining the input statistical aggregates

rollup()

Stabilized in Toolkit v1.3.0

Hide content

rolling(

    ss StatsSummary2D

) RETURNS StatsSummary2D

Combine multiple intermediate two-dimensional statistical aggregate (StatsSummary2D) objects into a single StatsSummary2D object. For example, you can use rollup to combine statistical aggregates from 15-minute buckets into daily buckets. For use in window function, see rolling().

Required arguments

Name	Type	Description
`summary`	`StatsSummary2D`	The statistical aggregate produced by a `stats_agg` call

Returns

Column	Type	Description
`rollup`	`StatsSummary2D`	A new statistical aggregate produced by combining the input statistical aggregates

Extended examples

Create a statistical aggregate that summarizes daily statistical data about two variables, val2 and val1, where val2 is the dependent variable and val1 is the independent variable. Use the statistical aggregate to calculate the average of the dependent variable and the slope of the linear-regression fit:


WITH t as (
    SELECT
        time_bucket('1 day'::interval, ts) as dt,
        stats_agg(val2, val1) AS stats2D,
    FROM foo
    WHERE id = 'bar'
    GROUP BY time_bucket('1 day'::interval, ts)
)
SELECT
    average_x(stats2D),
    slope(stats2D)
FROM t;

stats_agg (two variables)

stats_agg() (two variables) functions

Introduction

Related hyperfunction groups

Two-step aggregation

Functions in this group

Aggregate

Accessor

Rollup

Function details

stats_agg() (two variables)

average_y(), average_x()

corr()

covariance()

determination_coeff()

intercept()

kurtosis_y(), kurtosis_x()

num_vals()

skewness_y(), skewness_x()

slope()

stddev_y(), stddev_x()

sum_y(), sum_x()

variance_y(), variance_x()

x_intercept()

rolling()

rollup()

Extended examples