time_weight()

  time_weight(
      method TEXT,
      ts TIMESTAMPTZ,
      value DOUBLE PRECISION
  ) RETURNS TimeWeightSummary

An aggregate that produces a TimeWeightSummary from timestamps and associated values.

NB: Only two values for method are currently supported, linear and LOCF, and any capitalization is accepted. See the interpolation methods section below for more information.

Required arguments

Name   | Type             | Description
method | TEXT             | The weighting method we should use, options are linear or LOCF, not case sensitive
ts     | TIMESTAMPTZ      | The time at each point
value  | DOUBLE PRECISION | The value at each point to use for the time weighted average

Note that ts and value can be null; however, the aggregate is not evaluated on null values and returns null rather than erroring on null inputs.

Returns

Column      | Type              | Description
time_weight | TimeWeightSummary | A TimeWeightSummary object that can be passed to other functions within the time weighting API

Sample usage

  WITH t as (
      SELECT
          time_bucket('1 day'::interval, ts) as dt,
          time_weight('Linear', ts, val) AS tw -- get a time weight summary
      FROM foo
      WHERE measure_id = 10
      GROUP BY time_bucket('1 day'::interval, ts)
  )
  SELECT
      dt,
      average(tw) -- extract the average from the time weight summary
  FROM t;
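This example, and the others on this page, assumes a table of raw readings named foo with a measure_id column identifying each series, a ts timestamp, and a val reading. A minimal sketch of a matching schema (hypothetical, with the create_hypertable call applying only if foo is stored as a TimescaleDB hypertable):

  -- Hypothetical schema matching the queries on this page
  CREATE TABLE foo (
      measure_id BIGINT NOT NULL,
      ts         TIMESTAMPTZ NOT NULL,
      val        DOUBLE PRECISION
  );

  -- Optional: store foo as a hypertable partitioned on ts
  SELECT create_hypertable('foo', 'ts');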

Advanced usage notes

Most cases will work out of the box, but for power users, or those who want to dive deeper, we’ve included a bit more context below.

Interpolation methods details

Discrete time values don’t always allow for an obvious calculation of the time weighted average. In order to calculate a time weighted average we need to choose how to weight each value. The two methods we currently use are last observation carried forward (LOCF) and linear interpolation.

In the LOCF approach, the value is treated as if it remains constant until the next value is seen. The LOCF approach is commonly used when the sensor or measurement device sends a measurement only when there is a change in value.

The linear interpolation approach treats the values between any two measurements as if they lie on the line connecting the two measurements. The linear interpolation approach is used to account for irregularly sampled data where the sensor provides no such guarantee.
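As a concrete (hypothetical) illustration of the difference: given readings of 10, 30, and 20 taken at 00:00, 00:30, and 01:00, LOCF holds 10 for the first half hour and 30 for the second, for a time weighted average of (10 * 0.5 + 30 * 0.5) / 1 = 20, while linear interpolation weights each half hour by the average of its endpoints, for ((10 + 30) / 2 * 0.5 + (30 + 20) / 2 * 0.5) / 1 = 22.5. A minimal sketch comparing the two methods on that series, using an inline VALUES list in place of a real table:

  WITH samples(ts, val) AS (
      VALUES
          ('2023-01-01 00:00:00+00'::timestamptz, 10.0::double precision),
          ('2023-01-01 00:30:00+00'::timestamptz, 30.0::double precision),
          ('2023-01-01 01:00:00+00'::timestamptz, 20.0::double precision)
  )
  SELECT
      average(time_weight('LOCF', ts, val))   AS locf_avg,   -- 20
      average(time_weight('Linear', ts, val)) AS linear_avg  -- 22.5
  FROM samples;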

Parallelism and ordering

The time weighted average calculations we perform require a strict ordering of inputs, so they are not parallelizable in the strict Postgres sense: when Postgres parallelizes an aggregate, it hands out rows to workers essentially at random, as it sees them. However, if your parallelism can guarantee disjoint (in time) sets of rows, the algorithm can be parallelized, as long as all rows within a given time range go to the same worker. This is the case for both continuous aggregates and distributed hypertables (as long as the partitioning keys are in the group by, though the aggregate hardly makes sense otherwise).
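For example, because each bucket of a continuous aggregate is computed from a disjoint (in time) set of rows, a per-bucket, per-series TimeWeightSummary can be maintained there. A hedged sketch of such a continuous aggregate (the view name daily_time_weight is hypothetical, and support for toolkit aggregates inside continuous aggregates depends on your TimescaleDB version):

  -- Hypothetical continuous aggregate keeping one TimeWeightSummary per series per day
  CREATE MATERIALIZED VIEW daily_time_weight
  WITH (timescaledb.continuous) AS
  SELECT
      measure_id,
      time_bucket('1 day'::interval, ts) AS bucket,
      time_weight('LOCF', ts, val) AS tw
  FROM foo
  GROUP BY measure_id, time_bucket('1 day'::interval, ts);

  -- Averages (or rollups across buckets) can then be extracted at query time
  SELECT measure_id, bucket, average(tw)
  FROM daily_time_weight;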

We throw an error if there is an attempt to combine overlapping TimeWeightSummaries. For instance, in the example above, trying to combine summaries across measure_ids would error, because the interpolation techniques only make sense within a given time series, here determined by a single measure_id. However, given that the time weighted average produced is a dimensionless quantity, a simple average of the time weighted averages should better represent the variation across devices, so the recommendation for things like baselines across many time series would be something like:

  WITH t as (
      SELECT
          measure_id,
          average(
              time_weight('LOCF', ts, val)
          ) as time_weighted_average
      FROM foo
      GROUP BY measure_id
  )
  SELECT avg(time_weighted_average) -- use the normal avg function to average our time weighted averages
  FROM t;

Internally, the first and last points seen as well as the calculated weighted sum are stored in each TimeWeightSummary and used to combine with a neighboring TimeWeightSummary when re-aggregation or the Postgres combine function is called. In general, the functions support partial aggregation and partitionwise aggregation in the multinode context, but are not parallelizable (in the Postgres sense, which requires them to accept potentially overlapping input).

Because they require ordered sets, the aggregates build up a buffer of input data, sort it, and then perform the proper aggregation steps. In cases where there is not enough memory to buffer all the points, causing out-of-memory errors or other issues, a multi-level aggregate can be useful. Following our example from above:

  WITH t as (
      SELECT
          measure_id,
          time_bucket('1 day'::interval, ts),
          time_weight('LOCF', ts, val)
      FROM foo
      GROUP BY measure_id, time_bucket('1 day'::interval, ts)
  )
  SELECT
      measure_id,
      average(
          rollup(time_weight)
      )
  FROM t
  GROUP BY measure_id;