Percentile approximation

Percentiles are useful for understanding the distribution of your time-series data. Specifically, they can help reduce the impact that outliers have on calculations such as the average. For instance, the 50th percentile (median) can be a more useful measure than the average when the data contains outliers, because outliers can shift the average dramatically but have a much smaller effect on the median. The median is the value at which, in an ordered list of your data, half of the values are greater and half are smaller. Likewise, the 10th percentile is the value that 10% of the data falls below and 90% falls above.

Often, the 95th or 99th percentile is useful for identifying normal behavior in networking and monitoring applications. For instance, when a user reports that your website is taking 30 seconds to load, it’s helpful to quickly establish that 99% of requests complete in 200 ms or less, which means that this specific report is an outlier and likely caused by extraordinary conditions.
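As a rough illustration of this scenario, the following sketch uses PostgreSQL's built-in exact percentile_cont() aggregate on synthetic data. The response_times name and the generated values are made up for the example: 995 requests take between 100 and 199 ms, and 5 requests take 30 seconds.

```sql
-- Synthetic data (an assumption for illustration): 995 fast requests
-- between 100 and 199 ms, plus 5 slow requests at 30000 ms.
WITH response_times(ms) AS (
    SELECT 100 + (i % 100) FROM generate_series(1, 995) AS i
    UNION ALL
    SELECT 30000 FROM generate_series(1, 5)
)
SELECT
    percentile_cont(0.50) WITHIN GROUP (ORDER BY ms) AS p50_ms,
    percentile_cont(0.99) WITHIN GROUP (ORDER BY ms) AS p99_ms,
    max(ms)                                          AS max_ms
FROM response_times;
```

With this data, p99_ms stays below 200 even though max_ms is 30000, which is the kind of evidence that marks a 30-second report as an outlier.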

With percentiles, outliers have less impact on the calculations because only their position in the ordered set matters, not their magnitude. Therefore, the skew that infrequent very large or very small values introduce into calculations like AVG() is reduced or eliminated.
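A minimal sketch of this effect, using plain PostgreSQL and two made-up data sets that differ only in the size of a single outlier. The readings name and all values are assumptions for illustration.

```sql
-- Two illustrative data sets: identical except for the outlier's magnitude.
WITH readings(label, val) AS (
    VALUES
        ('outlier = 1000',   10.0),
        ('outlier = 1000',   11.0),
        ('outlier = 1000',   12.0),
        ('outlier = 1000',   13.0),
        ('outlier = 1000',   1000.0),
        ('outlier = 100000', 10.0),
        ('outlier = 100000', 11.0),
        ('outlier = 100000', 12.0),
        ('outlier = 100000', 13.0),
        ('outlier = 100000', 100000.0)
)
SELECT
    label,
    avg(val)                                       AS mean,
    percentile_cont(0.5) WITHIN GROUP (ORDER BY val) AS median
FROM readings
GROUP BY label
ORDER BY label;
```

The mean moves from 209.2 to 20009.2 as the outlier grows, while the median is 12 in both cases, because the outlier keeps the same position in the ordered set.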

We provide percentile approximation functions because exact percentiles are not parallelizable, cannot be used with continuous aggregates, and would be very inefficient when used with multi-node TimescaleDB. Our percentile approximation algorithms provide good estimates of percentiles while integrating much more fully with all of these TimescaleDB features.

Using percentile approximation in TimescaleDB

Tip: To use functions in the TimescaleDB Toolkit, ensure that the extension is installed and available within your database.
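One way to check this, assuming the Toolkit packages are already installed on the database server and your role is allowed to create extensions, is sketched below.

```sql
-- Enable the Toolkit in the current database.
-- This fails if the Toolkit packages are not installed on the server.
CREATE EXTENSION IF NOT EXISTS timescaledb_toolkit;

-- Confirm that the extension is now available in this database.
SELECT extname, extversion
FROM pg_extension
WHERE extname = 'timescaledb_toolkit';
```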

Percentiles in TimescaleDB are calculated in two steps. First, create a percentile estimator using either percentile_agg() or one of the advanced aggregation methods, uddsketch() or tdigest(). Estimators can be combined or re-aggregated using the rollup function.
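The first step might look like the following sketch, which assumes both the timescaledb and timescaledb_toolkit extensions are enabled. The samples data is synthetic, and the bucket counts and error bound passed to uddsketch() and tdigest() are arbitrary illustrative choices, not recommendations.

```sql
-- Synthetic input (an assumption for illustration): one sample per minute for a day.
WITH samples(ts, val) AS (
    SELECT TIMESTAMPTZ '2024-01-01 00:00:00+00' + i * INTERVAL '1 minute',
           (i % 500)::DOUBLE PRECISION
    FROM generate_series(0, 1439) AS i
),
-- Build one estimator of each kind per hour.
hourly AS (
    SELECT
        time_bucket(INTERVAL '1 hour', ts) AS bucket,
        percentile_agg(val)                AS pct_agg,  -- default estimator
        uddsketch(100, 0.01, val)          AS udd,      -- 100 buckets, 1% target max error
        tdigest(100, val)                  AS td        -- 100 buckets
    FROM samples
    GROUP BY bucket
)
-- Combine the 24 hourly estimators into one estimator per method for the whole day.
SELECT
    rollup(pct_agg) AS daily_pct_agg,
    rollup(udd)     AS daily_udd,
    rollup(td)      AS daily_td
FROM hourly;
```

The results are estimator objects rather than percentile values, so they are not meant to be read directly.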

Once the estimator is created, the desired values can be obtained by passing the aggregate result to a value function, such as approx_percentile() or approx_percentile_rank().
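As a sketch of the second step, the query below builds an estimator over synthetic values from 1 to 1000 and then reads values from it with the Toolkit's approx_percentile() and approx_percentile_rank() functions. The samples data and the chosen percentiles are assumptions for illustration.

```sql
-- Synthetic input (an assumption for illustration): the values 1 through 1000.
WITH samples(val) AS (
    SELECT i::DOUBLE PRECISION FROM generate_series(1, 1000) AS i
),
-- Step 1: create the estimator.
estimator AS (
    SELECT percentile_agg(val) AS pct_agg FROM samples
)
-- Step 2: extract values from the estimator.
SELECT
    approx_percentile(0.50, pct_agg)       AS p50,         -- estimated median
    approx_percentile(0.99, pct_agg)       AS p99,         -- estimated 99th percentile
    approx_percentile_rank(200.0, pct_agg) AS rank_of_200  -- estimated fraction of values at or below 200
FROM estimator;
```

Because these are approximations, the results should be close to, but not necessarily identical to, what an exact calculation returns.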

Additionally, the output of the aggregation methods can be stored as part of a continuous aggregate, then re-aggregated and queried later using the value functions above.
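A minimal sketch of this pattern follows. The response_times table, its columns, and the response_times_hourly view are hypothetical names chosen for the example, and the hourly and daily bucket widths are arbitrary.

```sql
-- Hypothetical hypertable; table and column names are assumptions for illustration.
CREATE TABLE response_times (
    ts            TIMESTAMPTZ      NOT NULL,
    response_time DOUBLE PRECISION
);
SELECT create_hypertable('response_times', 'ts');

-- Store one percentile estimator per hour in a continuous aggregate.
CREATE MATERIALIZED VIEW response_times_hourly
WITH (timescaledb.continuous) AS
SELECT
    time_bucket(INTERVAL '1 hour', ts) AS bucket,
    percentile_agg(response_time)      AS pct_agg
FROM response_times
GROUP BY bucket;

-- Re-aggregate the stored hourly estimators into daily 95th percentiles.
SELECT
    time_bucket(INTERVAL '1 day', bucket)    AS day,
    approx_percentile(0.95, rollup(pct_agg)) AS p95
FROM response_times_hourly
GROUP BY day
ORDER BY day;
```

Storing the estimator rather than a finished percentile keeps the choice of percentile and of coarser time buckets open at query time.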