Data retention
An intrinsic property of time-series data is that new data keeps accumulating, while old data is rarely, if ever, updated, and its relevance diminishes over time. It is therefore often desirable to delete old data to save disk space.
As an example, suppose you have a hypertable named conditions that collects raw data into chunks of one day:
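```sql
-- The TimescaleDB extension must be installed in the database
CREATE EXTENSION IF NOT EXISTS timescaledb;

-- Plain PostgreSQL table for raw device readings
CREATE TABLE conditions(
  time TIMESTAMPTZ NOT NULL,
  device INTEGER,
  temperature FLOAT
);

-- Convert it into a hypertable partitioned on time, one chunk per day
SELECT * FROM create_hypertable('conditions', 'time',
  chunk_time_interval => INTERVAL '1 day');
```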
If you collect a lot of data and realize that you never actually use raw data older than 30 days, you might want to delete data older than 30 days from conditions.
However, deleting large swaths of data from tables can be costly and slow if done row by row using the standard DELETE command, because each row is removed individually and the freed space must later be reclaimed by VACUUM. Instead, TimescaleDB provides a function, drop_chunks, that quickly drops data at the granularity of whole chunks without incurring the same overhead.
For example:
```sql
-- Drop every chunk of conditions whose data is entirely older than 24 hours
SELECT drop_chunks('conditions', INTERVAL '24 hours');
```
This drops all chunks from the hypertable conditions that contain only data older than this duration, measured back from the current time. It does not delete individual rows: a chunk that still holds any newer data is kept whole.
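A cautious workflow is to preview the affected chunks before dropping them. The sketch below assumes TimescaleDB 2.x, where show_chunks and drop_chunks both accept a named older_than argument, and reuses the conditions hypertable and the 30-day threshold from the scenario above:

```sql
-- Assumes TimescaleDB 2.x and the conditions hypertable created above.
-- Preview: list the chunks whose data is entirely older than 30 days.
-- Nothing is deleted by this query.
SELECT show_chunks('conditions', older_than => INTERVAL '30 days');

-- Drop those chunks; the function returns the name of each dropped chunk.
SELECT drop_chunks('conditions', older_than => INTERVAL '30 days');
```

Because both functions take the same arguments, the preview lists the same chunks that the drop then removes.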
Automatic data retention policies
TimescaleDB also includes a background job scheduling framework for automating data management tasks such as data retention. With a retention policy, you can set a data retention standard on each hypertable and let TimescaleDB drop old data automatically as necessary.
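As a sketch, assuming TimescaleDB 2.x, the 30-day scenario above could be automated with add_retention_policy, which registers a background job that periodically drops chunks whose data is entirely older than the given interval. The query against the timescaledb_information.jobs view only inspects the job that was created:

```sql
-- Assumes TimescaleDB 2.x and the conditions hypertable created above.
-- Register a retention job for conditions; returns the job id.
-- if_not_exists avoids an error if a policy is already present.
SELECT add_retention_policy('conditions', INTERVAL '30 days',
                            if_not_exists => true);

-- Inspect the scheduled retention job and its configuration.
SELECT job_id, schedule_interval, config
FROM timescaledb_information.jobs
WHERE proc_name = 'policy_retention'
  AND hypertable_name = 'conditions';

-- To stop automatic retention later:
-- SELECT remove_retention_policy('conditions');
```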
It’s worth noting that continuous aggregates are also valid targets for retention policies.
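For example, assuming TimescaleDB 2.x, a daily rollup of conditions could carry its own retention policy, separate from the one on the raw hypertable. The view name conditions_daily, its columns, and the 365-day window are hypothetical choices for illustration:

```sql
-- Assumes TimescaleDB 2.x and the conditions hypertable created above.
-- conditions_daily is a hypothetical continuous aggregate of daily averages.
CREATE MATERIALIZED VIEW conditions_daily
WITH (timescaledb.continuous) AS
SELECT time_bucket(INTERVAL '1 day', time) AS bucket,
       device,
       avg(temperature) AS avg_temperature
FROM conditions
GROUP BY bucket, device
WITH NO DATA;  -- created empty; populate later with a refresh

-- Drop materialized aggregate chunks older than one year (illustrative value).
SELECT add_retention_policy('conditions_daily', INTERVAL '365 days');
```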