Get faster DISTINCT
queries with SkipScan
SkipScan improves query times for DISTINCT
queries. It works on PostgreSQL tables, TimescaleDB hypertables, and TimescaleDB distributed hypertables. SkipScan is included in TimescaleDB 2.2.1 and above.
Speed up DISTINCT
queries
To query your database and find the most recent value of an item, you could use a DISTINCT
query. For example, you might want to find the latest stock or cryptocurrency price for each of your investments. Or you might have graphs and alarms that repeatedly query the most recent values for every device or service.
As your tables get larger, DISTINCT
queries tend to get slower. This is because PostgreSQL does not currently have a good mechanism for pulling a list of unique values from an ordered index. Even when you have an index that matches the exact order and columns for these kinds of queries, PostgreSQL scans the entire index to find all unique values. As a table grows, this operation keeps getting slower.
SkipScan allows queries to incrementally jump from one ordered value to the next without reading all of the rows in between. Without support for this feature, the database engine has to scan the entire ordered index and then de-duplicate at the end, which is a much slower process.
note
PostgreSQL has plans to implement a native feature like SkipScan, but it is unlikely to be included until at least PostgreSQL 15. This section documents TimescaleDB SkipScan, which is not a native PostgreSQL feature.
SkipScan is an optimization for queries of the form SELECT DISTINCT ON column_name
. Conceptually, SkipScan is a regular IndexScan that skips across an index looking for the next value that is greater than the current value.
When you issue a query that uses SkipScan, the EXPLAIN
output includes a new operator, or node, that can quickly return distinct items from a properly ordered index. With an IndexOnly scan, PostgreSQL has to scan the entire index, but SkipScan incrementally searches for each successive item in the ordered index. As it locates one item, the SkipScan node quickly restarts the search for the next item. This is a much more efficient way of finding distinct items in an ordered index.
For benchmarking information on how SkipScan compares to regular DISTINCT
queries, see our SkipScan blog post.
Use SkipScan queries
SkipScan is included in TimescaleDB 2.2.1 and above. This section describes how to set up your database index and query to use a SkipScan node.
Your index must:
- Contain the
DISTINCT
column as the first column. - Be a
BTREE
index. - Match the
ORDER BY
used in your query.
Your query must:
- Use the
DISTINCT
keyword on a single column.
If the DISTINCT
column is not the first column of the index, ensure any leading columns are used as constraints in your query. This means that if you are asking a question such as “retrieve a list of unique IDs in order” and “retrieve the last reading of each ID,” you need at least one index like this:
CREATE INDEX "cpu_customer_tags_id_time_idx" \
ON readings (customer_id, tags_id, time DESC)
With your index set up correctly, you should start to see immediate benefit for DISTINCT
queries. When SkipScan is chosen for your query, the EXPLAIN ANALYZE
output shows one or more Custom Scan (SkipScan)
nodes, like this:
-> Unique
-> Merge Append
Sort Key: _hyper_8_79_chunk.tags_id, _hyper_8_79_chunk."time" DESC
-> Custom Scan (SkipScan) on _hyper_8_79_chunk
-> Index Only Scan using _hyper_8_79_chunk_cpu_tags_id_time_idx on _hyper_8_79_chunk
Index Cond: (tags_id > NULL::integer)
-> Custom Scan (SkipScan) on _hyper_8_80_chunk
-> Index Only Scan using _hyper_8_80_chunk_cpu_tags_id_time_idx on _hyper_8_80_chunk
Index Cond: (tags_id > NULL::integer)