RANGE QUERY

Querying and aggregating data within a range of time is a common query pattern for time series data, such as the Range selector in PromQL. GreptimeDB supports Range queries in SQL, which is used to summarize time series data into time chunks and aggregate data on time chunks. As part of the SELECT statement, Range query can be flexibly combined with SQL to provide more powerful time series data query capabilities in SQL.

Syntax

Range query uses Time Index column as the timeline basis for aggregation. A legal Range query syntax structure is as follows:

  1. SELECT
  2. AGGR_FUNCTION(column1, column2,..) RANGE INTERVAL [FILL FILL_OPTION],
  3. ...
  4. FROM table_name
  5. [ WHERE <where_clause>]
  6. ALIGN INTERVAL [ TO TO_OPTION ] [BY (columna, columnb,..)] [FILL FILL_OPTION]
  7. [ ORDER BY <order_by_clause>]
  8. [ LIMIT <limit_clause>];
  9. INTERVAL := TIME_INTERVAL | ( INTERVAL expr )
  • Keyword ALIGN, required field, followed by parameter INTERVAL, ALIGN specifies the step of Range query.
    • Subkeyword TO, optional field, specifies the time point to which Range query is aligned. For legal TO_OPTION parameters, see TO Option.
    • Subkeyword BY, optional field, followed by parameter (columna, columnb,..), describes the aggregate key. For legal BY_OPTION parameters, see BY Option.
  • The parameter INTERVAL is mainly used to give the length of a period of time. There are two parameter forms:
    • Strings based on the PromQL Time Durations format (eg: 3h, 1h30m). Visit the Prometheus documentation for a more detailed description of this format.
    • Interval type. To use the Interval type, you need to carry parentheses, (for example: (INTERVAL '1 year 3 hours 20 minutes')). Visit Interval for a more detailed description of this format.
  • AGGR_FUNCTION(column1, column2,..) RANGE INTERVAL [FILL FILL_OPTION] is called a Range expression.
    • AGGR_FUNCTION(column1, column2,..) is an aggregate function that represents the expression that needs to be aggregated.
    • Keyword RANGE, required field, followed by parameter INTERVAL specifies the time range of each data aggregation.
    • Keyword FILL, optional field, please see FILL Option for details.
    • Range expressions can be combined with other operations to implement more complex queries. For details, see Nested Range Expressions.
  • FILL keyword after ALIGN, optional field. See FILL Option for details.

FILL Option

The FILL option specifies the data filling method when there is no data in an aggregated time slot, or the value of the aggregate field is empty.

The FILL keyword can be used after the RANGE keyword to indicate the filling method for the Range expression. The FILL keyword can also be used after the ALIGN keyword to specify the default filling method for a Range expression if no fill option is provided.

For example, in the following SQL code, the max(val) RANGE '10s' Range expression uses the fill option LINEAR, while the min(val) RANGE '10s' Range expression, which does not specify a fill option, uses the fill option PREV specified after the ALIGN keyword.

  1. SELECT
  2. ts,
  3. host,
  4. min(val) RANGE '10s',
  5. max(val) RANGE '10s' FILL LINEAR
  6. FROM host_cpu
  7. ALIGN '5s' BY (host) FILL PREV;

FILL has the following filling methods:

FILLDESCRIPTION
NULLFill directly with NULL
PREVFill with data from previous point
LINEARUse linear interpolation to fill the data. If an integer type is filled with LINEAR, the variable type of the column will be implicitly converted to a floating point type during calculation
XFill in a constant, the data type of the constant must be consistent with the variable type of the Range expression

Take the following table as an example:

  1. +---------------------+-------+------+
  2. | ts | host | val |
  3. +---------------------+-------+------+
  4. | 1970-01-01 00:00:00 | host1 | 0 |
  5. | 1970-01-01 00:00:15 | host1 | 6 |
  6. | 1970-01-01 00:00:00 | host2 | 6 |
  7. | 1970-01-01 00:00:15 | host2 | 12 |
  8. +---------------------+-------+------+

The result of each FILL option is as follows:

  • NO FILL
  • FILL NULL
  • FILL PREV
  • FILL LINEAR
  • FILL Constant Value 6.0
  1. > SELECT ts, host, min(val) RANGE '5s' FROM host ALIGN '5s';
  2. +---------------------+-------+------------------------+
  3. | ts | host | MIN(host.val) RANGE 5s |
  4. +---------------------+-------+------------------------+
  5. | 1970-01-01 00:00:00 | host1 | 0 |
  6. | 1970-01-01 00:00:15 | host1 | 6 |
  7. | 1970-01-01 00:00:00 | host2 | 6 |
  8. | 1970-01-01 00:00:15 | host2 | 12 |
  9. +---------------------+-------+------------------------+
  1. > SELECT ts, host, min(val) RANGE '5s' FILL NULL FROM host ALIGN '5s';
  2. +---------------------+-------+----------------------------------+
  3. | ts | host | MIN(host.val) RANGE 5s FILL NULL |
  4. +---------------------+-------+----------------------------------+
  5. | 1970-01-01 00:00:00 | host1 | 0 |
  6. | 1970-01-01 00:00:05 | host1 | NULL |
  7. | 1970-01-01 00:00:10 | host1 | NULL |
  8. | 1970-01-01 00:00:15 | host1 | 6 |
  9. | 1970-01-01 00:00:00 | host2 | 6 |
  10. | 1970-01-01 00:00:05 | host2 | NULL |
  11. | 1970-01-01 00:00:10 | host2 | NULL |
  12. | 1970-01-01 00:00:15 | host2 | 12 |
  13. +---------------------+-------+----------------------------------+
  1. > SELECT ts, host, min(val) RANGE '5s' FILL PREV FROM host ALIGN '5s';
  2. +---------------------+-------+----------------------------------+
  3. | ts | host | MIN(host.val) RANGE 5s FILL PREV |
  4. +---------------------+-------+----------------------------------+
  5. | 1970-01-01 00:00:00 | host1 | 0 |
  6. | 1970-01-01 00:00:05 | host1 | 0 |
  7. | 1970-01-01 00:00:10 | host1 | 0 |
  8. | 1970-01-01 00:00:15 | host1 | 6 |
  9. | 1970-01-01 00:00:00 | host2 | 6 |
  10. | 1970-01-01 00:00:05 | host2 | 6 |
  11. | 1970-01-01 00:00:10 | host2 | 6 |
  12. | 1970-01-01 00:00:15 | host2 | 12 |
  13. +---------------------+-------+----------------------------------+
  1. > SELECT ts, host, min(val) RANGE '5s' FILL LINEAR FROM host ALIGN '5s';
  2. +---------------------+-------+------------------------------------+
  3. | ts | host | MIN(host.val) RANGE 5s FILL LINEAR |
  4. +---------------------+-------+------------------------------------+
  5. | 1970-01-01 00:00:00 | host1 | 0 |
  6. | 1970-01-01 00:00:05 | host1 | 2 |
  7. | 1970-01-01 00:00:10 | host1 | 4 |
  8. | 1970-01-01 00:00:15 | host1 | 6 |
  9. | 1970-01-01 00:00:00 | host2 | 6 |
  10. | 1970-01-01 00:00:05 | host2 | 8 |
  11. | 1970-01-01 00:00:10 | host2 | 10 |
  12. | 1970-01-01 00:00:15 | host2 | 12 |
  13. +---------------------+-------+------------------------------------+
  1. > SELECT ts, host, min(val) RANGE '5s' FILL 6 FROM host ALIGN '5s';
  2. +---------------------+-------+-------------------------------+
  3. | ts | host | MIN(host.val) RANGE 5s FILL 6 |
  4. +---------------------+-------+-------------------------------+
  5. | 1970-01-01 00:00:00 | host1 | 0 |
  6. | 1970-01-01 00:00:05 | host1 | 6 |
  7. | 1970-01-01 00:00:10 | host1 | 6 |
  8. | 1970-01-01 00:00:15 | host1 | 6 |
  9. | 1970-01-01 00:00:00 | host2 | 6 |
  10. | 1970-01-01 00:00:05 | host2 | 6 |
  11. | 1970-01-01 00:00:10 | host2 | 6 |
  12. | 1970-01-01 00:00:15 | host2 | 12 |
  13. +---------------------+-------+-------------------------------+

If there are multiple Range expressions but only one of them specifies the Fill option, the other Range expressions will use the FILL NULL method to fill in the missing time slots. The following two SQL statements are equivalent in output:

  1. SELECT
  2. ts,
  3. host,
  4. min(val) RANGE '10s',
  5. max(val) RANGE '10s' FILL LINEAR
  6. FROM host_cpu
  7. ALIGN '5s';
  1. SELECT
  2. ts,
  3. host,
  4. min(val) RANGE '10s' FILL NULL,
  5. max(val) RANGE '10s' FILL LINEAR
  6. FROM host_cpu
  7. ALIGN '5s';

TO Option

The TO keyword specifies the origin time point to which the range query is aligned. TO option along with RANGE option and ALIGN INTERVAL determine the time range windows. Please refer to Time Range Window for details.

The default value of the TO option is the current query client timezone. To set the timezone, please refer to MySQL client or PostgreSQL client. Other valid TO options are:

TODESCRIPTION
NOWAlign to current query time
TimestampAlign to a user-specified timestamp, supports timestamp format RFC3339 / ISO8601

Suppose we have a tale host with the following data:

  1. +---------------------+-------+------+
  2. | ts | host | val |
  3. +---------------------+-------+------+
  4. | 2023-01-01 23:00:00 | host1 | 0 |
  5. | 2023-01-02 01:00:00 | host1 | 1 |
  6. | 2023-01-01 23:00:00 | host2 | 2 |
  7. | 2023-01-02 01:00:00 | host2 | 3 |
  8. +---------------------+-------+------+

The query results by each TO options shown below:

  • Default to timezone
  • NOW
  • Specific Timestamp
  1. -- Querying the database timezone using the MySQL protocol, currently in the UTC timezone
  2. > SELECT @@time_zone;
  3. +-------------+
  4. | @@time_zone |
  5. +-------------+
  6. | UTC |
  7. +-------------+
  8. -- If we do not specify the `TO` keyword,
  9. -- the timezone will be used as the default origin alignment time.
  10. > SELECT ts, host, min(val) RANGE '1d' FROM host ALIGN '1d';
  11. +---------------------+-------+----------------------------------+
  12. | ts | host | MIN(host.val) RANGE 1d FILL NULL |
  13. +---------------------+-------+----------------------------------+
  14. | 2023-01-01 00:00:00 | host2 | 2 |
  15. | 2023-01-01 00:00:00 | host1 | 0 |
  16. | 2023-01-02 00:00:00 | host2 | 3 |
  17. | 2023-01-02 00:00:00 | host1 | 1 |
  18. +---------------------+-------+----------------------------------+
  1. -- If you want to align the origin time to the current time,
  2. -- use the `NOW` keyword.
  3. -- Assume that the current query time is `2023-01-02T09:16:40.503000`.
  4. > SELECT ts, host, min(val) RANGE '1d' FROM host ALIGN '1d' TO NOW;
  5. +----------------------------+-------+----------------------------------+
  6. | ts | host | MIN(host.val) RANGE 1d FILL NULL |
  7. +----------------------------+-------+----------------------------------+
  8. | 2023-01-01 09:16:40.503000 | host2 | 2 |
  9. | 2023-01-01 09:16:40.503000 | host1 | 0 |
  10. +----------------------------+-------+----------------------------------+
  1. -- If you want to align the origin time to a specific timestamp,
  2. -- for example, "+08:00" Beijing time on December 1, 2023,
  3. -- you can set the `TO` option to the specific timestamp '2023-01-01T00:00:00+08:00'.
  4. SELECT ts, host, min(val) RANGE '1d' FROM host ALIGN '1d' TO '2023-01-01T00:00:00+08:00';
  5. +---------------------+-------+----------------------------------+
  6. | ts | host | MIN(host.val) RANGE 1d FILL NULL |
  7. +---------------------+-------+----------------------------------+
  8. | 2023-01-01 16:00:00 | host2 | 2 |
  9. | 2023-01-01 16:00:00 | host1 | 0 |
  10. +---------------------+-------+----------------------------------+

If you want to query data for a specific time range, you can specify the timestamp using the TO keyword. For example, to query the daily minimum value of val between 00:45 and 06:45, you can use 2023-01-01T00:45:00 as the TO option along with a 6h range.

  1. SELECT ts, host, min(val) RANGE '6h' FROM host ALIGN '1d' TO '2023-01-01T00:45:00';
  1. +---------------------+-------+----------------------------------+
  2. | ts | host | MIN(host.val) RANGE 6h FILL NULL |
  3. +---------------------+-------+----------------------------------+
  4. | 2023-01-02 00:45:00 | host1 | 1 |
  5. | 2023-01-02 00:45:00 | host2 | 3 |
  6. +---------------------+-------+----------------------------------+

BY Option

BY option describes the aggregate key. If this field is not given, the primary key of the table is used as the aggregate key by default. If the table does not specify a primary key, the BY keyword cannot be omitted.

Suppose we have a tale host with the following data:

  1. +---------------------+-------+------+
  2. | ts | host | val |
  3. +---------------------+-------+------+
  4. | 2023-01-01 23:00:00 | host1 | 0 |
  5. | 2023-01-02 01:00:00 | host1 | 1 |
  6. | 2023-01-01 23:00:00 | host2 | 2 |
  7. | 2023-01-02 01:00:00 | host2 | 3 |
  8. +---------------------+-------+------+

The following SQL uses host as the aggregate key:

  1. SELECT
  2. ts,
  3. host,
  4. min(val) RANGE '10s'
  5. FROM host ALIGN '5s' BY (host);

You can also use the BY keyword to declare other columns as the basis for data aggregation. For example, the following RANGE query uses the string length length(host) of the host column as the basis for data aggregation.

  1. SELECT
  2. ts,
  3. length(host),
  4. min(val) RANGE '10s'
  5. FROM host ALIGN '5s' BY (length(host));

Get after running

  1. +---------------------+-----------------------------+-----------------------------------+
  2. | ts | character_length(host.host) | MIN(host.val) RANGE 10s FILL NULL |
  3. +---------------------+-----------------------------+-----------------------------------+
  4. | 2023-01-01 22:59:55 | 5 | 0 |
  5. | 2023-01-01 23:00:00 | 5 | 0 |
  6. | 2023-01-02 00:59:55 | 5 | 1 |
  7. | 2023-01-02 01:00:00 | 5 | 1 |
  8. +---------------------+-----------------------------+-----------------------------------+

You can explicitly use BY (), which means you do not need to use aggregation keys and aggregate all data into a group. However, if you omit the BY keyword directly, it means using the primary key of the table as the aggregate key.

  1. SELECT
  2. ts,
  3. min(val) RANGE '10s'
  4. FROM host ALIGN '5s' BY ();

Get after running

  1. +---------------------+-----------------------------------+
  2. | ts | MIN(host.val) RANGE 10s FILL NULL |
  3. +---------------------+-----------------------------------+
  4. | 2023-01-01 22:59:55 | 0 |
  5. | 2023-01-01 23:00:00 | 0 |
  6. | 2023-01-02 00:59:55 | 1 |
  7. | 2023-01-02 01:00:00 | 1 |
  8. +---------------------+-----------------------------------+

ORDER BY option in aggregate functions

Range queries support the use of order by expressions in the parameters of the first_value and last_value aggregate functions. By default, the data is sorted in ascending order based on the time index column.

Take this table as an example:

  1. +---------------------+-------+------+-------+
  2. | ts | host | val | addon |
  3. +---------------------+-------+------+-------+
  4. | 1970-01-01 00:00:00 | host1 | 0 | 3 |
  5. | 1970-01-01 00:00:01 | host1 | 1 | 2 |
  6. | 1970-01-01 00:00:02 | host1 | 2 | 1 |
  7. +---------------------+-------+------+-------+

When the order by expression is not specified in the function parameter, the default behavior is to sort the data in ascending order based on the time index column.

  1. SELECT ts, first_value(val) RANGE '5s', last_value(val) RANGE '5s' FROM host ALIGN '5s';
  2. -- Equivalent to
  3. SELECT ts, first_value(val order by ts ASC) RANGE '5s', last_value(val order by ts ASC) RANGE '5s' FROM host ALIGN '5s';

Get after query

  1. +---------------------+--------------------------------+-------------------------------+
  2. | ts | FIRST_VALUE(host.val) RANGE 5s | LAST_VALUE(host.val) RANGE 5s |
  3. +---------------------+--------------------------------+-------------------------------+
  4. | 1970-01-01 00:00:00 | 0 | 2 |
  5. +---------------------+--------------------------------+-------------------------------+

You can specify your own sorting rules. For example, the following SQL sorts the data by the addon column in ascending order:

  1. SELECT ts, first_value(val ORDER BY addon ASC) RANGE '5s', last_value(val ORDER BY addon ASC) RANGE '5s' FROM host ALIGN '5s';

Get after query

  1. +---------------------+---------------------------------------------------------------------+--------------------------------------------------------------------+
  2. | ts | FIRST_VALUE(host.val) ORDER BY [host.addon ASC NULLS LAST] RANGE 5s | LAST_VALUE(host.val) ORDER BY [host.addon ASC NULLS LAST] RANGE 5s |
  3. +---------------------+---------------------------------------------------------------------+--------------------------------------------------------------------+
  4. | 1970-01-01 00:00:00 | 2 | 0 |
  5. +---------------------+---------------------------------------------------------------------+--------------------------------------------------------------------+

Nested Range Expressions

Range expressions support flexible nesting, and range expressions can be combined with various operations to provide more powerful query capabilities.

Take the following table as an example:

  1. +---------------------+-------+------+
  2. | ts | host | val |
  3. +---------------------+-------+------+
  4. | 2023-01-01 08:00:00 | host1 | 1.1 |
  5. | 2023-01-01 08:00:05 | host1 | 2.2 |
  6. | 2023-01-01 08:00:00 | host2 | 3.3 |
  7. | 2023-01-01 08:00:05 | host2 | 4.4 |
  8. +---------------------+-------+------+
  1. Aggregation functions support calculations both internally and externally:
  1. SELECT ts, host, 2.0 * min(val * 2.0) RANGE '10s' FROM host_cpu ALIGN '5s';

Get after running

  1. +---------------------+-------+-----------------------------------------------------------------+
  2. | ts | host | Float64(2) * MIN(host_cpu.val * Float64(2)) RANGE 10s FILL NULL |
  3. +---------------------+-------+-----------------------------------------------------------------+
  4. | 2023-01-01 07:59:55 | host1 | 4.4 |
  5. | 2023-01-01 07:59:55 | host2 | 13.2 |
  6. | 2023-01-01 08:00:00 | host1 | 4.4 |
  7. | 2023-01-01 08:00:00 | host2 | 13.2 |
  8. | 2023-01-01 08:00:05 | host1 | 8.8 |
  9. | 2023-01-01 08:00:05 | host2 | 17.6 |
  10. +---------------------+-------+-----------------------------------------------------------------+
  1. Scalar functions are supported both inside and outside aggregate functions:
    • min(round(val)) RANGE '10s' means that each value is rounded using the round function before aggregation
    • round(min(val) RANGE '10s') means rounding the result of each aggregation using the round function
  1. SELECT ts, host, min(round(val)) RANGE '10s' FROM host_cpu ALIGN '5s';

Get after running

  1. +---------------------+-------+----------------------------------------------+
  2. | ts | host | MIN(round(host_cpu.val)) RANGE 10s FILL NULL |
  3. +---------------------+-------+----------------------------------------------+
  4. | 2023-01-01 07:59:55 | host2 | 3 |
  5. | 2023-01-01 07:59:55 | host1 | 1 |
  6. | 2023-01-01 08:00:00 | host2 | 3 |
  7. | 2023-01-01 08:00:00 | host1 | 1 |
  8. | 2023-01-01 08:00:05 | host2 | 4 |
  9. | 2023-01-01 08:00:05 | host1 | 2 |
  10. +---------------------+-------+----------------------------------------------+
  1. SELECT ts, host, round(min(val) RANGE '10s') FROM host_cpu ALIGN '5s';

Get after running

  1. +---------------------+-------+----------------------------------------------+
  2. | ts | host | round(MIN(host_cpu.val) RANGE 10s FILL NULL) |
  3. +---------------------+-------+----------------------------------------------+
  4. | 2023-01-01 07:59:55 | host2 | 3 |
  5. | 2023-01-01 07:59:55 | host1 | 1 |
  6. | 2023-01-01 08:00:00 | host2 | 3 |
  7. | 2023-01-01 08:00:00 | host1 | 1 |
  8. | 2023-01-01 08:00:05 | host2 | 4 |
  9. | 2023-01-01 08:00:05 | host1 | 2 |
  10. +---------------------+-------+----------------------------------------------+
  1. Multiple Range expressions can also evaluate together, and Range expressions support the distributive law. The following two Range query are legal and equivalent:
  1. SELECT ts, host, max(val) RANGE '10s' - min(val) RANGE '10s' FROM host_cpu ALIGN '5s';
  2. SELECT ts, host, (max(val) - min(val)) RANGE '10s' FROM host_cpu ALIGN '5s';

Get after running

  1. +---------------------+-------+-------------------------------------------------------------------------------+
  2. | ts | host | MAX(host_cpu.val) RANGE 10s FILL NULL - MIN(host_cpu.val) RANGE 10s FILL NULL |
  3. +---------------------+-------+-------------------------------------------------------------------------------+
  4. | 2023-01-01 08:00:05 | host1 | 0 |
  5. | 2023-01-01 08:00:05 | host2 | 0 |
  6. | 2023-01-01 08:00:00 | host1 | 1.1 |
  7. | 2023-01-01 08:00:00 | host2 | 1.1 |
  8. | 2023-01-01 07:59:55 | host1 | 0 |
  9. | 2023-01-01 07:59:55 | host2 | 0 |
  10. +---------------------+-------+-------------------------------------------------------------------------------+

But note that the RANGE keyword apply to the expression before the RANGE keyword. The following Range query is illegal because the RANGE keyword apply to the expression 2.0, not the expression min(val * 2.0) * 2.0

  1. SELECT ts, host, min(val * 2.0) * 2.0 RANGE '10s' FROM host_cpu ALIGN '5s';
  2. ERROR 1815 (HY000): sql parser error: Can't use the RANGE keyword in Expr 2.0 without function

Expressions can be bracketed, and the RANGE keyword is automatically applied to any aggregate functions contained within the brackets:

  1. SELECT ts, host, (min(val * 2.0) * 2.0) RANGE '10s' FROM host_cpu ALIGN '5s';

After running, we get:

  1. +---------------------+-------+-----------------------------------------------------------------+
  2. | ts | host | MIN(host_cpu.val * Float64(2)) RANGE 10s FILL NULL * Float64(2) |
  3. +---------------------+-------+-----------------------------------------------------------------+
  4. | 2023-01-01 07:59:55 | host2 | 13.2 |
  5. | 2023-01-01 07:59:55 | host1 | 4.4 |
  6. | 2023-01-01 08:00:00 | host2 | 13.2 |
  7. | 2023-01-01 08:00:00 | host1 | 4.4 |
  8. | 2023-01-01 08:00:05 | host2 | 17.6 |
  9. | 2023-01-01 08:00:05 | host1 | 8.8 |
  10. +---------------------+-------+-----------------------------------------------------------------+

Nesting of Range expressions is not allowed. Nested Range queries are illegal:

  1. SELECT ts, host, max(min(val) RANGE '10s') RANGE '10s' FROM host_cpu ALIGN '5s';
  2. ERROR 1815 (HY000): Range Query: Nest Range Query is not allowed