RANGE QUERY

Querying and aggregating data within a range of time is a common query pattern for time series data, such as the Range selector in PromQL. GreptimeDB supports Range queries in SQL, which is used to summarize time series data into time chunks and aggregate data on time chunks. As part of the SELECT statement, Range query can be flexibly combined with SQL to provide more powerful time series data query capabilities in SQL.

Syntax

Range query uses Time Index column as the timeline basis for aggregation. A legal Range query syntax structure is as follows:

sql

  1. SELECT
  2. AGGR_FUNCTION(column1, column2,..) RANGE INTERVAL [FILL FILL_OPTION],
  3. ...
  4. FROM table_name
  5. ALIGN INTERVAL [ TO TO_OPTION ] [BY (columna, columnb,..)] [FILL FILL_OPTION];
  6. INTERVAL := TIME_INTERVAL | ( INTERVAL expr )
  • Keyword ALIGN, required field, followed by parameter INTERVAL, ALIGN specifies the step of Range query.
    • Subkeyword TO, optional field, specifies the time point to which Range query is aligned. For legal TO_OPTION parameters, see TO Option.
    • Subkeyword BY, optional field, followed by parameter (columna, columnb,..), describes the aggregate key. For legal BY_OPTION parameters, see BY Option.
  • The parameter INTERVAL is mainly used to give the length of a period of time. There are two parameter forms:
    • Strings based on the PromQL Time Durations format (eg: 3h, 1h30m). Visit the Prometheus documentation for a more detailed description of this format.
    • Interval type. To use the Interval type, you need to carry parentheses, (for example: (INTERVAL '1 year 3 hours 20 minutes')). Visit Interval for a more detailed description of this format.
  • AGGR_FUNCTION(column1, column2,..) RANGE INTERVAL [FILL FILL_OPTION] is called a Range expression.
    • AGGR_FUNCTION(column1, column2,..) is an aggregate function that represents the expression that needs to be aggregated.
    • Keyword RANGE, required field, followed by parameter INTERVAL specifies the time range of each data aggregation.
    • Keyword FILL, optional field, please see FILL Option for details.
    • Range expressions can be combined with other operations to implement more complex queries. For details, see Nested Range Expressions.
  • FILL keyword after ALIGN, optional field. See FILL Option for details.

FILL Option

FILL option specifies the data filling method when the aggregate field is empty.

The FILL keyword can be used after the RANGE keyword to indicate the filling method for the Range expression. The FILL keyword can also be used after the ALIGN keyword to specify the default filling method for a Range expression if no fill option is provided.

For example, in the following SQL code, the max(cpu) RANGE '10s' Range expression uses the fill option LINEAR, while the min(cpu) RANGE '10s' Range expression, which does not specify a fill option, uses the fill option PREV specified after the ALIGN keyword.

sql

  1. SELECT
  2. ts,
  3. host,
  4. min(cpu) RANGE '10s',
  5. max(cpu) RANGE '10s' FILL LINEAR
  6. FROM host_cpu
  7. ALIGN '5s' BY (host) FILL PREV;

FILL has the following filling methods:

FILLDESCRIPTION
NULLFill directly with NULL (default method)
PREVFill with data from previous point
LINEARUse linear interpolation to fill the data. If an integer type is filled with LINEAR, the variable type of the column will be implicitly converted to a floating point type during calculation
XFill in a constant, the data type of the constant must be consistent with the variable type of the Range expression

Take the following table as an example:

sql

  1. +---------------------+-------+------+
  2. | ts | host | cpu |
  3. +---------------------+-------+------+
  4. | 2023-01-01 08:00:00 | host1 | 4.5 |
  5. | 2023-01-01 08:00:05 | host1 | NULL |
  6. | 2023-01-01 08:00:10 | host1 | 6.5 |
  7. +---------------------+-------+------+

The result of each FILL option is as follows:

NULLPREVLINEARConstant Value 6.0

sql

  1. > SELECT ts, min(cpu) RANGE '5s' FILL NULL FROM host_cpu ALIGN '5s';
  2. +---------------------+--------------------------------------+
  3. | ts | MIN(host_cpu.cpu) RANGE 5s FILL NULL |
  4. +---------------------+--------------------------------------+
  5. | 2023-01-01 08:00:00 | 4.5 |
  6. | 2023-01-01 08:00:05 | NULL |
  7. | 2023-01-01 08:00:10 | 6.5 |
  8. +---------------------+--------------------------------------+

sql

  1. > SELECT ts, min(cpu) RANGE '5s' FILL PREV FROM host_cpu ALIGN '5s';
  2. +---------------------+--------------------------------------+
  3. | ts | MIN(host_cpu.cpu) RANGE 5s FILL PREV |
  4. +---------------------+--------------------------------------+
  5. | 2023-01-01 08:00:00 | 4.5 |
  6. | 2023-01-01 08:00:05 | 4.5 |
  7. | 2023-01-01 08:00:10 | 6.5 |
  8. +---------------------+--------------------------------------+

sql

  1. > SELECT ts, min(cpu) RANGE '5s' FILL LINEAR FROM host_cpu ALIGN '5s';
  2. +---------------------+----------------------------------------+
  3. | ts | MIN(host_cpu.cpu) RANGE 5s FILL LINEAR |
  4. +---------------------+----------------------------------------+
  5. | 2023-01-01 08:00:00 | 4.5 |
  6. | 2023-01-01 08:00:05 | 5.5 |
  7. | 2023-01-01 08:00:10 | 6.5 |
  8. +---------------------+----------------------------------------+

sql

  1. > SELECT ts, min(cpu) RANGE '5s' FILL 6.0 FROM host_cpu ALIGN '5s';
  2. +---------------------+-----------------------------------+
  3. | ts | MIN(host_cpu.cpu) RANGE 5s FILL 6 |
  4. +---------------------+-----------------------------------+
  5. | 2023-01-01 08:00:00 | 4.5 |
  6. | 2023-01-01 08:00:05 | 6 |
  7. | 2023-01-01 08:00:10 | 6.5 |
  8. +---------------------+-----------------------------------+

TO Option

The TO keyword specifies the origin time point to which the range query is aligned. TO option along with RANGE option and ALIGN INTERVAL determine the time range windows. Please refer to Time Range Window for details.

The default value of TO option is Unix time 0. Other valid TO options are:

TODESCRIPTION
NOWAlign to current query time
TimestampAlign to a user-specified timestamp, supports timestamp format RFC3339 / ISO8601

Suppose we have a tale host with the following data:

sql

  1. +---------------------+-------+------+
  2. | ts | host | val |
  3. +---------------------+-------+------+
  4. | 2023-01-01 23:00:00 | host1 | 0 |
  5. | 2023-01-02 01:00:00 | host1 | 1 |
  6. | 2023-01-01 23:00:00 | host2 | 2 |
  7. | 2023-01-02 01:00:00 | host2 | 3 |
  8. +---------------------+-------+------+

The query results by each TO options shown below:

Default Unix time 0NOWSpecific Timestamp

sql

  1. -- If we do not specify the `TO` keyword,
  2. -- the default value Unix time 0 will be used as the origin alignment time.
  3. > SELECT ts, host, min(val) RANGE '1d' FROM host ALIGN '1d';
  4. +---------------------+-------+----------------------------------+
  5. | ts | host | MIN(host.val) RANGE 1d FILL NULL |
  6. +---------------------+-------+----------------------------------+
  7. | 2023-01-01 00:00:00 | host2 | 2 |
  8. | 2023-01-01 00:00:00 | host1 | 0 |
  9. | 2023-01-02 00:00:00 | host2 | 3 |
  10. | 2023-01-02 00:00:00 | host1 | 1 |
  11. +---------------------+-------+----------------------------------+

sql

  1. -- If you want to align the origin time to the current time,
  2. -- use the `NOW` keyword.
  3. -- Assume that the current query time is `2023-01-02T09:16:40.503000`.
  4. > SELECT ts, host, min(val) RANGE '1d' FROM host ALIGN '1d' TO NOW;
  5. +----------------------------+-------+----------------------------------+
  6. | ts | host | MIN(host.val) RANGE 1d FILL NULL |
  7. +----------------------------+-------+----------------------------------+
  8. | 2023-01-01 09:16:40.503000 | host2 | 2 |
  9. | 2023-01-01 09:16:40.503000 | host1 | 0 |
  10. +----------------------------+-------+----------------------------------+

sql

  1. -- If you want to align the origin time to a specific timestamp,
  2. -- for example, "+08:00" Beijing time on December 1, 2023,
  3. -- you can set the `TO` option to the specific timestamp '2023-01-01T00:00:00+08:00'.
  4. SELECT ts, host, min(val) RANGE '1d' FROM host ALIGN '1d' TO '2023-01-01T00:00:00+08:00';
  5. +---------------------+-------+----------------------------------+
  6. | ts | host | MIN(host.val) RANGE 1d FILL NULL |
  7. +---------------------+-------+----------------------------------+
  8. | 2023-01-01 16:00:00 | host2 | 2 |
  9. | 2023-01-01 16:00:00 | host1 | 0 |
  10. +---------------------+-------+----------------------------------+

If you want to query data for a specific time range, you can specify the timestamp using the TO keyword. For example, to query the daily minimum value of val between 00:45 and 06:45, you can use 2023-01-01T00:45:00 as the TO option along with a 6h range.

sql

  1. SELECT ts, host, min(val) RANGE '6h' FROM host ALIGN '1d' TO '2023-01-01T00:45:00';

sql

  1. +---------------------+-------+----------------------------------+
  2. | ts | host | MIN(host.val) RANGE 6h FILL NULL |
  3. +---------------------+-------+----------------------------------+
  4. | 2023-01-02 00:45:00 | host1 | 1 |
  5. | 2023-01-02 00:45:00 | host2 | 3 |
  6. +---------------------+-------+----------------------------------+

BY Option

BY option describes the aggregate key. If this field is not given, the primary key of the table is used as the aggregate key by default. If the table does not specify a primary key, the BY keyword cannot be omitted.

Suppose we have a tale host with the following data:

sql

  1. +---------------------+-------+------+
  2. | ts | host | val |
  3. +---------------------+-------+------+
  4. | 2023-01-01 23:00:00 | host1 | 0 |
  5. | 2023-01-02 01:00:00 | host1 | 1 |
  6. | 2023-01-01 23:00:00 | host2 | 2 |
  7. | 2023-01-02 01:00:00 | host2 | 3 |
  8. +---------------------+-------+------+

The following SQL uses host as the aggragate key:

sql

  1. SELECT
  2. ts,
  3. host,
  4. min(val) RANGE '10s'
  5. FROM host ALIGN '5s' BY (host);

You can also use the BY keyword to declare other columns as the basis for data aggregation. For example, the following RANGE query uses the string length length(host) of the host column as the basis for data aggregation.

sql

  1. SELECT
  2. ts,
  3. length(host),
  4. min(val) RANGE '10s'
  5. FROM host ALIGN '5s' BY (length(host));

Get after running

sql

  1. +---------------------+-----------------------------+-----------------------------------+
  2. | ts | character_length(host.host) | MIN(host.val) RANGE 10s FILL NULL |
  3. +---------------------+-----------------------------+-----------------------------------+
  4. | 2023-01-01 22:59:55 | 5 | 0 |
  5. | 2023-01-01 23:00:00 | 5 | 0 |
  6. | 2023-01-02 00:59:55 | 5 | 1 |
  7. | 2023-01-02 01:00:00 | 5 | 1 |
  8. +---------------------+-----------------------------+-----------------------------------+

You can explicitly use BY (), which means you do not need to use aggregation keys and aggregate all data into a group. However, if you omit the BY keyword directly, it means using the primary key of the table as the aggregate key.

sql

  1. SELECT
  2. ts,
  3. min(val) RANGE '10s'
  4. FROM host ALIGN '5s' BY ();

Get after running

sql

  1. +---------------------+-----------------------------------+
  2. | ts | MIN(host.val) RANGE 10s FILL NULL |
  3. +---------------------+-----------------------------------+
  4. | 2023-01-01 22:59:55 | 0 |
  5. | 2023-01-01 23:00:00 | 0 |
  6. | 2023-01-02 00:59:55 | 1 |
  7. | 2023-01-02 01:00:00 | 1 |
  8. +---------------------+-----------------------------------+

Nested Range Expressions

Range expressions support flexible nesting, and range expressions can be combined with various operations to provide more powerful query capabilities.

Take the following table as an example:

sql

  1. +---------------------+-------+------+
  2. | ts | host | cpu |
  3. +---------------------+-------+------+
  4. | 2023-01-01 08:00:00 | host1 | 1.1 |
  5. | 2023-01-01 08:00:05 | host1 | 2.2 |
  6. | 2023-01-01 08:00:00 | host2 | 3.3 |
  7. | 2023-01-01 08:00:05 | host2 | 4.4 |
  8. +---------------------+-------+------+
  1. Aggregation functions support calculations both internally and externally:

sql

  1. SELECT ts, host, 2.0 * min(cpu * 2.0) RANGE '10s' FROM host_cpu ALIGN '5s';

Get after running

sql

  1. +---------------------+-------+-----------------------------------------------------------------+
  2. | ts | host | Float64(2) * MIN(host_cpu.cpu * Float64(2)) RANGE 10s FILL NULL |
  3. +---------------------+-------+-----------------------------------------------------------------+
  4. | 2023-01-01 07:59:55 | host1 | 4.4 |
  5. | 2023-01-01 07:59:55 | host2 | 13.2 |
  6. | 2023-01-01 08:00:00 | host1 | 4.4 |
  7. | 2023-01-01 08:00:00 | host2 | 13.2 |
  8. | 2023-01-01 08:00:05 | host1 | 8.8 |
  9. | 2023-01-01 08:00:05 | host2 | 17.6 |
  10. +---------------------+-------+-----------------------------------------------------------------+
  1. Scalar functions are supported both inside and outside aggregate functions:
    • min(round(cpu)) RANGE '10s' means that each value is rounded using the round function before aggregation
    • round(min(cpu) RANGE '10s') means rounding the result of each aggregation using the round function

sql

  1. SELECT ts, host, min(round(cpu)) RANGE '10s' FROM host_cpu ALIGN '5s';

Get after running

sql

  1. +---------------------+-------+----------------------------------------------+
  2. | ts | host | MIN(round(host_cpu.cpu)) RANGE 10s FILL NULL |
  3. +---------------------+-------+----------------------------------------------+
  4. | 2023-01-01 07:59:55 | host2 | 3 |
  5. | 2023-01-01 07:59:55 | host1 | 1 |
  6. | 2023-01-01 08:00:00 | host2 | 3 |
  7. | 2023-01-01 08:00:00 | host1 | 1 |
  8. | 2023-01-01 08:00:05 | host2 | 4 |
  9. | 2023-01-01 08:00:05 | host1 | 2 |
  10. +---------------------+-------+----------------------------------------------+

sql

  1. SELECT ts, host, round(min(cpu) RANGE '10s') FROM host_cpu ALIGN '5s';

Get after running

sql

  1. +---------------------+-------+----------------------------------------------+
  2. | ts | host | round(MIN(host_cpu.cpu) RANGE 10s FILL NULL) |
  3. +---------------------+-------+----------------------------------------------+
  4. | 2023-01-01 07:59:55 | host2 | 3 |
  5. | 2023-01-01 07:59:55 | host1 | 1 |
  6. | 2023-01-01 08:00:00 | host2 | 3 |
  7. | 2023-01-01 08:00:00 | host1 | 1 |
  8. | 2023-01-01 08:00:05 | host2 | 4 |
  9. | 2023-01-01 08:00:05 | host1 | 2 |
  10. +---------------------+-------+----------------------------------------------+
  1. Multiple Range expressions can also evaluate together, and Range expressions support the distributive law. The following two Range query are legal and equivalent:

sql

  1. SELECT ts, host, max(cpu) RANGE '10s' - min(cpu) RANGE '10s' FROM host_cpu ALIGN '5s';
  2. SELECT ts, host, (max(cpu) - min(cpu)) RANGE '10s' FROM host_cpu ALIGN '5s';

Get after running

sql

  1. +---------------------+-------+-------------------------------------------------------------------------------+
  2. | ts | host | MAX(host_cpu.cpu) RANGE 10s FILL NULL - MIN(host_cpu.cpu) RANGE 10s FILL NULL |
  3. +---------------------+-------+-------------------------------------------------------------------------------+
  4. | 2023-01-01 08:00:05 | host1 | 0 |
  5. | 2023-01-01 08:00:05 | host2 | 0 |
  6. | 2023-01-01 08:00:00 | host1 | 1.1 |
  7. | 2023-01-01 08:00:00 | host2 | 1.1 |
  8. | 2023-01-01 07:59:55 | host1 | 0 |
  9. | 2023-01-01 07:59:55 | host2 | 0 |
  10. +---------------------+-------+-------------------------------------------------------------------------------+

But note that the RANGE keyword apply to the expression before the RANGE keyword. The following Range query is illegal because the RANGE keyword apply to the expression 2.0, not the expression min(cpu * 2.0) * 2.0

sql

  1. SELECT ts, host, min(cpu * 2.0) * 2.0 RANGE '10s' FROM host_cpu ALIGN '5s';
  2. ERROR 1815 (HY000): sql parser error: Can't use the RANGE keyword in Expr 2.0 without function

Expressions can be bracketed, and the RANGE keyword is automatically applied to any aggregate functions contained within the brackets:

sql

  1. SELECT ts, host, (min(cpu * 2.0) * 2.0) RANGE '10s' FROM host_cpu ALIGN '5s';

After running, we get:

sql

  1. +---------------------+-------+-----------------------------------------------------------------+
  2. | ts | host | MIN(host_cpu.cpu * Float64(2)) RANGE 10s FILL NULL * Float64(2) |
  3. +---------------------+-------+-----------------------------------------------------------------+
  4. | 2023-01-01 07:59:55 | host2 | 13.2 |
  5. | 2023-01-01 07:59:55 | host1 | 4.4 |
  6. | 2023-01-01 08:00:00 | host2 | 13.2 |
  7. | 2023-01-01 08:00:00 | host1 | 4.4 |
  8. | 2023-01-01 08:00:05 | host2 | 17.6 |
  9. | 2023-01-01 08:00:05 | host1 | 8.8 |
  10. +---------------------+-------+-----------------------------------------------------------------+

Nesting of Range expressions is not allowed. Nested Range queries are illegal:

sql

  1. SELECT ts, host, max(min(cpu) RANGE '10s') RANGE '10s' FROM host_cpu ALIGN '5s';
  2. ERROR 1815 (HY000): Range Query: Nest Range Query is not allowed