Select
Syntax
SELECT select_expr [, select_expr ...]
FROM {tb_name_list}
[WHERE where_condition]
[SESSION(ts_col, tol_val)]
[STATE_WINDOW(col)]
[INTERVAL(interval_val [, interval_offset]) [SLIDING sliding_val]]
[FILL(fill_mod_and_val)]
[GROUP BY col_list]
[ORDER BY col_list { DESC | ASC }]
[SLIMIT limit_val [SOFFSET offset_val]]
[LIMIT limit_val [OFFSET offset_val]]
[>> export_file];
Wildcard
Wildcard * can be used to specify all columns. The result includes only data columns for normal tables.
taos> SELECT * FROM d1001;
ts | current | voltage | phase |
======================================================================================
2018-10-03 14:38:05.000 | 10.30000 | 219 | 0.31000 |
2018-10-03 14:38:15.000 | 12.60000 | 218 | 0.33000 |
2018-10-03 14:38:16.800 | 12.30000 | 221 | 0.31000 |
Query OK, 3 row(s) in set (0.001165s)
The result includes both data columns and tag columns for super table.
taos> SELECT * FROM meters;
ts | current | voltage | phase | location | groupid |
=====================================================================================================================================
2018-10-03 14:38:05.500 | 11.80000 | 221 | 0.28000 | California.LoSangeles | 2 |
2018-10-03 14:38:16.600 | 13.40000 | 223 | 0.29000 | California.LoSangeles | 2 |
2018-10-03 14:38:05.000 | 10.80000 | 223 | 0.29000 | California.LoSangeles | 3 |
2018-10-03 14:38:06.500 | 11.50000 | 221 | 0.35000 | California.LoSangeles | 3 |
2018-10-03 14:38:04.000 | 10.20000 | 220 | 0.23000 | California.SanFrancisco | 3 |
2018-10-03 14:38:16.650 | 10.30000 | 218 | 0.25000 | California.SanFrancisco | 3 |
2018-10-03 14:38:05.000 | 10.30000 | 219 | 0.31000 | California.SanFrancisco | 2 |
2018-10-03 14:38:15.000 | 12.60000 | 218 | 0.33000 | California.SanFrancisco | 2 |
2018-10-03 14:38:16.800 | 12.30000 | 221 | 0.31000 | California.SanFrancisco | 2 |
Query OK, 9 row(s) in set (0.002022s)
Wildcard can be used with table name as prefix. Both SQL statements below have the same effect and return all columns.
SELECT * FROM d1001;
SELECT d1001.* FROM d1001;
In a JOIN query, however, the results are different with or without a table name prefix. * without table prefix will return all the columns of both tables, but * with table name as prefix will return only the columns of that table.
taos> SELECT * FROM d1001, d1003 WHERE d1001.ts=d1003.ts;
ts | current | voltage | phase | ts | current | voltage | phase |
==================================================================================================================================
2018-10-03 14:38:05.000 | 10.30000| 219 | 0.31000 | 2018-10-03 14:38:05.000 | 10.80000| 223 | 0.29000 |
Query OK, 1 row(s) in set (0.017385s)
taos> SELECT d1001.* FROM d1001,d1003 WHERE d1001.ts = d1003.ts;
ts | current | voltage | phase |
======================================================================================
2018-10-03 14:38:05.000 | 10.30000 | 219 | 0.31000 |
Query OK, 1 row(s) in set (0.020443s)
Wildcard * can be used with some functions, but the result may be different depending on the function being used. For example, count(*)
returns only one column, i.e. the number of rows; first
, last
and last_row
return all columns of the selected row.
taos> SELECT COUNT(*) FROM d1001;
count(*) |
========================
3 |
Query OK, 1 row(s) in set (0.001035s)
taos> SELECT FIRST(*) FROM d1001;
first(ts) | first(current) | first(voltage) | first(phase) |
=========================================================================================
2018-10-03 14:38:05.000 | 10.30000 | 219 | 0.31000 |
Query OK, 1 row(s) in set (0.000849s)
Tags
Starting from version 2.0.14, tag columns can be selected together with data columns when querying sub tables. Please note however, that, wildcard * cannot be used to represent any tag column. This means that tag columns must be specified explicitly like the example below.
taos> SELECT location, groupid, current FROM d1001 LIMIT 2;
location | groupid | current |
======================================================================
California.SanFrancisco | 2 | 10.30000 |
California.SanFrancisco | 2 | 12.60000 |
Query OK, 2 row(s) in set (0.003112s)
Get distinct values
DISTINCT
keyword can be used to get all the unique values of tag columns from a super table. It can also be used to get all the unique values of data columns from a table or subtable.
SELECT DISTINCT tag_name [, tag_name ...] FROM stb_name;
SELECT DISTINCT col_name [, col_name ...] FROM tb_name;
info
- Configuration parameter
maxNumOfDistinctRes
intaos.cfg
is used to control the number of rows to output. The minimum configurable value is 100,000, the maximum configurable value is 100,000,000, the default value is 1,000,000. If the actual number of rows exceeds the value of this parameter, only the number of rows specified by this parameter will be output. - It can’t be guaranteed that the results selected by using
DISTINCT
on columns ofFLOAT
orDOUBLE
are exactly unique because of the precision errors in floating point numbers. DISTINCT
can’t be used in the sub-query of a nested query statement, and can’t be used together with aggregate functions,GROUP BY
orJOIN
in the same SQL statement.
Columns Names of Result Set
When using SELECT
, the column names in the result set will be the same as that in the select clause if AS
is not used. AS
can be used to rename the column names in the result set. For example
taos> SELECT ts, ts AS primary_key_ts FROM d1001;
ts | primary_key_ts |
====================================================
2018-10-03 14:38:05.000 | 2018-10-03 14:38:05.000 |
2018-10-03 14:38:15.000 | 2018-10-03 14:38:15.000 |
2018-10-03 14:38:16.800 | 2018-10-03 14:38:16.800 |
Query OK, 3 row(s) in set (0.001191s)
AS
can’t be used together with first(*)
, last(*)
, or last_row(*)
.
Implicit Columns
Select_exprs
can be column names of a table, or function expression or arithmetic expression on columns. The maximum number of allowed column names and expressions is 256. Timestamp and the corresponding tag names will be returned in the result set if interval
or group by tags
are used, and timestamp will always be the first column in the result set.
Table List
FROM
can be followed by a number of tables or super tables, or can be followed by a sub-query. If no database is specified as current database in use, table names must be preceded with database name, like power.d1001
.
SELECT * FROM power.d1001;
has same effect as
USE power;
SELECT * FROM d1001;
Special Query
Some special query functions can be invoked without FROM
sub-clause. For example, the statement below can be used to get the current database in use.
taos> SELECT DATABASE();
database() |
=================================
power |
Query OK, 1 row(s) in set (0.000079s)
If no database is specified upon logging in and no database is specified with USE
after login, NULL will be returned by select database()
.
taos> SELECT DATABASE();
database() |
=================================
NULL |
Query OK, 1 row(s) in set (0.000184s)
The statement below can be used to get the version of client or server.
taos> SELECT CLIENT_VERSION();
client_version() |
===================
2.0.0.0 |
Query OK, 1 row(s) in set (0.000070s)
taos> SELECT SERVER_VERSION();
server_version() |
===================
2.0.0.0 |
Query OK, 1 row(s) in set (0.000077s)
The statement below is used to check the server status. An integer, like 1
, is returned if the server status is OK, otherwise an error code is returned. This is compatible with the status check for TDengine from connection pool or 3rd party tools, and can avoid the problem of losing the connection from a connection pool when using the wrong heartbeat checking SQL statement.
taos> SELECT SERVER_STATUS();
server_status() |
==================
1 |
Query OK, 1 row(s) in set (0.000074s)
taos> SELECT SERVER_STATUS() AS status;
status |
==============
1 |
Query OK, 1 row(s) in set (0.000081s)
_block_dist
Description: Get the data block distribution of a table or STable.
Syntax
SELECT _block_dist() FROM { tb_name | stb_name }
Restrictions:No argument is allowed, where clause is not allowed
Sub Query:Sub query or nested query are not supported
Return value: A string which includes the data block distribution of the specified table or STable, i.e. the histogram of rows stored in the data blocks of the table or STable.
Result
summary:
5th=[392], 10th=[392], 20th=[392], 30th=[392], 40th=[792], 50th=[792] 60th=[792], 70th=[792], 80th=[792], 90th=[792], 95th=[792], 99th=[792] Min=[392(Rows)] Max=[800(Rows)] Avg=[666(Rows)] Stddev=[2.17] Rows=[2000], Blocks=[3], Size=[5.440(Kb)] Comp=[0.23] RowsInMem=[0] SeekHeaderTime=[1(us)]
More explanation about above example:
- Histogram about the rows stored in the data blocks of the table or STable: the value of rows for 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, and 99%
- Minimum number of rows stored in a data block, i.e. Min=[392(Rows)]
- Maximum number of rows stored in a data block, i.e. Max=[800(Rows)]
- Average number of rows stored in a data block, i.e. Avg=[666(Rows)]
- stddev of number of rows, i.e. Stddev=[2.17]
- Total number of rows, i.e. Rows[2000]
- Total number of data blocks, i.e. Blocks=[3]
- Total disk size consumed, i.e. Size=[5.440(Kb)]
- Compression ratio, which means the compressed size divided by original size, i.e. Comp=[0.23]
- Total number of rows in memory, i.e. RowsInMem=[0], which means no rows in memory
- The time spent on reading head file (to retrieve data block information), i.e. SeekHeaderTime=[1(us)], which means 1 microsecond.
Special Keywords in TAOS SQL
TBNAME
: it is treated as a special tag when selecting on a super table, representing the name of subtables in that super table._c0
: represents the first column of a table or super table.
Tips
To get all the subtables and corresponding tag values from a super table:
SELECT TBNAME, location FROM meters;
To get the number of sub tables in a super table:
SELECT COUNT(TBNAME) FROM meters;
Only filter on TAGS
are allowed in the where
clause for above two query statements. For example:
taos> SELECT TBNAME, location FROM meters;
tbname | location |
==================================================================
d1004 | California.LosAngeles |
d1003 | California.LosAngeles |
d1002 | California.SanFrancisco |
d1001 | California.SanFrancisco |
Query OK, 4 row(s) in set (0.000881s)
taos> SELECT COUNT(tbname) FROM meters WHERE groupId > 2;
count(tbname) |
========================
2 |
Query OK, 1 row(s) in set (0.001091s)
- Wildcard * can be used to get all columns, or specific column names can be specified. Arithmetic operation can be performed on columns of numerical types, columns can be renamed in the result set.
- Arithmetic operation on columns can’t be used in where clause. For example,
where a*2>6;
is not allowed butwhere a>6/2;
can be used instead for the same purpose. - Arithmetic operation on columns can’t be used as the objectives of select statement. For example,
select min(2*a) from t;
is not allowed butselect 2*min(a) from t;
can be used instead. - Logical operation can be used in
WHERE
clause to filter numeric values, wildcard can be used to filter string values. - Result sets are arranged in ascending order of the first column, i.e. timestamp, but it can be controlled to output as descending order of timestamp. If
order by
is used on other columns, the result may not be as expected. By the way, _c0 is used to represent the first column, i.e. timestamp. LIMIT
parameter is used to control the number of rows to output.OFFSET
parameter is used to specify from which row to output.LIMIT
andOFFSET
are executed afterORDER BY
in the query execution. A simple tip is thatLIMIT 5 OFFSET 2
can be abbreviated asLIMIT 2, 5
.- What is controlled by
LIMIT
is the number of rows in each group whenGROUP BY
is used. SLIMIT
parameter is used to control the number of groups whenGROUP BY
is used. Similar toLIMIT
,SLIMIT 5 OFFSET 2
can be abbreviated asSLIMIT 2, 5
.- “>>” can be used to output the result set of
select
statement to the specified file.
Where
Logical operations in below table can be used in the where
clause to filter the resulting rows.
Operation | Note | Applicable Data Types |
---|---|---|
> | larger than | all types except bool |
< | smaller than | all types except bool |
>= | larger than or equal to | all types except bool |
<= | smaller than or equal to | all types except bool |
= | equal to | all types |
<> | not equal to | all types |
is [not] null | is null or is not null | all types |
between and | within a certain range | all types except bool |
in | match any value in a set | all types except first column timestamp |
like | match a wildcard string | binary nchar |
match/nmatch | filter regex | binary nchar |
Explanations:
- Operator
<\>
is equal to!=
, please note that this operator can’t be used on the first column of any table, i.e.timestamp column. - Operator
like
is used together with wildcards to match strings- ‘%’ matches 0 or any number of characters, ‘_‘ matches any single ASCII character.
\_
is used to match the _ in the string.- The maximum length of wildcard string is 100 bytes from version 2.1.6.1 (before that the maximum length is 20 bytes).
maxWildCardsLength
intaos.cfg
can be used to control this threshold. A very long wildcard string may slowdown the execution performance ofLIKE
operator.
AND
keyword can be used to filter multiple columns simultaneously. AND/OR operation can be performed on single or multiple columns from version 2.3.0.0. However, before 2.3.0.0OR
can’t be used on multiple columns.- For timestamp column, only one condition can be used; for other columns or tags,
OR
keyword can be used to combine multiple logical operators. For example,((value > 20 AND value < 30) OR (value < 12))
.- From version 2.3.0.0, multiple conditions can be used on timestamp column, but the result set can only contain single time range.
- From version 2.0.17.0, operator
BETWEEN AND
can be used in where clause, for exampleWHERE col2 BETWEEN 1.5 AND 3.25
means the filter condition is equal to “1.5 ≤ col2 ≤ 3.25”. - From version 2.1.4.0, operator
IN
can be used in the where clause. For example,WHERE city IN ('California.SanFrancisco', 'California.SanDiego')
. For bool type, both{true, false}
and{0, 1}
are allowed, but integers other than 0 or 1 are not allowed. FLOAT and DOUBLE types are impacted by floating point precision errors. Only values that match the condition within the tolerance will be selected. Non-primary key column of timestamp type can be used withIN
. - From version 2.3.0.0, regular expression is supported in the where clause with keyword
match
ornmatch
. The regular expression is case insensitive.
Regular Expression
Syntax
WHERE (column|tbname) **match/MATCH/nmatch/NMATCH** _regex_
Specification
The regular expression being used must be compliant with POSIX specification, please refer to Regular Expressions.
Restrictions
Regular expression can be used against only table names, i.e. tbname
, and tags of binary/nchar types, but can’t be used against data columns.
The maximum length of regular expression string is 128 bytes. Configuration parameter maxRegexStringLen
can be used to set the maximum allowed regular expression. It’s a configuration parameter on the client side, and will take effect after restarting the client.
JOIN
From version 2.2.0.0, inner join is fully supported in TDengine. More specifically, the inner join between table and table, between STable and STable, and between sub query and sub query are supported.
Only primary key, i.e. timestamp, can be used in the join operation between table and table. For example:
SELECT *
FROM temp_tb_1 t1, pressure_tb_1 t2
WHERE t1.ts = t2.ts
In the join operation between STable and STable, besides the primary key, i.e. timestamp, tags can also be used. For example:
SELECT *
FROM temp_STable t1, temp_STable t2
WHERE t1.ts = t2.ts AND t1.deviceid = t2.deviceid AND t1.status=0;
Similarly, join operations can be performed on the result set of multiple sub queries.
note
Restrictions on join operation:
- The number of tables or STables in a single join operation can’t exceed 10.
FILL
is not allowed in the query statement that includes JOIN operation.- Arithmetic operation is not allowed on the result set of join operation.
GROUP BY
is not allowed on a part of tables that participate in join operation.OR
can’t be used in the conditions for join operation- join operation can’t be performed on data columns, i.e. can only be performed on tags or primary key, i.e. timestamp
Nested Query
Nested query is also called sub query. This means that in a single SQL statement the result of inner query can be used as the data source of the outer query.
From 2.2.0.0, unassociated sub query can be used in the FROM
clause. Unassociated means the sub query doesn’t use the parameters in the parent query. More specifically, in the tb_name_list
of SELECT
statement, an independent SELECT statement can be used. So a complete nested query looks like:
SELECT ... FROM (SELECT ... FROM ...) ...;
info
- Only one layer of nesting is allowed, that means no sub query is allowed within a sub query
- The result set returned by the inner query will be used as a “virtual table” by the outer query. The “virtual table” can be renamed using
AS
keyword for easy reference in the outer query. - Sub query is not allowed in continuous query.
- JOIN operation is allowed between tables/STables inside both inner and outer queries. Join operation can be performed on the result set of the inner query.
- UNION operation is not allowed in either inner query or outer query.
- The functions that can be used in the inner query are the same as those that can be used in a non-nested query.
ORDER BY
inside the inner query is unnecessary and will slow down the query performance significantly. It is best to avoid the use ofORDER BY
inside the inner query.
- Compared to the non-nested query, the functionality that can be used in the outer query has the following restrictions:
- Functions
- If the result set returned by the inner query doesn’t contain timestamp column, then functions relying on timestamp can’t be used in the outer query, like
TOP
,BOTTOM
,FIRST
,LAST
,DIFF
. - Functions that need to scan the data twice can’t be used in the outer query, like
STDDEV
,PERCENTILE
.
- If the result set returned by the inner query doesn’t contain timestamp column, then functions relying on timestamp can’t be used in the outer query, like
IN
operator is not allowed in the outer query but can be used in the inner query.GROUP BY
is not supported in the outer query.
- Functions
UNION ALL
Syntax
SELECT ...
UNION ALL SELECT ...
[UNION ALL SELECT ...]
UNION ALL
operator can be used to combine the result set from multiple select statements as long as the result set of these select statements have exactly the same columns. UNION ALL
doesn’t remove redundant rows from multiple result sets. In a single SQL statement, at most 100 UNION ALL
can be supported.
Examples
table tb1
is created using below SQL statement:
CREATE TABLE tb1 (ts TIMESTAMP, col1 INT, col2 FLOAT, col3 BINARY(50));
The rows in the past one hour in tb1
can be selected using below SQL statement:
SELECT * FROM tb1 WHERE ts >= NOW - 1h;
The rows between 2018-06-01 08:00:00.000 and 2018-06-02 08:00:00.000 and col3 ends with ‘nny’ can be selected in the descending order of timestamp using below SQL statement:
SELECT * FROM tb1 WHERE ts > '2018-06-01 08:00:00.000' AND ts <= '2018-06-02 08:00:00.000' AND col3 LIKE '%nny' ORDER BY ts DESC;
The sum of col1 and col2 for rows later than 2018-06-01 08:00:00.000 and whose col2 is bigger than 1.2 can be selected and renamed as “complex”, while only 10 rows are output from the 5th row, by below SQL statement:
SELECT (col1 + col2) AS 'complex' FROM tb1 WHERE ts > '2018-06-01 08:00:00.000' AND col2 > 1.2 LIMIT 10 OFFSET 5;
The rows in the past 10 minutes and whose col2 is bigger than 3.14 are selected and output to the result file /home/testoutput.csv
with below SQL statement:
SELECT COUNT(*) FROM tb1 WHERE ts >= NOW - 10m AND col2 > 3.14 >> /home/testoutput.csv;