taosBenchmark
Introduction
taosBenchmark (formerly taosdemo) is a tool for testing the performance of TDengine products. It can test the performance of TDengine's insert, query, and subscription functions, and it can simulate large amounts of data generated by many devices. taosBenchmark can be configured to generate user-defined databases, super tables, sub-tables, and the time series data to populate them for performance benchmarking. It is highly configurable: options include the time interval for inserting data, the number of worker threads, and the ability to insert disordered data. The installer provides taosdemo as a soft link to taosBenchmark for compatibility with past users.
Installation
There are two ways to install taosBenchmark:
Installing the official TDengine installer will automatically install taosBenchmark. Please refer to TDengine installation for details.
Compile taos-tools separately and install them. Please refer to the taos-tools repository for details.
Run
Configuration and running methods
taosBenchmark must be run from an operating system terminal. It supports two configuration methods: command-line arguments and a JSON configuration file. The two methods are mutually exclusive. Use -f <json file> to specify a configuration file. When controlling taosBenchmark with command-line arguments, use the other parameters for configuration and omit the -f parameter. In addition, taosBenchmark can be run without any parameters.
taosBenchmark supports complete performance testing of TDengine by providing write, query, and subscribe functionality. These three functions are mutually exclusive: only one of them can be selected each time taosBenchmark runs. The query and subscribe functions can only be configured through a JSON configuration file, by setting the filetype parameter, while write can be configured through both the command line and a configuration file.
Make sure that the TDengine cluster is running correctly before running taosBenchmark.
Run without command-line arguments
Run the following command to quickly try taosBenchmark's write performance test of TDengine with the default configuration.
taosBenchmark
When run without parameters, taosBenchmark connects to the TDengine cluster specified in /etc/taos by default and creates a database named test, a super table named meters under the test database, and 10,000 tables under the super table, with 10,000 records written to each table. Note that if a database named test already exists, this command deletes it first and creates a new database.
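As a rough sizing aid, the total number of rows written by one run is the number of sub-tables multiplied by the records per sub-table. The following is a minimal sketch of that arithmetic; the function name total_rows is hypothetical and not part of taosBenchmark.

```python
def total_rows(tables: int, records_per_table: int) -> int:
    """Rows written by one run: every sub-table receives the same number of records."""
    return tables * records_per_table


# Defaults described above: 10,000 sub-tables with 10,000 records each.
print(f"{total_rows(10_000, 10_000):,} rows")  # 100,000,000 rows
```

The default run therefore writes on the order of one hundred million rows, which is worth keeping in mind before running it against a small test instance.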
Run with command-line configuration parameters
The -f <json file> argument cannot be used when taosBenchmark is controlled through command-line parameters; all configuration parameters must be specified on the command line. The following example tests taosBenchmark write performance using the command-line approach.
taosBenchmark -I stmt -n 200 -t 100
With the above command, taosBenchmark creates a database named test, creates a super table named meters in it, creates 100 sub-tables under the super table, and inserts 200 records into each sub-table using parameter binding.
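When taosBenchmark is launched from a script, the rule that -f cannot be mixed with other parameters can be checked before the process starts. The sketch below is one possible way to do this; build_command is a hypothetical helper, not part of taosBenchmark, and it only assembles the argument list without running anything.

```python
import shlex


def build_command(config_file=None, **options):
    """Return an argv list for taosBenchmark (hypothetical helper).

    Either a JSON configuration file or individual short options may be
    given, never both, mirroring the mutual-exclusivity rule in the text.
    """
    if config_file is not None and options:
        raise ValueError("-f cannot be combined with other command-line parameters")
    if config_file is not None:
        return ["taosBenchmark", "-f", config_file]
    argv = ["taosBenchmark"]
    for flag, value in options.items():
        # Each keyword is treated as a short option name, e.g. I="stmt" -> -I stmt.
        argv += [f"-{flag}", str(value)]
    return argv


cmd = build_command(I="stmt", n=200, t=100)
print(shlex.join(cmd))  # taosBenchmark -I stmt -n 200 -t 100
```

The resulting list can be passed to subprocess.run on a machine where taosBenchmark is installed.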
Run with the configuration file
A sample configuration file is provided in the taosBenchmark installation package under <install_directory>/examples/taosbenchmark-json.
Use the following command to run taosBenchmark and control its behavior via a configuration file.
taosBenchmark -f <json file>
Configuration file examples
Insert scenario JSON configuration example
insert.json
{
  "filetype": "insert",
  "cfgdir": "/etc/taos",
  "host": "127.0.0.1",
  "port": 6030,
  "user": "root",
  "password": "taosdata",
  "connection_pool_size": 8,
  "thread_count": 4,
  "create_table_thread_count": 7,
  "result_file": "./insert_res.txt",
  "confirm_parameter_prompt": "no",
  "insert_interval": 0,
  "interlace_rows": 100,
  "num_of_records_per_req": 100,
  "prepared_rand": 10000,
  "chinese": "no",
  "databases": [
    {
      "dbinfo": {
        "name": "test",
        "drop": "yes",
        "replica": 1,
        "precision": "ms",
        "keep": 3650,
        "minRows": 100,
        "maxRows": 4096,
        "comp": 2
      },
      "super_tables": [
        {
          "name": "meters",
          "child_table_exists": "no",
          "childtable_count": 10000,
          "childtable_prefix": "d",
          "escape_character": "yes",
          "auto_create_table": "no",
          "batch_create_tbl_num": 5,
          "data_source": "rand",
          "insert_mode": "taosc",
          "non_stop_mode": "no",
          "line_protocol": "line",
          "insert_rows": 10000,
          "childtable_limit": 10,
          "childtable_offset": 100,
          "interlace_rows": 0,
          "insert_interval": 0,
          "partial_col_num": 0,
          "disorder_ratio": 0,
          "disorder_range": 1000,
          "timestamp_step": 10,
          "start_timestamp": "2020-10-01 00:00:00.000",
          "sample_format": "csv",
          "sample_file": "./sample.csv",
          "use_sample_ts": "no",
          "tags_file": "",
          "columns": [
            {
              "type": "FLOAT",
              "name": "current",
              "count": 1,
              "max": 12,
              "min": 8
            },
            { "type": "INT", "name": "voltage", "max": 225, "min": 215 },
            { "type": "FLOAT", "name": "phase", "max": 1, "min": 0 }
          ],
          "tags": [
            {
              "type": "TINYINT",
              "name": "groupid",
              "max": 10,
              "min": 1
            },
            {
              "name": "location",
              "type": "BINARY",
              "len": 16,
              "values": ["San Francisco", "Los Angeles", "San Diego",
                "San Jose", "Palo Alto", "Campbell", "Mountain View",
                "Sunnyvale", "Santa Clara", "Cupertino"]
            }
          ]
        }
      ]
    }
  ]
}
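Configuration files like insert.json can also be generated from a script, which makes it easier to vary table counts or column ranges between runs. The sketch below builds a reduced configuration using only keys that appear in the example above and serializes it with Python's json module. Whether taosBenchmark accepts a file with exactly this subset of keys is an assumption that has not been verified here, and find_bad_ranges is a hypothetical sanity check, not a taosBenchmark feature.

```python
import json

# Reduced insert configuration; every key is taken from the insert.json example.
config = {
    "filetype": "insert",
    "host": "127.0.0.1",
    "port": 6030,
    "user": "root",
    "password": "taosdata",
    "thread_count": 4,
    "databases": [{
        "dbinfo": {"name": "test", "drop": "yes"},
        "super_tables": [{
            "name": "meters",
            "childtable_count": 100,
            "childtable_prefix": "d",
            "insert_rows": 200,
            "columns": [
                {"type": "FLOAT", "name": "current", "max": 12, "min": 8},
                {"type": "INT", "name": "voltage", "max": 225, "min": 215},
            ],
            "tags": [{"type": "TINYINT", "name": "groupid", "max": 10, "min": 1}],
        }],
    }],
}


def find_bad_ranges(cfg):
    """Return the names of columns/tags whose min is greater than max."""
    bad = []
    for db in cfg["databases"]:
        for stb in db["super_tables"]:
            for field in stb.get("columns", []) + stb.get("tags", []):
                if "min" in field and "max" in field and field["min"] > field["max"]:
                    bad.append(field["name"])
    return bad


text = json.dumps(config, indent=2)  # write this string to a file and pass it with -f
print(find_bad_ranges(config))       # [] when every range is consistent
```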
Query scenario JSON configuration example
query.json
{
  "filetype": "query",
  "cfgdir": "/etc/taos",
  "host": "127.0.0.1",
  "port": 6030,
  "user": "root",
  "password": "taosdata",
  "confirm_parameter_prompt": "no",
  "databases": "test",
  "query_times": 2,
  "query_mode": "taosc",
  "specified_table_query": {
    "query_interval": 1,
    "concurrent": 3,
    "sqls": [
      {
        "sql": "select last_row(*) from meters",
        "result": "./query_res0.txt"
      },
      {
        "sql": "select count(*) from d0",
        "result": "./query_res1.txt"
      }
    ]
  },
  "super_table_query": {
    "stblname": "meters",
    "query_interval": 1,
    "threads": 3,
    "sqls": [
      {
        "sql": "select last_row(ts) from xxxx",
        "result": "./query_res2.txt"
      }
    ]
  }
}
Subscription JSON configuration example
subscribe.json
{
  "filetype": "subscribe",
  "cfgdir": "/etc/taos",
  "host": "127.0.0.1",
  "port": 6030,
  "user": "root",
  "password": "taosdata",
  "databases": "test",
  "specified_table_query": {
    "concurrent": 1,
    "mode": "sync",
    "interval": 1000,
    "restart": "yes",
    "keepProgress": "yes",
    "resubAfterConsume": 10,
    "sqls": [
      {
        "sql": "select avg(current) from meters where location = 'beijing';",
        "result": "./subscribe_res0.txt"
      }
    ]
  },
  "super_table_query": {
    "stblname": "meters",
    "threads": 1,
    "mode": "sync",
    "interval": 1000,
    "restart": "yes",
    "keepProgress": "yes",
    "sqls": [
      {
        "sql": "select phase from xxxx where groupid > 3;",
        "result": "./subscribe_res1.txt"
      }
    ]
  }
}
Command-line arguments in detail
-f/--file <json file> : Specify the configuration file to use. This file contains all parameters. Do not combine this parameter with other parameters on the command line. There is no default value.
-c/--config-dir <dir> : Specify the directory containing the TDengine cluster configuration file. The default path is /etc/taos.
-h/--host <host> : Specify the FQDN of the TDengine server to connect to. The default value is localhost.
-P/--port <port> : The port number of the TDengine server to connect to. The default value is 6030.
-I/--interface <insertMode> : Insert mode. Options are taosc, rest, stmt, sml, and sml-rest, corresponding to normal write, RESTful interface write, parameter binding interface write, schemaless interface write, and RESTful schemaless interface write (provided by taosAdapter). The default value is taosc.
-u/--user <user> : User name for connecting to the TDengine server. The default is root.
-p/--password <passwd> : Password for connecting to the TDengine server. The default is taosdata.
-o/--output <file> : Specify the path of the result output file. The default value is ./output.txt.
-T/--thread <threadNum> : The number of threads used to insert data. The default is 8.
-B/--interlace-rows <rowNum> : Enables interleaved insertion mode and specifies the number of rows to insert into each child table at a time. In interleaved insertion mode, taosBenchmark inserts this number of rows into each sub-table in turn and repeats the process until all sub-tables have been fully written. The default value is 0, i.e., all data for one sub-table is inserted before the next sub-table is started.
-i/--insert-interval <timeInterval> : Specify the insert interval in ms for interleaved insertion mode. The default value is 0. It only takes effect if -B/--interlace-rows is greater than 0: after inserting the interlaced rows for each child table, the insertion threads wait for this interval before starting the next round of writes.
-r/--rec-per-req <rowNum> : The number of rows written per request to TDengine. The default value is 30000.
-t/--tables <tableNum> : Specify the number of sub-tables. The default is 10000.
-S/--timestampstep <stepLength> : Timestamp step for inserting data into each child table, in ms. The default is 1.
-n/--records <recordNum> : The number of records inserted into each sub-table. The default value is 10000.
-d/--database <dbName> : The name of the database to use. The default value is test.
-b/--data-type <colType> : Specify the types of the data columns of the super table. If not used, it defaults to three columns of type FLOAT, INT, and FLOAT.
-l/--columns <colNum> : Specify the number of columns in the super table. If both this parameter and -b/--data-type are set, the resulting number of columns is the greater of the two. If the number given by this parameter is greater than the number of columns specified by -b/--data-type, the unspecified columns default to INT. For example, with -l 5 -b float,double the final columns are FLOAT,DOUBLE,INT,INT,INT. If the number given by this parameter is less than or equal to the number of columns specified by -b/--data-type, the result is exactly the columns and types specified by -b/--data-type. For example, with -l 3 -b float,double,float,bigint the final columns are FLOAT,DOUBLE,FLOAT,BIGINT.
-A/--tag-type <tagType> : The tag column types of the super table. The nchar and binary types can both specify a length, for example:
taosBenchmark -A INT,DOUBLE,NCHAR,BINARY(16)
If no tag type is set, the default is two tags of types INT and BINARY(16). Note: in some shells, such as bash, "()" needs to be escaped, so the above command should be
taosBenchmark -A INT,DOUBLE,NCHAR,BINARY\(16\)
-w/--binwidth <length> : Specify the default length for nchar and binary types. The default value is 64.
-m/--table-prefix <tablePrefix> : The prefix of the sub-table names. The default value is "d".
-E/--escape-character : Switch parameter specifying whether to use escape characters in the super table and sub-table names. Not used by default.
-C/--chinese : Switch parameter specifying whether to use Unicode Chinese characters in nchar and binary. Not used by default.
-N/--normal-table : Create only normal tables instead of super tables. The default value is false. It can be used when the insert mode is taosc, stmt, or rest.
-M/--random : Write data with random values. The default is false. When this parameter is used, taosBenchmark generates the random values. For tag/data columns of numeric type, the value is a random value within the value range of that type. For tag/data columns of NCHAR and BINARY type, the value is a random string within the specified length range.
-x/--aggr-func : Switch parameter that runs aggregation function queries after insertion. The default value is false.
-y/--answer-yes : Switch parameter that requires the user to confirm at the prompt to continue. The default value is false.
-O/--disorder <Percentage> : Specify the percentage probability of disordered data, with a value range of [0,50]. The default is 0, i.e., there is no disordered data.
-R/--disorder-range <timeRange> : Specify the timestamp range for the disordered data. A disordered timestamp is the ordered timestamp minus a random value in this range. Valid only if the percentage of disordered data specified by -O/--disorder is greater than 0.
-F/--prepare_rand <Num> : Specify the number of unique values in the generated random data. A value of 1 means that all data are equal. The default value is 10000.
-a/--replica <replicaNum> : Specify the number of replicas when creating the database. The default value is 1.
-V/--version : Show version information only. Do not use it with other parameters.
-?/--help : Show help information and exit. Do not use it with other parameters.
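The interaction between -l/--columns and -b/--data-type described above can be expressed in a few lines. The following sketch reproduces the documented rule only; resolve_columns is a hypothetical name and this is not the tool's actual implementation.

```python
def resolve_columns(col_count, col_types):
    """Apply the -l/-b rule: the larger column count wins, extra columns are INT."""
    types = [t.upper() for t in col_types]
    if col_count > len(types):
        # Columns requested by -l but not typed by -b default to INT.
        types += ["INT"] * (col_count - len(types))
    return types


print(resolve_columns(5, ["float", "double"]))
# ['FLOAT', 'DOUBLE', 'INT', 'INT', 'INT']
print(resolve_columns(3, ["float", "double", "float", "bigint"]))
# ['FLOAT', 'DOUBLE', 'FLOAT', 'BIGINT']
```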
Configuration file parameters in detail
General configuration parameters
The parameters listed in this section apply to all function modes.
filetype: The function to be tested. Possible values are insert, query, and subscribe, corresponding to the insert, query, and subscribe functions, respectively. Only one of these can be specified in each configuration file.
cfgdir: Specify the directory of the TDengine client configuration file. The default path is /etc/taos.
host: Specify the FQDN of the TDengine server to connect to. The default value is localhost.
port: The port number of the TDengine server to connect to. The default value is 6030.
user: The user name for connecting to the TDengine server. The default is root.
password: The password for connecting to the TDengine server. The default value is taosdata.
Insert scenario configuration parameters
filetype must be set to insert in the insertion scenario. See General configuration parameters for the other general parameters.
Database related configuration parameters
The parameters related to database creation are configured in dbinfo in the JSON configuration file, as follows. These parameters correspond to the database parameters specified with create database in TDengine.
name: specify the name of the database.
drop: indicate whether to delete the database before inserting. The value can be "yes" or "no"; "no" means do not drop. The default is to drop.
replica: specify the number of replicas when creating the database.
days: specify the time span for storing data in a single data file. The default is 10.
cache: specify the size of the cache blocks in MB. The default value is 16.
blocks: specify the number of cache blocks in each vnode. The default is 6.
precision: specify the database time precision. The default value is “ms”.
keep: specify the number of days to keep the data. The default value is 3650.
minRows: specify the minimum number of records in the file block. The default value is 100.
maxRows: specify the maximum number of records in the file block. The default value is 4096.
comp: specify the file compression level. The default value is 2.
walLevel : specify WAL level, default is 1.
cacheLast: indicate whether to allow the last record of each table to be kept in memory. The default value is 0. The value can be 0, 1, 2, or 3.
quorum: specify the number of writing acknowledgments in multi-replica mode. The default value is 1.
fsync: specify the interval of fsync in ms when users set WAL to 2. The default value is 3000.
update : indicate whether to support data update, default value is 0, optional values are 0, 1, 2.
Super table related configuration parameters
The parameters for creating super tables are configured in super_tables in the JSON configuration file, as shown below.
name: Super table name, mandatory, no default value.
child_table_exists: Whether the child table already exists. The value can be "yes" or "no". The default value is "no".
childtable_count: The number of child tables. The default value is 10.
childtable_prefix: The prefix of the child table name. Mandatory, no default value.
escape_character: Specify whether the super table and child table names contain escape characters. The value can be "yes" or "no". The default is "no".
auto_create_table: Takes effect only when insert_mode is taosc, rest, or stmt and child_table_exists is "no". "yes" means taosBenchmark will automatically create non-existent tables when inserting data; "no" means that taosBenchmark will create all tables before inserting.
batch_create_tbl_num: The number of tables per batch when creating sub-tables. The default is 10. Note: the actual number of tables per batch may differ from this value. If the SQL statement to execute exceeds the maximum supported length, it is automatically truncated and executed, and creation then continues with the remaining tables.
data_source: Specify the source of the generated data. The value can be "rand" or "sample". By default, the data is randomly generated by taosBenchmark. When "sample" is used, taosBenchmark uses the data in the file specified by the sample_file parameter.
insert_mode: Insertion mode. Options are taosc, rest, stmt, sml, and sml-rest, corresponding to normal write, RESTful interface write, parameter binding interface write, schemaless interface write, and RESTful schemaless interface write (provided by taosAdapter). The default value is taosc.
non_stop_mode: Specify whether to keep writing. If "yes", insert_rows is disabled and writing does not stop until the program is stopped with Ctrl + C. The default value is "no", i.e., taosBenchmark stops writing after the specified number of rows has been written. Note: insert_rows must still be configured as a non-zero positive integer, even though it has no effect in continuous write mode.
line_protocol: Insert data using the line protocol. Only takes effect when insert_mode is sml or sml-rest. The value can be line, telnet, or json.
tcp_transfer: Communication protocol in telnet mode. Only takes effect when insert_mode is sml-rest and line_protocol is telnet. If not configured, the default protocol is http.
insert_rows : The number of inserted rows per child table, default is 0.
childtable_offset: Effective only if child_table_exists is "yes". Specifies the offset when fetching the list of child tables from the super table, i.e., the child table from which to start.
childtable_limit: Effective only when child_table_exists is "yes". Specifies the upper limit for fetching the list of child tables from the super table.
interlace_rows: Enables interleaved insertion mode and specifies the number of rows to insert into each child table at a time. In interleaved insertion mode, taosBenchmark inserts this number of rows into each sub-table in turn and repeats the process until all sub-tables have been fully written. The default value is 0, i.e., all data for one sub-table is inserted before the next sub-table is started.
insert_interval: Specifies the insertion interval in ms for interleaved insertion mode. The default value is 0. It only takes effect if interlace_rows (the equivalent of -B/--interlace-rows) is greater than 0. After inserting the interlaced rows for each child table, the data insertion thread waits for this interval before starting the next round of writes.
partial_col_num: If this value is a positive number n, only the first n columns are written; this takes effect only when insert_mode is taosc or rest. If n is 0, all columns are written.
disorder_ratio: Specifies the percentage probability of disordered (i.e., out-of-order) data, in the value range [0,50]. The default is 0, which means there is no disordered data.
disorder_range: Specifies the timestamp fallback range for the disordered data. A disordered timestamp is generated by subtracting a random value in this range from the timestamp that would be used in the ordered case. Valid only if the percentage of disordered data specified by disorder_ratio (the equivalent of -O/--disorder) is greater than 0.
timestamp_step: The timestamp step for inserting data into each child table, in units consistent with the precision of the database. For example, if the precision is milliseconds, the timestamp step is in milliseconds. The default value is 1.
start_timestamp: The timestamp start value of each sub-table. The default value is now.
sample_format: The type of the sample data file; for now only “csv” is supported.
sample_file: Specify a CSV file as the data source. It only takes effect when data_source is "sample". If the number of rows in the CSV file is less than or equal to prepared_rand, taosBenchmark reads the CSV data cyclically until the number of rows equals prepared_rand; otherwise, taosBenchmark reads only prepared_rand rows. The final number of rows of data generated is the smaller of the two.
use_sample_ts: Effective only when data_source is "sample". Indicates whether the first column of the CSV file specified by sample_file is a timestamp column. The default is "no". If set to "yes", the first column of the CSV file is used as the timestamp. Since timestamps within the same sub-table cannot repeat, the amount of data generated depends on the number of rows in the CSV file, and insert_rows is ignored.
tags_file: Only takes effect when insert_mode is taosc or rest. The final tag values depend on childtable_count. If the CSV file contains fewer rows of tag data than the given number of child tables, taosBenchmark reads the CSV data cyclically until the number of child tables specified by childtable_count has been generated. Otherwise, taosBenchmark reads only childtable_count rows of tag data. The final number of child tables generated is the smaller of the two.
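To illustrate how timestamp_step, disorder_ratio, and disorder_range interact, the sketch below generates timestamps for one child table according to the descriptions above. It is a simplified model, not the tool's implementation: the function name, the use of a seeded random generator, and the choice to subtract at least 1 unit are all assumptions.

```python
import random


def generate_timestamps(start, rows, timestamp_step=1, disorder_ratio=0,
                        disorder_range=1000, seed=None):
    """Return `rows` timestamps; roughly disorder_ratio percent are moved backwards."""
    if not 0 <= disorder_ratio <= 50:
        raise ValueError("disorder_ratio must be in [0, 50]")
    rng = random.Random(seed)
    timestamps = []
    for i in range(rows):
        ts = start + i * timestamp_step           # timestamp in the ordered case
        if rng.randrange(100) < disorder_ratio:   # disorder_ratio percent chance
            ts -= rng.randint(1, disorder_range)  # fall back by a random amount
        timestamps.append(ts)
    return timestamps


print(generate_timestamps(0, 5, timestamp_step=10))  # [0, 10, 20, 30, 40]
```

With disorder_ratio left at 0 the output is strictly ordered; with a non-zero ratio, some timestamps fall behind their ordered position by at most disorder_range units.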
Tag and Data Column Configuration Parameters
The configuration parameters for the data columns and tag columns of a super table are set in columns and tags in super_tables, respectively.
type: Specify the column type. For possible values, refer to the data types supported by TDengine. Note: the JSON data type is special and can only be used for tags. When JSON is used as a tag type, it must be the only tag. In this case, count and len represent the number of key-value pairs within the JSON tag and the length of the value of each key-value pair, respectively. The value is a string by default.
len: Specifies the length of this data type. Valid for the NCHAR, BINARY, and JSON data types. If this parameter is configured for other data types, a value of 0 means that the column is always written with a null value; a non-zero value is ignored.
count: Specifies the number of consecutive occurrences of the column type, e.g., "count": 4096 generates 4096 columns of the specified type.
name: The name of the column. When used together with count, e.g. "name": "current", "count": 3, the names of the 3 columns are current, current_2, and current_3.
min: The minimum value for a column/tag of this data type.
max: The maximum value for a column/tag of this data type.
values: The candidate values for an nchar/binary column/tag. Each written value is chosen randomly from this list.
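The effect of count on column names can be sketched as follows. The naming pattern (current, current_2, current_3) is taken from the example in the name description above; expand_columns is a hypothetical helper and the real tool may differ in details.

```python
def expand_columns(columns):
    """Expand entries that carry "count" into individual (name, type) pairs."""
    expanded = []
    for col in columns:
        count = col.get("count", 1)
        for i in range(1, count + 1):
            # The first column keeps the plain name; later ones get a _N suffix.
            name = col["name"] if i == 1 else f'{col["name"]}_{i}'
            expanded.append((name, col["type"]))
    return expanded


cols = [
    {"type": "FLOAT", "name": "current", "count": 3},
    {"type": "INT", "name": "voltage"},
]
print(expand_columns(cols))
# [('current', 'FLOAT'), ('current_2', 'FLOAT'), ('current_3', 'FLOAT'), ('voltage', 'INT')]
```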
Insertion behavior configuration parameters
thread_count: specify the number of threads to insert data. Default is 8.
create_table_thread_count : The number of threads to build the table, default is 8.
connection_pool_size : The number of pre-established connections to the TDengine server. If not configured, it is the same as number of threads specified.
result_file: The path to the result output file. The default value is ./output.txt.
confirm_parameter_prompt: The switch parameter requires the user to confirm after the prompt to continue. The default value is false.
interlace_rows: Enables interleaved insertion mode and specifies the number of rows to insert into each child table at a time. In interleaved insertion mode, taosBenchmark inserts this number of rows into each sub-table in turn and repeats the process until all sub-tables have been fully written. The default value is 0, which means that all data for one child table is inserted before the next child table is started. This parameter can also be configured in super_tables; if so, the configuration in super_tables takes precedence and overrides the global setting.
insert_interval: Specifies the insertion interval in ms for interleaved insertion mode. The default value is 0. It only takes effect if interlace_rows (the equivalent of -B/--interlace-rows) is greater than 0. After inserting the interlaced rows for each child table, the data insertion thread waits for this interval before starting the next round of writes. This parameter can also be configured in super_tables; if so, the configuration in super_tables takes precedence and overrides the global setting.
num_of_records_per_req: The number of rows written per request to TDengine. The default value is 30000. If it is set too large, the TDengine client driver returns an error, and the value must be lowered for writes to succeed.
prepared_rand: The number of unique values in the generated random data. A value of 1 means that all data are the same. The default value is 10000.
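The difference between the default write order and interleaved insertion can be seen by listing the batches in the order they would be written. The sketch below is a single-threaded simplification under stated assumptions: it ignores thread_count, num_of_records_per_req, and insert_interval, and insertion_order is a hypothetical name.

```python
def insertion_order(tables, rows_per_table, interlace_rows=0):
    """Yield (table, first_row, row_count) batches in write order."""
    if interlace_rows <= 0:
        # Default: finish one child table before starting the next.
        for table in tables:
            yield (table, 0, rows_per_table)
        return
    written = 0
    while written < rows_per_table:
        batch = min(interlace_rows, rows_per_table - written)
        # One round: the same slice of rows for every child table in turn.
        for table in tables:
            yield (table, written, batch)
        written += batch


print(list(insertion_order(["d0", "d1"], 4)))
# [('d0', 0, 4), ('d1', 0, 4)]
print(list(insertion_order(["d0", "d1"], 4, interlace_rows=2)))
# [('d0', 0, 2), ('d1', 0, 2), ('d0', 2, 2), ('d1', 2, 2)]
```

In the real tool, insert_interval would introduce a pause between rounds, i.e., after each pass over all child tables.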
Query scenario configuration parameters
filetype must be set to query in the query scenario. See General configuration parameters for details of this parameter and other general parameters.
Configuration parameters for executing the specified query statement
The configuration parameters for querying sub-tables or normal tables are set in specified_table_query.
query_interval : The query interval in seconds, the default value is 0.
threads: The number of threads to execute the query SQL, the default value is 1.
sqls:
- sql: the SQL command to be executed.
- result: the file to save the query result. If it is unspecified, taosBenchmark will not save the result.
Configuration parameters of query super table
The configuration parameters of the super table query are set in super_table_query.
stblname: Specify the name of the super table to be queried, required.
query_interval : The query interval in seconds, the default value is 0.
threads: The number of threads to execute the query SQL, the default value is 1.
sqls:
- sql: The SQL command to be executed. For a super table query, keep "xxxx" in the SQL command. The program automatically replaces it with the name of each sub-table of the super table.
- result: The file to save the query result. If not specified, taosBenchmark will not save result.
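The "xxxx" placeholder behaves like a simple template: one statement is produced per sub-table. The following sketch shows the idea with plain string replacement; expand_super_table_sql is a hypothetical helper, and the sub-table names d0 to d2 are made up for illustration.

```python
def expand_super_table_sql(sql_template, child_tables):
    """Return one SQL statement per child table, substituting the xxxx placeholder."""
    return [sql_template.replace("xxxx", name) for name in child_tables]


for sql in expand_super_table_sql("select last_row(ts) from xxxx", ["d0", "d1", "d2"]):
    print(sql)
# select last_row(ts) from d0
# select last_row(ts) from d1
# select last_row(ts) from d2
```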
Subscription scenario configuration parameters
filetype must be set to subscribe in the subscription scenario. See General configuration parameters for details of this and other general parameters.
Configuration parameters for executing the specified subscription statement
The configuration parameters for subscribing to a sub-table or a normal table are set in specified_table_query.
threads: The number of threads to execute SQL, default is 1.
interval: The time interval to execute the subscription, in seconds, default is 0.
restart : “yes” means start a new subscription, “no” means continue the previous subscription, the default value is “no”.
keepProgress: “yes” means keep the progress of the subscription, “no” means don’t keep it, and the default value is “no”.
resubAfterConsume: “yes” means cancel the previous subscription and then subscribe again, “no” means continue the previous subscription, and the default value is “no”.
sqls:
- sql : The SQL command to be executed, required.
- result: The file to save the query result. If not specified, the result is not saved.
Configuration parameters for subscribing to super tables
The configuration parameters for subscribing to a super table are set in super_table_query.
stblname: The name of the super table to subscribe.
threads: The number of threads to execute SQL, default is 1.
interval: The time interval to execute the subscription, in seconds, default is 0.
restart : “yes” means start a new subscription, “no” means continue the previous subscription, the default value is “no”.
keepProgress: “yes” means keep the progress of the subscription, “no” means don’t keep it, and the default value is “no”.
resubAfterConsume: “yes” means cancel the previous subscription and then subscribe again, “no” means continue the previous subscription, and the default value is “no”.
sqls:
- sql: The SQL command to be executed, required. For a super table subscription, keep "xxxx" in the SQL command. The program automatically replaces it with the name of each sub-table of the super table.
- result: The file to save the query result. If not specified, the result is not saved.