Resource Planning
It is important to plan computing and storage resources if using TDengine to build an IoT, time-series or Big Data platform. How to plan the CPU, memory and disk resources required, will be described in this chapter.
Server Memory Requirements
Each database creates a fixed number of vgroups. This number is 2 by default and can be configured with the vgroups
parameter. The number of replicas can be controlled with the replica
parameter. Each replica requires one vnode per vgroup. Altogether, the memory required by each database depends on the following configuration options:
- vgroups
- replica
- buffer
- pages
- pagesize
- cachesize
For more information, see Database.
The memory required by a database is therefore greater than or equal to:
vgroups * replica * (buffer + pages * pagesize + cachesize)
However, note that this requirement is spread over all dnodes in the cluster, not on a single physical machine. The physical servers that run dnodes meet the requirement together. If a cluster has multiple databases, the memory required increases accordingly. In complex environments where dnodes were added after initial deployment in response to increasing resource requirements, load may not be balanced among the original dnodes and newer dnodes. In this situation, the actual status of your dnodes is more important than theoretical calculations.
Client Memory Requirements
For the client programs using TDengine client driver taosc
to connect to the server side there is a memory requirement as well.
The memory consumed by a client program is mainly about the SQL statements for data insertion, caching of table metadata, and some internal use. Assuming maximum number of tables is N (the memory consumed by the metadata of each table is 256 bytes), maximum number of threads for parallel insertion is T, maximum length of a SQL statement is S (normally 1 MB), the memory required by a client program can be estimated using the below formula:
M = (T * S * 3 + (N / 4096) + 100)
For example, if the number of parallel data insertion threads is 100, total number of tables is 10,000,000, then the minimum memory requirement of a client program is:
100 * 3 + (10000000 / 4096) + 100 = 2741 (MBytes)
So, at least 3GB needs to be reserved for such a client.
CPU Requirement
The CPU resources required depend on two aspects:
- Data Insertion Each dnode of TDengine can process at least 10,000 insertion requests in one second, while each insertion request can have multiple rows. The difference in computing resource consumed, between inserting 1 row at a time, and inserting 10 rows at a time is very small. So, the more the number of rows that can be inserted one time, the higher the efficiency. If each insert request contains more than 200 records, a single core can process more than 1 million records per second. Inserting in batch also imposes requirements on the client side which needs to cache rows to insert in batch once the number of cached rows reaches a threshold.
- Data Query High efficiency query is provided in TDengine, but it’s hard to estimate the CPU resource required because the queries used in different use cases and the frequency of queries vary significantly. It can only be verified with the query statements, query frequency, data size to be queried, and other requirements provided by users.
In short, the CPU resource required for data insertion can be estimated but it’s hard to do so for query use cases. If possible, ensure that CPU usage remains below 50%. If this threshold is exceeded, it’s a reminder for system operator to add more nodes in the cluster to expand resources.
Disk Requirement
The compression ratio in TDengine is much higher than that in RDBMS. In most cases, the compression ratio in TDengine is bigger than 5, or even 10 in some cases, depending on the characteristics of the original data. The data size before compression can be calculated based on below formula:
Raw DataSize = numOfTables * rowSizePerTable * rowsPerTable
For example, there are 10,000,000 meters, while each meter collects data every 15 minutes and the data size of each collection is 128 bytes, so the raw data size of one year is: 10000000 * 128 * 24 * 60 / 15 * 365 = 44.8512(TB). Assuming compression ratio is 5, the actual disk size is: 44.851 / 5 = 8.97024(TB).
Parameter keep
can be used to set how long the data will be kept on disk. To further reduce storage cost, multiple storage levels can be enabled in TDengine, with the coldest data stored on the cheapest storage device. This is completely transparent to application programs.
To increase performance, multiple disks can be setup for parallel data reading or data inserting. Please note that an expensive disk array is not necessary because replications are used in TDengine to provide high availability.
Number of Hosts
A host can be either physical or virtual. The total memory, total CPU, total disk required can be estimated according to the formulae mentioned previously. If the number of data replicas is not 1, the required resources are multiplied by the number of replicas.
Then, according to the system resources that a single host can provide, assuming all hosts have the same resources, the number of hosts can be derived easily.