IoTDB Deployment Recommendation

Background

System Abilities

  • Performance: write and read performance, compression ratio
  • Extensibility: the system can manage data on multiple nodes; essentially, data can be managed by partitions
  • High availability (HA): the system can tolerate disconnected nodes; essentially, the data has replicas
  • Consistency: when data has multiple copies, whether the replicas agree; essentially, the system can be treated as a single node

Abbreviations

  • C: ConfigNode
  • D: DataNode
  • nCmD: a cluster with n ConfigNodes and m DataNodes

Deployment mode

| Mode | Performance | Extensibility | HA | Consistency |
| --- | --- | --- | --- | --- |
| Lightweight standalone mode | Extremely high | None | None | High |
| Scalable standalone mode (default) | High | High | Medium | High |
| High performance cluster mode | High | High | High | Medium |
| Strong consistency cluster mode | Medium | High | High | High |
| Config | Lightweight standalone mode | Scalable standalone mode | High performance cluster mode | Strong consistency cluster mode |
| --- | --- | --- | --- | --- |
| ConfigNode number | 1 | ≥1 (odd number) | ≥1 (odd number) | ≥1 (odd number) |
| DataNode number | 1 | ≥1 | ≥3 | ≥3 |
| schema_replication_factor | 1 | 1 | 3 | 3 |
| data_replication_factor | 1 | 1 | 2 | 3 |
| config_node_consensus_protocol_class | Simple | Ratis | Ratis | Ratis |
| schema_region_consensus_protocol_class | Simple | Ratis | Ratis | Ratis |
| data_region_consensus_protocol_class | Simple | IoT | IoT | Ratis |
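
To make the table concrete, here is a minimal Python sketch of the per-mode settings. The fully qualified consensus class names are assumptions based on IoTDB 1.0 defaults; verify them against the configuration files shipped with your release.

```python
# Sketch: recommended settings per deployment mode, mirroring the table above.
# The fully qualified consensus class names below are assumptions; check the
# default configuration files of your IoTDB release before relying on them.
CONSENSUS = {
    "Simple": "org.apache.iotdb.consensus.simple.SimpleConsensus",
    "Ratis": "org.apache.iotdb.consensus.ratis.RatisConsensus",
    "IoT": "org.apache.iotdb.consensus.iot.IoTConsensus",
}

MODES = {
    "high_performance_cluster": {
        "schema_replication_factor": 3,
        "data_replication_factor": 2,
        "config_node_consensus_protocol_class": CONSENSUS["Ratis"],
        "schema_region_consensus_protocol_class": CONSENSUS["Ratis"],
        "data_region_consensus_protocol_class": CONSENSUS["IoT"],
    },
    "strong_consistency_cluster": {
        "schema_replication_factor": 3,
        "data_replication_factor": 3,
        "config_node_consensus_protocol_class": CONSENSUS["Ratis"],
        "schema_region_consensus_protocol_class": CONSENSUS["Ratis"],
        "data_region_consensus_protocol_class": CONSENSUS["Ratis"],
    },
}

for key, value in MODES["high_performance_cluster"].items():
    print(f"{key}={value}")
```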

Deployment Recommendation

Upgrade from v0.13 to v1.0

Scenario: you already have data under v0.13 and want to upgrade to v1.0.

Options:

  1. Upgrade to 1C1D standalone mode: allocate 2 GB of memory to the ConfigNode, and allocate the same amount of memory to the DataNode as v0.13 used.
  2. Upgrade to 3C3D cluster mode: allocate 2 GB of memory to each ConfigNode, and allocate the same amount of memory to each DataNode as v0.13 used.

Configuration modification:

  • Do not point the v1.0 data directory to the v0.13 data directory
  • region_group_extension_strategy=CUSTOM
  • data_region_group_per_database (see the sketch after this section)
    • for 3C3D cluster mode: cluster total CPU core count / data_replication_factor
    • for 1C1D standalone mode: use the virtual_storage_group_num value from v0.13

Data migration: after modifying the configuration, use the load-tsfile tool to load the v0.13 TsFiles into v1.0.
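
The data_region_group_per_database recommendation above is simple arithmetic; a minimal sketch (the function name and the sizing numbers are illustrative):

```python
def data_region_group_per_database(cluster_cpu_cores: int,
                                   data_replication_factor: int) -> int:
    # 3C3D cluster mode: cluster total CPU core count / data_replication_factor
    return cluster_cpu_cores // data_replication_factor

# Hypothetical sizing: 3 DataNodes with 16 cores each, 2 data replicas.
print(data_region_group_per_database(3 * 16, 2))  # -> 24
```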

Use v1.0 directly

We recommend using only 1 Database.

Memory estimation

Use active series number to estimate memory size

Cluster DataNode total heap size (GB) = active series number / 100,000 * data_replication_factor

Heap size of each DataNode (GB) = Cluster DataNode total heap size / DataNode number

Example: use 3C3D to manage 1 million active series with 3 data replicas

  • Cluster DataNode total heap size: 1,000,000 / 100,000 * 3 = 30 GB
  • Heap size of each DataNode: 30 / 3 = 10 GB
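
Both formulas translate directly into code; a minimal sketch (the function name is illustrative):

```python
def cluster_heap_gb(active_series: int, data_replication_factor: int) -> float:
    # Cluster DataNode total heap size (GB) = active series / 100,000 * replicas
    return active_series / 100_000 * data_replication_factor

# Example from above: 3C3D, 1 million active series, 3 data replicas.
total_gb = cluster_heap_gb(1_000_000, 3)
print(total_gb)      # -> 30.0 GB for the whole cluster
print(total_gb / 3)  # -> 10.0 GB per DataNode
```
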
Use total series number to estimate memory size

Cluster DataNode total heap size (B) = 20 * (180 + 2 * average character count of the series full path) * total series number * schema_replication_factor

Heap size of each DataNode = Cluster DataNode total heap size / DataNode number

Example: use 3C3D to manage 1 million series with 3 schema replicas; series paths look like root.sg_1.d_10.s_100 (20 chars)

  • Cluster DataNode total heap size: 20 * (180 + 2 * 20) * 1,000,000 * 3 = 13.2 GB
  • Heap size of each DataNode: 13.2 GB / 3 = 4.4 GB
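
The same estimation expressed as code; a minimal sketch reproducing the example (the function name is illustrative):

```python
def cluster_heap_bytes(total_series: int, avg_path_chars: int,
                       schema_replication_factor: int) -> int:
    # Total heap (B) = 20 * (180 + 2 * avg path chars) * series * schema replicas
    return 20 * (180 + 2 * avg_path_chars) * total_series * schema_replication_factor

# Example from above: 1 million series, 20-char full paths, 3 schema replicas.
total_b = cluster_heap_bytes(1_000_000, 20, 3)
print(total_b / 1e9)      # -> 13.2 GB for the whole cluster
print(total_b / 1e9 / 3)  # -> 4.4 GB per DataNode
```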

Disk estimation

IoTDB storage size = data storage size + schema storage size + temp storage size

Data storage size

Data storage size = series number * sampling frequency (Hz) * data point size * storage duration (seconds) * data_replication_factor / 10 (compression ratio)

| Data Type | Timestamp (Byte) | Value (Byte) | Total (Byte) |
| --- | --- | --- | --- |
| Boolean | 8 | 1 | 9 |
| INT32 / FLOAT | 8 | 4 | 12 |
| INT64 / DOUBLE | 8 | 8 | 16 |
| TEXT | 8 | a (average value size) | 8 + a |

Example: 1000 devices, 100 sensors per device (100,000 series in total), INT32 data type, 1 Hz sampling frequency, 1 year storage duration, 3 replicas, compression ratio 10

Data storage size = 1000 * 100 * 12 * 86400 * 365 * 3 / 10 ≈ 11 TB
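
The data storage formula as code; a minimal sketch reproducing the example (the function name is illustrative):

```python
def data_storage_bytes(series: int, hz: float, point_bytes: int,
                       duration_s: int, replicas: int,
                       compression_ratio: float = 10.0) -> float:
    # series * sampling frequency * point size * duration * replicas / compression
    return series * hz * point_bytes * duration_s * replicas / compression_ratio

# Example from above: 100,000 INT32 series at 1 Hz for one year, 3 replicas.
size_b = data_storage_bytes(1000 * 100, 1, 12, 86400 * 365, 3)
print(size_b / 1e12)  # -> ~11.35 TB
```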

Schema storage size

Each series occupies its path's character byte size plus 20 bytes. If a series has tags, add the byte size of the tag characters.

Temp storage size

Temp storage size = WAL storage size + Consensus storage size + Compaction temp storage size

  1. WAL

max WAL storage size = memtable memory size / wal_min_effective_info_ratio

  • memtable memory size is determined by storage_query_schema_consensus_free_memory_proportion, storage_engine_memory_proportion, and write_memory_proportion
  • wal_min_effective_info_ratio is set by the wal_min_effective_info_ratio configuration item

Example: allocate 16 GB of memory to a DataNode, with the following configuration:

storage_query_schema_consensus_free_memory_proportion=3:3:1:1:2
storage_engine_memory_proportion=8:2
write_memory_proportion=19:1
wal_min_effective_info_ratio=0.1

max WAL storage size = 16 * (3 / 10) * (8 / 10) * (19 / 20) / 0.1 = 36.48 GB
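
The WAL bound as code; a minimal sketch reproducing the example (the function name and the pre-derived ratio arguments are illustrative):

```python
def max_wal_storage_gb(datanode_memory_gb: float,
                       storage_engine_share: float,  # e.g. 3/10 from 3:3:1:1:2
                       write_share: float,           # e.g. 8/10 from 8:2
                       memtable_share: float,        # e.g. 19/20 from 19:1
                       wal_min_effective_info_ratio: float) -> float:
    # max WAL storage = memtable memory / wal_min_effective_info_ratio
    memtable_gb = (datanode_memory_gb * storage_engine_share
                   * write_share * memtable_share)
    return memtable_gb / wal_min_effective_info_ratio

# Example from above: 16 GB DataNode, ratios taken from the configuration shown.
print(max_wal_storage_gb(16, 3 / 10, 8 / 10, 19 / 20, 0.1))  # -> 36.48 GB
```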

  2. Consensus

Ratis consensus

When using the Ratis consensus protocol, extra storage is needed for the Raft Log, which is deleted after the state machine takes a snapshot. We can adjust trigger_snapshot_threshold to control the maximum Raft Log disk usage.

Raft Log disk size in each region = average request size * trigger_snapshot_threshold

The total Raft Log storage space is proportional to the data replica number.

Example: for a DataRegion with 20 KB of data per request and data_region_trigger_snapshot_threshold = 400,000, the max Raft Log disk size = 20 KB * 400,000 = 8 GB. The Raft Log grows from 0 to 8 GB and then drops back to 0 after a snapshot, so its average size is 4 GB. With 3 replicas, the max total Raft Log size is 3 * 8 GB = 24 GB.

What's more, we can configure data_region_ratis_log_max_size to limit the max log size of a single DataRegion. By default, data_region_ratis_log_max_size=20G, which guarantees that the Raft Log size does not exceed 20 GB.
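
The Raft Log bound as code; a minimal sketch reproducing the example (the function name is illustrative; the data_region_ratis_log_max_size cap is not modeled):

```python
def raft_log_disk_gb(avg_request_kb: float, trigger_snapshot_threshold: int,
                     replicas: int) -> float:
    # per-region max = average request size * snapshot threshold;
    # the total scales with the replica number
    per_region_gb = avg_request_kb * trigger_snapshot_threshold / 1e6  # KB -> GB
    return per_region_gb * replicas

# Example from above: 20 KB per request, threshold 400,000, 3 replicas.
print(raft_log_disk_gb(20, 400_000, 3))  # -> 24.0 GB
```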

  3. Compaction

  • Inner space compaction

    Disk space for temporary files = total disk space of the source files

    Example: 10 source files, 100 MB each
    Disk space for temporary files = 10 * 100 = 1000 MB

  • Outer space compaction

    Overlap ratio of out-of-order data = overlapped data amount / total out-of-order data amount

    Disk space for temporary files = total disk space of the ordered source files + total disk space of the out-of-order source files * (1 - overlap ratio)

    Example: 10 ordered files and 10 out-of-order files, 100 MB per ordered file, 50 MB per out-of-order file, half of the out-of-order data overlaps the sequence files
    Overlap ratio of out-of-order data = 25 MB / 50 MB * 100% = 50%
    Disk space for temporary files = 10 * 100 + 10 * 50 * 50% = 1250 MB
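
Both compaction estimates as code; a minimal sketch reproducing the two examples (function names are illustrative):

```python
def inner_compaction_temp_mb(file_count: int, file_mb: float) -> float:
    # temporary space equals the total size of the source files
    return file_count * file_mb

def outer_compaction_temp_mb(seq_total_mb: float, unseq_total_mb: float,
                             overlap_ratio: float) -> float:
    # ordered total + out-of-order total * (1 - overlap ratio)
    return seq_total_mb + unseq_total_mb * (1 - overlap_ratio)

# Examples from above.
print(inner_compaction_temp_mb(10, 100))                 # -> 1000 MB
print(outer_compaction_temp_mb(10 * 100, 10 * 50, 0.5))  # -> 1250 MB
```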