IoTDB Deployment Recommendation

Background

System Abilities

  • Performance: write and read performance, compression ratio
  • Extensibility: the system can manage data on multiple nodes; essentially, data can be managed by partitions
  • High availability (HA): the system can tolerate disconnected nodes; essentially, the data has replicas
  • Consistency: when data has multiple copies, whether the replicas agree; essentially, the system can be treated as a single node

Abbreviations

  • C: ConfigNode
  • D: DataNode
  • nCmD: a cluster with n ConfigNodes and m DataNodes

Deployment mode

| Mode | Performance | Extensibility | HA | Consistency |
| --- | --- | --- | --- | --- |
| Lightweight standalone mode | Extremely high | None | None | High |
| Scalable standalone mode (default) | High | High | Medium | High |
| High performance cluster mode | High | High | High | Medium |
| Strong consistency cluster mode | Medium | High | High | High |
| Config | Lightweight standalone mode | Scalable standalone mode | High performance cluster mode | Strong consistency cluster mode |
| --- | --- | --- | --- | --- |
| ConfigNode number | 1 | ≥1 (odd number) | ≥1 (odd number) | ≥1 (odd number) |
| DataNode number | 1 | ≥1 | ≥3 | ≥3 |
| schema_replication_factor | 1 | 1 | 3 | 3 |
| data_replication_factor | 1 | 1 | 2 | 3 |
| config_node_consensus_protocol_class | Simple | Ratis | Ratis | Ratis |
| schema_region_consensus_protocol_class | Simple | Ratis | Ratis | Ratis |
| data_region_consensus_protocol_class | Simple | IoT | IoT | Ratis |
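
To make the table concrete, here is a minimal Python sketch of the per-mode settings. The fully qualified consensus class names are assumptions based on IoTDB 1.0 defaults; verify them against the configuration files shipped with your release.

```python
# Sketch: recommended settings per deployment mode, mirroring the table above.
# The fully qualified consensus class names below are assumptions; check the
# default configuration files of your IoTDB release before relying on them.
CONSENSUS = {
    "Simple": "org.apache.iotdb.consensus.simple.SimpleConsensus",
    "Ratis": "org.apache.iotdb.consensus.ratis.RatisConsensus",
    "IoT": "org.apache.iotdb.consensus.iot.IoTConsensus",
}

MODES = {
    "high_performance_cluster": {
        "schema_replication_factor": 3,
        "data_replication_factor": 2,
        "config_node_consensus_protocol_class": CONSENSUS["Ratis"],
        "schema_region_consensus_protocol_class": CONSENSUS["Ratis"],
        "data_region_consensus_protocol_class": CONSENSUS["IoT"],
    },
    "strong_consistency_cluster": {
        "schema_replication_factor": 3,
        "data_replication_factor": 3,
        "config_node_consensus_protocol_class": CONSENSUS["Ratis"],
        "schema_region_consensus_protocol_class": CONSENSUS["Ratis"],
        "data_region_consensus_protocol_class": CONSENSUS["Ratis"],
    },
}

for key, value in MODES["high_performance_cluster"].items():
    print(f"{key}={value}")
```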

Deployment Recommendation

Upgrade from v0.13 to v1.0

Scenario: you already have data under v0.13 and want to upgrade to v1.0.

Options:

  1. Upgrade to 1C1D standalone mode: allocate 2 GB of memory to the ConfigNode, and allocate the same amount of memory to the DataNode as v0.13 used.
  2. Upgrade to 3C3D cluster mode: allocate 2 GB of memory to each ConfigNode, and allocate the same amount of memory to each DataNode as v0.13 used.

Configuration modification:

  • Do not point the v1.0 data directory to the v0.13 data directory
  • region_group_extension_strategy=CUSTOM
  • data_region_group_per_database (see the sketch after this section)
    • for 3C3D cluster mode: cluster total CPU core count / data_replication_factor
    • for 1C1D standalone mode: use the virtual_storage_group_num value from v0.13

Data migration: after modifying the configuration, use the load-tsfile tool to load the v0.13 TsFiles into v1.0.
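
The data_region_group_per_database recommendation above is simple arithmetic; a minimal sketch (the function name and the sizing numbers are illustrative):

```python
def data_region_group_per_database(cluster_cpu_cores: int,
                                   data_replication_factor: int) -> int:
    # 3C3D cluster mode: cluster total CPU core count / data_replication_factor
    return cluster_cpu_cores // data_replication_factor

# Hypothetical sizing: 3 DataNodes with 16 cores each, 2 data replicas.
print(data_region_group_per_database(3 * 16, 2))  # -> 24
```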

Use v1.0 directly

We recommend using only 1 Database.

Memory estimation

Use active series number to estimate memory size

Cluster DataNode total heap size (GB) = active series number / 100,000 * data_replication_factor

Heap size of each DataNode (GB) = Cluster DataNode total heap size / DataNode number

Example: use 3C3D to manage 1 million active series with 3 data replicas

  • Cluster DataNode total heap size: 1,000,000 / 100,000 * 3 = 30 GB
  • Heap size of each DataNode: 30 / 3 = 10 GB
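
Both formulas translate directly into code; a minimal sketch (the function name is illustrative):

```python
def cluster_heap_gb(active_series: int, data_replication_factor: int) -> float:
    # Cluster DataNode total heap size (GB) = active series / 100,000 * replicas
    return active_series / 100_000 * data_replication_factor

# Example from above: 3C3D, 1 million active series, 3 data replicas.
total_gb = cluster_heap_gb(1_000_000, 3)
print(total_gb)      # -> 30.0 GB for the whole cluster
print(total_gb / 3)  # -> 10.0 GB per DataNode
```
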
Use total series number to estimate memory size

Cluster DataNode total heap size (B) = 20 * (180 + 2 * average character count of the series full path) * total series number * schema_replication_factor

Heap size of each DataNode = Cluster DataNode total heap size / DataNode number

Example: use 3C3D to manage 1 million series with 3 schema replicas; series paths look like root.sg_1.d_10.s_100 (20 chars)

  • Cluster DataNode total heap size: 20 * (180 + 2 * 20) * 1,000,000 * 3 = 13.2 GB
  • Heap size of each DataNode: 13.2 GB / 3 = 4.4 GB
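
The same estimation expressed as code; a minimal sketch reproducing the example (the function name is illustrative):

```python
def cluster_heap_bytes(total_series: int, avg_path_chars: int,
                       schema_replication_factor: int) -> int:
    # Total heap (B) = 20 * (180 + 2 * avg path chars) * series * schema replicas
    return 20 * (180 + 2 * avg_path_chars) * total_series * schema_replication_factor

# Example from above: 1 million series, 20-char full paths, 3 schema replicas.
total_b = cluster_heap_bytes(1_000_000, 20, 3)
print(total_b / 1e9)      # -> 13.2 GB for the whole cluster
print(total_b / 1e9 / 3)  # -> 4.4 GB per DataNode
```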

Disk estimation

IoTDB storage size = data storage size + schema storage size + temp storage size

Data storage size

Data storage size = series number * sampling frequency (Hz) * data point size * storage duration (seconds) * data_replication_factor / 10 (compression ratio)

| Data Type | Timestamp (Byte) | Value (Byte) | Total (Byte) |
| --- | --- | --- | --- |
| Boolean | 8 | 1 | 9 |
| INT32 / FLOAT | 8 | 4 | 12 |
| INT64 / DOUBLE | 8 | 8 | 16 |
| TEXT | 8 | a (average value size) | 8 + a |

Example: 1000 devices, 100 sensors per device (100,000 series in total), INT32 data type, 1 Hz sampling frequency, 1 year storage duration, 3 replicas, compression ratio 10

Data storage size = 1000 * 100 * 12 * 86400 * 365 * 3 / 10 ≈ 11 TB
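
The data storage formula as code; a minimal sketch reproducing the example (the function name is illustrative):

```python
def data_storage_bytes(series: int, hz: float, point_bytes: int,
                       duration_s: int, replicas: int,
                       compression_ratio: float = 10.0) -> float:
    # series * sampling frequency * point size * duration * replicas / compression
    return series * hz * point_bytes * duration_s * replicas / compression_ratio

# Example from above: 100,000 INT32 series at 1 Hz for one year, 3 replicas.
size_b = data_storage_bytes(1000 * 100, 1, 12, 86400 * 365, 3)
print(size_b / 1e12)  # -> ~11.35 TB
```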

Schema storage size

Each series occupies its path's character byte size plus 20 bytes. If a series has tags, add the byte size of the tag characters.

Temp storage size

Temp storage size = WAL storage size + Consensus storage size + Compaction temp storage size

  1. WAL

max WAL storage size = memtable memory size / wal_min_effective_info_ratio

  • memtable memory size is determined by storage_query_schema_consensus_free_memory_proportion, storage_engine_memory_proportion, and write_memory_proportion
  • wal_min_effective_info_ratio is set by the wal_min_effective_info_ratio configuration item

Example: allocate 16 GB of memory to a DataNode, with the following configuration:

storage_query_schema_consensus_free_memory_proportion=3:3:1:1:2
storage_engine_memory_proportion=8:2
write_memory_proportion=19:1
wal_min_effective_info_ratio=0.1

max WAL storage size = 16 * (3 / 10) * (8 / 10) * (19 / 20) / 0.1 = 36.48 GB
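
The WAL bound as code; a minimal sketch reproducing the example (the function name and the pre-derived ratio arguments are illustrative):

```python
def max_wal_storage_gb(datanode_memory_gb: float,
                       storage_engine_share: float,  # e.g. 3/10 from 3:3:1:1:2
                       write_share: float,           # e.g. 8/10 from 8:2
                       memtable_share: float,        # e.g. 19/20 from 19:1
                       wal_min_effective_info_ratio: float) -> float:
    # max WAL storage = memtable memory / wal_min_effective_info_ratio
    memtable_gb = (datanode_memory_gb * storage_engine_share
                   * write_share * memtable_share)
    return memtable_gb / wal_min_effective_info_ratio

# Example from above: 16 GB DataNode, ratios taken from the configuration shown.
print(max_wal_storage_gb(16, 3 / 10, 8 / 10, 19 / 20, 0.1))  # -> 36.48 GB
```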

  2. Consensus

Ratis consensus

When using the Ratis consensus protocol, extra storage is needed for the Raft Log, which is deleted after the state machine takes a snapshot. We can adjust trigger_snapshot_threshold to control the maximum Raft Log disk usage.

Raft Log disk size in each region = average request size * trigger_snapshot_threshold

The total Raft Log storage space is proportional to the data replica number.

Example: for a DataRegion with 20 KB of data per request and data_region_trigger_snapshot_threshold = 400,000, the max Raft Log disk size = 20 KB * 400,000 = 8 GB. The Raft Log grows from 0 to 8 GB and then drops back to 0 after a snapshot, so its average size is 4 GB. With 3 replicas, the max total Raft Log size is 3 * 8 GB = 24 GB.

What's more, we can configure data_region_ratis_log_max_size to limit the max log size of a single DataRegion. By default, data_region_ratis_log_max_size=20G, which guarantees that the Raft Log size does not exceed 20 GB.
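
The Raft Log bound as code; a minimal sketch reproducing the example (the function name is illustrative; the data_region_ratis_log_max_size cap is not modeled):

```python
def raft_log_disk_gb(avg_request_kb: float, trigger_snapshot_threshold: int,
                     replicas: int) -> float:
    # per-region max = average request size * snapshot threshold;
    # the total scales with the replica number
    per_region_gb = avg_request_kb * trigger_snapshot_threshold / 1e6  # KB -> GB
    return per_region_gb * replicas

# Example from above: 20 KB per request, threshold 400,000, 3 replicas.
print(raft_log_disk_gb(20, 400_000, 3))  # -> 24.0 GB
```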

  3. Compaction

  • Inner space compaction

    Disk space for temporary files = total disk space of the source files

    Example: 10 source files, 100 MB each
    Disk space for temporary files = 10 * 100 = 1000 MB

  • Outer space compaction

    Overlap ratio of out-of-order data = overlapped data amount / total out-of-order data amount

    Disk space for temporary files = total disk space of the ordered source files + total disk space of the out-of-order source files * (1 - overlap ratio)

    Example: 10 ordered files and 10 out-of-order files, 100 MB per ordered file, 50 MB per out-of-order file, half of the out-of-order data overlaps the sequence files
    Overlap ratio of out-of-order data = 25 MB / 50 MB * 100% = 50%
    Disk space for temporary files = 10 * 100 + 10 * 50 * 50% = 1250 MB
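
Both compaction estimates as code; a minimal sketch reproducing the two examples (function names are illustrative):

```python
def inner_compaction_temp_mb(file_count: int, file_mb: float) -> float:
    # temporary space equals the total size of the source files
    return file_count * file_mb

def outer_compaction_temp_mb(seq_total_mb: float, unseq_total_mb: float,
                             overlap_ratio: float) -> float:
    # ordered total + out-of-order total * (1 - overlap ratio)
    return seq_total_mb + unseq_total_mb * (1 - overlap_ratio)

# Examples from above.
print(inner_compaction_temp_mb(10, 100))                 # -> 1000 MB
print(outer_compaction_temp_mb(10 * 100, 10 * 50, 0.5))  # -> 1250 MB
```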