IoTDB Deployment Recommendation
Background
System Abilities
- Performance: write and read performance, compression ratio
- Extensibility: the system can manage data across multiple nodes; essentially, data can be managed by partitions
- High availability (HA): the system can tolerate disconnected nodes; essentially, the data has replicas
- Consistency: when data has multiple copies, whether the replicas stay consistent; essentially, the whole database behaves as a single node
Abbreviations
- C: ConfigNode
- D: DataNode
- nCmD: a cluster with n ConfigNodes and m DataNodes
Deployment mode
Mode | Performance | Extensibility | HA | Consistency |
---|---|---|---|---|
Lightweight standalone mode | Extremely High | None | None | High |
Scalable standalone mode (default) | High | High | Medium | High |
High performance cluster mode | High | High | High | Medium |
Strong consistency cluster mode | Medium | High | High | High |
Config | Lightweight standalone mode | Scalable standalone mode | High performance cluster mode | Strong consistency cluster mode |
---|---|---|---|---|
ConfigNode number | 1 | ≥1 (odd number) | ≥1 (odd number) | ≥1 (odd number) |
DataNode number | 1 | ≥1 | ≥3 | ≥3 |
schema_replication_factor | 1 | 1 | 3 | 3 |
data_replication_factor | 1 | 1 | 2 | 3 |
config_node_consensus_protocol_class | Simple | Ratis | Ratis | Ratis |
schema_region_consensus_protocol_class | Simple | Ratis | Ratis | Ratis |
data_region_consensus_protocol_class | Simple | IoT | IoT | Ratis |
Deployment Recommendation
Upgrade from v0.13 to v1.0
Scenario:
You already have data under v0.13 and want to upgrade to v1.0.
Options:
- Upgrade to 1C1D standalone mode: allocate 2 GB of memory to the ConfigNode, and the same memory size as in v0.13 to the DataNode.
- Upgrade to 3C3D cluster mode: allocate 2 GB of memory to each ConfigNode, and the same memory size as in v0.13 to each DataNode.
Configuration modification:
- Do not point the v1.0 data directory to the v0.13 data directory
- region_group_extension_strategy=CUSTOM
- data_region_group_per_database (see the sketch after this list)
  - for 3C3D cluster mode: total CPU core number of the cluster / data_replication_factor
  - for 1C1D standalone mode: use the virtual_storage_group_num value from v0.13
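As a minimal sketch of the data_region_group_per_database choice above (the function name and the core/replica numbers are illustrative assumptions, not IoTDB APIs or defaults):

```python
# Sketch: choosing data_region_group_per_database when upgrading to v1.0.
def data_region_groups_for_3c3d(cluster_cpu_cores: int, data_replication_factor: int) -> int:
    # 3C3D cluster mode: total CPU core number of the cluster / data_replication_factor
    return cluster_cpu_cores // data_replication_factor

# Illustrative assumption: 3 DataNodes with 16 cores each, 2 data replicas
print(data_region_groups_for_3c3d(3 * 16, 2))  # -> 24

# 1C1D standalone mode: reuse the virtual_storage_group_num value configured in v0.13
```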
Data migration:
After modifying the configuration, use the load-tsfile tool to load the TsFiles of v0.13 into v1.0.
Use v1.0 directly
We recommend using only 1 Database.
Memory estimation
Use active series number to estimate memory size
Cluster DataNode total heap size (GB) = active series number / 100,000 * data_replication_factor
Heap size of each DataNode (GB) = Cluster DataNode total heap size / DataNode number
Example: use 3C3D to manage 1 million timeseries, use 3 data replicas
- Cluster DataNode total heap size: 1,000,000 / 100,000 * 3 = 30 GB
- Heap size of each DataNode: 30 / 3 = 10 GB
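A small worked check of the active-series estimator above; the function names are illustrative, not part of IoTDB:

```python
# Worked check of the active-series heap estimator.
def cluster_heap_gb(active_series: int, data_replication_factor: int) -> float:
    # Cluster DataNode total heap size (GB) = active series number / 100,000 * data_replication_factor
    return active_series / 100_000 * data_replication_factor

def heap_per_datanode_gb(active_series: int, data_replication_factor: int, datanode_count: int) -> float:
    return cluster_heap_gb(active_series, data_replication_factor) / datanode_count

# Example from the text: 3C3D, 1,000,000 active series, 3 data replicas
print(cluster_heap_gb(1_000_000, 3))          # -> 30.0 (GB for the whole cluster)
print(heap_per_datanode_gb(1_000_000, 3, 3))  # -> 10.0 (GB per DataNode)
```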
Use total series number to estimate memory size
Cluster DataNode total heap size (B) = 20 * (180 + 2 * average character number of the series full path) * total series number * schema_replication_factor
Heap size of each DataNode = Cluster DataNode total heap size / DataNode number
Example: use 3C3D to manage 1 million timeseries, use 3 schema replicas, series names such as root.sg_1.d_10.s_100 (20 chars)
- Cluster DataNode total heap size: 20 * (180 + 2 * 20) * 1,000,000 * 3 = 13.2 GB
- Heap size of each DataNode: 13.2 GB / 3 = 4.4 GB
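A small worked check of the total-series estimator above; the function names are illustrative, not part of IoTDB:

```python
# Worked check of the total-series heap estimator.
def cluster_heap_bytes(total_series: int, avg_path_chars: int, schema_replication_factor: int) -> int:
    # Cluster DataNode total heap size (B) = 20 * (180 + 2 * average path length) * series * schema replicas
    return 20 * (180 + 2 * avg_path_chars) * total_series * schema_replication_factor

# Example from the text: 1,000,000 series like root.sg_1.d_10.s_100 (20 chars), 3 schema replicas
total_bytes = cluster_heap_bytes(1_000_000, 20, 3)
print(total_bytes / 1e9)      # -> 13.2 (GB for the whole cluster)
print(total_bytes / 3 / 1e9)  # -> 4.4 (GB per DataNode)
```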
Disk estimation
IoTDB storage size = data storage size + schema storage size + temp storage size
Data storage size
Series number * Sampling frequency (Hz) * Data point size (Byte) * Storage duration (seconds) * data_replication_factor / 10 (compression ratio)
Data Type | Timestamp (Byte) | Value (Byte) | Total (Byte) |
---|---|---|---|
Boolean | 8 | 1 | 9 |
INT32 / FLOAT | 8 | 4 | 12 |
INT64 / DOUBLE | 8 | 8 | 16 |
TEXT | 8 | a (average value size) | 8 + a |
Example: 1000 devices, 100 sensors per device, 100,000 series in total, INT32 data type, 1 Hz sampling frequency, 1 year storage duration, 3 replicas, compression ratio 10
Data storage size = 1000 * 100 * 1 * 12 * 86400 * 365 * 3 / 10 ≈ 11 TB
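A small worked check of the data storage estimate above; the function and parameter names are illustrative:

```python
# Worked check of the data storage size estimate.
def data_storage_bytes(series_count: int, sampling_hz: float, point_bytes: int,
                       duration_seconds: int, replicas: int, compression_ratio: int = 10) -> float:
    return series_count * sampling_hz * point_bytes * duration_seconds * replicas / compression_ratio

# Example from the text: 1000 devices * 100 sensors, INT32 (12 B/point), 1 Hz, 1 year, 3 replicas
size = data_storage_bytes(1000 * 100, 1, 12, 86400 * 365, 3)
print(size / 1e12)  # -> ~11.35 (TB)
```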
Schema storage size
Each series takes the byte size of its path characters + 20 bytes.
If the series has tags, add the byte size of the tag characters.
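A rough per-series sketch following the rule above; it assumes 1 byte per path character, and the function name is illustrative:

```python
# Rough per-series schema size (path bytes + 20 bytes, plus tag bytes if any).
def schema_bytes_per_series(path_chars: int, tag_chars: int = 0) -> int:
    return path_chars + 20 + tag_chars

# Example: 1,000,000 series like root.sg_1.d_10.s_100 (20 chars), no tags
print(1_000_000 * schema_bytes_per_series(20) / 1e6)  # -> 40.0 (MB)
```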
Temp storage size
Temp storage size = WAL storage size + Consensus storage size + Compaction temp storage size
- WAL
max wal storage size = memtable memory size ÷ wal_min_effective_info_ratio
- memtable memory size is decided by storage_query_schema_consensus_free_memory_proportion, storage_engine_memory_proportion and write_memory_proportion
- wal_min_effective_info_ratio is set directly by the wal_min_effective_info_ratio configuration item
Example: allocate 16 GB of memory to a DataNode, with the configuration below:
storage_query_schema_consensus_free_memory_proportion=3:3:1:1:2
storage_engine_memory_proportion=8:2
write_memory_proportion=19:1
wal_min_effective_info_ratio=0.1
max wal storage size = 16 * (3 / 10) * (8 / 10) * (19 / 20) ÷ 0.1 = 36.48 GB
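The same calculation spelled out as a sketch; the ratio parsing is written by hand from the example configuration and the variable names are illustrative:

```python
# Worked check of the max WAL storage size for the 16 GB DataNode example.
datanode_memory_gb = 16
storage_share  = 3 / (3 + 3 + 1 + 1 + 2)  # storage part of storage_query_schema_consensus_free_memory_proportion=3:3:1:1:2
write_share    = 8 / (8 + 2)              # write part of storage_engine_memory_proportion=8:2
memtable_share = 19 / (19 + 1)            # memtable part of write_memory_proportion=19:1
wal_min_effective_info_ratio = 0.1

memtable_memory_gb = datanode_memory_gb * storage_share * write_share * memtable_share
print(round(memtable_memory_gb / wal_min_effective_info_ratio, 2))  # -> 36.48 (GB)
```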
- Consensus
Ratis consensus
When using ratis consensus protocol, we need extra storage for Raft Log, which will be deleted after the state machine takes snapshot.
We can adjust trigger_snapshot_threshold to control the maximum Raft Log disk usage.
Raft Log disk size of each Region = average request size * trigger_snapshot_threshold
The total Raft Log storage space is proportional to the data replica number
Example: for a DataRegion with 20 KB of data per request and data_region_trigger_snapshot_threshold = 400,000, the max Raft Log disk size = 20 KB * 400,000 = 8 GB.
Raft Log grows from 0 to 8 GB and then drops back to 0 after a snapshot, so its average size is 4 GB.
When the replica number is 3, the max Raft Log size is 3 * 8 GB = 24 GB.
What’s more, we can configure data_region_ratis_log_max_size to limit max log size of a single DataRegion.
By default, data_region_ratis_log_max_size=20G, which guarantees that the Raft Log size of a single DataRegion does not exceed 20 GB.
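A small worked check of the Raft Log estimate above; the function name is illustrative:

```python
# Worked check of the Raft Log disk size estimate.
def max_raft_log_bytes(avg_request_bytes: int, trigger_snapshot_threshold: int, replicas: int) -> int:
    # Peak per Region = average request size * trigger_snapshot_threshold; the total scales with the replica number
    return avg_request_bytes * trigger_snapshot_threshold * replicas

# Example from the text: 20 KB per request, data_region_trigger_snapshot_threshold = 400,000, 3 replicas
print(max_raft_log_bytes(20_000, 400_000, 3) / 1e9)  # -> 24.0 (GB in total, 8 GB per replica)
```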
- Compaction
Inner space compaction
Disk space for temporary files = Total disk space of origin files
Example: 10 origin files, 100 MB each
Disk space for temporary files = 10 * 100 = 1000 MB
Outer space compaction
The overlap ratio of out-of-order data = overlapped data amount / total out-of-order data amount
Disk space for temporary files = Total disk space of origin ordered files + Total disk space of origin out-of-order files * (1 - overlap)
Example: 10 ordered files, 10 out-of-order files, 100 MB for each ordered file, 50 MB for each out-of-order file, half of the out-of-order data overlaps with the sequence files
The overlap ratio of out-of-order data = 25 MB / 50 MB * 100% = 50%
Disk space for temporary files = 10 * 100 + 10 * 50 * 50% = 1250 MB
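A small worked check of the compaction temp space estimates above; the function names are illustrative:

```python
# Worked check of the compaction temporary space estimates.
def inner_compaction_temp_mb(file_count: int, file_mb: float) -> float:
    # Inner space compaction: temp space equals the total size of the origin files
    return file_count * file_mb

def outer_compaction_temp_mb(seq_count: int, seq_mb: float,
                             unseq_count: int, unseq_mb: float, overlap: float) -> float:
    # Outer space compaction: all ordered data plus the non-overlapping part of the out-of-order data
    return seq_count * seq_mb + unseq_count * unseq_mb * (1 - overlap)

# Examples from the text
print(inner_compaction_temp_mb(10, 100))               # -> 1000 (MB)
print(outer_compaction_temp_mb(10, 100, 10, 50, 0.5))  # -> 1250.0 (MB)
```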