Integration with HDFS

Shared Storage Architecture

Currently, TsFiles (both TsFile and related data files) can be stored in the local file system or in the Hadoop Distributed File System (HDFS). The storage file system of TsFile is selected through configuration.

System architecture

When TsFile storage is configured to use HDFS, your data files are kept in distributed storage. The system architecture is shown below:

Figure 1: Writing Data on HDFS

Config and usage

To store TsFile and related data files in HDFS, follow these steps:

First, download the source release from the website or clone the repository with git.

Build the server and Hadoop modules with: mvn clean package -pl server,hadoop -am -Dmaven.test.skip=true -P get-jar-with-dependencies

Then, copy the Hadoop module's target jar hadoop-tsfile-X.X.X-jar-with-dependencies.jar into the server's target lib folder .../server/target/iotdb-server-X.X.X/lib.
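
The build and copy steps above can be sketched as two small shell functions. This is only an illustration: the variables IOTDB_SRC and VERSION and the function names are hypothetical, and the assumption that the jar is produced under hadoop/target/ follows the usual Maven layout rather than anything stated here. Adjust the paths to your checkout.

```shell
# Sketch only: IOTDB_SRC and VERSION are placeholders, not values from this guide.
IOTDB_SRC="${IOTDB_SRC:-$PWD}"   # root of the IoTDB source checkout (assumed)
VERSION="${VERSION:-X.X.X}"      # replace with the version you built

# Build the server and Hadoop modules (same command as in the step above).
build_modules() {
  (cd "$IOTDB_SRC" && \
    mvn clean package -pl server,hadoop -am -Dmaven.test.skip=true -P get-jar-with-dependencies)
}

# Copy the Hadoop module jar into the server lib folder.
# Assumes the jar is produced under hadoop/target/ (standard Maven layout).
install_hadoop_jar() {
  jar="$IOTDB_SRC/hadoop/target/hadoop-tsfile-$VERSION-jar-with-dependencies.jar"
  lib="$IOTDB_SRC/server/target/iotdb-server-$VERSION/lib"
  [ -f "$jar" ] || { echo "jar not found: $jar" >&2; return 1; }
  [ -d "$lib" ] || { echo "lib folder not found: $lib" >&2; return 1; }
  cp "$jar" "$lib/"
}

# Usage: build_modules && install_hadoop_jar
```

Both functions read IOTDB_SRC and VERSION when they are called, so the variables can be set after the script is sourced.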

Edit the user configuration in iotdb-engine.properties. The related settings are:

  • tsfile_storage_fs
    Name: tsfile_storage_fs
    Description: The storage file system of TsFile and related data files. Currently the LOCAL file system and HDFS are supported.
    Type: String
    Default: LOCAL
    Effective: Can only be modified at first startup
  • core_site_path
    Name: core_site_path
    Description: Absolute file path of core-site.xml if TsFile and related data files are stored in HDFS.
    Type: String
    Default: /etc/hadoop/conf/core-site.xml
    Effective: After restarting the system
  • hdfs_site_path
    Name: hdfs_site_path
    Description: Absolute file path of hdfs-site.xml if TsFile and related data files are stored in HDFS.
    Type: String
    Default: /etc/hadoop/conf/hdfs-site.xml
    Effective: After restarting the system
  • hdfs_ip
    Name: hdfs_ip
    Description: IP of HDFS if TsFile and related data files are stored in HDFS. If more than one hdfs_ip is configured, Hadoop HA is used.
    Type: String
    Default: localhost
    Effective: After restarting the system
  • hdfs_port
    Name: hdfs_port
    Description: Port of HDFS if TsFile and related data files are stored in HDFS.
    Type: String
    Default: 9000
    Effective: After restarting the system
  • dfs_nameservices
    Name: dfs_nameservices
    Description: Nameservices of HDFS HA, if Hadoop HA is used.
    Type: String
    Default: hdfsnamespace
    Effective: After restarting the system
  • dfs_ha_namenodes
    Name: dfs_ha_namenodes
    Description: Namenodes under the DFS nameservices of HDFS HA, if Hadoop HA is used.
    Type: String
    Default: nn1,nn2
    Effective: After restarting the system
  • dfs_ha_automatic_failover_enabled
    Name: dfs_ha_automatic_failover_enabled
    Description: Whether to use automatic failover, if Hadoop HA is used.
    Type: Boolean
    Default: true
    Effective: After restarting the system
  • dfs_client_failover_proxy_provider
    Name: dfs_client_failover_proxy_provider
    Description: Proxy provider, if Hadoop HA is used and automatic failover is enabled.
    Type: String
    Default: org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
    Effective: After restarting the system
  • hdfs_use_kerberos
    Name: hdfs_use_kerberos
    Description: Whether to use Kerberos to authenticate with HDFS.
    Type: Boolean
    Default: false
    Effective: After restarting the system
  • kerberos_keytab_file_path
    Name: kerberos_keytab_file_path
    Description: Full path of the Kerberos keytab file.
    Type: String
    Default: /path
    Effective: After restarting the system
  • kerberos_principal
    Name: kerberos_principal
    Description: Kerberos principal.
    Type: String
    Default: your principal
    Effective: After restarting the system
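
As an illustration of editing these settings, here is a minimal shell sketch that updates a key in a properties file, or appends it when it is absent. The helper name set_prop, the CONF path, and the example address are hypothetical; only the keys and the HDFS/LOCAL values come from the list above. It assumes entries are written as key=value with no spaces around the equals sign.

```shell
# set_prop FILE KEY VALUE
# Replace every line "KEY=..." in FILE, or append "KEY=VALUE" if none exists.
# Commented-out lines ("# KEY=...") are left untouched.
# Note: awk -v interprets backslash escapes, so avoid backslashes in VALUE.
set_prop() (
  file=$1 key=$2 value=$3
  tmp=$(mktemp) || exit 1
  awk -v k="$key" -v v="$value" '
    BEGIN { FS = "=" }
    $1 == k { print k "=" v; found = 1; next }
    { print }
    END { if (!found) print k "=" v }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"   # cat keeps the file's permissions
  status=$?
  rm -f "$tmp"
  exit "$status"
)

# Example for a single NameNode (path and address are placeholders):
#   CONF=/path/to/iotdb-engine.properties
#   set_prop "$CONF" tsfile_storage_fs HDFS
#   set_prop "$CONF" hdfs_ip 192.168.1.10
#   set_prop "$CONF" hdfs_port 9000
#
# Example with Kerberos enabled (keytab path and principal are placeholders):
#   set_prop "$CONF" hdfs_use_kerberos true
#   set_prop "$CONF" kerberos_keytab_file_path /path/to/iotdb.keytab
#   set_prop "$CONF" kerberos_principal "your principal"
```

The function body runs in a subshell, so its variables do not leak into the calling shell. Editing the file by hand works just as well; the helper is only convenient when the setup is scripted.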

Start the server, and TsFiles will be stored on HDFS.

To switch the storage file system back to local, set tsfile_storage_fs to LOCAL. If data files are already on HDFS, either download them to the local file system and move them into your configured data folder (../server/target/iotdb-server-X.X.X/data/data by default), or restart the process and import the data into IoTDB.
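
The download option can be sketched with the Hadoop command-line client. This is an assumption-laden illustration: the function name fetch_tsfiles is hypothetical, and the HDFS directory that holds the data files is not specified in this guide, so it is passed in as a parameter rather than guessed.

```shell
# fetch_tsfiles HDFS_DIR LOCAL_DIR
# Copy everything under HDFS_DIR into LOCAL_DIR using the Hadoop CLI.
# HDFS_DIR is whichever directory your deployment wrote its data files to.
fetch_tsfiles() {
  mkdir -p "$2" || return 1
  # The glob is quoted so that HDFS, not the local shell, expands it.
  hdfs dfs -get "$1/*" "$2/"
}

# Usage (both paths are placeholders):
#   fetch_tsfiles /your/hdfs/data/dir /your/local/data/dir
```

Stop the server before copying, and check that the local directory layout matches what IoTDB expects before restarting with tsfile_storage_fs set to LOCAL.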

Frequently asked questions

  1. What Hadoop version does it support?

A: Both Hadoop 2.x and Hadoop 3.x are supported.

  2. When starting the server or trying to create timeseries, I encounter the error below:

     ERROR org.apache.iotdb.tsfile.fileSystem.fsFactory.HDFSFactory:62 - Failed to get Hadoop file system. Please check your dependency of Hadoop module.

A: This indicates that the Hadoop module dependency is missing from the IoTDB server. You can solve it as follows:

  • Build the Hadoop module: mvn clean package -pl hadoop -am -Dmaven.test.skip=true -P get-jar-with-dependencies
  • Copy the Hadoop module's target jar hadoop-tsfile-X.X.X-jar-with-dependencies.jar into the server's target lib folder .../server/target/iotdb-server-X.X.X/lib.
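
Before restarting the server, a quick check can confirm that the jar is actually in the lib folder. The sketch below is only a convenience; the function name has_hadoop_jar is hypothetical, and the lib folder path is passed in by the caller.

```shell
# has_hadoop_jar LIB_DIR
# Succeeds if LIB_DIR contains a hadoop-tsfile jar-with-dependencies file.
has_hadoop_jar() {
  for f in "$1"/hadoop-tsfile-*-jar-with-dependencies.jar; do
    # When nothing matches, the pattern stays literal and -f is false.
    [ -f "$f" ] && return 0
  done
  return 1
}

# Usage (replace the argument with your server lib folder):
#   has_hadoop_jar /path/to/lib || echo "Hadoop module jar is missing" >&2
```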