TsFile-Hive-Connector User Guide

TsFile-Hive-Connector User Guide

Outline

TsFile-Hive-Connector User Guide
- About TsFile-Hive-Connector
- System Requirements
- Data Type Correspondence
- Add Dependency For Hive
- Create Tsfile-backed Hive tables
- Query from Tsfile-backed Hive tables
  - Select Clause Example
  - Aggregate Clause Example
- What’s Next

About TsFile-Hive-Connector

TsFile-Hive-Connector implements the support of Hive for external data sources of Tsfile type. This enables users to operate Tsfile by Hive.

With this connector, you can

Load a single TsFile, from either the local file system or hdfs, into hive
Load all files in a specific directory, from either the local file system or hdfs, into hive
Query the tsfile through HQL.
As of now, the write operation is not supported in hive-connector. So, insert operation in HQL is not allowed while operating tsfile through hive.

System Requirements

Hadoop Version	Hive Version	Java Version	TsFile
`2.7.3` or `3.2.1`	`2.3.6` or `3.1.2`	`1.8`	`0.11.1`

Note: For more information about how to download and use TsFile, please see the following link: https://github.com/apache/iotdb/tree/master/tsfile.

Data Type Correspondence

TsFile data type	Hive field type
BOOLEAN	Boolean
INT32	INT
INT64	BIGINT
FLOAT	Float
DOUBLE	Double
TEXT	STRING

Add Dependency For Hive

To use hive-connector in hive, we should add the hive-connector jar into hive.

After downloading the code of iotdb from https://github.com/apache/iotdb (opens new window), you can use the command of mvn clean package -pl hive-connector -am -Dmaven.test.skip=true to get a hive-connector-X.X.X-jar-with-dependencies.jar.

Then in hive, use the command of add jar XXX to add the dependency. For example:

hive> add jar /Users/hive/iotdb/hive-connector/target/hive-connector-0.11.1-jar-with-dependencies.jar;
Added [/Users/hive/iotdb/hive-connector/target/hive-connector-0.11.1-jar-with-dependencies.jar] to class path
Added resources: [/Users/hive/iotdb/hive-connector/target/hive-connector-0.11.1-jar-with-dependencies.jar]

Create Tsfile-backed Hive tables

To create a Tsfile-backed table, specify the serde as org.apache.iotdb.hive.TsFileSerDe, specify the inputformat as org.apache.iotdb.hive.TSFHiveInputFormat, and the outputformat as org.apache.iotdb.hive.TSFHiveOutputFormat.

Also provide a schema which only contains two fields: time_stamp and sensor_id for the table. time_stamp is the time value of the time series and sensor_id is the sensor name to extract from the tsfile to hive such as sensor_1. The name of the table can be any valid table names in hive.

Also a location provided for hive-connector to pull the most current data for the table.

The location should be a specific directory on your local file system or HDFS to set up Hadoop. If it is in your local file system, the location should look like file:///data/data/sequence/root.baic2.WWS.leftfrontdoor/

Last, set the device_id in TBLPROPERTIES to the device name you want to analyze.

For example:

CREATE EXTERNAL TABLE IF NOT EXISTS only_sensor_1(
  time_stamp TIMESTAMP,
  sensor_1 BIGINT)
ROW FORMAT SERDE 'org.apache.iotdb.hive.TsFileSerDe'
STORED AS
  INPUTFORMAT 'org.apache.iotdb.hive.TSFHiveInputFormat'
  OUTPUTFORMAT 'org.apache.iotdb.hive.TSFHiveOutputFormat'
LOCATION '/data/data/sequence/root.baic2.WWS.leftfrontdoor/'
TBLPROPERTIES ('device_id'='root.baic2.WWS.leftfrontdoor.plc1');

In this example, the data of root.baic2.WWS.leftfrontdoor.plc1.sensor_1 is pulled from the directory of /data/data/sequence/root.baic2.WWS.leftfrontdoor/. This table results in a description as below:

hive> describe only_sensor_1;
OK
time_stamp              timestamp                  from deserializer
sensor_1                bigint                  from deserializer
Time taken: 0.053 seconds, Fetched: 2 row(s)

At this point, the Tsfile-backed table can be worked with in Hive like any other table.

Query from Tsfile-backed Hive tables

Before we do any queries, we should set the hive.input.format in hive by executing the following command.

hive> set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;

Now, we already have an external table named only_sensor_1 in hive. We can use any query operations through HQL to analyse it.