TsFile-Hive-Connector User Guide

Outline

TsFile-Hive-Connector User Guide
- About TsFile-Hive-Connector
- System Requirements
- Data Type Correspondence
- Add Dependency For Hive
- Creating Tsfile-backed Hive tables
- Querying from Tsfile-backed Hive tables
  - Select Clause Example
  - Aggregate Clause Example
- What’s Next

About TsFile-Hive-Connector

TsFile-Hive-Connector adds Hive support for external data sources in TsFile format, so that users can work with TsFiles through Hive.

With this connector, you can:

- Load a single TsFile, from either the local file system or HDFS, into Hive
- Load all files in a specific directory, from either the local file system or HDFS, into Hive
- Query the TsFile through HQL

Write operations are not yet supported by hive-connector, so INSERT statements in HQL are not allowed when operating on a TsFile through Hive.

System Requirements

Hadoop Version	Hive Version	Java Version	TsFile
`2.7.3` or `3.2.1`	`2.3.6` or `3.1.2`	`1.8`	`0.10.0`

Note: For more information about how to download and use TsFile, please see the following link: https://github.com/apache/incubator-iotdb/tree/master/tsfile.

Data Type Correspondence

TsFile data type	Hive field type
BOOLEAN	BOOLEAN
INT32	INT
INT64	BIGINT
FLOAT	FLOAT
DOUBLE	DOUBLE
TEXT	STRING
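
When table definitions are generated from scripts, the correspondence above can be written down once and reused. The following is a minimal sketch, not part of hive-connector: hive_type_for is a hypothetical helper name, and the mapping is copied from the table above.

```shell
# Map a TsFile data type to the Hive column type listed in the table above.
# hive_type_for is a hypothetical helper name, not part of hive-connector.
hive_type_for() {
  case "$1" in
    BOOLEAN) echo "BOOLEAN" ;;
    INT32)   echo "INT" ;;
    INT64)   echo "BIGINT" ;;
    FLOAT)   echo "FLOAT" ;;
    DOUBLE)  echo "DOUBLE" ;;
    TEXT)    echo "STRING" ;;
    *)       echo "unsupported TsFile type: $1" >&2; return 1 ;;
  esac
}

# Example: look up the column type for an INT64 sensor.
hive_type_for INT64
```

Unknown type names are rejected with a non-zero exit status instead of being passed through, so a typo fails early rather than ending up in a table definition.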

Add Dependency For Hive

To use hive-connector in Hive, first add the hive-connector jar to Hive.

After downloading the IoTDB source code from https://github.com/apache/incubator-iotdb, run mvn clean package -pl hive-connector -am -Dmaven.test.skip=true to build hive-connector-X.X.X-jar-with-dependencies.jar.

Then, in Hive, run add jar XXX to add the dependency. For example:

hive> add jar /Users/hive/incubator-iotdb/hive-connector/target/hive-connector-0.10.0-jar-with-dependencies.jar;
Added [/Users/hive/incubator-iotdb/hive-connector/target/hive-connector-0.10.0-jar-with-dependencies.jar] to class path
Added resources: [/Users/hive/incubator-iotdb/hive-connector/target/hive-connector-0.10.0-jar-with-dependencies.jar]

Creating Tsfile-backed Hive tables

To create a TsFile-backed table, specify the SerDe as org.apache.iotdb.hive.TsFileSerDe, the input format as org.apache.iotdb.hive.TSFHiveInputFormat, and the output format as org.apache.iotdb.hive.TSFHiveOutputFormat.

Also provide a schema for the table that contains only two fields: time_stamp and sensor_id. time_stamp is the time value of the time series, and sensor_id is the name of the sensor you want to extract from the TsFile into Hive, such as sensor_1. The table name can be any valid Hive table name.

Also provide a location from which hive-connector will pull the most current data for the table.

The location must be a specific directory, either on your local file system or, if you have set up Hadoop, on HDFS. If it is on your local file system, the location should look like file:///data/data/sequence/root.baic2.WWS.leftfrontdoor/
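
As a small illustration of the local form, the sketch below turns an absolute directory path into a file:// location with a trailing slash. local_tsfile_location is a hypothetical helper name, and the sketch assumes the path contains no characters that would need percent-encoding in a URI.

```shell
# Turn an absolute local directory into the file:// form described above.
# local_tsfile_location is a hypothetical helper name.
local_tsfile_location() {
  dir="${1%/}"                      # drop one trailing slash, if present
  case "$dir" in
    /*) printf 'file://%s/\n' "$dir" ;;
    *)  echo "expected an absolute directory path: $1" >&2; return 1 ;;
  esac
}

# Example with the directory used in this guide.
local_tsfile_location /data/data/sequence/root.baic2.WWS.leftfrontdoor
```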

Finally, set device_id in TBLPROPERTIES to the name of the device you want to analyze.

For example:

CREATE EXTERNAL TABLE IF NOT EXISTS only_sensor_1(
  time_stamp TIMESTAMP,
  sensor_1 BIGINT)
ROW FORMAT SERDE 'org.apache.iotdb.hive.TsFileSerDe'
STORED AS
  INPUTFORMAT 'org.apache.iotdb.hive.TSFHiveInputFormat'
  OUTPUTFORMAT 'org.apache.iotdb.hive.TSFHiveOutputFormat'
LOCATION '/data/data/sequence/root.baic2.WWS.leftfrontdoor/'
TBLPROPERTIES ('device_id'='root.baic2.WWS.leftfrontdoor.plc1');

In this example, we pull the data of root.baic2.WWS.leftfrontdoor.plc1.sensor_1 from the directory /data/data/sequence/root.baic2.WWS.leftfrontdoor/. Describing the table produces output like the following:

hive> describe only_sensor_1;
OK
time_stamp              timestamp                  from deserializer
sensor_1                bigint                  from deserializer
Time taken: 0.053 seconds, Fetched: 2 row(s)

At this point, the Tsfile-backed table can be worked with in Hive like any other table.
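
Because each table exposes a single sensor, you may end up with several near-identical definitions. The sketch below prints the same CREATE EXTERNAL TABLE statement as the example above from five parameters. tsfile_table_ddl is a hypothetical helper name, and only_sensor_2 / sensor_2 with type BIGINT are made-up values for illustration; check the sensor names and types that actually exist in your TsFile.

```shell
# Print a CREATE EXTERNAL TABLE statement for one sensor of one device.
# tsfile_table_ddl is a hypothetical helper, not part of hive-connector.
# Arguments: $1 table name, $2 sensor name, $3 Hive column type,
#            $4 location directory, $5 device_id
tsfile_table_ddl() {
  # The unquoted heredoc delimiter lets the shell expand $1..$5.
  cat <<EOF
CREATE EXTERNAL TABLE IF NOT EXISTS $1(
  time_stamp TIMESTAMP,
  $2 $3)
ROW FORMAT SERDE 'org.apache.iotdb.hive.TsFileSerDe'
STORED AS
  INPUTFORMAT 'org.apache.iotdb.hive.TSFHiveInputFormat'
  OUTPUTFORMAT 'org.apache.iotdb.hive.TSFHiveOutputFormat'
LOCATION '$4'
TBLPROPERTIES ('device_id'='$5');
EOF
}

# Example with assumed values (sensor_2 is hypothetical).
tsfile_table_ddl only_sensor_2 sensor_2 BIGINT \
  /data/data/sequence/root.baic2.WWS.leftfrontdoor/ \
  root.baic2.WWS.leftfrontdoor.plc1
```

The output can be redirected to a file and run with hive -f, provided the hive-connector jar has been added in that session (for example with an add jar line at the top of the file).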

Querying from Tsfile-backed Hive tables

Before running any queries, set hive.input.format in Hive by executing the following command.

hive> set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
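
Both the add jar command and this setting apply to the current session only, so they have to be repeated each time. One way to avoid retyping them, sketched here under the assumption that you use the classic hive command-line client, is to keep them in an initialization file and pass it with the -i option. tsfile_session_setup and the file name tsfile-init.hql are hypothetical, and the jar path is the example path from earlier in this guide.

```shell
# Example jar path from this guide; adjust it to your own build output.
JAR=/Users/hive/incubator-iotdb/hive-connector/target/hive-connector-0.10.0-jar-with-dependencies.jar

# Print the per-session setup statements described in this guide.
# tsfile_session_setup is a hypothetical helper name.
tsfile_session_setup() {
  cat <<EOF
add jar ${JAR};
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
EOF
}

tsfile_session_setup
# To use it, save the output and start the CLI with the file, e.g.:
#   tsfile_session_setup > tsfile-init.hql
#   hive -i tsfile-init.hql
```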

We now have an external table named only_sensor_1 in Hive and can analyze it with any HQL query operation.