Hive

This documentation is a guide for using Paimon in Hive.

Version

Paimon currently supports Hive 3.1, 2.3, 2.2, 2.1 and 2.1-cdh-6.3.

Execution Engine

Paimon currently supports the MR and Tez execution engines for Hive read, and the MR execution engine for Hive write. Note: if you use Beeline, please restart the Hive cluster.
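
For example, the engine can be switched per session. A minimal sketch (hive.execution.engine is a standard Hive setting, not specific to Paimon; test_table is the example table used later on this page, and the row values are only illustrative):

  -- Reads work with either the MR or the Tez engine
  SET hive.execution.engine=tez;
  SELECT * FROM test_table;

  -- Writes from Hive require the MR engine
  SET hive.execution.engine=mr;
  INSERT INTO test_table VALUES (4, 'Hive');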

Installation

Download the jar file for the corresponding Hive version.

Version             Jar
Hive 3.1            paimon-hive-connector-3.1-0.9.0.jar
Hive 2.3            paimon-hive-connector-2.3-0.9.0.jar
Hive 2.2            paimon-hive-connector-2.2-0.9.0.jar
Hive 2.1            paimon-hive-connector-2.1-0.9.0.jar
Hive 2.1-cdh-6.3    paimon-hive-connector-2.1-cdh-6.3-0.9.0.jar

You can also manually build the bundled jar from the source code.

To build from source code, clone the git repository.

Build the bundled jar with the following command:

  mvn clean install -DskipTests

You can find the Hive connector jar in ./paimon-hive/paimon-hive-connector-<hive-version>/target/paimon-hive-connector-<hive-version>-0.9.0.jar.

There are several ways to add this jar to Hive.

  • You can create an auxlib folder under the root directory of Hive, and copy paimon-hive-connector-<hive-version>-0.9.0.jar into auxlib.
  • You can also copy this jar to a path accessible by Hive, then use add jar /path/to/paimon-hive-connector-<hive-version>-0.9.0.jar to enable Paimon support in Hive (see the sketch after this list). Note that this method is not recommended: if you use the MR execution engine and run a join statement, you may hit the exception org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find class.
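
A rough illustration of the second approach (the jar path and file name below are hypothetical; use the actual location and version of your connector jar):

  -- Hypothetical path; adjust to where the connector jar was copied
  ADD JAR /path/to/paimon-hive-connector-3.1-0.9.0.jar;
  -- Verify that the jar is registered in the current session
  LIST JARS;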

NOTE:

  • If you are using HDFS:
    • Make sure that the environment variable HADOOP_HOME or HADOOP_CONF_DIR is set.
    • You can set paimon.hadoop-load-default-config=false to disable loading the default values from core-default.xml and hdfs-default.xml, which may lead to a smaller split size.
  • Hive CBO may lead to incorrect query results, for example when querying a struct type with a not null predicate. You can disable CBO with the set hive.cbo.enable=false; command (see the sketch after this list).
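
A minimal sketch of the two settings mentioned in the notes above, assuming both can be applied as session-level SET commands (paimon.scan options are set this way elsewhere on this page); whether you need either of them depends on your deployment:

  -- Disable loading Hadoop defaults from core-default.xml and hdfs-default.xml
  SET paimon.hadoop-load-default-config=false;
  -- Work around incorrect results from Hive CBO, e.g. struct columns with a not null predicate
  SET hive.cbo.enable=false;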

Hive SQL: access Paimon Tables already in Hive metastore

Run the following Hive SQL in the Hive CLI to access the created table.

  -- Assume that paimon-hive-connector-<hive-version>-0.9.0.jar is already in the auxlib directory.

  -- List tables in Hive
  -- (you might need to switch to the "default" database if you're not there by default)
  SHOW TABLES;
  /*
  OK
  test_table
  */

  -- Read records from test_table
  SELECT a, b FROM test_table ORDER BY a;
  /*
  OK
  1 Table
  2 Store
  */

  -- Insert records into test_table
  -- Limitations:
  --   Only INSERT INTO is supported; INSERT OVERWRITE is not.
  --   It is recommended to write to a non-primary-key table;
  --   writing to a primary key table may result in a large number of small files.
  INSERT INTO test_table VALUES (3, 'Paimon');
  SELECT a, b FROM test_table ORDER BY a;
  /*
  OK
  1 Table
  2 Store
  3 Paimon
  */

  -- Time travel
  SET paimon.scan.snapshot-id=1;
  SELECT a, b FROM test_table ORDER BY a;
  /*
  OK
  1 Table
  2 Store
  3 Paimon
  */
  SET paimon.scan.snapshot-id=null;

Hive SQL: create new Paimon Tables

You can create new Paimon tables in Hive. Run the following Hive SQL in the Hive CLI.

  -- Assume that paimon-hive-connector-<hive-version>-0.9.0.jar is already in the auxlib directory.
  -- Let's create a new Paimon table.
  SET hive.metastore.warehouse.dir=warehouse_path;
  CREATE TABLE hive_test_table (
      a INT COMMENT 'The a field',
      b STRING COMMENT 'The b field'
  )
  STORED BY 'org.apache.paimon.hive.PaimonStorageHandler';
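
If you need a partitioned table or a table with a primary key, the table options can be passed at creation time. The sketch below assumes the connector accepts Paimon table options such as 'primary-key' via TBLPROPERTIES; the table and column names are made up for illustration, so check this against your connector version.

  -- Sketch: partitioned table with a primary key (the primary key must include the partition keys)
  CREATE TABLE hive_test_table_partitioned (
      a INT COMMENT 'The a field',
      b STRING COMMENT 'The b field'
  )
  PARTITIONED BY (dt STRING)
  STORED BY 'org.apache.paimon.hive.PaimonStorageHandler'
  TBLPROPERTIES (
      'primary-key' = 'a,dt'
  );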

Hive SQL: access Paimon Tables by External Table

To access existing Paimon tables, you can also register them as external tables in Hive. Run the following Hive SQL in the Hive CLI.

  -- Assume that paimon-hive-connector-<hive-version>-0.9.0.jar is already in the auxlib directory.
  -- Let's use the test_table created in the above section.
  -- To create an external table, you don't need to specify any column or table properties.
  -- Pointing the location to the path of the table is enough.
  CREATE EXTERNAL TABLE external_test_table
  STORED BY 'org.apache.paimon.hive.PaimonStorageHandler'
  LOCATION '/path/to/table/store/warehouse/default.db/test_table';

  -- In addition to setting the location as above, you can also put the location in TBLPROPERTIES
  -- to avoid Hive accessing Paimon's location through its own file system when creating tables.
  -- This method is useful in scenarios using object storage, such as S3.
  CREATE EXTERNAL TABLE external_test_table
  STORED BY 'org.apache.paimon.hive.PaimonStorageHandler'
  TBLPROPERTIES (
      'paimon_location' = 's3://xxxxx/path/to/table/store/warehouse/default.db/test_table'
  );

  -- Read records from external_test_table
  SELECT a, b FROM external_test_table ORDER BY a;
  /*
  OK
  1 Table
  2 Store
  */

  -- Insert records into external_test_table
  INSERT INTO external_test_table VALUES (3, 'Paimon');
  SELECT a, b FROM external_test_table ORDER BY a;
  /*
  OK
  1 Table
  2 Store
  3 Paimon
  */

Hive Type Conversion

This section lists all supported type conversions between Hive and Paimon. All Hive data types are available in the package org.apache.hadoop.hive.serde2.typeinfo.

Hive Data Type                       Paimon Data Type                                                Atomic Type
StructTypeInfo                       RowType                                                         false
MapTypeInfo                          MapType                                                         false
ListTypeInfo                         ArrayType                                                       false
PrimitiveTypeInfo("boolean")         BooleanType                                                     true
PrimitiveTypeInfo("tinyint")         TinyIntType                                                     true
PrimitiveTypeInfo("smallint")        SmallIntType                                                    true
PrimitiveTypeInfo("int")             IntType                                                         true
PrimitiveTypeInfo("bigint")          BigIntType                                                      true
PrimitiveTypeInfo("float")           FloatType                                                       true
PrimitiveTypeInfo("double")          DoubleType                                                      true
CharTypeInfo(length)                 CharType(length)                                                true
PrimitiveTypeInfo("string")          VarCharType(VarCharType.MAX_LENGTH)                             true
VarcharTypeInfo(length)              VarCharType(length), length is less than VarCharType.MAX_LENGTH true
PrimitiveTypeInfo("date")            DateType                                                        true
PrimitiveTypeInfo("timestamp")       TimestampType                                                   true
DecimalTypeInfo(precision, scale)    DecimalType(precision, scale)                                   true
PrimitiveTypeInfo("binary")          VarBinaryType, BinaryType                                       true
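
For illustration, a small DDL (hypothetical table and column names) whose Hive column types map to Paimon types according to the table above:

  -- Each Hive type below maps to the Paimon type noted in its comment
  CREATE TABLE type_mapping_example (
      id   INT            COMMENT 'IntType',
      name STRING         COMMENT 'VarCharType(VarCharType.MAX_LENGTH)',
      amt  DECIMAL(10, 2) COMMENT 'DecimalType(10, 2)',
      ts   TIMESTAMP      COMMENT 'TimestampType'
  )
  STORED BY 'org.apache.paimon.hive.PaimonStorageHandler';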