Trino
This documentation is a guide for using Paimon in Trino.
Version
Paimon currently supports Trino 420 and above.
Filesystem
From version 0.8, Paimon shares the Trino filesystem for all actions, which means you should configure the Trino filesystem before using trino-paimon. You can find information about how to configure filesystems for Trino on the Trino official website.
Preparing Paimon Jar File
Download from master: https://paimon.apache.org/docs/master/project/download/
You can also manually build a bundled jar from the source code. However, there are a few preliminary steps that need to be taken before compiling:
- To build from the source code, clone the git repository.
- Install JDK 17 locally and configure it as a global environment variable.
Then, you can build the bundled jar with the following command:
mvn clean install -DskipTests
You can find the Trino connector jar in ./paimon-trino-<trino-version>/target/paimon-trino-<trino-version>-0.9.0-plugin.tar.gz.
We use hadoop-apache as a dependency for Hadoop, and the default Hadoop dependency typically supports both Hadoop 2 and Hadoop 3. If you encounter an unsupported scenario, you can specify the corresponding Apache Hadoop version.
For example, if you want to use Hadoop 3.3.5-1, you can use the following command to build the jar:
mvn clean install -DskipTests -Dhadoop.apache.version=3.3.5-1
Tmp Dir
Paimon will unzip some jars to the temporary directory for codegen. By default, Trino uses '/tmp' as the temporary directory, but '/tmp' may be periodically deleted.
You can configure this JVM option when Trino starts so that Paimon uses a secure temporary directory:
-Djava.io.tmpdir=/path/to/other/tmpdir
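On a typical Trino deployment, this system property can be added to etc/jvm.config; the directory below is a placeholder:
-Djava.io.tmpdir=/data/trino/tmp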
Configure Paimon Catalog
Install Paimon Connector
tar -zxf paimon-trino-<trino-version>-0.9.0-plugin.tar.gz -C ${TRINO_HOME}/plugin
The variable trino-version is the module name and must be one of 420 or 427.
NOTE: For JDK 17, when deploying Trino, you should add the following JVM options:
--add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED
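As with the temporary directory setting above, these options are typically placed in Trino's etc/jvm.config, one option per line:
--add-opens=java.base/sun.nio.ch=ALL-UNNAMED
--add-opens=java.base/java.nio=ALL-UNNAMED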
Configure
Catalogs are registered by creating a catalog properties file in the etc/catalog directory. For example, create etc/catalog/paimon.properties with the following contents to mount the paimon connector as the paimon catalog:
connector.name=paimon
warehouse=file:/tmp/warehouse
If you are using HDFS, choose one of the following ways to configure your HDFS:
- set environment variable HADOOP_HOME.
- set environment variable HADOOP_CONF_DIR.
- configure hadoop-conf-dir in the properties, as shown in the example below.
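For example, a minimal paimon.properties pointing the catalog at HDFS might look like the following (the warehouse path and configuration directory are placeholders):
connector.name=paimon
warehouse=hdfs://namenode:8020/path/to/warehouse
hadoop-conf-dir=/path/to/hadoop/conf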
If you are using a Hadoop filesystem, you can still use trino-hdfs and trino-hive to configure it. For example, if you use OSS as storage, you can add the following to paimon.properties according to the Trino reference:
hive.config.resources=/path/to/core-site.xml
Then, configure core-site.xml according to the Jindo reference.
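As a rough sketch, the referenced core-site.xml carries the OSS endpoint and credentials; the exact property keys and filesystem implementation class depend on your SDK version, so treat the names below as illustrative and verify them against the Jindo reference:
<configuration>
    <!-- OSS endpoint and credentials; values are placeholders -->
    <property>
        <name>fs.oss.endpoint</name>
        <value>oss-cn-hangzhou.aliyuncs.com</value>
    </property>
    <property>
        <name>fs.oss.accessKeyId</name>
        <value>your-access-key-id</value>
    </property>
    <property>
        <name>fs.oss.accessKeySecret</name>
        <value>your-access-key-secret</value>
    </property>
</configuration>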
Kerberos
When using KERBEROS authentication, you can configure the Kerberos keytab file in the properties:
security.kerberos.login.principal=hadoop-user
security.kerberos.login.keytab=/etc/trino/hdfs.keytab
Keytab files must be distributed to every node in the cluster that runs Trino.
Create Schema
CREATE SCHEMA paimon.test_db;
Create Table
CREATE TABLE paimon.test_db.orders (
order_key bigint,
order_status varchar,
total_price decimal(18,4),
order_date date
)
WITH (
file_format = 'ORC',
primary_key = ARRAY['order_key','order_date'],
partitioned_by = ARRAY['order_date'],
bucket = '2',
bucket_key = 'order_key',
changelog_producer = 'input'
)
Add Column
ALTER TABLE paimon.test_db.orders ADD COLUMN shipping_address varchar;
Query
SELECT * FROM paimon.test_db.orders
Query with Time Traveling
Available for Trino version 420 and above:
-- read the snapshot from specified timestamp
SELECT * FROM t FOR TIMESTAMP AS OF TIMESTAMP '2023-01-01 00:00:00 Asia/Shanghai';
-- read the snapshot with id 1L (use snapshot id as version)
SELECT * FROM t FOR VERSION AS OF 1;
Trino to Paimon type mapping
This section lists all supported type conversions between Trino and Paimon. All Trino's data types are available in the package io.trino.spi.type.
Trino Data Type | Paimon Data Type | Atomic Type |
---|---|---|
RowType | RowType | false |
MapType | MapType | false |
ArrayType | ArrayType | false |
BooleanType | BooleanType | true |
TinyintType | TinyIntType | true |
SmallintType | SmallIntType | true |
IntegerType | IntType | true |
BigintType | BigIntType | true |
RealType | FloatType | true |
DoubleType | DoubleType | true |
CharType(length) | CharType(length) | true |
VarCharType(VarCharType.MAX_LENGTH) | VarCharType(VarCharType.MAX_LENGTH) | true |
VarCharType(length) | VarCharType(length), length is less than VarCharType.MAX_LENGTH | true |
DateType | DateType | true |
TimestampType | TimestampType | true |
DecimalType(precision, scale) | DecimalType(precision, scale) | true |
VarBinaryType(length) | VarBinaryType(length) | true |
TimestampWithTimeZoneType | LocalZonedTimestampType | true |
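For illustration, a table declared in Trino with a few of the types above is stored with the corresponding Paimon types row by row; the table and column names here are hypothetical:
CREATE TABLE paimon.test_db.type_demo (
    id bigint,                              -- BigintType -> BigIntType
    name varchar,                           -- VarCharType -> VarCharType
    price decimal(18,4),                    -- DecimalType(18,4) -> DecimalType(18,4)
    created_at timestamp(6) with time zone  -- TimestampWithTimeZoneType -> LocalZonedTimestampType
);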