Presto
This documentation is a guide for using Paimon in Presto.
Version
Paimon currently supports Presto 0.236 and above.
Preparing Paimon Jar File
Download from master: https://paimon.apache.org/docs/master/project/download/
You can also manually build a bundled jar from the source code.
To build from the source code, clone the git repository.
Build the Presto connector plugin with the following command.
mvn clean install -DskipTests
After the packaging is complete, you can choose the corresponding connector based on your own Presto version:
Version | Package |
---|---|
[0.236, 0.268) | ./paimon-presto-0.236/target/paimon-presto-0.236-0.9.0-plugin.tar.gz |
[0.268, 0.273) | ./paimon-presto-0.268/target/paimon-presto-0.268-0.9.0-plugin.tar.gz |
[0.273, latest] | ./paimon-presto-0.273/target/paimon-presto-0.273-0.9.0-plugin.tar.gz |
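The version ranges in the table can be sketched as a small shell helper. This is only an illustration: paimon_presto_module is a hypothetical function name, not something shipped with Paimon, and it assumes Presto versions of the form 0.NNN (optionally followed by a patch suffix).

```shell
# Map a Presto version (e.g. 0.274) to the Paimon connector module name.
# paimon_presto_module is a hypothetical helper, not part of Paimon.
paimon_presto_module() {
  version="$1"
  # Keep only the number after "0." and drop any patch suffix (0.274.1 -> 274).
  minor="${version#0.}"
  minor="${minor%%.*}"
  case "$minor" in
    ''|*[!0-9]*) echo "unrecognized Presto version: $version" >&2; return 1 ;;
  esac
  if [ "$minor" -lt 236 ]; then
    echo "Presto $version is older than 0.236" >&2
    return 1
  elif [ "$minor" -lt 268 ]; then
    echo "paimon-presto-0.236"
  elif [ "$minor" -lt 273 ]; then
    echo "paimon-presto-0.268"
  else
    echo "paimon-presto-0.273"
  fi
}
```

For example, paimon_presto_module 0.274 selects the paimon-presto-0.273 module, matching the last row of the table.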
Different versions of Hive and Hadoop are also supported. Note that Paimon uses Presto-shaded versions of the Hive and Hadoop packages to avoid dependency conflicts. You can check the following two links to select the appropriate versions of Hive and Hadoop:
Both Hive 2 and 3, as well as Hadoop 2 and 3, are supported.
For example, if your Presto version is 0.274 and your Hive and Hadoop versions are 2.x, you could run:
mvn clean install -DskipTests -am -pl paimon-presto-0.273 -Dpresto.version=0.274 -Dhadoop.apache2.version=2.7.4-9 -Dhive.apache.version=1.2.2-2
Tmp Dir
Paimon unzips some jars to the temporary directory for code generation. By default, Presto uses '/tmp' as the temporary directory, but files in '/tmp' may be periodically deleted.
You can set this JVM system property when Presto starts:
-Djava.io.tmpdir=/path/to/other/tmpdir
This lets Paimon use a temporary directory that is not cleaned up periodically.
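One way to apply this setting is sketched below. It assumes that Presto reads its JVM options from etc/jvm.config under the installation directory, one option per line; set_presto_tmpdir is a hypothetical helper name and the paths are placeholders.

```shell
# Create a dedicated temp directory and register it as java.io.tmpdir.
# Assumption: JVM options live in <presto_home>/etc/jvm.config, one per line.
# set_presto_tmpdir is a hypothetical helper, not part of Presto or Paimon.
set_presto_tmpdir() {
  presto_home="$1"
  tmpdir="$2"
  jvm_config="$presto_home/etc/jvm.config"
  mkdir -p "$tmpdir" "$presto_home/etc"
  touch "$jvm_config"
  # Leave the file alone if a java.io.tmpdir option is already present.
  if grep -q '^-Djava\.io\.tmpdir=' "$jvm_config"; then
    return 0
  fi
  # Make sure the existing content ends with a newline before appending.
  if [ -n "$(tail -c 1 "$jvm_config")" ]; then
    echo >> "$jvm_config"
  fi
  printf '%s\n' "-Djava.io.tmpdir=$tmpdir" >> "$jvm_config"
}
```

A call such as set_presto_tmpdir "${PRESTO_HOME}" /path/to/other/tmpdir would be run once per node; Presto has to be restarted for the option to take effect.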
Configure Paimon Catalog
Install Paimon Connector
tar -zxf paimon-presto-${PRESTO_VERSION}/target/paimon-presto-${PRESTO_VERSION}-${PAIMON_VERSION}-plugin.tar.gz -C ${PRESTO_HOME}/plugin
Note that the variable PRESTO_VERSION is the module name and must be one of 0.236, 0.268, or 0.273.
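The install step, together with the constraint on PRESTO_VERSION, could be wrapped as follows. This is a sketch under the assumption that it is run from the root of the Paimon source tree after a successful build; install_paimon_plugin is a hypothetical helper name.

```shell
# Validate the module version, then unpack the plugin into the Presto plugin directory.
# install_paimon_plugin is a hypothetical helper; run it from the Paimon source root.
install_paimon_plugin() {
  module_version="$1"   # module name: 0.236, 0.268 or 0.273
  paimon_version="$2"   # Paimon release, e.g. 0.9.0
  presto_home="$3"
  case "$module_version" in
    0.236|0.268|0.273) ;;
    *) echo "PRESTO_VERSION must be one of 0.236, 0.268, 0.273" >&2; return 1 ;;
  esac
  tarball="paimon-presto-${module_version}/target/paimon-presto-${module_version}-${paimon_version}-plugin.tar.gz"
  if [ ! -f "$tarball" ]; then
    echo "missing $tarball; build the connector first" >&2
    return 1
  fi
  mkdir -p "$presto_home/plugin"
  tar -zxf "$tarball" -C "$presto_home/plugin"
}
```

Rejecting unknown module names up front avoids a confusing tar error when the actual Presto version (for example 0.274) is passed instead of the module name (0.273).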
Configuration
cd ${PRESTO_HOME}
mkdir -p etc/catalog
connector.name=paimon
# set your filesystem path, such as hdfs://namenode01:8020/path and s3://${YOUR_S3_BUCKET}/path
warehouse=${YOUR_FS_PATH}
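The two steps above (creating etc/catalog and writing the properties) can be combined in a short script. This is a minimal sketch: write_paimon_catalog is a hypothetical helper name, and the catalog file name paimon.properties is the one used elsewhere in this guide.

```shell
# Write a minimal filesystem catalog file for the Paimon connector.
# write_paimon_catalog is a hypothetical helper; the property names come from this guide.
write_paimon_catalog() {
  presto_home="$1"
  warehouse="$2"   # e.g. hdfs://namenode01:8020/path
  mkdir -p "$presto_home/etc/catalog"
  cat > "$presto_home/etc/catalog/paimon.properties" <<EOF
connector.name=paimon
warehouse=$warehouse
EOF
}
```

Note that the helper overwrites an existing paimon.properties, so additional properties (S3, Hive metastore, Kerberos) would need to be added afterwards.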
If you are using the HDFS FileSystem, you also need to configure HDFS in one of the following ways:
- set environment variable HADOOP_HOME.
- set environment variable HADOOP_CONF_DIR.
- configure hadoop-conf-dir in the properties.
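Before starting Presto, it can be useful to check which of the three options is actually set on a node. The sketch below does that; hdfs_config_source is a hypothetical helper, and the order of the checks is illustrative only and does not claim to reflect the precedence Paimon applies.

```shell
# Report which HDFS configuration option is set for a given catalog file.
# hdfs_config_source is a hypothetical helper; the check order is illustrative,
# not Paimon's actual precedence.
hdfs_config_source() {
  props="$1"
  if [ -f "$props" ] && grep -q '^hadoop-conf-dir=' "$props"; then
    echo "hadoop-conf-dir property"
  elif [ -n "${HADOOP_CONF_DIR:-}" ]; then
    echo "HADOOP_CONF_DIR"
  elif [ -n "${HADOOP_HOME:-}" ]; then
    echo "HADOOP_HOME"
  else
    echo "none"
    return 1
  fi
}
```

A result of none means that none of the three options is configured and HDFS access is likely to fail.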
If you are using S3 FileSystem, you need to add paimon-s3-${PAIMON_VERSION}.jar in ${PRESTO_HOME}/plugin/paimon and additionally configure the following properties in paimon.properties:
s3.endpoint=${YOUR_ENDPOINTS}
s3.access-key=${YOUR_AK}
s3.secret-key=${YOUR_SK}
Query HiveCatalog table:
vim etc/catalog/paimon.properties
and set the following config:
connector.name=paimon
# set your filesystem path, such as hdfs://namenode01:8020/path and s3://${YOUR_S3_BUCKET}/path
warehouse=${YOUR_FS_PATH}
metastore=hive
uri=thrift://${YOUR_HIVE_METASTORE}:9083
Kerberos
When using Kerberos authentication, you can configure the principal and keytab file in the properties:
security.kerberos.login.principal=hadoop-user
security.kerberos.login.keytab=/etc/presto/hdfs.keytab
Keytab files must be distributed to every node in the cluster that runs Presto.
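Since a missing keytab on a single node only shows up at query time, a small check that can be run on each node may help. The sketch below reads the keytab path from the catalog properties and verifies that the file is readable locally; check_keytab is a hypothetical helper name.

```shell
# Verify that the keytab named in the catalog properties is readable on this node.
# check_keytab is a hypothetical helper, not part of Presto or Paimon.
check_keytab() {
  props="$1"
  # Take the last occurrence of the property, in case it is set more than once.
  keytab="$(sed -n 's/^security\.kerberos\.login\.keytab=//p' "$props" | tail -n 1)"
  if [ -z "$keytab" ]; then
    echo "no security.kerberos.login.keytab in $props" >&2
    return 1
  fi
  if [ ! -r "$keytab" ]; then
    echo "keytab $keytab is missing or unreadable on $(uname -n)" >&2
    return 1
  fi
  echo "$keytab"
}
```

Running check_keytab etc/catalog/paimon.properties from ${PRESTO_HOME} on every node prints the keytab path on success and returns a non-zero status otherwise.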
Create Schema
CREATE SCHEMA paimon.test_db;
Create Table
CREATE TABLE paimon.test_db.orders (
order_key bigint,
order_status varchar,
total_price decimal(18,4),
order_date date
)
WITH (
file_format = 'ORC',
primary_key = ARRAY['order_key','order_date'],
partitioned_by = ARRAY['order_date'],
bucket = '2',
bucket_key = 'order_key',
changelog_producer = 'input'
);
Add Column
CREATE TABLE paimon.test_db.orders (
order_key bigint,
order_status varchar,
total_price decimal(18,4),
order_date date
)
WITH (
file_format = 'ORC',
primary_key = ARRAY['order_key','order_date'],
partitioned_by = ARRAY['order_date'],
bucket = '2',
bucket_key = 'order_key',
changelog_producer = 'input'
);
ALTER TABLE paimon.test_db.orders ADD COLUMN shipping_address varchar;
Query
SELECT * FROM paimon.default.MyTable
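The same query can also be submitted non-interactively. The sketch below assumes that the Presto CLI is installed as presto on the PATH and that a coordinator listens on localhost:8080; neither is stated in this guide, so both can be overridden through the PRESTO_CLI and PRESTO_SERVER variables, which are names introduced here for illustration. run_paimon_query is likewise a hypothetical helper.

```shell
# Run a single SQL statement through the Presto CLI.
# Assumptions: the CLI executable is "presto" and the coordinator is at localhost:8080;
# PRESTO_CLI, PRESTO_SERVER and run_paimon_query are hypothetical names for this sketch.
run_paimon_query() {
  sql="$1"
  "${PRESTO_CLI:-presto}" --server "${PRESTO_SERVER:-localhost:8080}" --execute "$sql"
}
```

For example, run_paimon_query 'SELECT * FROM paimon.default.MyTable' issues the query shown above; the fully qualified table name means no default catalog or schema has to be passed to the CLI.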
Presto to Paimon type mapping
This section lists all supported type conversions between Presto and Paimon. All of Presto's data types are available in the package com.facebook.presto.common.type.
Presto Data Type | Paimon Data Type | Atomic Type |
---|---|---|
RowType | RowType | false |
MapType | MapType | false |
ArrayType | ArrayType | false |
BooleanType | BooleanType | true |
TinyintType | TinyIntType | true |
SmallintType | SmallIntType | true |
IntegerType | IntType | true |
BigintType | BigIntType | true |
RealType | FloatType | true |
DoubleType | DoubleType | true |
CharType(length) | CharType(length) | true |
VarCharType(VarCharType.MAX_LENGTH) | VarCharType(VarCharType.MAX_LENGTH) | true |
VarCharType(length) | VarCharType(length), length is less than VarCharType.MAX_LENGTH | true |
DateType | DateType | true |
TimestampType | TimestampType | true |
DecimalType(precision, scale) | DecimalType(precision, scale) | true |
VarBinaryType(length) | VarBinaryType(length) | true |
TimestampWithTimeZoneType | LocalZonedTimestampType | true |