Presto

This documentation is a guide for using Paimon in Presto.

Version

Paimon currently supports Presto 0.236 and above.

Preparing Paimon Jar File

Download from master: https://paimon.apache.org/docs/master/project/download/

You can also build a bundled jar manually from the source code.

To do so, clone the git repository, then build the Presto connector plugin with the following command:

  mvn clean install -DskipTests

After packaging completes, choose the connector package that matches your Presto version:

Version            Package
[0.236, 0.268)     ./paimon-presto-0.236/target/paimon-presto-0.236-0.9.0-plugin.tar.gz
[0.268, 0.273)     ./paimon-presto-0.268/target/paimon-presto-0.268-0.9.0-plugin.tar.gz
[0.273, latest]    ./paimon-presto-0.273/target/paimon-presto-0.273-0.9.0-plugin.tar.gz
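
The version ranges in the table can be expressed as a small shell helper. This is only a sketch: the function name paimon_presto_module is hypothetical (it is not shipped with Paimon), and it assumes Presto versions of the form 0.NNN, optionally followed by a patch suffix.

```shell
# Hypothetical helper (not part of Paimon): map a Presto version to the
# connector module listed in the version table.
paimon_presto_module() {
  # Take the second dot-separated component, e.g. 0.274 -> 274.
  minor=$(printf '%s' "$1" | cut -d. -f2)
  case "$minor" in
    ''|*[!0-9]*)
      echo "cannot parse Presto version: $1" >&2
      return 1
      ;;
  esac
  if [ "$minor" -ge 273 ]; then
    echo "paimon-presto-0.273"
  elif [ "$minor" -ge 268 ]; then
    echo "paimon-presto-0.268"
  elif [ "$minor" -ge 236 ]; then
    echo "paimon-presto-0.236"
  else
    echo "unsupported Presto version: $1" >&2
    return 1
  fi
}
```

For example, paimon_presto_module 0.274 selects the paimon-presto-0.273 module, which matches the last row of the table.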

Different versions of Hive and Hadoop are also supported. Note that Paimon uses the Presto-shaded Hive and Hadoop packages to avoid dependency conflicts. Check the following two packages to select the appropriate Hive and Hadoop versions:

hadoop-apache2

hive-apache

Both Hive 2 and 3, as well as Hadoop 2 and 3, are supported.

For example, if your Presto version is 0.274 and your Hive and Hadoop versions are 2.x, you could run:

  mvn clean install -DskipTests -am -pl paimon-presto-0.273 -Dpresto.version=0.274 -Dhadoop.apache2.version=2.7.4-9 -Dhive.apache.version=1.2.2-2

Tmp Dir

Paimon unzips some jars to the temporary directory for code generation. By default, Presto uses '/tmp' as the temporary directory, but the contents of '/tmp' may be deleted periodically.

You can set this JVM system property when Presto starts:

  -Djava.io.tmpdir=/path/to/other/tmpdir

This lets Paimon use a temporary directory that is not cleaned up unexpectedly.
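
As a sketch of how this option could be applied, the helper below appends it to a JVM options file. It assumes your deployment passes JVM options to Presto through a file such as etc/jvm.config. The function name set_presto_tmpdir and its arguments are hypothetical.

```shell
# Hypothetical helper: make sure a JVM options file sets java.io.tmpdir.
# $1 = path to the JVM options file, $2 = temporary directory to use.
set_presto_tmpdir() {
  jvm_config="$1"
  tmp_dir="$2"
  # Create the directory up front so the JVM can write to it.
  mkdir -p "$tmp_dir" || return 1
  # Append the option only if no java.io.tmpdir line is present yet.
  if ! grep -q '^-Djava\.io\.tmpdir=' "$jvm_config" 2>/dev/null; then
    printf '%s\n' "-Djava.io.tmpdir=$tmp_dir" >> "$jvm_config"
  fi
}
```

Calling it twice with the same arguments leaves a single java.io.tmpdir line in the file. Presto must be restarted for the change to take effect.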

Configure Paimon Catalog

Install Paimon Connector

  tar -zxf paimon-presto-${PRESTO_VERSION}/target/paimon-presto-${PRESTO_VERSION}-${PAIMON_VERSION}-plugin.tar.gz -C ${PRESTO_HOME}/plugin

Note that the variable PRESTO_VERSION is the module name and must be one of 0.236, 0.268, or 0.273.

Configuration

  cd ${PRESTO_HOME}
  mkdir -p etc/catalog

Then create etc/catalog/paimon.properties with the following content:

  connector.name=paimon
  # set your filesystem path, such as hdfs://namenode01:8020/path and s3://${YOUR_S3_BUCKET}/path
  warehouse=${YOUR_FS_PATH}
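
The two steps can also be scripted. The following is a minimal sketch, assuming a filesystem catalog with only the two properties shown here. The function name write_paimon_catalog is hypothetical.

```shell
# Hypothetical helper: write a minimal filesystem catalog file.
# $1 = Presto home directory, $2 = warehouse path.
write_paimon_catalog() {
  presto_home="$1"
  warehouse="$2"
  mkdir -p "$presto_home/etc/catalog" || return 1
  # Overwrites any existing paimon.properties in that directory.
  cat > "$presto_home/etc/catalog/paimon.properties" <<EOF
connector.name=paimon
warehouse=$warehouse
EOF
}
```

A call such as write_paimon_catalog "${PRESTO_HOME}" "hdfs://namenode01:8020/path" would produce the file described in this section.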

If you are using the HDFS file system, you also need to configure HDFS in one of the following ways:

  • Set the environment variable HADOOP_HOME.
  • Set the environment variable HADOOP_CONF_DIR.
  • Configure hadoop-conf-dir in the properties.
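
Before starting Presto, it can help to confirm that at least one of these options is in place. The check below is a sketch only: the function name hdfs_config_source is hypothetical, and the order in which it looks at the three options is arbitrary and does not reflect how Paimon itself resolves them.

```shell
# Hypothetical check: report which HDFS configuration source is available.
# $1 = path to paimon.properties.
hdfs_config_source() {
  if [ -n "${HADOOP_HOME:-}" ]; then
    echo "HADOOP_HOME"
  elif [ -n "${HADOOP_CONF_DIR:-}" ]; then
    echo "HADOOP_CONF_DIR"
  elif grep -q '^hadoop-conf-dir=' "$1" 2>/dev/null; then
    echo "hadoop-conf-dir"
  else
    echo "no HDFS configuration found" >&2
    return 1
  fi
}
```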

If you are using the S3 file system, add paimon-s3-${PAIMON_VERSION}.jar to ${PRESTO_HOME}/plugin/paimon and configure the following additional properties in paimon.properties:

  s3.endpoint=${YOUR_ENDPOINTS}
  s3.access-key=${YOUR_AK}
  s3.secret-key=${YOUR_SK}

To query tables in a Hive catalog, edit the catalog file:

  vim etc/catalog/paimon.properties

and set the following configuration:

  connector.name=paimon
  # set your filesystem path, such as hdfs://namenode01:8020/path and s3://${YOUR_S3_BUCKET}/path
  warehouse=${YOUR_FS_PATH}
  metastore=hive
  uri=thrift://${YOUR_HIVE_METASTORE}:9083
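
A quick sanity check on the file can catch a missing metastore address before Presto is restarted. This is a sketch under the assumption, taken from the example in this section, that a Hive catalog always pairs metastore=hive with a uri starting with thrift://. The function name check_hive_catalog is hypothetical.

```shell
# Hypothetical check: when metastore=hive is set, require a thrift:// uri.
# $1 = path to paimon.properties.
check_hive_catalog() {
  props="$1"
  if ! grep -q '^metastore=hive$' "$props"; then
    return 0   # not a Hive catalog, nothing to check
  fi
  if ! grep -q '^uri=thrift://..*' "$props"; then
    echo "metastore=hive requires uri=thrift://host:port in $props" >&2
    return 1
  fi
}
```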

Kerberos

When using Kerberos authentication, you can configure the principal and keytab file in the properties:

  security.kerberos.login.principal=hadoop-user
  security.kerberos.login.keytab=/etc/presto/hdfs.keytab

Keytab files must be distributed to every node in the cluster that runs Presto.
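
One way to plan the distribution is to generate the copy commands first and review them. The helper below is a sketch: the function name print_keytab_copy_commands and the host names in the usage line are hypothetical, and it assumes the nodes are reachable with scp and use the same keytab path everywhere.

```shell
# Hypothetical helper: print the scp commands needed to copy a keytab to
# every Presto node. It only prints; review the output before running it.
# $1 = keytab path, remaining arguments = host names.
print_keytab_copy_commands() {
  keytab="$1"
  shift
  for host in "$@"; do
    # -p preserves the file's modes and timestamps on the target node.
    printf 'scp -p %s %s:%s\n' "$keytab" "$host" "$keytab"
  done
}
```

For instance, print_keytab_copy_commands /etc/presto/hdfs.keytab worker01 worker02 prints one scp command per host.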

Create Schema

  CREATE SCHEMA paimon.test_db;

Create Table

  CREATE TABLE paimon.test_db.orders (
      order_key bigint,
      order_status varchar,
      total_price decimal(18,4),
      order_date date
  )
  WITH (
      file_format = 'ORC',
      primary_key = ARRAY['order_key','order_date'],
      partitioned_by = ARRAY['order_date'],
      bucket = '2',
      bucket_key = 'order_key',
      changelog_producer = 'input'
  );

Add Column

  CREATE TABLE paimon.test_db.orders (
      order_key bigint,
      order_status varchar,
      total_price decimal(18,4),
      order_date date
  )
  WITH (
      file_format = 'ORC',
      primary_key = ARRAY['order_key','order_date'],
      partitioned_by = ARRAY['order_date'],
      bucket = '2',
      bucket_key = 'order_key',
      changelog_producer = 'input'
  );

  ALTER TABLE paimon.test_db.orders ADD COLUMN shipping_address varchar;

Query

  SELECT * FROM paimon.default.MyTable;
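
Statements like this can also be run non-interactively. The wrapper below is a sketch that assumes the Presto CLI is installed and reachable as presto on the PATH. The function name run_paimon_query and the variables PRESTO_CLI and PRESTO_SERVER are hypothetical conveniences, not settings defined by Paimon or Presto, and localhost:8080 is only a placeholder coordinator address.

```shell
# Hypothetical wrapper around the Presto CLI. PRESTO_CLI and PRESTO_SERVER
# are illustrative variables, not settings defined by Paimon or Presto.
# $1 = SQL statement to execute against the paimon catalog.
run_paimon_query() {
  "${PRESTO_CLI:-presto}" \
    --server "${PRESTO_SERVER:-localhost:8080}" \
    --catalog paimon \
    --execute "$1"
}
```

Usage: run_paimon_query "SELECT * FROM paimon.default.MyTable"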

Presto to Paimon type mapping

This section lists all supported type conversions between Presto and Paimon. All of Presto's data types are available in the package com.facebook.presto.common.type.

Presto Data Type                      Paimon Data Type                                                  Atomic Type
RowType                               RowType                                                           false
MapType                               MapType                                                           false
ArrayType                             ArrayType                                                         false
BooleanType                           BooleanType                                                       true
TinyintType                           TinyIntType                                                       true
SmallintType                          SmallIntType                                                      true
IntegerType                           IntType                                                           true
BigintType                            BigIntType                                                        true
RealType                              FloatType                                                         true
DoubleType                            DoubleType                                                        true
CharType(length)                      CharType(length)                                                  true
VarCharType(VarCharType.MAX_LENGTH)   VarCharType(VarCharType.MAX_LENGTH)                               true
VarCharType(length)                   VarCharType(length), length is less than VarCharType.MAX_LENGTH   true
DateType                              DateType                                                          true
TimestampType                         TimestampType                                                     true
DecimalType(precision, scale)         DecimalType(precision, scale)                                     true
VarBinaryType(length)                 VarBinaryType(length)                                             true
TimestampWithTimeZoneType             LocalZonedTimestampType                                           true
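
For scripting purposes, the parameterless rows of the mapping can be captured in a lookup function. This is a sketch only: the function name presto_to_paimon_type is hypothetical, and parameterized types such as CharType(length) or DecimalType(precision, scale) are deliberately left out.

```shell
# Hypothetical lookup: Paimon type name for a Presto type name, covering
# only the parameterless types from the mapping table.
presto_to_paimon_type() {
  case "$1" in
    RowType|MapType|ArrayType|BooleanType|DoubleType|DateType|TimestampType)
      # These names are identical on both sides.
      echo "$1" ;;
    TinyintType)  echo "TinyIntType" ;;
    SmallintType) echo "SmallIntType" ;;
    IntegerType)  echo "IntType" ;;
    BigintType)   echo "BigIntType" ;;
    RealType)     echo "FloatType" ;;
    TimestampWithTimeZoneType) echo "LocalZonedTimestampType" ;;
    *)
      echo "no mapping listed for: $1" >&2
      return 1
      ;;
  esac
}
```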