S3

Download paimon-s3-0.9.0.jar.

Flink

If you have already configured S3 access through Flink (via Flink FileSystem), you can skip the following configuration.

Put paimon-s3-0.9.0.jar into the lib directory of your Flink home, and create a catalog:

    CREATE CATALOG my_catalog WITH (
        'type' = 'paimon',
        'warehouse' = 's3://<bucket>/<path>',
        's3.endpoint' = 'your-endpoint-hostname',
        's3.access-key' = 'xxx',
        's3.secret-key' = 'yyy'
    );
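
As a quick sanity check, you can switch to the new catalog and create a table that will be stored under the S3 warehouse path. This is only a sketch; the table name and schema below are purely illustrative:

    USE CATALOG my_catalog;

    CREATE TABLE my_table (
        k INT,
        v STRING,
        PRIMARY KEY (k) NOT ENFORCED
    );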

Spark

If you have already configured S3 access through Spark (via Hadoop FileSystem), you can skip the following configuration.

Place paimon-s3-0.9.0.jar together with paimon-spark-0.9.0.jar under Spark’s jars directory, and start spark-sql like this:

    spark-sql \
      --conf spark.sql.catalog.paimon=org.apache.paimon.spark.SparkCatalog \
      --conf spark.sql.catalog.paimon.warehouse=s3://<bucket>/<path> \
      --conf spark.sql.catalog.paimon.s3.endpoint=your-endpoint-hostname \
      --conf spark.sql.catalog.paimon.s3.access-key=xxx \
      --conf spark.sql.catalog.paimon.s3.secret-key=yyy
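
Once the shell is up, the tables live under the configured catalog name. A minimal sketch, assuming a hypothetical table my_table in the default database:

    -- my_table is a hypothetical table used only for illustration
    CREATE TABLE paimon.default.my_table (k INT, v STRING);
    INSERT INTO paimon.default.my_table VALUES (1, 'a');
    SELECT * FROM paimon.default.my_table;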

Hive

If you have already configured S3 access through Hive (via Hadoop FileSystem), you can skip the following configuration.

NOTE: You need to ensure that the Hive metastore can access S3.

Place paimon-s3-0.9.0.jar together with paimon-hive-connector-0.9.0.jar under Hive’s auxlib directory, and set the S3 options in your Hive session:

    SET paimon.s3.endpoint=your-endpoint-hostname;
    SET paimon.s3.access-key=xxx;
    SET paimon.s3.secret-key=yyy;

Then you can read tables from the Hive metastore; the tables can be created by Flink or Spark, see Catalog with Hive Metastore.

    SELECT * FROM test_table;
    SELECT COUNT(1) FROM test_table;

Trino

Paimon uses the shared Trino filesystem as the basic read and write system.

Please refer to Trino S3 to configure the S3 filesystem in Trino.

S3 Compliant Object Stores

The S3 Filesystem also supports S3-compliant object stores such as MinIO, Tencent’s COS and IBM’s Cloud Object Storage. Just configure your endpoint to point at the provider of the object store service.

    s3.endpoint: your-endpoint-hostname

Configure Path Style Access

Some S3-compliant object stores might not have virtual host style addressing enabled by default, for example when using a standalone MinIO instance for testing purposes. In such cases, you will have to provide the following property to enable path style access.

    s3.path.style.access: true
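
For example, a Flink catalog pointing at a standalone MinIO instance could combine the endpoint and path-style options as in the sketch below; the endpoint, bucket and credentials are placeholders:

    CREATE CATALOG minio_catalog WITH (
        'type' = 'paimon',
        'warehouse' = 's3://<bucket>/<path>',
        's3.endpoint' = 'http://localhost:9000',
        's3.path.style.access' = 'true',
        's3.access-key' = 'xxx',
        's3.secret-key' = 'yyy'
    );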

S3A Performance

Tune Performance for S3AFileSystem.

If you encounter the following exception:

    Caused by: org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for connection from pool.

Try configuring this in the catalog options: fs.s3a.connection.maximum=1000.
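
For example, with the Flink catalog DDL shown earlier, the option can be added alongside the other S3 settings. This is only a sketch; 1000 is a starting point to tune for your workload:

    CREATE CATALOG my_catalog WITH (
        'type' = 'paimon',
        'warehouse' = 's3://<bucket>/<path>',
        's3.endpoint' = 'your-endpoint-hostname',
        's3.access-key' = 'xxx',
        's3.secret-key' = 'yyy',
        'fs.s3a.connection.maximum' = '1000'
    );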