Spec Overview

This is the specification for the Paimon table format. It standardizes the underlying file structure and design of Paimon.

Overview - Figure 1

Terms

  • Schema: fields, primary key definition, partition key definition, and options.
  • Snapshot: the entry point to all data committed at a specific point in time.
  • Manifest list: includes several manifest files.
  • Manifest: includes several data files or changelog files.
  • Data File: contains incremental records.
  • Changelog File: contains records produced by changelog-producer.
  • Global Index: index for a bucket or partition.
  • Data File Index: index for a data file.
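
The terms above describe a containment hierarchy, which can be sketched as plain data structures. This is only an illustration under stated assumptions: the class and field names (`Snapshot`, `manifest_lists`, `files`, and so on) are hypothetical, and the link from a snapshot to its manifest lists is inferred from "entry point to all data" rather than taken from the actual on-disk format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Manifest:
    # Names of the data files or changelog files this manifest includes.
    files: List[str] = field(default_factory=list)


@dataclass
class ManifestList:
    # A manifest list includes several manifests.
    manifests: List[Manifest] = field(default_factory=list)


@dataclass
class Snapshot:
    # Assumption: a snapshot reaches its data through one or more manifest lists.
    manifest_lists: List[ManifestList] = field(default_factory=list)


def reachable_files(snapshot: Snapshot) -> List[str]:
    """Walk snapshot -> manifest lists -> manifests -> files, in order."""
    return [
        name
        for manifest_list in snapshot.manifest_lists
        for manifest in manifest_list.manifests
        for name in manifest.files
    ]


# Hypothetical file names, used only to exercise the traversal.
example = Snapshot(
    manifest_lists=[
        ManifestList(manifests=[Manifest(files=["data-a.orc"])]),
        ManifestList(manifests=[Manifest(files=["changelog-a.orc"])]),
    ]
)
print(reachable_files(example))
```

The point of the sketch is that a reader starts from a single snapshot and never has to list directories to find the files that belong to it.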

Run Flink SQL with Paimon:

  CREATE CATALOG my_catalog WITH (
      'type' = 'paimon',
      'warehouse' = '/your/path'
  );
  USE CATALOG my_catalog;
  CREATE TABLE my_table (
      k INT PRIMARY KEY NOT ENFORCED,
      f0 INT,
      f1 STRING
  );
  INSERT INTO my_table VALUES (1, 11, '111');

Take a look at the disk:

  warehouse
  └── default.db
      └── my_table
          ├── bucket-0
          │   └── data-59f60cb9-44af-48cc-b5ad-59e85c663c8f-0.orc
          ├── index
          │   └── index-5625e6d9-dd44-403b-a738-2b6ea92e20f1-0
          ├── manifest
          │   ├── index-manifest-5d670043-da25-4265-9a26-e31affc98039-0
          │   ├── manifest-6758823b-2010-4d06-aef0-3b1b597723d6-0
          │   ├── manifest-list-9f856d52-5b33-4c10-8933-a0eddfaa25bf-0
          │   └── manifest-list-9f856d52-5b33-4c10-8933-a0eddfaa25bf-1
          ├── schema
          │   └── schema-0
          └── snapshot
              ├── EARLIEST
              ├── LATEST
              └── snapshot-1
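
To connect the listing above back to the terms, the following sketch labels each path by its directory and file-name prefix. The prefix rules are read off this one listing and are an assumption, not a statement of the full naming scheme. The function name `classify` and the label "snapshot hint" for `EARLIEST` and `LATEST` are also assumptions, since this section does not explain those two files.

```python
from pathlib import PurePosixPath

# Paths copied from the listing above, relative to warehouse/default.db/my_table.
LISTING = [
    "bucket-0/data-59f60cb9-44af-48cc-b5ad-59e85c663c8f-0.orc",
    "index/index-5625e6d9-dd44-403b-a738-2b6ea92e20f1-0",
    "manifest/index-manifest-5d670043-da25-4265-9a26-e31affc98039-0",
    "manifest/manifest-6758823b-2010-4d06-aef0-3b1b597723d6-0",
    "manifest/manifest-list-9f856d52-5b33-4c10-8933-a0eddfaa25bf-0",
    "manifest/manifest-list-9f856d52-5b33-4c10-8933-a0eddfaa25bf-1",
    "schema/schema-0",
    "snapshot/EARLIEST",
    "snapshot/LATEST",
    "snapshot/snapshot-1",
]


def classify(relative_path: str) -> str:
    """Return a label for a path relative to the table directory."""
    path = PurePosixPath(relative_path)
    directory, name = path.parts[0], path.name
    if directory.startswith("bucket-") and name.startswith("data-"):
        return "data file"
    if directory == "index" and name.startswith("index-"):
        return "index file"
    if directory == "manifest":
        # Test the longer prefixes first: "manifest-list-" also starts with "manifest-".
        if name.startswith("index-manifest-"):
            return "index manifest"
        if name.startswith("manifest-list-"):
            return "manifest list"
        if name.startswith("manifest-"):
            return "manifest"
    if directory == "schema" and name.startswith("schema-"):
        return "schema"
    if directory == "snapshot":
        if name in ("EARLIEST", "LATEST"):
            return "snapshot hint"  # assumed label, see the note above
        if name.startswith("snapshot-"):
            return "snapshot"
    return "unknown"


for entry in LISTING:
    print(f"{classify(entry):15} {entry}")
```

The order of the prefix checks inside the `manifest` directory matters, because a manifest list would otherwise be labeled as a plain manifest.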