DataFile

Partition

Consider a partitioned table created via Flink SQL:

  CREATE TABLE part_t (
      f0 INT,
      f1 STRING,
      dt STRING
  ) PARTITIONED BY (dt);

  INSERT INTO part_t VALUES (1, '11', '20240514');

The resulting file system layout is:

  part_t
  ├── dt=20240514
  │   └── bucket-0
  │       └── data-ca1c3c38-dc8d-4533-949b-82e195b41bd4-0.orc
  ├── manifest
  │   ├── manifest-08995fe5-c2ac-4f54-9a5f-d3af1fcde41d-0
  │   ├── manifest-list-51c16f7b-421c-4bc0-80a0-17677f343358-0
  │   └── manifest-list-51c16f7b-421c-4bc0-80a0-17677f343358-1
  ├── schema
  │   └── schema-0
  └── snapshot
      ├── EARLIEST
      ├── LATEST
      └── snapshot-1

Paimon adopts the same partitioning concept as Apache Hive to separate data. The files of each partition are placed in that partition's own directory, named after the partition column and value (here, dt=20240514).
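The directory naming above can be sketched in a few lines of Python. This is only an illustration of the Hive-style `column=value` convention visible in the layout; `partition_dir` is a hypothetical helper, not part of any Paimon API.

```python
from pathlib import PurePosixPath


def partition_dir(table_root, partition, bucket):
    """Build the bucket directory for a Hive-style partition.

    `partition` maps partition column names to values, in the order the
    columns appear in PARTITIONED BY. (Hypothetical helper for illustration.)
    """
    path = PurePosixPath(table_root)
    for column, value in partition.items():
        # Each partition column contributes one "column=value" directory level.
        path = path / f"{column}={value}"
    # Data files live in a bucket directory below the partition directory.
    return path / f"bucket-{bucket}"


print(partition_dir("part_t", {"dt": "20240514"}, 0))
# prints: part_t/dt=20240514/bucket-0
```

With more than one partition column, each column would add a further directory level under the table root.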

Bucket

The storage of all Paimon tables relies on buckets, and data files are stored in bucket directories. The relationship between table types and buckets in Paimon is as follows:

  1. Primary Key Table:
    1. bucket = -1: The default, dynamic bucket mode. Index files record which bucket each key belongs to, that is, the correspondence between the hash value of the primary key and the bucket.
    2. bucket = 10: The data is distributed to the corresponding buckets according to the hash value of the bucket key (by default, the primary key).
  2. Append Table:
    1. bucket = -1: The default mode, which ignores the bucket concept. Although all data is written to bucket-0, the parallelism of reads and writes is unrestricted.
    2. bucket = 10: You must also define bucket-key. The data is distributed to the corresponding buckets according to the hash value of the bucket key.
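As a rough sketch of the fixed bucket case (bucket = 10), the mapping from bucket key to bucket can be thought of as a hash taken modulo the bucket count. The function below is an assumption made for illustration: `assign_bucket` is a hypothetical name, and `zlib.crc32` is a stand-in hash, so the bucket numbers it produces will not match the ones Paimon computes.

```python
import zlib


def assign_bucket(bucket_key, num_buckets):
    """Map a bucket key to a bucket index in [0, num_buckets).

    zlib.crc32 is a stand-in chosen only because it is deterministic across
    runs; Paimon uses its own hash function, so real bucket numbers differ.
    """
    if num_buckets <= 0:
        raise ValueError("fixed bucket mode needs a positive bucket count")
    digest = zlib.crc32(repr(bucket_key).encode("utf-8"))
    return digest % num_buckets


# The same key always lands in the same bucket directory.
for key in [(1,), (2,), (3,)]:
    print(key, "-> bucket-%d" % assign_bucket(key, 10))
```

Because the assignment depends only on the key and the bucket count, no index is needed to locate a key. In dynamic bucket mode (bucket = -1 on a primary key table) that correspondence is instead recorded in index files.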

Data File

A data file is named data-${uuid}-${id}.${format}. For an append table, the file stores the table's data without adding any columns. For a primary key table, each row additionally stores the following system columns:

  1. _VALUE_KIND: whether the row is deleted or added. As in RocksDB, each row of data can be a deletion or an addition, which is how updates to the primary key table are applied.
  2. _SEQUENCE_NUMBER: used for comparison during updates, to determine which data came first and which came later.
  3. _KEY_ prefix: added to the key columns to avoid name conflicts with the table's own columns.
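To make the naming pattern and the role of the system columns concrete, here is a minimal Python sketch. It makes several assumptions: `parse_data_file_name` and `merge_rows` are hypothetical helpers, the table is assumed to have primary key `f0` and value column `f1`, `_VALUE_KIND` is represented by the illustrative strings "ADD" and "DELETE" rather than its on-disk encoding, and the merge shown is a simple keep-the-latest strategy.

```python
import re

# data-${uuid}-${id}.${format}, assuming a standard 8-4-4-4-12 hex UUID.
DATA_FILE = re.compile(
    r"^data-(?P<uuid>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})"
    r"-(?P<id>\d+)\.(?P<format>\w+)$"
)


def parse_data_file_name(name):
    """Split a data file name into (uuid, id, format)."""
    match = DATA_FILE.match(name)
    if match is None:
        raise ValueError(f"not a data file name: {name!r}")
    return match["uuid"], int(match["id"]), match["format"]


def merge_rows(rows):
    """Keep the row with the highest _SEQUENCE_NUMBER for each key, then
    drop keys whose latest row is a delete. Returns {key: f1 value}."""
    latest = {}
    for row in rows:
        key = row["_KEY_f0"]
        current = latest.get(key)
        if current is None or row["_SEQUENCE_NUMBER"] > current["_SEQUENCE_NUMBER"]:
            latest[key] = row
    return {
        key: row["f1"]
        for key, row in latest.items()
        if row["_VALUE_KIND"] != "DELETE"  # illustrative marker, not the stored encoding
    }


rows = [
    {"_KEY_f0": 1, "_SEQUENCE_NUMBER": 0, "_VALUE_KIND": "ADD", "f1": "11"},
    {"_KEY_f0": 1, "_SEQUENCE_NUMBER": 1, "_VALUE_KIND": "ADD", "f1": "12"},
    {"_KEY_f0": 2, "_SEQUENCE_NUMBER": 2, "_VALUE_KIND": "ADD", "f1": "21"},
    {"_KEY_f0": 2, "_SEQUENCE_NUMBER": 3, "_VALUE_KIND": "DELETE", "f1": "21"},
]

print(parse_data_file_name("data-ca1c3c38-dc8d-4533-949b-82e195b41bd4-0.orc"))
# prints: ('ca1c3c38-dc8d-4533-949b-82e195b41bd4', 0, 'orc')
print(merge_rows(rows))
# prints: {1: '12'}
```

Key 1 resolves to its later value, and key 2 disappears because its most recent row is a delete. The `_KEY_` prefix keeps the key copy `_KEY_f0` from colliding with a table column named `f0`.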

Changelog File

A changelog file has exactly the same format as a data file, and it only takes effect on primary key tables. It is similar to the binlog in a database: it records changes to the data in the table.