Manifest

Manifest List

  ├── manifest
  └── manifest-list-51c16f7b-421c-4bc0-80a0-17677f343358-1

A manifest list holds the metadata of several manifest files. Its file name contains a UUID, and it is an Avro file with the following schema:

  1. fileName: manifest file name.
  2. fileSize: manifest file size.
  3. numAddedFiles: number of files added in the manifest.
  4. numDeletedFiles: number of files deleted in the manifest.
  5. partitionStats: partition statistics, a SimpleStats. It records the minimum and maximum values of the partition fields in this manifest, which lets queries skip manifest files that cannot match.
  6. schemaId: schema id used when writing this manifest file.
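
As a minimal sketch of how partitionStats can be used to skip manifests, the Python below models a manifest list entry and drops those whose partition range cannot contain a queried value. The class and function names are hypothetical, and the statistics are simplified to integer min/max tuples; this is not the real SimpleStats encoding.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ManifestFileMeta:
    # Field names mirror the manifest list schema. The partition stats are
    # simplified to per-field (min, max) tuples, which is an assumption made
    # for illustration only.
    file_name: str
    file_size: int
    num_added_files: int
    num_deleted_files: int
    partition_min: Tuple[int, ...]
    partition_max: Tuple[int, ...]
    schema_id: int


def may_contain(meta: ManifestFileMeta, field: int, value: int) -> bool:
    """Return False only when value lies outside [min, max] for the field."""
    return meta.partition_min[field] <= value <= meta.partition_max[field]


def prune(manifests: List[ManifestFileMeta], field: int, value: int) -> List[str]:
    """Keep the names of manifests that might hold the requested partition."""
    return [m.file_name for m in manifests if may_contain(m, field, value)]


# Hypothetical entries, partitioned by an integer date field.
manifests = [
    ManifestFileMeta("manifest-a", 1024, 3, 0, (20240101,), (20240131,), 0),
    ManifestFileMeta("manifest-b", 2048, 5, 1, (20240201,), (20240229,), 0),
]
print(prune(manifests, 0, 20240215))
```

A manifest that passes this check may still contain no matching files; the min/max range only rules manifests out, it never confirms a match.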

Manifest

A manifest holds the metadata of several data files, changelog files, or table-index files. Its file name contains a UUID, and it is an Avro file.

A manifest records file changes: each entry either adds or deletes a file. Manifest entries are ordered, and the same file may be added or deleted multiple times, so a reader should take the last entry for a file as its current state. This design keeps commits lightweight while supporting the file deletions generated by compaction.
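
The "last version wins" rule can be sketched as a replay of the ordered entries. This is only an illustration: the Entry class is hypothetical, the partition is a plain string rather than a BinaryRow, and identifying a file by (partition, bucket, file name) is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Entry:
    kind: str        # "ADD" or "DELETE"
    partition: str   # stand-in for the BinaryRow partition value
    bucket: int
    file_name: str


def merge_entries(entries: Iterable[Entry]) -> List[Entry]:
    """Replay entries in order and return the files that are still live."""
    live: Dict[Tuple[str, int, str], Entry] = {}
    for e in entries:
        key = (e.partition, e.bucket, e.file_name)  # assumed file identity
        if e.kind == "ADD":
            live[key] = e
        elif e.kind == "DELETE":
            # A later DELETE cancels an earlier ADD of the same file.
            live.pop(key, None)
        else:
            raise ValueError(f"unknown kind: {e.kind}")
    return list(live.values())


# A compaction rewrites data-1 into data-3: one DELETE plus one ADD.
entries = [
    Entry("ADD", "dt=2024-01-01", 0, "data-1"),
    Entry("ADD", "dt=2024-01-01", 0, "data-2"),
    Entry("DELETE", "dt=2024-01-01", 0, "data-1"),
    Entry("ADD", "dt=2024-01-01", 0, "data-3"),
]
print([e.file_name for e in merge_entries(entries)])
```

Because only the final state matters, a commit can append new ADD and DELETE entries without rewriting the manifests that came before it.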

Data Manifest

A data manifest holds the metadata of several data files or changelog files.

  ├── manifest
  └── manifest-6758823b-2010-4d06-aef0-3b1b597723d6-0

The schema is:

  1. kind: ADD or DELETE.
  2. partition: partition spec, a BinaryRow.
  3. bucket: bucket of this file.
  4. totalBuckets: total number of buckets when this file was written; it is used for verification after bucket changes.
  5. file: data file meta.

The data file meta is:

  1. fileName: file name.
  2. fileSize: file size.
  3. rowCount: total number of rows in this file, including both added and deleted rows.
  4. minKey: the minimum key of this file.
  5. maxKey: the maximum key of this file.
  6. keyStats: the statistics of the key.
  7. valueStats: the statistics of the value.
  8. minSequenceNumber: the minimum sequence number.
  9. maxSequenceNumber: the maximum sequence number.
  10. schemaId: schema id used when this file was written.
  11. level: level of this file in the LSM tree.
  12. extraFiles: extra files associated with this file, for example the data file index file.
  13. creationTime: creation time of this file.
  14. deleteRowCount: number of deleted rows, where rowCount = addRowCount + deleteRowCount.
  15. embeddedIndex: the data file index, stored directly in the manifest when it is small enough.
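
The relation rowCount = addRowCount + deleteRowCount means the number of added rows can be derived from the two stored fields. A small sketch follows; the helper name is hypothetical.

```python
def add_row_count(row_count: int, delete_row_count: int) -> int:
    """Derive addRowCount from the stored rowCount and deleteRowCount."""
    if not 0 <= delete_row_count <= row_count:
        raise ValueError("deleteRowCount must be between 0 and rowCount")
    return row_count - delete_row_count


# A file with 1000 rows in total, 120 of which are delete records.
print(add_row_count(1000, 120))  # 880
```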

Index Manifest

An index manifest holds the metadata of several table-index files.

  ├── manifest
  └── index-manifest-5d670043-da25-4265-9a26-e31affc98039-0

The schema is:

  1. kind: ADD or DELETE.
  2. partition: partition spec, a BinaryRow.
  3. bucket: bucket of this file.
  4. indexFile: index file meta.

The index file meta is:

  1. indexType: string, “HASH” or “DELETION_VECTORS”.
  2. fileName: file name.
  3. fileSize: file size.
  4. rowCount: total number of rows.
  5. deletionVectorsRanges: Metadata only used by “DELETION_VECTORS”, Stores offset and length of each data file, The schema is ARRAY<ROW<f0: STRING, f1: INT, f2: INT>>.