Deletion Vectors

Overview

The Deletion Vectors mode is designed to balance read efficiency and write efficiency.

In this mode, writing incurs additional overhead: the writer looks up the LSM tree and generates the corresponding deletion file. Reading, however, can retrieve data directly by applying the deletion vectors to the data files, which avoids the extra cost of merging records across different files.
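The write/read trade-off described above can be illustrated with a small in-memory model. This is a minimal sketch, not Paimon's implementation. `DataFile` and `DVTable` are hypothetical names. A Python dict stands in for the LSM-tree lookup, and a Python `set` of row positions stands in for each file's deletion vector (a bitmap in practice).

```python
from dataclasses import dataclass, field


@dataclass
class DataFile:
    """Hypothetical immutable data file; rows are kept in write order."""
    name: str
    rows: list  # list of (key, value) tuples
    deleted: set = field(default_factory=set)  # deletion vector: deleted row positions


class DVTable:
    """Toy primary-key table illustrating deletion vectors (not Paimon code)."""

    def __init__(self):
        self.files = []
        # key -> (file, position) of the live version; stands in for the LSM-tree lookup
        self.index = {}

    def write(self, batch):
        """Write a batch as a new file.

        Extra write-side cost: for every key, find where its previous version
        lives and mark that position in the old file's deletion vector.
        """
        new_file = DataFile(f"data-{len(self.files)}", list(batch))
        for pos, (key, _) in enumerate(new_file.rows):
            previous = self.index.get(key)
            if previous is not None:
                old_file, old_pos = previous
                old_file.deleted.add(old_pos)
            self.index[key] = (new_file, pos)
        self.files.append(new_file)

    def read(self):
        """Scan each file on its own, skipping deleted positions; no cross-file merge."""
        result = []
        for f in self.files:
            result.extend(row for pos, row in enumerate(f.rows) if pos not in f.deleted)
        return result


t = DVTable()
t.write([(1, "a"), (2, "b")])
t.write([(1, "c")])  # supersedes key 1, so position 0 of data-0 is marked deleted
print(sorted(t.read()))  # [(1, 'c'), (2, 'b')]
```

The lookup in `write` is where the extra write cost comes from. In exchange, `read` never has to compare keys across files.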

Furthermore, read concurrency is no longer limited by the need to merge files, and filters on non-primary-key columns can also be pushed down. In general, this mode delivers a large improvement in read performance at a modest cost in write performance.
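The following sketch shows why filters on non-primary-key columns become safe to push down. It makes a few assumptions: files are plain lists of `(key, value)` rows, later files hold newer versions, and deletion vectors are sets of row positions. `merge_on_read` and `read_with_deletion_vectors` are hypothetical helpers written for illustration. If a value filter is applied before a key merge, it can discard the newest version of a key and let a stale one survive. With deletion vectors, every unmarked row is already the live version, so each file can be filtered, and scanned in parallel, on its own.

```python
# Two data files for the same table; list order is write order (newer last).
old_file = [(1, "a"), (2, "b")]
new_file = [(1, "c")]


def merge_on_read(files, predicate):
    """Merge by key (newest wins), but apply the value filter before merging.

    This is the unsafe ordering: the filter can drop the newest version of a key.
    """
    latest = {}
    for rows in files:
        for key, value in rows:
            if predicate(value):
                latest[key] = value
    return sorted(latest.items())


def read_with_deletion_vectors(files, deletion_vectors, predicate):
    """Scan each file independently, skipping deleted positions, then filter."""
    out = []
    for rows, deleted in zip(files, deletion_vectors):
        out.extend((k, v) for pos, (k, v) in enumerate(rows)
                   if pos not in deleted and predicate(v))
    return sorted(out)


def wants_a(value):
    return value == "a"


files = [old_file, new_file]
dvs = [{0}, set()]  # key 1 at position 0 of old_file was superseded by new_file
print(merge_on_read(files, wants_a))                    # [(1, 'a')]  (stale row leaks through)
print(read_with_deletion_vectors(files, dvs, wants_a))  # []          (correct)
```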

[Figure 1: Deletion Vectors]

Usage

To enable Deletion Vectors mode, set the table option 'deletion-vectors.enabled' = 'true'.

Limitations

  • changelog-producer must be none or lookup.
  • changelog-producer.lookup-wait cannot be false.
  • merge-engine cannot be first-row: reading a first-row table already requires no merging, so deletion vectors are not needed.
  • This mode filters out data in level 0, so when time travel is used to read an APPEND snapshot, the most recently written data may not be visible yet (data delay).
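The option rules above can be summarized as a pre-flight check on a table's options. This is a hypothetical helper, not Paimon's own validation code. It assumes options arrive as a string-to-string dict and uses these defaults for missing keys: changelog-producer `none`, lookup-wait `true`, merge-engine `deduplicate`. The level-0 data-delay note concerns read behavior, not configuration, so it is not checked here.

```python
def deletion_vector_violations(options: dict) -> list:
    """Return the Deletion Vectors limitations that `options` violates.

    Hypothetical illustration only; defaults for missing keys are assumptions.
    """
    if options.get("deletion-vectors.enabled", "false").lower() != "true":
        return []  # the limitations only matter when the mode is enabled

    problems = []
    producer = options.get("changelog-producer", "none").lower()
    if producer not in ("none", "lookup"):
        problems.append(f"changelog-producer must be none or lookup, got {producer}")
    if options.get("changelog-producer.lookup-wait", "true").lower() == "false":
        problems.append("changelog-producer.lookup-wait cannot be false")
    if options.get("merge-engine", "deduplicate").lower() == "first-row":
        problems.append("merge-engine cannot be first-row")
    return problems


opts = {"deletion-vectors.enabled": "true", "changelog-producer": "input"}
print(deletion_vector_violations(opts))
# ['changelog-producer must be none or lookup, got input']
```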