BlueStore Internals
Small write strategies
U: Uncompressed write of a complete, new blob.
write to new blob
kv commit
P: Uncompressed partial write to unused region of an existing blob.
write to unused chunk(s) of existing blob
kv commit
W: WAL overwrite: commit intent to overwrite, then overwrite async. Must be aligned to chunk_size = MAX(block_size, csum_block_size).
kv commit
wal overwrite (chunk-aligned) of existing blob
N: Uncompressed partial write to a new blob. Initially sparsely utilized. Future writes will either be P or W.
write into a new (sparse) blob
kv commit
R+W: Read partial chunk, then do WAL overwrite.
read (out to chunk boundaries)
kv commit
wal overwrite (chunk-aligned) of existing blob
C: Compress data, write to new blob.
compress and write to new blob
kv commit
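The alignment rule for W, and the "read out to chunk boundaries" step of R+W, can be illustrated with a short sketch. Only the formula chunk_size = MAX(block_size, csum_block_size) comes from the text above; the function names and the example sizes are hypothetical and are not BlueStore's actual code.

```python
def chunk_size(block_size: int, csum_block_size: int) -> int:
    """Smallest unit a WAL overwrite may touch, per the rule stated for W."""
    return max(block_size, csum_block_size)


def is_chunk_aligned(offset: int, length: int, chunk: int) -> bool:
    """True if [offset, offset + length) starts and ends on chunk boundaries."""
    return length > 0 and offset % chunk == 0 and length % chunk == 0


def pad_to_chunks(offset: int, length: int, chunk: int) -> tuple:
    """Expand [offset, offset + length) out to chunk boundaries.

    Returns (start, end); this is the range R+W would have to read
    before it can issue a chunk-aligned WAL overwrite.
    """
    start = (offset // chunk) * chunk
    end = -(-(offset + length) // chunk) * chunk  # round up to next boundary
    return start, end


def wal_mode(offset: int, length: int, chunk: int) -> str:
    """Pick between the two WAL-based modes (hypothetical helper)."""
    return "W" if is_chunk_aligned(offset, length, chunk) else "R+W"


# Assumed example sizes: 4 KB device blocks, 16 KB checksum blocks.
chunk = chunk_size(4096, 16384)

whole_chunk = wal_mode(16384, 16384, chunk)      # aligned overwrite
small_write = wal_mode(20000, 100, chunk)        # 100 bytes inside a chunk
read_range = pad_to_chunks(20000, 100, chunk)    # range to read for R+W
```

With these assumed sizes, the 100-byte write falls inside the second 16 KB chunk, so the sketch selects R+W and pads the range to that whole chunk, while the whole-chunk overwrite can go straight to W.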
Possible future modes
F: Fragment lextent space by writing a small piece of data into a piecemeal blob (that collects random, noncontiguous bits of data we need to write).
write to a piecemeal blob (min_alloc_size or larger, but we use just one block of it)
kv commit
X: WAL read/modify/write on a single block (like legacy bluestore). No checksum.
kv commit
wal read/modify/write
Mapping
This very roughly maps the type of write onto what we do when we encounter a given blob. In practice it’s a bit more complicated since there might be several blobs to consider (e.g., we might be able to W into one or P into another), but it should communicate a rough idea of strategy.
| write type               | raw      | raw (cached) | csum (4 KB) | csum (16 KB) | comp (128 KB) |
| 128+ KB (over)write      | U        | U            | U           | U            | C             |
| 64 KB (over)write        | U        | U            | U           | U            | U or C        |
| 4 KB overwrite           | W or P   | W or P       | W or P      | R+W or P     | N (F?)        |
| 100 byte overwrite       | R+W or P | W or P       | R+W or P    | R+W or P     | N (F?)        |
| 100 byte append          | R+W or P | W or P       | R+W or P    | R+W or P     | N (F?)        |
| 4 KB clone overwrite     | P or N   | P or N       | P or N      | P or N       | N (F?)        |
| 100 byte clone overwrite | P or N   | P or N       | P or N      | P or N       | N (F?)        |
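As a rough illustration, the mapping can be held as a lookup table keyed by write type and existing blob type. This is a sketch under a stated assumption about how the flattened rows group into five columns: the first four blob types each list two alternatives for small writes, and the compressed column holds a single entry. The names BLOB_TYPES, MAPPING, and candidates are hypothetical.

```python
# Column order, taken from the table header.
BLOB_TYPES = ("raw", "raw (cached)", "csum (4 KB)", "csum (16 KB)", "comp (128 KB)")

# Each cell is a tuple of alternative strategies, transcribed from the table.
# "N (F?)" is kept verbatim: N today, with F noted as a possible future mode.
MAPPING = {
    "128+ KB (over)write": (("U",), ("U",), ("U",), ("U",), ("C",)),
    "64 KB (over)write": (("U",), ("U",), ("U",), ("U",), ("U", "C")),
    "4 KB overwrite": (("W", "P"), ("W", "P"), ("W", "P"), ("R+W", "P"), ("N (F?)",)),
    "100 byte overwrite": (("R+W", "P"), ("W", "P"), ("R+W", "P"), ("R+W", "P"), ("N (F?)",)),
    "100 byte append": (("R+W", "P"), ("W", "P"), ("R+W", "P"), ("R+W", "P"), ("N (F?)",)),
    "4 KB clone overwrite": (("P", "N"), ("P", "N"), ("P", "N"), ("P", "N"), ("N (F?)",)),
    "100 byte clone overwrite": (("P", "N"), ("P", "N"), ("P", "N"), ("P", "N"), ("N (F?)",)),
}


def candidates(write_type: str, blob_type: str) -> tuple:
    """Return the strategies the table lists for this write against this blob."""
    return MAPPING[write_type][BLOB_TYPES.index(blob_type)]


cached_small = candidates("100 byte overwrite", "raw (cached)")
csum16_4k = candidates("4 KB overwrite", "csum (16 KB)")
```

A lookup like this only captures the single-blob case; as noted above, several blobs may be candidates for one write, so a real decision would have to compare the options across them.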