Compression
Cassandra lets operators configure compression on a per-table basis. Compression reduces the size of data on disk by compressing each SSTable in chunks whose size is set by the user-configurable chunk_length_in_kb option. Because Cassandra SSTables are immutable, the CPU cost of compressing is paid only when the SSTable is written: subsequent updates to data land in different SSTables, so Cassandra does not need to decompress, overwrite, and recompress data when UPDATE commands are issued. On reads, Cassandra locates the relevant compressed chunks on disk, decompresses each full chunk, and then proceeds with the remainder of the read path (merging data from disks and memtables, read repair, and so on).
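The chunked layout described above can be sketched in a few lines. This is a minimal illustration and not Cassandra's implementation: it uses Python's zlib as a stand-in compressor, and the names write_compressed and read_range, as well as the in-memory offset list, are hypothetical.

```python
import zlib

# Uncompressed bytes per chunk (stand-in for chunk_length_in_kb = 16).
CHUNK_LENGTH = 16 * 1024


def write_compressed(data, chunk_length=CHUNK_LENGTH):
    """Compress data chunk by chunk, recording where each compressed chunk starts."""
    blob = bytearray()
    offsets = []
    for start in range(0, len(data), chunk_length):
        offsets.append(len(blob))
        blob += zlib.compress(data[start:start + chunk_length])
    return bytes(blob), offsets


def read_range(blob, offsets, start, length, chunk_length=CHUNK_LENGTH):
    """Return `length` uncompressed bytes at `start`.

    Every chunk the range touches is decompressed in full, even if only
    a few bytes of it are needed.
    """
    if length <= 0 or not offsets:
        return b""
    first = start // chunk_length
    last = min((start + length - 1) // chunk_length, len(offsets) - 1)
    out = bytearray()
    for index in range(first, last + 1):
        begin = offsets[index]
        end = offsets[index + 1] if index + 1 < len(offsets) else len(blob)
        out += zlib.decompress(blob[begin:end])
    skip = start - first * chunk_length
    return bytes(out[skip:skip + length])


# Hypothetical sample data: many similar rows.
data = b"".join(b'{"id": %d, "status": "ok"}\n' % i for i in range(5000))
blob, offsets = write_compressed(data)
print(len(data), "bytes uncompressed,", len(blob), "bytes compressed,", len(offsets), "chunks")
print(read_range(blob, offsets, 40_000, 60))
```

Note that the write side never revisits a chunk once it is emitted, which mirrors why immutable SSTables only pay the compression cost once.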
Compression algorithms typically trade off between the following three areas:
Compression speed: how fast the algorithm compresses data. This is critical in the flush and compaction paths because data must be compressed before it is written to disk.
Decompression speed: how fast the algorithm decompresses data. This is critical in the read and compaction paths because data must be read off disk in full chunks and decompressed before it can be returned.
Ratio: how much the algorithm reduces the size of the uncompressed data. Cassandra typically measures this as the size of data on disk relative to the uncompressed size. For example, a ratio of 0.5 means that the data on disk is 50% the size of the uncompressed data. Cassandra exposes this ratio per table as the SSTable Compression Ratio field of nodetool tablestats.
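As a quick illustration of how the ratio is defined, the following sketch computes it for a small buffer. It assumes zlib as a stand-in compressor, and the function name compression_ratio and the sample data are hypothetical.

```python
import zlib


def compression_ratio(uncompressed, compressed):
    """Size on disk relative to the uncompressed size; lower means better compression."""
    return len(compressed) / len(uncompressed)


# Hypothetical, highly repetitive sample data.
raw = b"sensor-reading,status=ok;" * 1000
ratio = compression_ratio(raw, zlib.compress(raw))
print(f"ratio = {ratio:.3f}")
```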
Cassandra offers five compression algorithms by default that make different tradeoffs in these areas. Benchmark results depend on many factors (algorithm parameters such as compression level, the compressibility of the input data, the underlying processor class, and so on), but the following table should help you pick a starting point based on your application’s requirements. It gives an extremely rough grading of the choices by their performance in these areas (A is relatively good, F is relatively bad):
Compression Algorithm | Cassandra Class | Compression | Decompression | Ratio | C* Version |
---|---|---|---|---|---|
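LZ4 | LZ4Compressor | A+ | A+ | C+ | |
LZ4HC | LZ4Compressor | C+ | A+ | B+ | |
Zstd | ZstdCompressor | A- | A- | A+ | |
Snappy | SnappyCompressor | A- | A | C | |
Deflate | DeflateCompressor | C | C | A | |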
Generally speaking, for a performance-critical (latency or throughput) application LZ4 is the right choice, as it gets excellent ratio per CPU cycle spent. This is why it is the default choice in Cassandra.
For storage-critical applications (disk footprint), however, Zstd may be a better choice, as it can achieve a significantly better ratio than LZ4.
Snappy is kept for backwards compatibility; LZ4 will typically be preferable.
Deflate is kept for backwards compatibility; Zstd will typically be preferable.
Configuring Compression
Compression is configured on a per-table basis as an optional argument to CREATE TABLE or ALTER TABLE. Three options are available for all compressors:
class (default: LZ4Compressor): specifies the compression class to use. The two “fast” compressors are LZ4Compressor and SnappyCompressor, and the two “good” ratio compressors are ZstdCompressor and DeflateCompressor.
chunk_length_in_kb (default: 16KiB): specifies the number of kilobytes of data per compression chunk. The main tradeoff is that larger chunk sizes give compression algorithms more context and improve their ratio, but require reads to fetch and decompress more data from disk.
crc_check_chance (default: 1.0): determines how likely Cassandra is to verify the checksum on each compression chunk during reads to protect against data corruption. Unless you have profiles indicating that this is a performance problem, you are strongly encouraged not to turn it off, as it is Cassandra’s only protection against bitrot.
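The chunk-size tradeoff can be observed with a small experiment. The sketch below is only an illustration under stated assumptions: it uses zlib in place of Cassandra's compressors, the helper name ratio_for_chunk_length and the sample rows are hypothetical, and real ratios depend heavily on the data and algorithm.

```python
import zlib


def ratio_for_chunk_length(data, chunk_length):
    """Compress data in independent chunks and return compressed size / uncompressed size."""
    compressed_size = sum(
        len(zlib.compress(data[start:start + chunk_length]))
        for start in range(0, len(data), chunk_length)
    )
    return compressed_size / len(data)


# Hypothetical sample data: many similar rows.
rows = b"".join(
    b'{"user_id": %d, "country": "NZ", "plan": "free"}\n' % i for i in range(20000)
)

for kib in (4, 16, 64):
    ratio = ratio_for_chunk_length(rows, kib * 1024)
    # A point read must decompress one full chunk, so `kib` is also
    # the minimum amount of data decompressed per read.
    print(f"{kib:>2} KiB chunks: ratio {ratio:.3f}, at least {kib} KiB decompressed per read")
```

Larger chunks give the compressor more context per chunk, while every read of even a single row pays for decompressing the whole chunk it lives in.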
The LZ4Compressor supports the following additional options:
lz4_compressor_type (default: fast): specifies whether to use the high (a.k.a. LZ4HC) ratio version or the fast (a.k.a. LZ4) version of LZ4. The high mode supports a configurable level, which lets operators tune the performance/ratio tradeoff via the lz4_high_compressor_level option. Note that in 4.0 and above it may be preferable to use the Zstd compressor.
lz4_high_compressor_level (default: 9): a number between 1 and 17 inclusive that represents how much CPU time to spend trying to get more compression ratio. Generally, lower levels are faster but achieve a worse ratio, and higher levels are slower but achieve a better one.
The ZstdCompressor supports the following additional options:
compression_level (default: 3): a number between -131072 and 22 inclusive that represents how much CPU time to spend trying to get more compression ratio. The lower the level, the faster the speed (at the cost of ratio). Values from 20 to 22 are called “ultra levels” and should be used with caution, as they require more memory. The default of 3 is a good choice for competing with Deflate ratios, and 1 is a good choice for competing with LZ4.
Users can set compression using the following syntax:
CREATE TABLE keyspace.table (id int PRIMARY KEY)
WITH compression = {'class': 'LZ4Compressor'};
Or
ALTER TABLE keyspace.table
WITH compression = {'class': 'LZ4Compressor', 'chunk_length_in_kb': 64, 'crc_check_chance': 0.5};
Once enabled, compression can be disabled with ALTER TABLE by setting enabled to false:
ALTER TABLE keyspace.table
WITH compression = {'enabled':'false'};
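When statements like these are generated by tooling, it can help to check option values against the documented ranges before sending them to the cluster. The following is a minimal sketch of such a check; the helper names compression_clause and render are hypothetical and not part of any Cassandra driver, and only the ranges stated in this document are enforced.

```python
# Integer option ranges as documented above (inclusive).
INT_RANGES = {
    "lz4_high_compressor_level": (1, 17),
    "compression_level": (-131072, 22),
}


def render(value):
    """Quote strings for the CQL map literal; leave numbers bare."""
    return f"'{value}'" if isinstance(value, str) else str(value)


def compression_clause(compressor, **options):
    """Build a `WITH compression = {...}` clause, rejecting out-of-range values."""
    for name, value in options.items():
        if name in INT_RANGES:
            low, high = INT_RANGES[name]
            if not low <= value <= high:
                raise ValueError(f"{name} must be between {low} and {high}, got {value}")
        elif name == "crc_check_chance" and not 0.0 <= value <= 1.0:
            raise ValueError(f"crc_check_chance must be between 0.0 and 1.0, got {value}")
    entries = {"class": compressor, **options}
    body = ", ".join(f"'{key}': {render(value)}" for key, value in entries.items())
    return "WITH compression = {" + body + "}"


clause = compression_clause(
    "LZ4Compressor", lz4_compressor_type="high", lz4_high_compressor_level=9
)
print("ALTER TABLE keyspace.table " + clause + ";")
```

The sketch does not check that an option belongs to the chosen compressor class; Cassandra itself remains the authority on which combinations are valid.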
Operators should be aware, however, that changing compression is not immediate. Data is compressed when an SSTable is written, and because SSTables are immutable, existing SSTables keep their old compression settings after an ALTER TABLE until they are compacted. If an operator needs compression changes to take effect immediately, the operator can trigger an SSTable rewrite using nodetool scrub or nodetool upgradesstables -a, both of which rebuild the SSTables on disk, re-compressing the data in the process.
Benefits and Uses
Compression’s primary benefit is that it reduces the amount of data written to disk. The reduced size not only saves on storage, it often also increases read and write throughput, because the CPU time spent compressing or decompressing data is often less than the time it would take to read or write the larger volume of uncompressed data.
Compression is most useful in tables comprised of many rows, where the rows are similar in nature. Tables containing similar text columns (such as repeated JSON blobs) often compress very well. Tables containing data that has already been compressed or random data (e.g. benchmark datasets) do not typically compress well.
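The difference between compressible and incompressible data is easy to see in a small experiment. This sketch assumes zlib as a stand-in compressor; the sample rows and the helper name ratio are hypothetical, and the pseudo-random bytes stand in for random or already-compressed data.

```python
import random
import zlib


def ratio(data):
    """Compressed size relative to uncompressed size; lower is better."""
    return len(zlib.compress(data)) / len(data)


# Many similar rows, like repeated JSON blobs.
json_rows = b"".join(
    b'{"event": "login", "ok": true, "user": %d}\n' % i for i in range(10_000)
)

# Seeded pseudo-random bytes of the same length (deterministic for repeatability).
rng = random.Random(42)
random_bytes = bytes(rng.getrandbits(8) for _ in range(len(json_rows)))

print(f"similar JSON rows: ratio {ratio(json_rows):.3f}")
print(f"random bytes:      ratio {ratio(random_bytes):.3f}")
```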
Operational Impact
Compression metadata is stored off-heap and scales with data on disk. This often requires 1-3GB of off-heap RAM per terabyte of data on disk, though the exact usage varies with chunk_length_in_kb and compression ratios.
Streaming operations involve compressing and decompressing data on compressed tables. In some code paths (such as non-vnode bootstrap), the CPU overhead of compression can be a limiting factor.
To prevent slow compressors (Zstd, Deflate, LZ4HC) from blocking flushes for too long, all three flush with the default fast LZ4 compressor and then rely on normal compaction to re-compress the data into the desired compression strategy. See CASSANDRA-15379 (issues.apache.org/jira/browse/CASSANDRA-15379) for more details.
The compression path checksums data to ensure correctness. While the traditional Cassandra read path does not have a way to ensure correctness of data on disk, compressed tables allow the user to set crc_check_chance (a float from 0.0 to 1.0) so that Cassandra probabilistically validates chunks on read to verify that bits on disk are not corrupt.
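Probabilistic validation of this kind can be sketched as follows. This is an illustration only, not Cassandra's code: it assumes a CRC32 computed over the compressed chunk, uses zlib as a stand-in compressor, and the names read_chunk and CorruptChunkError are hypothetical.

```python
import random
import zlib


class CorruptChunkError(Exception):
    """Raised when a chunk's checksum does not match the stored value."""


def read_chunk(compressed, stored_crc, crc_check_chance, rng):
    """Decompress a chunk, verifying its checksum with probability crc_check_chance."""
    # rng.random() is in [0.0, 1.0), so 1.0 always verifies and 0.0 never does.
    if rng.random() < crc_check_chance:
        if zlib.crc32(compressed) != stored_crc:
            raise CorruptChunkError("checksum mismatch: chunk may be corrupt on disk")
    return zlib.decompress(compressed)


rng = random.Random(0)
chunk = zlib.compress(b"row data " * 100)
crc = zlib.crc32(chunk)

# With crc_check_chance = 1.0 every read is verified.
print(len(read_chunk(chunk, crc, 1.0, rng)), "bytes read and verified")
```

With a value below 1.0 some reads skip verification, which is why lowering crc_check_chance trades detection of on-disk corruption for CPU time.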
Advanced Use
Advanced users can provide their own compression class by implementing the interface at org.apache.cassandra.io.compress.ICompressor.