
Configuring Storage Compression


Aerospike features lossless compression of records written to persistent storage. Aerospike supports three compression algorithms:

  • LZ4 - Aerospike uses a speed-optimized version (compression level 1) of the widely adopted LZ4 compression algorithm. As of Aerospike Database 6.3, a dynamically configurable acceleration setting trades off speed (CPU) against compression ratio (storage).

  • Snappy

  • Zstandard

Since server 7.0, compression can be applied to records written to any storage engine: SSD, shared memory (RAM), or persistent memory (PMem). Prior to server 7.0, compression was not available for in-memory namespaces.

Compression details

Compression trades increased transaction latency for reduced data storage. Typically, write latency is affected more than read latency, because compression is more CPU-intensive than decompression.

An algorithm's performance - speed as well as compression ratio - is highly specific to the data involved. When picking an algorithm, it is a good practice to evaluate each algorithm's performance with records that are representative of real-world data.

A realistic way to evaluate the best algorithm to use is to configure different compression algorithms on multiple servers and then review the respective compression percentages.
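Such an evaluation can also be prototyped offline before touching a cluster. The sketch below illustrates the methodology - measure both speed and compression ratio on records representative of your data. Because the lz4, snappy, and zstd Python bindings are third-party packages that may not be installed, zlib levels 1 and 9 stand in here purely to demonstrate the speed-versus-ratio comparison; substitute the real algorithms for an actual evaluation.

```python
import time
import zlib

# Build sample payloads; replace these with records representative of
# your real-world data, since results are highly data-specific.
records = [(b"user:%06d|" % i) * 50 for i in range(1000)]
total_in = sum(len(r) for r in records)

for level in (1, 9):  # stand-ins for a fast vs. a thorough algorithm
    start = time.perf_counter()
    compressed = [zlib.compress(r, level) for r in records]
    elapsed = time.perf_counter() - start
    # Ratio is compressed size : uncompressed size, as Aerospike reports it.
    ratio = sum(len(c) for c in compressed) / total_in
    print(f"level {level}: {elapsed * 1000:.1f} ms, ratio {ratio:.3f}")
```

The same harness, pointed at dumps of production records, gives a first approximation of which algorithm and level suit your data before configuring them on a server.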

Compression does not raise the maximum record size: the record size check always uses the uncompressed size, so an individual record may not exceed the configured write block size even before compression. This ensures that a record always fits into a write block, even if it is later stored without compression. As long as each record's uncompressed size is within the write block size, multiple compressed records can be packed into one physical write block, up to its size.

Although it is possible to specify different compression settings for the same namespace on different nodes in a cluster, we do not recommend it.

Client-side compression

Aerospike clients support compression on the client side in addition to server-side compression. When a client is configured for client-side compression, the server decompresses bin data as it comes off the disk and then recompresses it before sending it to the client. Metadata is never compressed.

The server decompresses incoming data from client applications, allowing the server to perform operations on the data before it is written to disk.


Configuration

Storage compression is configured at the namespace level on each node with two configuration directives, compression and compression-level.

  • compression selects the compression algorithm. Valid parameters are:

    • none
    • lz4
    • snappy
    • zstd

    If this directive is not specified, it defaults to none (no compression).

  • compression-level controls the trade-off between compression speed and compression ratio. It applies only to the zstd algorithm, and has no effect on the others.

    The valid parameter values 1 through 9 are used as a scale. At one end of the scale, 1 selects a faster and less efficient compression. At the other end, 9 selects a slower and more efficient compression.

    If you specify zstd with the compression directive but you do not set a compression level, it defaults to 9.

The configuration directives belong to a namespace's storage-engine sub-context. For example:

namespace test {
    storage-engine device {
        device /dev/sda1
        compression zstd
        compression-level 1
    }
}

Both settings can also be configured dynamically with asadm.

The following example commands specify the zstd compression algorithm at compression level 1 for the test namespace:

asadm -e 'enable; manage config namespace test param compression to zstd'
asadm -e 'enable; manage config namespace test param compression-level to 1'
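To confirm the active settings after a dynamic change, asadm's show config command can be used with a filter, mirroring the show statistics pattern used later in this page (the exact syntax may vary by asadm version, and the command requires a running cluster):

asadm -e 'show config for namespace test like compression'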

The server checks the compression settings whenever a newly inserted or updated record is compressed on its way to storage. Changing the settings does not affect existing records in storage; it only affects records inserted or updated afterward. For example, if you change compression from lz4 to snappy, existing records remain LZ4-compressed until they are next updated, at which point they are compressed according to the current settings.

If you change your compression settings from using a compression algorithm to not using any compression algorithm, existing records remain compressed until they are updated.

Namespace storage may contain a mix of uncompressed records and records compressed with any of the three compression algorithms. Decompression works independently from the compression setting, and always uses the same algorithm to decompress a record as the one used to compress it.

If the data in a record is such that compressing it would increase its size, the server stores it uncompressed.

When the server migrates records between nodes, the data remains in the same format as it is on disk. Migration cannot be used as a means of changing compression settings.

Changing compression settings

If you change the compression settings for a namespace, existing records are unchanged until they are updated. To re-store all existing records under the new settings at once, use one of two approaches.

  • Read each record in the cluster and write it back.
  • Run a background scan that applies a touch UDF, rewriting all records in the namespace without changing bin data.
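The touch UDF for the second approach can be sketched as a minimal Lua module. This is a hypothetical sketch (the module and function names are illustrative): updating a record without modifying any bins causes it to be rewritten to storage under the namespace's current compression settings.

```lua
-- touch.lua (hypothetical module name)
-- Update the record without changing any bins, so it is rewritten
-- to storage with the namespace's current compression settings.
function touch(rec)
    aerospike:update(rec)
end
```

After registering the module with the cluster, apply the UDF to every record in the namespace with a background scan using your client or tooling of choice.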


Compression metrics

Aerospike reports the current compression ratio as part of the namespace statistics. For example:

asadm -e 'show statistics for namespace test like device_compression_ratio'

Sample output:

~~~~~~~~test Namespace Statistics (2021-10-22 22:18:16 UTC)~~~~~~~
Node                    |     |     |
device_compression_ratio|1.000|1.000|1.000

device_compression_ratio is the average compressed size : uncompressed size ratio. The value shown in the previous example, 1.000, means no compression at all. In contrast, 0.100 would indicate a ratio of 1 : 10, a size reduction of 90%.
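The relationship between the reported ratio and the storage savings can be computed directly. A quick illustration (the helper name is ours, not part of any Aerospike API):

```python
# device_compression_ratio = compressed size / uncompressed size,
# so the fraction of storage saved is 1 - ratio.
def savings_percent(compression_ratio):
    return (1.0 - compression_ratio) * 100.0

print(savings_percent(1.000))  # no compression: 0% savings
print(savings_percent(0.100))  # 1 : 10 ratio: 90% savings
```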

device_compression_ratio is included in the namespace statistics unless compression is set to none.

The compression ratio is a moving average, based on the most recently written records.

If the written data - and how compressible it is - changes over time, the compression ratio changes with it. If a namespace's data changes suddenly in terms of how compressible it is, the indicated compression ratio may lag behind. As a rule, you can assume that the compression ratio shown in the statistics covers the last 100,000 to 1,000,000 written records.

In particular, the compression ratio might not accurately reflect the compression ratio across all records in storage. The actual storage savings across all records might be higher or lower than the current compression ratio, which just covers the most recently written records.

When evaluating different compression settings for real-world data, proceed as follows:

  • Set compression and compression-level for a namespace.

  • Write 1,000,000 representative records.

  • Check the compression ratio shown in the namespace statistics.

Repeat these steps for all compression settings to be evaluated.
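The evaluation loop above can be sketched with the commands already shown on this page; the record-loading step in the middle depends on your own data loader (a benchmark tool such as asbench is one option), and all commands require a running cluster:

asadm -e 'enable; manage config namespace test param compression to zstd'
asadm -e 'enable; manage config namespace test param compression-level to 3'
(write ~1,000,000 representative records with your loader of choice)
asadm -e 'show statistics for namespace test like device_compression_ratio'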