
Resilience

Overview

Aerospike optimizes writing to disk by grouping multiple record writes together. If a namespace is configured to store data on an SSD device, or is in-memory with persistence to a device or filesystem, the new version of the record is placed in an 8MiB streaming write buffer (SWB) pending a flush to the storage device. The record's metadata entry in the primary index is adjusted, updating its pointer to the new location of the record. Aerospike performs a copy-on-write for create/update/replace.

Flush size and cache sizes

note

In Database 7.1, the write block size was hard-coded to 8MiB, and the write-block-size configuration parameter was removed. This section describes the new flushing mechanism using flush-size.

The flush-size configuration parameter defines the size in bytes of each I/O unit that is written to disk. A flush event happens either when the 8MiB SWB is full, or when the flush-max-ms period expires. At this point, the most recently written data is flushed from the SWB to disk in a series of flush-size units. These writes are appended to each other until the write block is full.

You can increase or decrease the flush size dynamically. The default value is 1MiB, and the configured value must be a power of 2. The options are: 4K, 8K, 16K, 32K, 64K, 128K, 256K, 512K, 1M, 2M, 4M, and 8M. For most direct-attached NVMe devices, the ideal size is 128K.
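The relationship between flush-size and the 8MiB write block can be sketched with a little arithmetic. The following Python snippet is illustrative only and is not part of Aerospike: the function name flushes_per_write_block and the constants are hypothetical, and it assumes a write block that fills completely, so the count is the maximum number of flush-size I/O units appended to one block.

```python
# Illustrative arithmetic only; these names are hypothetical, not an Aerospike API.
WRITE_BLOCK_BYTES = 8 * 1024 * 1024  # 8MiB write block, as described above

# The allowed flush-size values listed above: powers of 2 from 4K to 8M.
VALID_FLUSH_SIZES = [4096 << i for i in range(12)]


def flushes_per_write_block(flush_size: int) -> int:
    """Return how many flush-size I/O units fit in one full 8MiB write block."""
    if flush_size not in VALID_FLUSH_SIZES:
        raise ValueError(f"flush-size must be a power of 2 from 4K to 8M, got {flush_size}")
    return WRITE_BLOCK_BYTES // flush_size


if __name__ == "__main__":
    for size in (128 * 1024, 1024 * 1024):
        print(f"flush-size {size} bytes: {flushes_per_write_block(size)} I/O units per write block")
```

With the 128K value suggested for NVMe devices, a full write block is written as 64 I/O units; with the 1MiB default, it is written as 8.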

To identify the optimal settings, Aerospike recommends running a benchmark tool such as ACT. Enterprise licensees can contact Aerospike Support for guidance.

Each device associated with a namespace has a write queue and a cache. The max-write-cache configuration parameter controls the number of bytes of pending write blocks that the system is allowed to keep before failing writes, if the write queue can't immediately flush a streaming write buffer to a write block on the disk.

Writes throttling circuit-breaker

The size of the write cache is calculated using the number of devices in the namespace multiplied by the max-write-cache. This value is a baseline, not a limit. The system throttles various write types at specific thresholds past the baseline with Error Code 18: Device overload returned to the client when appropriate. Each threshold has its own "queue too deep" errors in the server logs. The 'max' number listed in the following example log messages assumes the example baseline is 512 write blocks.

  1. At baseline (the calculated write cache size), master writes fail with this error message in the server logs - write fail: queue too deep: exceeds max 512.
  2. At baseline, UDF writes fail with this error message in the server logs - UDF fail: queue too deep: exceeds max 512. All UDF writes fail by design.
  3. At baseline, duplicate resolutions fail with an error message in the server logs - dup-res fail: queue too deep: exceeds max 512.
  4. At baseline + 32 write blocks, durable deletes fail with this error message in the server logs - durable delete fail: queue too deep: exceeds max 544.
  5. At baseline + 64 write blocks, immigration writes stop with this error message in the server logs - immigrate fail: queue too deep: exceeds max 576. This will cause retransmits until the write queue gets below the threshold.
  6. At baseline + 128 write blocks, defrag writes stop (changed in 5.7 from 100). The defrag process sleeps until the cache is back under the limit. There's no associated log message for this throttling.
  7. At baseline + 192 write blocks, replica writes stop with this error message in the server logs - replica write: queue too deep: exceeds max 704.
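The thresholds in the list above can be tabulated with a short calculation. This Python sketch is not Aerospike code: the function names are hypothetical, the conversion of the baseline from bytes to write blocks assumes 8MiB write blocks, and the strict "greater than" comparison is an assumption drawn from the "exceeds max" wording of the log messages. The device count and max-write-cache value in the usage lines are made-up inputs chosen to give the 512-block baseline used in the example log messages.

```python
# A sketch of the throttling thresholds; names and comparison semantics are assumptions.
WRITE_BLOCK_BYTES = 8 * 1024 * 1024  # assumes 8MiB write blocks

# Margin, in write blocks past the baseline, at which each write type is throttled.
MARGINS = [
    ("master write", 0),
    ("UDF write", 0),
    ("duplicate resolution", 0),
    ("durable delete", 32),
    ("immigration write", 64),
    ("defrag write", 128),
    ("replica write", 192),
]


def baseline_blocks(num_devices: int, max_write_cache_bytes: int) -> int:
    """Baseline = number of devices x max-write-cache, expressed in write blocks."""
    return num_devices * max_write_cache_bytes // WRITE_BLOCK_BYTES


def throttle_thresholds(baseline: int) -> dict:
    """Map each write type to the queue depth past which it is throttled."""
    return {name: baseline + margin for name, margin in MARGINS}


def throttled_write_types(queue_depth: int, baseline: int) -> list:
    """List the write types throttled at a given queue depth (strict '>' is assumed)."""
    return [name for name, limit in throttle_thresholds(baseline).items() if queue_depth > limit]


if __name__ == "__main__":
    # Hypothetical inputs: 4 devices with a 1GiB max-write-cache each.
    baseline = baseline_blocks(4, 1024 ** 3)
    for name, limit in throttle_thresholds(baseline).items():
        print(f"{name}: exceeds max {limit}")
    print(throttled_write_types(550, baseline))
```

With a baseline of 512 write blocks, the computed limits are 512, 544, 576, 640, and 704, which match the "exceeds max" values quoted in the log messages above.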

Introduced in v.5.7

Various write types are throttled at margins greater than the write cache baseline.

Introduced in v.5.1

Defrag writes are throttled when the write cache reaches 100 write blocks greater than the calculated write cache baseline. Throttling defrag does not affect migration and replica writes. Client writes are allowed until the sum of all in-use SWBs equals the number of devices multiplied by the configured value of max-write-cache. In systems prior to v.5.1, if any single device reaches max-write-cache, all devices block client writes.
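The difference between the two client-write admission rules can be illustrated with a minimal Python sketch. It is a model of the behavior described above, not the server's implementation: both function names are hypothetical, and the per-device usage figures are made-up inputs in arbitrary units.

```python
# A simplified model of client-write admission; function names are hypothetical.

def client_writes_allowed_pre_5_1(in_use_per_device: list, max_write_cache: int) -> bool:
    """Prior to v.5.1: any single device reaching max-write-cache blocks all client writes."""
    return all(used < max_write_cache for used in in_use_per_device)


def client_writes_allowed_5_1(in_use_per_device: list, max_write_cache: int) -> bool:
    """From v.5.1: writes are allowed until total usage equals devices x max-write-cache."""
    return sum(in_use_per_device) < len(in_use_per_device) * max_write_cache


if __name__ == "__main__":
    usage = [64, 10, 10]  # one saturated device, two mostly idle (made-up values)
    print(client_writes_allowed_pre_5_1(usage, 64))
    print(client_writes_allowed_5_1(usage, 64))
```

In this example one device has reached the limit while the other two are nearly idle, so the older rule blocks client writes and the newer rule still admits them.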