Database ( latest 8.1.0 )

Configuring namespace data retention

This page describes how to configure the behavior of an Aerospike namespace under storage and memory pressure, and explains data retention and reclamation.

Expirations, evictions, and TTL

Aerospike records include void-time metadata - the timestamp when they automatically expire. Records with a void-time of 0 do not expire or get evicted.

An expired record does not exist from the client’s point of view, but its metadata still occupies a 64-byte slot in the primary index until it gets cleaned up by the namespace supervisor (NSUP).
Eviction is an early expiration process. When namespace storage exceeds a configurable high-water mark, NSUP deletes records with non-zero void-times that are nearest to their expiration. Eviction continues until sufficient space has been recovered.
Applications optionally send a time to live (TTL) with record write commands, which declares the number of seconds from now until the record expires. By default, namespaces reject writes with a TTL, and NSUP does not run, but this behavior is configurable. If client writes are allowed to send a TTL, the record’s void-time is set according to client TTL values. In this case, you should also configure NSUP to run and check for expired records.
Starting with Aerospike Database 7.1.0, read commands may extend the TTL of a record if the remaining TTL is within a specified percentage of its original TTL. This least-recently-used (LRU) eviction behavior is configurable at the namespace or set level. The client may override the server configuration and provide an explicit read-touch policy, expressed as a percentage of the record’s original TTL. For example, Java client 8.1.0 added Policy.readTouchTtlPercent.

The following parameters are configurable per namespace, and focus on TTL assignment and enforcement.

Configuration parameter	Description	Default value	Notes
`nsup-period`	Time interval in seconds between successive starts of NSUP scans of the primary index for expired records. If NSUP takes longer to traverse the primary index than the `nsup-period`, it will effectively be running continuously.	0	By default, NSUP does not run (`nsup-period` of 0). Once NSUP is running, it does not stop until it has traversed the entire primary index of the namespace. When `nsup-period` is 0, to allow writes that have a positive integer TTL, you must set `allow-ttl-without-nsup` to `true` (see below). Regardless of the setting of `nsup-period`, writes with a non-positive TTL (<= 0) are always allowed.
`nsup-threads`	Controls how many threads NSUP uses to traverse the primary index for expired records.	1	Can be increased to shorten the NSUP cycle and spread the load across multiple cores.
`allow-ttl-without-nsup`	Parameter for testing only. Use it to measure the impact of NSUP in a use case where TTLs are non-0. Allows records to be written to the namespace with positive integer TTLs, even if NSUP is disabled.	false	See Warning: records that have a TTL when NSUP is not running.
`default-ttl`	Default TTL value to use for the namespace, whenever a client writes a record with a TTL of 0.	0	If the value of `default-ttl` is non-0 and `nsup-period` is 0 (its default), the Aerospike server will not start. A `default-ttl` of 0 sets a void-time of 0. Aerospike never expires or evicts records that have a void-time of 0. `default-ttl` cannot be set higher than 10 years (3650D).
`default-read-touch-ttl-pct`	For the namespace configuration, 0 means that read operations never touch a record.	0	For the namespace configuration, 0 means read operations never extend a record’s TTL. Values 1-100 specify a percentage of the record’s most recent TTL (void-time - last-update-time), so that a read within this interval of the record’s end of life will generate a touch. The touch uses the record’s previous TTL to extend the record’s life. For the set configuration, 0 means use the namespace value. A set-level configuration can explicitly override the default namespace value: -1 means reads never touch a record. Values 1-100 are the same as the namespace configuration. Clients may also send a read-touch TTL percent: 0 instructs the server to use its configuration. Other values override the server configuration. -1 means reads never touch a record, values 1-100 are the same as the server configurations.
`apply-ttl-reduction`	Controls whether a reduction of a record’s TTL is applied or ignored.	true	See Avoiding zombie records.
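
The rules in the table for accepting TTLs can be restated as a small Python sketch. This is an illustrative model of the documented behavior only, not server code: the function names `write_allowed` and `config_accepted` are hypothetical, and the sketch leaves out how `allow-ttl-without-nsup` interacts with the `default-ttl` startup check, because the table does not describe it.

```python
TEN_YEARS_SECONDS = 3650 * 24 * 60 * 60  # 3650D, the documented default-ttl ceiling


def write_allowed(client_ttl: int, nsup_period: int,
                  allow_ttl_without_nsup: bool = False) -> bool:
    """Model whether a write carrying client_ttl is accepted (hypothetical helper)."""
    if client_ttl <= 0:
        # Non-positive TTLs (0, -1, -2) are always allowed,
        # regardless of nsup-period.
        return True
    # Positive TTLs need NSUP running, or the testing-only override.
    return nsup_period > 0 or allow_ttl_without_nsup


def config_accepted(default_ttl: int, nsup_period: int) -> bool:
    """Model the default-ttl constraints listed in the table (hypothetical helper)."""
    if default_ttl > TEN_YEARS_SECONDS:
        return False  # default-ttl cannot exceed 10 years
    if default_ttl != 0 and nsup_period == 0:
        return False  # the server will not start with this combination
    return True
```

For example, `write_allowed(60, nsup_period=0)` is false, while the same write is accepted once `nsup-period` is non-zero.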

Client TTL values

Aerospike clients can send a TTL value with write and a read-touch value with read commands. The server uses the TTL value to determine the record’s void-time, which is the timestamp when the record expires.

Warning: records that have a TTL when NSUP is not running

The following side effects occur when NSUP is not running:

Trying to write a record with a positive TTL gets rejected with error code 22 (AS_ERR_FORBIDDEN), unless allow-ttl-without-nsup is set to true.
Expired records do not get removed from the indexes and storage:
- An attempt to read an expired record returns error code 2 (AS_ERR_NOT_FOUND).
- An attempt to update an expired record creates a new record with its metadata in the same primary index slot. The previous version of the record is ignored, and the generation count is reset to 1.
- Deleting an expired record removes it from the index. The defragmentation process will eventually mark it as ready to be overwritten.

Write commands

When a namespace is configured to allow writes with a TTL, a client may send a positive TTL value, which determines how many seconds the record has until it expires. The void-time is set to the current time plus the TTL.

Additionally, there are three special TTL values that can always be used, regardless of namespace configuration:

A TTL of 0 instructs the server to use the default-ttl of the namespace when setting the void-time.
A TTL of -1 sets the record’s void-time to 0, which means that the record will not expire.
A TTL of -2 instructs the server not to modify the void-time if the write is an update operation. If the write creates a new record, the default-ttl determines the void-time.
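
The mapping from a client TTL to a void-time can be sketched in Python as follows. This is a simplified model of the rules above, not the server implementation: `resolve_void_time` and its parameters are hypothetical names, times are plain integer seconds, and the effect of apply-ttl-reduction is not modeled.

```python
from typing import Optional


def resolve_void_time(now: int, client_ttl: int, default_ttl: int,
                      existing_void_time: Optional[int] = None) -> int:
    """Return the record's void-time after a write (0 means it never expires).

    existing_void_time is None when the write creates a new record.
    """
    if client_ttl > 0:
        return now + client_ttl          # explicit TTL from the client
    if client_ttl == -1:
        return 0                         # never expire
    if client_ttl == -2 and existing_void_time is not None:
        return existing_void_time        # update: leave the void-time unchanged
    if client_ttl in (0, -2):
        # TTL 0, or -2 on a create: fall back to the namespace default-ttl.
        return now + default_ttl if default_ttl > 0 else 0
    raise ValueError(f"unsupported TTL value: {client_ttl}")
```

With `now=1000` and a `default-ttl` of 300, a TTL of 0 yields a void-time of 1300, a TTL of -1 yields 0, and a TTL of -2 on an existing record keeps whatever void-time the record already had.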

Read commands

Aerospike Database 7.1.0 introduced an LRU eviction behavior. Client versions implementing this functionality add the ability to control if reads extend record void-time, regardless of namespace configuration, with the Policy.readTouchTtlPercent:

A value of 0 instructs the server to use the default-read-touch-ttl-pct of the namespace or set.
A value of -1 states that this read operation will never modify the record’s TTL.
A value of 1-100 means that this read should also touch the record (extending its TTL) if the time remaining until the record’s void-time is within this percentage of its most recent TTL.
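
The following Python sketch shows one way to read these rules together with the set-level and namespace-level settings. It is an illustration under stated assumptions, not the server's code: the function names are hypothetical, times are integer seconds, and the record's most recent TTL is taken to be void-time minus last-update-time.

```python
def effective_read_touch_pct(client_pct: int, set_pct: int, namespace_pct: int) -> int:
    """Resolve the percentage to apply; 0 in the result means 'never touch'."""
    if client_pct != 0:
        chosen = client_pct        # client overrides the server configuration
    elif set_pct != 0:
        chosen = set_pct           # set-level value overrides the namespace
    else:
        chosen = namespace_pct     # set value 0 means 'use the namespace value'
    return chosen if chosen > 0 else 0   # -1 (or namespace 0) disables read-touch


def read_touches(now: int, last_update_time: int, void_time: int, pct: int) -> bool:
    """True if a read at 'now' falls within pct% of the record's end of life."""
    if pct <= 0 or void_time == 0 or now >= void_time:
        return False               # disabled, non-expiring, or already expired
    ttl = void_time - last_update_time
    remaining = void_time - now
    return remaining * 100 <= ttl * pct


def touched_void_time(now: int, last_update_time: int, void_time: int) -> int:
    """A touch reuses the record's previous TTL to extend its life."""
    return now + (void_time - last_update_time)
```

For a record written with a 10-hour TTL and a percentage of 80, a read one hour after the write does not touch the record, while a read two hours or more after the write does.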

Avoiding zombie records

Whenever a record is updated in Aerospike, its primary index metadata entry is adjusted to point at the latest version of the record. Defragmentation compacts live record versions (pointed to from the primary index) into new write-blocks and then puts the old write-blocks on the free queue, where they await being overwritten by new record writes. Aerospike prioritizes free defragged write-blocks to be overwritten.

During a cold restart, the primary index is discarded and regenerated by reading data from the namespace storage devices. Some record removal operations might lead to deleted records being resurrected due to the presence of older versions, which haven’t yet been overwritten by defragmentation.

Following are methods for deleting records, with descriptions of how to use them and avoid the risk of them being resurrected by a cold restart.

Record removal by durable delete

A durable delete command from the client creates a tombstone, which prevents older versions from being resurrected during a cold restart. A record’s tombstone is reclaimed by the tomb-raider process once all older versions of the record are purged by defragmentation.

A tombstone always has a void-time of 0, regardless of the record’s previous void-time.

Record removal by expunge

An expunge is a delete command sent from the client without the durable delete write policy set to true. This kind of delete removes the primary index entry of the record. While it is fast, it leaves no tombstone. As a result, any past version of the record with a future void-time that happens to still be on device might be placed back into the primary index on cold restart.

Using durable deletes avoids this situation, as the tombstone is always the newest version with a void-time of 0. By default a namespace configured with strong consistency (SC mode) has its strong-consistency-allow-expunge configuration parameter set to false, meaning that expunge commands are refused. An AP mode namespace allows expunges by default, but this can be blocked using the disallow-expunge configuration parameter.

Record removal by expiration and eviction

Expiration and eviction do not durably delete the records they remove. The ability to use TTLs in a namespace is off by default, and is controlled by the pair of configuration parameters allow-ttl-without-nsup (default: false) and nsup-period (default: 0). A record without a TTL (a void-time of 0) is neither evicted nor expired; it can be removed with a durable delete.

When expiration and eviction are enabled for a namespace, and records are given TTL values greater than zero, we expect older versions of records to be ignored by a cold restart because their void-time is in the past at the time of a cold restart. The record was expired previously for being past its void-time, and it remains too old to be included in the primary index.

One situation where this might not be the case is if a record’s void-time was reduced by a subsequent version. In this case, an older version might remain in storage with a void-time farther out than that of the record when it expired and was removed by NSUP. To prevent this situation, set the apply-ttl-reduction namespace configuration parameter (default: true) to false. With this setting, TTLs that would reduce the void-time are ignored, which prevents older versions of the record from having a valid (future) void-time during a cold restart.
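
The TTL-reduction scenario can be illustrated with a toy model in Python. This is a deliberately simplified sketch: it assumes a cold restart skips expired versions and keeps the most recently written of the remaining ones, it only handles positive TTLs, and it ignores tombstones, generations, and defragmentation. The names `write_version` and `cold_restart` are hypothetical.

```python
from typing import List, Optional, Tuple

Version = Tuple[int, int]  # (last_update_time, void_time), both in seconds


def write_version(versions: List[Version], now: int, ttl: int,
                  apply_ttl_reduction: bool = True) -> None:
    """Append a new on-device version of one record, written with a positive TTL."""
    new_void_time = now + ttl
    if versions and not apply_ttl_reduction:
        # Ignore a TTL that would move the void-time earlier.
        new_void_time = max(new_void_time, versions[-1][1])
    versions.append((now, new_void_time))


def cold_restart(versions: List[Version], now: int) -> Optional[Version]:
    """Return the version that ends up in the rebuilt primary index, if any."""
    unexpired = [v for v in versions if v[1] == 0 or v[1] > now]
    return max(unexpired, default=None)  # newest last_update_time wins


# apply-ttl-reduction true: the second write shortens the record's life.
reduced: List[Version] = []
write_version(reduced, now=0, ttl=1000)
write_version(reduced, now=100, ttl=50)
zombie = cold_restart(reduced, now=200)   # the older version, void-time 1000

# apply-ttl-reduction false: the shorter TTL is ignored.
kept: List[Version] = []
write_version(kept, now=0, ttl=1000, apply_ttl_reduction=False)
write_version(kept, now=100, ttl=50, apply_ttl_reduction=False)
current = cold_restart(kept, now=200)     # the newest version, void-time 1000
```

In the first case the record expired at time 150, yet the cold restart at time 200 brings back the first version because its void-time is still in the future. In the second case no version ever carries a void-time earlier than an older one, so once the record expires every older version is expired too.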

Bringing an old node back into the cluster after record removal

Warm restarts do not scan the namespace storage devices. They reattach to indexes (typically stored in shared memory) after a clean shutdown. However, there is a scenario where a deleted record is resurrected after a warm restart.

If a node was out of the cluster for an extended period of time longer than the tomb-raider-period, a record that was durably deleted in the interim, with all its prior versions overwritten by defragmentation and its tombstone reclaimed by the tomb-raider, might come back due to the node returning with an earlier version of it present in the primary index. An expunge will automatically open up the cluster to the risk of a zombie record if a node was out of the cluster and returned with an older version, having missed the (not durable) delete command. Once again, durable deletes should be used to reduce this window. Nodes that were offline for a longer period of time than the minimum lifespan of a tombstone should be brought back empty (without storage data) - either by initializing the namespace storage devices, or forcing a cold restart with a cold-start-empty configuration.

Truncation

In Aerospike Database Enterprise Edition (EE), truncating a set or namespace is equivalent to durably deleting all of the records within it. The truncate command is logged in the cluster’s shared metadata (SMD), persists on all nodes within the cluster, and is communicated to any new node joining it.

Eviction and stop-writes configuration parameters

The namespace relies on evictions and stop-writes to mitigate situations where data accumulates in namespace storage and indexes faster than it is deleted. The stop-writes mechanism prevents new client writes from filling namespace storage beyond designated thresholds.

Database 7.0.0 and later

Starting with Database 7.0.0, the configuration parameters apply to every storage-engine type, because all of them share the same write-block based storage format.

Configuration parameter	Description	Default value	Notes
`evict-indexes-memory-pct`	Eviction threshold defined as the percentage of the `indexes-memory-budget`, which is the stop-writes threshold for the namespace, based on total memory used for indexes (primary, secondary and set indexes).	0	Default value of 0 disables this threshold.
`evict-used-pct`	Eviction threshold defined as the ratio of used storage (`data_used_bytes`) to the total namespace storage capacity (`data_total_bytes`).	0	Default value of 0 disables this threshold.
`evict-mounts-pct`	Eviction threshold defined as the ratio of index mount utilization (`index_mounts_used_pct` or `sindex_mounts_used_pct`) to the namespace `mounts-budget`.	0	Only applies when the namespace primary or secondary indexes are configured to be stored in flash or persistent memory.
`indexes-memory-budget`	Maximum memory budget (in bytes) used by the namespace for indexing (primary, secondary and set indexes). Acts as a stop-writes threshold.	0	Deletions, replica writes, and migration writes are still allowed when the namespace is in stop-writes mode.
`stop-writes-sys-memory-pct`	Percentage threshold at which client writes are refused, defined as the ratio of total memory usage (across all applications) to the system memory.	90	Deletions, replica writes, and migration writes are still allowed when the namespace is in stop-writes mode.
`stop-writes-avail-pct`	Stops client writes when the namespace storage engine has its reserve of write-blocks drop under a minimum, defined as the ratio of free write-blocks to the storage engine capacity.	5	Deletions, replica writes, and migration writes are still allowed when the namespace is in stop-writes mode.
`stop-writes-used-pct`	Stops client writes when the ratio of used storage space to total storage space (in bytes) exceeds the given max percentage.	70	Deletions, replica writes, and migration writes are still allowed when the namespace is in stop-writes mode.
`evict-tenths-pct`	Specifies the fraction of evictable records to delete per round of eviction.	5	The default value of 5 means delete 0.5% of evictable records in each eviction round until the node drops below the eviction threshold.

The following configuration file snippet shows an example of namespace data retention with TTLs used and eviction enabled.

namespace NSNAME {
    stop-writes-sys-memory-pct 90 # Stop-writes threshold based on memory usage
                                  # across the host machine
    storage-engine device {
        device /dev/nvme0n1p1
        device /dev/nvme0n1p2
        stop-writes-avail-pct 5   # stop-writes threshold as a percentage
                                  # of the total device size.
        stop-writes-used-pct 70   # stop-writes threshold as a percentage
                                  # of the total device size.
        evict-used-pct 60         # eviction threshold as a percentage
                                  # of the total device size.
    }

    index-type flash {            # Primary index on flash (AKA All Flash)
        mounts-budget 64G
        evict-mounts-pct 80       # eviction threshold based on the primary index
                                  # mounts budget
    }
}
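
To see how the storage thresholds in this snippet relate to each other, the following Python sketch evaluates them for a given usage level. It is a rough model based on the parameter descriptions above, not the server's logic: `storage_state` and its arguments are hypothetical names, and only non-zero stop-writes thresholds are covered.

```python
from typing import Dict


def storage_state(used_bytes: int, total_bytes: int, avail_pct: int,
                  evict_used_pct: int = 0,
                  stop_writes_used_pct: int = 70,
                  stop_writes_avail_pct: int = 5) -> Dict[str, bool]:
    """Report which thresholds a namespace's storage has crossed.

    avail_pct is the share of free write-blocks, as a percentage.
    """
    # Integer cross-multiplication avoids floating-point rounding.
    over_evict = (evict_used_pct > 0          # 0 disables the eviction threshold
                  and used_bytes * 100 > evict_used_pct * total_bytes)
    over_used = used_bytes * 100 > stop_writes_used_pct * total_bytes
    under_avail = avail_pct < stop_writes_avail_pct
    return {"evict": over_evict, "stop_writes": over_used or under_avail}


# Thresholds taken from the configuration snippet: 60 / 70 / 5.
state = storage_state(used_bytes=65, total_bytes=100, avail_pct=30,
                      evict_used_pct=60)
```

With the snippet's values, a namespace at 65% used storage is evicting but still accepts client writes. Past 70% used, or with fewer than 5% of write-blocks free, client writes are refused.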

Prior to Database 7.0.0

Configuration parameter	Description	Default value	Notes
`high-water-disk-pct`	Percentage threshold at which the eviction process starts, defined as the ratio of namespace disk consumption to its device storage capacity.	0	Default value of 0 disables the threshold.
`high-water-memory-pct`	Percentage threshold at which the eviction process starts, defined as the ratio of namespace memory consumption to its `memory-size`.	0	Default value of 0 disables the threshold.
`mounts-high-water-pct`	Percentage threshold at which the eviction process starts, defined as the ratio of index mount utilization (`index_flash_used_pct` or `index_pmem_used_pct`) to the namespace `mounts-size-limit`.	0	Only applies when the namespace primary index is configured to be stored in flash or persistent memory.
`stop-writes-pct`	Percentage threshold at which client writes are refused, defined as the ratio of namespace memory consumption to its `memory-size`.	90	Deletions, replica writes, and migration writes are still allowed when the namespace is in stop-writes mode.
`stop-writes-sys-memory-pct`	Percentage threshold at which client writes are refused, defined as the ratio of total memory usage (across all applications) to the system memory.	90	Deletions, replica writes, and migration writes are still allowed when the namespace is in stop-writes mode.
`min-avail-pct`	Stops client writes when any namespace storage device (SSD, PMem or shared memory stripe) has its reserve of write blocks drop under a minimum, defined as the ratio of free write blocks to the device storage capacity.	5	Deletions, replica writes, and migration writes are still allowed when the namespace is in stop-writes mode.
`max-used-pct`	Stops client writes when the ratio of used storage space to total storage space (in bytes) exceeds the given max percentage.	70	Deletions, replica writes, and migration writes are still allowed when the namespace is in stop-writes mode.

The following configuration file snippet shows an example of namespace data retention with TTLs used and eviction enabled.

namespace NSNAME {
    stop-writes-sys-memory-pct 90 # Stop-writes threshold based on memory usage
                                  # across the host machine
    memory-size 256G
    stop-writes-pct 90            # Stop-writes threshold based on namespace memory-size

    storage-engine device {
        device /dev/nvme0n1p1
        device /dev/nvme0n1p2
        min-avail-pct 5           # Stop-writes threshold as a percentage
                                  # of the total device size.
        max-used-pct 70           # stop-writes threshold as a percentage
                                  # of the total device size.
    }

    high-water-disk-pct 60        # Eviction threshold based on namespace device-size
    high-water-memory-pct 70      # Eviction threshold based on namespace memory-size

    index-type flash {            # Primary index on flash (AKA All Flash)
        mounts-size-limit 64G
        mounts-high-water-pct 80  # Eviction threshold based on mounts-size-limit
    }
}

Managing the namespace supervisor (NSUP)

The Namespace Supervisor (NSUP) removes expired records from the primary index. Any past generation version of a record that NSUP leaves behind is cleaned up by the continuous defragmentation process.

nsup-period controls how often NSUP runs. The default value is 0, which means that NSUP does not run.
nsup-threads controls how many threads NSUP uses to scan the primary index for expired records. The default value is 1.
allow-ttl-without-nsup allows records to be written to the namespace with positive integer TTLs, even if NSUP is disabled. This is for testing only and should not be used in production.
If NSUP is not running, expired records are not removed unless they’re manually deleted or replaced.

NSUP statistics nsup_cycle_duration and nsup_cycle_deleted_pct create visual warnings on the monitoring stack dashboard when NSUP reaches pre-determined thresholds.

The following example shows NSUP configuration for a specific namespace.

namespace test {
    nsup-period 600         # Maximum time between starting successive
                            # rounds of expiration or eviction - a value
                            # of 0 disables expiration and eviction.
    nsup-threads 2          # How many threads per round of expiration or eviction
    evict-tenths-pct 5      # Fraction of evictable records to delete
                            # per round of eviction. For example, 5 means
                            # delete 0.5 percent of evictable records.
}

Dynamically configure NSUP

Use the following command to dynamically configure NSUP with nsup-period and nsup-threads.

asadm --enable -e  "manage config namespace TEST param nsup-threads to 3"

NSUP not keeping up

If NSUP is not able to keep up with expiring records, it might take the node a long time to restart, as the node will first remove expired records before rejoining the cluster. This can happen if the node is under heavy load, or if the nsup-period is set too high.

If the node is under heavy load, you can increase the number of threads used by NSUP with nsup-threads.

If a large percentage of records are removed at startup, the server has to deal with a temporary large increase in its defragmentation load.

In Database 6.3.0 and later, if the NSUP cycle takes longer than 2 hours and deletes more than 1% of the namespace, a warning line is written to the server log.

Disable eviction on sets

To protect a set from evictions, use disable-eviction true.

namespace NSNAME {
    set SETNAME {
        disable-eviction true
    }
}

For more information, see dynamically disabling set evictions.

Define a data size maximum on a set

To limit the amount of storage a set can occupy, define its maximum data size with stop-writes-size.

namespace <namespace-name> {
    set <set-name> {
        stop-writes-size 500M     # Limit this set's storage to 500MB
    }
}

For more information, see dynamically configure a set size cap.

Define an object count limit on a set

To limit the number of records that can be written to a set, use stop-writes-count.

namespace <namespace-name> {
    set <set-name> {
        stop-writes-count 5000     # Limit the number of records that can
                                   # be written to this set to 5000.
    }
}

See dynamically configure an object-count limit on a set.

Specify a set-level default TTL

If you specify the default-ttl configuration option at the set level, it overrides any default-ttl option specified at the namespace level.

set test-set {
   default-ttl 60D
}

For more information, see dynamically configure a default TTL for a set.

Specify a set-level LRU eviction behavior

If you specify the default-read-touch-ttl-pct configuration option at the set level, it overrides any default-read-touch-ttl-pct option specified at the namespace level.

set test-set {
   default-read-touch-ttl-pct 80
}

Managing evictions

Aerospike uses a high-water mark (HWM) to determine when to start evicting records. The HWM is defined as a percentage of the total storage capacity of the namespace. When the HWM is breached, Aerospike evicts records until storage usage falls below the configured threshold.

Eviction is performed in rounds by the namespace supervisor (NSUP) in the background, and it does not block client operations. The nsup-period configuration parameter specifies the time interval in seconds between successive starts of NSUP scans of the primary index. The evict-tenths-pct configuration parameter specifies the fraction of evictable records to delete per round of eviction. For example, a value of 5 means delete 0.5 percent of evictable records per round.
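
As a rough illustration of a single eviction round, the Python sketch below picks the evictable records nearest to expiration, up to the evict-tenths-pct quota. It is a conceptual model only: `eviction_round` is a hypothetical helper, the records live in a plain dictionary, and the server's actual selection mechanism may differ in detail.

```python
from typing import Dict, List


def eviction_round(void_times: Dict[str, int], evict_tenths_pct: int = 5) -> List[str]:
    """Return the keys one round would evict.

    void_times maps a record key to its void-time; 0 means never expires,
    so those records are not evictable.
    """
    evictable = sorted((vt, key) for key, vt in void_times.items() if vt != 0)
    # evict-tenths-pct is in tenths of a percent: 5 means 0.5%.
    quota = len(evictable) * evict_tenths_pct // 1000
    return [key for _, key in evictable[:quota]]


# 1000 expiring records plus one that never expires.
records = {f"k{i}": i for i in range(1, 1001)}
records["forever"] = 0
victims = eviction_round(records)  # 0.5% of 1000 evictable records
```

Here one round removes the five records with the earliest void-times, and the record with a void-time of 0 is never considered.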

Verify evictions

The eviction counter is reset every time the server is restarted. Use the asadm info command to verify that evictions are working the way you want:

Admin> info

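This prints the free disk and memory available for each namespace. It also prints the configured limits to the eviction threshold for both memory and disk.

To check whether a namespace has breached its eviction threshold, query the hwm_breached statistic: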

asadm -e "show statistics namespace for TEST like hwm_breached"

Inspect the Aerospike log for messages that show you may be evicting data. Run the following command on individual nodes:

grep -e "hwm_breached" -e "stop_writes" /var/log/aerospike/aerospike.log

List non-expirable records

Use the following asadm command to determine the number of non-expirable records:

show stat like non_expirable_objects

To find all non-expirable records, create a backup and grep for the pattern ^+ t 0 in the backup files. See asbackup command-line options and Backup file format for more information.
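
If grep is not convenient, the same pattern match can be sketched in Python. This is a minimal example that assumes a plain-text backup in which the void-time appears on its own line exactly as in the pattern quoted above; `count_non_expirable` is a hypothetical helper, not part of any Aerospike tool.

```python
import re
from typing import Iterable

# Same pattern as the grep above; '+' must be escaped in a regular expression.
NON_EXPIRABLE = re.compile(r"^\+ t 0$")


def count_non_expirable(lines: Iterable[str]) -> int:
    """Count backup lines that record a void-time of 0."""
    return sum(1 for line in lines if NON_EXPIRABLE.match(line.rstrip("\r\n")))


# Illustrative input only; real backup files contain many other line types.
sample = ["+ t 0\n", "+ t 456789\n", "+ t 0\n", "placeholder line\n"]
total = count_non_expirable(sample)
```

To scan a real file, pass an open file object, for example `count_non_expirable(open(path, errors="replace"))`, where `path` points at one of the backup files.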

You can also write a user-defined function (UDF) to scan records based on the record.ttl field. This could turn into an intensive operation that may affect a production system’s performance. For examples, see How to modify TTL using UDF.

Adjusting stop-writes

Use the following asadm command to dynamically modify the stop-writes configuration parameters in the cluster:

asadm --enable -e "manage config namespace TEST param stop-writes-used-pct to 85 with 10.1.2.3"

You can also use the following asinfo command to dynamically modify the stop-writes configuration parameters on a single node:

# asinfo only talks to one node at a time
asinfo -h 10.1.2.3 -v "set-config:context=namespace;id=TEST;stop-writes-used-pct=85"

To view your configured stop-writes parameters and their state, use the show stop-writes command.

See the detailed description of namespace eviction and stop-writes configuration parameters.

Where to Next?

Configure namespace storage-engine, which determines if records are persisted and, if so, where they are persisted.
Configure namespace durability, which determines how many replica copies of a record to keep in the cluster.