Configuring namespace data retention
This page describes how to configure Aerospike namespaces to ensure sufficient memory for continuous operation.
- Configure the Namespace Supervisor (NSUP) to expire or evict records.
- Configure a stop-writes mechanism to stop new client writes from filling namespace storage beyond a designated threshold.
Expirations, evictions, and TTL
Aerospike records include void-time metadata - the timestamp when they automatically expire. Records with a void-time of 0 do not get evicted or expire.
- An expired record does not exist from an application point of view, but its metadata still occupies a 64-byte slot in the primary index.
- Eviction is an early expiration process. When namespace storage exceeds a configurable high-water mark, NSUP deletes the records with non-zero void-time that are nearest to their expiration. Eviction continues until sufficient space has been recovered.
- Applications optionally send a time to live (TTL) with writes, which declares the number of seconds from now until the record expires. By default, namespaces reject writes with a TTL, and NSUP does not run, but this behavior is configurable. If client writes are allowed to send a TTL, the record’s void-time is set according to client TTL values. In this case, you should also configure NSUP to check for expired records.
Starting with Aerospike Database 7.1, read commands may extend the TTL of a record if the read occurs within a specified percentage of its void-time. This least-recently used (LRU) eviction behavior is controlled by the namespace or set-level configuration parameter default-read-touch-ttl-pct.
The client may override the server configuration and provide an explicit read-touch policy, expressed as a percentage relative to the record’s void-time. For example, the Java client added Policy.readTouchTtlPercent in version 8.1.0.
Namespace Supervisor (NSUP)
The Namespace Supervisor (NSUP) removes expired records from the primary index. Any past generation version of a record that NSUP leaves behind is cleaned up by the continuous defragmentation process.
If NSUP is not running, expired records are not removed unless they’re manually deleted or replaced.
Dynamically configure NSUP
Use the following command to dynamically configure NSUP with nsup-period and nsup-threads.
asadm --enable -e "manage config namespace TEST param nsup-threads to 3"
Monitor and control NSUP
- The nsup-period configuration parameter controls how often NSUP runs. The default value is 0, which means that NSUP does not run.
- The nsup-threads configuration parameter controls how many threads NSUP uses to scan the primary index for expired records. The default value is 1.
- The allow-ttl-without-nsup configuration parameter allows records to be written to the namespace with positive integer TTLs, even if NSUP is disabled. This is for testing only and should not be used in production.
- The NSUP statistics nsup_cycle_duration and nsup_cycle_deleted_pct create visual warnings on the monitoring stack dashboard when NSUP reaches pre-determined thresholds.
NSUP not keeping up
If NSUP is not able to keep up with expiring records, a node might take a long time to restart, because it first removes expired records before rejoining the cluster. This can happen if the node is under heavy load, or if the nsup-period is set too high.
If the node is under heavy load, you can increase the number of threads used by NSUP with nsup-threads.
If a large percentage of records are removed at startup, the server has to deal with a temporary large increase in its defragmentation load.
In Database 6.3 and later, if the NSUP cycle takes longer than 2 hours and deletes more than 1% of the namespace, a warning line is written to the server log.
Disable eviction on sets
To protect a set from evictions, use disable-eviction true.
namespace NSNAME {
    set SETNAME {
        disable-eviction true
    }
}
For more information, see dynamically disabling set evictions.
Define a data size maximum on a set
To limit the amount of storage a set can occupy, define its maximum data size with stop-writes-size.
namespace <namespace-name> {
    set <set-name> {
        stop-writes-size 500M   # Limit this set's storage to 500MB
    }
}
For more information, see dynamically configure a set size cap.
Define an object count limit on a set
A set can have stop-writes-count to limit the number of records that can be written to it.
namespace <namespace-name> {
    set <set-name> {
        stop-writes-count 5000   # Limit the number of records that can
                                 # be written to this set to 5000.
    }
}
See dynamically configure an object-count limit on a set.
Specify a set-level default TTL
If you specify the default-ttl configuration option at the set level, it overrides any default-ttl option specified at the namespace level.
set test-set {
    default-ttl 60D
}
For more information, see dynamically configure a default TTL for a set.
Specify a set-level LRU eviction behavior
If you specify the default-read-touch-ttl-pct configuration option at the set level, it overrides any default-read-touch-ttl-pct option specified at the namespace level. The value is a percentage (1-100, or -1 to disable read touches for the set), not a duration.
set test-set {
    default-read-touch-ttl-pct 80
}
List non-expirable records
Use the following asadm command to determine the number of non-expirable records:
show stat like non_expirable_objects
To find all non-expirable records, create a backup and grep for the pattern ^+ t 0 in the backup files. See asbackup command-line options and Backup file format for more information.
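As a rough illustration of the grep step, the following Python sketch counts lines in a text backup file that begin with `+ t 0`. The function name and the assumption that each record's void-time appears on its own line starting with `+ t ` are illustrative only; check the backup file format reference before relying on it.

```python
from pathlib import Path


def count_non_expirable(backup_path):
    """Count lines starting with '+ t 0', mirroring grep '^+ t 0'.

    Assumption: the backup is a text file in which a void-time of 0
    shows up as a line beginning with '+ t 0'.
    """
    count = 0
    # errors="replace" keeps the scan going if the file holds non-UTF-8 bytes.
    with Path(backup_path).open("r", errors="replace") as fh:
        for line in fh:
            if line.startswith("+ t 0"):
                count += 1
    return count
```

Like the grep pattern, this matches any line that starts with `+ t 0`, so it is a quick check rather than a strict parser.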
You can also write a user-defined function (UDF) to scan records based on the record.ttl field. This can be an intensive operation that may affect a production system’s performance. For examples, see How to modify TTL using UDF.
Stop-writes
Use the following asadm command to dynamically modify the stop-writes configuration parameters:
asadm --enable -e "manage config namespace TEST param stop-writes-used-pct to 85 with 10.1.2.3"
You can also use the following asinfo command to dynamically modify the stop-writes configuration parameters:
# asinfo only talks to one node at a time
asinfo -h 10.1.2.3 -v "set-config:context=namespace;id=TEST;stop-writes-used-pct=85"
To view your configured stop-writes parameters and their state, use the show stop-writes command.
See the detailed description of namespace eviction and stop-writes configuration parameters.
Evictions
Aerospike uses a high-water mark (HWM) to determine when to start evicting records. The HWM is defined as a percentage of the total storage capacity of the namespace. When the HWM is reached, Aerospike starts evicting records until the storage usage falls below the configured threshold.
The eviction process is controlled by the evict-tenths-pct configuration parameter, which specifies the fraction of evictable records to delete per round of eviction, in tenths of a percent. For example, a value of 5 means delete 0.5 percent of evictable records.
The namespace supervisor (NSUP) performs eviction in the background, in rounds, and does not block client operations. Each round deletes the configured fraction of the evictable records. The nsup-period configuration parameter specifies the time interval in seconds between successive starts of NSUP scans of the primary index for expired records, so it also bounds how often a round of eviction can start.
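The arithmetic behind evict-tenths-pct can be sketched as follows. This is a simplified model for illustration, not how the server implements eviction: it assumes the void-times of all records are available as a list of integers, and the function name is hypothetical.

```python
def records_to_evict(void_times, evict_tenths_pct):
    """Return the void-times one eviction round would delete (simplified model).

    Records with a void-time of 0 are never evictable. Among the rest,
    those nearest to expiration (smallest void-time) go first.
    """
    evictable = sorted(vt for vt in void_times if vt != 0)
    # evict-tenths-pct is in tenths of a percent: 5 means 0.5% = 5/1000.
    quota = len(evictable) * evict_tenths_pct // 1000
    return evictable[:quota]


# 2000 evictable records plus 10 that never expire; 0.5% of 2000 is 10.
sample = [0] * 10 + list(range(1, 2001))
print(len(records_to_evict(sample, 5)))
```

With this model, raising evict-tenths-pct makes each round delete more records, while nsup-period controls how often a round can start.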
Verify evictions
The eviction counter is reset every time the server is restarted. Use the asadm info command to verify that evictions are working the way you want:
Admin> info
This prints the free disk and memory available for each namespace. It also prints the configured limits to the eviction threshold for both memory and disk.
asadm -e "show statistics namespace for TEST like hwm_breached"
Inspect the Aerospike log for messages that show you may be evicting data. Run the following command on individual nodes:
grep -e "hwm_breached" -e "stop_writes" /var/log/aerospike/aerospike.log
Data retention configuration parameters
The following parameters are configurable per namespace, and focus on data retention.
Configuration parameter | Description | Default value | Notes |
---|---|---|---|
nsup-period | The time interval in seconds between successive starts of NSUP scans of the primary index for expired records. If NSUP takes longer to traverse the primary index than the nsup-period, it will effectively be running continuously. | 0 | By default, NSUP does not run (nsup-period of 0). Once NSUP is running, it does not stop until it has traversed the entire primary index of the namespace. When nsup-period is 0, to allow writes that have a positive integer TTL, you must set allow-ttl-without-nsup to true (see below). Regardless of the setting of nsup-period, writes with a non-positive TTL (<= 0) are always allowed. |
allow-ttl-without-nsup | A parameter for testing only. Measures the impact of NSUP when running in a use case where TTL is non-0. Allows records to be written to the namespace with positive integer TTLs, even if NSUP is disabled. | false | See Warning: records that have a TTL when NSUP is not running. |
default-ttl | The default TTL value to use for the namespace, whenever a client writes a record with a TTL of 0. | 0 | If the value of default-ttl is non-0 and nsup-period is 0 (its default), the Aerospike server will not start. A default-ttl of 0 sets a void-time of 0. Aerospike never expires or evicts records that have a void-time of 0. default-ttl cannot be set higher than 10 years (3650D). |
default-read-touch-ttl-pct | The percentage of a record’s most recent TTL, measured back from its void-time, within which a read also touches the record and extends its TTL. | 0 | For the namespace configuration, 0 means read operations never extend a record’s TTL. Values 1-100 specify a percentage of the most recent record expiration time, so that a read within this interval of the record’s end of life will generate a touch. The touch uses the previous record TTL to extend the record’s life. For the set configuration, 0 means use the namespace value. A set-level configuration can explicitly override the default namespace value: -1 means reads never touch a record, and values 1-100 are the same as the namespace configuration. Clients may also send a read-touch TTL percent: 0 instructs the server to use its configuration, and other values override the server configuration. -1 means reads never touch a record, and values 1-100 are the same as the server configuration. |
The following example shows additional namespace data retention parameters to tune data expiration and eviction, with comments describing usage.
namespace NSNAME {
    nsup-period 600      # Maximum time between starting successive
                         # rounds of expiration or eviction - a value
                         # of 0 disables expiration and eviction.
    nsup-threads 2       # How many threads per round of expiration or eviction
    evict-tenths-pct 5   # Fraction of evictable records to delete
                         # per round of eviction. For example, 5 means
                         # delete 0.5 percent of evictable records.
}
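Two constraints from the table above lend themselves to a quick pre-flight check: a non-zero default-ttl requires a non-zero nsup-period, and default-ttl cannot exceed 10 years. The sketch below is a hypothetical helper, not an Aerospike tool. It takes both values in seconds and deliberately ignores allow-ttl-without-nsup, because the table does not state how that parameter interacts with default-ttl.

```python
# 10 years expressed in seconds (3650D), the documented default-ttl ceiling.
MAX_DEFAULT_TTL_SECONDS = 3650 * 24 * 60 * 60


def check_retention_config(nsup_period, default_ttl):
    """Return a list of problems found in a namespace's retention settings."""
    problems = []
    if default_ttl > MAX_DEFAULT_TTL_SECONDS:
        problems.append("default-ttl exceeds 10 years (3650D)")
    if default_ttl != 0 and nsup_period == 0:
        problems.append("non-zero default-ttl requires a non-zero nsup-period")
    return problems


print(check_retention_config(nsup_period=600, default_ttl=60 * 24 * 60 * 60))
print(check_retention_config(nsup_period=0, default_ttl=3600))
```

An empty list means neither documented constraint is violated; it does not guarantee that the server accepts the rest of the configuration.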
Eviction and stop-writes configuration parameters
The following configuration parameters are used to control the eviction and stop-writes behavior of a namespace. The parameters are divided into two sections: one for Database 7.0 and later, and one for Database 6.x and earlier.
Eviction and stop-writes Database 7.0 and later
Starting with Database 7.0, the configuration parameters below apply to every storage-engine, because all storage engines share the same write-block based storage format.
Configuration parameter | Description | Default value | Notes |
---|---|---|---|
evict-indexes-memory-pct | Eviction threshold defined as a percentage of the indexes-memory-budget, which is the stop-writes threshold for the namespace based on total memory used for indexes (primary, secondary, and set indexes). | 0 | The default value of 0 disables this threshold. |
evict-used-pct | Eviction threshold defined as the ratio of used storage (data_used_bytes) to the total namespace storage capacity (data_total_bytes). | 0 | The default value of 0 disables this threshold. |
evict-mounts-pct | Eviction threshold defined as the ratio of index mount utilization (index_mounts_used_pct or sindex_mounts_used_pct) to the namespace mounts-budget. | 0 | Only applies when the namespace primary or secondary indexes are configured to be stored in flash or persistent memory. |
indexes-memory-budget | The maximum memory budget (in bytes) used by the namespace for indexing (primary, secondary and set indexes). Acts as a stop-writes threshold. | 0 | Deletions, replica writes, and migration writes are still allowed when the namespace is in stop-writes mode. |
stop-writes-sys-memory-pct | Percentage threshold at which client writes are refused, defined as the ratio of total memory usage (across all applications) to the system memory. | 90 | Deletions, replica writes, and migration writes are still allowed when the namespace is in stop-writes mode. |
stop-writes-avail-pct | Stops client writes when the namespace storage engine has its reserve of write-blocks drop under a minimum, defined as the ratio of free write-blocks to the storage engine capacity. | 5 | Deletions, replica writes, and migration writes are still allowed when the namespace is in stop-writes mode. |
stop-writes-used-pct | Stops client writes when the ratio of used storage space to total storage space (in bytes) exceeds the given max percentage. | 70 | Deletions, replica writes, and migration writes are still allowed when the namespace is in stop-writes mode. |
The following configuration file snippet shows an example of namespace data retention with TTLs used and eviction enabled.
namespace NSNAME {
    stop-writes-sys-memory-pct 90   # Stop-writes threshold based on memory usage
                                    # across the host machine
    storage-engine device {
        device /dev/nvme0n1p1
        device /dev/nvme0n1p2
        stop-writes-avail-pct 5     # Stop-writes threshold as a percentage
                                    # of the total device size.
        stop-writes-used-pct 70     # Stop-writes threshold as a percentage
                                    # of the total device size.
        evict-used-pct 60           # Eviction threshold as a percentage
                                    # of the total device size.
    }
    index-type flash {              # Primary index on flash (AKA All Flash)
        mounts-budget 64G
        evict-mounts-pct 80         # Eviction threshold based on the primary index
                                    # mounts budget
    }
}
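To see how these thresholds relate, the following sketch models the checks with plain numbers. It is an illustration under assumptions, not server logic: the function names and inputs are hypothetical, the defaults mirror the example above, and the exact boundary comparisons (strictly greater than, strictly less than) are guesses based on the wording "exceeds" and "drop under".

```python
def stop_writes_reasons(used_pct, avail_pct, sys_memory_pct,
                        stop_writes_used_pct=70,
                        stop_writes_avail_pct=5,
                        stop_writes_sys_memory_pct=90):
    """Return the names of the stop-writes thresholds that are breached."""
    reasons = []
    if used_pct > stop_writes_used_pct:            # used storage exceeds the maximum
        reasons.append("stop-writes-used-pct")
    if avail_pct < stop_writes_avail_pct:          # free write-blocks drop under the minimum
        reasons.append("stop-writes-avail-pct")
    if sys_memory_pct > stop_writes_sys_memory_pct:  # host memory usage too high
        reasons.append("stop-writes-sys-memory-pct")
    return reasons


def should_evict(used_pct, evict_used_pct):
    """Eviction starts once usage passes evict-used-pct; 0 disables the threshold."""
    return evict_used_pct != 0 and used_pct > evict_used_pct


print(should_evict(65, 60))             # usage is past the eviction threshold
print(stop_writes_reasons(65, 20, 60))  # but still under every stop-writes threshold
```

Keeping evict-used-pct below stop-writes-used-pct, as in the configuration example, gives eviction a chance to free space before client writes are refused.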
Eviction and stop-writes prior to Database 7.0
Configuration parameter | Description | Default value | Notes |
---|---|---|---|
high-water-disk-pct | Percentage threshold at which the eviction process starts, defined as the ratio of namespace disk consumption to its device storage capacity. | 0 | The default value of 0 disables the threshold. |
high-water-memory-pct | Percentage threshold at which the eviction process starts, defined as the ratio of namespace memory consumption to its memory-size. | 0 | The default value of 0 disables the threshold. |
mounts-high-water-pct | Percentage threshold at which the eviction process starts, defined as the ratio of index mount utilization (index_flash_used_pct or index_pmem_used_pct) to the namespace mounts-size-limit. | 0 | Only applies when the namespace primary index is configured to be stored in flash or persistent memory. |
stop-writes-pct | Percentage threshold at which client writes are refused, defined as the ratio of namespace memory consumption to its memory-size . | 90 | Deletions, replica writes, and migration writes are still allowed when the namespace is in stop-writes mode. |
stop-writes-sys-memory-pct | Percentage threshold at which client writes are refused, defined as the ratio of total memory usage (across all applications) to the system memory. | 90 | Deletions, replica writes, and migration writes are still allowed when the namespace is in stop-writes mode. |
min-avail-pct | Stops client writes when any namespace storage device (SSD or PMem) has its reserve of write blocks drop under a minimum, defined as the ratio of free write blocks to the device storage capacity. | 5 | Deletions, replica writes, and migration writes are still allowed when the namespace is in stop-writes mode. |
max-used-pct | Stops client writes when the ratio of used storage space to total storage space (in bytes) exceeds the given max percentage. | 70 | Deletions, replica writes, and migration writes are still allowed when the namespace is in stop-writes mode. |
The following configuration file snippet shows an example of namespace data retention with TTLs used and eviction enabled.
namespace NSNAME {
    stop-writes-sys-memory-pct 90   # Stop-writes threshold based on memory usage
                                    # across the host machine
    memory-size 256G
    stop-writes-pct 90              # Stop-writes threshold based on namespace memory-size
    storage-engine device {
        device /dev/nvme0n1p1
        device /dev/nvme0n1p2
        min-avail-pct 5             # Stop-writes threshold as a percentage
                                    # of the total device size.
        max-used-pct 70             # Stop-writes threshold as a percentage
                                    # of the total device size.
    }
    high-water-disk-pct 60          # Eviction threshold based on namespace device-size
    high-water-memory-pct 70        # Eviction threshold based on namespace memory-size
    index-type flash {              # Primary index on flash (AKA All Flash)
        mounts-size-limit 64G
        mounts-high-water-pct 80    # Eviction threshold based on mounts-size-limit
    }
}
Client TTL values
Aerospike clients can send a TTL value with write and read commands. The server uses the TTL value to determine the record’s void-time, which is the timestamp when the record expires. The server uses the following rules to determine the void-time:
- If the namespace is configured to allow writes with a TTL, the client-sent TTL value is used to set the void-time.
- If the namespace is configured to not allow writes with a TTL, a write that sends a positive TTL is rejected with error code 22 (AS_ERR_FORBIDDEN). Writes with a non-positive TTL are still allowed, and a TTL of 0 uses the default-ttl of the namespace to set the void-time.
- If the namespace is configured to allow writes with a TTL, and the client-sent TTL value is 0, the server uses the default-ttl of the namespace to set the void-time.
- If the namespace is configured to allow writes with a TTL, and the client-sent TTL value is -1, the server sets the void-time to 0, which means that the record will not expire.
- If the namespace is configured to allow writes with a TTL, and the client-sent TTL value is -2, the server does not modify the void-time if the write is an update operation. If the write creates a new record, the default-ttl determines the void-time.
- If the namespace is configured to allow writes with a TTL, and the client-sent TTL value is less than the current remaining life, the server moves the record’s void-time earlier than its current void-time. This may have undesirable side effects upon a cold restart. For further details, see Issues with cold-start resurrecting deleted records.
- If the namespace is configured to allow writes with a TTL, and the client-sent TTL value is greater than the current remaining life, the server sets the void-time to the current time plus the TTL value.
Write commands
When a namespace is configured to allow writes with a TTL, a client may send a positive TTL value, which determines how many seconds the record has until it expires. The void-time is set to the timestamp of now plus the TTL.
Additionally, there are three special TTL values that can always be used, regardless of namespace configuration:
- A TTL of 0 instructs the server to use the default-ttl of the namespace when setting the void-time.
- A TTL of -1 sets the record’s void-time to 0, which means that the record will not expire.
- A TTL of -2 instructs the server not to modify the void-time if the write is an update operation. If the write creates a new record, the default-ttl determines the void-time.
A durable delete from the client will create a tombstone, which always has a void-time of 0, regardless of the record’s previous void-time.
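The write rules above can be summarized in a small model. This is a sketch under assumptions, not server code: times are plain integer seconds, `ttl_writes_allowed` is a hypothetical flag standing in for the namespace configuration that permits positive TTLs, and `existing_void_time=None` means the write creates a new record.

```python
def resolve_void_time(client_ttl, now, default_ttl,
                      existing_void_time=None, ttl_writes_allowed=True):
    """Return the void-time a write would produce (0 means never expires)."""
    if client_ttl > 0:
        if not ttl_writes_allowed:
            # Mirrors the documented rejection: error code 22 (AS_ERR_FORBIDDEN).
            raise PermissionError("write with a TTL rejected (error code 22)")
        return now + client_ttl
    if client_ttl == -1:
        return 0  # never expire
    if client_ttl == -2 and existing_void_time is not None:
        return existing_void_time  # update: leave the void-time untouched
    if client_ttl in (0, -2):
        # TTL 0, or -2 on a create: fall back to the namespace default-ttl.
        return now + default_ttl if default_ttl > 0 else 0
    raise ValueError(f"unsupported TTL value: {client_ttl}")


print(resolve_void_time(100, now=1000, default_ttl=0))
print(resolve_void_time(-2, now=1000, default_ttl=50, existing_void_time=2000))
```

The first call models a plain positive TTL and yields now plus the TTL. The second models an update that preserves the record's current void-time.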
Read commands
Aerospike Database 7.1 introduced an LRU eviction behavior. Client versions that implement this functionality can control whether reads extend a record’s void-time, regardless of namespace configuration, with Policy.readTouchTtlPercent:
- A value of 0 instructs the server to use the default-read-touch-ttl-pct of the namespace or set.
- A value of -1 states that this read operation will never modify the record’s TTL.
- A value of 1-100 states that this read should also touch the record (extending its TTL) if the record’s void-time is within this percentage.
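The precedence between the client, set, and namespace values can be modeled as below. This is an illustrative sketch, not the server's implementation. The function names are hypothetical, and the touch condition assumes that "within this percentage" means the record's remaining life is at most that percentage of the TTL it was last given.

```python
def effective_read_touch_pct(client_pct, set_pct, namespace_pct):
    """Resolve which read-touch percentage applies; None means reads never touch."""
    if client_pct != 0:
        chosen = client_pct      # -1 or 1-100 overrides the server configuration
    elif set_pct != 0:
        chosen = set_pct         # set-level value overrides the namespace
    else:
        chosen = namespace_pct   # set-level 0 falls back to the namespace value
    # -1 (client or set) and a namespace value of 0 all mean "never touch".
    return chosen if chosen > 0 else None


def read_should_touch(remaining_ttl, previous_ttl, pct):
    """Decide whether a read also touches the record (assumed interpretation)."""
    if pct is None:
        return False
    return remaining_ttl <= previous_ttl * pct / 100


pct = effective_read_touch_pct(client_pct=0, set_pct=0, namespace_pct=80)
# Record written with a 10-hour TTL, read when 1 hour of life remains.
print(read_should_touch(3600, 36000, pct))
```

Under this interpretation, a higher percentage makes reads refresh a record earlier in its life, which keeps frequently read records from expiring.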
Warning: records that have a TTL when NSUP is not running
The following side-effects occur when NSUP is not running:
- Trying to write a record with a TTL gets rejected with error code 22 (AS_ERR_FORBIDDEN).
- Expired records do not get removed from the indexes and storage.
- An attempt to read an expired record returns error code 2 (AS_ERR_NOT_FOUND).
- An attempt to update an expired record creates a new record with its metadata in the same primary index slot. The previous version of the record is ignored, and the generation count is reset to 1.
- Deleting an expired record removes it from the index. The defragmentation process will reclaim its storage.
Where to Next?
- Configure Storage engine, which determines whether and where records are persisted.
- Configure Data durability policy, which determines how many replica copies of a record to keep in the cluster.