Configure namespace storage

This page describes how to configure namespace storage on the Aerospike Database.

Overview

Aerospike stores data according to the storage engine you select for a namespace. Storage engines include solid state drives (SSD), Intel Optane Persistent Memory (PMem), and storage in memory with or without storage-backed persistence. Your choice of storage engine affects the durability, cost, and performance of your cluster.

Enterprise Edition (EE) and Standard Edition (SE) of Database 7.0 and later store in-memory namespace data in shared memory, enabling fast restarts of the namespace. Community Edition (CE) and earlier versions of Aerospike EE and SE store in-memory namespace data in volatile process memory, so namespace data must be reloaded from a persistence layer when the server restarts, or repopulated over the network from other cluster nodes.

You configure storage engines in the Database configuration file, /etc/aerospike/aerospike.conf. Each of the following sections describes the minimal configuration to enable a particular storage engine, and the storage sizing parameters used by that engine.

Setup for an SSD storage engine

The minimal configuration for an SSD-backed namespace requires:

Setting the storage-engine parameter to device.
Adding a device parameter for each SSD device partition to be used by the namespace.

Each storage device must be properly initialized as an Aerospike device, including zeroizing the 8MiB header for new devices and zeroizing the entire drive for previously used devices. See Initializing SSDs for more information. The maximum size of a device is 2TiB. Larger devices must be partitioned into multiple equally sized partitions of at most 2TiB each.
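To work out how many partitions a large device needs, divide its size by the 2TiB limit and round up. The following shell sketch does only that arithmetic. `partitions_needed` and `partition_size` are hypothetical helpers, and the sizes are assumed to be byte counts such as those reported by `blockdev --getsize64`. The sketch ignores partition-table overhead and alignment, which a partitioning tool such as parted handles.

```shell
# Hypothetical helpers: partition arithmetic for devices over the 2TiB limit.
# Sizes are byte counts, e.g. from `blockdev --getsize64 /dev/nvme0n1`.
MAX_DEVICE_BYTES=$(( 2 * 1024 * 1024 * 1024 * 1024 ))   # 2TiB

# Smallest number of equal partitions so that none exceeds 2TiB (ceiling division).
partitions_needed() {
  local device_bytes=$1
  echo $(( (device_bytes + MAX_DEVICE_BYTES - 1) / MAX_DEVICE_BYTES ))
}

# Size of each equal partition; any remainder bytes stay unallocated.
partition_size() {
  local device_bytes=$1 n
  n=$(partitions_needed "$device_bytes")
  echo $(( device_bytes / n ))
}
```

For example, a 7.68TB (7680000000000-byte) drive needs 4 partitions of 1920000000000 bytes each. Initialize each resulting partition as described in Initializing SSDs.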

Flush size

You can increase or decrease the flush size dynamically. The default value is 1MiB, and the configured value must be a power of 2. The options are: 4K, 8K, 16K, 32K, 64K, 128K, 256K, 512K, 1M, 2M, 4M, and 8M. For most direct-attached NVMe devices, the ideal size is 128K.
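A quick way to check a candidate value against these rules is a small shell function. This is a sketch, not Aerospike tooling: `valid_flush_size` is a hypothetical name, and it only accepts the K and M spellings used in the list of options.

```shell
# Hypothetical helper: succeed if $1 is an allowed flush-size (4K..8M, power of 2).
valid_flush_size() {
  local v=$1 num unit bytes
  num=${v%[KM]}                               # strip a trailing K or M
  unit=${v#"$num"}                            # the stripped suffix, or empty
  case $num in ''|*[!0-9]*) return 1 ;; esac  # digits only
  case $num in 0*) return 1 ;; esac           # reject 0 and leading zeros (octal)
  case $unit in
    K) bytes=$(( num * 1024 )) ;;
    M) bytes=$(( num * 1024 * 1024 )) ;;
    *) return 1 ;;                            # the option list uses only K and M
  esac
  # In range, and exactly one bit set (power of two).
  [ "$bytes" -ge 4096 ] && [ "$bytes" -le 8388608 ] &&
    [ $(( bytes & (bytes - 1) )) -eq 0 ]
}
```

For example, `valid_flush_size 128K` succeeds, while `96K` (not a power of 2) and `16M` (above the 8M maximum) fail.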

To identify the optimal settings, Aerospike recommends running a benchmark tool such as ACT. Enterprise licensees can contact Aerospike Support for guidance.

Flushing in Database 7.1

Each wblock is filled with incoming write transactions and then flushed to a persistent storage device.

The flush-size configuration parameter defines the size in bytes of each I/O unit that is written to disk. A flush happens either when the 8MiB streaming write buffer (SWB) is full, or when the flush-max-ms period expires. The most recently written data is then flushed from the SWB to disk in a series of flush-size units, which are appended to each other until the write block is full.
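Because each flush writes flush-size units into an 8MiB write block, a full write block corresponds to 8MiB divided by flush-size units. The sketch below makes that arithmetic concrete; `flush_units_per_wblock` is a hypothetical helper that takes flush-size in bytes. At the 128K size recommended for NVMe devices, a write block takes 64 flush-size writes to fill.

```shell
# Hypothetical helper: number of flush-size units in one 8MiB write block.
WBLOCK_BYTES=$(( 8 * 1024 * 1024 ))

flush_units_per_wblock() {
  local flush_bytes=$1   # flush-size in bytes, e.g. 131072 for 128K
  echo $(( WBLOCK_BYTES / flush_bytes ))
}
```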

Each device associated with a namespace has a write queue and a cache. The max-write-cache parameter sets how many bytes of pending write blocks the system may hold when the write queue can't immediately flush a streaming write buffer to a write block on the disk. Once that limit is reached, writes fail.
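As a rough illustration of the max-write-cache budget, the sketch below divides a byte budget by the 8MiB write-block size to estimate how many pending write blocks a device may hold before writes start failing. The helper name `pending_wblocks` is hypothetical, and the 64MiB value in the comment is only an example input, not a statement of the default.

```shell
# Hypothetical helper: how many 8MiB write blocks fit in a max-write-cache
# budget given in bytes (integer division, so partial blocks are dropped).
pending_wblocks() {
  local max_write_cache_bytes=$1
  echo $(( max_write_cache_bytes / (8 * 1024 * 1024) ))
}

# Example input only: pending_wblocks $(( 64 * 1024 * 1024 )) prints 8.
```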

Flushing prior to Database 7.1

Each wblock is filled with incoming write transactions and then flushed to a persistent storage device as follows:

When the streaming write buffer (of size write-block-size) is full, or when the next record to be written doesn’t fit.
When the streaming write buffer has not been flushed for flush-max-ms milliseconds (default of one second).
On every write transaction when configured through the commit-to-device parameter for strong-consistency enabled namespaces.
In Database 7.0, an in-memory namespace without a storage-backed persistence device or file does not flush to any storage device; its wblocks reside only in shared memory.

For performance, consider reducing the flush-size (write-block-size prior to Database 7.1) from the default of 1MiB to 128KiB on SSD-backed namespaces. The best value varies with the specific workload and average record size, so run benchmarks with asbench to find the right setting.

The following configuration file snippet is for a namespace with data on SSD in Database 7.0 and later:

namespace NSNAME {
    stop-writes-sys-memory-pct 90 # (optional) stop client writes to this namespace when
                                  # total memory consumption reaches 90% of system memory.
    max-record-size 7680K         # (optional) since Database 7.1 defaults to 1M, can range up to 8M regardless of flush-size
    # memory-size SIZE            # (obsolete) Do not use memory-size in Database 7.0 or later
    storage-engine device {       # configure the storage-engine to use persistence
        device /dev/nvme0n1p1     # raw SSD device partition. Maximum size is 2TiB
        device /dev/nvme0n1p2     # (optional) another raw device
        flush-size 128K           # (optional) replaces write-block-size in Database 7.1
    }
}

Prior to Database 7.0, you can optionally change memory-size from the default of 4GiB to a size appropriate for the expected primary index size, as in the following snippet. See the Sizing Guide to learn about sizing memory.

namespace NSNAME {
    stop-writes-sys-memory-pct 90 # (optional) stop client writes to this namespace when
                                  # total memory consumption reaches 90% of system memory.
    max-record-size 128K          # (optional) otherwise write-block-size dictates the maximum record size
    memory-size 64G               # memory budget for the namespace to base other configurations on
    storage-engine device {       # configure the storage-engine to use persistence
        device /dev/nvme0n1p1     # raw device. Maximum size is 2TiB
        device /dev/nvme0n1p2     # (optional) another raw device.
        write-block-size 128K     # (optional) adjust block size to make it efficient for SSDs.
    }
}

Setup for in-memory with storage-backed persistence

The persistence layer for an in-memory namespace can be configured as either one or more files, or one or more SSD device partitions.

In Database 7.0 and later, configure an in-memory namespace by setting the storage-engine option to memory.

The following configuration file snippet shows an in-memory namespace with storage-backed persistence.

namespace NSNAME {
    max-record-size 128K              # (optional) since Database 7.1 defaults to 1M, can range up to 8M regardless of flush-size
    storage-engine memory {
        file /opt/aerospike/ns1.dat   # location of a namespace data file on server
        filesize 64G                  # maximum size of each file in GiB; maximum size is 2TiB
        stop-writes-avail-pct 5       # (optional) stop-writes threshold as a percentage of
                                      # devices/files size or data-size
        stop-writes-used-pct 70       # (optional) stop-writes threshold as a percentage of
                                      # devices/files size, or data-size
        evict-used-pct 60             # (optional) eviction threshold, as a percentage of
                                      # devices/files size, or data-size
    }
}

Starting with Database 7.0, the in-memory data of the namespace is mirrored to the persistence layer, so Aerospike allocates an amount of shared memory equal to the total size of the persistence layer. That size is either the value of the filesize option multiplied by the number of files, or the total storage space of the SSD device partitions for this namespace. The shared memory is divided into segments called stripes, one for each file or SSD device partition, and each stripe is equal to the filesize option or to the size of the respective SSD device partition. Your capacity planning should reflect a 1:1 ratio between the sizes of memory storage and persistent storage for the namespace.
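The sizing rule above can be restated as a short shell sketch. The helpers `shm_plan_for_files` (filesize in bytes and the number of files) and `shm_plan_for_devices` (one byte count per SSD device partition) are hypothetical and only reproduce the 1:1 mirroring arithmetic.

```shell
# Hypothetical helpers restating the 1:1 mirroring rule for Database 7.0+
# in-memory namespaces with storage-backed persistence.

# File-backed: one stripe per file, each stripe equal to filesize.
shm_plan_for_files() {
  local filesize_bytes=$1 file_count=$2
  echo "stripes=$file_count stripe_bytes=$filesize_bytes total_bytes=$(( filesize_bytes * file_count ))"
}

# Device-backed: one stripe per SSD device partition, sized like that partition.
# Pass each partition size in bytes.
shm_plan_for_devices() {
  local total=0 size
  for size in "$@"; do
    total=$(( total + size ))
  done
  echo "stripes=$# total_bytes=$total"
}
```

For four 64GiB files, `shm_plan_for_files 68719476736 4` reports 4 stripes of 64GiB and 256GiB (274877906944 bytes) of shared memory in total.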

Verify that your operating system is able to allocate enough shared memory to mirror your filesize or device size.

sysctl kernel.shmmax
kernel.shmmax = 2147483648

# 2GiB shmmax is too small for a 64GiB filesize
sysctl -w kernel.shmmax=17592186044416

sysctl kernel.shmall
kernel.shmall = 2097152

getconf PAGE_SIZE
4096

# 8GiB (2097152 * 4096) shmall is too small for a 64GiB filesize
sysctl -w kernel.shmall=4294967296
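The same checks can be scripted. This sketch assumes that shmmax must cover the largest single stripe and that shmall, counted in pages, must cover the namespace's total shared memory; it does not account for other shared memory consumers on the host. `shm_limits_ok` is a hypothetical helper whose inputs are passed as arguments so it can be tested; on a real host they come from `sysctl -n` and `getconf PAGE_SIZE`.

```shell
# Hypothetical helper: compare kernel shared memory limits with a namespace's needs.
#   kernel.shmmax (bytes) must be >= the largest single stripe
#   kernel.shmall (pages) must be >= total shared memory / page size
# Other shared memory consumers on the host are not accounted for.
shm_limits_ok() {
  local stripe_bytes=$1 total_bytes=$2 shmmax=$3 shmall=$4 page_size=$5
  local need_pages=$(( (total_bytes + page_size - 1) / page_size ))   # round up
  # Kernel defaults such as 18446744073692774399 overflow 64-bit shell
  # arithmetic; treat any value longer than 18 digits as effectively unlimited.
  [ "${#shmmax}" -gt 18 ] && shmmax=999999999999999999
  [ "${#shmall}" -gt 18 ] && shmall=999999999999999999
  if [ "$shmmax" -lt "$stripe_bytes" ]; then
    echo "kernel.shmmax too small: need >= $stripe_bytes bytes"
    return 1
  fi
  if [ "$shmall" -lt "$need_pages" ]; then
    echo "kernel.shmall too small: need >= $need_pages pages"
    return 1
  fi
  echo ok
}

# Usage on a host, for one 64GiB file (stripe and total are both 64GiB):
#   shm_limits_ok $(( 64 << 30 )) $(( 64 << 30 )) "$(sysctl -n kernel.shmmax)" \
#     "$(sysctl -n kernel.shmall)" "$(getconf PAGE_SIZE)"
```

With the starting values shown above (2GiB shmmax, 2097152 pages of shmall), the check fails; with the values set by the `sysctl -w` commands, it reports `ok`.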

Prior to Database 7.0, the minimal configuration for an in-memory namespace with storage-backed persistence includes:

Setting storage-engine to device.
Setting data-in-memory to true.
Entering a list of file parameters to configure where data will be persisted. Use the file parameter with the namespace context, and not the logging context.

The filesize must be large enough to support the size of the data on disk, with a maximum allowed value of 2TiB. For common use cases, this should roughly be four times the memory-size.

You may need to change the memory-size from the default of 4GiB to a size appropriate to handle the expected primary index size, and the expected size of the data in memory. See the Sizing Guide to learn about sizing memory.

namespace NSNAME {
    memory-size 64G                   # memory budget for the namespace to base other configurations on
    high-water-memory-pct 60          # (optional) eviction threshold as a percent of memory-size
    max-record-size 1M                # (optional) limit the maximum record size
    storage-engine device {
        write-block-size 1M
        file /opt/aerospike/ns1.dat   # location of data file on server
        file /opt/aerospike/ns2.dat   # (optional) Location of data file on server
        file /opt/aerospike/ns3.dat   # (optional) Location of data file on server
        file /opt/aerospike/ns4.dat   # (optional) Location of data file on server
        filesize 64G                  # maximum size of each file in GiB; maximum size is 2TiB
        data-in-memory true           # indicates that all data should also be
                                      # in memory.
        min-avail-pct 5               # (optional) stop-writes threshold as a percentage of
                                      # devices/files size or data-size.
        max-used-pct 70               # (optional) stop-writes threshold as a percentage of
                                      # devices/files size
    }
    high-water-disk-pct 60            # (optional) eviction threshold, as a percentage of
                                      # devices/files size
}

Setup in-memory without storage-backed persistence

In Database 7.0 and later, Aerospike EE and SE store in-memory namespace data in shared memory, enabling fast restarts of the namespace. These editions can also cold restart from namespace data in shared memory. Shared memory can be backed up to the filesystem using the Aerospike Shared Memory Tool (asmt) before restarting the host machine.

To configure an in-memory namespace, set the storage-engine option to memory.

The following configuration file snippet is for an in-memory namespace without storage-backed persistence.

namespace NSNAME {
    max-record-size 1M                # (optional) since Database 7.1 defaults to 1M, can range up to 8M regardless of flush-size
    storage-engine memory {
        data-size 64G                 # memory pre-allocated for the data of this namespace
        stop-writes-avail-pct 5       # (optional) stop-writes threshold as a percentage of
                                      # devices/files size or data-size
        stop-writes-used-pct 70       # (optional) stop-writes threshold as a percentage of
                                      # devices/files size, or data-size
        evict-used-pct 60             # (optional) eviction threshold, as a percentage of
                                      # devices/files size, or data-size
    }
}

Aerospike allocates an amount of shared memory equal to the value of data-size, split equally into 8 stripes.

Verify that your operating system can allocate enough shared memory for the data storage stripes. In the example above, each stripe is one eighth of the data-size.

sysctl -a | grep shmmax
kernel.shmmax = 2147483648

# If data-size is 64GiB, 2GiB shmmax is too small for the 8GiB stripes
sysctl -w kernel.shmmax=17592186044416

sysctl kernel.shmall
kernel.shmall = 2097152

getconf PAGE_SIZE
4096

# 8GiB (2097152 * 4096) shmall is too small for a 64GiB data-size
sysctl -w kernel.shmall=4294967296
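For a namespace without persistence, the minimum kernel settings follow directly from data-size. This sketch uses a hypothetical helper, `data_size_shm_needs`, which takes data-size and page size in bytes; it computes one stripe (data-size divided by 8) as the floor for shmmax, and the page count of the whole data-size as the floor for shmall. For a 64GiB data-size and 4096-byte pages it reports 8589934592 bytes and 16777216 pages, which the `sysctl -w` values above satisfy.

```shell
# Hypothetical helper: minimum shared memory settings for a Database 7.0+
# in-memory namespace without persistence, whose data-size is split into 8 stripes.
data_size_shm_needs() {
  local data_size_bytes=$1 page_size=$2
  local stripe=$(( data_size_bytes / 8 ))                             # shmmax floor
  local pages=$(( (data_size_bytes + page_size - 1) / page_size ))    # shmall floor
  echo "min_shmmax=$stripe min_shmall_pages=$pages"
}

# Usage: data_size_shm_needs $(( 64 << 30 )) "$(getconf PAGE_SIZE)"
```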

Aerospike CE stores in-memory namespace data in volatile process memory, which does not survive restarts of the Aerospike daemon (asd).

Prior to Database 7.0, the minimal configuration for a namespace without persistence is to set storage-engine to memory. If your namespace requires more than the default 4GiB memory-size allocation for the primary index and data in memory, you must also adjust memory-size. See the Sizing Guide to learn about sizing memory.

namespace NSNAME {
    memory-size 64G           # memory budget for the namespace to base other configurations on
    storage-engine memory     # does not use persistence
    max-record-size 1M        # (optional) limit the maximum record size
    stop-writes-pct 90        # (optional) stop-writes threshold as a percent of memory-size
    high-water-memory-pct 60  # (optional) eviction threshold as a percent of memory-size
}

Setup for data on Intel Optane Persistent Memory (PMem)

The minimal configuration for a persistent memory storage namespace requires setting the following parameters in aerospike.conf:

storage-engine, set to pmem.
file, once for each PMem storage file to be used by this namespace. Use the file parameter within the namespace context, not the logging context.

The filesize must be large enough to support the size of the data, up to the maximum allowed value of 2TiB.

In Database 5.1 and later, persistent memory namespaces are treated equivalently to data-in-memory namespaces for the purpose of computing the default number of service-threads. The value of service-threads will default to the number of CPUs, unless there is at least one SSD namespace.

On systems with hyperthreading, only physical cores are counted. In multi-socketed systems, if non-uniform memory access (NUMA) pinning is enabled, each Aerospike instance counts only the CPU cores on the socket it is servicing.
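To see the physical core count this rule refers to, you can count unique core and socket pairs in the output of `lscpu -p=CORE,SOCKET`, because hyperthread siblings share a pair. This is an estimate for planning only, not the server's own detection logic. `count_physical_cores` is a hypothetical helper; its optional argument restricts the count to one socket, as when NUMA pinning is enabled.

```shell
# Hypothetical helper: count physical cores from `lscpu -p=CORE,SOCKET` output
# on stdin. Each logical CPU prints one "core,socket" line; hyperthread
# siblings print the same pair, so unique pairs approximate physical cores.
count_physical_cores() {
  local socket=${1:-}   # optional: count only this socket (NUMA pinning case)
  grep -v '^#' |                                   # drop lscpu header comments
    awk -F, -v s="$socket" 's == "" || $2 == s' |  # keep the requested socket
    sort -u |                                      # one line per physical core
    awk 'END { print NR }'
}

# Usage on a host:
#   lscpu -p=CORE,SOCKET | count_physical_cores      # all sockets
#   lscpu -p=CORE,SOCKET | count_physical_cores 0    # socket 0 only
```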

In Database 7.0 and later, configure a namespace with data on PMem as follows:

namespace NSNAME {
    storage-engine pmem {        # configure the storage-engine to use persistence
        file /mnt/pmem/ns1.dat   # location of PMem data file on server, where /mnt/pmem is the
                                 # mount point of an EXT4 or XFS file system that resides in PMem
                                 # and has been mounted with the DAX option
        file /mnt/pmem/ns2.dat   # (optional) Location of PMem data file on server
        filesize 64G             # maximum size of each file in GiB; maximum size is 2TiB
    }
}

Prior to Database 7.0, you may need to change memory-size from the default of 4GiB to a size appropriate for the expected primary index size. See the Sizing Guide to learn about sizing memory.

namespace NSNAME {
    memory-size 32G             # memory allocation for the namespace
    storage-engine pmem {       # configure the storage-engine to use persistence
        file /mnt/pmem/ns1.dat  # location of PMem data file on server, where /mnt/pmem is the
                                # mount point of an EXT4 or XFS file system that resides in PMem
                                # and has been mounted with the DAX option.
        file /mnt/pmem/ns2.dat  # (optional) Location of PMem data file on server.
        filesize 64G            # maximum size of each file in GiB; maximum size is 2TiB
    }
}
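Both snippets assume that /mnt/pmem is mounted with the DAX option. A minimal check is sketched below, assuming the mount options string comes from `findmnt -no OPTIONS /mnt/pmem`. `has_dax_option` is a hypothetical helper that accepts `dax` and `dax=always`, the spellings that enable DAX for every file.

```shell
# Hypothetical helper: succeed if a comma-separated mount options string,
# e.g. from `findmnt -no OPTIONS /mnt/pmem`, enables DAX for all files.
has_dax_option() {
  case ",$1," in
    *,dax,*|*,dax=always,*) return 0 ;;   # older and newer kernel spellings
    *) return 1 ;;                         # no DAX, dax=never, or dax=inode
  esac
}

# Usage on a host:
#   has_dax_option "$(findmnt -no OPTIONS /mnt/pmem)" || echo "remount with -o dax"
```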

Setup for a data-in-index storage engine

A data-in-index configuration is a highly-specialized namespace for niche use cases such as counters. Use the data-in-index engine if your data is single-bin, fits in 8 bytes, and you need the performance of an in-memory namespace but do not want to lose the fast restart capability provided in Aerospike Enterprise Edition.

The minimal configuration for a data-in-index namespace includes:

Setting single-bin to true.
Setting data-in-index to true.
Setting data-in-memory to true.
Setting storage-engine to device.
The file or device parameters must be configured to map to the persisted storage device to be used by this namespace. Use the file parameter with the namespace context, and not the logging context.

You may need to change memory-size from its default of 4GiB to a size that can accommodate the primary index, and filesize from its 16GiB default to the size of the data on disk, with a maximum allowed value of 2TiB. See the Sizing Guide to learn about sizing memory.

namespace NSNAME {
    memory-size 64G                   # memory budget for the namespace to base other configurations on
    single-bin true                   # required true by data-in-index
    data-in-index true                # store the single bin's data in the index
    storage-engine device {           # configure the storage-engine to use
                                      # persistence
        file /opt/aerospike/ns1.dat   # location of data file on server
        file /opt/aerospike/ns2.dat   # (optional) location of another data file on server
        filesize 64G                  # maximum size of each file in GiB; maximum size is 2TiB
        data-in-memory true           # required true by data-in-index
    }
}

Setup for shadow device

The shadow device storage model is designed for cloud environments with extremely high-performance SSDs that are ephemeral (not persistent), and where the persisted devices are not providing the necessary performance.

Shadow devices act as persisted stores, and must be greater than or equal to the size of the primary device. The primary device receives all read/write commands as usual, and all writes are duplicated to a shadow device. This creates a persisted data volume with lower input/output operations per second (IOPS) requirements, while gaining the IOPS benefit of the non-persisted volume without using large amounts of RAM. The shadow device only needs to satisfy the write IOPS requirements of your workload, not reads.

To use shadow devices, add the persisted volume after the declaration of the non-persisted volume on the same line in aerospike.conf.

namespace NSNAME {
    storage-engine device {
        device /dev/sdb /dev/sdf  # sdb is the fast ephemeral volume,
                                  # and sdf is the slower persisted volume
    }
}

In the example, /dev/sdb is the fast non-persisted device. /dev/sdf is the persisted device. The device order is important, with the fast non-persisted device named first, and the shadow device named second. The two devices must be listed on the same line.

You may configure multiple shadow devices, with each device pair on its own line. Each shadow device must be paired with only one primary device:

storage-engine device {
    device /dev/sdb /dev/sdf
    device /dev/sdc /dev/sdg
    device /dev/sdd /dev/sdh
}
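Because each shadow device must be at least as large as its primary device, it is worth checking the sizes before starting the server. This sketch compares two byte counts. `shadow_size_ok` is a hypothetical helper, and the sizes are assumed to come from `blockdev --getsize64`.

```shell
# Hypothetical helper: a shadow device must be at least as large as its primary.
# Sizes are byte counts, e.g. from `blockdev --getsize64 /dev/sdb`.
shadow_size_ok() {
  local primary_bytes=$1 shadow_bytes=$2
  if [ "$shadow_bytes" -ge "$primary_bytes" ]; then
    echo ok
  else
    echo "shadow device is $(( primary_bytes - shadow_bytes )) bytes too small"
    return 1
  fi
}

# Usage on a host (blockdev usually needs root), one call per device pair:
#   shadow_size_ok "$(blockdev --getsize64 /dev/sdb)" "$(blockdev --getsize64 /dev/sdf)"
```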

Instance recovery

If the ephemeral device is damaged (for example, its header information is missing) and there is a valid shadow device, the server loads data from the EBS shadow device into the ephemeral disk and into memory (primary index, secondary index, data-in-memory). See the instance failure section of the AWS deployment guide.
If the ephemeral device fails, the node repopulates it from the shadow device when the instance restarts. The server processes transactions as usual after the node rejoins the cluster.

Updating the `filesize` parameter

To change filesize on a namespace with storage-engine set to device, use the following procedure.

Increase `filesize`

Perform the following steps on your cluster one node at a time.

Change the filesize parameter in the configuration file. Ensure that the relevant partition has sufficient free disk space. To add a new file to the configuration, place it as the last entry in the Aerospike storage configuration.

You do not have to delete and recreate the file when increasing the configured size of the file.
Restart Aerospike:
```
/etc/init.d/aerospike restart
```
Wait for port 3000 to open and for the node to rejoin the cluster.

The following shell command is useful for discovering whether a node has started successfully:
```
grep -i 'cake' /var/log/aerospike/aerospike.log
```
Proceed with the other nodes in the cluster one by one, repeating the above steps. To avoid data inconsistency, wait for migrations to complete between each restart.
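Rather than re-running the grep by hand, you can poll the log. This sketch assumes the log path shown above and scans only lines written after the restart, so a message from an earlier startup does not produce a false positive. `wait_for_cake` is a hypothetical helper: pass the log file, the line count recorded before the restart, and a timeout in seconds. It does not check migrations, which you still need to confirm before moving to the next node.

```shell
# Hypothetical helper: wait for the "cake" startup message, scanning only log
# lines written after the restart so that an earlier startup is not counted.
#   $1 log file, $2 line count recorded before the restart, $3 timeout (seconds)
wait_for_cake() {
  local log=$1 start=$2 timeout=$3 waited=0
  until tail -n +"$(( start + 1 ))" "$log" 2>/dev/null | grep -qi 'cake'; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "no startup message after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$(( waited + 1 ))
  done
  echo "node ready"
}

# Usage on one node:
#   start=$(wc -l < /var/log/aerospike/aerospike.log)
#   /etc/init.d/aerospike restart
#   wait_for_cake /var/log/aerospike/aerospike.log "$start" 300
```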

Reduce `filesize`

Reducing the size of an existing data file may result in the loss of data. Proceed with caution.

To avoid data inconsistency, delete the data file and update the configuration file on one node at a time, allowing the data to migrate to other nodes before proceeding to the next node.

Stop Aerospike:

/etc/init.d/aerospike stop

Delete the file and update the configuration file with the new filesize.
Start Aerospike:

/etc/init.d/aerospike start
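If you edit configuration files with a script, a sed substitution can change the filesize value in place between the stop and start steps. This sketch assumes GNU sed (for `-i -E`) and rewrites every `filesize` line in the file, across all namespaces. `set_filesize` is a hypothetical helper, so review the result, and the backup it writes, before starting Aerospike.

```shell
# Hypothetical helper (assumes GNU sed for -i and -E): set every `filesize`
# line in a configuration file to a new value, keeping indentation and any
# trailing comment. It changes all namespaces in the file.
set_filesize() {
  local conf=$1 new_size=$2
  cp -- "$conf" "$conf.bak"   # keep a copy of the original configuration
  sed -i -E "s/^([[:space:]]*filesize[[:space:]]+)[0-9]+[KMGT]?/\1${new_size}/" "$conf"
}

# Usage, between the stop and start steps:
#   set_filesize /etc/aerospike/aerospike.conf 32G
```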

Where to next?

Configure Data Retention Policy, which determines how long records are kept after they are initially written.
Configure Data Durability Policy, which determines how many replica copies of a record to keep in the cluster.

When a namespace runs low on storage

When a namespace can no longer write data, you see messages in the log like the following example:

Sep 05 2022 21:28:48 GMT: INFO (namespace): (base/namespace.c:458) {test} lwm breached true, hwm_breached true, stop_writes true, memory sz:22971755648 nobjects:358933683 nbytesmem:0 hwm:23192823808 sw:34789232640, disk sz:216122189312 hwm:216116854784 sw:341237137408

The example shows that the namespace test on this node has breached its high-water mark for disk or memory and has reached its stop-writes threshold. As a result, the namespace can no longer accept write requests. Messages like the following result from stop_writes being true, either on this node or on other nodes:

Sep 05 2022 21:28:48 GMT: INFO (rw): (base/thr_rw.c:2300) writing pickled failed 8 for digest 7318ad7422e51009
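To find which namespaces are affected, you can scan the log for namespace status lines. This sketch assumes the log format of the first example above (namespace name in braces, and a `stop_writes true` or `stop_writes false` field) and reports namespaces whose latest status line shows stop_writes true. `stop_writes_namespaces` is a hypothetical helper, and other server versions may log these fields differently.

```shell
# Hypothetical helper: print namespaces whose latest namespace status line
# reports "stop_writes true". Assumes the log format shown above.
stop_writes_namespaces() {
  local log=$1
  grep -F '(namespace):' "$log" |
    awk '{
      s = index($0, "{"); e = index($0, "}")
      if (s > 0 && e > s) {
        ns = substr($0, s + 1, e - s - 1)          # namespace name between braces
        state[ns] = ($0 ~ /stop_writes true/)      # later lines overwrite earlier ones
      }
    }
    END { for (ns in state) if (state[ns]) print ns }' |
    sort
}
```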

Resolve this by adjusting configuration parameters:

Increase your defragmentation priority or rate.
Slow your migration speed, if migrations are active.
If you configured evictions, speed up your current eviction rate by reducing evict-used-pct in Database 7.0 and later. Prior to Database 7.0, use high-water-disk-pct and high-water-memory-pct.
Increase the stop-writes configuration parameters, such as stop-writes-sys-memory-pct. Use stop-writes-used-pct in Database 7.0 and later. Prior to Database 7.0, use stop-writes-pct.

All of these parameters can be changed dynamically at runtime. Dynamic changes do not survive a restart, so also update them in the main Aerospike configuration file on the node.

Avoiding 0% available space

When storage runs low too frequently, you see log entries similar to the following:

Apr 27 2022 02:53:12 GMT: WARNING (drv_ssd): (storage/drv_ssd.c:1844) could not allocate storage on device /dev/sdb

In Database 7.0 and later, when data_avail_pct reaches zero, all subsequent writes fail. This should not happen if the default stop-writes-avail-pct is not modified.

Prior to Database 7.0, when device_available_pct (or pmem_available_pct for PMem storage) reaches zero, all subsequent writes fail. This should not happen if the default min-avail-pct is not modified.
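To see which devices are running out of space, you can tally these warnings by device. This sketch assumes the message text of the example above; `alloc_failures_by_device` is a hypothetical helper that prints each device with its warning count.

```shell
# Hypothetical helper: count "could not allocate storage" warnings per device.
alloc_failures_by_device() {
  local log=$1
  grep -o 'could not allocate storage on device [^ ]*' "$log" |
    awk '{ count[$NF]++ } END { for (d in count) print d, count[d] }' |
    sort
}

# Usage: alloc_failures_by_device /var/log/aerospike/aerospike.log
```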
