
Configure strong consistency

Overview

This page describes how to configure a namespace with strong consistency (SC). This process takes place in the following phases:

Acquire SC-enabled feature-key file

  1. Contact Aerospike Sales to acquire a feature-key file with SC enabled.
  2. Install the new feature-key file, overwriting the existing file referenced by the feature-key-file configuration parameter.
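A minimal sketch of the relevant service setting, assuming the default feature-key location used by Aerospike packages:

service {
    feature-key-file /etc/aerospike/features.conf
}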

Install and configure clock synchronization

Aerospike recommends using a clock synchronization system compatible with your environment. The most common method of synchronizing clocks is NTP, which easily exceeds the granularity that Aerospike requires.

  • Aerospike's gossip heartbeat protocol monitors the amount of skew in a cluster, and sends an alert if it detects a large amount of skew.
  • 10 seconds or fewer of skew is well within acceptable limits and does not trigger any warnings.
  • By default, warnings begin at 12 seconds of skew to provide early notification before conditions worsen.
  • By default, the database enters stop-writes mode at 17 seconds of skew to prevent data loss.
  • Data loss is possible at 23 seconds or more of skew with the default heartbeat configuration.
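To check how much skew a node currently observes, you can query the clock-skew statistic with asinfo; the statistic name below reflects recent server versions:

asinfo -v 'statistics' -l | grep cluster_clock_skew

cluster_clock_skew_ms reports the skew the node currently measures across the cluster, in milliseconds.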

For information on installing NTP, see How to configure NTP for Aerospike.

Configure node IDs with SC

The Aerospike node-id is an 8-byte number which, by default, is derived from the server's MAC address and the fabric port. The 6 least significant bytes are copied from the MAC address, and the 2 most significant bytes are copied from the fabric port. (For example, the node IDs shown later on this page begin with BB9 because the default fabric port, 3001, is 0x0BB9.)

note

Consider using configured node IDs for your SC cluster. Hardware-generated node IDs, while convenient for availability mode's automatic algorithms, can become cumbersome when managing SC.

  • For more readable node IDs, you can configure a specific 1- to 16-character hexadecimal node-id for each node. For example, "01", "B2", and "beef" are valid, but "fred" is not.
    • This feature works well with cloud providers such as AWS. Setting a node-id that matches a particular EBS volume allows a node to be removed and a new instance created without changing the node-id when you bring up Aerospike on the new instance.
  • Nodes with duplicate node-ids are not allowed to join the cluster. The refusal is noted in the log file.

To convert a cluster from automatic to configured node IDs, change the configuration file on a server and restart it. For an SC namespace, you must also change the roster. To prevent data unavailability when converting multiple servers, update one server at a time: restart it, modify and commit the new roster, then repeat for the next server. You do not have to wait for data migration to finish, but you must validate and apply the new roster before moving on.

To configure the node ID, edit your configuration file and add node-id <HEX NODE ID> to the service context.

caution

The configuration file options node-id and node-id-interface are mutually exclusive.

service {
    user root
    group root
    service-threads 20   # 5 times the number of vCPUs for 4.7+ with at least one
                         # SSD namespace, otherwise the number of vCPUs
    nsup-period 16
    proto-fd-max 15000

    node-id a1
}

When the server starts, validate that the node-id is as expected, then change the roster of any SC namespace that is intended to include this server.
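A quick way to validate the node ID is the node info command, for example:

asinfo -v 'node'

This should return the configured value (a1 in the example above); if it returns a hardware-derived ID instead, the node-id setting was not applied.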

Configure SC for expiration

SC should be used carefully with expiration. For background on expiration and eviction, see Definition of expiration and eviction.

For each namespace where you want SC, add strong-consistency true and default-ttl 0 to the namespace stanza. The following example uses configuration syntax that requires Database 7.0 or later.

namespace test {
    replication-factor 2
    default-ttl 0
    strong-consistency true

    storage-engine memory {
        file /var/lib/aerospike/test.dat
        filesize 4G
    }
}

Non-durable deletes and configuration settings

A key consideration for SC is the length of time records exist, which is a function of several configuration parameters.

If a record is non-durably deleted on the master partition but the delete has not yet been applied to another copy of the partition, and the master later heals by restoring a prior replica, the "deleted" record is restored in the live partition.

You need to be sure that when non-durable-delete events occur (expunge, expiration, or eviction, which in SC require strong-consistency-allow-expunge true), there are no transactions updating the record. For highly active or essential records, you have the following options:

Commit-to-device

Determine whether you need commit-to-device. If you are running in a situation where no data loss is acceptable, even in the case of simultaneous server hardware failures, you can configure a namespace to commit each write to disk before returning success. This causes performance degradation, and only provides benefit when hardware failures occur within milliseconds of each other. Add commit-to-device true within the storage-engine scope of the namespaces in question.

commit-to-device requires serializing writes to the output buffer, and will thus flush more data than is strictly necessary. Although Aerospike automatically determines the smallest flush increment for a given drive, this can be raised with the optional commit-min-size parameter.
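A minimal sketch of these settings for a device-backed namespace (the file path and the commented commit-min-size value are illustrative):

namespace test {
    replication-factor 2
    default-ttl 0
    strong-consistency true

    storage-engine device {
        file /var/lib/aerospike/test.dat
        filesize 4G
        commit-to-device true
        # commit-min-size 4096   # optional; raise the minimum flush increment only if needed
    }
}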

Start the servers.

systemctl start aerospike

The cluster forms with all nodes, but the SC namespace has no roster yet. In order to use your SC namespace, you need to add the roster; until you do so, client requests fail.

Admin> show stat -flip like cluster_size
~~~~~~~~~~~~~~~~~~~~~~~~Service Statistics~~~~~~~~~~~~~~~~~~~~~
NODE cluster_size
node2.aerospike.com:3000 5
node4.aerospike.com:3000 5
node5.aerospike.com:3000 5
node6.aerospike.com:3000 5
node7.aerospike.com:3000 5
Number of rows: 5

~~~~~~~~~~~~~~~~~~~~~~~~test Namespace Statistics~~~~~~~~~~~~~~~~~~~~~
NODE ns_cluster_size
node2.aerospike.com:3000 0
node4.aerospike.com:3000 0
node5.aerospike.com:3000 0
node6.aerospike.com:3000 0
node7.aerospike.com:3000 0
Number of rows: 5

This result is expected, since the test namespace's roster has not yet been set. To make this namespace usable, create a roster as described below.

Configure the initial roster

The roster is the list of nodes which are expected to be in the cluster for this particular namespace. This list is stored persistently in a distributed table within each Aerospike server, similar to how index configuration is stored. To change and manipulate this roster, use the following Aerospike tools, which are part of the aerospike-tools package and must be installed.

Rosters are specified using node IDs. You can specify the node ID of each server, or use the automatically generated node ID. For further information about specifying node IDs, see Configure node IDs with SC above.

The general process of managing a roster is to first bring together the cluster of nodes, then list the node IDs within the cluster that have the namespace defined, and finally add those node IDs to the roster. To commit the changes, execute a recluster command.

Configure with asadm

With Tools package 6.2.x or later you can do this with the following commands:

Admin> enable;
Admin+> manage roster stage observed ns test
Pending roster now contains observed nodes.
Run "manage recluster" for your changes to take affect.
Admin+> manage recluster
Admin+> pager on
Admin+> show roster
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Roster (2021-10-22 20:14:01 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Node| Node ID|Namespace| Current Roster| Pending Roster| Observed Nodes
node2.aerospike.com:3000|BB9070016AE4202 |test |BB9070016AE4202,BB9060016AE4202,BB9050016AE4202|BB9070016AE4202,BB9060016AE4202,BB9050016AE4202|BB9070016AE4202,BB9060016AE4202,BB9050016AE4202
node4.aerospike.com:3000|BB9060016AE4202 |test |BB9070016AE4202,BB9060016AE4202,BB9050016AE4202|BB9070016AE4202,BB9060016AE4202,BB9050016AE4202|BB9070016AE4202,BB9060016AE4202,BB9050016AE4202
node6.aerospike.com:3000|*BB9050016AE4202|test |BB9070016AE4202,BB9060016AE4202,BB9050016AE4202|BB9070016AE4202,BB9060016AE4202,BB9050016AE4202|BB9070016AE4202,BB9060016AE4202,BB9050016AE4202
Number of rows: 3

Configure with asinfo

Optionally, you can use the equivalent asinfo commands:

  1. Get a list of nodes with this namespace definition.
asinfo -v 'roster:namespace=[ns-name]'

The observed_nodes are the nodes which are in the cluster and have the namespace defined.

  2. Copy the observed_nodes list.
Admin+> asinfo -v "roster:namespace=test"
node6.aerospike.com:3000 (192.168.10.6) returned:
roster=null:pending_roster=null:observed_nodes=BB9070016AE4202,BB9060016AE4202,BB9050016AE4202

node4.aerospike.com:3000 (192.168.10.4) returned:
roster=null:pending_roster=null:observed_nodes=BB9070016AE4202,BB9060016AE4202,BB9050016AE4202

node2.aerospike.com:3000 (192.168.10.2) returned:
roster=null:pending_roster=null:observed_nodes=BB9070016AE4202,BB9060016AE4202,BB9050016AE4202
  3. Set the roster to the observed_nodes.
roster-set:namespace=[ns-name];nodes=[observed nodes list]
note

It is safe to issue this command multiple times, either once for each observed node or with the full list. The roster is not active until you recluster.

Admin+> asinfo -v "roster-set:namespace=test;nodes=BB9070016AE4202,BB9060016AE4202,BB9050016AE4202" with BB9020016AE4202
node2.aerospike.com:3000 (192.168.10.2) returned:
ok

You now have a roster but it isn't active yet.

  4. Validate your roster with the roster: command.
Admin+> asinfo -v "roster:"
node2.aerospike.com:3000 (192.168.10.2) returned:
ns=test:roster=null:pending_roster=BB9070016AE4202,BB9060016AE4202,BB9050016AE4202:observed_nodes=null

node6.aerospike.com:3000 (192.168.10.6) returned:
ns=test:roster=null:pending_roster=BB9070016AE4202,BB9060016AE4202,BB9050016AE4202:observed_nodes=null

node4.aerospike.com:3000 (192.168.10.4) returned:
ns=test:roster=null:pending_roster=BB9070016AE4202,BB9060016AE4202,BB9050016AE4202:observed_nodes=null

The roster is null, but pending_roster is set to the provided roster.

  5. Apply the pending_roster with the recluster: command.
Admin+> asinfo -v "recluster:"
node2.aerospike.com:3000 (192.168.10.2) returned:
ignored-by-non-principal

node6.aerospike.com:3000 (192.168.10.6) returned:
ignored-by-non-principal

node4.aerospike.com:3000 (192.168.10.4) returned:
ok
  6. Verify that the new roster was applied with the roster: command.
Admin+> asinfo -v "roster:"
node2.aerospike.com:3000 (192.168.10.2) returned:
ns=test:roster=BB9070016AE4202,BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202:pending_roster=BB9070016AE4202,
BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202

node6.aerospike.com:3000 (192.168.10.6) returned:
ns=test:roster=BB9070016AE4202,BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202:pending_roster=BB9070016AE4202,
BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202

node4.aerospike.com:3000 (192.168.10.4) returned:
ns=test:roster=BB9070016AE4202,BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202:pending_roster=BB9070016AE4202,
BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202

Both roster and pending_roster are set to the provided roster.

  7. Validate that the namespace cluster size agrees with the service cluster size.

The namespace statistic ns_cluster_size should now agree with the service cluster_size, assuming all nodes in the cluster have this namespace defined. When they do not agree, it could mean either that the namespace is not defined on all nodes, or that nodes are missing from the roster.

Admin> show stat -flip like cluster_size
~~~~~~~~~~~~~~~~~~~~~~~~Service Statistics~~~~~~~~~~~~~~~~~~~~~
NODE cluster_size
node2.aerospike.com:3000 5
node4.aerospike.com:3000 5
node5.aerospike.com:3000 5
node6.aerospike.com:3000 5
node7.aerospike.com:3000 5
Number of rows: 5

~~~~~~~~~~~~~~~~~~~~~~~~test Namespace Statistics~~~~~~~~~~~~~~~~~~~~~
NODE ns_cluster_size
node2.aerospike.com:3000 5
node4.aerospike.com:3000 5
node5.aerospike.com:3000 5
node6.aerospike.com:3000 5
node7.aerospike.com:3000 5
Number of rows: 5

Rack awareness

For an SC namespace to be rack aware, the roster list becomes a series of node-id@rack-id pairs. Each entry in the roster list needs to define which rack a node is on (defaulting to rack-id 0 if none is defined). The roster can be manually constructed by appending @rack-id to each node-id in a comma-separated list.

Managing this list can be simplified by configuring rack-id on each node. These configured rack IDs automatically appear in the observed_nodes list, which can then be used as a roster list in the set-roster command.
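For example, a manually constructed roster-set command with rack IDs appended (the node IDs and rack IDs reuse illustrative values from this page):

asinfo -v "roster-set:namespace=test;nodes=BB9070016AE4202@101,BB9060016AE4202@101,BB9050016AE4202@102"

As with any roster change, follow this with a recluster: command to make the new roster active.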

note

The rack ID is specified in each namespace. It is possible to set different rack IDs for different namespaces, and it is possible to have some namespaces not be rack aware.

Configure rack awareness

Configure SC namespaces with rack awareness using the rack-id configuration to generate the observed_nodes list.

In SC mode, the rack-id configuration is only used to facilitate setting the roster by copying and pasting the observed_nodes list. Only the rack ID applied when setting the roster takes effect, and it could be different from what has been configured in the configuration file or set dynamically through the rack-id configuration parameter.

note

Starting with Database 7.2, the active rack feature dynamically designates a particular rack-id to hold all master partition copies. For instructions on how to enable active rack, see Designate an active rack.

Configure in the configuration file

Modify the rack-id parameter for each namespace in the aerospike.conf file.

namespace test {
    replication-factor 2
    memory-size 1G
    default-ttl 0
    strong-consistency true
    rack-id 101

    storage-engine device {
        file /var/lib/aerospike/test.dat
        filesize 4G
        data-in-memory false
        commit-to-device true
    }
}

Configure dynamically

note

Tools package 6.2.x or later is required to use asadm's manage roster commands. Otherwise, use the equivalent asinfo roster-set command described above.

  1. Start the Aerospike server.
systemctl start aerospike
  2. Copy the observed nodes into the pending roster with asadm's manage roster commands.
Admin+> manage roster stage observed ns test
  3. View the roster with asadm's show roster command and notice that the Pending Roster has been updated.

  4. Apply the roster with asadm's manage recluster command.

Admin+> manage recluster
Successfully started recluster
  5. Verify the configured rack-ids are active with asadm's show racks command, and check that the displayed rack IDs match what is configured.
Admin+> show racks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Racks (2021-10-21 20:33:28 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Namespace|Rack| Nodes
| ID|
test |101 |BB9070016AE4202,BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202
Number of rows: 1

The cluster is now configured with rack awareness.

Configure dynamically with manage config command

note

Tools package 6.0.x or later is required to use asadm's manage config commands. Otherwise, use the equivalent asinfo set-config command, as in the sketch below.
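A sketch of the equivalent asinfo commands (the namespace name, rack IDs, and target hosts are illustrative):

asinfo -h 192.168.10.2 -v "set-config:context=namespace;id=test;rack-id=101"
asinfo -h 192.168.10.6 -v "set-config:context=namespace;id=test;rack-id=102"
asinfo -v "recluster:"

The recluster: command is acted on only by the cluster principal; other nodes return ignored-by-non-principal, as shown earlier on this page.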

  1. Dynamically set rack-id with asadm's manage config command:
Admin+> manage config namespace test param rack-id to 101 with 192.168.10.2 192.168.10.4 192.168.10.5
~Set Namespace Param rack-id to 101~
Node|Response
node2.aerospike.com:3000|ok
node4.aerospike.com:3000|ok
node5.aerospike.com:3000|ok
Number of rows: 3

Admin+> manage config namespace test param rack-id to 102 with 192.168.10.6 192.168.10.7
~Set Namespace Param rack-id to 102~
Node|Response
node6.aerospike.com:3000|ok
node7.aerospike.com:3000|ok
Number of rows: 2
  2. Issue a recluster with asadm's manage recluster command.
Admin+> manage recluster
Successfully started recluster

Rack awareness reads

Rack awareness also provides a mechanism for database clients to read from the servers in the closest rack or zone on a preferential basis. This can result in lower latency, increased stability, and significantly reduced traffic charges by limiting cross-availability-zone traffic.

The feature is available in the Java, C, C#, Go, and Python clients.

Set up rack aware reads

  1. Set up clusters in logical racks. (See Configure rack awareness)

  2. Set the rackId and rackAware flags in the ClientPolicy object. Use the rack ID configured on the nodes in the availability zone where the application is running. The following example uses Java to demonstrate how to enable rack awareness. Operations are similar in other clients.


    ClientPolicy clientPolicy = new ClientPolicy();
    clientPolicy.rackId = <<rack id>>;
    clientPolicy.rackAware = true;

note

To avoid hard-coding, the rack ID can be obtained dynamically using the cloud provider's API, or set in a local property file, as in the sketch below.
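A minimal sketch of one such approach, assuming the rack ID is supplied through a JVM system property (the property name aerospike.rackId is hypothetical):

    ClientPolicy clientPolicy = new ClientPolicy();
    // Read the rack ID from a system property instead of hard-coding it; it could
    // equally come from a properties file or the cloud provider's metadata API.
    clientPolicy.rackId = Integer.parseInt(System.getProperty("aerospike.rackId", "0"));
    clientPolicy.rackAware = true;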

  3. Once the application has connected, set two additional parameters in the policy associated with the reads that are to be rack aware.


    Policy policy = new Policy();
    policy.readModeSC = ReadModeSC.ALLOW_REPLICA;
    policy.replica = Replica.PREFER_RACK;

    ReadModeSC.ALLOW_REPLICA indicates that all replicas can be consulted.
    Replica.PREFER_RACK indicates that the record in the same rack should be accessed if possible.

    tip
    • These parameters are additional to any other policy flags such as timeouts and retry intervals.
    • It is a best practice to set reads to time out after a reasonable duration (for example, 50 ms) and have Aerospike automatically retry, as in the sketch after this tip.
    • The retry usually goes to a different node from the first read, giving more resilient reads in the face of a node failure.
    • If the node the client is reading from fails during the read, the retry moves to a node in a different AZ (with rack awareness set up), which is most likely still available.
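A sketch of such read-policy settings using the Java client (the timeout and retry values are illustrative):

    Policy policy = new Policy();
    policy.readModeSC = ReadModeSC.ALLOW_REPLICA;
    policy.replica = Replica.PREFER_RACK;
    policy.socketTimeout = 50;       // per-attempt timeout in ms; a timed-out attempt is retried
    policy.maxRetries = 2;           // retries typically go to a different node
    policy.sleepBetweenRetries = 10; // brief pause between attempts, in ms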

Designate an active rack

Active rack dynamically designates a particular rack-id to hold all master partition copies. For active-rack to take effect, all nodes must agree on the same active rack, and the number of racks must be at most equal to the configured replication-factor.

Also, setting active-rack to 0 disables the feature. This means that you can't designate rack-id 0 as the active rack.

Changing the rack-id on all nodes that have rack-id 0 to a new value distinct from any other rack does not cause any migrations.

Enable active rack in aerospike.conf

namespace ns-name {
    ...
    rack-id X
    active-rack Y   # Y may be the same as X
    ...
}

Enable active rack using asadm

Admin> enable
Admin+> manage config namespace ns-name param active-rack to 1
~Set Namespace Param active-rack to 1~
Node|Response
172.22.22.1:3000|ok
172.22.22.2:3000|ok
172.22.22.3:3000|ok
172.22.22.4:3000|ok
172.22.22.5:3000|ok
172.22.22.6:3000|ok
Number of rows: 6

Enable active rack for SC

In SC mode, in contrast to AP mode, the information about the active rack must be communicated with the roster. The roster should be applied when the cluster is stable. Having these values on the roster ensures that the roster nodes, racks, and active rack agree on all nodes, even if the cluster is later split into subclusters by network partitions.

Use the following steps to enable active rack on an SC namespace.

  1. Configure the namespace.

    namespace cp {
        nsup-period 120
        default-ttl 5d
        replication-factor 2
        strong-consistency true
        active-rack 1

        storage-engine memory {
            data-size 2G
            evict-used-pct 60
        }
    }
  2. Initial balance: set the roster

    Admin> enable
    Admin+> manage roster stage observed ns cp
    Pending roster now contains observed nodes.
    Run "manage recluster" for your changes to take effect.
    Admin+> manage recluster

    Admin+> show pmap
    ~~~~~~~~~~~~~Partition Map Analysis (2024-07-26 17:30:10 UTC)~~~~~~~~~~~~~
    Namespace| Node| Cluster Key|~~~~~~~~~~~~Partitions~~~~~~~~~~~~
    | | |Primary|Secondary|Unavailable|Dead
    cp |172.22.22.1:3000|3ABD3A9B0390| 682| 683| 0| 0
    cp |172.22.22.2:3000|3ABD3A9B0390| 682| 683| 0| 0
    cp |172.22.22.3:3000|3ABD3A9B0390| 683| 682| 0| 0
    cp |172.22.22.4:3000|3ABD3A9B0390| 683| 682| 0| 0
    cp |172.22.22.5:3000|3ABD3A9B0390| 683| 683| 0| 0
    cp |172.22.22.6:3000|3ABD3A9B0390| 683| 683| 0| 0
    cp | | | 4096| 4096| 0| 0
    Number of rows: 6

  3. Configure racks.

    Admin+> manage config namespace cp param rack-id to 1 with 172.22.22.1 172.22.22.2 172.22.22.3
    ~Set Namespace Param rack-id to 1~
    Node|Response
    172.22.22.1:3000|ok
    172.22.22.2:3000|ok
    172.22.22.3:3000|ok
    Number of rows: 3

    Run "manage recluster" for your changes to rack-id to take effect.
    Admin+> manage recluster
    Successfully started recluster
    Admin+> manage roster stage observed ns cp
    Pending roster now contains observed nodes.
    Run "manage recluster" for your changes to take effect.
    Admin+> manage recluster
    Successfully started recluster

  4. Balanced with the active rack: verify that migrations have completed.


    Admin+> info namespace object

    Namespace|            Node|Rack|  Repl|Expirations|  Total|~~~~~~~~~~Objects~~~~~~~~~~|~~~~~~~~~Tombstones~~~~~~~~|~~~~Pending~~~~
    | | ID|Factor| |Records| Master| Prole|Non-Replica| Master| Prole|Non-Replica|~~~~Migrates~~~
    | | | | | | | | | | | | Tx| Rx
    cp |172.22.22.1:3000| 1| 2| 0.000 |0.000 |0.000 |0.000 | 0.000 |0.000 |0.000 | 0.000 |0.000 |0.000
    cp |172.22.22.2:3000| 1| 2| 0.000 |0.000 |0.000 |0.000 | 0.000 |0.000 |0.000 | 0.000 |0.000 |0.000
    cp |172.22.22.3:3000| 1| 2| 0.000 |0.000 |0.000 |0.000 | 0.000 |0.000 |0.000 | 0.000 |0.000 |0.000
    cp |172.22.22.4:3000| 0| 2| 0.000 |0.000 |0.000 |0.000 | 0.000 |0.000 |0.000 | 0.000 |0.000 |0.000
    cp |172.22.22.5:3000| 0| 2| 0.000 |0.000 |0.000 |0.000 | 0.000 |0.000 |0.000 | 0.000 |0.000 |0.000
    cp |172.22.22.6:3000| 0| 2| 0.000 |0.000 |0.000 |0.000 | 0.000 |0.000 |0.000 | 0.000 |0.000 |0.000
    cp | | | | 0.000 |0.000 |0.000 |0.000 | 0.000 |0.000 |0.000 | 0.000 |0.000 |0.000
    Number of rows: 6

    Admin+> show pmap

    ~~~~~~~~~~~~~Partition Map Analysis (2024-07-26 17:30:25 UTC)~~~~~~~~~~~~~
    Namespace| Node| Cluster Key|~~~~~~~~~~~~Partitions~~~~~~~~~~~~
    | | |Primary|Secondary|Unavailable|Dead
    cp |172.22.22.1:3000|7320E4F4EE63| 1365| 0| 0| 0
    cp |172.22.22.2:3000|7320E4F4EE63| 1365| 0| 0| 0
    cp |172.22.22.3:3000|7320E4F4EE63| 1366| 0| 0| 0
    cp |172.22.22.4:3000|7320E4F4EE63| 0| 1365| 0| 0
    cp |172.22.22.5:3000|7320E4F4EE63| 0| 1365| 0| 0
    cp |172.22.22.6:3000|7320E4F4EE63| 0| 1366| 0| 0
    cp | | | 4096| 4096| 0| 0
    Number of rows: 6

    All master (or "Primary") partitions are now on the nodes that were designated rack-id 1.


Managing SC

Managing Data Consistency describes adding and removing nodes, starting and stopping servers safely, validating partition availability, and reviving dead partitions. It also describes the auto-revive feature, which was added in Database 7.1.