Manage Nodes in Strong Consistency
Overview
This page describes how to manage nodes in strong consistency (SC) namespaces. The topics here include how to add and remove nodes to the namespace and roster, how to validate partition availability, and how to revive dead partitions.
Add nodes to the cluster and roster
Install and configure Aerospike on the new nodes as described in Install & Configure for Strong Consistency.
When the nodes have joined the cluster, use the following command to verify that cluster_size is greater than ns_cluster_size. In the following output, cluster_size reports 6 while ns_cluster_size still reports 5, indicating that the new node has joined the cluster but is not yet in the namespace roster.
Admin> show stat -flip like cluster_size
~Service Statistics (2021-10-22 23:43:35 UTC)~
Node|cluster_size
node1.aerospike.com:3000| 6
node2.aerospike.com:3000| 6
node4.aerospike.com:3000| 6
node5.aerospike.com:3000| 6
node6.aerospike.com:3000| 6
node7.aerospike.com:3000| 6
Number of rows: 6
~test Namespace Statistics (2021-10-22 23:43:35 UTC)~
Node|ns_cluster_size
node1.aerospike.com:3000| 5
node2.aerospike.com:3000| 5
node4.aerospike.com:3000| 5
node5.aerospike.com:3000| 5
node6.aerospike.com:3000| 5
node7.aerospike.com:3000| 5
Number of rows: 6
Use show roster to see the newly observed nodes in the Observed Nodes section.
Tools package 6.2.x or later is required to use the asadm manage roster and show roster commands. With earlier versions, use the equivalent asinfo roster: and roster-set: info commands.
Use the following command to copy the Observed Nodes list into the Pending Roster.
Admin> enable
Admin+> manage roster stage observed ns test
Pending roster now contains observed nodes.
Run "manage recluster" for your changes to take affect.
Activate the new roster with the manage recluster command.
Admin+> manage recluster
Successfully started recluster
Run show roster to confirm that the roster has been updated on all nodes. Verify that the service's cluster_size matches the namespace's ns_cluster_size.
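The final comparison can also be scripted. The sketch below parses asinfo-style semicolon-delimited responses; the sample strings are stand-ins for live output, not actual cluster responses.

```shell
# Sketch: confirm the roster update took effect by comparing the service's
# cluster_size with the namespace's ns_cluster_size. The sample strings
# stand in for live responses from commands such as:
#   asinfo -v 'statistics'      (service stats)
#   asinfo -v 'namespace/test'  (namespace stats)
service_stats="cluster_key=ABC123;cluster_size=6"
ns_stats="ns_cluster_size=6;dead_partitions=0;unavailable_partitions=0"

# Extract the value of field $1 from a semicolon-delimited stats string $2.
get_field() { echo "$2" | tr ';' '\n' | awk -F= -v k="$1" '$1==k {print $2}'; }

cs=$(get_field cluster_size "$service_stats")
ns=$(get_field ns_cluster_size "$ns_stats")

if [ "$cs" = "$ns" ]; then
  echo "roster in sync: size $cs"
else
  echo "roster update pending: cluster_size=$cs ns_cluster_size=$ns"
fi
```

Run against live responses, a mismatch means the recluster has not yet taken effect on every node.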
Remove nodes and update the roster
This section describes how to remove a node from an existing namespace configured with SC.
Do not simultaneously remove a number of nodes equal to or greater than your replication-factor. Removing too many nodes simultaneously may result in unavailable partitions, meaning some data is unavailable to the application. If you observe unavailable partitions, re-add the nodes and wait for the cluster to synchronize before proceeding.
Namespaces with replication-factor
set to 1 (RF1) have some partitions unavailable
whenever any node leaves the cluster, making it impractical to
perform a rolling restart or upgrade.
Strong consistency guarantees that with replication factor N, N copies of each record are written to the cluster. A fully formed cluster must therefore contain X nodes, where X >= N. If X = N, all partitions become unavailable whenever a single node shuts down.
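The sizing rule above reduces to simple arithmetic; this sketch (an illustrative helper, not an Aerospike tool, and ignoring rack-aware placement) shows the check for surviving a single-node shutdown:

```shell
# Sketch: can this SC cluster tolerate one node going down?
# X = node count, N = replication-factor; keeping all partitions
# available through a single-node shutdown requires X > N.
can_lose_one_node() {
  X=$1; N=$2
  [ "$X" -gt "$N" ]
}

can_lose_one_node 6 2 && echo "6 nodes, RF 2: safe to stop one node"
can_lose_one_node 2 2 || echo "2 nodes, RF 2: stopping one node makes partitions unavailable"
```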
Remove node from the cluster
Tools package 6.2.x or later is required to use the asadm manage roster and show roster commands. With earlier versions, use the equivalent asinfo roster: and roster-set: info commands.
Run show roster and verify that all roster nodes are present in the cluster.
Admin+> pager on
Admin+> show roster
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Roster (2021-10-23 00:08:54 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Node | Node ID | Namespace| Current Roster | Pending Roster | Observed Nodes
node1.aerospike.com:3000 |BB9070016AE4202 | test |BB9070016AE4202,BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202 |BB9070016AE4202,BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202 |BB9070016AE4202,BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202
node3.aerospike.com:3000 |BB9060016AE4202 | test |BB9070016AE4202,BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202 |BB9070016AE4202,BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202 |BB9070016AE4202,BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202
node4.aerospike.com:3000 |BB9050016AE4202 | test |BB9070016AE4202,BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202 |BB9070016AE4202,BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202 |BB9070016AE4202,BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202
node5.aerospike.com:3000 |BB9040016AE4202 | test |BB9070016AE4202,BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202 |BB9070016AE4202,BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202 |BB9070016AE4202,BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202
node6.aerospike.com:3000 |BB9010016AE4202 | test |BB9070016AE4202,BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202 |BB9070016AE4202,BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202 |BB9070016AE4202,BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202
node7.aerospike.com:3000 |*BB9020016AE4202| test |BB9070016AE4202,BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202 |BB9070016AE4202,BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202 |BB9070016AE4202,BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202
Number of rows: 6
Safely shut down the nodes to be removed with the following command (in this example on node1.aerospike.com:3000 with node ID BB9070016AE4202). Verify that you are removing fewer nodes than your configured replication factor.
Note: Any operational procedure that executes a SIGTERM is safe. After startup, a SIGTERM flushes data to disk and properly signals the other servers.
systemctl stop aerospike
When migrations are complete, the migrate_partitions_remaining statistic is zero on all nodes. Run the following command periodically until it reports zero on every node.
Admin> show stat service like partitions_remaining -flip
~~~~~Service Statistics (2021-10-23 00:25:10 UTC)~~~~
Node|migrate_partitions_remaining
node2.aerospike.com:3000| 0
node4.aerospike.com:3000| 0
node5.aerospike.com:3000| 0
node6.aerospike.com:3000| 0
node7.aerospike.com:3000| 0
Number of rows: 5
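Waiting for migrations can be scripted as a polling loop. The sketch below wraps any command that prints the current migrate_partitions_remaining value; the asinfo pipeline in the comment is one assumed way to produce it.

```shell
# Sketch: block until migrations finish. "$@" is any command that prints
# a node's current migrate_partitions_remaining value, for example
# (assumption: asinfo is on PATH):
#   asinfo -h <host> -v 'statistics' -l | awk -F= '$1=="migrate_partitions_remaining"{print $2}'
wait_for_migrations() {
  until [ "$("$@" 2>/dev/null)" = "0" ]; do
    sleep 10
  done
}

# Usage against a live node (illustrative):
# wait_for_migrations sh -c "asinfo -h node2.aerospike.com -v 'statistics' -l | awk -F= '\$1==\"migrate_partitions_remaining\"{print \$2}'"
```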
Remove node from the roster
Run show roster to verify that BB9070016AE4202 no longer appears in the Observed Nodes column. In the following example there is one fewer node in the Observed Nodes column than in the Current Roster and Pending Roster columns.
Admin+> pager on
Admin+> show roster
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Roster (2021-10-23 00:26:56 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Node| Node ID|Namespace| Current Roster| Pending Roster| Observed Nodes
node1.aerospike.com:3000|BB9070016AE4202 |test |BB9070016AE4202,BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202|BB9070016AE4202,BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202|BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202
node3.aerospike.com:3000|BB9060016AE4202 |test |BB9070016AE4202,BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202|BB9070016AE4202,BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202|BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202
node4.aerospike.com:3000|BB9050016AE4202 |test |BB9070016AE4202,BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202|BB9070016AE4202,BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202|BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202
node5.aerospike.com:3000|BB9040016AE4202 |test |BB9070016AE4202,BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202|BB9070016AE4202,BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202|BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202
node6.aerospike.com:3000|BB9010016AE4202 |test |BB9070016AE4202,BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202|BB9070016AE4202,BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202|BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202
node7.aerospike.com:3000|*BB9020016AE4202|test |BB9070016AE4202,BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202|BB9070016AE4202,BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202|BB9060016AE4202,BB9050016AE4202,BB9040016AE4202,BB9020016AE4202,BB9010016AE4202
Number of rows: 6
Copy the Observed Nodes list into the Pending Roster.
Admin+> manage roster stage observed ns test
Pending roster now contains observed nodes.
Run "manage recluster" for your changes to take affect.
Run manage recluster to apply the change.
Admin+> manage recluster
Successfully started recluster
Check whether startup is complete. Run the following command periodically until it returns ok.
asinfo -h [ip of host] -v 'status'
Avoid shutting down or restarting a node while it is still starting up. For more information, see Dead Partitions.
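The periodic status check can be scripted; this sketch wraps any command that prints the node's status response and returns once it prints ok:

```shell
# Sketch: wait until a node reports startup complete.
# "$@" is any command that prints the node's status, e.g. (assumption:
# asinfo is on PATH):
#   asinfo -h <ip of host> -v 'status'
wait_for_ok() {
  until [ "$("$@" 2>/dev/null)" = "ok" ]; do
    sleep 5
  done
}

# Usage against a live node (illustrative address):
# wait_for_ok asinfo -h 10.0.0.1 -v 'status'
```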
Partition availability
In cases such as network partition and server hardware failure, partitions can become unavailable or dead. This section describes how to validate partitions and how to manage partitions that are unavailable or dead.
Unavailable partitions
Unavailable partitions mean that some data may be inaccessible because nodes expected in the roster are not currently in the cluster.
If roster nodes are missing from the cluster, take the initial step of restoring the missing nodes by fixing hardware or network issues, or by changing the roster.
Dead partitions
Dead partitions mean that all the roster nodes are in the cluster, but some partitions are still unavailable due to storage failure or the potential loss of buffered writes.
Certain hardware failures, combined with certain Aerospike configurations, may result in one of the following cases where data has been lost:
- Drive loss, such as user actions to erase drives (clearing or wiping), or catastrophic hardware failure.
- An untrusted or unclean shutdown (crash, SIGKILL, SIGSEGV, and others) when the commit-to-device configuration option is not set.
- A shutdown during a Fast Restart. This is not a "clean shutdown"; it does not lead to data loss but may lead to dead partitions.
There may be lost data in these cases, depending on the number of nodes simultaneously affected, the replication factor, and whether data migration has completed for the partitions in question.
Aerospike detects multiple failures and potential true data loss. Affected partitions are marked "dead" and require user intervention before use can continue, because allowing reads in this circumstance would violate strong consistency.
Unclean shutdown
The effect of an unclean shutdown can be limited by setting the commit-to-device namespace option. With this option, simultaneous crashes do not cause data loss and never generate dead partitions.
However, enabling commit-to-device forces a flush to storage on every write, which carries a significant performance penalty except in low write-throughput cases or on high-performance storage such as some direct-attached NVMe drives or some high-performance enterprise SANs.
If commit-to-device is not enabled, server crashes may cause dead partitions. User-initiated restarts (such as upgrades) should not cause this effect, because a clean shutdown flushes buffered writes and other metadata to disk even without commit-to-device.
Detected potential data loss
In the case of detected potential data loss, the cluster keeps the affected partitions dead and unavailable so that you can take corrective action. Because Aerospike alerts you to these conditions, consistent reads remain guaranteed while you intervene.
You may determine that no data has actually been lost. For example, after a shutdown during a Fast Restart on a test cluster, you may decide that availability is preferable despite the potential data loss. Or you may decide to restore data from an external source, if available.
If you have an external trusted source, consider disabling applications, reviving the potentially flawed namespace, restoring from the external trusted source, and then re-enabling applications.
Auto revive
Database 7.1 introduced the auto-revive feature, which selectively revives partitions at startup. Auto revive is intended for deployments that have no remediation options available or that want to minimize downtime due to potential data loss. Auto revive is selective: it only revives partitions that could have data.
Consider a case where two nodes in a strong-consistency, RF2 namespace are shut down uncleanly, for instance via SIGKILL. The cluster detects possible data loss, which may result in dead partitions. The partitions are "unavailable" until the nodes are added back to the cluster; only when all roster nodes are in the cluster do the partitions report as dead. You must then follow remediation steps, including issuing the revive command to the cluster.
With auto-revive enabled, the cluster ignores this potential data-loss event and continues to allow client traffic.
However, if the two nodes in this scenario had their data wiped, then auto revive would not restore the affected partitions and would still require that you take remediation steps which include issuing the revive command.
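A minimal namespace stanza with auto revive enabled might look like the sketch below. This assumes the auto-revive namespace parameter introduced in Database 7.1; the namespace name and other settings are illustrative.

```
namespace test {
    replication-factor 2
    strong-consistency true
    auto-revive true
    # ... storage and other namespace settings ...
}
```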
Validate partitions
When you validate partitions, each node reports the global number of dead or unavailable partitions. For example, if the entire cluster has determined that 100 partitions are unavailable, all of the current nodes report 100 unavailable partitions.
Use the following command to display which nodes report dead or unavailable partitions.
show stat namespace for [NAMESPACE NAME] like 'unavailable|dead' -flip
In the output of this command, the unavailable and dead columns should be 0 for each node.
Admin> show stat namespace for test like 'unavailable|dead' -flip
~~~~~~test Namespace Statistics (2021-10-23 00:36:43 UTC)~~~~~~~
Node|dead_partitions|unavailable_partitions
node1.aerospike.com:3000| 0| 0
node2.aerospike.com:3000| 0| 0
node4.aerospike.com:3000| 0| 0
node5.aerospike.com:3000| 0| 0
node6.aerospike.com:3000| 0| 0
Number of rows: 5
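The same validation can be scripted for monitoring. In the sketch below, the sample string stands in for a live namespace-stats response (for example from asinfo -v 'namespace/test'); it is not actual cluster output.

```shell
# Sketch: alert when a namespace reports dead or unavailable partitions.
# ns_stats stands in for a live response such as: asinfo -v 'namespace/test'
ns_stats="ns_cluster_size=5;dead_partitions=0;unavailable_partitions=0"

# Extract the value of field $1 from the semicolon-delimited stats string.
field() { echo "$ns_stats" | tr ';' '\n' | awk -F= -v k="$1" '$1==k {print $2}'; }

dead=$(field dead_partitions)
unavail=$(field unavailable_partitions)

if [ "$dead" -eq 0 ] && [ "$unavail" -eq 0 ]; then
  echo "all partitions available"
else
  echo "ALERT: dead=$dead unavailable=$unavail"
fi
```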
Revive dead partitions
You may wish to use your namespace in spite of potentially missing data. For example, you may have entered a maintenance state where you have disabled application use, and are preparing to reapply data from a reliable message queue or other source.
Identify dead partitions.
Admin> show stat namespace for test like dead -flip
~test Namespace Statistics (2021-10-23 00:38:41 UTC)~
Node|dead_partitions
node1.aerospike.com:3000| 264
node2.aerospike.com:3000| 264
node4.aerospike.com:3000| 264
node5.aerospike.com:3000| 264
node6.aerospike.com:3000| 264
Number of rows: 5
Run revive to acknowledge the potential data loss on each server.
Admin+> manage revive ns test
~~~Revive Namespace Partitions~~~
Node|Response
node1.aerospike.com:3000|ok
node2.aerospike.com:3000|ok
node4.aerospike.com:3000|ok
node5.aerospike.com:3000|ok
node6.aerospike.com:3000|ok
Number of rows: 5
Run recluster to revive the dead partitions.
Admin+> manage recluster
Successfully started recluster
Verify that there are no longer any dead partitions with the dead_partitions metric.
Admin> show stat namespace for test like dead -flip
~test Namespace Statistics (2021-10-23 00:40:41 UTC)~
Node|dead_partitions
node1.aerospike.com:3000| 0
node2.aerospike.com:3000| 0
node4.aerospike.com:3000| 0
node5.aerospike.com:3000| 0
node6.aerospike.com:3000| 0
Number of rows: 5
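With tools packages older than 6.2.x, the same revive flow can be driven with asinfo using the revive: and recluster: info commands. The sketch below is illustrative: the run_info indirection exists only so the sequence can be exercised without a live cluster, and the hostnames are placeholders.

```shell
# Sketch: revive a namespace's dead partitions with asinfo.
# revive: must be issued on every node, then recluster: once.
revive_namespace() {
  # $1: namespace name; remaining args: node hostnames.
  # run_info is the command used to send info requests; it defaults
  # to asinfo (assumption: asinfo is on PATH).
  ns=$1; shift
  : "${run_info:=asinfo}"
  for host in "$@"; do
    "$run_info" -h "$host" -v "revive:namespace=$ns"
  done
  "$run_info" -v 'recluster:'
}

# Usage (illustrative hostnames):
# revive_namespace test node1.aerospike.com node2.aerospike.com
```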