Cluster consistency
This page describes how to manage the cluster nodes of a strong consistency (SC) namespace in the Aerospike Database.
Add nodes to the cluster and roster
Use the asadm `manage roster` and `show roster` commands. Alternatively, use the equivalent asinfo `roster` and `roster-set` commands.
1. Install and configure Aerospike on the new nodes as described in Configure strong consistency.

2. When the nodes have joined the cluster, use the following command to verify that the result in `cluster_size` is greater than the result in `ns_cluster_size`. In the following output, all 4 nodes report `cluster_size: 4`, but only the 3 existing roster members report `ns_cluster_size: 3`. The newly added node (node2) reports `ns_cluster_size: 0` because it is not yet on the roster.

   ```
   Admin> show stat -flip like cluster_size
   ~Service Statistics (2026-04-14 00:32:38 UTC)~
         Node|cluster_size
   node1:3000|           4
   node2:3000|           4
   node3:3000|           4
   node4:3000|           4
   Number of rows: 4

   ~test Namespace Statistics (2026-04-14 00:32:38 UTC)~
         Node|ns_cluster_size
   node1:3000|              3
   node2:3000|              0
   node3:3000|              3
   node4:3000|              3
   Number of rows: 4
   ```

3. Use `show roster` to see the newly observed nodes in its Observed Nodes section.

4. Use the following command to copy the `Observed Nodes` list into the `Pending Roster`.

   ```
   Admin> enable
   Admin+> manage roster stage observed ns test
   Pending roster now contains observed nodes.
   Run "manage recluster" for your changes to take affect.
   ```

5. Activate the new roster with the `manage recluster` command.

   ```
   Admin+> manage recluster
   Successfully started recluster
   ```

6. Run `show roster` to confirm that the roster has been updated on all nodes. Verify that the service's `cluster_size` matches the namespace's `ns_cluster_size`.
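When scripting this check without asadm, the same statistics can be read with asinfo and filtered. A minimal sketch (the `get_stat` helper name is this example's own; the statistic names come from the output above):

```shell
# Illustrative helper: pull one "name=value" statistic out of line-per-stat
# asinfo output, e.g. `asinfo -v 'statistics' -l` (service context) or
# `asinfo -v 'namespace/test' -l` (namespace context).
get_stat() {
  awk -F= -v key="$1" '$1 == key { print $2 }'
}

# Example with canned input; normally you would pipe live asinfo output.
printf 'cluster_size=4\nns_cluster_size=0\n' | get_stat ns_cluster_size   # → 0
```

A node whose `ns_cluster_size` differs from the service `cluster_size` has not yet joined the namespace roster.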
Remove nodes and update the roster
This section describes how to remove a node from an existing namespace configured with SC.
Remove node from the cluster
1. Run `show roster`. Verify that all roster nodes are present in the cluster.

   ```
   Admin+> show roster
   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Roster (2026-04-14 00:32:55 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         Node|         Node ID|Namespace|Current Roster|Pending Roster|Observed Nodes
   node1:3000|*BB978F2CCF9F18A|test     |BB978F2CCF9F18A,BB957376A7ADD8E,BB9547249D8CE66,BB919C6ABA304FA|BB978F2CCF9F18A,BB957376A7ADD8E,BB9547249D8CE66,BB919C6ABA304FA|BB978F2CCF9F18A,BB957376A7ADD8E,BB9547249D8CE66,BB919C6ABA304FA
   node2:3000| BB9547249D8CE66|test     |BB978F2CCF9F18A,BB957376A7ADD8E,BB9547249D8CE66,BB919C6ABA304FA|BB978F2CCF9F18A,BB957376A7ADD8E,BB9547249D8CE66,BB919C6ABA304FA|BB978F2CCF9F18A,BB957376A7ADD8E,BB9547249D8CE66,BB919C6ABA304FA
   node3:3000| BB919C6ABA304FA|test     |BB978F2CCF9F18A,BB957376A7ADD8E,BB9547249D8CE66,BB919C6ABA304FA|BB978F2CCF9F18A,BB957376A7ADD8E,BB9547249D8CE66,BB919C6ABA304FA|BB978F2CCF9F18A,BB957376A7ADD8E,BB9547249D8CE66,BB919C6ABA304FA
   node4:3000| BB957376A7ADD8E|test     |BB978F2CCF9F18A,BB957376A7ADD8E,BB9547249D8CE66,BB919C6ABA304FA|BB978F2CCF9F18A,BB957376A7ADD8E,BB9547249D8CE66,BB919C6ABA304FA|BB978F2CCF9F18A,BB957376A7ADD8E,BB9547249D8CE66,BB919C6ABA304FA
   Number of rows: 4
   ```

2. Safely shut down the nodes to be removed with the following command (in this example, on node2:3000 with node ID BB9547249D8CE66). Verify that you are removing fewer nodes than your configured replication factor (RF).

   ```
   systemctl stop aerospike
   ```

3. Wait for migrations to complete. Run the following command periodically until the `migrate_partitions_remaining` statistic is zero on all remaining nodes.

   ```
   Admin> show stat service like partitions_remaining -flip
   ~~~~~Service Statistics (2026-04-14 00:33:10 UTC)~~~~
         Node|migrate_partitions_remaining
   node1:3000|                           0
   node3:3000|                           0
   node4:3000|                           0
   Number of rows: 3
   ```
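To script the wait, a small filter over the per-node values can gate a polling loop. This is a sketch: the `migrations_done` name is made up, and the input format (node, value, one per line) mirrors the flipped table above.

```shell
# Illustrative: succeed only if every input line's second field is 0,
# i.e. no node still reports migrate_partitions_remaining > 0.
migrations_done() {
  awk '$2 + 0 != 0 { pending = 1 } END { exit pending }'
}

# A polling loop might then look like this, where collect_migration_stats
# is a placeholder for whatever command produces "node value" lines:
# while ! collect_migration_stats | migrations_done; do sleep 10; done
```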
Remove node from the roster
1. Run `show roster` to verify that BB9547249D8CE66 is no longer in the `Observed Nodes`. In the following example there is one fewer node in the `Observed Nodes` column than in `Current Roster` and `Pending Roster`.

   ```
   Admin+> show roster
   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Roster (2026-04-14 00:34:56 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         Node|         Node ID|Namespace|Current Roster|Pending Roster|Observed Nodes
   node1:3000|*BB978F2CCF9F18A|test     |BB978F2CCF9F18A,BB957376A7ADD8E,BB9547249D8CE66,BB919C6ABA304FA|BB978F2CCF9F18A,BB957376A7ADD8E,BB9547249D8CE66,BB919C6ABA304FA|BB978F2CCF9F18A,BB957376A7ADD8E,BB919C6ABA304FA
   node3:3000| BB919C6ABA304FA|test     |BB978F2CCF9F18A,BB957376A7ADD8E,BB9547249D8CE66,BB919C6ABA304FA|BB978F2CCF9F18A,BB957376A7ADD8E,BB9547249D8CE66,BB919C6ABA304FA|BB978F2CCF9F18A,BB957376A7ADD8E,BB919C6ABA304FA
   node4:3000| BB957376A7ADD8E|test     |BB978F2CCF9F18A,BB957376A7ADD8E,BB9547249D8CE66,BB919C6ABA304FA|BB978F2CCF9F18A,BB957376A7ADD8E,BB9547249D8CE66,BB919C6ABA304FA|BB978F2CCF9F18A,BB957376A7ADD8E,BB919C6ABA304FA
   Number of rows: 3
   ```

2. Copy the `Observed Nodes` list into the `Pending Roster`.

   ```
   Admin+> manage roster stage observed ns test
   Pending roster now contains observed nodes.
   Run "manage recluster" for your changes to take affect.
   ```

3. Run `manage recluster` to apply the change.

   ```
   Admin+> manage recluster
   Successfully started recluster
   ```

4. Check whether startup is complete. Run the following command periodically until it returns `ok`.

   ```
   asinfo -h [ip of host] -v 'status'
   ```
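The periodic check is easy to wrap in a loop. A sketch (the `wait_for_ok` name and the two-second interval are this example's own choices):

```shell
# Illustrative: run the given status command repeatedly until it prints "ok".
wait_for_ok() {
  until [ "$("$@")" = "ok" ]; do
    sleep 2
  done
}

# Intended use, with the host placeholder from the original command:
# wait_for_ok asinfo -h [ip of host] -v 'status'
```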
Planned maintenance
For rolling upgrades and planned maintenance, such as OS patching or host reboots on SC namespaces, process one node at a time. In multi-AZ rack-aware deployments, you can process one rack at a time.
During planned maintenance, the same nodes return to the cluster after the upgrade or reboot, so the roster does not need to be updated. The roster only changes when you permanently add nodes or remove nodes. If a node fails to rejoin after maintenance and must be replaced, see Remove nodes and update the roster for how to update the roster, and Revive dead partitions if dead partitions result. If a procedure is stuck on a verification step or you encounter unexpected behavior, see Troubleshooting, search the Support Knowledge Base, or open a support case.
Before you begin
Configure migrate-fill-delay on every node to a value that exceeds the expected time for a single node to complete maintenance and rejoin. This suppresses unnecessary “fill” migrations to stand-in (non-roster) replicas while a node is temporarily out of the cluster. In SC namespaces, migrate-fill-delay only affects non-roster replicas; roster replica migrations proceed immediately. See Delay migrations for details.
If you set migrate-fill-delay dynamically, the value reverts to the static configuration on node restart. Since planned maintenance involves restarting nodes, set this value in the configuration file (aerospike.conf) so it persists across restarts.
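As a sketch, a static setting in aerospike.conf might look like the following, assuming `migrate-fill-delay` in the service context; the 3600-second value is illustrative and should be sized to exceed your own per-node maintenance window:

```
service {
    # Suppress fill migrations for one hour (illustrative value) so that a
    # node completing maintenance rejoins before fills begin.
    migrate-fill-delay 3600
}
```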
Upgrading Aerospike (asd restart only, no host reboot)
When only the Aerospike daemon is restarted, for example during a rolling software upgrade, the node can warm restart because the shared memory segments holding the primary and secondary indexes survive. Process one node at a time:
1. Quiesce the node, then trigger a recluster.

   ```
   Admin+> manage quiesce with <node-ip>
   Admin+> manage recluster
   ```

   Verify: `show statistics like pending_quiesce` shows `true` on the target:

   ```
   Admin+> show statistics like pending_quiesce
   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~test Namespace Statistics (2026-04-14 00:28:10 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   Node           |node1:3000|node2:3000|node3:3000|node4:3000
   pending_quiesce|false     |true      |false     |false
   Number of rows: 2
   ```

   After recluster, `show statistics like quiesce` shows `effective_is_quiesced: true` on the target and `nodes_quiesced: 1` on all nodes:

   ```
   Admin+> show statistics like quiesce
   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~test Namespace Statistics (2026-04-14 00:28:20 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   Node                 |node1:3000|node2:3000|node3:3000|node4:3000
   effective_is_quiesced|false     |true      |false     |false
   nodes_quiesced       |1         |1         |1         |1
   pending_quiesce      |false     |true      |false     |false
   Number of rows: 4
   ```

2. Wait for the quiesce handoff to complete. The quiesced node hands off master status. Wait until no active traffic or proxies are hitting the quiesced node.

   ```
   Admin+> show latencies
   ```

   Verify: ops/sec drops to zero on the quiesced node, then confirm `client_proxy_*` and `batch_sub_proxy_*` counters stop incrementing on the quiesced node, and `from_proxy_*` counters stop on the remaining nodes. See the quiesce verification reference for full output examples.

3. Shut down `asd`, perform the upgrade, and restart `asd`. The node warm restarts.

   ```
   $ sudo systemctl stop aerospike
   # ... perform upgrade ...
   $ sudo systemctl start aerospike
   ```

   Verify: `info network` shows the node has rejoined at the expected cluster size.

   ```
   Admin> info network
   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Network Information (2026-04-14 00:29:18 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         Node|         Node ID|           IP|    Build|Migrations|~~~~~~~~~~~~~~~~~~Cluster~~~~~~~~~~~~~~~~~~|Client|  Uptime
             |                |             |         |          |Size|         Key|Integrity|      Principal| Conns|
   node1:3000| BB94B7AEB45DB52|10.0.3.1:3000|E-8.1.1.2|     0.000|   4|AA5AF50552AF|True     |BB9D787F6BAF3D6|     5|00:20:34
   node2:3000|*BB9D787F6BAF3D6|10.0.3.2:3000|E-8.1.1.2|     0.000|   4|AA5AF50552AF|True     |BB9D787F6BAF3D6|     5|00:00:14
   node3:3000| BB989A1BF1D8116|10.0.3.3:3000|E-8.1.1.2|     0.000|   4|AA5AF50552AF|True     |BB9D787F6BAF3D6|     5|00:20:34
   node4:3000| BB9D48DF5A70CEE|10.0.3.4:3000|E-8.1.1.2|     0.000|   4|AA5AF50552AF|True     |BB9D787F6BAF3D6|     5|00:20:34
   Number of rows: 4
   ```

4. Validate that the cluster has no unavailable or dead partitions.

5. Repeat from step 1 for the next node.
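The per-node sequence above lends itself to scripting. A sketch using asadm's `-e` option to run commands non-interactively (the `run` wrapper and `DRY_RUN` switch are this sketch's own conventions, not Aerospike tooling, and exact asadm flags may vary by tools version):

```shell
# Illustrative rolling-upgrade iteration for one node. Set DRY_RUN=1 to
# print the commands instead of executing them against a live cluster.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

upgrade_node() {
  node_ip=$1
  run asadm -e "enable; manage quiesce with $node_ip"
  run asadm -e "enable; manage recluster"
  # ... wait for the quiesce handoff here, as described above ...
  run sudo systemctl stop aerospike
  # ... install the new package here ...
  run sudo systemctl start aerospike
}
```

The validation steps (no unavailable or dead partitions, node rejoined at the expected cluster size) still need to pass before moving to the next node.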
Planned maintenance with host reboot
When the host itself is rebooted, shared memory is wiped and the node cold restarts unless the primary index is persisted beforehand. Use ASMT to avoid a cold restart and its side effects, such as potential unreplicated records and zombie records. Process one node at a time:
1. Quiesce the node, then trigger a recluster.

   ```
   Admin+> manage quiesce with <node-ip>
   Admin+> manage recluster
   ```

   Verify: same as rolling upgrade step 1.

2. Wait for the quiesce handoff to complete.

   ```
   Admin+> show latencies
   ```

   Verify: same as rolling upgrade step 2.

3. Shut down `asd`.

   ```
   $ sudo systemctl stop aerospike
   ```

4. Back up the indexes of each namespace with `asmt`. The `-z` option enables compression, which is recommended for planned maintenance.

   ```
   $ sudo asmt -b -v -z -p <path-to-backup-directory> -n <ns1, ns2, ...>
   ```

   See Backing up indexes with ASMT for full output details.

5. Reboot the host and perform OS or hardware maintenance.

6. After the host is back, restore the indexes of each namespace with `asmt`. The `-z` option is not needed; ASMT auto-detects compressed files.

   ```
   $ sudo asmt -r -v -p <path-to-backup-directory> -n <ns1, ns2, ...>
   ```

   See Restoring indexes with ASMT for full output details.

7. Start `asd`. The node warm restarts from the restored index instead of cold restarting. The Aerospike log confirms this with `beginning warm restart` for each namespace (instead of `beginning cold start`).

   ```
   $ sudo systemctl start aerospike
   ```

   Verify: `info network` shows the node has rejoined at the expected cluster size.

   ```
   Admin> info network
   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Network Information (2026-04-14 00:35:42 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         Node|         Node ID|           IP|    Build|Migrations|~~~~~~~~~~~~~~~~~~Cluster~~~~~~~~~~~~~~~~~~|Client|  Uptime
             |                |             |         |          |Size|         Key|Integrity|      Principal| Conns|
   node1:3000| BB94B7AEB45DB52|10.0.3.1:3000|E-8.1.1.2|     0.000|   4|3866DA39491B|True     |BB9D48DF5A70CEE|     5|00:23:37
   node2:3000| BB90B0CA8BD688A|10.0.3.2:3000|E-8.1.1.2|     0.000|   4|3866DA39491B|True     |BB9D48DF5A70CEE|     5|00:00:14
   node3:3000| BB989A1BF1D8116|10.0.3.3:3000|E-8.1.1.2|     0.000|   4|3866DA39491B|True     |BB9D48DF5A70CEE|     5|00:23:37
   node4:3000|*BB9D48DF5A70CEE|10.0.3.4:3000|E-8.1.1.2|     0.000|   4|3866DA39491B|True     |BB9D48DF5A70CEE|     5|00:23:37
   Number of rows: 4
   ```

8. Validate that the cluster has no unavailable or dead partitions.

9. Repeat from step 1 for the next node.
Rack at a time (multi-AZ rack-aware deployments)
In a rack-aware cluster deployed across multiple availability zones, you can take down a full rack at a time instead of one node at a time, provided the remaining racks can maintain partition availability.
Availability requirements
The remaining racks must hold enough roster replicas to keep all partitions available while the target rack is down. In SC, a partition is available when the surviving nodes form a majority of the roster and at least one roster replica for that partition is among them. This depends on the number of racks and the replication-factor (RF):
- 3+ racks, RF >= number of racks (for example, 3 AZs with RF=3): Every partition has a roster replica on every rack. One rack down leaves a clear majority of roster nodes, each holding a replica. All partitions remain available. No special configuration is needed.
- 3+ racks, RF < number of racks (for example, 3 AZs with RF=2): Each partition has replicas on RF distinct racks. One rack down still leaves a majority of roster nodes, and because rack-aware placement spreads each partition's replicas across different racks, at least one roster replica for every partition survives. All partitions remain available. Do not take down more than one rack at a time. Losing two of three racks removes the majority and causes unavailable partitions.
- 2 racks (any RF): Taking down one rack leaves exactly half the roster, so there is no majority. Partitions whose roster-master is on the downed rack become unavailable (~50%). Use the `active-rack` optimization below to maintain full availability during planned maintenance.
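The majority arithmetic behind these cases can be sketched as a quick check. This is illustrative only and assumes equally sized racks; the function name is made up:

```shell
# Illustrative: with `racks` equally sized racks of `per_rack` roster nodes,
# does taking `down` racks offline leave a strict majority of the roster?
majority_survives() {
  racks=$1; per_rack=$2; down=$3
  total=$(( racks * per_rack ))
  remaining=$(( (racks - down) * per_rack ))
  [ $(( remaining * 2 )) -gt "$total" ]
}

# Three racks, one down: 2/3 of the roster remains, a strict majority.
# Two racks, one down: exactly half remains, so no majority.
```

Note that a surviving majority is necessary but not sufficient: at least one roster replica of every partition must also be among the survivors, which is what the rack-aware placement rules above guarantee.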
Procedure
1. Quiesce all nodes in the target rack and trigger a recluster.

   ```
   Admin+> manage quiesce with <node-ip-1>
   Admin+> manage quiesce with <node-ip-2>
   ...
   Admin+> manage recluster
   ```

   Verify: `show statistics like quiesce` shows `effective_is_quiesced: true` on the quiesced nodes and `nodes_quiesced` equals the number of quiesced nodes on all nodes.

   ```
   Admin+> show statistics like quiesce
   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~test Namespace Statistics (2026-04-14 00:41:02 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   Node                 |node1:3000|node2:3000|node3:3000|node4:3000|node5:3000|node6:3000
   effective_is_quiesced|false     |false     |true      |true      |false     |false
   nodes_quiesced       |2         |2         |2         |2         |2         |2
   pending_quiesce      |false     |false     |true      |true      |false     |false
   Number of rows: 4
   ```

2. Wait for the quiesce handoff. Verify no traffic or proxies reach the quiesced nodes.

   ```
   Admin+> show latencies
   ```

   Verify: ops/sec drops to zero on all quiesced nodes, `client_proxy_*` and `batch_sub_proxy_*` counters stop incrementing, and `from_proxy_*` counters stop on the remaining nodes. See the quiesce verification reference for full output examples.

3. Recommended: Dynamically change `cluster-name` on the nodes in that rack to a different value. This ejects them from the cluster cleanly. The nodes depart on their own rather than being detected as failed, which avoids the evade flag that would otherwise exclude them from super-majority calculations on rejoin. The static configuration file (aerospike.conf) retains the original `cluster-name`, so no file edits are needed.

   ```
   $ asinfo -v 'set-config:context=service;cluster-name=maintenance-temp' -h <node-ip>
   ```

4. Shut down `asd` on each node. If hosts will be rebooted, use ASMT to back up the indexes of each namespace before rebooting and restore them afterward.

   ```
   $ sudo systemctl stop aerospike
   # If rebooting:
   $ sudo asmt -b -v -z -p <path-to-backup-directory> -n <ns1, ns2, ...>
   # ... reboot and perform maintenance ...
   $ sudo asmt -r -v -p <path-to-backup-directory> -n <ns1, ns2, ...>
   ```

   Verify: Validate that the cluster reports zero unavailable and zero dead partitions with the rack down. This confirms the remaining racks hold a majority of the roster with at least one replica for every partition.

5. Perform maintenance on the rack's hosts.

6. Start `asd`. On startup, the node reads the original `cluster-name` from the static configuration and automatically rejoins the main cluster.

   ```
   $ sudo systemctl start aerospike
   ```

   Verify: `info network` shows all nodes have rejoined at the expected cluster size.

   ```
   Admin> info network
   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Network Information (2026-04-14 00:48:17 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         Node|         Node ID|           IP|    Build|Migrations|~~~~~~~~~~~~~~~~~~Cluster~~~~~~~~~~~~~~~~~~|Client|  Uptime
             |                |             |         |          |Size|         Key|Integrity|      Principal| Conns|
   node1:3000| BB94B7AEB45DB52|10.0.3.1:3000|E-8.1.1.2|     0.000|   6|A3F7C912DE4B|True     |BB9E312FA6C9B28|     5|01:11:25
   node2:3000|*BB9D787F6BAF3D6|10.0.3.2:3000|E-8.1.1.2|     0.000|   6|A3F7C912DE4B|True     |BB9E312FA6C9B28|     5|01:11:25
   node3:3000| BB989A1BF1D8116|10.0.3.3:3000|E-8.1.1.2|     0.000|   6|A3F7C912DE4B|True     |BB9E312FA6C9B28|     5|00:00:14
   node4:3000| BB9D48DF5A70CEE|10.0.3.4:3000|E-8.1.1.2|     0.000|   6|A3F7C912DE4B|True     |BB9E312FA6C9B28|     5|00:00:14
   node5:3000| BB9A63D2E17F490|10.0.3.5:3000|E-8.1.1.2|     0.000|   6|A3F7C912DE4B|True     |BB9E312FA6C9B28|     5|01:11:25
   node6:3000| BB9E312FA6C9B28|10.0.3.6:3000|E-8.1.1.2|     0.000|   6|A3F7C912DE4B|True     |BB9E312FA6C9B28|     5|01:11:25
   Number of rows: 6
   ```

7. Validate that the cluster has no unavailable or dead partitions and that all nodes have rejoined.

8. Wait for migrations to complete. Use `cluster-stable` to check. It returns `ERROR` while migrations are in progress and the cluster key when they are done. Run it periodically until all nodes return the same key:

   ```
   Admin+> asinfo -v 'cluster-stable:size=6;ignore-migrations=false'
   node1:3000 (10.0.3.1) returned:
   A3F7C912DE4B
   node2:3000 (10.0.3.2) returned:
   A3F7C912DE4B
   node3:3000 (10.0.3.3) returned:
   A3F7C912DE4B
   node4:3000 (10.0.3.4) returned:
   A3F7C912DE4B
   node5:3000 (10.0.3.5) returned:
   A3F7C912DE4B
   node6:3000 (10.0.3.6) returned:
   A3F7C912DE4B
   ```

9. Repeat for the next rack.
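To script the cluster-stable check, compare the key each node returns. A sketch that parses asadm-style output, where each node's key appears on the line after its `returned:` header (the `one_key` name and the input-format assumption are this example's own):

```shell
# Illustrative: given asadm asinfo output with alternating "... returned:"
# header lines and value lines, succeed only when all returned values
# (cluster keys) are identical.
one_key() {
  [ "$(grep -v 'returned:' | sort -u | wc -l)" -eq 1 ]
}

# A polling loop might then look like (command placeholder):
# until run_cluster_stable | one_key; do sleep 10; done
```

This also fails, as desired, while some nodes still return `ERROR`, since `ERROR` then differs from the keys returned by stable nodes.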
Optimization with active-rack
Starting with Database 7.2.0, the active-rack feature can both ensure full availability for two equally sized racks and shorten the procedure. When active-rack is configured, the designated active rack holds all master partitions. The passive rack holds only secondaries. This means:
- The quiesce step can be skipped for the passive rack. Since the passive rack has no masters, there is no master handoff needed, so taking it down does not cause a master gap.
- The active rack remains fully available while the passive rack is down.
With RF=2 and two racks, the active rack has no redundancy while the passive rack is down: every partition has exactly one roster replica (the master on the active rack). If any node in the active rack fails during this window, partitions mastered on that node become unavailable until it returns. Minimize the maintenance window and monitor the active rack closely.
Procedure with active-rack:
1. Designate the rack that will stay up as `active-rack`. In SC mode, this requires setting the config, reclustering, re-rostering (with `manage roster stage observed`), and reclustering again. See Change active rack for SC dynamically. Wait for migrations to complete (masters move to the active rack).

   ```
   Admin+> manage config namespace <ns> param active-rack to <rack-id>
   Admin+> manage recluster
   Admin+> manage roster stage observed ns <ns>
   Admin+> manage recluster
   ```

   The `manage roster stage observed` command prompts for interactive confirmation because the active-rack change modifies the roster prefix (for example, from no marker to `M1`). For scripted or non-interactive use, set the roster directly with `roster-set`, including the `M<rack-id>` prefix that appears in the Observed Nodes list:

   ```
   $ asinfo -v 'roster-set:namespace=<ns>;nodes=M<rack-id>|<node1>@<rack>,<node2>@<rack>,...'
   $ asadm ... -e "manage recluster"
   ```

   Verify: `asinfo -v 'cluster-stable:size=N;ignore-migrations=false'` returns the same cluster key on all nodes (migrations complete). Use `show pmap` to confirm all Primary partitions are on the active rack's nodes.

2. Recommended: Dynamically change `cluster-name` on the passive rack's nodes to eject them from the cluster. The static config retains the original `cluster-name`. See the tip above about when this step is essential.

   ```
   $ asinfo -v 'set-config:context=service;cluster-name=maintenance-temp' -h <node-ip>
   ```

3. Shut down `asd`. If hosts will be rebooted, use ASMT to back up and later restore the indexes of each namespace.

   ```
   $ sudo systemctl stop aerospike
   # If rebooting:
   $ sudo asmt -b -v -z -p <path-to-backup-directory> -n <ns1, ns2, ...>
   # ... reboot and perform maintenance ...
   $ sudo asmt -r -v -p <path-to-backup-directory> -n <ns1, ns2, ...>
   ```

4. Perform maintenance on the passive rack's hosts.

5. Start `asd`. The node reads the original `cluster-name` from static config and rejoins automatically.

   ```
   $ sudo systemctl start aerospike
   ```

   Verify: `info network` shows all nodes have rejoined at the expected cluster size.

   ```
   Admin> info network
   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Network Information (2026-04-14 01:12:45 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         Node|         Node ID|           IP|    Build|Migrations|~~~~~~~~~~~~~~~~~~Cluster~~~~~~~~~~~~~~~~~~|Client|  Uptime
             |                |             |         |          |Size|         Key|Integrity|      Principal| Conns|
   node1:3000| BB94B7AEB45DB52|10.0.3.1:3000|E-8.1.1.2|     0.000|   4|D74E19A3B82C|True     |BB9D48DF5A70CEE|     5|01:35:53
   node2:3000|*BB9D787F6BAF3D6|10.0.3.2:3000|E-8.1.1.2|     0.000|   4|D74E19A3B82C|True     |BB9D48DF5A70CEE|     5|01:35:53
   node3:3000| BB989A1BF1D8116|10.0.3.3:3000|E-8.1.1.2|     0.000|   4|D74E19A3B82C|True     |BB9D48DF5A70CEE|     5|00:00:14
   node4:3000| BB9D48DF5A70CEE|10.0.3.4:3000|E-8.1.1.2|     0.000|   4|D74E19A3B82C|True     |BB9D48DF5A70CEE|     5|00:00:14
   Number of rows: 4
   ```

6. Validate that the cluster has no unavailable or dead partitions and that all nodes have rejoined.

7. Wait for migrations to complete before switching `active-rack`. Use `cluster-stable` to check. It returns `ERROR` while migrations are in progress and the cluster key when they are done. Run it periodically until all nodes return the same key:

   ```
   Admin+> asinfo -v 'cluster-stable:size=4;ignore-migrations=false'
   node1:3000 (10.0.3.1) returned:
   D74E19A3B82C
   node2:3000 (10.0.3.2) returned:
   D74E19A3B82C
   node3:3000 (10.0.3.3) returned:
   D74E19A3B82C
   node4:3000 (10.0.3.4) returned:
   D74E19A3B82C
   ```

8. Switch `active-rack` to point to the now-maintained rack (repeat the set config / recluster / re-roster / recluster sequence). Wait for migrations to complete.

9. Repeat steps 2-8 for the other rack.

10. After both racks are maintained, disable `active-rack` to restore normal balanced partition distribution.

    ```
    Admin+> manage config namespace <ns> param active-rack to 0
    Admin+> manage recluster
    Admin+> manage roster stage observed ns <ns>
    Admin+> manage recluster
    ```

    Verify: `show pmap` shows Primary and Secondary partitions evenly distributed across all nodes.
Validate partitions
When you validate partitions, each node reports the global number of dead or unavailable partitions. For example, if the entire cluster has determined that 100 partitions are unavailable, all of the current nodes report 100 unavailable partitions.
Use `show pmap` to display the partition map. The Unavailable and Dead columns should be 0 for each node.

```
Admin> show pmap
~~~~~~~~~~~~~~~~~~~Partition Map Analysis~~~~~~~~~~~~~~~~~~
Namespace|      Node|Cluster Key |~~~~~~~~~~~~Partitions~~~~~~~~~~~~
         |          |            |Primary|Secondary|Unavailable|Dead
test     |node1:3000|A1D7BBA0D9EF|   1024|     1024|          0|   0
test     |node2:3000|A1D7BBA0D9EF|   1024|     1024|          0|   0
test     |node3:3000|A1D7BBA0D9EF|   1024|     1024|          0|   0
test     |node4:3000|A1D7BBA0D9EF|   1024|     1024|          0|   0
test     |          |            |   4096|     4096|          0|   0
Number of rows: 4
```

Revive dead partitions
You may wish to use your namespace in spite of potentially missing data. For example, you may have entered a maintenance state where you have disabled application use, and are preparing to reapply data from a reliable message queue or other source.
1. Identify dead partitions with `show pmap`.

   ```
   Admin> show pmap
   ~~~~~~~~~~~~~~~~~~~Partition Map Analysis~~~~~~~~~~~~~~~~~~
   Namespace|      Node|Cluster Key |~~~~~~~~~~~~Partitions~~~~~~~~~~~~
            |          |            |Primary|Secondary|Unavailable|Dead
   test     |node1:3000|A1D7BBA0D9EF|    915|      915|          0| 264
   test     |node2:3000|A1D7BBA0D9EF|    915|      915|          0| 264
   test     |node3:3000|A1D7BBA0D9EF|    915|      915|          0| 264
   test     |node4:3000|A1D7BBA0D9EF|    915|      915|          0| 264
   test     |          |            |   3660|     3660|          0| 264
   Number of rows: 4
   ```

2. Run `revive` to acknowledge the potential data loss on each server.

   ```
   Admin+> manage revive ns test
   ~~~Revive Namespace Partitions~~~
   Node      |Response
   node1:3000|ok
   node2:3000|ok
   node3:3000|ok
   node4:3000|ok
   Number of rows: 4
   ```

3. Run `recluster` to revive the dead partitions.

   ```
   Admin+> manage recluster
   Successfully started recluster
   ```

4. Verify that there are no longer any dead partitions with `show pmap`.

   ```
   Admin> show pmap
   ~~~~~~~~~~~~~~~~~~~Partition Map Analysis~~~~~~~~~~~~~~~~~~
   Namespace|      Node|Cluster Key |~~~~~~~~~~~~Partitions~~~~~~~~~~~~
            |          |            |Primary|Secondary|Unavailable|Dead
   test     |node1:3000|A1D7BBA0D9EF|   1024|     1024|          0|   0
   test     |node2:3000|A1D7BBA0D9EF|   1024|     1024|          0|   0
   test     |node3:3000|A1D7BBA0D9EF|   1024|     1024|          0|   0
   test     |node4:3000|A1D7BBA0D9EF|   1024|     1024|          0|   0
   test     |          |            |   4096|     4096|          0|   0
   Number of rows: 4
   ```
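The pmap verification can also be scripted. A sketch that scans pipe-delimited rows like the ones shown above (the `pmap_clean` name and the column positions, Unavailable as field 6 and Dead as field 7, are assumptions based on that layout):

```shell
# Illustrative: fail if any 7-field pipe-delimited pmap row has a nonzero
# Unavailable (field 6) or Dead (field 7) column.
pmap_clean() {
  awk -F'|' 'NF == 7 && ($6 + 0 != 0 || $7 + 0 != 0) { bad = 1 } END { exit bad }'
}
```

Header and separator lines pass through harmlessly, since their non-numeric fields evaluate to zero.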