Cluster consistency

This page describes how to manage the cluster nodes of a strong consistency (SC) namespace in the Aerospike Database.

Add nodes to the cluster and roster

Use the asadm manage roster and show roster commands, or the equivalent asinfo roster and roster-set commands.
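For reference, the asinfo equivalents of the asadm roster commands look like the following sketch. Here `asinfo` is stubbed as a shell function so the snippet runs standalone; on a real cluster, remove the stub and run the real binary against a live node. The namespace name and node IDs are placeholders.

```shell
# Stub standing in for the real asinfo binary; remove on a real cluster.
asinfo() { echo "ok"; }

asinfo -v 'roster:namespace=test'                      # like 'show roster'
asinfo -v 'roster-set:namespace=test;nodes=BB9A,BB9B'  # stage a pending roster (placeholder node IDs)
asinfo -v 'recluster:'                                 # apply, like 'manage recluster'
```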

  1. Install and configure Aerospike on the new nodes as described in Configure strong consistency.

  2. When the nodes have joined the cluster, use the following command to verify that cluster_size is greater than ns_cluster_size. In the following output, all 4 nodes report cluster_size: 4, but only the 3 existing roster members report ns_cluster_size: 3. The newly added node (node2) reports ns_cluster_size: 0 because it is not yet on the roster.

    Terminal window
    Admin> show stat -flip like cluster_size
    ~Service Statistics (2026-04-14 00:32:38 UTC)~
    Node|cluster_size
    node1:3000 | 4
    node2:3000 | 4
    node3:3000 | 4
    node4:3000 | 4
    Number of rows: 4
    ~test Namespace Statistics (2026-04-14 00:32:38 UTC)~
    Node|ns_cluster_size
    node1:3000 | 3
    node2:3000 | 0
    node3:3000 | 3
    node4:3000 | 3
    Number of rows: 4
  3. Use show roster to see the newly observed nodes in the Observed Nodes column of the output.

  4. Use the following command to copy the Observed Nodes list into the Pending Roster.

    Terminal window
    Admin> enable
    Admin+> manage roster stage observed ns test
    Pending roster now contains observed nodes.
    Run "manage recluster" for your changes to take effect.
  5. Activate the new roster with the manage recluster command.

    Terminal window
    Admin+> manage recluster
    Successfully started recluster
  6. Run show roster to confirm that the roster has been updated on all nodes. Verify that the service’s cluster_size matches the namespace’s ns_cluster_size.
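The final check in the list above can be scripted against asinfo output, which returns semicolon-separated key=value pairs. This sketch stubs `asinfo` (with assumed values matching the example) so it runs standalone; on a real cluster, run the real binary against each node.

```shell
# Stub returning asinfo-style stat strings; values are assumed for illustration.
asinfo() {
  case "$2" in
    statistics)     echo "cluster_size=4;uptime=99" ;;
    namespace/test) echo "ns_cluster_size=4;objects=100" ;;
  esac
}

# Extract one statistic from asinfo's semicolon-separated output.
get() { asinfo -v "$1" | tr ';' '\n' | awk -F= -v k="$2" '$1==k{print $2}'; }

svc=$(get statistics cluster_size)
ns=$(get namespace/test ns_cluster_size)
[ "$svc" = "$ns" ] && echo "roster update complete: cluster_size=$svc"
```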

Remove nodes and update the roster

This section describes how to remove a node from an existing namespace configured with SC.

Remove node from the cluster

  1. Run show roster. Verify all roster nodes are present in the cluster.

    Terminal window
    Admin+> show roster
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Roster (2026-04-14 00:32:55 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Node| Node ID|Namespace| Current Roster| Pending Roster| Observed Nodes
    node1:3000|*BB978F2CCF9F18A|test |BB978F2CCF9F18A,BB957376A7ADD8E,BB9547249D8CE66,BB919C6ABA304FA|BB978F2CCF9F18A,BB957376A7ADD8E,BB9547249D8CE66,BB919C6ABA304FA|BB978F2CCF9F18A,BB957376A7ADD8E,BB9547249D8CE66,BB919C6ABA304FA
    node2:3000| BB9547249D8CE66|test |BB978F2CCF9F18A,BB957376A7ADD8E,BB9547249D8CE66,BB919C6ABA304FA|BB978F2CCF9F18A,BB957376A7ADD8E,BB9547249D8CE66,BB919C6ABA304FA|BB978F2CCF9F18A,BB957376A7ADD8E,BB9547249D8CE66,BB919C6ABA304FA
    node3:3000| BB919C6ABA304FA|test |BB978F2CCF9F18A,BB957376A7ADD8E,BB9547249D8CE66,BB919C6ABA304FA|BB978F2CCF9F18A,BB957376A7ADD8E,BB9547249D8CE66,BB919C6ABA304FA|BB978F2CCF9F18A,BB957376A7ADD8E,BB9547249D8CE66,BB919C6ABA304FA
    node4:3000| BB957376A7ADD8E|test |BB978F2CCF9F18A,BB957376A7ADD8E,BB9547249D8CE66,BB919C6ABA304FA|BB978F2CCF9F18A,BB957376A7ADD8E,BB9547249D8CE66,BB919C6ABA304FA|BB978F2CCF9F18A,BB957376A7ADD8E,BB9547249D8CE66,BB919C6ABA304FA
    Number of rows: 4
  2. Safely shut down the nodes to be removed with the following command (in this example, node2:3000 with node ID BB9547249D8CE66). Make sure you remove fewer nodes than your configured replication factor (RF).

    Terminal window
    systemctl stop aerospike
  3. Wait for migrations to complete. Run the following command periodically until the migrate_partitions_remaining statistic is zero on all remaining nodes.

    Terminal window
    Admin> show stat service like partitions_remaining -flip
    ~~~~~Service Statistics (2026-04-14 00:33:10 UTC)~~~~
    Node|migrate_partitions_remaining
    node1:3000 | 0
    node3:3000 | 0
    node4:3000 | 0
    Number of rows: 3
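The wait in the step above can be automated as a polling loop. In this sketch `asinfo` is stubbed to report completed migrations so the snippet is self-contained; on a real cluster, drop the stub and point -h at each remaining node (the host names here are placeholders).

```shell
# Stub; real asinfo statistics output contains many more fields.
asinfo() { echo "migrate_partitions_remaining=0;migrate_allowed=true"; }

# Extract one statistic from a node's semicolon-separated stats output.
get_stat() { asinfo -h "$1" -v statistics | tr ';' '\n' | awk -F= -v k="$2" '$1==k{print $2}'; }

for host in node1 node3 node4; do  # placeholder host names
  until [ "$(get_stat "$host" migrate_partitions_remaining)" = "0" ]; do
    sleep 10
  done
done
echo "migrations complete"
```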

Remove node from the roster

  1. Run show roster to verify that BB9547249D8CE66 is no longer in the Observed Nodes. In the following example there is one fewer node in the Observed Nodes column than in Current Roster and Pending Roster.

    Terminal window
    Admin+> show roster
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Roster (2026-04-14 00:34:56 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Node| Node ID|Namespace| Current Roster| Pending Roster| Observed Nodes
    node1:3000|*BB978F2CCF9F18A|test |BB978F2CCF9F18A,BB957376A7ADD8E,BB9547249D8CE66,BB919C6ABA304FA|BB978F2CCF9F18A,BB957376A7ADD8E,BB9547249D8CE66,BB919C6ABA304FA|BB978F2CCF9F18A,BB957376A7ADD8E,BB919C6ABA304FA
    node3:3000| BB919C6ABA304FA|test |BB978F2CCF9F18A,BB957376A7ADD8E,BB9547249D8CE66,BB919C6ABA304FA|BB978F2CCF9F18A,BB957376A7ADD8E,BB9547249D8CE66,BB919C6ABA304FA|BB978F2CCF9F18A,BB957376A7ADD8E,BB919C6ABA304FA
    node4:3000| BB957376A7ADD8E|test |BB978F2CCF9F18A,BB957376A7ADD8E,BB9547249D8CE66,BB919C6ABA304FA|BB978F2CCF9F18A,BB957376A7ADD8E,BB9547249D8CE66,BB919C6ABA304FA|BB978F2CCF9F18A,BB957376A7ADD8E,BB919C6ABA304FA
    Number of rows: 3
  2. Copy the Observed Nodes list into the Pending Roster.

    Terminal window
    Admin+> manage roster stage observed ns test
    Pending roster now contains observed nodes.
    Run "manage recluster" for your changes to take effect.
  3. Run manage recluster to apply the change.

    Terminal window
    Admin+> manage recluster
    Successfully started recluster
  4. Check if startup is complete. Run the following command periodically until it returns ok.

    Terminal window
    asinfo -h [ip of host] -v 'status'
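The periodic check can be wrapped in a loop. `asinfo` is stubbed here so the sketch runs standalone and exits immediately; the host address is a placeholder.

```shell
# Stub standing in for the real asinfo binary; remove on a real cluster.
asinfo() { echo "ok"; }

host=10.0.3.2  # placeholder host address
until [ "$(asinfo -h "$host" -v status)" = "ok" ]; do
  sleep 5
done
echo "startup complete"
```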

Planned maintenance

For rolling upgrades and planned maintenance, such as OS patching or host reboots on SC namespaces, process one node at a time. In multi-AZ rack-aware deployments, you can process one rack at a time.

During planned maintenance, the same nodes return to the cluster after the upgrade or reboot, so the roster does not need to be updated. The roster only changes when you permanently add nodes or remove nodes. If a node fails to rejoin after maintenance and must be replaced, see Remove nodes and update the roster for how to update the roster, and Revive dead partitions if dead partitions result. If a procedure is stuck on a verification step or you encounter unexpected behavior, see Troubleshooting, search the Support Knowledge Base, or open a support case.

Before you begin

Configure migrate-fill-delay on every node to a value that exceeds the expected time for a single node to complete maintenance and rejoin. This suppresses unnecessary “fill” migrations to stand-in (non-roster) replicas while a node is temporarily out of the cluster. In SC namespaces, migrate-fill-delay only affects non-roster replicas; roster replica migrations proceed immediately. See Delay migrations for details.

If you set migrate-fill-delay dynamically, the value reverts to the static configuration on node restart. Since planned maintenance involves restarting nodes, set this value in the configuration file (aerospike.conf) so it persists across restarts.
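For example, if the expected single-node maintenance window is under an hour, the static setting might look like the following fragment (the value is an assumption; size it to your own window):

```
service {
    # Must exceed the longest expected single-node maintenance window.
    # Value is in seconds; 3600 (1 hour) is an example, not a recommendation.
    migrate-fill-delay 3600
}
```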

Upgrading Aerospike (asd restart only, no host reboot)

When only the Aerospike daemon is restarted, for example during a rolling software upgrade, the node can warm restart because the shared memory segments holding the primary index and any secondary indexes survive. Process one node at a time:

  1. Quiesce the node, then trigger a recluster.

    Terminal window
    Admin+> manage quiesce with <node-ip>
    Admin+> manage recluster

    Verify: show statistics like pending_quiesce shows true on the target:

    Terminal window
    Admin+> show statistics like pending_quiesce
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~test Namespace Statistics (2026-04-14 00:28:10 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Node |node1:3000|node2:3000|node3:3000|node4:3000
    pending_quiesce|false |true |false |false
    Number of rows: 2

    After recluster, show statistics like quiesce shows effective_is_quiesced: true on the target and nodes_quiesced: 1 on all nodes:

    Terminal window
    Admin+> show statistics like quiesce
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~test Namespace Statistics (2026-04-14 00:28:20 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Node |node1:3000|node2:3000|node3:3000|node4:3000
    effective_is_quiesced|false |true |false |false
    nodes_quiesced |1 |1 |1 |1
    pending_quiesce |false |true |false |false
    Number of rows: 4
  2. Wait for the quiesce handoff to complete. The quiesced node hands off its master partitions; wait until no active traffic or proxies reach it.

    Terminal window
    Admin+> show latencies

    Verify: ops/sec drops to zero on the quiesced node, then confirm client_proxy_* and batch_sub_proxy_* counters stop incrementing on the quiesced node, and from_proxy_* counters stop on the remaining nodes. See the quiesce verification reference for full output examples.

  3. Shut down asd, perform the upgrade, and restart asd. The node warm restarts.

    Terminal window
    $ sudo systemctl stop aerospike
    # ... perform upgrade ...
    $ sudo systemctl start aerospike

    Verify: info network shows the node has rejoined at the expected cluster size.

    Terminal window
    Admin> info network
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Network Information (2026-04-14 00:29:18 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Node| Node ID| IP| Build|Migrations|~~~~~~~~~~~~~~~~~~Cluster~~~~~~~~~~~~~~~~~~|Client| Uptime
    | | | | |Size| Key|Integrity| Principal| Conns|
    node1:3000 | BB94B7AEB45DB52|10.0.3.1:3000|E-8.1.1.2| 0.000 | 4|AA5AF50552AF|True |BB9D787F6BAF3D6| 5|00:20:34
    node2:3000 |*BB9D787F6BAF3D6|10.0.3.2:3000|E-8.1.1.2| 0.000 | 4|AA5AF50552AF|True |BB9D787F6BAF3D6| 5|00:00:14
    node3:3000 | BB989A1BF1D8116|10.0.3.3:3000|E-8.1.1.2| 0.000 | 4|AA5AF50552AF|True |BB9D787F6BAF3D6| 5|00:20:34
    node4:3000 | BB9D48DF5A70CEE|10.0.3.4:3000|E-8.1.1.2| 0.000 | 4|AA5AF50552AF|True |BB9D787F6BAF3D6| 5|00:20:34
    Number of rows: 4
  4. Validate that the cluster has no unavailable or dead partitions.

  5. Repeat from step 1 for the next node.
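The quiesce verification in step 1 can also be scripted against the namespace statistics, which include pending_quiesce and effective_is_quiesced. This sketch stubs `asinfo` with assumed values so it runs standalone; real calls go to the quiesced node.

```shell
# Stub returning an asinfo-style namespace stat string; values are assumed.
asinfo() { echo "pending_quiesce=true;effective_is_quiesced=true;nodes_quiesced=1"; }

# Extract one namespace statistic from the semicolon-separated output.
qstat() { asinfo -v 'namespace/test' | tr ';' '\n' | awk -F= -v k="$1" '$1==k{print $2}'; }

[ "$(qstat pending_quiesce)" = "true" ]       && echo "quiesce pending"
[ "$(qstat effective_is_quiesced)" = "true" ] && echo "quiesce effective after recluster"
```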

Planned maintenance with host reboot

When the host itself is rebooted, shared memory is wiped and the node cold restarts unless the primary index is persisted beforehand. Use ASMT to avoid a cold restart and its side effects, such as potential unreplicated records and zombie records. Process one node at a time:

  1. Quiesce the node, then trigger a recluster.

    Terminal window
    Admin+> manage quiesce with <node-ip>
    Admin+> manage recluster

    Verify: same as rolling upgrade step 1.

  2. Wait for the quiesce handoff to complete.

    Terminal window
    Admin+> show latencies

    Verify: same as rolling upgrade step 2.

  3. Shut down asd.

    Terminal window
    $ sudo systemctl stop aerospike
  4. Back up the indexes of each namespace with asmt. The -z option enables compression, which is recommended for planned maintenance.

    Terminal window
    $ sudo asmt -b -v -z -p <path-to-backup-directory> -n <ns1, ns2, ...>

    See Backing up indexes with ASMT for full output details.

  5. Reboot the host and perform OS or hardware maintenance.

  6. After the host is back, restore the indexes of each namespace with asmt. The -z option is not needed; ASMT auto-detects compressed files.

    Terminal window
    $ sudo asmt -r -v -p <path-to-backup-directory> -n <ns1, ns2, ...>

    See Restoring indexes with ASMT for full output details.

  7. Start asd. The node warm restarts from the restored index instead of cold restarting. The Aerospike log confirms this with beginning warm restart for each namespace (instead of beginning cold start).

    Terminal window
    $ sudo systemctl start aerospike

    Verify: info network shows the node has rejoined at the expected cluster size.

    Terminal window
    Admin> info network
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Network Information (2026-04-14 00:35:42 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Node| Node ID| IP| Build|Migrations|~~~~~~~~~~~~~~~~~~Cluster~~~~~~~~~~~~~~~~~~|Client| Uptime
    | | | | |Size| Key|Integrity| Principal| Conns|
    node1:3000 | BB94B7AEB45DB52|10.0.3.1:3000|E-8.1.1.2| 0.000 | 4|3866DA39491B|True |BB9D48DF5A70CEE| 5|00:23:37
    node2:3000 | BB90B0CA8BD688A|10.0.3.2:3000|E-8.1.1.2| 0.000 | 4|3866DA39491B|True |BB9D48DF5A70CEE| 5|00:00:14
    node3:3000 | BB989A1BF1D8116|10.0.3.3:3000|E-8.1.1.2| 0.000 | 4|3866DA39491B|True |BB9D48DF5A70CEE| 5|00:23:37
    node4:3000 |*BB9D48DF5A70CEE|10.0.3.4:3000|E-8.1.1.2| 0.000 | 4|3866DA39491B|True |BB9D48DF5A70CEE| 5|00:23:37
    Number of rows: 4
  8. Validate that the cluster has no unavailable or dead partitions.

  9. Repeat from step 1 for the next node.

Rack at a time (multi-AZ rack-aware deployments)

In a rack-aware cluster deployed across multiple availability zones, you can take down a full rack at a time instead of one node at a time, provided the remaining racks can maintain partition availability.

Availability requirements

The remaining racks must hold enough roster replicas to keep all partitions available while the target rack is down. In SC, a partition is available when the surviving nodes form a majority of the roster and at least one roster replica for that partition is among them. This depends on the number of racks and the replication-factor (RF):

  • 3+ racks, RF >= number of racks (for example, 3 AZs with RF=3): Every partition has a roster replica on every rack. One rack down leaves a clear majority of roster nodes, each holding a replica. All partitions remain available. No special configuration is needed.

  • 3+ racks, RF < number of racks (for example, 3 AZs with RF=2): Each partition has replicas on RF distinct racks. One rack down still leaves a majority of roster nodes, and because rack-aware placement spreads each partition’s replicas across different racks, at least one roster replica for every partition survives. All partitions remain available. Do not take down more than one rack at a time. Losing two of three racks removes the majority and causes unavailable partitions.

  • 2 racks (any RF): Taking down one rack leaves exactly half the roster, so there is no majority. Partitions whose roster-master is on the downed rack become unavailable (~50%). Use the active-rack optimization below to maintain full availability during planned maintenance.
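The majority rule behind these cases reduces to simple arithmetic: the surviving nodes must be a strict majority of the roster. A minimal sketch, using the rack layouts from the bullets above as examples:

```shell
# yes if the surviving node count is a strict majority of the roster size.
majority() { [ $((2 * $1)) -gt "$2" ] && echo yes || echo no; }

# 3 racks of 2 nodes, one rack down: 4 of 6 survive -> majority retained.
echo "3 racks, one down: majority=$(majority 4 6)"
# 2 racks of 3 nodes, one rack down: 3 of 6 survive -> no majority.
echo "2 racks, one down: majority=$(majority 3 6)"
```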

Procedure

  1. Quiesce all nodes in the target rack and trigger a recluster.

    Terminal window
    Admin+> manage quiesce with <node-ip-1>
    Admin+> manage quiesce with <node-ip-2>
    ...
    Admin+> manage recluster

    Verify: show statistics like quiesce shows effective_is_quiesced: true on the quiesced nodes and nodes_quiesced equals the number of quiesced nodes on all nodes.

    Terminal window
    Admin+> show statistics like quiesce
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~test Namespace Statistics (2026-04-14 00:41:02 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Node |node1:3000|node2:3000|node3:3000|node4:3000|node5:3000|node6:3000
    effective_is_quiesced|false |false |true |true |false |false
    nodes_quiesced |2 |2 |2 |2 |2 |2
    pending_quiesce |false |false |true |true |false |false
    Number of rows: 4
  2. Wait for the quiesce handoff. Verify no traffic or proxies reach the quiesced nodes.

    Terminal window
    Admin+> show latencies

    Verify: ops/sec drops to zero on all quiesced nodes, client_proxy_* and batch_sub_proxy_* counters stop incrementing, and from_proxy_* counters stop on the remaining nodes. See the quiesce verification reference for full output examples.

  3. Recommended: Dynamically change cluster-name on the nodes in that rack to a different value. This ejects them from the cluster cleanly. The nodes depart on their own rather than being detected as failed, which avoids the evade flag that would otherwise exclude them from super-majority calculations on rejoin. The static configuration file (aerospike.conf) retains the original cluster-name, so no file edits are needed.

    Terminal window
    $ asinfo -v 'set-config:context=service;cluster-name=maintenance-temp' -h <node-ip>
  4. Shut down asd on each node. If hosts will be rebooted, use ASMT to back up the indexes of each namespace before rebooting and restore them afterward.

    Terminal window
    $ sudo systemctl stop aerospike
    # If rebooting:
    $ sudo asmt -b -v -z -p <path-to-backup-directory> -n <ns1, ns2, ...>
    # ... reboot and perform maintenance ...
    $ sudo asmt -r -v -p <path-to-backup-directory> -n <ns1, ns2, ...>

    Verify: Validate that the cluster reports zero unavailable and zero dead partitions with the rack down. This confirms the remaining racks hold a majority of the roster with at least one replica for every partition.

  5. Perform maintenance on the rack’s hosts.

  6. Start asd. On startup, the node reads the original cluster-name from the static configuration and automatically rejoins the main cluster.

    Terminal window
    $ sudo systemctl start aerospike

    Verify: info network shows all nodes have rejoined at the expected cluster size.

    Terminal window
    Admin> info network
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Network Information (2026-04-14 00:48:17 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Node| Node ID| IP| Build|Migrations|~~~~~~~~~~~~~~~~~~Cluster~~~~~~~~~~~~~~~~~~|Client| Uptime
    | | | | |Size| Key|Integrity| Principal| Conns|
    node1:3000 | BB94B7AEB45DB52|10.0.3.1:3000|E-8.1.1.2| 0.000 | 6|A3F7C912DE4B|True |BB9E312FA6C9B28| 5|01:11:25
    node2:3000 |*BB9D787F6BAF3D6|10.0.3.2:3000|E-8.1.1.2| 0.000 | 6|A3F7C912DE4B|True |BB9E312FA6C9B28| 5|01:11:25
    node3:3000 | BB989A1BF1D8116|10.0.3.3:3000|E-8.1.1.2| 0.000 | 6|A3F7C912DE4B|True |BB9E312FA6C9B28| 5|00:00:14
    node4:3000 | BB9D48DF5A70CEE|10.0.3.4:3000|E-8.1.1.2| 0.000 | 6|A3F7C912DE4B|True |BB9E312FA6C9B28| 5|00:00:14
    node5:3000 | BB9A63D2E17F490|10.0.3.5:3000|E-8.1.1.2| 0.000 | 6|A3F7C912DE4B|True |BB9E312FA6C9B28| 5|01:11:25
    node6:3000 | BB9E312FA6C9B28|10.0.3.6:3000|E-8.1.1.2| 0.000 | 6|A3F7C912DE4B|True |BB9E312FA6C9B28| 5|01:11:25
    Number of rows: 6
  7. Validate that the cluster has no unavailable or dead partitions and that all nodes have rejoined.

  8. Wait for migrations to complete. Use cluster-stable to check. It returns ERROR while migrations are in progress and the cluster key when they are done. Run it periodically until all nodes return the same key:

    Terminal window
    Admin+> asinfo -v 'cluster-stable:size=6;ignore-migrations=false'
    node1:3000 (10.0.3.1) returned:
    A3F7C912DE4B
    node2:3000 (10.0.3.2) returned:
    A3F7C912DE4B
    node3:3000 (10.0.3.3) returned:
    A3F7C912DE4B
    node4:3000 (10.0.3.4) returned:
    A3F7C912DE4B
    node5:3000 (10.0.3.5) returned:
    A3F7C912DE4B
    node6:3000 (10.0.3.6) returned:
    A3F7C912DE4B
  9. Repeat for the next rack.
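The "same key on all nodes" check in step 8 can be scripted by collapsing the per-node responses to their unique values. `asinfo` is stubbed here with the example key so the sketch runs standalone; host names and the key are placeholders, and the real command returns an error instead of a key while migrations are still running.

```shell
# Stub returning the example cluster key; replace with the real binary.
asinfo() { echo "A3F7C912DE4B"; }

keys=$(for h in node1 node2 node3; do  # placeholder host names
  asinfo -h "$h" -v 'cluster-stable:size=6;ignore-migrations=false'
done | sort -u)

# Exactly one distinct key means every node agrees the cluster is stable.
[ "$(echo "$keys" | wc -l)" -eq 1 ] && echo "cluster stable, key: $keys"
```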

Optimization with active-rack

Starting with Database 7.2.0, the active-rack feature can both ensure full availability for two equally sized racks and shorten the procedure. When active-rack is configured, the designated active rack holds all master partitions. The passive rack holds only secondaries. This means:

  • The quiesce step can be skipped for the passive rack. Since the passive rack has no masters, there is no master handoff needed, so taking it down does not cause a master gap.
  • The active rack remains fully available while the passive rack is down.

With RF=2 and two racks, the active rack has no redundancy while the passive rack is down: every partition has exactly one roster replica (the master on the active rack). If any node in the active rack fails during this window, partitions mastered on that node become unavailable until it returns. Minimize the maintenance window and monitor the active rack closely.

Procedure with active-rack:

  1. Designate the rack that will stay up as active-rack. In SC mode, this requires setting the config, reclustering, re-rostering (with manage roster stage observed), and reclustering again. See Change active rack for SC dynamically. Wait for migrations to complete (masters move to the active rack).

    Terminal window
    Admin+> manage config namespace <ns> param active-rack to <rack-id>
    Admin+> manage recluster
    Admin+> manage roster stage observed ns <ns>
    Admin+> manage recluster

    The manage roster stage observed command prompts for interactive confirmation because the active-rack change modifies the roster prefix (for example, from no marker to M1). For scripted or non-interactive use, set the roster directly with roster-set, including the M<rack-id> prefix that appears in the Observed Nodes list:

    Terminal window
    $ asinfo -v 'roster-set:namespace=<ns>;nodes=M<rack-id>|<node1>@<rack>,<node2>@<rack>,...'
    $ asadm ... -e "manage recluster"

    Verify: asinfo -v 'cluster-stable:size=N;ignore-migrations=false' returns the same cluster key on all nodes (migrations complete). Use show pmap to confirm all Primary partitions are on the active rack’s nodes.

  2. Recommended: Dynamically change cluster-name on the passive rack’s nodes to eject them from the cluster. The static config retains the original cluster-name. See the tip above about when this step is essential.

    Terminal window
    $ asinfo -v 'set-config:context=service;cluster-name=maintenance-temp' -h <node-ip>
  3. Shut down asd. If hosts will be rebooted, use ASMT to back up and later restore the indexes of each namespace.

    Terminal window
    $ sudo systemctl stop aerospike
    # If rebooting:
    $ sudo asmt -b -v -z -p <path-to-backup-directory> -n <ns1, ns2, ...>
    # ... reboot and perform maintenance ...
    $ sudo asmt -r -v -p <path-to-backup-directory> -n <ns1, ns2, ...>
  4. Perform maintenance on the passive rack’s hosts.

  5. Start asd. The node reads the original cluster-name from static config and rejoins automatically.

    Terminal window
    $ sudo systemctl start aerospike

    Verify: info network shows all nodes have rejoined at the expected cluster size.

    Terminal window
    Admin> info network
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Network Information (2026-04-14 01:12:45 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Node| Node ID| IP| Build|Migrations|~~~~~~~~~~~~~~~~~~Cluster~~~~~~~~~~~~~~~~~~|Client| Uptime
    | | | | |Size| Key|Integrity| Principal| Conns|
    node1:3000 | BB94B7AEB45DB52|10.0.3.1:3000|E-8.1.1.2| 0.000 | 4|D74E19A3B82C|True |BB9D48DF5A70CEE| 5|01:35:53
    node2:3000 |*BB9D787F6BAF3D6|10.0.3.2:3000|E-8.1.1.2| 0.000 | 4|D74E19A3B82C|True |BB9D48DF5A70CEE| 5|01:35:53
    node3:3000 | BB989A1BF1D8116|10.0.3.3:3000|E-8.1.1.2| 0.000 | 4|D74E19A3B82C|True |BB9D48DF5A70CEE| 5|00:00:14
    node4:3000 | BB9D48DF5A70CEE|10.0.3.4:3000|E-8.1.1.2| 0.000 | 4|D74E19A3B82C|True |BB9D48DF5A70CEE| 5|00:00:14
    Number of rows: 4
  6. Validate that the cluster has no unavailable or dead partitions and that all nodes have rejoined.

  7. Wait for migrations to complete before switching active-rack. Use cluster-stable to check. It returns ERROR while migrations are in progress and the cluster key when they are done. Run it periodically until all nodes return the same key:

    Terminal window
    Admin+> asinfo -v 'cluster-stable:size=4;ignore-migrations=false'
    node1:3000 (10.0.3.1) returned:
    D74E19A3B82C
    node2:3000 (10.0.3.2) returned:
    D74E19A3B82C
    node3:3000 (10.0.3.3) returned:
    D74E19A3B82C
    node4:3000 (10.0.3.4) returned:
    D74E19A3B82C
  8. Switch active-rack to point to the now-maintained rack (repeat the set config / recluster / re-roster / recluster sequence). Wait for migrations to complete.

  9. Repeat steps 2-8 for the other rack.

  10. After both racks are maintained, disable active-rack to restore normal balanced partition distribution.

    Terminal window
    Admin+> manage config namespace <ns> param active-rack to 0
    Admin+> manage recluster
    Admin+> manage roster stage observed ns <ns>
    Admin+> manage recluster

    Verify: show pmap shows Primary and Secondary partitions evenly distributed across all nodes.

Validate partitions

When you validate partitions, each node reports the global number of dead or unavailable partitions. For example, if the entire cluster has determined that 100 partitions are unavailable, all of the current nodes report 100 unavailable partitions.

Use show pmap to display the partition map. The Unavailable and Dead columns should be 0 for each node.

Terminal window
Admin> show pmap
~~~~~~~~~~~~~~~~~~~Partition Map Analysis~~~~~~~~~~~~~~~~~~
Namespace| Node| Cluster Key|~~~~~~~~~~~~Partitions~~~~~~~~~~~~
| | |Primary|Secondary|Unavailable|Dead
test |node1:3000|A1D7BBA0D9EF| 1024| 1024| 0| 0
test |node2:3000|A1D7BBA0D9EF| 1024| 1024| 0| 0
test |node3:3000|A1D7BBA0D9EF| 1024| 1024| 0| 0
test |node4:3000|A1D7BBA0D9EF| 1024| 1024| 0| 0
test | | | 4096| 4096| 0| 0
Number of rows: 4
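The same validation can be scripted per namespace: the unavailable_partitions and dead_partitions namespace statistics report the global counts described above. This sketch stubs `asinfo` with healthy values so it runs standalone; real calls go to any node.

```shell
# Stub returning an asinfo-style namespace stat string; values are assumed.
asinfo() { echo "unavailable_partitions=0;dead_partitions=0;objects=100"; }

# Extract one namespace statistic from the semicolon-separated output.
get() { asinfo -v 'namespace/test' | tr ';' '\n' | awk -F= -v k="$1" '$1==k{print $2}'; }

if [ "$(get unavailable_partitions)" = "0" ] && [ "$(get dead_partitions)" = "0" ]; then
  echo "all partitions healthy"
fi
```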

Revive dead partitions

You may wish to use your namespace in spite of potentially missing data. For example, you may have entered a maintenance state where you have disabled application use, and are preparing to reapply data from a reliable message queue or other source.

  1. Identify dead partitions with show pmap.

    Terminal window
    Admin> show pmap
    ~~~~~~~~~~~~~~~~~~~Partition Map Analysis~~~~~~~~~~~~~~~~~~
    Namespace| Node| Cluster Key|~~~~~~~~~~~~Partitions~~~~~~~~~~~~
    | | |Primary|Secondary|Unavailable|Dead
    test |node1:3000|A1D7BBA0D9EF| 915| 915| 0| 264
    test |node2:3000|A1D7BBA0D9EF| 915| 915| 0| 264
    test |node3:3000|A1D7BBA0D9EF| 915| 915| 0| 264
    test |node4:3000|A1D7BBA0D9EF| 915| 915| 0| 264
    test | | | 3660| 3660| 0| 264
    Number of rows: 4
  2. Run revive to acknowledge the potential data loss on each server.

    Terminal window
    Admin+> manage revive ns test
    ~~~Revive Namespace Partitions~~~
    Node|Response
    node1:3000 |ok
    node2:3000 |ok
    node3:3000 |ok
    node4:3000 |ok
    Number of rows: 4
  3. Run recluster to revive the dead partitions.

    Terminal window
    Admin+> manage recluster
    Successfully started recluster
  4. Verify that there are no longer any dead partitions with show pmap.

    Terminal window
    Admin> show pmap
    ~~~~~~~~~~~~~~~~~~~Partition Map Analysis~~~~~~~~~~~~~~~~~~
    Namespace| Node| Cluster Key|~~~~~~~~~~~~Partitions~~~~~~~~~~~~
    | | |Primary|Secondary|Unavailable|Dead
    test |node1:3000|A1D7BBA0D9EF| 1024| 1024| 0| 0
    test |node2:3000|A1D7BBA0D9EF| 1024| 1024| 0| 0
    test |node3:3000|A1D7BBA0D9EF| 1024| 1024| 0| 0
    test |node4:3000|A1D7BBA0D9EF| 1024| 1024| 0| 0
    test | | | 4096| 4096| 0| 0
    Number of rows: 4
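For scripted use, the asinfo equivalents of the revive steps look like the following sketch. `asinfo` is stubbed so the snippet runs standalone; on a real cluster, run the revive command against every node before triggering the recluster.

```shell
# Stub standing in for the real asinfo binary; remove on a real cluster.
asinfo() { echo "ok"; }

asinfo -v 'revive:namespace=test'  # acknowledge potential data loss, per node
asinfo -v 'recluster:'             # then trigger the recluster
```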