Remove a node

Removing a node from a cluster can be as simple as stopping the node, but following the steps on this page keeps the cluster and its tools working correctly in the long run.

For example, following these steps lets you:

  • Prevent the remaining nodes in the cluster from reconnecting to the removed node if they are restarted.
  • Prevent a cluster from attempting to join another cluster when one of its previously removed nodes is recommissioned.

Best practices

It is a good practice to quiesce a node prior to shutting it down or removing it from a cluster. See Quiesce node for further details.

If the node is shipping records using XDR, it is also a good practice to wait for lag to drop to zero prior to removing the node from the cluster.

If you want to take down multiple nodes from the cluster, make sure that you start from step 1 and take one node down at a time, waiting for migrations to complete between each node, to avoid losing any data.

Asadm and the admin port

The commands on this page are run from Aerospike Admin (asadm). In Database 8.1 and later, you can run asadm to connect to port 3003, a special admin port used if the other ports are unresponsive due to excessive I/O operations.

Remove a node

  1. Ensure there are no ongoing migrations. Run info namespace inside asadm to bring up the following display. Make sure the migration-related statistics Pending Migrates (tx%,rx%) are 0 on all the nodes.

    Admin+> info namespace

    Namespace  Node             Avail%  Evictions  Master                Replica               Repl    Stop    Pending    Disk     Disk   HWM    Mem      Mem    HWM   Stop
    .          .                .       .          (Objects,Tombstones)  (Objects,Tombstones)  Factor  Writes  Migrates   Used     Used%  Disk%  Used     Used%  Mem%  Writes%
    .          .                .       .          .                     .                     .       .       (tx%,rx%)  .        .      .      .        .      .     .
    test       10.0.0.100:3000  N/E     0.000      (0.000 ,0.000 )       (0.000 ,0.000 )       2       false   (0,0)      N/E      N/E    50     0.000 B  0      60    90
    test       10.0.0.103:3000  N/E     0.000      (0.000 ,0.000 )       (0.000 ,0.000 )       2       false   (0,0)      N/E      N/E    50     0.000 B  0      60    90
    test       10.0.0.101:3000  N/E     0.000      (0.000 ,0.000 )       (0.000 ,0.000 )       2       false   (0,0)      N/E      N/E    50     0.000 B  0      60    90
    test                                0.000      (0.000 ,0.000 )       (0.000 ,0.000 )                       (0,0)      0.000 B                0.000 B
    Number of rows: 4

    You can also confirm this from the migration statistics. From the asadm prompt, run show statistics like migrate and make sure that the migrate_partitions_remaining statistic shows 0 for each node.

    Admin+> show statistics like migrate

    NODE                        : 10.0.0.100:3000  10.0.0.103:3000  10.0.0.101:3000
    migrate_allowed             : true             true             true
    migrate_partitions_remaining: 0                0                0

    See Monitoring Migrations for more details.
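
The check in this step can also be scripted. The following is a minimal Python sketch, not an official tool. It assumes the plain "name : value value ..." layout shown above (real asadm output may add table borders or per-namespace sections), and the function names parse_stat_rows and migrations_complete are hypothetical.

```python
def parse_stat_rows(output: str) -> dict[str, list[str]]:
    """Collect 'name : v1 v2 ...' rows; values of repeated names are merged."""
    rows: dict[str, list[str]] = {}
    for line in output.splitlines():
        # Split on the first colon only, so node addresses keep their ports.
        name, sep, values = line.partition(":")
        if not sep or not name.strip():
            continue  # not a statistic row
        rows.setdefault(name.strip(), []).extend(values.split())
    return rows


def migrations_complete(output: str) -> bool:
    """True only if migrate_partitions_remaining is present and 0 everywhere."""
    remaining = parse_stat_rows(output).get("migrate_partitions_remaining", [])
    # A missing statistic is treated as "not safe to proceed".
    return bool(remaining) and all(value == "0" for value in remaining)


# Sample text copied from the output shown in this step.
sample = """\
NODE : 10.0.0.100:3000 10.0.0.103:3000 10.0.0.101:3000
migrate_allowed : true true true
migrate_partitions_remaining: 0 0 0
"""
print(migrations_complete(sample))
```

Treating a missing statistic as a failure is a deliberate choice in this sketch: if the output cannot be parsed, it is safer to stop than to remove a node while migrations may still be running.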

  2. Shut down the node gracefully by stopping the Aerospike daemon.

    sudo service aerospike stop

    The shutdown is successful when you see the following log message:

    finished clean shutdown - exiting

    You can also observe the status of the Aerospike daemon with the following command:

    sudo service aerospike status
    * aerospike is not running

  3. Update the configuration on all other nodes in the cluster.

    If this node is in the seed list of other nodes, you need to update the configuration of all the other nodes to ensure that they do not try to connect to this node if they are restarted.

    Modify the configuration file sections shown in the following example. By default, the configuration file is stored on each node at /etc/aerospike/aerospike.conf.

    Consider a cluster with a node at 10.0.0.100 that you want to remove. Modify the list of seed nodes under the network.heartbeat configuration section in each configuration file.

    For example, in the configuration file for 10.0.0.101, comment out the mesh-seed-address-port line that lists the removed node's address and port.

    network {
        service {
            address any
            access-address 10.0.0.101
            port 3000
        }
        heartbeat {
            mode mesh
            address 10.0.0.101
            port 3002 # Heartbeat port for this node.
            # List one or more other nodes, one ip-address & port per line:
            # mesh-seed-address-port 10.0.0.100 3002
            mesh-seed-address-port 10.0.0.101 3002
            mesh-seed-address-port 10.0.0.103 3002
            interval 250
            timeout 10
        }
    }
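
If you have many nodes to update, the edit described in this step can be sketched in Python. This is an illustration under stated assumptions rather than a supported tool: the helper name comment_out_seed is hypothetical, it works on the configuration text only (reading and writing /etc/aerospike/aerospike.conf is left to you), and it assumes each seed is written as one mesh-seed-address-port line with an address and a port, as in the example above.

```python
import re


def comment_out_seed(conf_text: str, host: str, port: int = 3002) -> str:
    """Return conf_text with the seed line for host and port commented out."""
    # Match an active (not already commented) seed line for exactly this host.
    pattern = re.compile(
        rf"^(\s*)(mesh-seed-address-port\s+{re.escape(host)}\s+{port}\b.*)$"
    )
    out = []
    for line in conf_text.splitlines():
        match = pattern.match(line)
        if match:
            # Keep the original indentation and prefix the line with '# '.
            out.append(f"{match.group(1)}# {match.group(2)}")
        else:
            out.append(line)
    trailing_newline = "\n" if conf_text.endswith("\n") else ""
    return "\n".join(out) + trailing_newline


example = (
    "heartbeat {\n"
    "    mode mesh\n"
    "    mesh-seed-address-port 10.0.0.100 3002\n"
    "    mesh-seed-address-port 10.0.0.103 3002\n"
    "}\n"
)
print(comment_out_seed(example, "10.0.0.100"))
```

Commenting the line out, instead of deleting it, mirrors the example above and leaves a record of the removed node in the file. Running the helper twice has no further effect, because an already commented line no longer matches.
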

  4. Clear the removed node from the mesh-mode heartbeat tip list on the remaining nodes, so that they stop sending heartbeats to the removed node.

    You can run tip-clear either as a single non-interactive asadm command or from the interactive asadm prompt.

    Non-interactive, where <hostname> is the hostname of the node being removed:

    $ asadm -e "enable; asinfo -v 'tip-clear:host-port-list=<hostname>:3002'"

    Interactive, using the example node 10.0.0.100:

    Admin+> asinfo -v 'tip-clear:host-port-list=10.0.0.100:3002'
    10.0.0.101:3000 (10.0.0.101) returned:
    ok
    10.0.0.103:3000 (10.0.0.103) returned:
    ok

    To validate tip-clear, run the following command, which writes a heartbeat dump to the log file located at /var/log/aerospike/aerospike.log. The heartbeat dump should not list the removed node.

    $ asadm
    Admin+> asinfo -v 'dump-hb:verbose=true'

    On node 10.0.0.101

    Jan 23 2017 15:02:21 GMT-0800: INFO (hb): (hb.c:2605) Heartbeat Dump:
    Jan 23 2017 15:02:21 GMT-0800: INFO (hb): (hb.c:2616) HB Mode: mesh (2)
    Jan 23 2017 15:02:21 GMT-0800: INFO (hb): (hb.c:2618) HB Addresses: {10.0.0.101:3002}
    Jan 23 2017 15:02:21 GMT-0800: INFO (hb): (hb.c:2619) HB MTU: 1500
    Jan 23 2017 15:02:21 GMT-0800: INFO (hb): (hb.c:2621) HB Interval: 250
    Jan 23 2017 15:02:21 GMT-0800: INFO (hb): (hb.c:2622) HB Timeout: 10
    Jan 23 2017 15:02:21 GMT-0800: INFO (hb): (hb.c:2623) HB Fabric Grace Factor: -1
    Jan 23 2017 15:02:21 GMT-0800: INFO (hb): (hb.c:2626) HB Protocol: v2 (4)
    Jan 23 2017 15:02:21 GMT-0800: INFO (hb): (hb.c:8447) HB Mesh Node (seed): Node: bb9235677270008, Status: active, Last updated: 32223653, Endpoints: {10.0.0.103:3002}
    Jan 23 2017 15:02:21 GMT-0800: INFO (hb): (hb.c:6196) HB Channel Count 1
    Jan 23 2017 15:02:21 GMT-0800: INFO (hb): (hb.c:6181) HB Channel (mesh): Node: bb9235677270008, Fd: 65, Endpoint: 10.0.0.103:3002, Polarity: inbound, Last Received: 44236581
    Jan 23 2017 15:02:21 GMT-0800: INFO (hb): (hb.c:9947) HB Adjacency Size: 1
    Jan 23 2017 15:02:21 GMT-0800: INFO (hb): (hb.c:9933) HB Adjacent Node: Node: bb9235677270008, Protocol: 26723, Endpoints: {10.0.0.103:3002}, Last Updated: 44236581

    On node 10.0.0.103

    Jan 23 2017 23:07:23 GMT: INFO (hb): (hb.c:2605) Heartbeat Dump:
    Jan 23 2017 23:07:23 GMT: INFO (hb): (hb.c:2616) HB Mode: mesh (2)
    Jan 23 2017 23:07:23 GMT: INFO (hb): (hb.c:2618) HB Addresses: {10.0.0.103:3002}
    Jan 23 2017 23:07:23 GMT: INFO (hb): (hb.c:2619) HB MTU: 1500
    Jan 23 2017 23:07:23 GMT: INFO (hb): (hb.c:2621) HB Interval: 250
    Jan 23 2017 23:07:23 GMT: INFO (hb): (hb.c:2622) HB Timeout: 10
    Jan 23 2017 23:07:23 GMT: INFO (hb): (hb.c:2623) HB Fabric Grace Factor: -1
    Jan 23 2017 23:07:23 GMT: INFO (hb): (hb.c:2626) HB Protocol: v2 (4)
    Jan 23 2017 23:07:23 GMT: INFO (hb): (hb.c:8447) HB Mesh Node (seed): Node: bb90a09e3270008, Status: active, Last updated: 32188503, Endpoints: {10.0.0.101:3002}
    Jan 23 2017 23:07:23 GMT: INFO (hb): (hb.c:6196) HB Channel Count 1
    Jan 23 2017 23:07:23 GMT: INFO (hb): (hb.c:6181) HB Channel (mesh): Node: bb90a09e3270008, Fd: 60, Endpoint: 10.0.0.101:3002, Polarity: outbound, Last Received: 44510068
    Jan 23 2017 23:07:23 GMT: INFO (hb): (hb.c:9947) HB Adjacency Size: 1
    Jan 23 2017 23:07:23 GMT: INFO (hb): (hb.c:9933) HB Adjacent Node: Node: bb90a09e3270008, Protocol: 26723, Endpoints: {10.0.0.101:3002}, Last Updated: 44510068
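
Reading the dump by eye works for a small cluster. For larger ones, a short script can do the same validation. The following Python sketch assumes the log line layout shown above, with IPv4 endpoints after an "Endpoint:" or "Endpoints:" label. The names hb_peers and removed_node_present are hypothetical, and the layout of these lines may differ between server versions.

```python
import re

# IPv4 address followed by a port, for example 10.0.0.103:3002.
ENDPOINT_RE = re.compile(r"(\d{1,3}(?:\.\d{1,3}){3}):(\d+)")


def hb_peers(log_text: str) -> set[str]:
    """Return the host:port endpoints of peers listed in heartbeat dump lines."""
    peers: set[str] = set()
    for line in log_text.splitlines():
        if "(hb)" not in line or "Endpoint" not in line:
            continue  # skips 'HB Addresses', which is the node's own address
        # Only scan the text after the label, so timestamps are ignored.
        tail = line[line.index("Endpoint"):]
        for ip, port in ENDPOINT_RE.findall(tail):
            peers.add(f"{ip}:{port}")
    return peers


def removed_node_present(log_text: str, host: str) -> bool:
    """True if any peer endpoint in the dump still uses the removed host."""
    return any(peer.rsplit(":", 1)[0] == host for peer in hb_peers(log_text))


# Two lines copied from the dump on node 10.0.0.101 shown above.
dump = (
    "Jan 23 2017 15:02:21 GMT-0800: INFO (hb): (hb.c:2618) HB Addresses: {10.0.0.101:3002}\n"
    "Jan 23 2017 15:02:21 GMT-0800: INFO (hb): (hb.c:8447) HB Mesh Node (seed): "
    "Node: bb9235677270008, Status: active, Last updated: 32223653, "
    "Endpoints: {10.0.0.103:3002}\n"
)
print(sorted(hb_peers(dump)))
print(removed_node_present(dump, "10.0.0.100"))
```

In practice you would pass in only the lines written by the most recent dump-hb call, since older dumps in the same log file may still list the removed node.
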
  5. Remove the node from the alumni list. The alumni list is used by some tools to refer to all nodes in a cluster, even nodes that may have split from the cluster, so it is important to also clear this node from the list. Run this command to remove the old node from the alumni list on all the remaining nodes in the cluster:

    $ asadm
    Admin+> asinfo -v 'services-alumni-reset'
    10.0.0.101:3000 (10.0.0.101) returned:
    ok
    10.0.0.103:3000 (10.0.0.103) returned:
    ok
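
The two cleanup commands from steps 4 and 5 can be assembled in a script when you remove nodes regularly. The Python sketch below only builds the argument lists and does not run anything. It assumes the non-interactive form asadm -e "enable; asinfo -v '...'" shown in step 4 also applies to services-alumni-reset, and the function names are hypothetical.

```python
def asadm_asinfo_command(info_cmd: str) -> list[str]:
    """Build the argv for one asinfo command run through asadm -e."""
    return ["asadm", "-e", f"enable; asinfo -v '{info_cmd}'"]


def removal_cleanup_commands(host: str, hb_port: int = 3002) -> list[list[str]]:
    """Commands to run after the removed node has been shut down."""
    return [
        # Step 4: drop the removed node from the mesh heartbeat tip list.
        asadm_asinfo_command(f"tip-clear:host-port-list={host}:{hb_port}"),
        # Step 5: reset the alumni list on the remaining nodes.
        asadm_asinfo_command("services-alumni-reset"),
    ]


for argv in removal_cleanup_commands("10.0.0.100"):
    # Each argv could be passed to subprocess.run(argv, check=True)
    # on a host where asadm is installed and can reach the cluster.
    print(argv)
```

Keeping each command as an argument list, not a single shell string, avoids a second layer of shell quoting around the single-quoted asinfo command.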