Remove a node
Removing a node from a cluster can be as simple as stopping the node, but following the steps on this page ensures that the cluster and its tools keep operating properly in the long run.
For example, these steps:
- Prevent the remaining nodes in the cluster from reconnecting to the removed node if they are restarted.
- Prevent a cluster from attempting to join another cluster when one of its previously removed nodes is recommissioned.
Best practices
It is a good practice to quiesce a node prior to shutting it down or removing it from a cluster. See Quiesce node for further details.
If the node is shipping records using XDR, it is also a good practice to wait for lag to drop to zero prior to removing the node from the cluster.
If you want to take down multiple nodes from the cluster, make sure that you start from step 1 and take one node down at a time, waiting for migrations to complete between each node, to avoid losing any data.
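The wait between nodes can be scripted as a polling loop. The following is a minimal sketch, not an official tool. It assumes that the text output of `show statistics like migrate` contains a row of the form `migrate_partitions_remaining: 0 0 0`, as in the example on this page. The names `remaining_partitions`, `wait_for_migrations`, and `fetch_stats` are hypothetical; `fetch_stats` stands for any callable you supply that returns that output as a string.

```python
import time
from typing import Callable, List


def remaining_partitions(stats_output: str) -> List[int]:
    """Return the per-node migrate_partitions_remaining values found in the text."""
    for line in stats_output.splitlines():
        # Split on the first colon only, so node addresses such as
        # 10.0.0.100:3000 in other rows do not confuse the parser.
        name, sep, values = line.partition(":")
        if sep and name.strip() == "migrate_partitions_remaining":
            return [int(v) for v in values.split()]
    raise ValueError("migrate_partitions_remaining row not found")


def wait_for_migrations(
    fetch_stats: Callable[[], str],  # hypothetical: returns the statistics text
    poll_seconds: float = 10.0,
    max_polls: int = 360,
) -> None:
    """Block until every node reports 0 remaining partitions, or time out."""
    for _ in range(max_polls):
        counts = remaining_partitions(fetch_stats())
        if counts and all(c == 0 for c in counts):
            return
        time.sleep(poll_seconds)
    raise TimeoutError("migrations did not finish in time")
```

Call `wait_for_migrations` before stopping each node, and again before moving on to the next one. The polling interval and the timeout are arbitrary defaults and should be adjusted to the size of the cluster.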
Asadm and the admin port
The commands on this page are run from Aerospike Admin (asadm).
In Database 8.1 and later, you can run asadm to connect to port 3003, a special admin port used if the other ports are unresponsive due to excessive I/O operations.
Remove a node
-
Ensure there are no ongoing migrations. Run info namespace inside asadm to bring up the following display. Make sure the migration-related statistics, Pending Migrates (tx%,rx%), are 0 on all the nodes.

    Admin+> info namespace

    Namespace  Node             Avail%  Evictions  Master                Replica               Repl    Stop    Pending    Disk     Disk   HWM    Mem      Mem    HWM   Stop
    .          .                .       .          (Objects,Tombstones)  (Objects,Tombstones)  Factor  Writes  Migrates   Used     Used%  Disk%  Used     Used%  Mem%  Writes%
    .          .                .       .          .                     .                     .       .       (tx%,rx%)  .        .      .      .        .      .     .
    test       10.0.0.100:3000  N/E     0.000      (0.000 ,0.000 )       (0.000 ,0.000 )       2       false   (0,0)      N/E      N/E    50     0.000 B  0      60    90
    test       10.0.0.103:3000  N/E     0.000      (0.000 ,0.000 )       (0.000 ,0.000 )       2       false   (0,0)      N/E      N/E    50     0.000 B  0      60    90
    test       10.0.0.101:3000  N/E     0.000      (0.000 ,0.000 )       (0.000 ,0.000 )       2       false   (0,0)      N/E      N/E    50     0.000 B  0      60    90
    test                                0.000      (0.000 ,0.000 )       (0.000 ,0.000 )                       (0,0)      0.000 B                0.000 B
    Number of rows: 4

You can also run show statistics like migrate from the asadm admin prompt and make sure that the migrate_partitions_remaining statistic shows 0 for each node.

    Admin+> show statistics like migrate

    NODE                        : 10.0.0.100:3000  10.0.0.103:3000  10.0.0.101:3000
    migrate_allowed             : true             true             true
    migrate_partitions_remaining: 0                0                0

See Monitoring Migrations for more details.
-
Shut down the node gracefully by stopping the Aerospike daemon.

    sudo service aerospike stop

The shutdown is successful when you see the following log message:

    finished clean shutdown - exiting

You can also observe the status of the Aerospike daemon with the following command:

    sudo service aerospike status
    * aerospike is not running

-
Update configuration on all other nodes in the cluster.
If this node is in the seed list of other nodes, you need to update the configuration of all the other nodes to ensure that they do not try to connect to this node if they are restarted.
Modify the configuration file sections shown in the following example. By default, the configuration file is stored on each node at /etc/aerospike/aerospike.conf.
Consider a cluster with a node at 10.0.0.100 that you want to remove. Modify the list of seed nodes under the network.heartbeat configuration section in each configuration file.
For example, in the configuration file for 10.0.0.101, comment out the removed node’s address and port line.

    network {
        service {
            address any
            access-address 10.0.0.101
            port 3000
        }
        heartbeat {
            mode mesh
            address 10.0.0.101
            port 3002 # Heartbeat port for this node.
            # List one or more other nodes, one ip-address & port per line:
            # mesh-seed-address-port 10.0.0.100 3002
            mesh-seed-address-port 10.0.0.101 3002
            mesh-seed-address-port 10.0.0.103 3002
            interval 250
            timeout 10
        }
    }

-
Clear the configured hostname tip list from the mesh-mode heartbeat list to prevent the remaining nodes from continuously sending heartbeats to the removed node.
Run the following command, where <hostname> is the hostname of the node or nodes to be removed:

    $ asadm -e "enable; asinfo -v 'tip-clear:host-port-list=<hostname>:3002'"

In the following example, asadm is running in interactive mode.

    Admin+> asinfo -v 'tip-clear:host-port-list=10.0.0.100:3002'
    10.0.0.101:3000 (10.0.0.101) returned:
    ok
    10.0.0.103:3000 (10.0.0.103) returned:
    ok

To validate tip-clear, run the following command to write the heartbeat dump to the log file located at /var/log/aerospike/aerospike.log. The heartbeat dump should not contain the decommissioned node.

    $ asadm
    Admin+> asinfo -v 'dump-hb:verbose=true'

On node 10.0.0.101

    Jan 23 2017 15:02:21 GMT-0800: INFO (hb): (hb.c:2605) Heartbeat Dump:
    Jan 23 2017 15:02:21 GMT-0800: INFO (hb): (hb.c:2616) HB Mode: mesh (2)
    Jan 23 2017 15:02:21 GMT-0800: INFO (hb): (hb.c:2618) HB Addresses: {10.0.0.101:3002}
    Jan 23 2017 15:02:21 GMT-0800: INFO (hb): (hb.c:2619) HB MTU: 1500
    Jan 23 2017 15:02:21 GMT-0800: INFO (hb): (hb.c:2621) HB Interval: 250
    Jan 23 2017 15:02:21 GMT-0800: INFO (hb): (hb.c:2622) HB Timeout: 10
    Jan 23 2017 15:02:21 GMT-0800: INFO (hb): (hb.c:2623) HB Fabric Grace Factor: -1
    Jan 23 2017 15:02:21 GMT-0800: INFO (hb): (hb.c:2626) HB Protocol: v2 (4)
    Jan 23 2017 15:02:21 GMT-0800: INFO (hb): (hb.c:8447) HB Mesh Node (seed): Node: bb9235677270008, Status: active, Last updated: 32223653, Endpoints: {10.0.0.103:3002}
    Jan 23 2017 15:02:21 GMT-0800: INFO (hb): (hb.c:6196) HB Channel Count 1
    Jan 23 2017 15:02:21 GMT-0800: INFO (hb): (hb.c:6181) HB Channel (mesh): Node: bb9235677270008, Fd: 65, Endpoint: 10.0.0.103:3002, Polarity: inbound, Last Received: 44236581
    Jan 23 2017 15:02:21 GMT-0800: INFO (hb): (hb.c:9947) HB Adjacency Size: 1
    Jan 23 2017 15:02:21 GMT-0800: INFO (hb): (hb.c:9933) HB Adjacent Node: Node: bb9235677270008, Protocol: 26723, Endpoints: {10.0.0.103:3002}, Last Updated: 44236581

On node 10.0.0.103

    Jan 23 2017 23:07:23 GMT: INFO (hb): (hb.c:2605) Heartbeat Dump:
    Jan 23 2017 23:07:23 GMT: INFO (hb): (hb.c:2616) HB Mode: mesh (2)
    Jan 23 2017 23:07:23 GMT: INFO (hb): (hb.c:2618) HB Addresses: {10.0.0.103:3002}
    Jan 23 2017 23:07:23 GMT: INFO (hb): (hb.c:2619) HB MTU: 1500
    Jan 23 2017 23:07:23 GMT: INFO (hb): (hb.c:2621) HB Interval: 250
    Jan 23 2017 23:07:23 GMT: INFO (hb): (hb.c:2622) HB Timeout: 10
    Jan 23 2017 23:07:23 GMT: INFO (hb): (hb.c:2623) HB Fabric Grace Factor: -1
    Jan 23 2017 23:07:23 GMT: INFO (hb): (hb.c:2626) HB Protocol: v2 (4)
    Jan 23 2017 23:07:23 GMT: INFO (hb): (hb.c:8447) HB Mesh Node (seed): Node: bb90a09e3270008, Status: active, Last updated: 32188503, Endpoints: {10.0.0.101:3002}
    Jan 23 2017 23:07:23 GMT: INFO (hb): (hb.c:6196) HB Channel Count 1
    Jan 23 2017 23:07:23 GMT: INFO (hb): (hb.c:6181) HB Channel (mesh): Node: bb90a09e3270008, Fd: 60, Endpoint: 10.0.0.101:3002, Polarity: outbound, Last Received: 44510068
    Jan 23 2017 23:07:23 GMT: INFO (hb): (hb.c:9947) HB Adjacency Size: 1
    Jan 23 2017 23:07:23 GMT: INFO (hb): (hb.c:9933) HB Adjacent Node: Node: bb90a09e3270008, Protocol: 26723, Endpoints: {10.0.0.101:3002}, Last Updated: 44510068

-
Remove the node from the alumni list. The alumni list is used by some tools to refer to all nodes in a cluster, even nodes that may have split from the cluster, so it is important to also clear this node from the list. Run this command to remove the old node from the alumni list on all the remaining nodes in the cluster:

    $ asadm
    Admin+> asinfo -v 'services-alumni-reset'
    10.0.0.101:3000 (10.0.0.101) returned:
    ok
    10.0.0.103:3000 (10.0.0.103) returned:
    ok
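
When many nodes need the same seed-list change, the configuration edit described in the steps above can be automated. The following is a minimal sketch, assuming each seed is declared on its own `mesh-seed-address-port <host> <port>` line with no trailing comment, as in the example configuration. The function name `comment_out_seed` is hypothetical and not part of any Aerospike tool. It works on the configuration text only; reading and writing /etc/aerospike/aerospike.conf is left to the caller.

```python
import re


def comment_out_seed(conf_text: str, host: str, port: int = 3002) -> str:
    """Comment out the mesh-seed-address-port line for the removed node."""
    pattern = re.compile(
        r"^(\s*)(mesh-seed-address-port\s+"
        + re.escape(host)
        + r"\s+"
        + str(port)
        + r")\s*$"
    )
    lines = []
    for line in conf_text.splitlines():
        match = pattern.match(line)
        if match:
            # Keep the original indentation and prefix the directive with "# ".
            lines.append(f"{match.group(1)}# {match.group(2)}")
        else:
            # Lines that are already commented out do not match and pass through.
            lines.append(line)
    trailing_newline = "\n" if conf_text.endswith("\n") else ""
    return "\n".join(lines) + trailing_newline
```

Because lines that are already commented out are left alone, running the function twice gives the same result as running it once. Keep a backup of the original file before writing the result back.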