Add a node to an Aerospike cluster
Overview
This section describes how to add capacity to an Aerospike cluster by adding a node. You can add single or multiple nodes to a running cluster. The number of nodes you can add is not based on the number of existing nodes. Clusters configured with rack awareness typically add the same number of nodes in each group.
IMPORTANT PREREQUISITES
It is important to prepare the node. To avoid any scenario of pre-existing data, the drives must be initialized. New SSD drives must be prepared/installed before use. This process erases the drives as described in the Initializing Solid State Drives (SSDs) section.
The files under the /opt/aerospike/smd
folder should be removed if the node was previously in use in a different cluster. The system metadata files in this folder contain details about secondary index definitions, udf modules, user roles and permissions, truncate and truncate-namespace related information, roster information and more. Adding a node in a cluster with existing data from a different cluster in the smd folder would result in potentially disastrous situations.
Method
If configured correctly, the new node will join the cluster automatically upon starting up the Aerospike daemon. This document is then mostly covering how to configure the new node.
The configuration file is by default under /etc/aerospike/aerospike.conf
Adding a node to a cluster initiates migrations to rebalance the number of partitions across all the available nodes. Migrations occupy various system resources, so it is advised to add a node during low traffic period. This data rebalancing mechanism ensures that query volume distributes evenly across all cluster nodes, and is persistent during node failure. See Auto Rebalance for more information.
Important points to consider when adding a new node to a cluster
The following must be enforced to ensure the node will properly join the cluster:
- For both in-memory and persistent namespaces, the total memory / disk space allocated for each namespace on each node should be the same (other than transient state when changing capacity).
- If the namespace is persistent the individual devices allocated to the namespace have the same capacity.
- Refer to the Namespace Storage Configuration page for more information on configuring a namespace.
There is no need to restart the existing nodes in the cluster to add a new node.
For the purposes of performing a rolling upgrade, Aerospike supports mixed-version clusters. Aerospike does not, however, support long term use of clusters with nodes running different versions of the Aerospike database. See rolling upgrade for more information.
As a recommendation – do not add multiple nodes simultaneously to avoid corner cases where the new nodes form a cluster on their own before joining the main cluster, adding more partition versions that would cause the subsequent duplicate resolution to be heavier. We recommend, as best practice, to wait for a new node to successfully join the cluster before adding the next one.
Important when adding empty nodes: For server version on the older paxos and heartbeat protocol (prior to 3.13), it may be preferable to wait for migrations to complete between adding empty nodes. The older cluster protocol would drop redundant partition copies prior to completing the migration for the partition, leaving some partitions (the one that would be owned by the newly added node) with 1 less copy than dictated by the replication factor. This adds risks of data unavailability in the unlikely event of a node leaving the cluster prior to migrations completing. The more new nodes added at the same time, the more partitions would be losing a copy putting more data at risk in case of a node leaving the cluster. Replica copies of a partition are kept around until the migration of the partition completes.
The configuration file options node-id
and node-id-interface
are mutually exclusive.
Configuration setup
Mesh mode
- See configure Aerospike database for a detailed description of various sections of the configuration file and their relevant configuration parameters.
- The following steps assume the following:
- A cluster(at least 1 node) is already setup in mesh mode
- The Aerospike daemon is installed on the new node. PSee Install Aerospike for the Aerospike installation procedure.
- The example used to illustrate this procedure consists of a 2 node cluster to which a 3rd node is added.
Example: 2 node cluster
Admin> info
Node Node Ip Build Cluster Cluster Cluster Principal Client Uptime
. Id . . Size Key Integrity . Conns .
10.0.0.101:3000 BB90A09E3270008 10.0.0.101:3000 E-3.11.0.2 2 F0322B636B7B0FE9 True BB9AF1F8D270008 5 09:26:10
10.0.0.103:3000 BB9235677270008 10.0.0.103:3000 E-3.11.0.2 2 F0322B636B7B0FE9 True BB9AF1F8D270008 10 09:26:14
Number of rows: 2
Configuration setup for node 10.0.0.101 which is already part of the cluster
# Aerospike database configuration file for deployments using mesh heartbeats.
service {
user root
group root
pidfile /var/run/aerospike/asd.pid
service-threads 20 # Should be 5 times number of vCPUs for 4.7+ and
# at least one SSD namespace, otherwise number of vCPUs
proto-fd-max 15000
node-id-interface eth1
}
logging {
# Log file must be an absolute path.
file /var/log/aerospike/aerospike.log {
context any info
}
}
network {
service {
address any
access-address 10.0.0.101
port 3000
}
heartbeat {
mode mesh
address 10.0.0.101
port 3002 # Heartbeat port for this node.
# List one or more other nodes, one ip-address & port per line:
# Please note that we do not have the address of the incoming node in this list
mesh-seed-address-port 10.0.0.103 3002
mesh-seed-address-port 10.0.0.101 3002
# Having the node itself as a mesh seed node is allowed
# and helps with consistent configuration files across the cluster
interval 250
timeout 10
}
fabric {
port 3001
}
info {
port 3003
}
}
namespace test {
# Data in memory without persistence namespace
replication-factor 2
memory-size 4G
storage-engine memory
}
namespace bar {
memory-size 4G # Maximum memory allocation for primary and secondary indexes.
# Warning - legacy data in defined raw partition devices will be erased.
# These partitions must not be mounted by the file system.
storage-engine device { # Configure the storage-engine to use persistence
# Add raw device(s). Maximum size is 2 TiB.
device /dev/sdb1
device /dev/sdc1
device /dev/sdd1
device /dev/sde1
# The 2 lines below optimize for SSD.
scheduler-mode noop
write-block-size 128K # adjust block size to make it efficient for SSDs.
}
}
Configuration setup for node 10.0.0.103 which is already part of the cluster
# Aerospike database configuration file for deployments using mesh heartbeats.
service {
user root
group root
pidfile /var/run/aerospike/asd.pid
service-threads 20 # Should be 5 times number of vCPUs for 4.7+ and
# at least one SSD namespace, otherwise number of vCPUs
proto-fd-max 15000
node-id-interface eth1
}
logging {
# Log file must be an absolute path.
file /var/log/aerospike/aerospike.log {
context any info
}
}
network {
service {
address any
access-address 10.0.0.103
port 3000
}
heartbeat {
mode mesh
address 10.0.0.103
port 3002 # Heartbeat port for this node.
# List one or more other nodes, one ip-address & port per line:
# Please note that we do not have the address of the incoming node in this list
mesh-seed-address-port 10.0.0.101 3002
mesh-seed-address-port 10.0.0.103 3002
interval 250
timeout 10
}
fabric {
port 3001
}
info {
port 3003
}
}
namespace test {
# Data in memory without persistence namespace
replication-factor 2
memory-size 4G
storage-engine memory
}
namespace bar {
memory-size 4G # Maximum memory allocation for primary and secondary indexes.
# Warning - legacy data in defined raw partition devices will be erased.
# These partitions must not be mounted by the file system.
storage-engine device { # Configure the storage-engine to use persistence
# Add raw device(s). Maximum size is 2 TiB
device /dev/sdb3
device /dev/sdc3
device /dev/sdd3
device /dev/sde3
# The 2 lines below optimize for SSD.
scheduler-mode noop
write-block-size 128K # adjust block size to make it efficient for SSDs.
}
}
Configuration for the new incoming node with IP 10.0.0.100
# Aerospike database configuration file for deployments using mesh heartbeats.
service {
user root
group root
pidfile /var/run/aerospike/asd.pid
service-threads 20 # Should be 5 times number of vCPUs for 4.7+ and
# at least one SSD namespace, otherwise number of vCPUs
proto-fd-max 15000
node-id-interface eth1
}
logging {
# Log file must be an absolute path.
file /var/log/aerospike/aerospike.log {
context any info
}
}
network {
service {
address any
access-address 10.0.0.100
port 3000
}
heartbeat {
mode mesh
address 10.0.0.100
port 3002 # Heartbeat port for this node.
# List one or more other nodes, one ip-address & port per line:
mesh-seed-address-port 10.0.0.100 3002
mesh-seed-address-port 10.0.0.101 3002
mesh-seed-address-port 10.0.0.103 3002
interval 250
timeout 10
}
fabric {
port 3001
}
info {
port 3003
}
}
namespace test {
# Data in memory without persistence namespace
replication-factor 2
memory-size 4G
storage-engine memory
}
namespace bar {
memory-size 4G # Maximum memory allocation for primary and secondary indexes.
# Warning - legacy data in defined raw partition devices will be erased.
# These partitions must not be mounted by the file system.
storage-engine device { # Configure the storage-engine to use persistence
# Add raw device(s). Maximum size is 2 TiB
device /dev/sdb
device /dev/sdc
device /dev/sdd
device /dev/sde
# The 2 lines below optimize for SSD.
scheduler-mode noop
write-block-size 128K # adjust block size to make it efficient for SSDs.
}
}
Multicast mode As multicast is another mode of communication for the heartbeat protocol all the other sections of the configuration files remain the same except for the heartbeat section. Please see Network Heartbeat Configuration for more information on the heartbeat section.
So, in our case the heartbeat section of the incoming node should be
...
heartbeat {
mode multicast # Send heartbeats using Multicast
multicast-group 239.1.99.2 # multicast address
port 9918 # multicast port
address 10.0.0.100 # (Optional) (Default any) IP of the NIC to
# use to send out heartbeat and bind
# fabric ports
interval 150 # Number of milliseconds between heartbeats
timeout 10 # Number of heartbeat intervals to wait
# before timing out a node
}
...
Verify node joined the cluster
To verify the node is now a part of the cluster on any of the nodes we can issue an info command within the asadm tool. For our example the info command outputs the following
Admin> info
Node Node Ip Build Cluster Cluster Cluster Principal Client Uptime
. Id . . Size Key Integrity . Conns .
10.0.0.101:3000 BB90A09E3270008 10.0.0.101:3000 E-3.11.0.2 3 F0322B636B7B0FE9 True BB9AF1F8D270008 5 51:49:48
10.0.0.103:3000 BB9235677270008 10.0.0.103:3000 E-3.11.0.2 3 F0322B636B7B0FE9 True BB9AF1F8D270008 10 51:49:36
10.0.0.100:3000 *BB9AF1F8D270008 10.0.0.100:3000 E-3.11.0.2 3 F0322B636B7B0FE9 True BB9AF1F8D270008 10 03:05:39
Number of rows: 3
We can verify the same by searching the server log at /var/log/aerospike/aerospike.log
grep 'CLUSTER-SIZE' /var/log/aerospike/aerospike.log
You should see (from our example):
Jan 28 2017 01:00:03 GMT: INFO (info): (ticker.c:169) NODE-ID bb9af1f8d270008 CLUSTER-SIZE 3 CLUSTER-NAME myCluster