Network Heartbeat Configuration
Aerospike's heartbeat protocols are responsible for maintaining cluster integrity. There are two supported heartbeat modes:
- Multicast (UDP)
- Mesh (TCP)
Cloud Considerations
Lack of Multicast Support: Cloud providers such as Amazon and Google Compute Engine do not support multicast networking. For these providers, we offer Mesh heartbeats, which use point-to-point TCP connections.
Network Variability: Network latency on cloud platforms is often inconsistent over time, which can delay heartbeat packet delivery. For these providers, we recommend setting the heartbeat interval to 150 and the heartbeat timeout to 20.
Instance Pauses: At times, the cloud provider may pause your instance for short durations. For example, Google Compute Engine (GCE) uses live migration, which can briefly pause your instance during maintenance or software updates. These short pauses might cause the other instances in the cluster to consider this instance "dead". We recommend upgrading to server version 3.13 or later to help your cluster recover quickly after network disruptions or cluster changes. Refer to paxos-recovery-policy. This policy was introduced in Aerospike server version 3.7.0.1, but until version 3.8.1 it must be explicitly configured to auto-reset-master.
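To see what the cloud recommendation trades off, the sketch below computes how long peers wait before declaring a node missing. It assumes the window is simply interval (milliseconds) multiplied by timeout (number of intervals), as the settings are described on this page, and ignores network delay and processing time. The function name is hypothetical.

```python
# Sketch: how long a node can stay silent before peers consider it missing.
# Assumption: window = interval (ms) * timeout (intervals); network delay and
# processing time are ignored.


def detection_window_ms(interval_ms: int, timeout_intervals: int) -> int:
    """Milliseconds of heartbeat silence before a node is considered missing."""
    if interval_ms <= 0 or timeout_intervals <= 0:
        raise ValueError("interval and timeout must be positive")
    return interval_ms * timeout_intervals


if __name__ == "__main__":
    # Recommended general values (interval 150, timeout 10).
    print("general:", detection_window_ms(150, 10), "ms")
    # Recommended cloud values (interval 150, timeout 20): a longer window
    # tolerates latency spikes and short instance pauses, but a node that has
    # really left takes longer to be detected.
    print("cloud:  ", detection_window_ms(150, 20), "ms")
```

Doubling timeout from 10 to 20 doubles the window from 1.5 to 3 seconds, which is the margin that absorbs brief cloud pauses.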
Multicast Heartbeat
We recommend using the multicast heartbeat protocol when available. However, your network may not support multicast for various reasons. See our troubleshooting guide for information on how to validate multicast in your environment.
Configuration Steps
In the heartbeat sub-stanza:
- Set mode to multicast.
- Set multicast-group to a valid multicast address (239.0.0.0-239.255.255.255).
- (Optional) Set address to the IP of the interface intended for intracluster communication. This setting also controls the interface that fabric will use. It is needed when isolating intra-cluster traffic to a particular network interface.
- Set interval and timeout:
  - interval (recommended: 150) controls how often, in milliseconds, a heartbeat packet is sent.
  - timeout (recommended: 10) controls the number of intervals after which the rest of the nodes in the cluster consider a node missing if they have not received a heartbeat from it.
- With the default settings, a node will be aware of another node leaving the cluster within 1.5 seconds (150 ms x 10 intervals).
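Before setting multicast-group, you can check that a candidate address falls in the range given above. The sketch below uses Python's standard ipaddress module; the function and constant names are hypothetical, and the check only covers the 239.0.0.0/8 range stated on this page, not whether your network actually routes that group.

```python
import ipaddress

# Range stated on this page for multicast-group: 239.0.0.0-239.255.255.255.
ALLOWED_GROUPS = ipaddress.ip_network("239.0.0.0/8")


def is_valid_multicast_group(addr: str) -> bool:
    """Return True if addr is an IPv4 address inside 239.0.0.0/8."""
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        return False  # not an IP address at all
    return ip.version == 4 and ip in ALLOWED_GROUPS


if __name__ == "__main__":
    for candidate in ("239.1.99.2", "224.0.0.1", "192.168.1.100", "bogus"):
        print(candidate, is_valid_multicast_group(candidate))
```

The address used in the example stanza below, 239.1.99.2, passes this check.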
Example
...
heartbeat {
mode multicast # Send heartbeats using Multicast
multicast-group 239.1.99.2 # multicast address
port 9918 # multicast port
address 192.168.1.100 # (Optional) (Default: any) IP of the NIC to
# use to send out heartbeat and bind
# fabric ports
interval 150 # Number of milliseconds between heartbeats
timeout 10 # Number of heartbeat intervals to wait
# before timing out a node
}
...
Mesh (Unicast) Heartbeat
Mesh uses point-to-point TCP connections for heartbeats. Each node in the cluster maintains a heartbeat connection to every other node, so the number of connections grows quickly with cluster size. For this reason, we recommend using the multicast heartbeat protocol when available.
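The growth in connections can be estimated with simple arithmetic. In the sketch below, each node holds one connection to each of the other n - 1 nodes, as described above. The pairwise total assumes one TCP connection per node pair; that is an assumption for illustration, since Aerospike may open connections differently. The function name is hypothetical.

```python
# Sketch: connection counts in a full mesh of n nodes.
# Per node: n - 1 peers (from the description above).
# Total: n * (n - 1) / 2, ASSUMING one TCP connection per node pair.


def mesh_connections(nodes: int) -> tuple:
    """Return (connections per node, total node pairs) for a full mesh."""
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    per_node = nodes - 1
    pairs = nodes * (nodes - 1) // 2
    return per_node, pairs


if __name__ == "__main__":
    for n in (4, 8, 32):
        per_node, pairs = mesh_connections(n)
        print(f"{n} nodes: {per_node} per node, {pairs} pairs")
```

The total grows quadratically: 4 nodes need 6 pairs, while 32 nodes need 496.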
Configuration Steps
In the heartbeat sub-stanza:
- Set mode to mesh.
- (Optional) Set address to the IP of the local interface intended for intracluster communication. This setting also controls the interface that fabric will use. It is needed when isolating intra-cluster traffic to a particular network interface.
- Set mesh-seed-address-port to the IP address (or, as of version 3.10, a fully qualified DNS name) and heartbeat port of a node in the cluster. Repeat the setting for each seed node.
- Set interval and timeout:
  - interval (recommended: 150) controls how often, in milliseconds, a heartbeat packet is sent.
  - timeout (recommended: 10) controls the number of intervals after which the rest of the nodes in the cluster consider a node missing if they have not received a heartbeat from it.
- With the recommended settings, a node will be aware of another node leaving the cluster within 1.5 seconds (150 ms x 10 intervals).
In versions 4.3.1 and earlier, when fully qualified names are used, a name that fails to resolve could cause the cluster to split if the DNS server slows down and resolution takes longer to fail. After a successful DNS resolution, the name is replaced with its IP address until the next restart.
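Because seeds may be either IP addresses or DNS names, it can help to list which seeds in a stanza depend on name resolution. The sketch below is a hypothetical helper, not an Aerospike tool. It parses one mesh-seed-address-port line in the format used in the example below (keyword, host, port, optional trailing # comment) and reports whether the host is a literal IP address.

```python
import ipaddress

# Hypothetical helper: parse "mesh-seed-address-port <host> <port> [# comment]".
# Returns (host, port, is_ip); is_ip is False when the host is a DNS name.


def parse_seed_line(line: str) -> tuple:
    fields = line.split("#", 1)[0].split()  # drop trailing comment
    if len(fields) != 3 or fields[0] != "mesh-seed-address-port":
        raise ValueError(f"not a mesh-seed-address-port line: {line!r}")
    host, port = fields[1], int(fields[2])  # int() raises on non-numeric ports
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    try:
        ipaddress.ip_address(host)
        is_ip = True
    except ValueError:
        is_ip = False  # treated as a DNS name (accepted as of version 3.10)
    return host, port, is_ip


if __name__ == "__main__":
    print(parse_seed_line("mesh-seed-address-port 192.168.1.101 3002 # seed"))
    print(parse_seed_line("mesh-seed-address-port node1.example.com 3002"))
```

Seeds reported with is_ip set to False are the ones whose resolution you may want to verify before starting a node on an affected version.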
Example
...
heartbeat {
mode mesh # Send heartbeats using Mesh (Unicast) protocol
address 192.168.1.100 # (Optional) (Default: any) IP of the NIC on
# which this node is listening to heartbeat
port 3002 # port on which this node is listening to
# heartbeat
mesh-seed-address-port 192.168.1.100 3002 # IP address for seed node in the cluster
# This IP happens to be the local node
mesh-seed-address-port 192.168.1.101 3002 # IP address for seed node in the cluster
mesh-seed-address-port 192.168.1.102 3002 # IP address for seed node in the cluster
mesh-seed-address-port 192.168.1.103 3002 # IP address for seed node in the cluster
interval 150 # Number of milliseconds between heartbeats
timeout 10 # Number of heartbeat intervals to wait before
# timing out a node
}
...
Where to Next?
- Configure the service, fabric, and info sub-stanzas, which define the interfaces used for application-to-node communication.
- (Optional) Configure Rack Aware, which enables Aerospike to tolerate a top-of-rack switch failure.
- Learn more about the Clustering Architecture.
- Or return to the Configure page.