Cluster problems

Migration

When recovering from a cluster integrity fault because of a cluster size change, data migration occurs on the affected nodes. Migration threads take lower priority than the others, and can take a long time to complete depending on the environment and configuration.

See Tuning migration for more information.

Cluster size change

If you suspect that a node may be down, you can check in the following ways:

Use the asadm tool with the info command.
Check the server log and look for “CLUSTER SIZE = N”. This will tell you how many nodes that particular node thinks are in the cluster. At times there may not be agreement. This can be true if a set of nodes has split from the main cluster. This is sometimes referred to as a split brain. Generally, it is easiest to restart any “lost” nodes to get them to rejoin the cluster.

Iptable configuration

Customers may need to disable or reconfigure the IP table to ensure inter-node communications is not blocked.

Iptables interferes with the normal operation of Aerospike. We recommend that you turn it off.

To Turn OFF IP Tables run one of the following commands:

sudo /usr/sbin/ntsysv
uncheck iptables so that service won't be automatically started
click Ok

or use chkconfig

sudo /sbin/chkconfig iptables off

After you disable IP Tables, turn it off by issuing the command:

sudo /etc/init.d/iptables stop

Healthy cluster is reporting errors

If you stop seeing writes, go to the cluster indicated by the server logs. If the client is reporting errors but there are no errors in the server log, check for the following possible causes:

Check service iptables status – to see if any port is blocked. Even in cases where the service had been up, there have been cases where the port was reset to be blocked.
Check clients/servers’ CPU load average. One of the causes may be a runaway process on the client or server. If one of the CPUs on the server is showing 100%, but the others are not, this may be normal behavior defrag or another process is running. This can be most commonly done using the Linux top command.
Check to make sure heartbeat_received (foreign counter) is increasing/changing (and not stuck) using the asadm tool and issue the show statistics like heartbeat command. The node expects heartbeats from the other nodes and the foreign counter indicates how many heartbeats have being received from other nodes – the count should be constantly increasing.

If you are using mesh, heartbeats received for self are always 0 (this is expected). In multicast, heartbeats received for self should be increasing/changing.

Grep for the trans_in_progress keyword on the server log with this command:

tail -200f /var/log/aerospike/aerospike.log |grep 'in-progress'

Sample output:

Oct 03 2022 18:31:49 GMT: INFO (info): (ticker.c:303)    in-progress: info-q 0 rw-hash 0 proxy-hash 0 tree-gc-q 0 long-queries 0
Oct 03 2022 18:31:59 GMT: INFO (info): (ticker.c:303)    in-progress: info-q 0 rw-hash 0 proxy-hash 0 tree-gc-q 0 long-queries 0
Oct 03 2022 18:32:09 GMT: INFO (info): (ticker.c:303)    in-progress: info-q 0 rw-hash 0 proxy-hash 0 tree-gc-q 0 long-queries 0
Oct 03 2022 18:32:19 GMT: INFO (info): (ticker.c:303)    in-progress: info-q 0 rw-hash 0 proxy-hash 0 tree-gc-q 0 long-queries 0

If the read-write counter (“rw-hash”) is not going down, it means requests are getting backed up. Check for growth pattern.
If proxy counter (“prox-hash”) is not zero, it means the port is blocked for incoming requests. Check the fabric port 3001 by running service iptables status.
If any of the queues are constantly increasing, it may hint at some potential issues on the node unable to keep up.

Validate multicast

If you want to use multicast, you can validate that multicast is properly enabled on your network, or a particular multicast address is being correctly routed between particular machines.

These tests will only test for the initial multicast connectivity. There are many reasons that your network equipment may initially allow multicast, but may halt it at a later time.

The best ways to ensure multicast runs smoothly are:

Turn off storm control to the nodes. Often the connections on the cluster will look like a denial of service attack, so storm control may stop these connections. Check dmesg or /var/log/messages to identify if something specific is showing up as the cause of the error.
To ensure group membership, do one of the following:
- Turn off IGMP snooping on the cluster vlan.
- Turn on IGMP snooping AND turn on the querier (create a multicast router).
Identify if any hops occur in the packets sent between the nodes in the cluster. Use the traceroute command to verify. If you identify multiple hops between nodes, you can attempt to increase the multicast-ttl configuration to 2 or 3 on Aerospike configuration as a workaround.
We recommend using a very simple multicast test tool called open-mtools, made available by Google Code without restriction. Download and run the mtools (Note: mtools will not work properly in VM environments).

Install mtools on both the sending and receiving machines:

$ wget https://storage.googleapis.com/google-code-archive-downloads/v2/code.google.com/open-mtools/mtools.1.0.zip
$ unzip mtools.1.0.zip
$ cd mtools
$ gcc mdump.c -o mdump; gcc msend.c -o msend

Run the following mdump on the receiving machines first:

$./mdump -Q2 -v 239.1.1.1 2345 <interface ip>

Run the msend command on a machine that will be broadcasting:

$ ./msend -m 100 -b 300 -n 600 -s 10 239.1.1.1 2345 <ttl> <interface IP>
Equiv cmd line: msend -b300 -m100 -n600 -p1000 -s10 -S65536 239.1.1.1 2345
Sending 600 bursts of 300 100-byte messages
Sending burst of 300 msgs
Sending burst of 300 msgs
Sending burst of 300 msgs
Sending burst of 300 msgs
Sending burst of 300 msgs
...
Pausing before sending 'stat
Sending stat
180000 messages sent (not including 'stat')

This will send 600 bursts of 300 100-byte messages on address 239.1.1.1 on port 2345. Aerospike is currently configured for a different address, but this test succeeding will show that multicast functionality is working.

See if a machine is receiving:

$ ./mdump -Q2 -s -v 239.1.1.1 2345
Equiv cmd line: mdump -p0 -Q2 -r4194304 -v 239.1.1.1 2345
WARNING: tried to set SO_RCVBUF to 4194304, only got 262142
echo sender equiv cmd: msend -b300 -m100 -n5 -p1000 -s10 -S65536 239.1.1.1 2345
180000 msgs sent, 180000 received (not including 'stat')
0.000000% loss

The normal test scenario would be to run msend from one machine and mdump on the rest of machines. You can change the parameter of -b to send more packets in one burst. mdump will print summary as follows which tells what is the %loss.

Verify multicast by testing sending and receiving on all nodes.

A reasonable validation scenario for four servers might be as follows:

Run with node 01 as sender and nodes 02, 03, 04 as receivers and verified that the bytes are received fine.
Repeating the test thrice each time using a different node as sender and the rest of the nodes as receivers and verified success.

We recommend performing these tests when a new cluster is set up, when a new machine is added, when a server moves within the datacenter, or when the network configuration changes.