Fast Restart
Overview
This page describes the fast restart feature, which enables Aerospike nodes to rejoin their clusters quickly.
Fast restart is available in Aerospike Database Enterprise Edition (EE) and Standard Edition (SE). It is not available in Community Edition (CE).
Fast restart (also known as warm restart or warmstart) occurs after a clean shutdown of the Aerospike daemon (asd), during which asd persists its indexes to their storage. When a new asd process starts, it does not need to read records from namespace data storage. The server reattaches to the indexes, scans them to rebuild statistics, then rejoins the cluster after all its namespaces have restarted.
The speed of a warm restart depends on the media on which the indexes are stored: shared memory is the fastest, followed closely by Intel Optane Persistent Memory (PMem), then indexes on flash SSDs.
The default behavior of Aerospike is to warm restart:
sudo systemctl start aerospike
# or
service aerospike start
You can verify in the logs whether a namespace is going through a warm restart:
INFO (namespace): (namespace_ee.c:361) {test} beginning warm restart
Namespaces restart independently. Some may warm restart and some may cold restart, depending on whether each namespace meets the conditions for a warm restart. The Aerospike server node only joins the cluster after all its namespaces have restarted.
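To check which namespaces on a node logged a warm restart, the log can be scanned for the message shown above. The following is a minimal sketch, assuming log lines follow the format of that sample line. The function name `warm_restarted_namespaces` is hypothetical, and the pattern may need adjusting for other server versions.

```python
import re

# Pattern based on the sample log line on this page. The context and
# source-file fields may differ between server versions (assumption).
WARM_RESTART = re.compile(r"\{(?P<ns>[^}]+)\} beginning warm restart")


def warm_restarted_namespaces(log_lines):
    """Return the names of namespaces that logged a warm restart, in order."""
    found = []
    for line in log_lines:
        match = WARM_RESTART.search(line)
        if match and match.group("ns") not in found:
            found.append(match.group("ns"))
    return found


sample = [
    "INFO (namespace): (namespace_ee.c:361) {test} beginning warm restart",
    "some unrelated log line",  # placeholder, not real Aerospike output
]
print(warm_restarted_namespaces(sample))
```

A configured namespace that is missing from the result did not log a warm restart, so its log entries are the place to look for a cold restart and its reason.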
When does warm restart NOT happen?
There are several situations in which the server cannot warm restart, and will switch to a cold restart. If a cluster node does switch to cold restart, its log file will mention it, and should indicate the reason for the switch.
What does Aerospike Shared Memory Tool (ASMT) do?
ASMT enables primary and secondary indexes, as well as in-memory data storage, to be backed up from shared memory to the file system, and restored from files to shared memory before restarting the node. This makes it possible for asd to warm restart across host machine reboots, which is useful when upgrading the operating system.
In contrast, the older technique of backing up the node with asbackup, ahead of rebooting that node's host machine, meant that asrestore could only happen after the Aerospike node had fully restarted, while migrations were happening. Using ASMT as an alternative, and recovering the shared memory segments ahead of warm restarting asd, has a significant positive impact on a rolling restart of the Aerospike cluster.
Can I monitor system shared memory used by Aerospike?
You can see the system's shared memory blocks using the command:
sudo ipcs -m
All listed blocks whose keys start with "0xae" ("ae" as in Aerospike), "0xa2", or "0xad" are Aerospike shared memory blocks. With 6.1 and above, the shared memory blocks used for secondary indexes have keys that start with "0xa2". With 7.0 and above, the shared memory blocks used for record data have keys that start with "0xad".
An instance of Aerospike EE always has "0xae" keys for the primary index. With EE 6.1+ and secondary indexes defined, there are also "0xa2" keys. With EE 7.0+ and any namespace configured with storage-engine memory, there are also "0xad" segments.
[root@da38772fdefc ~]# ipcs
------ Message Queues --------
key msqid owner perms used-bytes messages
------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0xae001100 0 root 666 1073741824 1
0xae002100 1 root 666 1073741824 1
0xad001000 2 root 666 536870912 1
0xad001001 3 root 666 536870912 1
0xad001002 4 root 666 536870912 1
0xad001003 5 root 666 536870912 1
0xad001004 6 root 666 536870912 1
0xad001005 7 root 666 536870912 1
0xad001006 8 root 666 536870912 1
0xad001007 9 root 666 536870912 1
0xad002000 10 root 666 536870912 1
0xad002001 11 root 666 536870912 1
0xad002002 12 root 666 536870912 1
0xad002003 13 root 666 536870912 1
0xad002004 14 root 666 536870912 1
0xad002005 15 root 666 536870912 1
0xad002006 16 root 666 536870912 1
0xad002007 17 root 666 536870912 1
------ Semaphore Arrays --------
key semid owner perms nsems
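The key prefixes described above can be used to total the shared memory held by each kind of Aerospike segment. The following is a minimal sketch, assuming input in the column layout shown in the listing above. The function name `summarize_aerospike_shm` and the category labels are hypothetical.

```python
from collections import defaultdict

# Key prefixes described on this page; the labels are illustrative.
PREFIXES = {
    "0xae": "primary index",
    "0xa2": "secondary index",
    "0xad": "record data",
}


def summarize_aerospike_shm(ipcs_output):
    """Sum shared memory bytes per Aerospike segment type from ipcs text."""
    totals = defaultdict(int)
    for line in ipcs_output.splitlines():
        fields = line.split()
        # Shared memory rows: key shmid owner perms bytes nattch [status]
        if len(fields) < 6 or not fields[0].startswith("0x"):
            continue
        label = PREFIXES.get(fields[0][:4].lower())
        if label is not None and fields[4].isdigit():
            totals[label] += int(fields[4])
    return dict(totals)


sample = """\
key shmid owner perms bytes nattch status
0xae001100 0 root 666 1073741824 1
0xae002100 1 root 666 1073741824 1
0xad001000 2 root 666 536870912 1
"""
print(summarize_aerospike_shm(sample))
```

On a live node, the text could come from `subprocess.run(["ipcs", "-m"], capture_output=True, text=True).stdout`, run with sufficient privileges to list all segments.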
Data-in-Index namespaces
Support for this type of namespace was removed in Database 6.4. See setup for a data-in-index storage engine.
If the records of your in-memory namespace have a single bin with numeric data (integer or double), for example to store counters, the namespace can take advantage of warm restart.
During the fast restart, asd logs an error and exits if any non-numeric bin values are found. Any attempt to write non-numeric values to the namespace fails with error code 12 (INCOMPATIBLE_TYPE).
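Because non-numeric writes are rejected, an application may want to validate records before sending them. The following is a minimal sketch of such a client-side pre-check, assuming a record is represented as a dictionary mapping bin names to values. The helper `fits_data_in_index` is hypothetical and not part of any Aerospike client.

```python
def fits_data_in_index(bins):
    """Return True if a record has exactly one bin holding an integer or double."""
    if len(bins) != 1:
        return False
    (value,) = bins.values()
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


print(fits_data_in_index({"count": 7}))
print(fits_data_in_index({"count": "7"}))
```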