Quiescing a node
Context
When the quiesce command is issued to a node in Enterprise Edition 4.3.1.3 and later, it causes the node to be "quiesced" in the next cluster rebalance.
During rebalance, a quiesced node behaves in some ways as if it has been removed from the cluster, but in other ways as if it is still in the cluster.
Rebalance puts the quiesced node at the end of every partition's "succession list". This can cause a quiesced node that was previously master to hand off master status to a normal (not quiesced) node. This handoff happens only if the normal node and the quiesced node have matching full partition versions. Such a handoff is the goal of using quiescence in one particular use case (see below).
The quiesced node at the end of the succession list is excluded from certain algorithms during rebalance: AP rack-aware, AP uniform balance, and SC "second phase" rack-aware -- for these purposes the quiesced node behaves as if it is not in the cluster.
Otherwise, the quiesced node behaves as if it is in the cluster -- it will accept transactions, schedule migrations if appropriate, and for SC it will count towards determining that a partition is available.
One last way a quiesced node behaves differently is that it will never drop its data, even if all migrations complete and leave it with a "superfluous" partition version. The assumption is that the quiesced node will be taken down and then return to its prior place as a replica -- keeping data means it will not take as long to re-sync, needing only a "delta" migration instead of a "fill" migration.
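As a rough illustration of the succession-list behavior described above, the following is a toy model only, not Aerospike's rebalance implementation. The function names and the `full_version` mapping are hypothetical; the sketch assumes the first entry of a succession list is the current master and that a version is "full" when it is present in the mapping.

```python
def reorder_succession(succession, quiesced):
    """Move quiesced nodes to the end, keeping relative order otherwise."""
    normal = [node for node in succession if node not in quiesced]
    parked = [node for node in succession if node in quiesced]
    return normal + parked


def master_after_rebalance(succession, quiesced, full_version):
    """Toy rule: hand off only when the candidate's full version matches.

    full_version maps node -> version string for nodes holding a full copy
    (hypothetical representation, for illustration only).
    """
    old_master = succession[0]
    candidate = reorder_succession(succession, quiesced)[0]
    if candidate == old_master:
        return old_master  # master was not quiesced, or every node is quiesced
    candidate_version = full_version.get(candidate)
    if candidate_version is not None and candidate_version == full_version.get(old_master):
        return candidate  # matching full versions: master status is handed off
    return old_master  # no matching full version yet: quiesced node stays master


succession = ["A", "B", "C"]
print(reorder_succession(succession, {"A"}))
print(master_after_rebalance(succession, {"A"}, {"A": "v1", "B": "v1"}))
```

In this model the quiesced node `A` keeps master status until some normal node holds a matching full version, which mirrors the condition stated above.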
Using Quiescence for a Smooth Master Handoff
A common use for quiescence is to enable smooth master handoffs.
Normally, when a node is removed from a cluster, it takes a couple of seconds for the remaining nodes to re-cluster and determine a new master for the partitions that had been master on the missing node. During this time, transactions that are looking for a master node (that is, writes and SC reads) will not find one, and those with timeouts shorter than the "master gap" will time out. AP reads will by default retry against a replica.
Quiescence can fill this gap -- if the node to be removed is first quiesced, and a rebalance triggered, master handoff will occur during the rebalance, yet the quiesced node will continue to receive transactions (and proxy them to the new master) until all the clients have discovered the new master and moved to it. Once this has happened, the quiesced node can be taken down. The re-clustering caused by this will not have a "master gap". Therefore, the burst of timeouts should instead become a burst of proxies (refer to the proxy related metrics such as client_proxy_complete).
The term 'master gap' refers to the time elapsed between the current master node in the client partition map becoming unreachable and a new master node assuming masterhood. Once a new master has been elected, client transactions can be proxied within the cluster to the new master node. When the client has tended the remaining nodes (after the tend interval has elapsed) transactions will be sent directly to the new master. The maximum master gap can be calculated as follows:
(timeout x interval) + quantum interval
With default values this would resolve to:
(10 x 150) + (1500 x 0.2) = 1800ms or 1.8s
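The arithmetic above can be checked with a few lines of Python. This is a minimal sketch: the function and parameter names are hypothetical, and the quantum interval term is passed in directly as 300 ms, which is the value of the (1500 x 0.2) term in the worked example.

```python
def max_master_gap_ms(timeout=10, interval_ms=150, quantum_interval_ms=300):
    """Upper bound on the master gap: (timeout x interval) + quantum interval.

    Defaults are taken from the worked example in the text; the quantum
    interval default of 300 ms corresponds to its (1500 x 0.2) term.
    """
    return timeout * interval_ms + quantum_interval_ms


gap = max_master_gap_ms()
print(f"{gap} ms ({gap / 1000} s)")
```

Client transaction timeouts can be compared against this bound to estimate which transactions would time out during an unquiesced node removal.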
Rolling upgrade procedure
Ensure the cluster is stable | asadm -e 'info network'
Using the info network asadm command, ensure there are no migrations, all the nodes are in the cluster, and all nodes show the same cluster key.
Admin> info network
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Network Information (2021-05-27 01:21:44 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Node| Node ID| IP| Build|Migrations|~~~~~~~~~~~~~~~~~~Cluster~~~~~~~~~~~~~~~~~~|Client| Uptime
| | | | |Size| Key|Integrity| Principal| Conns|
aero-cluster1_1:3000| BB9211EF53E1600|10.0.3.196:3000|E-5.6.0.3| 0.000 | 4|39815062484B|True |BB9D9FC68290C00| 6|28:34:44
aero-cluster1_2:3000| BB912EA003E1600|10.0.3.224:3000|E-5.6.0.3| 0.000 | 4|39815062484B|True |BB9D9FC68290C00| 4|28:34:44
aero-cluster1_3:3000| BB906A7363E1600|10.0.3.149:3000|E-5.6.0.3| 0.000 | 4|39815062484B|True |BB9D9FC68290C00| 4|28:34:44
aero-cluster1_4:3000|*BB9D9FC68290C00| 10.0.3.41:3000|E-5.6.0.3| 0.000 | 4|39815062484B|True |BB9D9FC68290C00| 6|28:34:44
Number of rows: 4
Issue the quiesce command | manage quiesce
The quiesce command can be issued from asadm or directly through asinfo. If using asadm, it should be directed to the node to be quiesced using the with modifier, specifying the IP address or node ID of that node:
Tools package 6.0.x or later is required to use asadm's manage quiesce command. Otherwise, use the equivalent asinfo - quiesce command.
Admin+> manage quiesce with 10.0.3.224
~~~Quiesce Nodes~~~~
Node|Response
ubuntu:3000|ok
Number of rows: 1
Verify that the command succeeded by checking the pending_quiesce statistic:
Admin+> show statistics like pending_quiesce
~~~~~~~~~~~~~~~~~~~~~~~~~bar Namespace Statistics (2021-05-27 00:53:40 UTC)~~~~~~~~~~~~~~~~~~~~~~~~
Node |aero-cluster1_1:3000|aero-cluster1_2:3000|aero-cluster1_3:3000|aero-cluster1_4:3000
pending_quiesce|false |true |false |false
Number of rows: 2
~~~~~~~~~~~~~~~~~~~~~~~~~test Namespace Statistics (2021-05-27 00:53:40 UTC)~~~~~~~~~~~~~~~~~~~~~~~~
Node |aero-cluster1_1:3000|aero-cluster1_2:3000|aero-cluster1_3:3000|aero-cluster1_4:3000
pending_quiesce|false |true |false |false
Number of rows: 2
Delaying fill migrations is a good practice in many common situations (independent of quiescing). See the migrate-fill-delay configuration parameter for details.
If the manage quiesce command has been issued to the wrong node, use manage quiesce undo to revert it.
Quiescing happens for all namespaces on a node at the same time. Quiescing individual namespaces is not supported.
Issue a recluster command | manage recluster
This is the step where quiesced masters hand off to other nodes and migrations start.
Issue a recluster command:
Tools package 6.0.x or later is required to use asadm's manage recluster command. Otherwise, use the equivalent asinfo - recluster command.
Admin+> manage recluster
Successfully started recluster
Verify that the command succeeded by checking the effective_is_quiesced and nodes_quiesced statistics:
Admin+> show statistics like quiesce
~~~~~~~~~~~~~~~~~~~~~~~~~~bar Namespace Statistics (2021-05-27 01:08:57 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Node |aero-cluster1_1:3000|aero-cluster1_2:3000|aero-cluster1_3:3000|aero-cluster1_4:3000
effective_is_quiesced|false |true |false |false
nodes_quiesced |1 |1 |1 |1
pending_quiesce |false |true |false |false
Number of rows: 4
~~~~~~~~~~~~~~~~~~~~~~~~~~test Namespace Statistics (2021-05-27 01:08:57 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Node |aero-cluster1_1:3000|aero-cluster1_2:3000|aero-cluster1_3:3000|aero-cluster1_4:3000
effective_is_quiesced|false |true |false |false
nodes_quiesced |1 |1 |1 |1
pending_quiesce |false |true |false |false
Number of rows: 4
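When this verification is scripted, the statistics table can be parsed and checked programmatically. The following is a minimal sketch, assuming the pipe-delimited layout shown in the output above; parse_stat_table and quiesce_is_effective are hypothetical helper names, not part of asadm, and the parsing would need adjusting if the output format differs.

```python
def parse_stat_table(text):
    """Parse one pipe-delimited statistics table into {stat: {node: value}}."""
    nodes = []
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # title line, "Number of rows" line, or blank line
        cells = [cell.strip() for cell in line.split("|")]
        if cells[0] == "Node":
            nodes = cells[1:]  # header row lists the node names
        else:
            stats[cells[0]] = dict(zip(nodes, cells[1:]))
    return stats


def quiesce_is_effective(stats, node, expected_quiesced=1):
    """True if `node` is effectively quiesced and all nodes agree on the count."""
    effective = stats.get("effective_is_quiesced", {})
    counts = stats.get("nodes_quiesced", {})
    return (
        effective.get(node) == "true"
        and bool(counts)
        and all(value == str(expected_quiesced) for value in counts.values())
    )


SAMPLE = """\
Node |aero-cluster1_1:3000|aero-cluster1_2:3000|aero-cluster1_3:3000|aero-cluster1_4:3000
effective_is_quiesced|false |true |false |false
nodes_quiesced |1 |1 |1 |1
pending_quiesce |false |true |false |false
Number of rows: 4
"""

stats = parse_stat_table(SAMPLE)
print(quiesce_is_effective(stats, "aero-cluster1_2:3000"))
```

The check should be repeated for each namespace table, since the statistics are reported per namespace.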
Check that no more transactions are hitting the quiesced node and proxy counters are as expected
1) The quiesced node should not be receiving any active traffic
A few seconds should be enough to be sure all clients have "moved" from the quiesced node to the new masters.
The asadm show latencies command can be used to check that the read and write throughput to the quiesced node is down to zero. The usual metrics can be used to verify that other types of transactions (e.g., batch) have also stopped against the node to be quiesced.
Admin+> show latencies
~~~~~~~~~~~Latency (2021-05-27 01:12:04 UTC)~~~~~~~~~~
Namespace|Histogram| Node|ops/sec|>1ms|>8ms|>64ms
test |read |aero-cluster1_1:3000| 9.3| 0.0| 0.0| 0.0
test |read |aero-cluster1_2:3000| 0.0| 0.0| 0.0| 0.0
test |read |aero-cluster1_3:3000| 7.5| 0.0| 0.0| 0.0
test |read |aero-cluster1_4:3000| 8.4| 0.0| 0.0| 0.0
| | | 25.2| 0.0| 0.0| 0.0
test |write |aero-cluster1_1:3000| 2.0| 0.0| 0.0| 0.0
test |write |aero-cluster1_2:3000| 0.0| 0.0| 0.0| 0.0
test |write |aero-cluster1_3:3000| 3.2| 0.0| 0.0| 0.0
test |write |aero-cluster1_4:3000| 4.5| 0.0| 0.0| 0.0
| | | 9.9| 0.0| 0.0| 0.0
Number of rows: 8
For Database 5.7 and earlier, client libraries send SI query transactions to all nodes in the cluster. If the workload involves SI queries, a quiesced node shows a value in the ops/sec column of query histograms. This can be safely ignored. The quiesced node owns no partitions and so will not return any records, though it will track query progress. You will observe the same behavior for older client libraries that do not query per partition, against server versions 6.0 and later.
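The zero-throughput check on the read and write histograms can also be automated. The sketch below assumes the column order shown in the latency output above (Namespace, Histogram, Node, ops/sec); node_throughput and is_drained are hypothetical helper names. Query histograms are deliberately left out of the default check, for the reason given in the note above.

```python
def node_throughput(latency_output, node):
    """Return {(namespace, histogram): ops_per_sec} for one node."""
    result = {}
    for line in latency_output.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        # Skip titles, the header row, and per-histogram total rows.
        if len(cells) < 4 or cells[2] != node:
            continue
        result[(cells[0], cells[1])] = float(cells[3])
    return result


def is_drained(latency_output, node, histograms=("read", "write")):
    """True if every listed histogram for `node` reports 0 ops/sec.

    Returns False when the node has no matching rows, so that missing
    output is never mistaken for a drained node.
    """
    relevant = [
        ops
        for (_, histogram), ops in node_throughput(latency_output, node).items()
        if histogram in histograms
    ]
    return bool(relevant) and all(ops == 0.0 for ops in relevant)


LATENCIES = """\
Namespace|Histogram| Node|ops/sec|>1ms|>8ms|>64ms
test |read |aero-cluster1_1:3000| 9.3| 0.0| 0.0| 0.0
test |read |aero-cluster1_2:3000| 0.0| 0.0| 0.0| 0.0
| | | 25.2| 0.0| 0.0| 0.0
test |write |aero-cluster1_1:3000| 2.0| 0.0| 0.0| 0.0
test |write |aero-cluster1_2:3000| 0.0| 0.0| 0.0| 0.0
"""

print(is_drained(LATENCIES, "aero-cluster1_2:3000"))
```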
2) The quiesced node should no longer be proxying transactions
There would typically be a second or two of proxy transactions on the quiesced node as clients retrieve the updated partition map and start directing transactions to the new master nodes for the partitions previously owned by the quiesced node. It is good practice to also wait for proxy transactions to stop on the quiesced node prior to shutting it down.
On the quiesced node, confirm that the following statistics are not incrementing. For details regarding the metrics, refer to the Metrics Reference page.
Admin+> show statistics like client_proxy
~~~~~~~~~~~~~~~~~~~~~~~~~~~bar Namespace Statistics (2021-05-27 01:17:48 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Node |aero-cluster1_1:3000|aero-cluster1_2:3000|aero-cluster1_3:3000|aero-cluster1_4:3000
client_proxy_complete|0 |20 |0 |0
client_proxy_error |0 |0 |0 |0
client_proxy_timeout |0 |0 |0 |0
Number of rows: 4
~~~~~~~~~~~~~~~~~~~~~~~~~~~test Namespace Statistics (2021-05-27 01:17:48 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Node |aero-cluster1_1:3000|aero-cluster1_2:3000|aero-cluster1_3:3000|aero-cluster1_4:3000
client_proxy_complete|0 |10 |0 |0
client_proxy_error |0 |0 |0 |0
client_proxy_timeout |0 |0 |0 |0
Number of rows: 4
Admin+> show statistics like batch_sub_proxy
~~~~~~~~~~~~~~~~~~~~~~~~~~~~bar Namespace Statistics (2021-05-27 01:17:48 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Node |aero-cluster1_1:3000|aero-cluster1_2:3000|aero-cluster1_3:3000|aero-cluster1_4:3000
batch_sub_proxy_complete|0 |20 |0 |0
batch_sub_proxy_error |0 |0 |0 |0
batch_sub_proxy_timeout |0 |0 |0 |0
Number of rows: 4
~~~~~~~~~~~~~~~~~~~~~~~~~~~test Namespace Statistics (2021-05-27 01:17:48 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Node |aero-cluster1_1:3000|aero-cluster1_2:3000|aero-cluster1_3:3000|aero-cluster1_4:3000
batch_sub_proxy_complete|0 |20 |0 |0
batch_sub_proxy_error |0 |0 |0 |0
batch_sub_proxy_timeout |0 |0 |0 |0
Number of rows: 4
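"Not incrementing" means that two samples of the same counters, taken some time apart, are identical. A minimal sketch of that comparison follows, assuming the counter values for the quiesced node have already been collected into dictionaries (for example from the output above). The helper names are hypothetical.

```python
def incrementing_counters(before, after):
    """Return the names of counters whose value grew between two samples."""
    return sorted(
        name for name, value in after.items() if value > before.get(name, 0)
    )


def proxies_have_stopped(samples):
    """True if no counter grew across any pair of consecutive samples."""
    return len(samples) >= 2 and all(
        not incrementing_counters(earlier, later)
        for earlier, later in zip(samples, samples[1:])
    )


# Counter values for the quiesced node, sampled a few seconds apart.
first = {"client_proxy_complete": 20, "batch_sub_proxy_complete": 20}
second = {"client_proxy_complete": 20, "batch_sub_proxy_complete": 20}

print(incrementing_counters(first, second))
print(proxies_have_stopped([first, second]))
```

Non-zero totals are expected here, since the quiesced node proxied transactions briefly after the recluster; only growth between samples indicates that clients are still sending it traffic.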
3) The non-quiesced nodes should not be a destination of proxy transactions
On the nodes that were not quiesced, check the statistics that show a node being the destination of proxy transactions, and confirm that they have stopped incrementing. These are the statistics that begin with from_proxy.
Admin> show statistics like from_proxy_read
Admin> show statistics like from_proxy_write
Admin> show statistics like from_proxy_batch_sub
Other ways to monitor proxies are to watch the proxy transactions on the client transaction metric log line or, alternatively, to dynamically enable the proxy histogram and monitor its throughput using the Log Latency Tool.
Take down the quiesced node and proceed with the upgrade or maintenance
$ sudo systemctl stop aerospike
Verify the node has stopped and the cluster is now showing one less node:
Admin> info network
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Network Information (2021-05-27 01:21:44 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Node| Node ID| IP| Build|Migrations|~~~~~~~~~~~~~~~~~~Cluster~~~~~~~~~~~~~~~~~~|Client| Uptime
| | | | |Size| Key|Integrity| Principal| Conns|
aero-cluster1_1:3000| BB9211EF53E1600|10.0.3.196:3000|E-5.6.0.3| 0.000 | 3|39815062484B|True |BB9D9FC68290C00| 6|28:34:44
aero-cluster1_3:3000| BB906A7363E1600|10.0.3.149:3000|E-5.6.0.3| 0.000 | 3|39815062484B|True |BB9D9FC68290C00| 4|28:34:44
aero-cluster1_4:3000|*BB9D9FC68290C00| 10.0.3.41:3000|E-5.6.0.3| 0.000 | 3|39815062484B|True |BB9D9FC68290C00| 6|28:34:44
Number of rows: 3
The nodes_quiesced statistic is now back to 0:
Admin+> show statistics like quiesce
~~~~~~~~~~~~~~~~bar Namespace Statistics (2021-05-27 01:08:57 UTC)~~~~~~~~~~~~~~~~~~
Node |aero-cluster1_1:3000|aero-cluster1_3:3000|aero-cluster1_4:3000
effective_is_quiesced|false |false |false
nodes_quiesced |0 |0 |0
pending_quiesce |false |false |false
Number of rows: 4
~~~~~~~~~~~~~~~~test Namespace Statistics (2021-05-27 01:08:57 UTC)~~~~~~~~~~~~~~~~~~
Node |aero-cluster1_1:3000|aero-cluster1_3:3000|aero-cluster1_4:3000
effective_is_quiesced|false |false |false
nodes_quiesced |0 |0 |0
pending_quiesce |false |false |false
Number of rows: 4
Proceed with the upgrade or other maintenance needed.
Bring the quiesced node back up
Bring the quiesced node back up, and make sure it joins the cluster:
$ sudo systemctl start aerospike
Admin> info network
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Network Information (2021-05-27 01:21:44 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Node| Node ID| IP| Build|Migrations|~~~~~~~~~~~~~~~~~~Cluster~~~~~~~~~~~~~~~~~~|Client| Uptime
| | | | |Size| Key|Integrity| Principal| Conns|
aero-cluster1_1:3000| BB9211EF53E1600|10.0.3.196:3000|E-5.6.0.3| 1.015 K| 4|39815062484B|True |BB9D9FC68290C00| 6|28:34:44
aero-cluster1_2:3000| BB912EA003E1600|10.0.3.224:3000|E-5.6.0.3| 2.798 K| 4|39815062484B|True |BB9D9FC68290C00| 6|00:00:00
aero-cluster1_3:3000| BB906A7363E1600|10.0.3.149:3000|E-5.6.0.3| 828.000| 4|39815062484B|True |BB9D9FC68290C00| 4|28:34:44
aero-cluster1_4:3000|*BB9D9FC68290C00| 10.0.3.41:3000|E-5.6.0.3| 953.000| 4|39815062484B|True |BB9D9FC68290C00| 6|28:34:44
Wait for migrations to complete | asinfo -v 'cluster-stable:...
Make sure migrations have completed prior to moving on to the next node. This is done using the cluster-stable command. The command should be run on each node in the cluster, and it should return the same cluster key for every node. The cluster-stable check can be scripted and the results compared programmatically.
For most common cases, migrations at this point would only consist of lead migrations and the time required for completion would be proportional to how long the node has been quiesced.
For situations where the stored data is deleted, or when there is no persisted storage, migrations would need to repopulate the data on the node, which would usually take longer. A cold restart could also affect how long migrations take, as a node takes longer to cold restart and could also, in some cases, resurrect previously deleted records.
Admin+> asinfo -v 'cluster-stable:size=4;ignore-migrations=no'
aero-cluster1_4:3000 (10.0.3.41) returned:
5EDF7C44A664
aero-cluster1_2:3000 (10.0.3.224) returned:
5EDF7C44A664
aero-cluster1_3:3000 (10.0.3.149) returned:
5EDF7C44A664
aero-cluster1_1:3000 (10.0.3.196) returned:
5EDF7C44A664
Waiting for migrations to complete in this step is necessary to ensure that when quiescing the next node, the master ownership change will happen as soon as the recluster is done. Quiescing the next node without waiting for migrations to complete is also possible, but it then requires waiting for migrations to complete prior to shutting the node down, to avoid abruptly cutting off client and fabric connections.
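The programmatic comparison of cluster-stable results mentioned above could look like the following sketch. It assumes the "node (ip) returned:" layout shown in the output above, and it assumes that an unstable node replies with a string starting with ERROR instead of a cluster key. The helper names are hypothetical.

```python
import re

# Matches lines such as: aero-cluster1_4:3000 (10.0.3.41) returned:
_RETURNED = re.compile(r"^(?P<node>\S+)\s+\((?P<ip>[^)]+)\)\s+returned:\s*$")


def cluster_keys(output):
    """Map each node name to the value it returned for cluster-stable."""
    keys = {}
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        match = _RETURNED.match(line)
        if match:
            current = match.group("node")
        elif current and line:
            keys[current] = line  # first non-empty line after "returned:"
            current = None
    return keys


def is_cluster_stable(output, expected_size):
    """True if every expected node answered with the same, non-error key."""
    values = set(cluster_keys(output).values())
    return (
        len(cluster_keys(output)) == expected_size
        and len(values) == 1
        and not any(value.upper().startswith("ERROR") for value in values)
    )


OUTPUT = """\
aero-cluster1_4:3000 (10.0.3.41) returned:
5EDF7C44A664
aero-cluster1_2:3000 (10.0.3.224) returned:
5EDF7C44A664
aero-cluster1_3:3000 (10.0.3.149) returned:
5EDF7C44A664
aero-cluster1_1:3000 (10.0.3.196) returned:
5EDF7C44A664
"""

print(is_cluster_stable(OUTPUT, expected_size=4))
```

A rolling-upgrade script can poll this check and move on to the next node only once it passes.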
Move to the next node
Repeat these steps on the next node.
Using Quiescence for Extra Durability
Quiescence may also be used to provide extra durability in various scenarios.
For example, in an AP cluster (replication factor 2) in which a node must be taken down, if the node is quiesced, a rebalance is triggered, and migrations complete before the quiesced node is removed, two full copies will be present when the node is removed. The cluster will then still have all data available if another node accidentally goes down before the first node returns.
Quiescing multiple nodes
Quiescing multiple nodes at the same time can be useful in rack-aware clusters. In such cases, quiescing a whole rack at the same time can speed up maintenance procedures.
In strong-consistency enabled namespaces, quiescing replication-factor or more nodes will force masterhood handover (after migrations complete) but will result in unavailability when the nodes are eventually shut down (unless all quiesced nodes are in the same rack).
Quiescing nodes for namespaces with replication factor 1
For namespaces configured with replication-factor 1, it is necessary to wait for migrations to complete prior to shutting down the server in order for the single copy of each partition owned by that node to have migrated to another node.