# Quiesce a node

This page describes how to [`quiesce`](https://aerospike.com/docs/database/reference/info#quiesce) a node so that the node is “quiesced” in the next cluster rebalance.

When rebalancing, a quiesced node is placed at the end of every partition’s [succession list](https://aerospike.com/docs/database/learn/architecture/clustering/data-distribution#succession-list). This can cause a quiesced node that was previously master to hand off master status to a non-quiesced node. This handoff happens only if the non-quiesced node and the quiesced node have matching full partition versions. Such a handoff is the goal of quiescence in the smooth master handoff use case described on this page.

-   In some ways, a quiesced node behaves like it is in the cluster. It accepts transactions, schedules migrations if appropriate, and counts towards determining that a partition is available for [strong consistency (SC)](https://aerospike.com/docs/database/reference/config#namespace__strong-consistency).
    
-   One difference in how a quiesced node behaves is that it never drops its data, even if all migrations complete and leave it with a superfluous partition version. It is assumed that the quiesced node will be taken down and returned to its prior place as a replica. Keeping data means it will not take as long to re-sync, needing only a “delta” migration instead of a “fill” migration.
    

## Using quiescence for a smooth master handoff

A common use for quiescence is to enable smooth master handoffs.

Normally, when a node is removed from a cluster, it takes a few seconds for the remaining nodes to re-cluster and determine a new master for the partitions that had been master on the removed node. This period of time is the _master gap_. Some transactions, such as writes and [SC](https://aerospike.com/docs/database/reference/config#namespace__strong-consistency) reads, may have timeouts shorter than the master gap and will not find a master node during this timeout. AP reads retry against a replica by default.

Quiescing the node to be removed can fill this gap. If it is first quiesced, and a rebalance is triggered, the master handoff will occur during the rebalance. The quiesced node will continue to receive transactions and proxy them to the new master until all the clients have discovered the new master and moved to it. After this has happened, the quiesced node can be taken down. The re-clustering will not have a master gap and the burst of timeouts should instead become a burst of proxies. For more information, see the proxy-related metrics such as [`client_proxy_complete`](https://aerospike.com/docs/database/reference/metrics#namespace__client_proxy_complete).

::: note
The term ‘master gap’ refers to the time elapsed between the current master node in the client partition map becoming unreachable and a new master node assuming masterhood. Once a new master has been elected, client transactions can be proxied within the cluster to the new master node. When the client has tended the remaining nodes (after the `tend interval` has elapsed), transactions are sent directly to the new master. The maximum master gap can be calculated as follows:

([`timeout`](https://aerospike.com/docs/database/reference/config#network__timeout) x [`interval`](https://aerospike.com/docs/database/reference/config#network__interval)) + [`quantum interval`](https://support.aerospike.com/s/article/FAQ-What-is-the-Quantum-Interval-and-how-does-it-affect-cluster-reformation-time)

With default values this would resolve to:

(10 x 150) + (1500 x 0.2) = 1800ms or **1.8s**
:::
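
The calculation in the note above can be checked with a short script. This is a minimal sketch: the function name and its parameters are illustrative, and the 0.2 factor for the quantum interval is an assumption taken from the default figures in the note, not from a configuration parameter.

```python
def max_master_gap_ms(timeout: int = 10, interval_ms: int = 150,
                      quantum_fraction: float = 0.2) -> float:
    """Upper bound on the master gap, in milliseconds."""
    # Heartbeat window: `timeout` missed heartbeats at `interval` ms each.
    heartbeat_window_ms = timeout * interval_ms
    # Quantum interval, modeled as a fraction of the heartbeat window
    # (assumption based on the default figures: 1500 x 0.2).
    quantum_interval_ms = heartbeat_window_ms * quantum_fraction
    return heartbeat_window_ms + quantum_interval_ms


# Defaults reproduce the (10 x 150) + (1500 x 0.2) calculation.
print(f"{max_master_gap_ms():.0f} ms")
```

Passing the values configured on your own cluster gives the corresponding bound, which can then be compared against client transaction timeouts.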

## Rolling upgrade procedure

### Verify the cluster is stable | `asadm -e 'info network'`

Using the [`info network`](https://aerospike.com/docs/database/tools/asadm/live-mode/#network) asadm command, ensure there are no migrations, all the nodes are in the cluster, and all nodes show the same key.

```plaintext
Admin> info network
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Network Information (2021-05-27 01:21:44 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                Node|         Node ID|             IP|    Build|Migrations|~~~~~~~~~~~~~~~~~~Cluster~~~~~~~~~~~~~~~~~~|Client|  Uptime
                    |                |               |         |          |Size|         Key|Integrity|      Principal| Conns|
aero-cluster1_1:3000| BB9211EF53E1600|10.0.3.196:3000|E-5.6.0.3|   0.000  |   4|39815062484B|True     |BB9D9FC68290C00|     6|28:34:44
aero-cluster1_2:3000| BB912EA003E1600|10.0.3.224:3000|E-5.6.0.3|   0.000  |   4|39815062484B|True     |BB9D9FC68290C00|     4|28:34:44
aero-cluster1_3:3000| BB906A7363E1600|10.0.3.149:3000|E-5.6.0.3|   0.000  |   4|39815062484B|True     |BB9D9FC68290C00|     4|28:34:44
aero-cluster1_4:3000|*BB9D9FC68290C00| 10.0.3.41:3000|E-5.6.0.3|   0.000  |   4|39815062484B|True     |BB9D9FC68290C00|     6|28:34:44
Number of rows: 4
```

### Issue the quiesce command | `manage quiesce`

The [`quiesce`](https://aerospike.com/docs/database/reference/info#quiesce) command can be issued from [`asadm`](https://aerospike.com/docs/database/tools/asadm/live-mode#quiesce) or directly through [`asinfo`](https://aerospike.com/docs/database/tools/asinfo). If using [`asadm`](https://aerospike.com/docs/database/tools/asadm/live-mode#quiesce), it should be directed to the node to be quiesced using the `with` modifier, specifying the IP address or node ID of the node to be quiesced:

::: note
[Tools package 6.0.x](https://aerospike.com/docs/database/tools/release-notes#tools-603) or later is required to use asadm’s [manage quiesce](https://aerospike.com/docs/database/tools/asadm/live-mode#quiesce) command. Otherwise, use the equivalent [asinfo - quiesce](https://aerospike.com/docs/database/reference/info#quiesce) command.
:::

```text
Admin+> manage quiesce with 10.0.3.224
~~~Quiesce Nodes~~~~
       Node|Response
ubuntu:3000|ok
Number of rows: 1
```

Verify the command has been successful by checking the [`pending_quiesce`](https://aerospike.com/docs/database/reference/metrics#namespace__pending_quiesce) statistic:

```plaintext
Admin+> show statistics like pending_quiesce
~~~~~~~~~~~~~~~~~~~~~~~~~bar Namespace Statistics (2021-05-27 00:53:40 UTC)~~~~~~~~~~~~~~~~~~~~~~~~
Node           |aero-cluster1_1:3000|aero-cluster1_2:3000|aero-cluster1_3:3000|aero-cluster1_4:3000
pending_quiesce|false               |true                |false               |false
Number of rows: 2

~~~~~~~~~~~~~~~~~~~~~~~~~test Namespace Statistics (2021-05-27 00:53:40 UTC)~~~~~~~~~~~~~~~~~~~~~~~~
Node           |aero-cluster1_1:3000|aero-cluster1_2:3000|aero-cluster1_3:3000|aero-cluster1_4:3000
pending_quiesce|false               |true                |false               |false
Number of rows: 2
```

::: note
-   Delaying fill migrations is a good practice in many common situations (independent of quiescing). See the [`migrate-fill-delay`](https://aerospike.com/docs/database/reference/config#service__migrate-fill-delay) configuration parameter for details.
    
-   If the `manage quiesce` command has been issued to the wrong node, use `manage quiesce undo` to revert it.
    
-   Quiescing happens for all namespaces on a node at the same time. Quiescing individual namespaces is not supported.
:::

::: note
If the [`quiesce`](https://aerospike.com/docs/database/reference/info#quiesce) command is inadvertently issued against all the nodes in the cluster, the subsequent [`recluster`](https://aerospike.com/docs/database/reference/info#recluster) command will be ignored:

```plaintext
WARNING (partition): (partition_balance_ee.c:435) {test} can't quiesce all nodes - ignoring
```
:::

### Issue a recluster command | `manage recluster`

This is the step where quiesced masters hand off to other nodes and migrations start.

Issue a [`recluster`](https://aerospike.com/docs/database/tools/asadm/live-mode#recluster) command:

::: note
[Tools package 6.0.x](https://aerospike.com/docs/database/tools/release-notes/#tools-603) or later is required to use asadm’s `manage recluster` command. Otherwise, use the equivalent [asinfo - recluster](https://aerospike.com/docs/database/reference/info#recluster) command.
:::

```plaintext
Admin+> manage recluster

Successfully started recluster
```

Verify that the command was successful by checking the [`effective_is_quiesced`](https://aerospike.com/docs/database/reference/metrics#namespace__effective_is_quiesced) and [`nodes_quiesced`](https://aerospike.com/docs/database/reference/metrics#namespace__nodes_quiesced) statistics:

```plaintext
Admin+> show statistics like quiesce
~~~~~~~~~~~~~~~~~~~~~~~~~~bar Namespace Statistics (2021-05-27 01:08:57 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Node                 |aero-cluster1_1:3000|aero-cluster1_2:3000|aero-cluster1_3:3000|aero-cluster1_4:3000
effective_is_quiesced|false               |true                |false               |false
nodes_quiesced       |1                   |1                   |1                   |1
pending_quiesce      |false               |true                |false               |false
Number of rows: 4

~~~~~~~~~~~~~~~~~~~~~~~~~~test Namespace Statistics (2021-05-27 01:08:57 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Node                 |aero-cluster1_1:3000|aero-cluster1_2:3000|aero-cluster1_3:3000|aero-cluster1_4:3000
effective_is_quiesced|false               |true                |false               |false
nodes_quiesced       |1                   |1                   |1                   |1
pending_quiesce      |false               |true                |false               |false
Number of rows: 4
```
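
The same verification can be scripted against raw statistics. The sketch below assumes the namespace statistics have already been fetched as semicolon-delimited `key=value` text (the form returned by an info call such as `namespace/<name>`). The helper names `parse_info_pairs` and `quiesce_took_effect` are hypothetical, and fetching the text from each node is left out.

```python
def parse_info_pairs(response: str) -> dict[str, str]:
    """Split a 'key=value;key=value' info response into a dict."""
    pairs = {}
    for item in response.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            pairs[key] = value
    return pairs


def quiesce_took_effect(target_stats: dict[str, str],
                        other_stats: list[dict[str, str]],
                        expected_quiesced: int = 1) -> bool:
    """True if the target node is effectively quiesced and every node
    reports the expected nodes_quiesced count for the namespace."""
    if target_stats.get("effective_is_quiesced") != "true":
        return False
    all_stats = [target_stats, *other_stats]
    return all(s.get("nodes_quiesced") == str(expected_quiesced)
               for s in all_stats)


# Sample values mirroring the output above (node 2 is the quiesced node).
target = parse_info_pairs(
    "effective_is_quiesced=true;nodes_quiesced=1;pending_quiesce=true")
peer = parse_info_pairs(
    "effective_is_quiesced=false;nodes_quiesced=1;pending_quiesce=false")
print(quiesce_took_effect(target, [peer, peer, peer]))
```

Because quiescing applies to all namespaces on a node, the check would be repeated for each namespace.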

### Check that no more transactions are hitting the quiesced node and proxy counters are as expected

**1) The quiesced node should not be receiving any active traffic**

A few seconds should be enough to be sure all clients have “moved” from the quiesced node to the new masters.

The [`asadm`](https://aerospike.com/docs/database/tools/asadm) `show latencies` command can be used to check that the read and write throughput to the quiesced node is down to zero. The usual metrics can be used to verify that other types of transactions, such as batch, have also stopped against the quiesced node.

```plaintext
Admin+> show latencies
~~~~~~~~~~~Latency  (2021-05-27 01:12:04 UTC)~~~~~~~~~~
Namespace|Histogram|                Node|ops/sec|>1ms|>8ms|>64ms
test     |read     |aero-cluster1_1:3000|    9.3| 0.0| 0.0|  0.0
test     |read     |aero-cluster1_2:3000|    0.0| 0.0| 0.0|  0.0
test     |read     |aero-cluster1_3:3000|    7.5| 0.0| 0.0|  0.0
test     |read     |aero-cluster1_4:3000|    8.4| 0.0| 0.0|  0.0
         |         |                    |   25.2| 0.0| 0.0|  0.0
test     |write    |aero-cluster1_1:3000|    2.0| 0.0| 0.0|  0.0
test     |write    |aero-cluster1_2:3000|    0.0| 0.0| 0.0|  0.0
test     |write    |aero-cluster1_3:3000|    3.2| 0.0| 0.0|  0.0
test     |write    |aero-cluster1_4:3000|    4.5| 0.0| 0.0|  0.0
         |         |                    |    9.9| 0.0| 0.0|  0.0
Number of rows: 8
```

::: note
For Database 5.7.0 and earlier, client libraries send SI query transactions to all nodes in the cluster. If the workload involves SI queries, a quiesced node shows a value in the ops/sec column of query histograms. This can be safely ignored. The quiesced node owns no partitions and does not return any records, though it tracks query progress. Older client libraries that do not query per partition exhibit the same behavior against Database 6.0.0 and later.
:::

**2) The quiesced node should no longer be proxying transactions**

There is typically a second or two of proxy transactions on the quiesced node while clients retrieve the updated partition map and start directing transactions to the new master nodes for the partitions previously owned by the quiesced node. It is good practice to wait for proxy transactions to stop on the quiesced node before shutting it down.

On the quiesced node, confirm that the following statistics are not incrementing. For details regarding the metrics, refer to the [Metrics Reference](https://aerospike.com/docs/database/reference/metrics) page.

```plaintext
Admin+> show statistics like client_proxy
~~~~~~~~~~~~~~~~~~~~~~~~~~~bar Namespace Statistics (2021-05-27 01:17:48 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Node                 |aero-cluster1_1:3000|aero-cluster1_2:3000|aero-cluster1_3:3000|aero-cluster1_4:3000
client_proxy_complete|0                   |20                  |0                   |0
client_proxy_error   |0                   |0                   |0                   |0
client_proxy_timeout |0                   |0                   |0                   |0
Number of rows: 4

~~~~~~~~~~~~~~~~~~~~~~~~~~~test Namespace Statistics (2021-05-27 01:17:48 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Node                 |aero-cluster1_1:3000|aero-cluster1_2:3000|aero-cluster1_3:3000|aero-cluster1_4:3000
client_proxy_complete|0                   |10                  |0                   |0
client_proxy_error   |0                   |0                   |0                   |0
client_proxy_timeout |0                   |0                   |0                   |0
Number of rows: 4
```

```plaintext
Admin+> show statistics like batch_sub_proxy
~~~~~~~~~~~~~~~~~~~~~~~~~~~~bar Namespace Statistics (2021-05-27 01:17:48 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Node                    |aero-cluster1_1:3000|aero-cluster1_2:3000|aero-cluster1_3:3000|aero-cluster1_4:3000
batch_sub_proxy_complete|0                   |20                  |0                   |0
batch_sub_proxy_error   |0                   |0                   |0                   |0
batch_sub_proxy_timeout |0                   |0                   |0                   |0
Number of rows: 4

~~~~~~~~~~~~~~~~~~~~~~~~~~~test Namespace Statistics (2021-05-27 01:17:48 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Node                    |aero-cluster1_1:3000|aero-cluster1_2:3000|aero-cluster1_3:3000|aero-cluster1_4:3000
batch_sub_proxy_complete|0                   |20                  |0                   |0
batch_sub_proxy_error   |0                   |0                   |0                   |0
batch_sub_proxy_timeout |0                   |0                   |0                   |0
Number of rows: 4
```

**3) The non-quiesced nodes should not be a destination of proxy transactions**

On the nodes that were not quiesced, check the statistics that count transactions received as the destination of a proxy, and confirm that they are no longer incrementing. These statistics begin with `from_proxy`.

```plaintext
Admin> show statistics like from_proxy_read
Admin> show statistics like from_proxy_write
Admin> show statistics like from_proxy_batch_sub
```

You can monitor the proxy transactions on the [client transaction metric log line](https://aerospike.com/docs/database/reference/logs). You can also dynamically enable the [proxy](https://aerospike.com/docs/database/observe/latency#proxy-transaction-analysis) histogram and monitor its throughput using the [Log latency tool](https://github.com/aerospike-examples/asloglatency).
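
Checking that a counter is "not incrementing" means comparing two samples taken some time apart. The following sketch shows that comparison on counters that have already been collected into dictionaries. The function name and the default prefixes are illustrative; collecting the samples from a node is not shown.

```python
def incrementing_counters(
    before: dict[str, int],
    after: dict[str, int],
    prefixes: tuple[str, ...] = ("client_proxy", "batch_sub_proxy"),
) -> dict[str, int]:
    """Return counters matching the prefixes that grew between two samples."""
    deltas = {}
    for name, new_value in after.items():
        if name.startswith(prefixes):
            delta = new_value - before.get(name, 0)
            if delta > 0:
                deltas[name] = delta
    return deltas


# Two samples from the quiesced node, taken a few seconds apart.
first = {"client_proxy_complete": 20, "client_proxy_error": 0,
         "batch_sub_proxy_complete": 20}
second = {"client_proxy_complete": 20, "client_proxy_error": 0,
          "batch_sub_proxy_complete": 20}

still_moving = incrementing_counters(first, second)
print("safe to proceed" if not still_moving else f"still proxying: {still_moving}")
```

For the non-quiesced nodes, the same function can be called with `prefixes=("from_proxy",)`.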

### Take down the quiesced node and proceed with the upgrade or maintenance

```plaintext
$ sudo systemctl stop aerospike
```

Verify that the node has stopped and the cluster now shows one fewer node:

```plaintext
Admin> info network
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Network Information (2021-05-27 01:21:44 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                Node|         Node ID|             IP|    Build|Migrations|~~~~~~~~~~~~~~~~~~Cluster~~~~~~~~~~~~~~~~~~|Client|  Uptime
                    |                |               |         |          |Size|         Key|Integrity|      Principal| Conns|
aero-cluster1_1:3000| BB9211EF53E1600|10.0.3.196:3000|E-5.6.0.3|   0.000  |   3|39815062484B|True     |BB9D9FC68290C00|     6|28:34:44
aero-cluster1_3:3000| BB906A7363E1600|10.0.3.149:3000|E-5.6.0.3|   0.000  |   3|39815062484B|True     |BB9D9FC68290C00|     4|28:34:44
aero-cluster1_4:3000|*BB9D9FC68290C00| 10.0.3.41:3000|E-5.6.0.3|   0.000  |   3|39815062484B|True     |BB9D9FC68290C00|     6|28:34:44
Number of rows: 3
```

The [`nodes_quiesced`](https://aerospike.com/docs/database/reference/metrics#namespace__nodes_quiesced) statistic is now back to 0:

```plaintext
Admin+> show statistics like quiesce
~~~~~~~~~~~~~~~~bar Namespace Statistics (2021-05-27 01:08:57 UTC)~~~~~~~~~~~~~~~~~~
Node                 |aero-cluster1_1:3000|aero-cluster1_3:3000|aero-cluster1_4:3000
effective_is_quiesced|false               |false               |false
nodes_quiesced       |0                   |0                   |0
pending_quiesce      |false               |false               |false
Number of rows: 4

~~~~~~~~~~~~~~~~test Namespace Statistics (2021-05-27 01:08:57 UTC)~~~~~~~~~~~~~~~~~~
Node                 |aero-cluster1_1:3000|aero-cluster1_3:3000|aero-cluster1_4:3000
effective_is_quiesced|false               |false               |false
nodes_quiesced       |0                   |0                   |0
pending_quiesce      |false               |false               |false
Number of rows: 4
```

**Proceed with the upgrade or other maintenance needed.**

### Bring the quiesced node back up

Bring the quiesced node back up, and make sure it joins the cluster:

```plaintext
$ sudo systemctl start aerospike
```

```plaintext
Admin> info network
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Network Information (2021-05-27 01:21:44 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                Node|         Node ID|             IP|    Build|Migrations|~~~~~~~~~~~~~~~~~~Cluster~~~~~~~~~~~~~~~~~~|Client|  Uptime
                    |                |               |         |          |Size|         Key|Integrity|      Principal| Conns|
aero-cluster1_1:3000| BB9211EF53E1600|10.0.3.196:3000|E-5.6.0.3|   1.015 K|   4|39815062484B|True     |BB9D9FC68290C00|     6|28:34:44
aero-cluster1_2:3000| BB912EA003E1600|10.0.3.224:3000|E-5.6.0.3|   2.798 K|   4|39815062484B|True     |BB9D9FC68290C00|     6|00:00:00
aero-cluster1_3:3000| BB906A7363E1600|10.0.3.149:3000|E-5.6.0.3|   828.000|   4|39815062484B|True     |BB9D9FC68290C00|     4|28:34:44
aero-cluster1_4:3000|*BB9D9FC68290C00| 10.0.3.41:3000|E-5.6.0.3|   953.000|   4|39815062484B|True     |BB9D9FC68290C00|     6|28:34:44
```

### Wait for migrations to complete | `asinfo -v 'cluster-stable:...'`

Make sure migrations have completed before moving on to the next node. This is done using the [cluster-stable](https://aerospike.com/docs/database/reference/info#cluster-stable) command. Run the command on each node in the cluster; it should return the same cluster key for every node. The [cluster-stable](https://aerospike.com/docs/database/reference/info#cluster-stable) command can be scripted and the results compared programmatically.

For most common cases, migrations at this point consist only of [lead migrations](https://aerospike.com/docs/database/reference/metrics#namespace__migrate_tx_partitions_lead_remaining), and the time required for completion is proportional to how long the node has been quiesced.

For situations where the stored data is deleted, or when there is no persisted storage, migrations must repopulate the data on the node, which usually takes longer. A [cold restart](https://aerospike.com/docs/database/manage/database/cold-start) can also affect how long migrations take, because the node takes longer to restart and could, in some cases, resurrect previously deleted records.

::: note
In Database 7.2.0 and later, `ignore-migrations` takes only true/false values in the following command. Prior to Database 7.2.0, `ignore-migrations=` can take yes/no or true/false.
:::

```plaintext
Admin+> asinfo -v 'cluster-stable:size=4;ignore-migrations=false'
aero-cluster1_4:3000 (10.0.3.41) returned:
5EDF7C44A664

aero-cluster1_2:3000 (10.0.3.224) returned:
5EDF7C44A664

aero-cluster1_3:3000 (10.0.3.149) returned:
5EDF7C44A664

aero-cluster1_1:3000 (10.0.3.196) returned:
5EDF7C44A664
```

::: note
Waiting for migrations to complete in this step is necessary to ensure that when quiescing the next node, the master ownership change happens as soon as the recluster is done. Quiescing the next node without waiting for migrations to complete is possible, but you must then wait for migrations to complete before shutting down the node, to avoid abruptly cutting off client and fabric connections.
:::
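
Comparing the `cluster-stable` results programmatically can be as simple as the sketch below. It assumes the per-node responses have already been collected into a dictionary keyed by node name, and that an unstable node returns a response beginning with `ERROR` instead of a cluster key. That error prefix is an assumption, and the function name is illustrative.

```python
def cluster_is_stable(responses: dict[str, str]) -> bool:
    """True if every node returned the same cluster key and none returned
    an error (assumed to start with 'ERROR')."""
    if not responses:
        return False
    values = [r.strip() for r in responses.values()]
    if any(v == "" or v.upper().startswith("ERROR") for v in values):
        return False
    # A stable cluster reports one identical key from every node.
    return len(set(values)) == 1


# Responses mirroring the output above.
responses = {
    "aero-cluster1_1:3000": "5EDF7C44A664",
    "aero-cluster1_2:3000": "5EDF7C44A664",
    "aero-cluster1_3:3000": "5EDF7C44A664",
    "aero-cluster1_4:3000": "5EDF7C44A664",
}
print(cluster_is_stable(responses))
```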

### Move to the next node

Repeat these steps on the next node.
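
The per-node loop can be summarized as a small driver. This is only a sketch of the ordering of the steps: `ops` is a hypothetical object whose methods you would implement yourself to wrap the asadm, asinfo, and systemctl commands shown on this page. None of these method names belong to an Aerospike tool or library.

```python
import time


def wait_until(check, poll_seconds=1.0, max_polls=600):
    """Poll check() until it returns True, or raise after max_polls attempts."""
    for _ in range(max_polls):
        if check():
            return
        time.sleep(poll_seconds)
    raise TimeoutError("condition not met after %d polls" % max_polls)


def rolling_maintenance(nodes, ops, poll_seconds=1.0):
    """Apply the quiesce procedure to each node in turn.

    `ops` is a hypothetical object; its methods wrap the commands
    shown in the steps above.
    """
    for node in nodes:
        # Cluster must be stable, with migrations complete, before quiescing.
        wait_until(ops.cluster_stable, poll_seconds)
        ops.quiesce(node)    # manage quiesce with <node>
        ops.recluster()      # manage recluster
        # Wait for client traffic and proxies to stop on the quiesced node.
        wait_until(lambda: ops.traffic_drained(node), poll_seconds)
        ops.stop(node)       # systemctl stop aerospike
        ops.maintain(node)   # upgrade or other maintenance
        ops.start(node)      # systemctl start aerospike
    # Wait for migrations to finish after the last node rejoins.
    wait_until(ops.cluster_stable, poll_seconds)
```

Keeping the waits as explicit polling steps makes the two gating conditions visible: stability before quiescing, and drained traffic before shutdown.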

## Using quiescence for extra durability

Quiescence may also be used to provide extra durability in various scenarios.

For example, in an AP cluster with replication factor 2 (RF2) in which a node must be taken down, if the node is quiesced, a rebalance is triggered, and migrations complete before the quiesced node is removed, two full copies of the data will be present when the node is removed. The cluster will then still have all data available if another node accidentally goes down before the first node returns.

## Quiescing multiple nodes

Quiescing multiple nodes at the same time can be useful in [rack-aware](https://aerospike.com/docs/database/manage/namespace/rack-aware) clusters. In such cases, quiescing a whole rack at the same time can speed up maintenance procedures.

In SC-enabled namespaces, quiescing [`replication-factor`](https://aerospike.com/docs/database/reference/config#namespace__replication-factor) or more nodes will force masterhood handover (after migrations complete) but will result in unavailability when the nodes are eventually shut down, unless all quiesced nodes are in the same rack.
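
The SC rule above can be expressed as a pre-flight check before quiescing several nodes. This is a simplified sketch of that single rule, not a full availability model; the function name and parameters are illustrative.

```python
def shutdown_risks_unavailability(quiesced_rack_ids: list[int],
                                  replication_factor: int,
                                  rack_aware: bool) -> bool:
    """Apply the rule for SC namespaces: shutting down replication-factor
    or more quiesced nodes causes unavailability unless all of them are
    in the same rack of a rack-aware namespace.

    quiesced_rack_ids holds one rack id per quiesced node.
    """
    if len(quiesced_rack_ids) < replication_factor:
        return False
    if rack_aware and len(set(quiesced_rack_ids)) == 1:
        return False
    return True


# Two nodes from the same rack, RF2, rack-aware namespace.
print(shutdown_risks_unavailability([1, 1], replication_factor=2, rack_aware=True))
# Two nodes from different racks, RF2.
print(shutdown_risks_unavailability([1, 2], replication_factor=2, rack_aware=True))
```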

## Quiescing nodes for namespaces with RF1

For namespaces configured with RF1, it is necessary to wait for migrations to complete prior to shutting down the server in order for the single copy of each partition owned by that node to have migrated to another node.