XDR record shipment lifecycle
Overview
This page describes the stages of the XDR record shipment lifecycle, the configuration parameters that control that lifecycle, and the metrics that monitor the stages.
You can start writing records as soon as you enable a namespace. The system logs progress and metrics to the in-memory XDR transaction queue even if the datacenter (DC) is not yet connected. Shipping starts as soon as the connection to the DC is established.
XDR architecture is described on the Cross-Datacenter Replication (XDR) architecture page.
XDR record shipment lifecycle, with metrics
The XDR shipment lifecycle begins when a record is successfully written in the source partition, then submitted to the per-partition, in-memory XDR transaction queue of each DC.
XDR transaction queue
The size of the transaction queue is controlled by transaction-queue-limit.
If the transaction queue fills faster than XDR can ship to the remote destination, Aerospike switches to recovery mode. After XDR catches up, it switches back to using the transaction queue.
There is a distinct XDR transaction queue for each DC, namespace and partition permutation. This means that each record modified in a namespace enabled for XDR shipping to a remote DC destination is placed according to its partition in the correct XDR transaction queues. Because a namespace may be configured to ship to multiple remote DCs, the record's metadata may be placed in multiple XDR transaction queues simultaneously.
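The queue layout described above can be pictured with a minimal Python sketch. It is an illustration only, not Aerospike's implementation: the `XdrQueues` class, its `submit` method, and the DC names, namespace name, partition IDs, and keys are all hypothetical.

```python
from collections import defaultdict, deque


class XdrQueues:
    """Toy model: one FIFO queue per (DC, namespace, partition) combination."""

    def __init__(self):
        # Maps (dc, namespace, partition_id) to a queue of record keys.
        self.queues = defaultdict(deque)

    def submit(self, dcs, namespace, partition_id, key):
        # A namespace may ship to several DCs, so the same key is
        # queued once per DC, always under the record's own partition.
        for dc in dcs:
            self.queues[(dc, namespace, partition_id)].append(key)


queues = XdrQueues()
queues.submit(["dc-east", "dc-west"], "test", 42, "user:1001")
queues.submit(["dc-east"], "test", 7, "user:2002")
```

After these two writes the model holds three queues: one for each DC under partition 42, and one for `dc-east` under partition 7.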
Transaction threads
Several transaction threads manage a record through the lifecycle:
- DC thread - sequentially processes all the partitions at the source node for a specific remote DC. Processes all pending entries in the XDR transaction queues and retry threads.
- service thread - receives a record from the DC thread, reads it locally, and prepares it for shipment.
- recovery thread - reads updates from the primary index when XDR is in recovery mode.
XDR recovery mode
When the XDR transaction queue fills up, XDR drops the queue and enters recovery mode. In this mode, instead of using the transaction queue, XDR scans the full primary index and compares each record's last ship time (LST) to its last update time (LUT) to determine which records to ship.
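The switch into recovery mode and the LST/LUT comparison can be sketched as follows. This is a simplified model under stated assumptions, not the server's code: the `PartitionQueue` class and `recover` function are hypothetical, the primary index is represented as a plain dict of key to LUT, and a single LST value for the scanned partition is assumed.

```python
from collections import deque


class PartitionQueue:
    """Toy transaction queue that falls back to recovery mode when full."""

    def __init__(self, limit):
        self.limit = limit
        self.entries = deque()
        self.recovery = False

    def submit(self, key):
        if self.recovery:
            return  # the recovery scan will find the record by its LUT
        if len(self.entries) >= self.limit:
            # Queue is full: drop it and switch to recovery mode.
            self.entries.clear()
            self.recovery = True
            return
        self.entries.append(key)


def recover(primary_index, lst):
    """Return keys whose last update time is newer than the last ship time."""
    return sorted(key for key, lut in primary_index.items() if lut > lst)


queue = PartitionQueue(limit=2)
for key in ("a", "b", "c"):
    queue.submit(key)

to_ship = recover({"a": 10, "b": 20, "c": 30}, lst=15)
```

In this run the third write overflows the two-entry queue, so the queue is dropped and the scan selects only the records updated after the assumed LST of 15.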
Diagram explanation
The following diagram shows the lifecycle of a record from a single partition in a single namespace in a single DC. These processes occur for all partitions and namespaces in a live environment.
- Dashed boxes indicate a process stage.
- Bullets indicate how the metric or stage names are logged.
Each stage is described immediately following the diagram.
Fig. XDR record shipment lifecycle, with metrics
Stage 1 - Write to source partition
After a successful write in the source partition, the record is submitted to the per-partition, in-memory transaction queue of each DC.
- If a duplicate key is found in the XDR transaction queue within the time specified in hot-key-ms, the duplicate is not inserted into the queue. This reduces the number of times the same version of a hot key ships.
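The hot-key check can be illustrated with a small sketch. It assumes a simple bookkeeping scheme (a dict recording when each key was last queued) that is not taken from the Aerospike source; the `HotKeyQueue` class is hypothetical, and dequeueing is left out for brevity.

```python
from collections import deque


class HotKeyQueue:
    """Toy queue that skips duplicates seen within hot_key_ms."""

    def __init__(self, hot_key_ms):
        self.hot_key_ms = hot_key_ms
        self.entries = deque()  # keys awaiting shipment, oldest first
        self.queued_at = {}     # key -> time in ms of its most recent insert

    def submit(self, key, now_ms):
        last = self.queued_at.get(key)
        if last is not None and now_ms - last < self.hot_key_ms:
            return False  # duplicate inside the hot-key window: not inserted
        self.entries.append(key)
        self.queued_at[key] = now_ms
        return True


q = HotKeyQueue(hot_key_ms=100)
results = [q.submit("k", 0), q.submit("k", 50), q.submit("k", 150)]
```

With a 100 ms window, the write at 50 ms is suppressed, while the write at 150 ms is queued again because the earlier entry has aged past the window.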
Stage 2 - Read from queue
The DC thread picks the record from the transaction queue and forwards it to the service thread.
- The run period of the DC thread is controlled by period-ms. If there is a delay in shipping a write transaction, period-ms is the likely source.
- If the record's last update time (LUT) is within the time specified in delay-ms from the current time, the DC thread skips it and does not yet hand it to a service thread.
- (Strong consistency mode) The sc-replication-wait-ms parameter provides a default delay in strong consistency mode (SC mode) to prevent XDR from attempting to ship records before their initial replication is complete.
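The delay-ms check in this stage amounts to comparing a record's age to the configured delay. The sketch below is one way to express it; the `split_ready` function and the `(key, lut_ms)` tuple layout are hypothetical, and treating an age exactly equal to delay-ms as ready is an assumption.

```python
def split_ready(queue, now_ms, delay_ms):
    """Split queued (key, lut_ms) pairs into ready keys and deferred pairs."""
    ready, deferred = [], []
    for key, lut_ms in queue:
        if now_ms - lut_ms < delay_ms:
            # Updated too recently: skip for now and revisit on a later pass.
            deferred.append((key, lut_ms))
        else:
            ready.append(key)
    return ready, deferred


pending = [("a", 900), ("b", 990), ("c", 1000)]
ready, deferred = split_ready(pending, now_ms=1000, delay_ms=50)
```

Here only `a` is old enough to be handed to a service thread; `b` and `c` remain queued until a later pass of the DC thread.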
Stage 3 - Send to service thread and remote DC
The service thread reads the record locally, prepares the record for shipment, and ships it to the remote DC. The max-throughput limit to the remote destination is applied on a per-DC, per-namespace basis.
- In SC mode, if the record is unreplicated, XDR triggers a re-replication by inserting a transaction into the internal transaction queue. A service thread picks up the transaction from the internal transaction queue and checks whether the transaction has timed out.
  - If the transaction timed out, the service thread does not re-replicate the record and XDR does not ship it to the destination. A future client read or write triggers re-replication, after which shipping to the destination may succeed.
  - If the transaction has not timed out, the service thread re-replicates the record. The re-replication makes an entry in the XDR in-memory transaction queue. This re-replication may also time out and leave the record unreplicated. However, the entry in the XDR in-memory transaction queue immediately triggers another round of re-replication.
Stage 4 - Send response
The remote DC attempts to write the record and returns the completion state of the transaction to the source DC. The completion state can be success, temporary failure (key busy or device overload), or permanent error (such as record too big).
Stage 5 - Retry if necessary
If there is a timeout or temporary error, the service thread sends the record to a retry queue.
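Stages 4 and 5 together describe a small decision: retry on a timeout or temporary failure, otherwise finish. A minimal sketch of that decision follows. The `Response` enum, the `handle_response` function, and the returned labels are hypothetical, and treating a permanent error as a completed, non-success outcome is an assumption of this sketch.

```python
from enum import Enum


class Response(Enum):
    SUCCESS = "success"
    TEMPORARY_FAILURE = "temporary failure"  # for example key busy, device overload
    PERMANENT_ERROR = "permanent error"      # for example record too big
    TIMEOUT = "timeout"


def handle_response(key, response, retry_queue):
    """Route a shipment according to the completion state from the remote DC."""
    if response in (Response.TEMPORARY_FAILURE, Response.TIMEOUT):
        retry_queue.append(key)  # transient problem: try again later
        return "retrying"
    if response is Response.SUCCESS:
        return "completed: success"
    # Assumed here: a permanent error ends the shipment without success.
    return "completed: not shipped"


retry_queue = []
states = [
    handle_response("k1", Response.SUCCESS, retry_queue),
    handle_response("k2", Response.TIMEOUT, retry_queue),
    handle_response("k3", Response.TEMPORARY_FAILURE, retry_queue),
    handle_response("k4", Response.PERMANENT_ERROR, retry_queue),
]
```

Only `k2` and `k3` end up in the retry queue; the other two shipments are finished, one successfully and one not.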
Transaction delays
XDR configuration parameters control various stages of the record lifecycle where delays may be necessary because XDR is not able to keep up with writes.
The following table lists the configuration parameters that control the record lifecycle.
Parameter | Description | Notes |
---|---|---|
transaction-queue-limit | Maximum number of elements allowed in XDR's in-memory transaction queue per partition, per namespace, per DC. | Default: 16384 (16*1024). Minimum: 1024. Maximum: 1048576. |
hot-key-ms | Period in milliseconds to wait between shipments of a hot key. Controls the frequency at which hot keys are processed. Useful when the XDR transaction queue is potentially large, because it avoids checking the whole queue for a previous entry that corresponds to the current incoming transaction. | Minimum: 0, Maximum: 5000. |
period-ms | Period in milliseconds at which the DC-thread processes partitions for XDR shipment. | Default: 100ms. |
delay-ms | Artificial delay in milliseconds applied to the shipment of all records, whether or not they are hot keys. Forces records to wait the configured period before processing. | Minimum: 0, Maximum: 5000. Must be less than or equal to the value of hot-key-ms. |
sc-replication-wait-ms | Number of milliseconds that XDR waits before dequeuing a record from the in-memory transaction queue of a namespace configured for SC. Prevents records from shipping before fully replicated. | Minimum: 5, Maximum: 1000 |
max-throughput | Number of records per second to ship using XDR. | Must be in increments of 100 (such as 100, 200, 1000) |
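
The limits in the table can be checked before a configuration is applied. The helper below is a hypothetical pre-check written for illustration, not part of any Aerospike tool: the `validate` function, the `LIMITS` dict, and the error messages are invented here, while the ranges come from the table above.

```python
# Minimum and maximum values taken from the table above.
LIMITS = {
    "transaction-queue-limit": (1024, 1048576),
    "hot-key-ms": (0, 5000),
    "delay-ms": (0, 5000),
    "sc-replication-wait-ms": (5, 1000),
}


def validate(settings):
    """Return a list of problems found in a dict of XDR settings."""
    errors = []
    for name, (low, high) in LIMITS.items():
        value = settings.get(name)
        if value is not None and not low <= value <= high:
            errors.append(f"{name} must be between {low} and {high}")
    delay, hot = settings.get("delay-ms"), settings.get("hot-key-ms")
    if delay is not None and hot is not None and delay > hot:
        errors.append("delay-ms must be less than or equal to hot-key-ms")
    throughput = settings.get("max-throughput")
    if throughput is not None and throughput % 100 != 0:
        errors.append("max-throughput must be a multiple of 100")
    return errors


problems = validate({"delay-ms": 200, "hot-key-ms": 100, "max-throughput": 150})
```

Settings that are absent from the dict are not checked, so the helper can be run against a partial configuration.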
Metrics of XDR processes
Lifecycle stage | Metric name | Description |
---|---|---|
Waiting for processing and processing started | in_queue, in_progress | in_queue: the record is in the XDR in-memory queue and is awaiting shipment. in_progress: XDR is actively shipping the record. |
Retrying | retry_conn_reset, retry_dest, retry_no_node | A transient stage. The shipment is being retried due to some error. Retries continue until one of the completed states is achieved. |
Recovering | recoveries, recoveries_pending | A transient stage that indicates internal-to-XDR "housekeeping". If XDR cannot find record keys in its in-memory transaction queue, it reads the primary index to recover those keys for processing. |
Completed | success, abandoned, not_found, filtered_out | Shipment is complete. The state of shipment is either success or one that indicates a non-success result. |