XDR record shipment lifecycle
This page describes the stages of Aerospike's Cross Datacenter Replication (XDR) record shipment lifecycle, the configuration parameters that control that lifecycle, and the metrics that monitor the stages.
Overview
XDR delivers data asynchronously from an Aerospike source cluster to a variety of destinations.
- You can start writing records as soon as you enable a namespace for XDR shipping.
- The system logs and collects statistics on the progress through the in-memory XDR transaction queue even if the datacenter is not yet connected.
- The system starts shipping as soon as the connection to the datacenter is established.
For more details see the XDR architecture page.
Version changes
- XDR does not impose any version requirement across datacenters.
- Starting with Database 7.2, ship-versions-policy controls how XDR ships versions of modified records when facing lag between the source cluster and a destination. ship-versions-interval specifies a time window in seconds within which XDR is allowed to skip versions.
- Starting with Database 5.7, the XDR subsystem always accepts dynamic configuration commands because it is always initialized, even if XDR is not configured statically.
- Prior to Database 5.7, the XDR subsystem must be configured statically with at least one datacenter. For information, see Static XDR configuration.
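For context, a static configuration defines at least one datacenter in the xdr stanza of aerospike.conf. The following sketch is illustrative only; the datacenter name DC1, the seed address 10.0.0.1:3000, and the namespace test are placeholders, not values referenced elsewhere on this page.

```
# Illustrative sketch of a minimal static XDR configuration (placeholder names).
xdr {
    dc DC1 {                              # destination datacenter
        node-address-port 10.0.0.1 3000   # seed node of the destination cluster
        namespace test {                  # namespace enabled for shipping to DC1
            # per-namespace shipping parameters go here
        }
    }
}
```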
XDR record shipment lifecycle
The XDR shipment lifecycle begins when a record is successfully written to the source partition and then submitted to the per-partition, in-memory XDR transaction queue of each datacenter.
XDR transaction queue
The size of the transaction queue is controlled by transaction-queue-limit.
If the transaction queue fills faster than XDR can ship to the remote destination, Aerospike switches to recovery mode. After XDR catches up, it switches back to using the transaction queue.
There is a distinct XDR transaction queue for each datacenter, namespace, and partition combination. This means that each record modified in a namespace enabled for XDR shipping is placed, according to its partition, in the correct XDR transaction queues. Because a namespace may be configured to ship to multiple remote destinations, the record's metadata may be placed in multiple XDR transaction queues simultaneously.
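As an illustration, the limit is set in the datacenter's namespace subsection. The sketch below uses the placeholder names DC1 and test and an example value; only the relevant parameter is shown.

```
# Sketch: sizing the per-partition transaction queue for one DC/namespace pair.
xdr {
    dc DC1 {
        namespace test {
            # Maximum elements per partition in this queue (default 16384).
            # If the queue fills before XDR can ship, XDR drops the queue and
            # falls back to recovery mode until it catches up.
            transaction-queue-limit 32768
        }
    }
}
```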
Transaction threads
The various transaction threads manage a record throughout the lifecycle. The threads include:
- dc thread - sequentially processes all the partitions at the source node for a specific remote datacenter. This thread processes all pending entries in the XDR transaction queues and retry queues.
- service thread - receives a record from the dc thread, reads it locally, and prepares it for shipment.
- recovery thread - reads updates from the primary index when XDR is in recovery mode.
XDR recovery mode
When the XDR transaction queue is full, the queue is dropped and XDR goes into recovery mode. In this mode, XDR scans the full primary index to compare a record's last ship time (LST) to its last update time (LUT) instead of using the transaction queue.
Diagram explanation
The following diagram shows the lifecycle of a record from a single partition in a single namespace in a single datacenter. These processes occur for all partitions and namespaces in a live environment.
- Dashed boxes indicate a process stage.
- Bullets indicate how the metric or stage names are logged.
Each stage is described immediately following the diagram.
Fig. 1 - XDR record shipment lifecycle, with metrics
Stage 1 - Write to source partition
After a successful write in the source partition, the record is submitted to the per-partition, in-memory transaction queue of each datacenter.
- If a duplicate key is found in the XDR transaction queue within the time specified in hot-key-ms, the duplicate is not inserted into the queue. This reduces the number of times the same version of a hot key ships.
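As an illustration only, hot-key-ms is set in the datacenter's namespace subsection; DC1 and test below are placeholders and 100 is an example value.

```
# Sketch: suppress duplicate queue entries for the same key within a 100 ms window.
xdr {
    dc DC1 {
        namespace test {
            hot-key-ms 100   # a duplicate key seen within 100 ms is not re-inserted
        }
    }
}
```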
Stage 2 - Read from queue
The dc thread picks the record from the transaction queue and forwards it to the service thread.
- The run period of the dc thread is controlled by period-ms. If there is a delay in shipping a write transaction, period-ms is the likely source.
- If the record's last update time (LUT) is within the time specified in delay-ms from the current time, the dc thread skips it and does not yet hand it to a service thread.
- Strong consistency mode - The sc-replication-wait-ms parameter provides a default delay in strong consistency mode (SC mode) to prevent XDR from attempting to ship records before their initial replication is complete.
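The following sketch shows where these parameters might be set, assuming period-ms belongs at the datacenter level and delay-ms and sc-replication-wait-ms in the datacenter's namespace subsection; DC1 and test are placeholders and the values are examples only.

```
# Sketch: Stage 2 pacing parameters (placeholder names, example values).
xdr {
    dc DC1 {
        period-ms 100                    # how often the dc thread sweeps partitions
        namespace test {
            delay-ms 0                   # artificial delay before records are shipped
            sc-replication-wait-ms 100   # SC namespaces: wait for initial replication
        }
    }
}
```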
Stage 3 - Send to service thread and remote destination
The service thread reads the record locally, prepares the record for shipment, and ships it to the remote destination. The max-throughput to the remote destination is applied on a per-datacenter, per-namespace basis.
- In SC mode, if the record is unreplicated, XDR triggers a re-replication by inserting a transaction into the internal transaction queue. A service thread picks up the transaction from the internal transaction queue and checks if the transaction timed out.
- If it timed out, it does not re-replicate and XDR does not ship the record to the destination. A future client read or write will trigger re-replication and may succeed in shipping to the destination.
- If not timed out, the service thread re-replicates the record. The re-replication adds an entry to the XDR in-memory transaction queue. This re-replication may also time out and leave the record unreplicated. However, the entry in the XDR in-memory transaction queue immediately triggers another round of re-replication.
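For example, the per-datacenter, per-namespace max-throughput cap described at the start of this stage could be configured as in the following sketch; DC1 and test are placeholders, and the value must be a multiple of 100.

```
# Sketch: cap shipping for namespace test to DC1 at 5000 records per second.
xdr {
    dc DC1 {
        namespace test {
            max-throughput 5000   # records per second, in increments of 100
        }
    }
}
```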
Stage 4 - Send response
The remote destination attempts to write the record and returns the completion state of the transaction to the source datacenter. The completion state can be success, temporary failure (key busy or device overload), or permanent error (such as record too big).
Stage 5 - Retry if necessary
If there is a timeout or temporary error, the service thread sends the record to a retry queue.
Transaction delays
XDR configuration parameters control various stages of the record lifecycle where delays may be necessary because XDR is not able to keep up with writes.
The following table lists the configuration parameters that control the record lifecycle.
| Parameter | Description | Notes |
|---|---|---|
| transaction-queue-limit | Maximum number of elements allowed in XDR's in-memory transaction queue per partition, per namespace, per datacenter. | Default: 16*1024 = 16384. Minimum: 1024. Maximum: 1048576. |
| hot-key-ms | Period in milliseconds to wait between shipping hotkeys. Controls the frequency at which hotkeys are processed. Useful in situations where the XDR transaction queue is potentially large, because it avoids having to check across the whole queue for a potential previous entry corresponding to the current incoming transaction. | Minimum: 0. Maximum: 5000. |
| period-ms | Period in milliseconds at which the dc thread processes partitions for XDR shipment. | Default: 100 ms. |
| delay-ms | Period in milliseconds used as an artificial delay on shipment of all records, including hotkeys and records that are not hot. Forces records to wait the configured period before processing. | Minimum: 0. Maximum: 5000. Must be less than or equal to the value of hot-key-ms. |
| sc-replication-wait-ms | Number of milliseconds that XDR waits before dequeuing a record from the in-memory transaction queue of a namespace configured for SC. Prevents records from shipping before they are fully replicated. | Minimum: 5. Maximum: 1000. |
| max-throughput | Number of records per second to ship using XDR. | Must be in increments of 100 (such as 100, 200, 1000). |
XDR version shipping control
Database 7.2 introduces two configuration parameters to dynamically control how XDR ships versions of records.
- ship-versions-policy controls how XDR ships versions of modified records between the source cluster and a destination.
- ship-versions-interval specifies a time window in seconds within which XDR is allowed to skip versions.
XDR keeps track of the unique identifier (digest) of records that are modified (inserted, updated, deleted), expire, or are evicted. When it's time to ship a record, XDR reads it from storage and ships it to the destination. If there are multiple updates to the same record in a short period of time, XDR achieves maximum throughput by shipping only the latest version of the record. While this keeps two remote Aerospike Database clusters in sync, it also prevents XDR from shipping all versions of a record, which may be necessary when Aerospike is shipping data to an Aerospike or a non-Aerospike destination through a connector, such as Kafka, Pulsar, or JMS.
ship-versions-policy: latest
This is the default policy. XDR is allowed to ship only the latest write of a record to the destination, skipping intermediate versions when it falls behind. This is the behavior in server versions prior to 7.2.
ship-versions-policy: all
This policy ships all writes to the destination. Subsequent writes of the record are delayed or blocked until the previous write is shipped to the destination. For blocked writes, the server returns the AS_ERR_XDR_KEY_BUSY (32) error to the client.
As with the interval policy in low-throughput use cases, this policy can ensure that the destination receives every update to every record. However, the operator does need to consider the backpressure implications when lag between the source and destination increases.
If ship-versions-policy is all, delay-ms must be 0. delay-ms cannot exceed the time window specified by ship-versions-interval.
This configuration guarantees that no two writes to the same record proceed at the same time, so there are no write hotkeys in the system.
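A sketch of the all policy with the delay-ms constraint noted above; DC1 and test are placeholder names.

```
# Sketch: ship every version of every record to DC1 (placeholder names).
xdr {
    dc DC1 {
        namespace test {
            ship-versions-policy all
            delay-ms 0   # required: delay-ms must be 0 when the policy is all
        }
    }
}
```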
ship-versions-policy: interval
This policy guarantees that at least one version of the record is shipped per interval. If there are multiple writes within an interval, the last write in the interval is always shipped.
The value of the interval is set with ship-versions-interval in the XDR DC namespace subsection. ship-versions-interval takes a value between 1 and 3600 seconds. The default value is 60 seconds.
The interval policy is a relaxation of the all policy: it allows a write on a record to proceed even if the previous write has not been shipped to the destination, as long as both writes belong to the same interval. The interval policy is a good balance between the latest and all policies.
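As an illustrative sketch with placeholder names, the interval policy with a 30-second window could be configured as follows; recall that delay-ms cannot exceed ship-versions-interval.

```
# Sketch: ship at least one version of each modified record every 30 seconds.
xdr {
    dc DC1 {
        namespace test {
            ship-versions-policy interval
            ship-versions-interval 30   # seconds; range 1-3600, default 60
        }
    }
}
```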
Limitations
Writes may be blocked or delayed
Writes may be blocked or delayed until the previous version of a record is shipped. This is more noticeable with hot keys when ship-versions-policy is set to all, as XDR needs to ship the prior version of the record before allowing a subsequent write to the same key.
As a workaround, you can change ship-versions-policy to interval with an appropriate interval granularity.
Rewind may have blocking impact
If ship-versions-policy is set during a rewind of the namespace, it can block all writes until the first round of recovery is complete. See Rewind a shipment for details.
Impact of ship-versions-policy on multiple DCs
When configuring multiple datacenters for XDR, writes may be blocked if ship-versions-policy is set to either all or interval in any of the datacenters. This occurs because the Aerospike server processes writes globally, rather than independently for each datacenter.
If the previous version of a record has not been shipped to one datacenter due to the configured ship-versions-policy, subsequent writes to that record are blocked until the pending version is successfully shipped. As a result, even if shipping to only one datacenter is delayed, it can impact the shipping process to all other datacenters.
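The following sketch illustrates the multi-datacenter case with placeholder names and addresses: because writes are processed globally, an unshipped version pending for DC1 under the all policy also delays shipping to DC2.

```
# Sketch: one source namespace shipping to two destinations (placeholder values).
xdr {
    dc DC1 {
        node-address-port 10.0.0.1 3000
        namespace test {
            ship-versions-policy all      # an unshipped version here blocks new writes...
        }
    }
    dc DC2 {
        node-address-port 10.0.0.2 3000
        namespace test {
            ship-versions-policy latest   # ...which also delays shipping to DC2
        }
    }
}
```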
Rescue
In certain scenarios, XDR may fail to ship deletes even when this feature is enabled. For example, a record may expire or be evicted and then be rewritten before the NSUP thread can convert the deleted record into an XDR tombstone. When that happens, the primary index element of the expired or evicted record is reused by the new write. Because the tombstone is not created, the non-durable delete caused by expiry or eviction is not shipped.
Metrics of XDR processes
| Lifecycle stage | Metric name | Description |
|---|---|---|
| Waiting for processing and processing started | in_queue, in_progress | in_queue: the record is in the XDR in-memory queue and is awaiting shipment. in_progress: XDR is actively shipping the record. |
| Retrying | retry_conn_reset, retry_dest, retry_no_node | A transient stage. The shipment is being retried due to some error. Retries continue until one of the completed states is achieved. |
| Recovering | recoveries, recoveries_pending | A transient stage that indicates internal-to-XDR "housekeeping". If XDR cannot find record keys in its in-memory transaction queue, it reads the primary index to recover those keys for processing. |
| Completed | success, abandoned, not_found, filtered_out | Shipment is complete. The state of shipment is either success or one that indicates a non-success result. |