XDR operations
Cross Datacenter Replication (XDR) enables data synchronization across Aerospike clusters. While powerful, it introduces additional resource considerations that vary based on traffic, topology, and configuration.
How does XDR impact the network?
XDR introduces network traffic between clusters, resulting in the following:
- The amount of traffic can be much higher than the incoming network load from clients recovering after a back off.
- Every write, replicated only in strong consistency mode, is sent across the network to the destination DCs.
- Bandwidth usage increases with write volume and the number of destinations.
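As a rough illustration of how write volume and destination count combine, the following minimal sketch estimates outbound XDR bandwidth. The function name, its parameters, and the sample numbers are hypothetical. It assumes every write ships the full record to every destination DC, and it ignores protocol overhead and retries, so treat the result as a planning estimate only.

```python
def xdr_bandwidth_bytes_per_sec(writes_per_sec, avg_record_bytes,
                                num_destinations, compression_ratio=1.0):
    """Rough outbound XDR bandwidth estimate in bytes per second.

    compression_ratio is compressed size / uncompressed size
    (1.0 means no compression). All inputs are assumptions supplied
    by the caller, not values read from a cluster.
    """
    if not 0 < compression_ratio <= 1:
        raise ValueError("compression_ratio must be in (0, 1]")
    # Each destination DC receives its own copy of every shipped write.
    return writes_per_sec * avg_record_bytes * num_destinations * compression_ratio


# Example: 10,000 writes/s of 1,500-byte records shipped to 2 DCs, uncompressed.
estimate = xdr_bandwidth_bytes_per_sec(10_000, 1_500, 2)
print(f"{estimate / 1e6:.1f} MB/s")  # prints "30.0 MB/s"
```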
Does XDR increase storage load?
XDR typically does not increase storage I/O load under normal conditions. It leverages the post-write-cache to read records from memory without hitting storage. However, if XDR falls behind or is recovering, it may do the following:
- Read records from storage as it scans partitions as part of the recovery process
- Cause unexpected I/O load during backlog catch-up
How does XDR consume memory?
XDR temporarily stores each record's digest and last update time (LUT) in in-memory queues, using 25 bytes per entry. Memory is consumed as follows:
- Per partition (cluster size and replication factor determine the number of partitions that each node owns)
- Per namespace
- Per destination DC
These queues are capped by transaction-queue-limit.
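To make the multiplication concrete, here is a minimal sketch of a worst-case estimate for per-node queue memory. The function name and sample values are hypothetical. It assumes 4096 partitions per namespace, that a node owns roughly partitions × replication factor / cluster size of them, that every namespace ships to every DC, and that every queue is filled to transaction-queue-limit. Real usage is normally well below this bound.

```python
import math

BYTES_PER_ENTRY = 25             # digest + LUT, as stated in the text
PARTITIONS_PER_NAMESPACE = 4096  # assumption: Aerospike's fixed partition count


def xdr_queue_memory_bytes(cluster_size, replication_factor, num_namespaces,
                           num_dcs, transaction_queue_limit):
    """Upper-bound memory (bytes) for XDR in-memory queues on one node."""
    # Approximate number of partitions (master + replica) this node owns,
    # capped at the total partition count for small clusters.
    owned = math.ceil(PARTITIONS_PER_NAMESPACE * replication_factor / cluster_size)
    owned = min(owned, PARTITIONS_PER_NAMESPACE)
    # One queue per owned partition, per namespace, per destination DC.
    queues = owned * num_namespaces * num_dcs
    return queues * transaction_queue_limit * BYTES_PER_ENTRY


# Example: 8 nodes, replication factor 2, 1 namespace, 2 DCs,
# and an illustrative transaction-queue-limit of 16384.
worst_case = xdr_queue_memory_bytes(8, 2, 1, 2, 16_384)
print(f"{worst_case / 2**20:.0f} MiB")  # prints "800 MiB"
```

Because the bound scales linearly with each factor, doubling transaction-queue-limit or adding a second DC doubles the worst case.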
Does XDR affect CPU usage?
- XDR typically compresses data to optimize network usage, increasing CPU load.
- When you have multiple DCs for a single namespace, each destination DC receives its own independently compressed stream. Adding another DC to the same namespace increases CPU usage.
- CPU pressure may be noticeable when:
  - High write throughput is combined with multiple DCs
  - A new DC is added
Best practices summary
- Monitor network throughput and ensure sufficient capacity for inter-DC traffic.
- Watch for CPU utilization spikes when adding DCs or increasing replication volume.
- Use post-write-cache to avoid storage read amplification, but be aware of recovery scenarios.
- Tune transaction-queue-limit if queue pressure is observed, but be aware of memory limits.