Aerospike Cross Datacenter Replication (XDR) for Real-time Mission Critical IoT Data Management

Shahed Mazumder
Global Director, Telco Solutions
September 22, 2022

Aerospike’s Cross-Datacenter Replication (XDR) is a highly flexible and magical feature that enables real-time active-active data replication between geographically separated clusters.

In the context of IoT data management, it can decouple sensor data collection points (e.g. at the tactical edge, close to the end devices) from the data analysis points, which can be either the edge (the same site as the collection point) or the core, an aggregation point for data coming from different edge sites.

The key value proposition of the XDR feature includes:

  • Ease of setup and use

  • Bi-directional filtering capability

  • Ultra-low latency operation

Together, these make mission-critical IoT data management simple and efficient.

XDR – What it is and why Aerospike customers love it

The Aerospike XDR feature replicates data asynchronously between two or more clusters that are located at different, geographically distributed sites. Here, a site can refer to a physical rack in a data center, an entire data center, an availability zone in a cloud region, or a cloud region itself.

XDR’s deployment topologies include one-way, two-way, and three-way shipping cluster configurations.


Now, let’s look at what XDR brings to the table.

XDR makes data management simple

Cross-datacenter replication supports dynamic data center addition and configuration: new data centers can be added and configured while the remote clusters are running.

With record shipping based on Last Update Time (LUT), XDR allows data center resynchronizing starting at a user-specified point in time.
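As a rough illustration of LUT-based resynchronization, the sketch below models records as plain Python objects that carry a last-update time and selects the ones that would be re-shipped from a chosen point in time. The `Record` class and the `records_to_ship` helper are hypothetical names made up for this example; they are not part of any Aerospike API, and the oldest-first ordering is a choice of the sketch.

```python
from dataclasses import dataclass


@dataclass
class Record:
    key: str
    lut: float   # last-update time in seconds (hypothetical field for this sketch)
    bins: dict


def records_to_ship(records, rewind_to):
    """Return the records updated at or after the rewind point, oldest first."""
    selected = [r for r in records if r.lut >= rewind_to]
    return sorted(selected, key=lambda r: r.lut)


records = [
    Record("sensor-1", lut=100.0, bins={"temp": 21.5}),
    Record("sensor-2", lut=250.0, bins={"temp": 19.0}),
    Record("sensor-3", lut=400.0, bins={"temp": 23.1}),
]

# Resynchronize everything that changed since t=200.
shipped = records_to_ship(records, rewind_to=200.0)
print([r.key for r in shipped])
```

Choosing an earlier rewind point simply widens the selection, which is why a timestamp per record is enough to restart shipping from any moment.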

Last but not least, by utilizing a connector linked to Aerospike’s change notification mechanism, XDR facilitates data transfer to non-Aerospike data repositories (e.g. ERP, CRM, and inventory data).

XDR is very efficient

Dynamic, fine-grain data control with selective shipment of record components makes XDR very efficient.

Cross-datacenter replication can be configured to ship only the changes to sub-parts (bins) of a record.

Additionally, expressions can be used to filter updates and inserts, and are applied to the individual records being shipped. Importantly, this can be set up bidirectionally, which makes the feature more powerful.

As a direct benefit, this saves on precious network connection costs between sites. This also helps enterprises to comply with global and local data privacy regulatory requirements by allowing control over what data attributes go out vs. what stays in.
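To make the idea of selective shipping concrete, here is a toy model in plain Python. It assumes a record is a dictionary of bins, ships only the bins that changed, applies a per-record filter, and holds back bins that must stay local. The function names, the `blocked_bins` parameter, and the sample data are all hypothetical; real XDR filtering is configured on the server rather than written like this.

```python
def changed_bins(old, new):
    """Return only the bins whose value differs from the previous version."""
    return {name: value for name, value in new.items() if old.get(name) != value}


def build_shipment(old, new, record_filter, blocked_bins=frozenset()):
    """Return the bins to ship, or None if the record is filtered out."""
    if not record_filter(new):
        return None
    delta = changed_bins(old, new)
    # Hold back bins that must not leave the site (e.g. for privacy reasons).
    return {k: v for k, v in delta.items() if k not in blocked_bins}


def only_alerts(record):
    """Example filter: ship a record only when it reports an alert."""
    return record["status"] == "alert"


old = {"temp": 21.5, "status": "ok", "operator_id": "A17", "site": "deck-3"}
new = {"temp": 48.2, "status": "alert", "operator_id": "B42", "site": "deck-3"}

shipment = build_shipment(old, new, only_alerts, blocked_bins={"operator_id"})
print(shipment)
```

In this example the unchanged `site` bin and the blocked `operator_id` bin never cross the link, which is the source of both the bandwidth saving and the privacy control described above.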

XDR enables low-latency execution

With its low latency execution, XDR acts as a key pillar of Aerospike’s “Right Now” ecosystem of offerings.

Cross-datacenter replication is an active-active feature that replicates data asynchronously. This means that the master node commits the “write” transaction right away and then updates the replica node/s via XDR as fast as is realistically possible. It’s done this way so Aerospike’s ingestion/write transactions can be performed in the same ultra-low latency manner as its query/read transactions.

There is no holdup as the replication happens in the background.
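The commit-first, ship-later pattern can be sketched in a few lines of Python. This is a simplified single-threaded model under stated assumptions: `LocalCluster` and `ship_pending` are hypothetical names, the remote cluster is just a dictionary, and the "background" shipper is invoked explicitly so the example stays deterministic.

```python
from collections import deque


class LocalCluster:
    """Toy model: writes commit locally and are queued for later shipping."""

    def __init__(self):
        self.data = {}
        self.pending = deque()

    def write(self, key, value):
        self.data[key] = value             # commit locally right away
        self.pending.append((key, value))  # remember it for the shipper
        return "committed"                 # the caller never waits for the remote

    def ship_pending(self, remote):
        """Stand-in for the background shipper: drain the queue to the remote."""
        shipped = 0
        while self.pending:
            key, value = self.pending.popleft()
            remote[key] = value
            shipped += 1
        return shipped


edge = LocalCluster()
core = {}

status = edge.write("sensor-1", {"temp": 21.5})
visible_remotely_before = "sensor-1" in core   # not yet shipped
shipped = edge.ship_pending(core)
print(status, visible_remotely_before, shipped, core)
```

The write returns before the remote copy exists, so write latency is independent of the link between the sites; the price is a short window in which the two sides differ.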

The update (of replica node) time is dictated by the physical distance between the sites. In most deployments, it takes less than 25 milliseconds (under the human “real-time perception” threshold).

For a US east coast-west coast deployment scenario, that can be ~100 milliseconds. We have commercial deployments supporting XDR between two global sites across APAC and North America in a couple of hundred milliseconds.

XDR can also be deployed over satellite communication links (more about that in the next section) where the round-trip latency is ~600 milliseconds.

The point here is that whatever latency you encounter when updating the replica node is dictated purely by the laws of physics: nobody can overcome the time required to communicate between two geographically dispersed sites.

Cross-datacenter replication does not add any noticeable latency to the process. Rather, it facilitates ultra-low latency “write” transactions by not withholding the commit till the replica site gets that update.

Proxy and rewind features

There are a few other popular features that are game-changing for our customers including Aerospike XDR proxy for configuration in containerized/cloud environments and a variety of rewind options for reliable disaster management.

For further enhancements implemented in XDR, check out our blog post on Aerospike Database 6.1.

XDR is mission-critical for most IoT use cases

XDR capabilities are essential for mission-critical IoT use cases, many of which can be found in the autonomous vehicle/drone/ship and Federal/DoD segments.

Depending on the use case, XDR can be deployed between primary and replica sites, between 2 primary edge sites, and between primary edge and core sites.

In the edge-to-core scenario, even if you run your data analysis (including AI/ML modules) at the core, in practical terms, that really is run on real-time data collected at your tactical edge sites and brought to the core without any processing delay.

Thus, XDR enables insight generation at the core from real-time data collected at the edge and then pushes back that insight to the edge to trigger an immediate response there.

In a nutshell, cross-datacenter replication dynamically routes data captured at the edge to wherever it is needed. Its filtering capabilities allow mission-critical operation at reduced bandwidth, which not only saves network cost but also makes less robust communication paths (e.g. satellite links) between sites usable, opening up more versatile use cases.

XDR for IoT in action: a ship in the middle of the ocean

Talking about versatile IoT use cases – let’s consider a cruise ship in the middle of the ocean without any terrestrial network connectivity.

On board the ship, you would need an edge data center and an IoT network that would collect data from the hundreds of thousands of IoT sensors.

Where applicable, you would also process that data on top of available data platform/s there and trigger subsequent action, all on board.

But, for certain other use cases, you would bring that data back to onshore and cross-analyze it against a much larger core dataset that aggregates data coming from all relevant edge sites (e.g. other ships).

In this scenario, XDR would come into play in a few ways.

XDR “just works”

Ships rely on satellite links to transmit data back and forth with onshore command centers that sit at the core. Geostationary orbit (GEO) satellite links incur ~600 msec of round-trip latency, because both the request and the response have to travel from the earth’s surface up to the satellite and back down.

In terms of latency tolerance, Aerospike’s XDR feature has already been deployed in production to support a global enterprise’s sites between North America and APAC with about half that latency.

The feature can tolerate twice as much latency (or more), which means XDR over GEO satellite links would not be an issue. Nor would it be an issue if the ship used Low Earth Orbit (LEO) satellite links instead, with <100 msec round-trip latency.

XDR transmissions are pipelined with 10-sec timeouts, so the time window required for even GEO satellite communication is tolerable here.
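A back-of-the-envelope calculation shows why these figures are plausible. The sketch below computes only the speed-of-light lower bound for a satellite directly overhead; it ignores slant angle, ground-station hops, processing, and queuing, which is why real-world GEO figures such as ~600 msec are higher. The GEO altitude is the standard 35,786 km, while the 550 km LEO altitude is an assumed example value, not a figure from any particular deployment.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458
GEO_ALTITUDE_KM = 35_786   # geostationary orbit altitude above the equator
LEO_ALTITUDE_KM = 550      # assumed example altitude for a LEO constellation
XDR_TIMEOUT_S = 10.0       # timeout figure quoted in the text


def min_round_trip_ms(altitude_km):
    """Lower bound in ms: request goes up and down, response goes up and down (4 legs)."""
    return 4 * altitude_km / SPEED_OF_LIGHT_KM_S * 1000


geo_ms = min_round_trip_ms(GEO_ALTITUDE_KM)
leo_ms = min_round_trip_ms(LEO_ALTITUDE_KM)

print(f"GEO round trip is at least {geo_ms:.0f} ms")
print(f"LEO round trip is at least {leo_ms:.1f} ms")
print("GEO fits within the timeout:", geo_ms / 1000 < XDR_TIMEOUT_S)
```

Even the GEO bound is more than an order of magnitude below a 10-second timeout, so the link latency alone should not cause shipping to time out.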

XDR reduces network costs

When taking advantage of XDR’s bi-directional filtering capability to forward only relevant data attributes between data clusters, the bit rate (bps) can be kept at a minimum level.

Thus, camera/sensor/telematics/other data gathered on board the ship (edge site) can be transmitted to the command center (core site), which can then push the generated intelligence back to the ship with the lowest possible latency and at an optimized network cost.

XDR solves data integrity issues

For any mission-critical IoT application, maintaining data integrity is crucial. If there is a malfunction such as lost connectivity between the IoT endpoints and the edge data site/s (on board), the Aerospike instance deployed at the edge site needs to keep track of that and reintegrate the data once the connection is re-established.

Cross-datacenter replication picks it up from there and offers a mechanism to ensure that the data integrity is maintained during the transmission process as well.

XDR is critical to any IoT use case with moving assets like ships, vehicles, drones, airplanes and submarines

The use case above can be expanded to cover naval ships and other DoD assets, as well as civilian applications like connected vehicles.

Right now, we are developing a PoC to demonstrate Aerospike’s data integrity aspect in a connected vehicle use case.

In the PoC, we intend to show that even if the car loses connectivity while running through a tunnel, data generated inside the car is stored in the Aerospike client and not thrown away.

Once the car comes out of the tunnel and the network connection is reestablished, then the data is transmitted back to the network edge (nearby radio base stations).
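The tunnel scenario is a classic store-and-forward pattern, which can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of how the Aerospike client behaves internally: the `StoreAndForwardBuffer` class, its `send` callback, and the sample readings are all hypothetical.

```python
from collections import deque


class StoreAndForwardBuffer:
    """Toy model: deliver readings when connected, keep them while offline."""

    def __init__(self, send):
        self.send = send          # callable that delivers one reading upstream
        self.connected = True
        self.backlog = deque()

    def record(self, reading):
        if self.connected:
            self.send(reading)
        else:
            self.backlog.append(reading)   # keep it, do not throw it away

    def set_connected(self, connected):
        self.connected = connected
        if connected:
            # Flush everything collected while offline, in arrival order.
            while self.backlog:
                self.send(self.backlog.popleft())


received = []   # stands in for the network edge
car = StoreAndForwardBuffer(send=received.append)

car.record({"speed": 80})
car.set_connected(False)    # entering the tunnel
car.record({"speed": 78})
car.record({"speed": 75})
buffered_in_tunnel = len(car.backlog)
car.set_connected(True)     # leaving the tunnel
print(buffered_in_tunnel, received)
```

Nothing is lost while the link is down, and the edge receives the readings in the order they were generated once the connection returns.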

From there, through XDR, the data can go back and forth to a core site which can analyze the data and push the intelligence back to the edge.

Any mission-critical IoT use case would basically require all of its moving assets (vehicles, drones, fighter jets, ships, submarines) to constantly be a part of the “right now” ecosystem – from data collection to aggregation to insight generation to triggering action – making everything happen in near real-time.

To realize this vision, the data transactions (both ingestions and queries) need to happen in the sub-millisecond to low-millisecond range, so that the only latency you need to budget for is the unavoidable physical network latency.

Cross-datacenter replication will continue to be a key enabler for this vision and will help drive Aerospike’s adoption in time-sensitive IoT use cases.
