
Aerospike Database 5.3: XDR Filtering with Expressions & Expanded Multi-Site Clustering

Aerospike is pleased to announce Aerospike 5.3, which will be available to customers in Q4 of 2020.

Paul Jensen
Vice President, Engineering Operations
November 18, 2020|5 min read

Expression-based XDR Filtering

Aerospike 5.3 allows records shipped to XDR destinations to be filtered with the Expressions feature of Aerospike (introduced in 5.2). XDR filters are dynamic and are defined for each destination data center (DC) on a per-namespace basis. They can be set using the new info command ‘xdr-set-filter’. They can also be set programmatically through a new client API, which the next Aerospike Client release will support for C, C#, and Java.

XDR filtering confers multiple advantages. Foremost is reducing the volume of data shipped to destinations, which reduces network traffic as well as storage and processing requirements on the destination DC. In a hub and spoke XDR topology (as shown below), the savings from avoiding overprovisioning of the destination clusters can be significant. Furthermore, lower data volumes reduce egress costs when moving data across or from public clouds.

[Diagram: XDR filtering in a hub-and-spoke topology]

Expressions in Aerospike have a rich syntax with access to both record data and metadata, permitting precisely crafted filtering policies. A major use case will be for GDPR compliance and similar user privacy regulations. For example, the following Expression demonstrates the enforcement of geographic restrictions on where a record is stored. This pseudocode expression will only ship records to a remote DC if they are tagged as originating from the Netherlands:

// Ship a record only if its string bin "ISOrgn" equals "NL".
Expression exp = Exp.build(Exp.eq(Exp.stringBin("ISOrgn"), Exp.val("NL")));

XDR filtering can be combined with bin projection, which will be applied to the records that pass the filter. This can achieve further reduction of network bandwidth and destination DC storage requirements.
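The order of operations described here (filter first, then bin projection on the records that pass) can be illustrated with a small in-memory model. This is only a sketch in plain Java: it does not use the Aerospike client or server, and the class name XdrFilterModel, the method ship, and the predicate ORIGIN_IS_NL are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Toy in-memory model of "filter, then project". Not Aerospike client code.
final class XdrFilterModel {
    /** Stand-in for the article's expression: pass only if bin ISOrgn equals "NL". */
    static final Predicate<Map<String, Object>> ORIGIN_IS_NL =
            rec -> "NL".equals(rec.get("ISOrgn"));

    /** Returns the records that pass the filter, reduced to the projected bins. */
    static List<Map<String, Object>> ship(List<Map<String, Object>> records,
                                          Predicate<Map<String, Object>> filter,
                                          Set<String> shippedBins) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> rec : records) {
            if (!filter.test(rec)) {
                continue; // filtered out: nothing is sent for this record
            }
            // Projection is applied only to records that passed the filter.
            Map<String, Object> projected = new LinkedHashMap<>();
            for (Map.Entry<String, Object> bin : rec.entrySet()) {
                if (shippedBins.contains(bin.getKey())) {
                    projected.put(bin.getKey(), bin.getValue());
                }
            }
            out.add(projected);
        }
        return out;
    }
}
```

In this model, a record tagged "DE" is dropped entirely, while a record tagged "NL" is shipped with only the bins named in shippedBins, which is where the additional bandwidth and storage savings come from.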

Expanded Multi-Site Clustering

Multi-Site clustering was introduced in Aerospike Database 5. It supports strongly consistent, globally distributed transactions in an Active-Active configuration that provides low-latency reads and guarantees writes will not be lost.

Prior to Aerospike 5.3, the round-trip time (RTT) between Multi-Site Cluster nodes was required to be less than 100 milliseconds. Translating this into a useful metric requires a little math. The speed of light in single-mode fiber is approximately 68% of the speed of light in a vacuum. Rounding this down to 65% to be conservative, a 100 ms round-trip time means that sites can be no more than 6,045 miles (9,728 km) apart. Keeping in mind that long-distance fiber runs won’t follow great circle routes, it is easy to come up with city pairs that fall outside this limit. For example, Singapore and Frankfurt are 6,389 miles apart along the great circle path, which gives a round-trip distance of 12,778 miles (20,564 km) and a minimum RTT of 106 ms.
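The arithmetic behind these figures can be written out as follows. This is a sketch that assumes the speed of light in a vacuum is rounded to 186,000 miles per second; that rounding is an assumption, chosen because it reproduces the numbers quoted above.

```latex
\begin{align*}
v_{\text{fiber}} &\approx 0.65 \times 186{,}000\ \text{miles/s} = 120{,}900\ \text{miles/s} \\
d_{\max} &= v_{\text{fiber}} \times \frac{100\ \text{ms}}{2}
          = 120{,}900\ \text{miles/s} \times 0.050\ \text{s} = 6{,}045\ \text{miles} \\
\text{RTT}_{\min} &= \frac{2 \times 6{,}389\ \text{miles}}{v_{\text{fiber}}}
          = \frac{12{,}778}{120{,}900}\ \text{s} \approx 0.106\ \text{s} = 106\ \text{ms}
\end{align*}
```

The division by two in the second line reflects that a round trip covers the distance between the sites twice.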

In Aerospike 5.3, the maximum RTT latency is specified with the new dynamic configuration parameter ‘connect-timeout-ms’. This parameter lies in the heartbeat clause and defaults to a value of 500 milliseconds, which corresponds to a round-trip fiber path of 60,450 miles (97,284 km), or sites up to 30,225 miles (48,642 km) apart. It must be at least 50 ms and cannot be larger than one-third the product of the heartbeat ‘interval’ and ‘timeout’ config values.
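The two bounds on ‘connect-timeout-ms’ can be expressed as a simple check. The following is a minimal sketch of the rule as stated in this post, not the server's actual validation code; the class and method names are hypothetical.

```java
// Sketch of the stated bounds only; not the server's actual validation code.
final class ConnectTimeoutCheck {
    static final int MIN_CONNECT_TIMEOUT_MS = 50;

    /**
     * @param connectTimeoutMs candidate connect-timeout-ms value
     * @param interval         heartbeat 'interval' config value
     * @param timeout          heartbeat 'timeout' config value
     * @return true if the candidate is at least 50 ms and no larger than
     *         one-third of interval * timeout
     */
    static boolean isAllowed(int connectTimeoutMs, int interval, int timeout) {
        // Compare 3 * value against the product to avoid integer-division rounding.
        long product = (long) interval * timeout;
        return connectTimeoutMs >= MIN_CONNECT_TIMEOUT_MS
                && 3L * connectTimeoutMs <= product;
    }
}
```

As an illustration, with an ‘interval’ of 150 and a ‘timeout’ of 10 (values assumed here for the example), the product is 1,500 and the ceiling is 500 ms, so isAllowed(500, 150, 10) passes while isAllowed(501, 150, 10) does not.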

The best way to estimate RTT is by direct measurement, for example using the Linux ‘ping’ command. For new clusters where the routes are not yet in place, the following table can be used to make rough estimates of RTT latency, assuming the route between the sites is known. It assumes that land-based links use single-mode fiber. Note that signals over geosynchronous satellite links can be assumed to travel at the speed of light in a vacuum, but that the minimum distance is 22,236 miles (35,786 km) each way.

Medium               µs/mile    µs/km
vacuum               5.38       3.34
single-mode fiber    8.27       5.14
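As a rough illustration of how the table can be used, the sketch below multiplies a route length by the per-kilometer figures to estimate RTT. It accounts for propagation delay only (no switching or queuing), and the class, enum, and method names are hypothetical.

```java
// Rough propagation-delay estimate from the per-distance figures in the table.
final class RttEstimator {
    enum Medium {
        VACUUM(3.34),            // microseconds per km
        SINGLE_MODE_FIBER(5.14); // microseconds per km

        final double microsPerKm;

        Medium(double microsPerKm) {
            this.microsPerKm = microsPerKm;
        }
    }

    /** Estimated round-trip time in milliseconds for a one-way route length in km. */
    static double estimateRttMs(double oneWayRouteKm, Medium medium) {
        double roundTripKm = 2.0 * oneWayRouteKm;
        return roundTripKm * medium.microsPerKm / 1000.0; // microseconds to milliseconds
    }
}
```

For the Singapore to Frankfurt great circle path discussed earlier (10,282 km each way), RttEstimator.estimateRttMs(10282, RttEstimator.Medium.SINGLE_MODE_FIBER) gives roughly 106 ms, in line with the figure quoted above. A real fiber route will be longer than the great circle path, so measured values will be higher.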

Minor Features

Aerospike 5.3 includes a number of minor features, the most notable of which are described below. As always, refer to the 5.3 release notes for complete details and restrictions. Some of these features are available only in the Aerospike Enterprise Edition.

  • Fetching Configuration Items from Environment Variables: The following configuration items may now be specified by setting environment variables prior to starting the Aerospike server: ‘cert-file’, ‘key-file’, ‘key-file-password’, ‘encryption-key-file’, ‘query-user-password-file’, and ‘auth-password-file’. If the configuration item is a string, it can be expressed via an environment variable name conforming to the format ‘env:<variable_name>’; if binary, it should be encoded as a base64 string and set in an environment variable conforming to the format ‘env-b64:<variable_name>’.

  • Programmatic Access to Record Size: New methods have been introduced to obtain the current in-memory size in bytes of a record. In an Expression, this is available through the as_exp_memory_size() operation. In a user-defined function (UDF), this is available through the record.memory_size() function.

  • max-used-service-threads: Aerospike clusters that have many XDR destination DCs, many service threads, or destination DCs with a large node count can end up with an excessive number (tens of thousands) of connections to each node. This new configuration parameter limits the number of service threads used for XDR destinations. It is specified in the DC sub-clause of the XDR clause, so the limit applies on a per-DC basis.
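The ‘env:’ and ‘env-b64:’ formats described in the first item above can be modeled in a few lines. This is an illustrative sketch of the convention, not Aerospike server code: the class EnvConfigValue, its methods, and the choice to return null for values without either prefix are all assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Map;

// Illustrative model of the env: / env-b64: convention; not Aerospike server code.
final class EnvConfigValue {
    private static final String TEXT_PREFIX = "env:";
    private static final String BASE64_PREFIX = "env-b64:";

    /**
     * Resolves a configuration value against an environment map.
     * Returns null when the value uses neither prefix (a choice made for this sketch).
     */
    static byte[] resolve(String configValue, Map<String, String> env) {
        if (configValue.startsWith(BASE64_PREFIX)) {
            // Binary item: the variable holds base64 text that decodes to the raw bytes.
            String encoded = lookup(env, configValue.substring(BASE64_PREFIX.length()));
            return Base64.getDecoder().decode(encoded);
        }
        if (configValue.startsWith(TEXT_PREFIX)) {
            // String item: the variable holds the value itself.
            String text = lookup(env, configValue.substring(TEXT_PREFIX.length()));
            return text.getBytes(StandardCharsets.UTF_8);
        }
        return null; // not an environment reference
    }

    private static String lookup(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null) {
            throw new IllegalArgumentException("environment variable not set: " + name);
        }
        return value;
    }
}
```

A caller could pass the process environment directly, for example EnvConfigValue.resolve("env:KEY_PW", System.getenv()), where KEY_PW is a hypothetical variable name.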