Aerospike Database 6.1: Moving forward on query and data distribution

August 30, 2022

Director of Product

Secondary index improvements and enhanced throughput for XDR

Aerospike is proud to announce that Aerospike Database 6.1, which provides additional secondary index features and enhanced Cross-Datacenter Replication (XDR) throughput for rehydration and recovery use cases, is now Generally Available (GA). This release builds on our 6.0 and 5.X releases, where secondary indexes were re-architected and the XDR subsystem was improved to enable fine-grained control and active-active datacenter configurations.

This release is a big step forward in support of more complex queries for real-time analytics as well as further establishing our industry-leading global data distribution capabilities.

Secondary indexes on nested elements of documents

Database 6.1 brings secondary index support to nested elements within a Map Collection Data Type (CDT), traditionally used to store JSON documents. This enhances Aerospike’s query capability when using a document modeling approach. Along with new cardinality statistics (see below), Aerospike SQL Powered by Starburst will be even more powerful.

We significantly enhanced our query capabilities with the release of Aerospike Database 6, rearchitecting secondary indexes in alignment with the design of our primary index. Database 6.1 removes the limitation of indexing only the top-level elements of a JSON document contained in a Map CDT. This allows developers to accelerate queries for documents where the predicate is matched against elements nested at arbitrary depths.
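As a purely conceptual illustration of what indexing a nested element means, the sketch below builds an inverted index over a value found at a path inside nested maps. It is a toy model in plain Python, not the Aerospike client API or the server's implementation; the record contents and the helper names get_nested and build_nested_index are hypothetical.

```python
from collections import defaultdict


def get_nested(doc, path):
    """Follow a sequence of map keys into a nested dict; return None if any key is missing."""
    current = doc
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def build_nested_index(records, path):
    """Map each value found at `path` to the sorted list of record keys that hold it."""
    index = defaultdict(list)
    for record_key, doc in records.items():
        value = get_nested(doc, path)
        if value is not None:  # records without the nested element are not indexed
            index[value].append(record_key)
    return {value: sorted(keys) for value, keys in index.items()}


# Hypothetical documents stored as nested maps.
records = {
    "r1": {"location": {"city": "Ann Arbor", "state": "MI"}, "shape": "circle"},
    "r2": {"location": {"city": "Austin", "state": "TX"}, "shape": "disk"},
    "r3": {"location": {"city": "Detroit", "state": "MI"}, "shape": "light"},
    "r4": {"shape": "circle"},  # no nested location element
}

# Index the element nested at location -> state.
state_index = build_nested_index(records, ["location", "state"])
```

A lookup such as state_index["MI"] then finds the matching records without scanning every document, which is the benefit a secondary index on a nested element is meant to provide.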

Warm start support for secondary indexes

Database 6.1 also significantly reduces the operational impact associated with using secondary indexes. In Aerospike Database Enterprise Edition (EE), secondary indexes will now be stored in shared memory, similar to the primary index. This means that secondary index use in EE will no longer slow down warm restarts. (This is an EE feature and is not available to users of Community Edition.)

This new secondary index functionality allows for wider application of secondary indexes at scale. When multiple secondary indexes are added to a cluster, a problem arises: restarting nodes through a rolling upgrade results in the rebuilding of all of the secondary indexes. With the secondary indexes placed in shared memory, they can be restored from that shared memory rather than rebuilt, which significantly reduces the time to restart a node when a warm start is being performed. However, on a complete node shutdown and restart, all secondary indexes must be rebuilt through a data scan.

Support for indexing a whole namespace

Previously, a secondary index created without a set name applied only to the “set of things not in a set,” that is, records that do not belong to any set. With the release of Database 6.1, such an index covers the whole namespace. When querying on a set, a secondary index on that set will be preferred if it exists and matches. If not, a whole namespace secondary index will be used (should the index exist and match). This aligns secondary index behavior with the primary index. (Please read the special upgrade instructions for Database 6.1 if you have secondary indexes that do not include a set name.)
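The preference rule described above can be modeled in a few lines. This is a sketch of the rule as stated in this post, not server code; the SIndex type, the choose_index function, and the index name occurred_ns_idx are hypothetical, and "matches" is simplified here to mean the same namespace and bin.

```python
from typing import NamedTuple, Optional


class SIndex(NamedTuple):
    name: str
    namespace: str
    set_name: Optional[str]  # None means the index covers the whole namespace
    bin_name: str


def choose_index(indexes, namespace, set_name, bin_name):
    """Prefer an index on the queried set; otherwise fall back to a whole-namespace index."""
    candidates = [
        idx for idx in indexes
        if idx.namespace == namespace and idx.bin_name == bin_name
    ]
    if set_name is not None:
        for idx in candidates:
            if idx.set_name == set_name:
                return idx
    for idx in candidates:
        if idx.set_name is None:
            return idx
    return None  # no matching index; the query cannot use a secondary index


indexes = [
    SIndex("occurred_ns_idx", "sandbox", None, "occurred"),   # whole namespace
    SIndex("occurred_idx", "sandbox", "ufodata", "occurred"),  # set-specific
]

on_indexed_set = choose_index(indexes, "sandbox", "ufodata", "occurred")
on_other_set = choose_index(indexes, "sandbox", "other_set", "occurred")
```

Here on_indexed_set resolves to the set-specific occurred_idx, while a query on a set without its own index falls back to the whole-namespace index.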

Support for index cardinality information through info call

We constantly review our architecture for improvements, and in Database 6.1, we have added cardinality information on our indexes. This is available through an info command. It allows our Aerospike Presto/Trino Connector, Aerospike SQL Powered by Starburst, and our Spark Connector to pass the cardinality of each applicable secondary index to the planning and optimization modules of Presto and Spark SQL, which can use it when multiple secondary indexes are available.

To further understand how one might take advantage of the cardinality statistics when writing a query using the Aerospike APIs, we point out the following:

asadm -e 'enable;manage sindex create numeric occurred_idx ns sandbox set ufodata bin occurred'

asadm -e 'show statistics sindex for sandbox occurred_idx'

Similarly:

asadm -e "asinfo -v 'sindex-stat:ns=sandbox;indexname=occurred_idx'"

Both commands report two cardinality statistics:

entries_per_bval – the ratio of entries to unique bin values for a given secondary index on the node
entries_per_rec – the ratio of entries to unique records for a given secondary index. Note that this will always be 1 if it is not a list or map index.

Values are integers (rounded to the nearest integer) and are calculated using HyperLogLog estimates for the unique bvals and recs, respectively. A background process generates the statistics. It runs at startup, every hour thereafter, and upon creation and population of a secondary index. A zero value (0) means the statistic has not been generated yet.
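To make the two ratios concrete, here is a small sketch that computes them exactly over a toy list of index entries. It is illustrative only: the server uses HyperLogLog estimates rather than exact counts, and the function name cardinality_stats and the sample data are hypothetical.

```python
def cardinality_stats(entries):
    """Return (entries_per_bval, entries_per_rec) for a list of (bin_value, record_key) entries.

    Exact distinct counts are used here for clarity; the server estimates them.
    Python's round() resolves exact .5 ties to the even integer; the article does
    not say how the server breaks ties.
    """
    if not entries:
        return 0, 0  # nothing to measure
    unique_bvals = {bval for bval, _ in entries}
    unique_recs = {rec for _, rec in entries}
    total = len(entries)
    return round(total / len(unique_bvals)), round(total / len(unique_recs))


# A scalar index: every record contributes exactly one entry.
scalar_entries = [("a", "r1"), ("a", "r2"), ("a", "r3"), ("b", "r4")]

# A list index: each record contributes one entry per list element.
list_entries = [
    ("red", "r1"), ("blue", "r1"),
    ("red", "r2"), ("green", "r2"),
    ("red", "r3"), ("blue", "r3"),
    ("red", "r4"), ("yellow", "r4"),
]

scalar_stats = cardinality_stats(scalar_entries)
list_stats = cardinality_stats(list_entries)
```

The scalar case yields an entries_per_rec of 1, consistent with the note above, while the list case yields 2 because each record holds two indexed elements.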

The primary motivation for the statistic is to help choose which secondary index to use, treating the lower entries_per_bval ratio as an indication of the stronger filter within the query. By providing information on the cardinality of indexes, Aerospike enables connectors, as well as developers, to optimize query performance in complex queries or analytics.
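A developer could apply the same heuristic by hand. The sketch below assumes the statistics have already been fetched and parsed into dictionaries (fetching and parsing are not shown); the function name pick_strongest_filter, the index names shape_idx and new_idx, and the sample values are hypothetical.

```python
def pick_strongest_filter(stats_by_index):
    """Return the name of the index with the lowest non-zero entries_per_bval, or None.

    A value of 0 means the statistic has not been generated, so such
    indexes are skipped rather than treated as the most selective.
    """
    usable = {
        name: stats["entries_per_bval"]
        for name, stats in stats_by_index.items()
        if stats.get("entries_per_bval", 0) > 0
    }
    if not usable:
        return None
    return min(usable, key=usable.get)


# Hypothetical, already-parsed statistics for three indexes.
stats_by_index = {
    "shape_idx": {"entries_per_bval": 4200, "entries_per_rec": 1},
    "occurred_idx": {"entries_per_bval": 3, "entries_per_rec": 1},
    "new_idx": {"entries_per_bval": 0, "entries_per_rec": 0},  # not generated yet
}

best = pick_strongest_filter(stats_by_index)
```

With these values the function selects occurred_idx, since few entries per distinct bin value suggests that an equality predicate on that bin narrows the result set the most.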

Secondary index names limited to 64 characters

Starting with Database 6.1, a secondary index name cannot exceed 64 characters in length. An existing index with a longer name will fail to be created by the 6.1 nodes after an upgrade and restart, and you will need to recreate it with a shorter name. (Please read the special upgrade instructions for Database 6.1 if you are using secondary indexes.)
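Before upgrading, it can help to check existing index names against the new limit. The following is a minimal sketch that assumes you have already collected the index names into a list; the function name find_overlong_index_names and the sample names are hypothetical. It counts characters, as the limit is stated above, which is the same as counting bytes for ASCII names.

```python
MAX_SINDEX_NAME_LEN = 64  # limit stated for Database 6.1


def find_overlong_index_names(names, limit=MAX_SINDEX_NAME_LEN):
    """Return the names that exceed the limit, preserving input order."""
    return [name for name in names if len(name) > limit]


# Hypothetical index names gathered from a cluster before the upgrade.
existing_names = [
    "occurred_idx",
    "x" * 64,  # exactly at the limit: allowed
    "y" * 65,  # one character too long: must be recreated with a shorter name
]

too_long = find_overlong_index_names(existing_names)
```

Any name reported in too_long would need to be recreated with a shorter name as part of the upgrade.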

Enhanced throughput for XDR

Aerospike Database 6.1 improves XDR throughput when XDR enters recovery mode to catch up after network disruptions or when a rewind of a namespace is triggered from a specified last-update-time (LUT). This is particularly beneficial when hydrating new clusters or transferring data between clusters when there is considerable write activity.

XDR uses multiple threads per node to service the queue of changes and route them along to the other side. When network disruptions or bursts of activity happen, XDR can fall behind and switch to recovery mode in order to catch up. Note that this mode is the same as when you are doing a rewind from the Last Update Time (LUT) or are rehydrating from one cluster to another.

In either case, read lock contention can become an issue. In Database 6.1, we have optimized the recovery/rehydration code path by reducing the number of threads and thereby reducing the lock contention. This results in a performance improvement of up to an order of magnitude for typical deployment topologies.

This is a significant improvement when using Aerospike Database as a global data distribution point and rehydrating from a LUT.

Summary

We think you’ll agree that Aerospike Database 6.1 demonstrates our continued focus on delivering the highest performance at the lowest cost and the ability to scale from gigabytes to petabytes and from thousands of transactions to millions of transactions per second. Aerospike Database 6.1 provides significant new functionality in support of queries across large data sets with the lowest latency and highest concurrent throughput of any non-relational database. And with the new support for cardinality statistics, Aerospike SQL Powered by Starburst delivers even more efficient SQL-based reporting and analytic functionality to leverage your real-time data. The increase in XDR throughput in 6.1 delivers even more power for our customers’ data distribution needs, providing an unprecedented ability to move and replicate ever-larger amounts of data at high speeds.

For more information:

See Release notes

Try in our Code Sandbox (6.1 features coming very soon)