Aerospike is pleased to announce Aerospike 5.7, which will be available mid-September. This is a feature-laden release focused on operational and performance improvements, with special attention to query performance. Customers deploying 5.7 are strongly encouraged to review the release notes, as some configuration parameter defaults have changed. As with prior releases, some features are available only with Aerospike Enterprise Edition.
Secondary Indexes
Aerospike records have one or more bins, which are analogous to fields in SQL databases. Secondary indexes on non-primary-key bins are heavily used by Aerospike applications. They provide efficient queries to retrieve records based on a designated bin containing numeric, string, or geospatial data. In many cases, this is even more performant than scanning an entire set or namespace with a filter expression. Secondary indexes have evolved over time to accommodate new data types and hardware capabilities. The latest round of functionality and performance enhancements has commenced with Aerospike 5, and will be rolled out incrementally over the next several releases.
In this first set of changes, record references in the secondary index now use 5-byte direct references instead of a digest. This significantly reduces the server memory footprint, and eliminates the need to map from digest to the record (computationally expensive). This change also benefits secondary index garbage collection, which runs on a separate thread, particularly for use cases (common in AdTech) where records are deleted and recreated en masse.
As a side effect, the sizing and configuration of secondary indexes has become simpler. The configuration parameters query-bufpool-size, query-pre-reserve-partitions, and sindex-gc-max-rate are no longer needed and have been removed.
Several new secondary index metrics have been added, and a few have been obsoleted. For example, the query_ops_bg_failure metric has been deleted and replaced by new query_ops_bg_error, query_ops_bg_abort, and query_false_positives metrics.
Future Aerospike releases will build on this foundation to offer additional secondary index enhancements, including PMem/Flash support, fast start (no rebuilding queries after a restart), and pagination support.
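To get a rough feel for the memory effect of moving from digests to 5-byte references, the arithmetic can be sketched as below. The 20-byte digest size is an assumption not stated in this post (Aerospike record digests are 160-bit hashes), and the calculation covers only the record-reference field, not any other per-entry index overhead.

```python
# Rough sizing sketch: bytes saved on record references alone.
# DIGEST_BYTES is an assumption (160-bit digest); REFERENCE_BYTES is from the text.
DIGEST_BYTES = 20
REFERENCE_BYTES = 5


def reference_bytes_saved(index_entries: int) -> int:
    """Bytes saved when each index entry stores a 5-byte reference instead of a digest."""
    if index_entries < 0:
        raise ValueError("index_entries must be non-negative")
    return index_entries * (DIGEST_BYTES - REFERENCE_BYTES)


if __name__ == "__main__":
    entries = 1_000_000_000  # hypothetical: one billion secondary index entries
    saved_gib = reference_bytes_saved(entries) / 2**30
    print(f"approximate savings: {saved_gib:.1f} GiB")
```

Actual savings depend on the index layout and cardinality, so treat this as an order-of-magnitude estimate only.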
Cloud Resilience
When deployments were confined to single datacenters with dedicated hardware, it was straightforward to bound resource usage to guarantee SLAs were not violated. Now applications are increasingly deployed in public clouds, often spanning multiple datacenters and geographies. At hyperscale, complex topologies (for example multi-cloud and cloud/on-prem hybrids) are becoming common.
Aerospike 5.7 extends and improves resilience capabilities in cloud environments. Resilience has been a core Aerospike design philosophy that has enabled the product to deliver real-time performance and uptime SLAs in mission-critical deployments for over a decade. Our continuing work in this area ensures that these stringent SLAs can also be made available to applications even in cloud environments where virtualization can cause somewhat unpredictable behaviors at the processing, network and storage layers from time to time.
Smoothing over these glitches while providing the real-time performance and uptime SLAs is the main purpose of our work. Resilience is more a design philosophy than a specific list of features, based on the recognition that resource availability is less predictable in these environments, so the software architecture must adapt by being more tolerant of unexpected, transient behavior. Many of the resilience improvements described below are based on Aerospike’s learnings from the field. Resilience changes for this release fall into two categories.
Configuration and Sanity Checks
Starting with 5.7, on startup Aerospike performs a series of sanity checks to detect and flag misconfigurations. Often these don’t manifest immediately, but later on under a heavy production load when recovery is more complicated. The checks performed look for Aerospike- and Linux-specific anti-patterns. Examples of these checks include making sure that the Service Thread configuration is reasonable, and that Transparent Huge Pages (THP) are disabled. When a violation is detected, the default behavior is to enter a warning in the log, but continue. Strict conformity can be enabled by setting the new enforce-best-practice configuration parameter to true in the service stanza (default is false).
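To make the warn-versus-enforce behavior concrete, here is a minimal sketch of what a THP check of this kind could look like. It is not Aerospike's implementation: the function names are hypothetical, and treating only the "never" mode as "disabled" is an assumption. The sysfs file it targets is the standard Linux location for the THP setting.

```python
from pathlib import Path
from typing import Optional

# Standard Linux sysfs file; its content looks like "always madvise [never]".
THP_PATH = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def active_thp_mode(sysfs_text: str) -> str:
    """Return the bracketed (active) mode from the sysfs file content."""
    for token in sysfs_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active mode found in {sysfs_text!r}")


def check_thp(sysfs_text: str, enforce: bool = False) -> Optional[str]:
    """Return None if THP is disabled; otherwise warn, or fail when enforcing.

    Assumption: only the "never" mode counts as disabled.
    """
    mode = active_thp_mode(sysfs_text)
    if mode == "never":
        return None
    message = f"Transparent Huge Pages mode is '{mode}', expected 'never'"
    if enforce:
        # Mirrors the strict mode described in the text: refuse to continue.
        raise RuntimeError(message)
    return message  # default: caller logs a warning and continues
```

On a Linux host the check could be driven with check_thp(THP_PATH.read_text()); passing enforce=True models the strict behavior the text describes.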
Run-time Circuit Breakers
Another tranche of resilience features in 5.7 falls under the category of “circuit breakers.” These are mechanisms that temporarily suspend or throttle certain load-generating activities when Aerospike comes under unexpected stress. They are speculative adaptations, based on the expectation that when the source of stress is transient (e.g. network congestion) it is more prudent to ease off than to press ahead and risk a greater disruption of service. Deciding where and when to throttle is non-trivial, and these choices were arrived at after extensive research and testing. Here are some examples found in 5.7:
Migrations are throttled if they are overwhelming the write queue
Switching write transactions from write-commit-level “master” to “all” if fabric send queues are overwhelmed
Rejecting replica writes if they are overwhelming the write queue
Migration throttling is an illustrative example of the benefit of these new mechanisms. Cloud traffic shaping (throttling) policies, or interference from a “noisy neighbor”, can cause migration of records to another cluster node to back up. Under load, the pending writes will cause the server to run short on memory, eventually leading to a node shutdown.
The loss of the server will then cause additional migrations when other cluster nodes rebalance the impacted records, further exacerbating network pressure. Under 5.7, when stop-writes are triggered (by default when only 10% of available memory remains), the server will temporarily throttle migrations.
If the impairment is indeed transient, when full bandwidth is restored the migrations will catch up and the system will recover. However, this is not a panacea. If the impairment is persistent, the node will drag down the entire cluster, so health monitoring remains important.
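The migration example can be modeled with a very small circuit breaker. This is only a sketch of the idea: the class and method names are hypothetical, and the single-threshold rule is an assumption drawn from the stop-writes description above (default trigger at 10% of memory remaining). The server's real decision logic is not described in this post.

```python
from dataclasses import dataclass


@dataclass
class MigrationThrottle:
    """Toy circuit breaker: pause migrations while memory is critically low."""

    stop_writes_free_pct: float = 10.0  # from the text: default stop-writes point
    throttled: bool = False

    def update(self, free_memory_pct: float) -> bool:
        """Record the latest reading; return True if migrations may proceed."""
        self.throttled = free_memory_pct <= self.stop_writes_free_pct
        return not self.throttled


if __name__ == "__main__":
    breaker = MigrationThrottle()
    # Hypothetical memory readings: pressure builds, then eases.
    for free_pct in (40.0, 15.0, 9.0, 8.0, 22.0):
        state = "run" if breaker.update(free_pct) else "throttle"
        print(f"free memory {free_pct:4.1f}% -> migrations: {state}")
```

The breaker reopens by itself once the reading recovers, which matches the transient case; a persistent impairment would keep it tripped, which is why external health monitoring is still needed.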
Security Enhancements
Security should be thought of as a process, not an endpoint that gets “done” at some point. This is not just because bad actors are mounting more sophisticated attacks, but equally because in response, applications are evolving more elaborate frameworks to manage security. The enhancements in 5.7 are a mixture of new defenses, and better integration with the enterprise-wide frameworks larger companies are standing up.
A small but significant enhancement is that in 5.7, files created by Aerospike are accessible only by the (Linux) user associated with the server. This can be overridden with the configuration parameter os-group-perms: it is false by default, and when set to true, group-level access will also be granted to created files. Another change is the addition of a separate ‘audit’ logging context, which keeps “security” context logs from cluttering the regular log sink.
PKI Authentication
By default, the Aerospike user model is based on username/password pairs. Increasingly, even mid-sized companies are implementing enterprise-wide Public Key Infrastructure (PKI) to manage users, roles, and credentials centrally. A key requirement for Aerospike to integrate with such systems is to support X.509 certificate-based authentication, which is more secure than password-based schemes.
Aerospike 5.7 allows client applications to be issued an X.509 certificate that carries the user name in the Distinguished Name (DN) field. This certificate can then be presented to a PKI server for authorization. A good example of this can be found in the blog post PKI as a Service with HashiCorp Vault. It gives details on how Vault can be set up to serve as the authenticating authority between Aerospike clients and an Aerospike cluster.
Fast Key Rotation
Prior to 5.7, changing (rotating) the Encryption-at-Rest key required reformatting storage. At scale, this becomes a disruptive, time-consuming operation that takes cluster nodes out of service for extended periods. Starting in 5.7, the Encryption-at-Rest key can be changed without reformatting storage, resulting in an orders-of-magnitude reduction in time and complexity. This in turn increases data security by allowing a more frequent cadence of key rotation.
Fast key rotation is provided via the new encryption-old-key-file configuration parameter, working in concert with the existing encryption-key-file parameter. To change the encryption key of an Aerospike cluster node, the node must be restarted with the following configuration parameters:
namespace <namespace ID> {
    ...
    storage-engine device {
        ...
        encryption-old-key-file <old-key-file>
        encryption-key-file <new-key-file>
    }
    ...
}
With this information, the server can use the old key to decrypt the (separate) key protecting the namespace records, and then re-encrypt it using the new key. After rotating the key, the encryption-old-key-file parameter should be removed from the configuration file. However, if it is not, no harm is done. The second and subsequent times the configuration file is processed, the server will do nothing if it discovers that the old key is invalid and the current key is valid. For more information about the encryption-at-rest implementation, see the Technical Details section of Aerospike documentation.
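The reason rotation no longer requires reformatting storage is that only the small wrapped data key is re-encrypted, never the records themselves. The sketch below illustrates that idea with a deliberately simple toy cipher built from the Python standard library. It is not Aerospike's implementation and is not suitable for real encryption; every function name here is hypothetical.

```python
import hashlib
import hmac
from typing import Optional

TAG_LEN = 32  # length of an HMAC-SHA256 tag


def _keystream(key: bytes, length: int) -> bytes:
    """Toy keystream (SHA-256 in counter mode). Illustration only, not secure."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def wrap(file_key: bytes, data_key: bytes) -> bytes:
    """Encrypt the data key under the key-file key and prepend an integrity tag."""
    ct = bytes(a ^ b for a, b in zip(data_key, _keystream(file_key, len(data_key))))
    return hmac.new(file_key, ct, hashlib.sha256).digest() + ct


def unwrap(file_key: bytes, blob: bytes) -> Optional[bytes]:
    """Return the data key, or None if file_key is not valid for this blob."""
    tag, ct = blob[:TAG_LEN], blob[TAG_LEN:]
    if not hmac.compare_digest(tag, hmac.new(file_key, ct, hashlib.sha256).digest()):
        return None
    return bytes(a ^ b for a, b in zip(ct, _keystream(file_key, len(ct))))


def rotate(blob: bytes, old_key: bytes, new_key: bytes) -> bytes:
    """Re-wrap the data key under new_key; a no-op if that was already done."""
    if unwrap(new_key, blob) is not None:
        return blob  # current key already valid: nothing to do
    data_key = unwrap(old_key, blob)
    if data_key is None:
        raise ValueError("neither key unlocks the stored data key")
    return wrap(new_key, data_key)
```

Calling rotate a second time with the same pair of keys returns the blob unchanged, which mirrors the harmless case of leaving encryption-old-key-file in the configuration after a rotation.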
Minor Enhancements
Aerospike 5.7 introduces numerous minor features, the most notable of which are described below. As always, refer to the 5.7 release notes for complete details and restrictions.
Storage compression library updates: the lz4 codec has been upgraded to v1.9.3 from v1.8.3, and the zstd codec has been upgraded to v1.5.0 from v1.3.7. The developers of these (open-source) codecs claim improved performance with the newer versions.
New unreplicated_records statistic: this indicates the current number of unreplicated records in the namespace. This statistic is also appended to the re-replication log ticker line (e.g. “…unreplicated-records <integer>”).
New max-record-size configuration parameter: this appears in the namespace stanza, and specifies the maximum record size in bytes. Writes that exceed the maximum will be rejected. This is useful when increasing write-block-size through a rolling upgrade. During the transition, migrations and replication would otherwise be problematic because nodes still running the old block size won’t be able to deal with larger records. This parameter defaults to 0, which means to impose no constraints. Non-zero values are interpreted based on the storage engine type:
for device namespaces, max-record-size cannot be larger than write-block-size
for Persistent Memory (PMEM) namespaces, max-record-size cannot be larger than 8 MiB, the PMEM write block size
for in-memory namespaces, max-record-size cannot be larger than 128 MiB
“info” command changes: Replaced the scan-list and query-list info commands with scan-show and query-show. Also renamed the query-kill info command to query-abort. The jobs info command is now deprecated, and will be removed in a future release.
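The max-record-size limits listed above can be expressed as a small validation helper, which may be handy when reviewing configurations before a rolling upgrade. This is a sketch only: the function name and the engine labels ("device", "pmem", "memory") are hypothetical, and the limits are taken directly from the text.

```python
from typing import Optional

MIB = 1024 * 1024
PMEM_LIMIT = 8 * MIB      # from the text: the PMEM write block size
MEMORY_LIMIT = 128 * MIB  # from the text: in-memory namespace ceiling


def max_record_size_ok(engine: str, max_record_size: int,
                       write_block_size: Optional[int] = None) -> bool:
    """Return True if max_record_size (bytes) is acceptable for the engine type."""
    if max_record_size < 0:
        raise ValueError("max_record_size must be non-negative")
    if max_record_size == 0:
        return True  # default: no constraint imposed
    if engine == "device":
        if write_block_size is None:
            raise ValueError("device namespaces need write_block_size")
        return max_record_size <= write_block_size
    if engine == "pmem":
        return max_record_size <= PMEM_LIMIT
    if engine == "memory":
        return max_record_size <= MEMORY_LIMIT
    raise ValueError(f"unknown storage engine: {engine!r}")
```

For example, max_record_size_ok("device", 512 * 1024, write_block_size=MIB) accepts a 512 KiB cap on a namespace with a 1 MiB write block, while the same cap with a 256 KiB block would be rejected.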