Aerospike Database 6.3: Smoother Operations at Scale

Ronen Botzer
Director of Product
March 31, 2023

We are pleased to announce that Aerospike Database 6.3, a new release delivering operational improvements, is now Generally Available (GA). This blog will cover some of the many improvements and new capabilities that make up the 6.3 release. As always, there are many additional quality, performance, and stability improvements that are not covered here but are described in the 6.3 release notes.

Support for OpenSSL 3

Aerospike Database 6.3 adds support for OpenSSL 3, which means that it runs well on Red Hat Enterprise Linux (RHEL) 9 and its many clones, as well as Ubuntu 22.04. We continue to support RHEL and Debian-compatible Linux distributions that come with OpenSSL 1. One caveat is that support for Debian 10 on ARM64 hardware has been discontinued; server 6.3 still runs on Debian 10 on x86_64. Ubuntu 18.04 was also removed from the list of supported Linux distributions, starting with server 6.3.

Aerospike Database 6.3 new operational features: the details add up

With the tight release schedule of Aerospike Database 6 versions in 2022, we deferred many smaller feature requests and improvements in order to meet our deadlines. But good ideas come back around, and server 6.3 intentionally collects them. Let’s review some of these new features.

Runtime resource protection

Aerospike automatically responds under heavy load, according to its data retention configuration parameters. Server 6.3 adds two dynamically configurable parameters to help protect cluster nodes from running out of memory or storage when data is added at a rate that risks exceeding node capacity.

  • max-used-pct
    stops client writes when (by default) the namespace device usage hits 70%

  • stop-writes-sys-memory-pct
    stops the client from writing to the namespace when (by default) 90% of the node’s memory is used. If this configuration parameter is not explicitly set to different values per-namespace, all namespaces stop writes when the threshold is crossed.

Deletions, replica writes, and migration writes are still allowed when the namespace is in stop-writes mode. These configuration parameters help ensure that a node keeps enough headroom to recover when it approaches capacity. Be aware of your current resource consumption, and consider adjusting these values as part of your upgrade process.

The new compression-acceleration configuration parameter trades off CPU consumption and compression ratio (storage) in systems with LZ4 storage compression.
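To make the two stop-writes thresholds described in this section concrete, here is a minimal Java sketch of the decision as the text describes it. It is a model for reasoning about the settings, not server code: the class, enum, and method names are hypothetical, the inputs are plain percentages, and treating "hits" as greater-than-or-equal is an assumption.

```java
// Hypothetical model of the stop-writes behavior described in this section.
// This is NOT Aerospike server code; all names here are illustrative.
class StopWritesSketch {

    enum WriteKind { CLIENT_WRITE, DELETE, REPLICA_WRITE, MIGRATION_WRITE }

    // Defaults quoted in the text: 70% device usage, 90% system memory.
    static final int DEFAULT_MAX_USED_PCT = 70;
    static final int DEFAULT_STOP_WRITES_SYS_MEMORY_PCT = 90;

    // True when either threshold has been reached.
    // Using ">=" rather than ">" is an assumption made for this sketch.
    static boolean inStopWrites(int deviceUsedPct, int sysMemoryUsedPct,
                                int maxUsedPct, int stopWritesSysMemoryPct) {
        return deviceUsedPct >= maxUsedPct
            || sysMemoryUsedPct >= stopWritesSysMemoryPct;
    }

    // In stop-writes mode only client writes are refused; deletions,
    // replica writes, and migration writes still go through.
    static boolean isAllowed(WriteKind kind, boolean stopWrites) {
        return !stopWrites || kind != WriteKind.CLIENT_WRITE;
    }

    public static void main(String[] args) {
        boolean stop = inStopWrites(72, 40,
                DEFAULT_MAX_USED_PCT, DEFAULT_STOP_WRITES_SYS_MEMORY_PCT);
        System.out.println("stop-writes: " + stop);
        for (WriteKind kind : WriteKind.values()) {
            System.out.println(kind + " allowed: " + isAllowed(kind, stop));
        }
    }
}
```

Running the same check against your current device and memory usage before upgrading is a quick way to see whether the defaults would put a namespace into stop-writes mode right away.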

Multitenancy

Many users deploy Aerospike as a multi-tenant data service, with distinct internal and external customers separated by namespace (comparable to an RDBMS database) and sets (comparable to RDBMS tables). Multitenancy leans on Aerospike enterprise features such as scoped role-based access control (RBAC) permissions, and rate quotas on operations.

Site reliability engineers (SREs) can now limit the storage used by an Aerospike set to a specified number of bytes with the set-level stop-writes-size configuration parameter. For data-in-memory namespaces, this limits memory_data_bytes. Otherwise, this is a limit for device_data_bytes. Read more about capping the size of a set.

Logging to a syslog socket

Satisfying a long-standing feature request, we added a syslog log sink so that log messages can be streamed to syslog-compatible Unix domain sockets, such as logstash. This allows log messages to be easily sent off-node to centralized logging facilities that serve the entire cluster. In container-based deployments, such as Kubernetes, the new log sink simplifies access to the server logs of these relatively isolated cluster nodes.

Until server 6.3, the only type of log messages that could be sent to syslog were the ones in the audit trail logging context. As audit trail isn’t a special case any more, if you are using this feature, be aware that you’ll need to modify your configuration file before upgrading.

All Flash Improvements

Some improvements go beyond All Flash; they help with any deployment of Aerospike. However, they tend to have a larger impact when the primary index is stored on a Flash device.

When the namespace supervisor (NSUP) falls behind on its task of cleaning up a node’s expired records, that can cause problems after the node restarts. First, the node delays joining the cluster in order to clean up the expired records from the primary index.

Subsequently, defragmentation has to work harder to deal with the spike in write-blocks that need to be freed up. In an All Flash deployment, the defragmentation process can have an outsized impact on latencies if the cluster doesn’t have enough spare disk IO capacity. A new server log warning message and a corresponding nsup_cycle_delete_pct metric expose that a cluster node is in such a state. The Monitoring Stack displays a visual warning and alerts based on this metric. Read more about ensuring that NSUP is keeping up.

Server 6.3 also avoids duplicate resolution of records that have already been resolved during the relevant period.

Faster truncation

Each truncate command now uses separate threads. The namespace context parameter truncate-threads controls the number of threads used by a new truncate job. Users that truncate many sets concurrently should consider tuning this configuration parameter.

Starting with server 6.3, truncate jobs also leverage set indexes, significantly speeding them up. However, if the set contains any tombstones at the beginning of the job, the set index cannot be utilized for truncate.

The namespace stat truncated_records has been removed, and new namespace and per-set truncating boolean stats have been added, to indicate whether a namespace or set is in the process of being truncated.

Better Vault integration

Server 6.3 adds support for Vault Enterprise namespaces through the new vault-namespace configuration parameter, and was tested against HashiCorp Cloud Platform (HCP) Vault.

Before Aerospike Database version 6.3, when a Vault token expired, the only way to apply a new Vault token was to restart the node. To simplify operations, the Vault token file can now be updated in place with a new Vault token. Dynamically setting the vault-token-file configuration parameter to the same path then instructs Aerospike to reload the token from the file.

Miscellaneous

  • Durable delete enforcement – SREs can enforce durable deletes in a high availability (AP mode) namespace by setting disallow-expunge to true.

  • UDF improvement – LuaJIT is back in server 6.3 running on ARM64 machines, improving UDF performance.

  • Geospatial improvement – Server 6.3 boosts the performance of geospatial queries by using the latest S2 geometry library.

  • Log message visibility – Search the server log message reference for ones improved or introduced in version 6.3.

  • Change Notification improvement – the ship-bin-luts configuration parameter also allows sending bin metadata to non-Aerospike connectors from version 6.3 nodes.

  • Specifying a timeout for info commands is now possible with info-max-ms.

  • Debug capabilities added to speed up support ticket resolution.

New developer API features

Document-oriented modeling gets a couple of new developer API features. One is the ability to compare a map argument to a map bin using Expressions. A developer can create a Filter Expression to query for all the records where a bin contains a specific map.

In this simplest example, a record read operation is conditioned on a Filter Expression that checks whether a given bin holds a map equal to the input map.

// Read the record only if the map bin equals inputMap (a java.util.Map).
Policy p = new Policy();
p.filterExp = Exp.build(Exp.eq(Exp.mapBin(binName), Exp.val(inputMap)));
Record record = client.get(p, key);

You’ll also see this capability in an upcoming release of Spring Data for Aerospike, as the findByPOJO query method.

The other developer API feature is two new return types for map operations: MapReturnType.ORDERED_MAP and MapReturnType.UNORDERED_MAP.

If we have a bin m with a map data type

{
a: 1,
b: 2,
c: 3,
d: 4,
e: 5
}

And we use the map operation

get_by_rank_range(ORDERED_MAP, 3, 2)

We should get back

{ d: 4, e: 5 }
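The rank-range example above can be reproduced with a small plain-Java model. This is only a sketch of the selection logic (rank 0 is the smallest value, so rank 3 with a count of 2 picks the values 4 and 5). It does not call the Aerospike client: the class and method names are hypothetical, negative ranks are not modeled, and returning a key-sorted TreeMap is an assumption about what an ordered map result looks like.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of a rank-range map read; this is NOT the Aerospike client API.
class RankRangeSketch {

    // Returns up to `count` entries starting at value rank `rank`
    // (rank 0 = smallest value), collected into a key-sorted map.
    static Map<String, Integer> getByRankRange(Map<String, Integer> bin, int rank, int count) {
        if (rank < 0 || count < 0) {
            throw new IllegalArgumentException("this sketch only models non-negative rank and count");
        }

        // Order the entries by value to establish their ranks.
        List<Map.Entry<String, Integer>> byValue = new ArrayList<>(bin.entrySet());
        byValue.sort(Map.Entry.comparingByValue());

        // Clamp the window to the size of the map.
        int from = Math.min(rank, byValue.size());
        int to = Math.min(from + count, byValue.size());

        // TreeMap keeps the selected entries sorted by key.
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, Integer> e : byValue.subList(from, to)) {
            result.put(e.getKey(), e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> m = new LinkedHashMap<>();
        m.put("a", 1);
        m.put("b", 2);
        m.put("c", 3);
        m.put("d", 4);
        m.put("e", 5);
        System.out.println(getByRankRange(m, 3, 2)); // prints {d=4, e=5}
    }
}
```

An unordered result would carry the same two entries without any guarantee about key order.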

A continued focus on secondary indexes and queries

Last year we launched Aerospike Database 6, which was centered on a major rewrite of secondary indexes and queries. Server 6.3 maintains this momentum, adding the capability to deploy secondary indexes durably on Intel Optane™ Persistent Memory (PMem).

The low cost, high performance, and memory density of PMem (compared to RAM) make it possible for Aerospike users to create multiple secondary indexes on very large datasets. This feature completes the range of uses for PMem – storing data on PMem, storing the primary index on PMem, and now creating secondary indexes on PMem. While Intel plans to end PMem support in favor of Compute Express Link (CXL)-attached memory devices, Aerospike plans to support these features until PMem hardware reaches its end of life.

Server 6.3 also begins to address a potential performance degradation noticed by users of short queries (queries that consistently return a small number of records) using an equality index filter. The inline-short-queries configuration parameter runs short queries directly in the service threads to buy back much of the lost performance. A complete solution will appear in a subsequent server release.

Breaking Changes

Starting with server 6.3, four new feature keys enable the following server features:

Enterprise customers of Aerospike have been reissued new feature-key files, which are available from their support portal. The new feature-key files are required when upgrading to server 6.3, but also work with previous Aerospike Database 6 releases.

Otherwise,

  • As a result of adding secondary indexes in PMem, the metric memory_used has been renamed used_bytes.

  • New stop-writes configuration parameters max-used-pct and stop-writes-sys-memory-pct might negatively affect your deployment. Make sure to read the section above.

  • If you are shipping an audit trail to syslog, make sure to adjust your configuration (see above).

  • The jobs: info command has been removed after being deprecated for 18 months (since server 5.7). Use the query-show info command.

  • The namespace metric truncated_records has been removed (see above).

  • Server 6.3 removes support for Debian 10 (ARM64) and Ubuntu 18.04.

For more details, read the server 6.3 release notes.