Secondary index

For the complete documentation index see: llms.txt

All documentation pages available in markdown.

This page describes secondary indexes in the Aerospike Database, including their capabilities and the benefits of using them.

Overview

A secondary index is a data structure that locates all the records in a namespace, or a set within it, based on a bin value in the record or on a computed value with expression indexes. When a value is updated in the indexed record, the secondary index automatically updates.

Querying by partition returns correct results when the cluster is stable, and during data migration after a cluster size change. Anytime a partition is unavailable or dead, queries will fail against that partition.

You need compatible clients, such as Java client 6.0.0, to support rebalance-tolerant queries.

Applications that benefit from secondary indexes include rich, interactive business applications and user-facing analytic applications. Secondary indexes also enable Business Intelligence tools and ad-hoc queries on operational datasets.

Bin-based secondary indexes

Aerospike’s bin-based indexing applies secondary indexes on a bin value. The index contains pointers to both master records and replicated records in the node. For List and Map type bins, a context can be added to identify a value of that type to index.

Bin-based secondary indexes:

Are stored by default in memory for fast lookup. It’s also possible to store secondary indexes other ways by setting the sindex-type configuration parameter. See Secondary index storage for more information.
Are built on every node in the cluster and are co-located with the primary index. Each secondary index entry contains only references to records local to the node.
Are specified bin-by-bin such as with Relational Database Management System (RDBMS) columns, for efficient updates and a minimal amount of resources required to store indexes.

Expression indexes

Aerospike’s expression indexes, introduced in Database 8.1.0, apply secondary indexes on computed values rather than raw data. This innovation addresses critical challenges faced by enterprises working with billions of records where traditional indexing approaches create significant memory overhead and performance bottlenecks. The index contains pointers to both master records and replicated records in the node.

Expression indexes:

Are stored by default in memory for fast lookup. It’s also possible to store secondary indexes other ways by setting the sindex-type configuration parameter. See Secondary index storage for more information.
Are built on every node in the cluster and are co-located with the primary index. Each secondary index entry contains only references to records local to the node.
Creation of secondary indexes on computed values (integers, strings, blobs, lists, or maps) derived from expressions, rather than directly on raw bin data.
Selectively indexing only records that meet specific business criteria.
Performing server-side computation atomically during indexing.
Optimized querying using expressions.

Why use a secondary index?

Scanning large amounts of data can take large amounts of time, and can negatively affect performance when PI queries scan every document or entry in a table.

Aerospike secondary indexes enable faster response times depending on selectivity of the secondary index. In many cases, a secondary index provides efficient access to a wider range of data through fields other than the primary key.

Use Aerospike tools or the API to dynamically create and remove indexes based on bins and data types. Index entries are type checked. For instance, a bin can store a user age as a string by one application, and as an integer by another. An integer index excludes records stored as a string, while a string index excludes records stored as integers.

Secondary index metadata

Aerospike tracks which indexes are created in a globally-maintained data structure: the System Metadata (SMD) system. The SMD module resides in the middle of multiple secondary index modules on multiple nodes. Changes made to the secondary index are always triggered from the SMD.

The SMD workflow is as follows:

An info request sent from a client to any node forwards an SMD action, and then to the SMD principal node.
From the SMD principal node, it forwards the action to every node, where the SMD module notifies the sindex module.
All nodes apply the action, that is, it triggers the building of a new sindex.

Secondary index creation

To create a secondary index, specify the namespace and, optionally, the set. If no set is specified, the index covers all records in the namespace. For a bin index, add the bin location, and optionally the context of a map or list bin for the value to be indexed. For an expression index, add the base64-encoded expression to use.

Index metadata is defined by the key data type (AKA ktype or bin type) of the value - integer, string, blob or geospatial. The other part of index metadata is the index type (AKA itype) - none for a single ktype value (e.g. a blob), list for a list of ktype values (e.g. list of integers), map-keys for a list of map key names of the specified ktype (e.g. string map keys) or map-values for a list of map values of the specified ktype (e.g. list of integer map values).

See the sindex-create reference for examples of using this info command to directly create a secondary index. You should dynamically create and delete secondary indexes with the Aerospike admin tool (asadm).

On confirmation by the SMD, each node creates the secondary index in write-only (WO) mode, and starts a background scan to scan all data and insert entries in the secondary index.

Figure 1: Secondary indexes triggered from SMD

Secondary index entries are only created for records matching all of the index specifications.
The scan populates the secondary index and interacts with read/write transactions exactly as a normal scan, except that there is no network component to the index creation. During index creation, all new writes affecting the indexed bins updates the corresponding secondary indexes.
Each node independently marks every secondary index as readable. When building the secondary index finishes on a node, it is marked as read active on that node.
If a node with data joins the cluster, but is missing index definitions in its SMD file, indexes are created and populated based on the latest SMD information. During index population, queries are not allowed to ensure that data on the incoming node is clean before it is available.
Optionally, you can provide a context within the bin to index the fields of a map (document) data type or list data type.

Priority of secondary index creation

The index creation scan only reads records already committed by transactions, that is, no dirty reads are allowed. This means that scans can execute at full speed, provided there are no record updates to block reads. To ensure that the index creation scan does not adversely affect the latencies of ongoing read and write transactions, the default settings suffice because they balance long running tasks such as data rebalancing and backup, against low-latency read/write transactions. If necessary, you can control resource utilization for the index creation scan.

Modify sindex-builder-threads to configure the number of threads for building secondary indexes. The maximum value is 32.

Writing data with secondary indexes

On data writes, the current indexes are checked. For all records affected by declared indexes, a secondary index update-insert-delete operation is performed. Aerospike is a flex schema system: If no value exists in the specified bin, or if the bin value does not match the declared data type and index type, or if an expression fails to compute a value of the declared data type, then the corresponding secondary action is not performed. All changes to the secondary index are performed atomically with the record changes under single-lock synchronization. Updating a secondary index has minimal impact on write latency unless the write is a replace. Also replica writes are more costly, especially with compression or encryption.

Garbage collection

For non-durable deletes triggered by a client delete, expiry, eviction, truncation, or migration operations that drop partitions, records are not read from disk when deleting the entry from the secondary index. This avoids unnecessary burden on the I/O subsystem. The remaining entries in the secondary index are deleted by a background thread, which wakes up at regular intervals and performs cleanup. For namespaces with data in memory or primary index in flash, garbage collection is currently only used for dropped partitions.

Distributed SI queries

Every cluster node receives the query to retrieve results from the secondary index. When the query executes:

Requests “scatter” to all nodes.
Records are read in parallel from secondary indexes and data storage.
Results are aggregated on each node.
Client “gathers” results from all nodes and returns them to the application.

An SI query can evaluate a long list of primary key records. This is why Aerospike performs SI queries in small batches.

Batching also occurs on client responses, so if a space threshold is reached, the response is immediately flushed to the network, much like return values in an Aerospike batch request. This keeps space usage of an individual SI query to a constant size, regardless of selectivity.

SI queries progress on a moving target. Data on the cluster might be undergoing modification by clients while the query is progressing. Although the SI query process ensures that results sync with actual data every time an SI query executes and the record is scanned, if records are undergoing change, the following scenarios may happen and should be taken into consideration when designing the data model.

In strong consistency mode, SI query may return records that have not yet completed replication.
When doing an integer range query, if value in the indexed bin happens to increase within the range, SI query may return the record twice.
When doing an integer range query, if value in the indexed bin happens to decrease within the range, SI query may miss returning the record.

SI query execution during migrations

Getting accurate SI query results is complicated during data migrations. When a cluster node is added or removed, it invokes the Data Migration Module to transition data to and from nodes as appropriate for the new configuration. During the migration operation, a partition may be available in different versions on many nodes.

When a node is added or removed from a Database 6.0.0 cluster, the client works with the server to ensure that records aren’t repeated or missed due to a partition migration. The client tracks the cluster state, and will reschedule accessing partitions that are migrating, returning to query them when they’ve fully migrated.

Aggregation

SI query records can feed into the aggregation framework to perform filtering, aggregation, and so on. Each node sends the SI query result to the User-Defined Function (UDF) sub-system to start results processing as a stream of records. Stream UDFs are invoked and the sequence of operations defined by the user are applied to the SI query results. Results from each node are collected by the client application, which can then perform additional operations on the data.

Performance techniques

To ensure aggregation does not affect overall database performance, Aerospike uses these techniques:

Global queues manage records fed through the different processing stages, and thread pools effectively utilize CPU parallelism.
The SI query state is shared across the entire thread pool so that the system can manage the Stream UDF pipeline.

Except for the initial data fetch portion, every stage in aggregation is a CPU-bound operation. It is important that processes finish quickly and optimally. To facilitate this, Aerospike batches records and caches UDF states to minimize system overhead.

For namespace operations where data is stored in-memory and no storage fetch is required, Aerospike implements stream processing in a single thread context. Even with this optimization, the system can parallelize operations across data partitions because Aerospike natively divides data as a fixed number of partitions.

Secondary index storage

The storage of a secondary index is determined by the sindex-type configuration parameter. The following options are available:

Type	Description
`shmem`	Linux shared memory.
`flash`	A block storage device, typically NVMe SSD.
`pmem`	Persistent memory, such as Intel Optane DC Persistent Memory.

Starting with Database 6.1.0 Enterprise Edition (EE) and Standard Edition (SE), secondary indexes are stored by default in shared memory (shmem). All secondary index storage types take advantage of Aerospike’s fast restart capability.

For more information about secondary index storage methods, see Configure the secondary index.