Skip to content

Secondary index queries

Queries matching a (relatively) small subset of records within a large Aerospike namespace or set can be accelerated by optionally indexing often-queried values with a secondary index. Such secondary index (SI) queries run faster than equivalent primary index (PI) queries. This is a trade-off between (typically in-memory) index storage and performance.

Secondary indexes can index the value of a specified record bin or the computed value of an expression applied to the record.

PI and SI queries are served by the same subsystem and share the same policies and features. The SI query is the same as a PI query with the additional ability to use an index filter - a simple predicate leveraging the secondary index. The SI query can then apply a filter expression to the records first matched by the index filter predicate.

Types of queries

There are four types of SI queries:

  • Basic - read-only, returns records to the client.
  • Aggregation - read-only, returns aggregated results to the client. Aggregations use stream UDFs with map/reduce functions to perform aggregation.
  • Background UDF - write-only, returns nothing to client. Uses a Record UDF to modify records.
  • Background ops - write-only, returns nothing to client.

All types use an index filter to define a predicate used directly against a secondary index:

  • Equals comparison against indexed string, blob or integer values.
  • Range comparisons against indexed integer values. Range query results are inclusive, that is both low end and high end boundary values are included in the results.
  • Point-In-Region or Region-Contain-Point comparisons against geospatial indexes.

A filter expression can be added to apply further predicates to records matched by the index filter. With the filter expressions, you can:

  • Filter out records based on record metadata, such as last update time or storage size.
  • Filter out records based on bin data, such as integer size thresholds or regular expressions on string bins.
  • Note: aggregation queries do not support filter expressions. Stream UDFs have their own filter function mechanism.

You can fine-tune the Aerospike query system for your use case based on:

  • Query selectivity - the average number of records returned. When multiple secondary indexes are available, the application should query the most selective index (meaning the index with the lowest entries_per_bval). The purpose of the secondary index is to reduce, as much as possible, the number of records read from namespace data storage. A filter expression can be applied to these records, in order to further exclude records from being streamed back to the client waiting for the query results.
  • Throughput and latency.
  • Query load vs. read/write load — the frequency that the secondary index updates.
  • SSD parallelism — the IOPS required to support query throughput.

Key data type

One part of a secondary index definition is the expected data type of the value(s) being indexed. This is termed the key type, or ktype. The data type of the values to Secondary indexes key on bin values (RDBMS columns), similar to a WHERE clause in SQL. Secondary index keys can only be created on the following primitive data types:

In Aerospike, distinct secondary indexes are created on a combination of namespace, set name, bin name, key data type, and index type. You can also create a secondary index on a CDT context of a map (document) or list collection data type (CDT) bin to identify the path to the value that should be indexed.

Index type

Aerospike supports four index types (AKA itype):

  • Default where a single value of the given ktype is expected.
  • List where a list of values of the given ktype is expected.
  • Map keys where the ktype-matching keys of a map CDT are indexed.
  • Map values where the ktype-matching values of a map CDT are indexed.

Use cases

Common uses for secondary index queries include:

  • Modeling and querying many-to-one-relationships - find all records containing a certain value or range of values.
  • Geospatial queries - finding all the points within a specified geoJSON polygon; querying for all polygons that contain a given point.

Caveats

Secondary indexes are built independently on each cluster node, a structure for each data partition, which means that a secondary index query hits all the nodes. For better performance, secondary indexes should not be used when key-based commands are an alternative.

For example, secondary indexes should not be built on a “primary key”. The record already has an entry in the primary index for direct, low latency access with read, upsert and delete commands.

Similarly, secondary indexes are not the most efficient way to implement a unique index query (where the entries_per_bval index statistic is equal to 1). Instead, a lookup table provides significantly better performance and throughput. A distinct set can be used to insert all the unique identifiers as record keys with a digest stored in a record bin. A single read operation against the lookup table retrieves the digest of the actual record, which is then used to access that record directly in a subsequent read command. This two-phase lookup pattern is one or two orders of magnitude faster than a secondary index query to find zero or one record.

Limitations

  • Aerospike supports up to 256 secondary indexes per namespace.
  • String and blob/bytes data types are limited to 2KiB - larger values are not indexed.
  • Migration-tolerant queries were introduced in Aerospike Database 6 and require fully compatible clients.

Use an SI query in your application

To use a secondary index query in your application:

  1. Create secondary indexes on a bin value (in the case of List or Map bins it can be a nested CDT context) or computed value of an expression.
  2. Insert/update records. These write operations will trigger synchronous indexing.
  3. Construct a query with a secondary index filter predicate, optionally adding more predicates with a filter expression. Execute the SI query and process the records streaming back to the client.

Create an index

You create and manage secondary indexes with the manage sindex commands of the Aerospike admin tool (asadm). Alternatively, use the sindex-create and other sindex-* info commands. Indexes can also be created programmatically with the createIndex and dropIndex equivalent client API calls in each language client. This requires the sindex-admin privilege if security is turned on.

When you create a secondary index on a bin value:

  • If no set-name is specified the sindex includes records across the entire namespace.
  • The disallow-null-setname parameter controls whether null set names are allowed. By default, the value is false, meaning that null set names are accepted.
Create a bin index
Admin+> manage sindex create numeric dnr-age-idx ns test set donor bin age
Use 'show sindex' to confirm 'dnr-age-idx' was created successfully.

Indexing a value embedded inside a list or map requires a CDT context describing the path to the element.

Create a bin index with a CDT context
Admin+> manage sindex create numeric dnr-zip-idx ns test set donor bin address ctx map_key(zip)
Use 'show sindex' to confirm 'dnr-zip-idx' was created successfully.

Alternatively, this same index can be created with the base64-encoded CDT context

Terminal window
Admin+> manage sindex create numeric dnr-zip-idx ns test set donor bin address ctx_base64 kiKkA3ppcA==

To create an expression index you don’t specify a bin name or context; instead you provide a base64-encoded expression. The computed value must adhere to the specified ktype and itype. To skip indexing the record send back an unknown() from the expression.

Assemble a base64-encoded expression
Expression campaignTotal = Exp.build(
Exp.add(Exp.intBin("campaign1"), Exp.intBin("campaign2"), Exp.intBin("campaign3"));
);
System.out.println(campaignTotal.getBase64());
// lBSTUQKpY2FtcGFpZ24xk1ECqWNhbXBhaWduMpNRAqljYW1wYWlnbjM=

The base64-encoded expression is used to create the expression index

Create an expression index
Admin+> manage sindex create numeric dnr-sum-idx ns test set donor exp_base64 lBSTUQKpY2FtcGFpZ24xk1ECqWNhbXBhaWduMpNRAqljYW1wYWlnbjM=
Use 'show sindex' to confirm 'dnr-sum-idx' was created successfully.

After the secondary index creation command reaches one node in the cluster, the cluster propagates the request to all cluster nodes. Each node begin building the secondary index on its local data.

In asadm, show sindex describes all the existing secondary indexes assembled on the namespace and its sets.

Terminal window
Admin+> show sindex
~~~~~~Secondary Indexes (2025-06-11 07:11:35 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Index Name|Namespace| Set| Bin| Bin| Index|State| Context| Expression
| | | | Type| Type| | |
dnr-age-idx|test |donor| age|NUMERIC|NONE |RW |-- |
dnr-zip-idx|test |donor|address|NUMERIC|NONE |RW |[map_key(<string#3)]|
dnr-sum-idx|test |donor| null|NUMERIC|NONE |RW |-- |add(bin_int("campaign1"), bin_int("campaign2"), bin_int("campaign3"))
Number of rows: 3

Insert data

Insert data programmatically using one of the Aerospike client libraries or a data browser tool like aql.

Example record in the set 'donor'
{
name: "Austin Albertson",
age: 57,
campaign1: 150,
campaign2: 100,
campaign3: 75,
address: {
state: "CA",
city: "Los Altos",
zip: 94023,
}
}

Query a bin index

The bin index dnr-age-idx allows us to query for an integer age range

index filter: age >= 18

filter expression: campaign1 > 200

stmt.setNamespace("test");
stmt.setSetName("donor");
// Create an index filter
stmt.setFilter(Filter.range("age", 18, Integer.MAX_VALUE));
QueryPolicy queryPolicy = new QueryPolicy();
queryPolicy.filterExp = Exp.build(
Exp.gt(Exp.intBin("campaign1"), Exp.val(200))
);
// Execute the query
RecordSet recordSet = client.query(queryPolicy, stmt);

In this case, we have a secondary index to use for the age >= 18 predicate, but no index built on the integer value of the campaign bin. The index filter predicate finds records matching our age criteria, and then the filter expression is applied to each of them.

Query a bin index using CDT context

The bin index dnr-zip-idx indexes the integer value of an embedded field zip of map bin address.

index filter: address.zipcode = 94023

filter expression: age >= 18

stmt.setNamespace("test");
stmt.setSetName("donor");
// Create an index filter
stmt.setFilter(Filter.equal("address", 94023, CTX.mapKey("zip")));
// Add a filter expression
QueryPolicy queryPolicy = new QueryPolicy();
queryPolicy.filterExp = Exp.build(
Exp.ge(Exp.intBin("age"), Exp.val(18))
);
// Execute the query
RecordSet recordSet = client.query(queryPolicy, stmt);

We now have two potential secondary indexes to use. Using entries_per_bval in either asadm’s info sindex or the sindex-stat info command, we see that dnr-zip-idx has fewer records on average per indexed value, so we choose it for the index filter. Then the age condition is applied through a filter expression onto the records matched by the secondary index. If we were to reverse the index used, the server would have read (on average) 25x more records from storage.

Terminal window
Admin+> info sindex
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Secondary Index Information (2025-08-01 21:23:45 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Index Name|Namespace| Set| Node| Bin| Bin| State|~~~~~~~~Entries~~~~~~~~|~~~~Storage~~~~| Context| Expression
| | | | | Type| | Total|Avg Per| Avg Per| Type| Used| |
| | | | | | | | Rec| Bin Val| | | |
dnr-sum-idx |test |donor|1.0.0.127:3100|null |numeric|Read-Write|1.000 M|1.000 |1.000 |shmem|16.000 MB|-- |add(bin_int("campaign1"), bin_int("campaign2"), bin_int("campaign3"))
dnr-age-idx |test |donor|1.0.0.127:3100|age |numeric|Read-Write|1.000 M|1.000 |25.000 K|shmem|16.000 MB|-- |--
dnr-zip-idx |test |donor|1.0.0.127:3100|address|numeric|Read-Write|1.000 M|1.000 |1.000 K|shmem|16.000 MB|[map_key(<string#3>)]|--
|test |donor| | | | |3.000 M| | | |48.000 MB| |
Number of rows: 3
Admin+> asinfo -v 'sindex-stat:namespace=test;indexname=dnr-zip-idx'
1.0.0.127:3100 (127.0.0.1) returned:
entries=1000000
used_bytes=16777216
entries_per_bval=1000
entries_per_rec=1
Admin+> asinfo -v 'sindex-stat:namespace=test;indexname=dnr-age-idx'
1.0.0.127:3100 (127.0.0.1) returned:
entries=1000000
used_bytes=16777216
entries_per_bval=25000
entries_per_rec=1

Query an expression index by index name

The expression index dnr-sum-idx can be queried using its index name.

index filter: dnr-sum-idx BETWEEN 200 AND 2000

filter expression: age >= 18

Statement stmt = new Statement();
stmt.setNamespace("test");
stmt.setSetName("donor");
// This query uses the expression index dnr-sum-idx
stmt.setFilter(Filter.byIndexRange("dnr-sum-idx", 200, 2000));
// Add a filter expression
QueryPolicy queryPolicy = new QueryPolicy();
queryPolicy.filterExp = Exp.build(
Exp.ge(Exp.intBin("age"), Exp.val(18))
);
// Execute the query
RecordSet recordSet = client.query(queryPolicy, stmt);

Query an expression index using the expression

The expression index dnr-sum-idx can be queried using a matching expression.

index filter: add(bin_int(“campaign1”), bin_int(“campaign2”), bin_int(“campaign3”)) BETWEEN 200 AND 2000

filter expression: age >= 18

Statement stmt = new Statement();
stmt.setNamespace("test");
stmt.setSetName("donor");
// Query with a matching expression to that of the expression index
Expression exp = Exp.build(Exp.add(Exp.intBin("campaign1"),
Exp.intBin("campaign2"), Exp.intBin("campaign3")));
stmt.setFilter(Filter.range(exp, 200, 2000));
// Add a filter expression
QueryPolicy queryPolicy = new QueryPolicy();
queryPolicy.filterExp = Exp.build(
Exp.ge(Exp.intBin("age"), Exp.val(18))
);
// Execute the query
RecordSet recordSet = client.query(queryPolicy, stmt);

Note that you can get the expression used for secondary index through asadm’s show sindex command, and get either the human-readable expression or its base64-encoded format using the sindex-list info command.

Background queries

A client application can issue an asynchronous background query to modify records in-place on the server. This is similar to an UPDATE statement in a relational database.

Background queries apply either multiple native bin operations or a user-defined function (UDF) written in Lua to the records matched by the query. Using bin operations (AKA background ops) is often more efficient and higher performing than using a Lua UDF (AKA background UDF).

Background secondary index queries have the following features:

  • Filter records by set name.
  • Filter records with an index filter (required).
  • Filter records with filter expressions such as last_update() > X.
  • Clients can poll for progress and completion of a background query.

Background queries are not migration-tolerant and might miss records when a partition is migrating during data rebalancing.

Feedback

Was this page helpful?

What type of feedback are you giving?

What would you like us to know?

+Capture screenshot

Can we reach out to you?