Secondary index queries

Overview

Secondary Index (SI) queries use secondary indexes to speed up queries, compared to primary index (PI) queries. While this is a trade-off between memory and performance, an SI query typically performs much better than a PI query on a large set. When you have two indexes on the same data (set or namespace) the higher cardinality index gives better results.

Prior to Aerospike Database 6.0, primary index (PI) queries were called scans and secondary index (SI) queries were called queries. See the Queries feature guide.

As of Database 6.0, PI and SI queries are served by the same subsystem and share the same policies and features. The SI query is the same as PI query, with an additional index filter - a simple predicate targeted at the secondary index. The SI query can then apply a filter expression to the records first matched by the index filter predicate.

You can make the following index filter queries in an SI query:

Equals comparison against string or numeric indexes.
Range comparisons against numeric indexes. Range result sets are inclusive, that is, both specified values are included in the results.
Point-In-Region or Region-Contain-Point comparisons against geo indexes.

There are four types of SI queries:

Basic - read-only, returns records to the client.
Aggregation - read-only, returns aggregation results to the client.
Background UDF - write-only, returns nothing to client.
Background ops - write-only, returns nothing to client.

All types except aggregations can optionally use Aerospike Filter Expressions. Using Aerospike filter expressions (Database 5.2 and later), you can:

Filter out records based on record metadata, such as last-update-time or storage-size.
Filter out records based on bin data, such as integer size thresholds or regular expressions on string bins.

Two types of queries use Aerospike UDFs:

Aggregations use stream UDFs with map/reduce functions to perform aggregation.
Background (write-only) queries use Record UDFs to touch records.

You can fine-tune the Aerospike query system for your use case based on:

Query selectivity - the average number of records returned.
Throughput and latency.
Query load vs. read/write load—the frequency that the secondary index updates.
SSD parallelism—the IOPS required to support query throughput.

See the Operations Guide for details.

Secondary index

Key data type

Secondary indexes key on bin values (RDBMS columns), similar to a WHERE clause in SQL. Secondary index keys can only be created on the following primitive data types:

In Aerospike, unique secondary indexes are created on a combination of namespace, set name, bin name, key data type, and key source.

You can also create a secondary index on a context of a Map (document) or List type bin to identify a value of that type to index.

Key source type

Aerospike supports four index key sources:

Basic where key is the bin value.
List where key is elements in the list.
Mapkeys where key is keys in the map.
Mapvalues where key is value in the map.

Use cases

Common SI queries include:

Top scoreboards, top performers, top earners, and so on.
The most-recent search/query/purchase/activity information from the website front end.
Data sets, for example, people with the same employer or who belong to my club.
Multiple IDs such as Facebook name, Twitter handle, and so on.
Points of interest near my location.

Limitations

Aerospike supports up to 256 secondary indexes per namespace.
For string data-type, only string size <= 2048 can be indexed.
Only Aerospike Database 6.0-compatible clients, such as Java client 6.0.0 and later, support rebalance-tolerant queries.

Use an SI query in your application

To use an SI query in an application:

Create secondary indexes on a bin, or on a context of your Map (document) bin or List bin.
Insert records within an indexed bin.
Construct a predicate (a WHERE clause), make the SI query request, and process returned records.

Create an index

Use asadm to create and manage secondary indexes in an Aerospike cluster. asadm provides a manage sindex command to manage secondary indexes.

The following example asadm command creates an integer index on namespace user_profile, set-name west, and bin-name last_activity. Starting with Database 6.1, if no set-name is specified during index creation, the secondary index includes records across the entire namespace. In previous versions, omitting the set name created an index that only included records without any set name, also called the null set name. The disallow-null-setname parameter controls whether null set names are allowed. By default, the value is false, meaning that null set names are accepted.

Run help manage sindex to see all optional parameters.

Admin+> manage sindex create numeric ix1 ns user_profile set west bin last_activity
Successfully created sindex ix1.

After the secondary index creation request reaches one node in the cluster, cluster propagates the request to all cluster nodes and begins background secondary index creation on all nodes. Each cluster node can only manage secondary indexes for resident data.

Create a secondary index on a nested context (Introduced: 6.1.0)

Database 6.1 added the ability to create a secondary index on a nested element of a Map or List data type bin. See sindex-create.

The example for show sindex described in Secondary indexes describes a bin report that has a Map type, within which there’s a map key shape holding a list of strings:

Collectinfo-analyzer> show sindex
~~~~~~~~~~~~~~~~Secondary Indexes (2023-01-11 00:58:31 UTC)~~~~~~~~~~~~~~~~
    Index|Namespace|    Set|   Bin|   Bin|Index|State|              Context
     Name|         |       |      |  Type| Type|     |
shape-idx|sandbox  |ufodata|report|string|list |RW   |[map_key(<string#5>)]
Number of rows: 1:

The context base64 encoded into a string corresponds to the following in Python:

from aerospike_helpers import cdt_ctx
ctx = [cdt_ctx.cdt_ctx_map_key("shape")]

Checking secondary index status

To check the status of a secondary index:

Admin+> info sindex
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Secondary Index Information (2021-09-07 22:19:17 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Index|   Namespace| Set|            Node|         Bins|    Bin|State|Keys|  Entries|  Memory|~~~~Queries~~~~~|~~~~~Updates~~~~~
 Name|            |    |                |             |   Type|     |    |         |    Used|Requests|Avg Num|   Writes|Deletes
     |            |    |                |             |       |     |    |         |        |        |   Recs|         |
ix1  |user_profile|west|aerospike-a:3000|last_activity|NUMERIC|WO   |   5|100.000 K|3.150 MB| 0.000  |0.000  |100.000 K|0.000
ix1  |user_profile|west|aerospike-b:3000|last_activity|NUMERIC|WO   |   4| 80.000 K|2.800 MB| 0.000  |0.000  | 80.000 K|0.000
     |            |    |                |             |       |     |   9|180.000 K|5.950 KB| 0.000  |0.000  |180.000 K|0.000
Number of rows: 2

In this example, the command output indicates:

There are two nodes (two tables display) in the cluster.
Both nodes show that secondary index creation is in the write-only state (WO).

Inserting data

This example uses the aql tool to insert records in preparation for an SI query.

aql> INSERT INTO user_profile.west (PK,location,last_activity) VALUES ('cookie_100','MA',342)
OK, 1 record affected.
aql> INSERT INTO user_profile.west (PK,location,last_activity) VALUES ('cookie_101','AZ',345)
OK, 1 record affected.
aql> INSERT INTO user_profile.west (PK,location,last_activity) VALUES ('cookie_102','CA',345)
OK, 1 record affected.
aql> INSERT INTO user_profile.west (PK,location,last_activity) VALUES ('cookie_103','AL',340)
OK, 1 record affected.
aql> INSERT INTO user_profile.west (PK,location,last_activity) VALUES ('cookie_104','TX',347)
OK, 1 record affected.
aql> INSERT INTO user_profile.west (PK,location,last_activity) VALUES ('cookie_105','MA',323)
OK, 1 record affected.

See the Java Client - Synchronous database write example for the Java equivalent.

aql is not intended for use by applications.

Range SI query

To run an SI query for a range of last_activity:

aql> SELECT * FROM user_profile.west WHERE last_activity BETWEEN 340 AND 345
+----------+---------------+
| location | last_activity |
+----------+---------------+
| "AL"     | 340           |
| "MA"     | 342           |
| "CA"     | 345           |
| "AZ"     | 345           |
+----------+---------------+
4 rows in set (0.001 secs)

Range SI query result sets are inclusive; both specified values are included in the results (340 and 345 in the above example).

If a cluster node goes down, in-progress SI queries return an error and partial results.

Other tools

Other ways of creating secondary indexes are:

the aql tool uses an SQL-like interface to send index management commands to the Aerospike cluster.
createIndex and dropIndex equivalent client API calls in each language binding.

Aerospike does not support SQL as a query or management language.

References

See these topics for language-specific examples: