Secondary index queries
Overview
Secondary Index (SI) queries use secondary indexes to speed up queries, compared to primary index (PI) queries. While this is a trade-off between memory and performance, an SI query typically performs much better than a PI query on a large set. When you have two indexes on the same data (set or namespace) the higher cardinality index gives better results.
Prior to Aerospike Database 6.0, primary index (PI) queries were called scans and secondary index (SI) queries were called queries. See the Queries feature guide.
As of Database 6.0, PI and SI queries are served by the same subsystem and share the same policies and features. The SI query is the same as PI query, with an additional index filter - a simple predicate targeted at the secondary index. The SI query can then apply a filter expression to the records first matched by the index filter predicate.
An index name is optional when performing read SI queries.
You can make the following index filter queries in an SI query:
- Equals comparison against string or numeric indexes.
- Range comparisons against numeric indexes. Range result sets are inclusive, that is, both specified values are included in the results.
- Point-In-Region or Region-Contain-Point comparisons against geo indexes.
There are four types of SI queries:
- Basic - read-only, returns records to the client.
- Aggregation - read-only, returns aggregation results to the client.
- Background UDF - write-only, returns nothing to client.
- Background ops - write-only, returns nothing to client.
All types except aggregations can optionally use Aerospike Filter Expressions. Using Aerospike filter expressions (Database 5.2 and later), you can:
- Filter out records based on record metadata, such as last-update-time or storage-size.
- Filter out records based on bin data, such as integer size thresholds or regular expressions on string bins.
Two types of queries use Aerospike UDFs:
- Aggregations use stream UDFs with map/reduce functions to perform aggregation.
- Background (write-only) queries use Record UDFs to touch records.
You can fine-tune the Aerospike query system for your use case based on:
- Query selectivity - the average number of records returned.
- Throughput and latency.
- Query load vs. read/write load—the frequency that the secondary index updates.
- SSD parallelism—the IOPS required to support query throughput.
See the Operations Guide for details.
Secondary index
Key data type
Secondary indexes key on bin values (RDBMS columns), similar to a WHERE
clause in SQL. Secondary index keys can only be created on the following primitive data types:
In Aerospike, unique secondary indexes are created on a combination of namespace, set name, bin name, key data type, and key source.
You can also create a secondary index on a context
of a Map (document) or List type bin to identify a value of that type to index.
Key source type
Aerospike supports four index key sources:
- Basic where key is the bin value.
- List where key is elements in the list.
- Mapkeys where key is keys in the map.
- Mapvalues where key is value in the map.
You can use a wildcard when searching for a collection data type, but you can't use a wildcard to specify the key source for a secondary index.
Use cases
Common SI queries include:
- Top scoreboards, top performers, top earners, and so on.
- The most-recent search/query/purchase/activity information from the website front end.
- Data sets, for example, people with the same employer or who belong to my club.
- Multiple IDs such as Facebook name, Twitter handle, and so on.
- Points of interest near my location.
Limitations
- Aerospike supports up to 256 secondary indexes per namespace.
- For string data-type, only string size <= 2048 can be indexed.
- Only Aerospike Database 6.0-compatible clients, such as Java client 6.0.0 and later, support rebalance-tolerant queries.
Before Database 6.1, fast restart was not supported. On daemon restart, secondary indexes were rebuilt based on record data.
Use an SI query in your application
To use an SI query in an application:
- Create secondary indexes on a bin, or on a
context
of your Map (document) bin or List bin. - Insert records within an indexed bin.
- Construct a predicate (a
WHERE
clause), make the SI query request, and process returned records.
Create an index
Secondary Index Capacity Planning describes how to plan and schedule index creation on production systems. Background index creation can take a substantial amount of resources because an index consumes RAM for every index entry.
Use asadm to create and manage secondary indexes in an Aerospike cluster. asadm provides a manage sindex
command to manage secondary indexes.
The following example asadm command creates an integer index on namespace
user_profile, set-name
west, and bin-name
last_activity.
Starting with Database 6.1, if no set-name
is specified during index creation, the secondary index includes records across the entire namespace. In previous versions, omitting the set name created an index that only included records without any set name, also called the null set name. The disallow-null-setname parameter controls whether null set names are allowed. By default, the value is false
, meaning that null set names are accepted.
Run help manage sindex
to see all optional parameters.
Admin+> manage sindex create numeric ix1 ns user_profile set west bin last_activity
Successfully created sindex ix1.
After the secondary index creation request reaches one node in the cluster, cluster propagates the request to all cluster nodes and begins background secondary index creation on all nodes. Each cluster node can only manage secondary indexes for resident data.
Create a secondary index on a on a nested context (Introduced: 6.1.0)
Database 6.1 added the ability to create a secondary index on a nested element of a Map or List data type bin. See sindex-create
.
The example for show sindex
described in Secondary indexes describes a bin report that has a Map type, within which there’s a map key shape holding a list of strings:
Collectinfo-analyzer> show sindex
~~~~~~~~~~~~~~~~Secondary Indexes (2023-01-11 00:58:31 UTC)~~~~~~~~~~~~~~~~
Index|Namespace| Set| Bin| Bin|Index|State| Context
Name| | | | Type| Type| |
shape-idx|sandbox |ufodata|report|string|list |RW |[map_key(<string#5>)]
Number of rows: 1:
The context base64 encoded into a string corresponds to the following in Python:
from aerospike_helpers import cdt_ctx
ctx = [cdt_ctx.cdt_ctx_map_key("shape")]
Checking secondary index status
To check the status of a secondary index:
Admin+> info sindex
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Secondary Index Information (2021-09-07 22:19:17 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Index| Namespace| Set| Node| Bins| Bin|State|Keys| Entries| Memory|~~~~Queries~~~~~|~~~~~Updates~~~~~
Name| | | | | Type| | | | Used|Requests|Avg Num| Writes|Deletes
| | | | | | | | | | | Recs| |
ix1 |user_profile|west|aerospike-a:3000|last_activity|NUMERIC|WO | 5|100.000 K|3.150 MB| 0.000 |0.000 |100.000 K|0.000
ix1 |user_profile|west|aerospike-b:3000|last_activity|NUMERIC|WO | 4| 80.000 K|2.800 MB| 0.000 |0.000 | 80.000 K|0.000
| | | | | | | 9|180.000 K|5.950 KB| 0.000 |0.000 |180.000 K|0.000
Number of rows: 2
In this example, the command output indicates:
- There are two nodes (two tables display) in the cluster.
- Both nodes show that secondary index creation is in the write-only state (
WO
).
Nodes can only be queried when the index is in read-write state (RW
). If all nodes are not in the RW
state and a secondary index query is made against the cluster, an error with no data is returned to the caller.
Inserting data
This example uses the aql tool to insert records in preparation for an SI query.
aql> INSERT INTO user_profile.west (PK,location,last_activity) VALUES ('cookie_100','MA',342)
OK, 1 record affected.
aql> INSERT INTO user_profile.west (PK,location,last_activity) VALUES ('cookie_101','AZ',345)
OK, 1 record affected.
aql> INSERT INTO user_profile.west (PK,location,last_activity) VALUES ('cookie_102','CA',345)
OK, 1 record affected.
aql> INSERT INTO user_profile.west (PK,location,last_activity) VALUES ('cookie_103','AL',340)
OK, 1 record affected.
aql> INSERT INTO user_profile.west (PK,location,last_activity) VALUES ('cookie_104','TX',347)
OK, 1 record affected.
aql> INSERT INTO user_profile.west (PK,location,last_activity) VALUES ('cookie_105','MA',323)
OK, 1 record affected.
See the Java Client - Synchronous database write example for the Java equivalent.
aql is not intended for use by applications.
Range SI query
To run an SI query for a range of last_activity:
aql> SELECT * FROM user_profile.west WHERE last_activity BETWEEN 340 AND 345
+----------+---------------+
| location | last_activity |
+----------+---------------+
| "AL" | 340 |
| "MA" | 342 |
| "CA" | 345 |
| "AZ" | 345 |
+----------+---------------+
4 rows in set (0.001 secs)
Range SI query result sets are inclusive; both specified values are included in the results (340 and 345 in the above example).
If a cluster node goes down, in-progress SI queries return an error and partial results.
Other tools
Other ways of creating secondary indexes are:
- the aql tool uses an SQL-like interface to send index management commands to the Aerospike cluster.
createIndex
anddropIndex
equivalent client API calls in each language binding.
Aerospike does not support SQL as a query or management language.
References
See these topics for language-specific examples: