Primary Index (PI) Queries

Prior to Server 6.0, primary index (PI) queries were called scans and secondary index (SI) queries were called queries. See the Queries feature guide.

Common uses of PI queries

You can use PI queries to:

  • Retrieve all or a specified count of records in a namespace or set. This is called a read-only PI query.
  • Filter for records that have been updated since a specific Last Update Time (LUT).
  • Do regular database maintenance by querying all records in a set or namespace and selectively updating records via a User-Defined Function (UDF) or an array of multi-ops. This is called a read-write PI query (or background query).

Applications can send query requests to all partitions in the cluster, to specific partitions, or to specific digests within partitions.

Read-only PI queries

A client application executes a command to start a read-only query. This command initiates parallel requests to each node in the cluster. As the query iterates through each partition, it returns the current version of each record to the client.

Many database tasks, such as index creation and backups, also use scans of the primary index as their underlying mechanism.

Read-only PI queries have the following features:

  • Filter records by set name.
  • Filter records by Filter Expressions such as last-update-time > X.
  • Filter by count. The servers return the specified number of records. This can be used for pagination.
  • Return only record digests and metadata (generation and TTL).
  • Return specified bins.
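
The features listed above can be modeled with a short, self-contained sketch. This is not the Aerospike client API: the `Record` class, the `pi_query` function, and all parameter names are hypothetical, and partitions are represented as plain in-memory lists. It only illustrates how a set filter, a last-update-time filter, a record count, and a bin selection combine while iterating partition by partition.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Optional


@dataclass
class Record:
    """Hypothetical in-memory stand-in for a stored record."""
    digest: str
    set_name: str
    last_update_time: int  # assumption: any comparable timestamp
    bins: Dict[str, object]


def pi_query(
    partitions: Iterable[List[Record]],
    set_name: Optional[str] = None,
    updated_after: Optional[int] = None,
    bin_names: Optional[List[str]] = None,
    max_records: Optional[int] = None,
) -> Iterator[Record]:
    """Iterate every partition and yield the records that pass the filters."""
    returned = 0
    for partition in partitions:
        for rec in partition:
            # Stop once the requested count has been returned.
            if max_records is not None and returned >= max_records:
                return
            # Filter by set name.
            if set_name is not None and rec.set_name != set_name:
                continue
            # Filter in the style of "last-update-time > X".
            if updated_after is not None and rec.last_update_time <= updated_after:
                continue
            # None means all bins; an empty list means digest and metadata only.
            if bin_names is None:
                bins = dict(rec.bins)
            else:
                bins = {k: v for k, v in rec.bins.items() if k in bin_names}
            yield Record(rec.digest, rec.set_name, rec.last_update_time, bins)
            returned += 1
```

A call such as `pi_query(partitions, set_name="users", updated_after=100, bin_names=["name"], max_records=1)` would return at most one record from the hypothetical `users` set, carrying only its `name` bin.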

Background read-write query

A client application can also issue an asynchronous background query to the database and apply either a Lua UDF or a series of write multi-ops to each record. This is more efficient than a client-side query when records need to be modified, because the work happens on the server. Multi-ops are typically more efficient than Lua UDFs because the server doesn't need to translate internal objects to another language. Many client libraries also provide an API to poll for the completion of a background query.

Like read-only PI queries, read-write PI queries are often used for database maintenance, and can rely on arbitrary rules for grooming your data. For example, you can use a UDF to compare the last_visited value of a record to a specified date and time. If the value is too old, which implies that the record has not been updated for a long time, the application can delete the record. The application can apply a combination of such rules to fine-tune the query.

The application can also use generic grooming functions and pass parameters when the query executes. This approach is powerful because cleanup processing is done as close to the data source as possible.
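
The grooming logic described above can be sketched as follows. Real UDFs are written in Lua and run on the server; this Python version only models the rule logic in memory. The names `stale_rule` and `groom`, the `tier` bin, and the choice to delete a record only when every rule matches are assumptions made for illustration.

```python
from datetime import datetime
from typing import Callable, Dict, List

Bins = Dict[str, object]
Rule = Callable[[Bins], bool]


def stale_rule(cutoff: datetime) -> Rule:
    """Build a parameterized rule: true when last_visited is older than cutoff."""
    def rule(bins: Bins) -> bool:
        last_visited = bins.get("last_visited")
        # Records without a usable last_visited value are never considered stale.
        return isinstance(last_visited, datetime) and last_visited < cutoff
    return rule


def groom(records: Dict[str, Bins], rules: List[Rule]) -> List[str]:
    """Delete every record for which all rules return True; return deleted keys."""
    doomed = [
        key for key, bins in records.items()
        if rules and all(rule(bins) for rule in rules)
    ]
    for key in doomed:
        del records[key]
    return doomed
```

Passing the cutoff as a parameter keeps the rule generic: the same function can serve different retention periods, and several rules can be combined in one pass.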

Read-write PI queries have the following features:

  • Filter records by set name.
  • Filter records by Filter Expressions such as last-update-time > X.
  • Read/Update using a User-Defined Function (UDF).
  • Update using write multi-ops; that is, operations on bins.

In the event of disruptions to the cluster, a background query might not process all of the records.

Quotas on PI queries

Rate quotas, introduced in Aerospike Enterprise Edition 5.6, can be used to limit the disk I/O performed by a specific user, including their query operations. See Rate Quotas for more information.

Client application examples

The Aerospike Developer Hub contains client library code examples for all supported client libraries.

Historical evolution of scan features

Server 6.0.0 Changes

  • The query and scan subsystems were unified. Scans are deprecated in the clients, and the Query API handles both types, primary index (PI) queries (AKA scans) and secondary index (SI) queries.

Server 5.6.0 Changes

  • Optional, configurable set indexes were added to speed up PI queries of sets that are small compared to their namespace.

Server 4.9.0 Changes

  • Read/Write multi-op support added to background read-write PI queries.
  • PI queries are now issued per partition instead of per node. This resolves the issue where a scan could return duplicate records or miss records during cluster state changes. Clients use the 'scan per partition' feature to make sure each partition is processed exactly once, even during cluster change events where partition ownership shifts between cluster nodes.
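
The per-partition approach can be illustrated with a simplified model. This is a sketch, not how any client library is implemented: `query_all_partitions` and `fetch_partition` are hypothetical names, a `ConnectionError` stands in for a partition whose owner changed mid-request, and the retry limit is arbitrary. The only fact taken from Aerospike is that a namespace has 4096 partitions.

```python
from typing import Callable, Dict, Iterable, List

N_PARTITIONS = 4096  # number of partitions in an Aerospike namespace


def query_all_partitions(
    fetch_partition: Callable[[int], List[str]],
    partition_ids: Iterable[int] = range(N_PARTITIONS),
    max_attempts: int = 3,
) -> Dict[int, List[str]]:
    """Fetch each partition exactly once, retrying only the ones that failed."""
    pending = list(partition_ids)
    results: Dict[int, List[str]] = {}
    for _ in range(max_attempts):
        retry = []
        for pid in pending:
            try:
                # A partition's records are accepted only when the whole
                # partition completes, so a failed attempt adds no duplicates.
                results[pid] = fetch_partition(pid)
            except ConnectionError:
                retry.append(pid)
        pending = retry
        if not pending:
            break
    if pending:
        raise RuntimeError(f"{len(pending)} partitions could not be queried")
    return results
```

Because progress is tracked per partition rather than per node, a change in partition ownership only causes the affected partitions to be requested again.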

Server 4.7.0 Changes

In server 4.7.0 and later, PI queries (scans) have the following characteristics:

  • PI queries do not have a priority.
  • Concurrent PI queries each use their own dedicated threads, so they will not interfere with each other.
  • PI queries can be independently throttled, via a specified requested records per second (rps) value.
  • single-scan-threads specifies the maximum number of threads allowed for a single scan.
  • scan-threads-limit specifies the maximum number of threads allowed for all concurrent PI queries. If this limit is reached, any new scan requests fail with an AS_ERR_FORBIDDEN error.
  • background-scan-max-rps specifies the maximum rps value allowed for a background scan.
  • PI queries begin with a single thread and add threads as needed (up to the configured limits) to achieve the specified rps. If a single thread would exceed the specified rps, throttling will occur within the thread to achieve the specified rps.
  • If no rps is specified, the scan will run as quickly as possible. That is, the scan will not be throttled, and will use as many threads as allowed by the configured limits.
  • A scan's requested rps cannot be changed dynamically.
  • If the client specifies an rps but doesn't consume responses from the server fast enough, the overall measured rate on the server can fall behind the target rps. In such cases, the server reduces or stops throttling, and/or adds threads to try to catch up with the target. Once the maximum number of allowed threads per scan is reached with no throttling, the client consumption governs the rps. If the client starts consuming again, superfluous threads are retained and the target rps can then be exceeded for the remainder of the scan.
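
The in-thread throttling described above can be approximated with a small pacing loop. This is a sketch under stated assumptions, not the server's algorithm: the `throttle` function and its parameters are hypothetical, it models a single thread only, and it does not add threads. Each record is scheduled at `start + n / rps`; when the loop is behind schedule it skips sleeping, which loosely mirrors the catch-up behavior in the last bullet.

```python
import time
from typing import Callable, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def throttle(
    records: Iterable[T],
    rps: Optional[float] = None,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[T]:
    """Yield records at no more than rps records per second on average."""
    if not rps:
        # No rps specified: run as quickly as possible.
        yield from records
        return
    start = clock()
    for sent, rec in enumerate(records):
        # Time at which this record is due under the target rate.
        due = start + sent / rps
        delay = due - clock()
        if delay > 0:
            sleep(delay)  # ahead of schedule: wait
        # Behind schedule (delay <= 0): send immediately to catch up.
        yield rec
```

The `clock` and `sleep` parameters are injectable so the pacing can be exercised without real waiting, for example with a fake clock in a unit test.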

See Manage Scan for more information.

Server 3.6.0 Enhancements

In Server 3.6.0 through Server 4.6.0, scans have the following enhancements:

  • Concurrent scans are interlaced so that all scans can simultaneously progress.
  • Only scan-max-active scans can be active at any time. New scans beyond that limit are rejected with the AS_PROTO_RESULT_FAIL_FORBIDDEN error.
  • For read-only scans, you can increase the number of threads dedicated to the scan subsystem to achieve more parallelism. Use this method with caution and monitor system performance.
  • For scans that execute a UDF on each record (update scans), the scan-max-udf-transactions parameter controls parallelism and how many records the scan can concurrently process.
  • Active scans can be expedited by changing their priority. Higher priority scans interrupt lower priority scans until all higher priority scans complete.