Expressions
This page describes Aerospike expressions.
Overview
Aerospike expressions is a strongly typed, functional, intentionally non-Turing-complete domain-specific language designed for manipulating and comparing bins and record metadata. Expressions are used to filter records, to control whether a record operation occurs, to filter data from being shipped across datacenters, and to extend the functionality of transactions.
Types of expressions
Aerospike supports four types of expressions:
- Secondary index expressions (introduced in Database 8.1)
- XDR filter expressions (introduced in Database 5.3)
- Operation expressions (introduced in Database 5.6)
- Filter expressions (introduced in Database 5.2)
Secondary index expressions
A secondary index can index either the value of a specific bin or the computed value of an expression. When indexing very large data sets, you can create more memory-efficient secondary indexes by indexing on the computed value of an expression rather than bin data. Use the Aerospike admin tool (asadm) to create secondary indexes. Queries can use an expression index either by matching the expression used by the index, or by referring to the index name in a predicate.
Operation expressions
Operation expressions (read and write expressions) are bin operations that atomically compute a value from information within the record or provided by the expression. The resulting value is either returned to the client, as is the case with read expressions, or written to a specified bin, as is the case with write expressions. Operation expressions enable atomic, cross-bin operations, which were previously available only through UDFs.
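The difference between read and write expressions can be sketched with a toy in-memory model. This is not the Aerospike client API: the Record class, the operate method, and the ("read" | "write", target, expression) tuple layout are hypothetical names chosen for illustration, and the lock merely stands in for the atomicity that the server provides. The sketch also assumes that operations run in order and that later operations see the results of earlier writes.

```python
import threading


class Record:
    """Toy record; the lock stands in for server-side atomicity (hypothetical model)."""

    def __init__(self, bins):
        self._bins = dict(bins)
        self._lock = threading.Lock()

    def operate(self, ops):
        """Run operations as one atomic unit; return the values of read expressions."""
        results = []
        with self._lock:
            for kind, target, expr in ops:
                value = expr(dict(self._bins))  # the expression sees a copy of the bins
                if kind == "read":
                    results.append((target, value))  # returned to the caller
                elif kind == "write":
                    self._bins[target] = value  # stored in the named bin
                else:
                    raise ValueError(f"unknown operation kind: {kind}")
        return results


rec = Record({"price": 20, "qty": 3})
out = rec.operate([
    # Write expression: a cross-bin computation stored in the "total" bin.
    ("write", "total", lambda b: b["price"] * b["qty"]),
    # Read expression: computed from the record and returned, not stored.
    ("read", "is_bulk", lambda b: b["total"] >= 50),
])
```

Here the write expression stores 60 in "total", while the read expression only returns its computed value under the name "is_bulk" and leaves the record without such a bin.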
XDR filter expressions
Aerospike allows records shipped with XDR to remote destinations to be filtered with expressions. XDR filters are dynamic and you define them per namespace per destination datacenter (DC). You can set filter expressions by using the info command xdr-set-filter. You can also set them programmatically using a client API.
XDR filtering lets you reduce the volume of data that you replicate. When you reduce the volume of replicated data, you also:
- Reduce network traffic.
- Reduce storage and processing requirements at destination datacenters, which avoids the costs of overprovisioning, most significantly in hub-and-spoke XDR topologies.
- Reduce the cost of moving data across or from public clouds.
Filter expressions
Filtering selects records that satisfy a boolean expression. Filter expressions support a variety of metadata functions and all applicable data type functions: the full List and Map APIs (including at a nested element context), bitwise functions on Bytes (Blobs), Geo-spatial queries on GeoJSON, and HyperLogLog functions. Filter expressions are only executed when the record exists, meaning that they do not execute when a read does not find the record or when a write creates the record.
Filters can be used with the following commands:
- read
- write
- record UDFs
- delete
- transactions
- batched commands
- primary index queries (formerly known as scans)
- secondary index queries
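The rule that a filter expression runs only against an existing record can be sketched with a toy key-value store. The read and write functions, their return values, and the age_over_30 filter below are hypothetical and exist only for this illustration; they do not reflect how a real client reports a filtered-out command.

```python
def read(store, key, filter_fn=None):
    """Return a copy of the bins, or None when the record is missing or filtered out."""
    record = store.get(key)
    if record is None:
        return None  # record not found: the filter is never evaluated
    if filter_fn is not None and not filter_fn(record):
        return None  # record exists but does not satisfy the filter
    return dict(record)


def write(store, key, bins, filter_fn=None):
    """Apply a write; return True if it was applied."""
    record = store.get(key)
    if record is None:
        store[key] = dict(bins)  # the write creates the record: the filter is skipped
        return True
    if filter_fn is not None and not filter_fn(record):
        return False  # existing record filtered out: the write does not occur
    record.update(bins)
    return True


store = {}
calls = []  # records every filter evaluation so the behavior can be observed


def age_over_30(rec):
    calls.append(rec)
    return rec["age"] > 30


created = write(store, "k1", {"age": 25}, age_over_30)  # creates the record, no filter call
updated = write(store, "k1", {"age": 26}, age_over_30)  # record exists, filter runs and rejects
missing = read(store, "no-such-key", age_over_30)       # nothing to filter
```

After these three calls the filter has been evaluated exactly once, for the second write, and the stored age is still 25.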
Prior to Database 5.2, Aerospike used the Predicate Expression (PredExp) system and language. Filter expressions replaced Predicate Expressions in Database 5.2.
Language
Aerospike expressions uses Polish notation (PN), a prefix syntax in which each operator precedes its operands, with strict typing that expands the scope of what can be used to select records. Within an expression, all data is immutable. This means that bin modifications occurring within an expression operate on an ephemeral copy and are not saved to the bin when the expression terminates.
Aerospike expressions does not include syntax for iteration or recursion.
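To make the prefix form concrete, here is a minimal sketch of a prefix-notation evaluator over read-only bin data. The tuple encoding, the operator names in OPS, and the evaluate function are assumptions made for this illustration and are not the Aerospike operator set or wire format. The evaluator itself recurses over the expression tree, but the toy language it evaluates offers no construct for iteration or recursion.

```python
from types import MappingProxyType

# Hypothetical operator table for this sketch; not the Aerospike operator set.
OPS = {
    "and": lambda *xs: all(xs),
    "or": lambda *xs: any(xs),
    "eq": lambda a, b: a == b,
    "gt": lambda a, b: a > b,
    "add": lambda *xs: sum(xs),
}


def evaluate(expr, bins):
    """Evaluate a prefix-notation expression against a read-only view of the bins."""
    if not isinstance(expr, tuple):
        return expr  # a literal value
    op, *args = expr
    if op == "bin":
        return bins[args[0]]  # read a bin by name
    return OPS[op](*(evaluate(a, bins) for a in args))


record = {"a": 3, "b": 4, "name": "x"}
view = MappingProxyType(record)  # read-only view: evaluation cannot modify the bins

# Prefix form of: (a + b > 5) and (name == "x")
expr = ("and",
        ("gt", ("add", ("bin", "a"), ("bin", "b")), 5),
        ("eq", ("bin", "name"), "x"))
result = evaluate(expr, view)
```

The operator always comes first in each tuple, so no precedence rules or parentheses are needed, and the record is unchanged after evaluation.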
Types
The type system is split into two type classes: value and bin. All expressions return a sub-type of these two types, and all parameters to expressions use these types. Parameters that accept only values are described herein as ‘t_value’ or ‘library_specific’. Parameters that accept only bin expressions are described herein as ‘t_bin_expr’ or ‘bin_expr’. Parameters that accept either bin or value are described herein as ‘t_expr’ or ‘expr’. Here, ‘library_specific’ means a type specific to the language library in use, and ‘t’ is one of the following:
- nil: value for null.
- boolean: value-only type which may be true or false.
- integer: 64-bit signed integer.
- float: 64-bit floating point.
- blob: Binary data.
- string: UTF-8 encoded string.
- geojson: GeoJSON.
- list: CDT List.
- map: CDT Map.
- hll: HyperLogLog.
- AUTO: Some libraries may implement type inference for certain single-result CDT read expressions when the expr_type can be deduced by the result_type.
Execution model
Metadata resolution is a performance-critical component of Aerospike expressions. Metadata resides in the primary index and does not require a disk load (for namespaces with data on disk). Therefore, expressions that can be fully resolved using metadata can forgo disk access, thereby gaining an order of magnitude in performance. Aerospike expressions achieves this using a two-phase execution model. If an expression can satisfy the necessary logic for a given operation with only metadata operations, writing it that way results in large performance gains.
Metadata phase
The expressions system starts with the metadata phase, where storage-data evaluates to unknown. Expressions with unknown as input generally also output unknown, with the exception of logical expressions that evaluate using trilean logic. If the result is unknown, then evaluation proceeds to the storage-data phase. If the result is false, then the record is filtered out without accessing storage. If the result is true, then the operation proceeds, and storage is accessed only if required by the operation.
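Trilean logic is what lets a logical expression produce a definite answer even when some of its inputs are unknown. The sketch below assumes the usual Kleene three-valued rules and uses Python's None as a stand-in for unknown; the function names are hypothetical and the exact rules applied by the server are not confirmed by this page.

```python
UNKNOWN = None  # stand-in for the "unknown" result in this sketch


def tri_and(*values):
    """AND: any definite False decides the result, otherwise unknown is contagious."""
    if any(v is False for v in values):
        return False
    if any(v is UNKNOWN for v in values):
        return UNKNOWN
    return True


def tri_or(*values):
    """OR: any definite True decides the result, otherwise unknown is contagious."""
    if any(v is True for v in values):
        return True
    if any(v is UNKNOWN for v in values):
        return UNKNOWN
    return False


def tri_not(value):
    """NOT: unknown stays unknown."""
    return UNKNOWN if value is UNKNOWN else not value


# A metadata check that fails, ANDed with a bin comparison that is still unknown,
# is already a definite False: the record can be filtered out without storage.
filtered_early = tri_and(False, UNKNOWN)

# A metadata check that passes leaves the AND unknown: storage data is needed.
needs_storage = tri_and(True, UNKNOWN)
```

Placing a selective metadata check inside an AND therefore gives the metadata phase a chance to reject records before any storage read.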
Storage-data phase
The storage-data phase loads the record and executes the expression a second time. If the record resides on disk, physical I/O is incurred. This phase always resolves to a definite true or false answer.
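The two phases can be sketched end to end as follows. Everything here is a hypothetical model: run_filter, bin_value, gt, and and_ are illustrative helpers, "ttl" and "score" are made-up metadata and bin names, None stands in for unknown, and bin values are assumed never to be None themselves. The sketch covers only the filter decision, not the operation that follows it.

```python
UNKNOWN = None  # hypothetical marker for "needs storage data"


def bin_value(bins, name):
    """Return a bin value, or UNKNOWN while the bins have not been loaded."""
    return UNKNOWN if bins is None else bins[name]


def gt(left, right):
    """Comparison that propagates UNKNOWN from either operand."""
    if left is UNKNOWN or right is UNKNOWN:
        return UNKNOWN
    return left > right


def and_(left, right):
    """Trilean AND: a definite False wins over UNKNOWN."""
    if left is False or right is False:
        return False
    if left is UNKNOWN or right is UNKNOWN:
        return UNKNOWN
    return True


def run_filter(expr, meta, load_bins):
    """Evaluate expr in two phases; return (passed, storage_was_read)."""
    result = expr(meta, None)  # metadata phase: bins are unavailable
    if result is not UNKNOWN:
        return result, False  # resolved from metadata alone
    return expr(meta, load_bins()), True  # storage-data phase: always definite


# Filter: metadata "ttl" above 3600 AND bin "score" above 10.
def expr(meta, bins):
    return and_(gt(meta["ttl"], 3600), gt(bin_value(bins, "score"), 10))


loads = []  # counts simulated storage reads


def load_bins():
    loads.append(1)
    return {"score": 42}


short_lived = run_filter(expr, {"ttl": 60}, load_bins)   # rejected in the metadata phase
long_lived = run_filter(expr, {"ttl": 7200}, load_bins)  # needs the storage-data phase
```

Only the second record triggers a simulated storage read; the first is rejected using metadata alone.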