This tutorial describes how to implement certain SQL SELECT statements
in Aerospike.
This notebook requires the Aerospike Database running locally with Java kernel and Aerospike Java Client. To create a Docker container that satisfies the requirements and holds a copy of Aerospike notebooks, visit the Aerospike Notebooks Repo.
Introduction
In this notebook, we will see how specific SELECT statements in SQL can
be implemented in Aerospike.
SQL is a widely known data access language. If you have used SQL, the
examples in this notebook will make it easier to implement specific SQL
SELECT statements.
This notebook is the first in the SQL Operations series that consists of
the following notebooks:
Implementing SQL Operations: UPDATE, CREATE, and DELETE
Implementing SQL Operations: Aggregates
The specific topics and SQL SELECT statements we discuss include:
Components of the SELECT statement
Single record retrieval
Batch retrieval
Predicate based retrieval
Query operation using an index
Query operation using an expression filter
Scan operation using an expression filter
Computed fields
The purpose of this notebook is to provide Aerospike API equivalents for
specific SQL operations. Not all SQL operations (such as JOIN) are
directly supported in Aerospike API. The SQL syntax in this notebook
while largely accurate is meant only to convey the semantics of the
operations. Check out Aerospike Presto
Connector
for ad-hoc SQL access to Aerospike data.
While Aerospike provides both synchronous and asynchronous execution
modes for many operations, we use mostly synchronous execution examples
in this notebook, leaving asynchronous execution as a separate topic for
a future tutorial.
Prerequisites
This tutorial assumes familiarity with the following topics:
The test data has ten records with user-key “id-1” through “id-10”, two
integer bins (fields) “bin1” and “bin2”, in the namespace “test” and
sets “sql-select-small”and null, and similarly structured 1000 records
in set “sql-select-large”.
System.out.println("Initialized the client and connected to the cluster.");
StringNamespace="test";
StringSmallSet="sql-select-small";
StringLargeSet="sql-select-large";
StringNullSet="";
WritePolicywpolicy=newWritePolicy();
wpolicy.sendKey=true;
for (inti=1; i <=10; i++) {
Keykey=newKey(Namespace, SmallSet, "id-"+i);
Binbin1=newBin(new String("bin1"), i);
Binbin2=newBin(new String("bin2"), 1000+i);
client.put(wpolicy, key, bin1, bin2);
}
for (inti=1; i <=10; i++) {
Keykey=newKey(Namespace, NullSet, "id-"+i);
Binbin1=newBin(new String("bin1"), i);
Binbin2=newBin(new String("bin2"), 1000+i);
client.put(wpolicy, key, bin1, bin2);
}
for (inti=1; i <=1000; i++) {
Keykey=newKey(Namespace, LargeSet, "id-"+i);
Binbin1=newBin(new String("bin1"), i);
Binbin2=newBin(new String("bin2"), 1000+i);
client.put(wpolicy, key, bin1, bin2);
}
System.out.format("Test data populated");;
Output:
Initialized the client and connected to the cluster.
Test data populated
Mapping Components of SELECT Statement
Columns and tables
In Aerospike, a relational database or schema maps to a namespace, a
table maps to a set, and a column maps to a bin. Thus a query
SELECT columns FROM table WHERE condition can be written in Aerospike
terminology as SELECT bins FROM namespace.set WHERE condition.
Record id
Records are stored in a namespace, organized in sets, and each record is
uniquely identified by a key or id of the record that consists of a
triple: (namespace, set, user-key) where user-key is a unique user
specified id within the set. The key is closely identified with a
record, and can be seen either as a metadata or a primary key field, and
is returned in all retrieval APIs.
Record metadata
Each record has generation (or version) and expiration (or time-to-live
in seconds) associated with it. This metadata is returned in all
retrieval operations. It is possible to retrieve only the metadata
without the record’s bins through “getHeader” operation explained below.
A note on Policy
All APIs take a Policy argument. A policy specifies many request
parameters such as timeout and maximum retries, as well as operations
modifiers such as an expression filter.
Single Record Get
Let’s start with a simple example of a single record retrieval using its
key. You can either get the entire record or specific bins.
SELECT * FROM namespace.set WHERE id = key
Record Client::get(Policy policy, Key key)
SELECT bins FROM namsepace.set WHERE id = key
Record Client::get(Policy policy, Key key, String... binNames)
importcom.aerospike.client.Record;
// Read all bins of the record with user-key "id-3" from the small set
A batch operation operates on a list of records identified by the keys
provided. This works similar to a single record retrieval, except
multiple records are returned.
It is possible to obtain header info or metadata consisting of
generation (or version) and expiration time (time-to-live in seconds)
for a specified set of records.
SELECT generation, expiration FROM namespace.set WHERE id IN key-list
System.out.format("Key not found: key=%s\n", keys[i].userKey);
}
}
Output:
Batch metadata results:
Key not found: key=id-0
key=id-1 generation=1 expiration=355535130
key=id-2 generation=1 expiration=355535130
key=id-3 generation=1 expiration=355535130
Union of Batch Retrievals
A more general form of batch reads is also available that provides a
union of simple batch results with different namespace, set, and bin
specification. It populates the argument “records” on return.
(SELECT bins1 FROM namespace1.set1 WHERE id IN key-list1) UNION (SELECT bins2 FROM namespace2.set2 WHERE id IN key-list2) UNION ...
System.out.format("Key not found: set='%s' key=%s\n", key.setName, key.userKey);
}
}
Output:
Union of multi batch results:
set='sql-select-small' key=id-1 bins={bin2=1001}
set='' key=id-1 bins={bin1=1, bin2=1001}
set='sql-select-small' key=id-2 bins=null
set='sql-select-small' key=id-3 bins=null
Key not found: set='sql-select-small' key=no-such-key
Predicate Based Retrieval
In these operations, records matching a general predicate (or a
condition) are retrieved.
SELECT bins FROM namespace.set WHERE condition
There are multiple ways of performing this SQL query in Aerospike. They
involve query and scan operations.
Query operation using an index and/or expression filter
Scan operation using an expression filter
The query operation must be used when an index is involved, but may be
used without an index. The scan operation can only be used without an
index.
Query Based on Index
In SQL, an index if applicable is used automatically. In Aerospike, one
must know the index and specify it explicitly in the statement argument
in a query operation.
To use the query API with index based filter, a secondary index must
exist on the filter bin. Here we create a numeric index on “bin1” in
“sql-select-small” set.
importcom.aerospike.client.policy.Policy;
importcom.aerospike.client.query.IndexType;
importcom.aerospike.client.task.IndexTask;
importcom.aerospike.client.AerospikeException;
importcom.aerospike.client.ResultCode;
StringIndexName="test_small_bin1_number_idx";
Policypolicy=newPolicy();
policy.socketTimeout=0; // Do not timeout on index create.
Here are some key points to remember about query and scan operations in
Aerospike.
To leverage an index, one must use a query operation.
A query takes either or both: an index predicate and an expression
filter.
An expression filter may be used instead of an index predicate, but
it will not perform as well.
When only an expression filter is needed, either a query or a scan
may be used (as shown above).
A null set value when an index predicate is used works on the null
set (records belonging to no set), but without an index predicate
works on the entire namespace.
An expression filter is specified within the policy, and is applied
generally for filtering records beyond query and scan. You can find
examples of this outside of this tutorial.
Some of these are illustrated with examples in the following cells.
To leverage an index, one must use a query operation.
If an index predicate is used on an unindexed bin, it results in an
error.
Statementstmt=newStatement();
stmt.setNamespace(Namespace);
stmt.setSetName(SmallSet);
// try to use an index predicate on bin2 which has no index
stmt.setFilter(Filter.range("bin2", 1004, 1007));
try {
RecordSetrs=client.query(null, stmt);
System.out.format("Query with index predicate on unindexed bin results:\n");
Query based on index predicate and expression results
key=id-2 bins={bin1=2, bin2=1002}
key=id-3 bins={bin1=3, bin2=1003}
A null set value when an index predicate is used works records belonging to no (null) set, but without an index predicate works on the entire namespace.
The scope of an index is a set. An index must exist on the null set when
an index predicate is used with the null set.
// query with a null set
Statementstmt=newStatement();
stmt.setNamespace(Namespace);
stmt.setSetName(NullSet);
// the filter selects records with bin1=3 in all sets
An arbitrary function registered on the server (UDF) is invoked on the
specified record. In this tutorial, we deal with a single record
oriented functions as opposed to “stream oriented” functions. The latter
will be discussed in a subsequent notebook on Aggregates in this series.
The API returns a generic Object which can be anything like a single
value or a dictionary. Note, UDFs may not be appropriate for performance
sensitive applications; for record-oriented functions, simply retrieving
the record and computing the function on the client site may be faster.
A read-write function may be alternatively implemented atomically on the
client side using the read-modify-write
pattern.
Create User Defined Function (UDF)
Examine the following Lua code that takes two bins and returns their sum
and product. Create a “udf” directory under “java” and create a file
“computed_fields.lua” with this Lua code.
Add the following code to the file “computed_fields.lua” in the
sub-directory “udf”:
<pre>
-- computed_fields.lua - return sum and product of specified bins
function sum_and_product(rec, binName1, binName2)
local ret = map() -- Initialize the return value (a map)
Many developers that are familiar with SQL would like to see how SQL
operations translate to Aerospike. We looked at how to implement various
SELECT statements. This should be generally useful irrespective of the
reader’s SQL knowledge. While the examples here use synchronous
execution, many operations can also be performed asynchronously.
Visit Aerospike notebooks
repo to
run additional Aerospike notebooks. To run a different notebook,
download the notebook from the repo to your local machine, and then
click on File->Open, and select Upload.