Examples of Querying Aerospike Databases
The following examples illustrate querying Aerospike data using the Trino Command Line Interface (CLI).
Simple select queryโ
trino:test> select * from trino where bin1>9881 and bin1<9900;
bin1 | __key | id | list | map | int
------+-------+---------+--------------+------+------
9893 | NULL | id_9893 | NULL | NULL | NULL
9883 | NULL | id_9883 | NULL | NULL | NULL
9899 | NULL | id_9899 | NULL | NULL | NULL
9891 | NULL | id_9891 | ["str_9891"] | NULL | NULL
9885 | NULL | id_9885 | ["str_9885"] | NULL | NULL
9895 | NULL | id_9895 | NULL | NULL | NULL
9887 | NULL | id_9887 | NULL | NULL | NULL
9897 | NULL | id_9897 | ["str_9897"] | NULL | NULL
9889 | NULL | id_9889 | NULL | NULL | NULL
(9 rows)
Query 20201020_155406_00002_umepu, FINISHED, 1 node
Splits: 20 total, 20 done (100.00%)
0:01 [9 rows, 168B] [7 rows/s, 132B/s]
Secondary Index (sindex) queryโ
In this example, secondary indexes were created on the two bins โcase_last_weekโ and โprobable_ddโ in the Aerospike database. The secondary indexes are named case_last_week_idx
and probable_dd_idx
respectively.
Step 1 (OPTIONAL)โ
This step lists all the sindexes that are available for your query.
Secondary indexes are listed per Aerospike namespace or Trino schema. You can either use a USE statement to select the schema or prepend the schema name to "sindex". For example, `test.sindex, where
test` is the name of the schema.
You can optionally change the name from the default name "__sindex" to the name of your choice using the aerospike.index-table-name
property.
trino:test> select * from __sindex;
column_name | sindex_name | sindex_type | schema_name | table_name
----------------+--------------------+-------------+-------------+-------------
case_last_week | case_last_week_idx | NUMERIC | test | covid_stats
probable_dd | probable_dd_idx | NUMERIC | test | covid_stats
(2 rows)
Query 20220420_190120_00005_78a96, FINISHED, 1 node
Splits: 9 total, 9 done (100.00%)
0.21 [2 rows, 102B] [9 rows/s, 477B/s]
Step 2 (OPTIONAL)โ
In this step, provide a hint to the Trino connector about which sindex you plan to query. Use the sindex_name
session property. Aerospike supports only one sindex per query. If you have more than one sindex, for the best performance, choose the one with the highest cardinality.
If you skip this step, the connector will choose one based on its lexical order in the list of sindexes. There is a possibility that this may lead to an inefficient query if a sindex with lower cardinality is ranked higher in the lexical order.
We recommend you provide a hint about the sindex wherever possible.
trino:test> set session aerospike.sindex_name='probable_dd_idx';
SET SESSION
Step 3โ
Although you can specify one or more sindex in your query, only one sindex is used per query. However, there can be multiple matching indexes for the query predicates. If your query includes multiple sindexes, the one which ranks higher in the lexical order is selected for the query. We recommend that you not skip the previous two steps if information about the sindex cardinality is available to you a priori.
trino:test> select state_ter, total_cases, case_last_week from covid_stats where probable_dd < 300;
state_ter | total_cases | case_last_week
--------------------------------+-------------+----------------
North Dakota | 54305 | 8290
Wisconsin | 281910 | 40487
South Carolina | 185688 | 7665
New Hampshire | 12488 | 1274
Oregon | 50448 | 5019
Vermont | 2392 | 213
Hawaii | 15834 | 759
Florida | 832525 | 35723
Texas | 956234 | 51379
Virgin Islands | 1405 | 27
North Carolina | 293339 | 16647
Oklahoma | 146692 | 16151
Missouri | 209197 | 23662
Rhode Island | 36380 | 3506
The sindex selected for the above query will be used for subsequent queries in this session unless you reset the property in the session or set another sindex within the same session. You can reset the sindex in the session using:
trino:test> reset session aerospike.sindex_name;
RESET SESSION
Hereโs another example that shows multiple matching sindexes for the same query predicate. Although there is a sindex available for the โCase_last_weekโ column, the sindex for probable_dd
will be used because that is what you provided as a hint.
trino:test> select state_ter, total_cases, case_last_week from covid_stats where probable_dd < 300 and case_last_week > 1000;
state_ter | total_cases | case_last_week
----------------+-------------+----------------
Wyoming | 17310 | 3587
Pennsylvania | 230894 | 20958
Alaska | 18716 | 3094
Minnesota | 180862 | 30190
Tennessee | 281851 | 20425
Utah | 132621 | 16111
Montana | 39679 | 6184
Querying namespaces that do not include setsโ
Suppose that you want to run queries against a namespace that does not include a set. You can use the __default
keyword, which specifies to query the null set. The schema is inferred.
Here is an example which returns all records that are in the namespace test
:
trino:test> select * from __default
The configuration property aerospike.default-set-name
lets you name the null set. For example, if you set the value to set0
, you can run this statement against your namespace:
trino:test> select * from set0
Using JSON functions for CDT accessโ
trino:test> select json_extract(map, '$.k2') as k2 from trino where json_extract(map, '$.k2') is not null limit 2;
k2
-----
"2"
"2"
(2 rows)
Query 20201020_155215_00001_umepu, FINISHED, 1 node
Splits: 37 total, 35 done (94.59%)
0:01 [8.31K rows, 177KB] [6.42K rows/s, 136KB/s]
Insert queryโ
If the list of column names is specified, they must exactly match the list of columns produced by the query. Each column in the table not present in the column list will be filled with a null
value. Otherwise, if the list of columns is not specified, the columns produced by the query must exactly match the columns in the table being inserted into.
The connector only supports READ UNCOMMITTED
transaction isolation level.
trino:test> insert into trino (__key, id, int, bin1) values ('abcd', null, null, 11111);
INSERT: 1 row
Query 20201103_160726_00004_3pbwu, FINISHED, 1 node
Splits: 51 total, 51 done (100.00%)
0:00 [0 rows, 0B] [0 rows/s, 0B/s]
Federated queryโ
trino:test> select aerospike.test.trino.id, aerospike2.test.trino.id from aerospike.test.trino inner join aerospike2.test.trino on aerospike.test.trino.bin1=aerospike2.test.trino.bin1 where aerospike.test.trino.bin1<5;
id | id
------+------
id_1 | id_1
id_3 | id_3
(2 rows)
Query 20201020_155814_00003_umepu, FINISHED, 1 node
Splits: 73 total, 73 done (100.00%)
0:03 [4 rows, 64B] [1 rows/s, 24B/s]
Aggregate functionsโ
The Trino connector supports Trino's aggregate functions. The example that follows uses the count()
aggregate function. For a full list of Trino's aggregate functions, see this page of the documentation for Trino.
trino:test> select trino.id, count(trino2.id) from trino2, trino where trino2.bin1=trino.bin1 group by trino.id order by count(trino2.id) desc;
id | _col1
------+-------
id_7 | 1
id_3 | 1
id_5 | 1
id_1 | 1
id_9 | 1
(5 rows)
Query 20201014_165146_00000_jfivj, FINISHED, 1 node
Splits: 106 total, 106 done (100.00%)
0:04 [10K rows, 213KB] [2.27K rows/s, 48.5KB/s]
Cost-Based Optimization (CBO)โ
Cost-Based Optimization (CBO) enabledโ
The Trino connector supports the row-count statistic only.
trino:test> explain select trino.id, count(trino2.id) from trino2, trino where trino2.bin1=trino.bin1 group by trino.id order by count(trino2.id) desc;
------------------------------------------------------------------------------------
Fragment 0 [SINGLE]
Output layout: [id_1, count_5]
Output partitioning: SINGLE []
Stage Execution Strategy: UNGROUPED_EXECUTION
Output[id, _col1]
โ Layout: [id_1:varchar, count_5:bigint]
โ Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}
โ id := id_1
โ _col1 := count_5
โโ RemoteMerge[1]
Layout: [id_1:varchar, count_5:bigint]
Fragment 1 [ROUND_ROBIN]
Output layout: [id_1, count_5]
Output partitioning: SINGLE []
Stage Execution Strategy: UNGROUPED_EXECUTION
LocalMerge[count_5 DESC_NULLS_LAST]
โ Layout: [id_1:varchar, count_5:bigint]
โ Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}
โโ PartialSort[count_5 DESC_NULLS_LAST]
โ Layout: [id_1:varchar, count_5:bigint]
โโ RemoteSource[2]
Layout: [id_1:varchar, count_5:bigint]
Fragment 2 [HASH]
Output layout: [id_1, count_5]
Output partitioning: ROUND_ROBIN []
Stage Execution Strategy: UNGROUPED_EXECUTION
Project[]
โ Layout: [id_1:varchar, count_5:bigint]
โ Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}
โโ Aggregate(FINAL)[id_1][$hashvalue]
โ Layout: [id_1:varchar, $hashvalue:bigint, count_5:bigint]
โ Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}
โ count_5 := count("count_15")
โโ LocalExchange[HASH][$hashvalue] ("id_1")
โ Layout: [id_1:varchar, count_15:bigint, $hashvalue:bigint]
โ Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}
โโ RemoteSource[3]
Layout: [id_1:varchar, count_15:bigint, $hashvalue_16:bigint]
Fragment 3 [SOURCE]
Output layout: [id_1, count_15, $hashvalue_21]
Output partitioning: HASH [id_1][$hashvalue_21]
Stage Execution Strategy: UNGROUPED_EXECUTION
Aggregate(PARTIAL)[id_1][$hashvalue_21]
โ Layout: [id_1:varchar, $hashvalue_21:bigint, count_15:bigint]
โ count_15 := count("id")
โโ Project[]
โ Layout: [id_1:varchar, id:varchar, $hashvalue_21:bigint]
โ Estimates: {rows: ? (?), cpu: ?, memory: 730B, network: 730B}
โ $hashvalue_21 := combine_hash(bigint '0', COALESCE("$operator$hash_code"
โโ InnerJoin[("bin1_0" = "bin1")][$hashvalue_17, $hashvalue_18]
โ Layout: [id_1:varchar, id:varchar]
โ Estimates: {rows: ? (?), cpu: ?, memory: 730B, network: 730B}
โ Distribution: REPLICATED
โ dynamicFilterAssignments = {bin1 -> df_382}
โโ ScanFilterProject[table = aerospike:aerospike:test:trino, grouped = f
โ Layout: [bin1_0:bigint, id_1:varchar, $hashvalue_17:bigint]
โ Estimates: {rows: 10000 (712.89kB), cpu: 625k, memory: 0B, network
โ $hashvalue_17 := combine_hash(bigint '0', COALESCE("$operator$hash
โ bin1_0 := AerospikeColumnHandle{name=bin1, type=bigint, hidden=fal
โ id_1 := AerospikeColumnHandle{name=id, type=varchar, hidden=false}
โโ LocalExchange[HASH][$hashvalue_18] ("bin1")
โ Layout: [bin1:bigint, id:varchar, $hashvalue_18:bigint]
โ Estimates: {rows: 10 (730B), cpu: 2.05k, memory: 0B, network: 730B
โโ RemoteSource[4]
Layout: [bin1:bigint, id:varchar, $hashvalue_19:bigint]
Fragment 4 [SOURCE]
Output layout: [bin1, id, $hashvalue_20]
Output partitioning: BROADCAST []
Stage Execution Strategy: UNGROUPED_EXECUTION
ScanProject[table = aerospike:aerospike:test:trino2, grouped = false]
Layout: [bin1:bigint, id:varchar, $hashvalue_20:bigint]
Estimates: {rows: 10 (730B), cpu: 640, memory: 0B, network: 0B}/{rows: 10 (
$hashvalue_20 := combine_hash(bigint '0', COALESCE("$operator$hash_code"("b
bin1 := AerospikeColumnHandle{name=bin1, type=bigint, hidden=false}
id := AerospikeColumnHandle{name=id, type=varchar, hidden=false}
trino:test> select trino.id, count(trino2.id) from trino2, trino where trino2.bin1=trino.bin1 group by trino.id order by count(trino2.id) desc;
id | _col1
------+-------
id_7 | 1
id_3 | 1
id_5 | 1
id_9 | 1
id_1 | 1
(5 rows)
Query 20201014_165011_00003_qmngg, FINISHED, 1 node
Splits: 90 total, 90 done (100.00%)
0:03 [10K rows, 214KB] [3.06K rows/s, 65.4KB/s]
Cost-Based Optimization (CBO) disabledโ
trino:test> explain select trino.id, count(trino2.id) from trino2, trino where trino2.bin1=trino.bin1 group by trino.id order by count(trino2.id) desc;
------------------------------------------------------------------------------------
Fragment 0 [SINGLE]
Output layout: [id_1, count_5]
Output partitioning: SINGLE []
Stage Execution Strategy: UNGROUPED_EXECUTION
Output[id, _col1]
โ Layout: [id_1:varchar, count_5:bigint]
โ Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}
โ id := id_1
โ _col1 := count_5
โโ RemoteMerge[1]
Layout: [id_1:varchar, count_5:bigint]
Fragment 1 [ROUND_ROBIN]
Output layout: [id_1, count_5]
Output partitioning: SINGLE []
Stage Execution Strategy: UNGROUPED_EXECUTION
LocalMerge[count_5 DESC_NULLS_LAST]
โ Layout: [id_1:varchar, count_5:bigint]
โ Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}
โโ PartialSort[count_5 DESC_NULLS_LAST]
โ Layout: [id_1:varchar, count_5:bigint]
โโ RemoteSource[2]
Layout: [id_1:varchar, count_5:bigint]
Fragment 2 [HASH]
Output layout: [id_1, count_5]
Output partitioning: ROUND_ROBIN []
Stage Execution Strategy: UNGROUPED_EXECUTION
Project[]
โ Layout: [id_1:varchar, count_5:bigint]
โ Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}
โโ Aggregate(FINAL)[id_1][$hashvalue]
โ Layout: [id_1:varchar, $hashvalue:bigint, count_5:bigint]
โ Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}
โ count_5 := count("count_15")
โโ LocalExchange[HASH][$hashvalue] ("id_1")
โ Layout: [id_1:varchar, count_15:bigint, $hashvalue:bigint]
โ Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}
โโ RemoteSource[3]
Layout: [id_1:varchar, count_15:bigint, $hashvalue_16:bigint]
Fragment 3 [HASH]
Output layout: [id_1, count_15, $hashvalue_22]
Output partitioning: HASH [id_1][$hashvalue_22]
Stage Execution Strategy: UNGROUPED_EXECUTION
Aggregate(PARTIAL)[id_1][$hashvalue_22]
โ Layout: [id_1:varchar, $hashvalue_22:bigint, count_15:bigint]
โ count_15 := count("id")
โโ Project[]
โ Layout: [id:varchar, id_1:varchar, $hashvalue_22:bigint]
โ Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}
โ $hashvalue_22 := combine_hash(bigint '0', COALESCE("$operator$hash_code"
โโ InnerJoin[("bin1" = "bin1_0")][$hashvalue_17, $hashvalue_19]
โ Layout: [id:varchar, id_1:varchar]
โ Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}
โ Distribution: PARTITIONED
โ dynamicFilterAssignments = {bin1_0 -> df_387}
โโ RemoteSource[4]
โ Layout: [bin1:bigint, id:varchar, $hashvalue_17:bigint]
โโ LocalExchange[HASH][$hashvalue_19] ("bin1_0")
โ Layout: [bin1_0:bigint, id_1:varchar, $hashvalue_19:bigint]
โ Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: ?}
โโ RemoteSource[5]
Layout: [bin1_0:bigint, id_1:varchar, $hashvalue_20:bigint]
Fragment 4 [SOURCE]
Output layout: [bin1, id, $hashvalue_18]
Output partitioning: HASH [bin1][$hashvalue_18]
Stage Execution Strategy: UNGROUPED_EXECUTION
ScanFilterProject[table = aerospike:aerospike:test:trino2, grouped = false, fi
Layout: [bin1:bigint, id:varchar, $hashvalue_18:bigint]
Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: 0B}/{rows: ? (?), cpu
$hashvalue_18 := combine_hash(bigint '0', COALESCE("$operator$hash_code"("b
bin1 := AerospikeColumnHandle{name=bin1, type=bigint, hidden=false}
id := AerospikeColumnHandle{name=id, type=varchar, hidden=false}
Fragment 5 [SOURCE]
Output layout: [bin1_0, id_1, $hashvalue_21]
Output partitioning: HASH [bin1_0][$hashvalue_21]
Stage Execution Strategy: UNGROUPED_EXECUTION
ScanProject[table = aerospike:aerospike:test:trino, grouped = false]
Layout: [bin1_0:bigint, id_1:varchar, $hashvalue_21:bigint]
Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: 0B}/{rows: ? (?), cpu
$hashvalue_21 := combine_hash(bigint '0', COALESCE("$operator$hash_code"("b
bin1_0 := AerospikeColumnHandle{name=bin1, type=bigint, hidden=false}
id_1 := AerospikeColumnHandle{name=id, type=varchar, hidden=false}
trino:test> select trino.id, count(trino2.id) from trino2, trino where trino2.bin1=trino.bin1 group by trino.id order by count(trino2.id) desc;
id | _col1
------+-------
id_7 | 1
id_3 | 1
id_5 | 1
id_1 | 1
id_9 | 1
(5 rows)
Query 20201014_165146_00000_jfivj, FINISHED, 1 node
Splits: 106 total, 106 done (100.00%)
0:04 [10K rows, 213KB] [2.27K rows/s, 48.5KB/s]
With CBO turned on, the same query ran more quickly because 15% fewer splits were processed and 1 less fragment for a 10K record table was executed. Had the table been larger, the gain in performance would have been greater.