Examples of querying Aerospike Database
The following examples illustrate querying Aerospike data using the Trino Command Line Interface (CLI).
Simple select query
trino:test> select * from trino where bin1>9881 and bin1<9900; bin1 | __key | id | list | map | int------+-------+---------+--------------+------+------ 9893 | NULL | id_9893 | NULL | NULL | NULL 9883 | NULL | id_9883 | NULL | NULL | NULL 9899 | NULL | id_9899 | NULL | NULL | NULL 9891 | NULL | id_9891 | ["str_9891"] | NULL | NULL 9885 | NULL | id_9885 | ["str_9885"] | NULL | NULL 9895 | NULL | id_9895 | NULL | NULL | NULL 9887 | NULL | id_9887 | NULL | NULL | NULL 9897 | NULL | id_9897 | ["str_9897"] | NULL | NULL 9889 | NULL | id_9889 | NULL | NULL | NULL(9 rows)
Query 20201020_155406_00002_umepu, FINISHED, 1 nodeSplits: 20 total, 20 done (100.00%)0:01 [9 rows, 168B] [7 rows/s, 132B/s]
Secondary Index (sindex) query
In this example, secondary indexes were created on the two bins “case_last_week” and “probable_dd” in the Aerospike database. The secondary indexes are named case_last_week_idx
and probable_dd_idx
respectively.
Step 1 (OPTIONAL)
This step lists all the sindexes that are available for your query.
Secondary indexes are listed per Aerospike namespace or Trino schema. You can either use a USE statement to select the schema or prepend the schema name to “__sindex”. For example, test.__sindex
, where test
is the name of the schema.
You can optionally change the name from the default name “__sindex” to the name of your choice using the aerospike.index-table-name
property.
trino:test> select * from __sindex; column_name | sindex_name | sindex_type | schema_name | table_name----------------+--------------------+-------------+-------------+------------- case_last_week | case_last_week_idx | NUMERIC | test | covid_stats probable_dd | probable_dd_idx | NUMERIC | test | covid_stats(2 rows)
Query 20220420_190120_00005_78a96, FINISHED, 1 nodeSplits: 9 total, 9 done (100.00%)0.21 [2 rows, 102B] [9 rows/s, 477B/s]
Step 2 (OPTIONAL)
In this step, provide a hint to the Trino connector about which sindex you plan to query. Use the sindex_name
session property. Aerospike supports only one sindex per query. If you have more than one sindex, for the best performance, choose the one with the highest cardinality.
If you skip this step, the connector will choose one based on its lexical order in the list of sindexes. There is a possibility that this may lead to an inefficient query if a sindex with lower cardinality is ranked higher in the lexical order.
We recommend you provide a hint about the sindex wherever possible.
trino:test> set session aerospike.sindex_name='probable_dd_idx';SET SESSION
Step 3
Although you can specify one or more sindex in your query, only one sindex is used per query. However, there can be multiple matching indexes for the query predicates. If your query includes multiple sindexes, the one which ranks higher in the lexical order is selected for the query. We recommend that you not skip the previous two steps if information about the sindex cardinality is available to you a priori.
trino:test> select state_ter, total_cases, case_last_week from covid_stats where probable_dd < 300; state_ter | total_cases | case_last_week--------------------------------+-------------+---------------- North Dakota | 54305 | 8290 Wisconsin | 281910 | 40487 South Carolina | 185688 | 7665 New Hampshire | 12488 | 1274 Oregon | 50448 | 5019 Vermont | 2392 | 213 Hawaii | 15834 | 759 Florida | 832525 | 35723 Texas | 956234 | 51379 Virgin Islands | 1405 | 27 North Carolina | 293339 | 16647 Oklahoma | 146692 | 16151 Missouri | 209197 | 23662 Rhode Island | 36380 | 3506
The sindex selected for the above query will be used for subsequent queries in this session unless you reset the property in the session or set another sindex within the same session. You can reset the sindex in the session using:
trino:test> reset session aerospike.sindex_name;RESET SESSION
Here’s another example that shows multiple matching sindexes for the same query predicate. Although there is a sindex available for the “Case_last_week” column, the sindex for probable_dd
will be used because that is what you provided as a hint.
trino:test> select state_ter, total_cases, case_last_week from covid_stats where probable_dd < 300 and case_last_week > 1000; state_ter | total_cases | case_last_week----------------+-------------+---------------- Wyoming | 17310 | 3587 Pennsylvania | 230894 | 20958 Alaska | 18716 | 3094 Minnesota | 180862 | 30190 Tennessee | 281851 | 20425 Utah | 132621 | 16111 Montana | 39679 | 6184
Querying namespaces that do not include sets
Suppose that you want to run queries against a namespace that does not include a set. You can use the __default
keyword, which specifies to query the null set. The schema is inferred.
Here is an example which returns all records that are in the namespace test
:
trino:test> select * from __default
The configuration property aerospike.default-set-name
lets you name the null set. For example, if you set the value to set0
, you can run this statement against your namespace:
trino:test> select * from set0
Using JSON functions for CDT access
trino:test> select json_extract(map, '$.k2') as k2 from trino where json_extract(map, '$.k2') is not null limit 2; k2----- "2" "2"(2 rows)
Query 20201020_155215_00001_umepu, FINISHED, 1 nodeSplits: 37 total, 35 done (94.59%)0:01 [8.31K rows, 177KB] [6.42K rows/s, 136KB/s]
Insert query
If the list of column names is specified, they must exactly match the list of columns produced by the query. Each column in the table not present in the column list will be filled with a null
value. Otherwise, if the list of columns is not specified, the columns produced by the query must exactly match the columns in the table being inserted into.
trino:test> insert into trino (__key, id, int, bin1) values ('abcd', null, null, 11111);INSERT: 1 row
Query 20201103_160726_00004_3pbwu, FINISHED, 1 nodeSplits: 51 total, 51 done (100.00%)0:00 [0 rows, 0B] [0 rows/s, 0B/s]
Federated query
trino:test> select aerospike.test.trino.id, aerospike2.test.trino.id from aerospike.test.trino inner join aerospike2.test.trino on aerospike.test.trino.bin1=aerospike2.test.trino.bin1 where aerospike.test.trino.bin1<5; id | id------+------ id_1 | id_1 id_3 | id_3(2 rows)
Query 20201020_155814_00003_umepu, FINISHED, 1 nodeSplits: 73 total, 73 done (100.00%)0:03 [4 rows, 64B] [1 rows/s, 24B/s]
Aggregate functions
The Trino connector supports Trino’s aggregate functions. For a full list, see Aggregate functions.
The following example uses the count()
aggregate function.
trino:test> select trino.id, count(trino2.id) from trino2, trino where trino2.bin1=trino.bin1 group by trino.id order by count(trino2.id) desc;id | _col1------+-------id_7 | 1id_3 | 1id_5 | 1id_1 | 1id_9 | 1(5 rows)
Query 20201014_165146_00000_jfivj, FINISHED, 1 nodeSplits: 106 total, 106 done (100.00%)0:04 [10K rows, 213KB] [2.27K rows/s, 48.5KB/s]
Cost-Based Optimization (CBO)
Cost-Based Optimization (CBO) enabled
The Trino connector supports the row-count statistic only.
trino:test> explain select trino.id, count(trino2.id) from trino2, trino where trino2.bin1=trino.bin1 group by trino.id order by count(trino2.id) desc;
------------------------------------------------------------------------------------ Fragment 0 [SINGLE] Output layout: [id_1, count_5] Output partitioning: SINGLE [] Stage Execution Strategy: UNGROUPED_EXECUTION Output[id, _col1] │ Layout: [id_1:varchar, count_5:bigint] │ Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?} │ id := id_1 │ _col1 := count_5 └─ RemoteMerge[1] Layout: [id_1:varchar, count_5:bigint]
Fragment 1 [ROUND_ROBIN] Output layout: [id_1, count_5] Output partitioning: SINGLE [] Stage Execution Strategy: UNGROUPED_EXECUTION LocalMerge[count_5 DESC_NULLS_LAST] │ Layout: [id_1:varchar, count_5:bigint] │ Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?} └─ PartialSort[count_5 DESC_NULLS_LAST] │ Layout: [id_1:varchar, count_5:bigint] └─ RemoteSource[2] Layout: [id_1:varchar, count_5:bigint]
Fragment 2 [HASH] Output layout: [id_1, count_5] Output partitioning: ROUND_ROBIN [] Stage Execution Strategy: UNGROUPED_EXECUTION Project[] │ Layout: [id_1:varchar, count_5:bigint] │ Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?} └─ Aggregate(FINAL)[id_1][$hashvalue] │ Layout: [id_1:varchar, $hashvalue:bigint, count_5:bigint] │ Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?} │ count_5 := count("count_15") └─ LocalExchange[HASH][$hashvalue] ("id_1") │ Layout: [id_1:varchar, count_15:bigint, $hashvalue:bigint] │ Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?} └─ RemoteSource[3] Layout: [id_1:varchar, count_15:bigint, $hashvalue_16:bigint]
Fragment 3 [SOURCE] Output layout: [id_1, count_15, $hashvalue_21] Output partitioning: HASH [id_1][$hashvalue_21] Stage Execution Strategy: UNGROUPED_EXECUTION Aggregate(PARTIAL)[id_1][$hashvalue_21] │ Layout: [id_1:varchar, $hashvalue_21:bigint, count_15:bigint] │ count_15 := count("id") └─ Project[] │ Layout: [id_1:varchar, id:varchar, $hashvalue_21:bigint] │ Estimates: {rows: ? (?), cpu: ?, memory: 730B, network: 730B} │ $hashvalue_21 := combine_hash(bigint '0', COALESCE("$operator$hash_code" └─ InnerJoin[("bin1_0" = "bin1")][$hashvalue_17, $hashvalue_18] │ Layout: [id_1:varchar, id:varchar] │ Estimates: {rows: ? (?), cpu: ?, memory: 730B, network: 730B} │ Distribution: REPLICATED │ dynamicFilterAssignments = {bin1 -> df_382} ├─ ScanFilterProject[table = aerospike:aerospike:test:trino, grouped = f │ Layout: [bin1_0:bigint, id_1:varchar, $hashvalue_17:bigint] │ Estimates: {rows: 10000 (712.89kB), cpu: 625k, memory: 0B, network │ $hashvalue_17 := combine_hash(bigint '0', COALESCE("$operator$hash │ bin1_0 := AerospikeColumnHandle{name=bin1, type=bigint, hidden=fal │ id_1 := AerospikeColumnHandle{name=id, type=varchar, hidden=false} └─ LocalExchange[HASH][$hashvalue_18] ("bin1") │ Layout: [bin1:bigint, id:varchar, $hashvalue_18:bigint] │ Estimates: {rows: 10 (730B), cpu: 2.05k, memory: 0B, network: 730B └─ RemoteSource[4] Layout: [bin1:bigint, id:varchar, $hashvalue_19:bigint]
Fragment 4 [SOURCE] Output layout: [bin1, id, $hashvalue_20] Output partitioning: BROADCAST [] Stage Execution Strategy: UNGROUPED_EXECUTION ScanProject[table = aerospike:aerospike:test:trino2, grouped = false] Layout: [bin1:bigint, id:varchar, $hashvalue_20:bigint] Estimates: {rows: 10 (730B), cpu: 640, memory: 0B, network: 0B}/{rows: 10 ( $hashvalue_20 := combine_hash(bigint '0', COALESCE("$operator$hash_code"("b bin1 := AerospikeColumnHandle{name=bin1, type=bigint, hidden=false} id := AerospikeColumnHandle{name=id, type=varchar, hidden=false}
trino:test> select trino.id, count(trino2.id) from trino2, trino where trino2.bin1=trino.bin1 group by trino.id order by count(trino2.id) desc; id | _col1------+------- id_7 | 1 id_3 | 1 id_5 | 1 id_9 | 1 id_1 | 1(5 rows)
Query 20201014_165011_00003_qmngg, FINISHED, 1 nodeSplits: 90 total, 90 done (100.00%)0:03 [10K rows, 214KB] [3.06K rows/s, 65.4KB/s]
Cost-Based Optimization (CBO) disabled
trino:test> explain select trino.id, count(trino2.id) from trino2, trino where trino2.bin1=trino.bin1 group by trino.id order by count(trino2.id) desc;
------------------------------------------------------------------------------------ Fragment 0 [SINGLE] Output layout: [id_1, count_5] Output partitioning: SINGLE [] Stage Execution Strategy: UNGROUPED_EXECUTION Output[id, _col1] │ Layout: [id_1:varchar, count_5:bigint] │ Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?} │ id := id_1 │ _col1 := count_5 └─ RemoteMerge[1] Layout: [id_1:varchar, count_5:bigint]
Fragment 1 [ROUND_ROBIN] Output layout: [id_1, count_5] Output partitioning: SINGLE [] Stage Execution Strategy: UNGROUPED_EXECUTION LocalMerge[count_5 DESC_NULLS_LAST] │ Layout: [id_1:varchar, count_5:bigint] │ Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?} └─ PartialSort[count_5 DESC_NULLS_LAST] │ Layout: [id_1:varchar, count_5:bigint] └─ RemoteSource[2] Layout: [id_1:varchar, count_5:bigint]
Fragment 2 [HASH] Output layout: [id_1, count_5] Output partitioning: ROUND_ROBIN [] Stage Execution Strategy: UNGROUPED_EXECUTION Project[] │ Layout: [id_1:varchar, count_5:bigint] │ Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?} └─ Aggregate(FINAL)[id_1][$hashvalue] │ Layout: [id_1:varchar, $hashvalue:bigint, count_5:bigint] │ Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?} │ count_5 := count("count_15") └─ LocalExchange[HASH][$hashvalue] ("id_1") │ Layout: [id_1:varchar, count_15:bigint, $hashvalue:bigint] │ Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?} └─ RemoteSource[3] Layout: [id_1:varchar, count_15:bigint, $hashvalue_16:bigint]
Fragment 3 [HASH] Output layout: [id_1, count_15, $hashvalue_22] Output partitioning: HASH [id_1][$hashvalue_22] Stage Execution Strategy: UNGROUPED_EXECUTION Aggregate(PARTIAL)[id_1][$hashvalue_22] │ Layout: [id_1:varchar, $hashvalue_22:bigint, count_15:bigint] │ count_15 := count("id") └─ Project[] │ Layout: [id:varchar, id_1:varchar, $hashvalue_22:bigint] │ Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?} │ $hashvalue_22 := combine_hash(bigint '0', COALESCE("$operator$hash_code" └─ InnerJoin[("bin1" = "bin1_0")][$hashvalue_17, $hashvalue_19] │ Layout: [id:varchar, id_1:varchar] │ Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?} │ Distribution: PARTITIONED │ dynamicFilterAssignments = {bin1_0 -> df_387} ├─ RemoteSource[4] │ Layout: [bin1:bigint, id:varchar, $hashvalue_17:bigint] └─ LocalExchange[HASH][$hashvalue_19] ("bin1_0") │ Layout: [bin1_0:bigint, id_1:varchar, $hashvalue_19:bigint] │ Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: ?} └─ RemoteSource[5] Layout: [bin1_0:bigint, id_1:varchar, $hashvalue_20:bigint]
Fragment 4 [SOURCE] Output layout: [bin1, id, $hashvalue_18] Output partitioning: HASH [bin1][$hashvalue_18] Stage Execution Strategy: UNGROUPED_EXECUTION ScanFilterProject[table = aerospike:aerospike:test:trino2, grouped = false, fi Layout: [bin1:bigint, id:varchar, $hashvalue_18:bigint] Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: 0B}/{rows: ? (?), cpu $hashvalue_18 := combine_hash(bigint '0', COALESCE("$operator$hash_code"("b bin1 := AerospikeColumnHandle{name=bin1, type=bigint, hidden=false} id := AerospikeColumnHandle{name=id, type=varchar, hidden=false}
Fragment 5 [SOURCE] Output layout: [bin1_0, id_1, $hashvalue_21] Output partitioning: HASH [bin1_0][$hashvalue_21] Stage Execution Strategy: UNGROUPED_EXECUTION ScanProject[table = aerospike:aerospike:test:trino, grouped = false] Layout: [bin1_0:bigint, id_1:varchar, $hashvalue_21:bigint] Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: 0B}/{rows: ? (?), cpu $hashvalue_21 := combine_hash(bigint '0', COALESCE("$operator$hash_code"("b bin1_0 := AerospikeColumnHandle{name=bin1, type=bigint, hidden=false} id_1 := AerospikeColumnHandle{name=id, type=varchar, hidden=false}
trino:test> select trino.id, count(trino2.id) from trino2, trino where trino2.bin1=trino.bin1 group by trino.id order by count(trino2.id) desc; id | _col1------+------- id_7 | 1 id_3 | 1 id_5 | 1 id_1 | 1 id_9 | 1(5 rows)
Query 20201014_165146_00000_jfivj, FINISHED, 1 nodeSplits: 106 total, 106 done (100.00%)0:04 [10K rows, 213KB] [2.27K rows/s, 48.5KB/s]