# Examples of querying Aerospike Database

The following examples illustrate querying Aerospike data using the Trino Command Line Interface (CLI).

## Simple select query

```plaintext
trino:test> select * from trino where bin1>9881 and bin1<9900;

 bin1 | __key |   id    |     list     | map  | int
------+-------+---------+--------------+------+------
 9893 | NULL  | id_9893 | NULL         | NULL | NULL
 9883 | NULL  | id_9883 | NULL         | NULL | NULL
 9899 | NULL  | id_9899 | NULL         | NULL | NULL
 9891 | NULL  | id_9891 | ["str_9891"] | NULL | NULL
 9885 | NULL  | id_9885 | ["str_9885"] | NULL | NULL
 9895 | NULL  | id_9895 | NULL         | NULL | NULL
 9887 | NULL  | id_9887 | NULL         | NULL | NULL
 9897 | NULL  | id_9897 | ["str_9897"] | NULL | NULL
 9889 | NULL  | id_9889 | NULL         | NULL | NULL
(9 rows)

Query 20201020_155406_00002_umepu, FINISHED, 1 node
Splits: 20 total, 20 done (100.00%)
0:01 [9 rows, 168B] [7 rows/s, 132B/s]
```

## Secondary index (sindex) query

In this example, secondary indexes were created on the two bins `case_last_week` and `probable_dd` in the Aerospike database. The secondary indexes are named `case_last_week_idx` and `probable_dd_idx`, respectively.

### Step 1 (OPTIONAL)

This step lists all the sindexes that are available for your query.

Secondary indexes are listed per Aerospike namespace (Trino schema). You can either run a `USE` statement to select the schema or prefix `__sindex` with the schema name; for example, `test.__sindex`, where `test` is the name of the schema.
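For example, assuming the connector's catalog is named `aerospike` (a hypothetical name; use your own catalog name), you can select the schema first and then query the sindex table:

```plaintext
trino> use aerospike.test;
trino:test> select * from __sindex;
```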

You can optionally change the table name from the default `__sindex` to a name of your choice using the `aerospike.index-table-name` property.
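If you do rename it, the property belongs in the connector's catalog properties file. A sketch, assuming the file `etc/catalog/aerospike.properties` and the hypothetical table name `sindex_catalog`:

```plaintext
# etc/catalog/aerospike.properties
aerospike.index-table-name=sindex_catalog
```

You would then query `sindex_catalog` instead of `__sindex`.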

```plaintext
trino:test> select * from __sindex;

  column_name   |    sindex_name     | sindex_type | schema_name | table_name
----------------+--------------------+-------------+-------------+-------------
 case_last_week | case_last_week_idx | NUMERIC     | test        | covid_stats
 probable_dd    | probable_dd_idx    | NUMERIC     | test        | covid_stats
(2 rows)

Query 20220420_190120_00005_78a96, FINISHED, 1 node
Splits: 9 total, 9 done (100.00%)
0.21 [2 rows, 102B] [9 rows/s, 477B/s]
```

### Step 2 (OPTIONAL)

In this step, you give the Trino connector a hint about which sindex you plan to query, using the `sindex_name` session property. Aerospike supports only one sindex per query. If more than one sindex is available, choose the one with the highest cardinality for the best performance.

If you skip this step, the connector picks a sindex by the lexical order of the sindex names. This can lead to an inefficient query if a sindex with lower cardinality ranks higher lexically.

We recommend providing a sindex hint wherever possible.

```plaintext
trino:test> set session aerospike.sindex_name='probable_dd_idx';
SET SESSION
```

### Step 3

Although your query predicates can match more than one sindex, only one sindex is used per query. If multiple sindexes match, the one that ranks highest in lexical order is selected. We recommend that you not skip the previous two steps if the sindex cardinalities are known to you _a priori_.

```plaintext
trino:test> select state_ter, total_cases, case_last_week from covid_stats where probable_dd < 300;

           state_ter            | total_cases | case_last_week
--------------------------------+-------------+----------------
 North Dakota                   |       54305 |           8290
 Wisconsin                      |      281910 |          40487
 South Carolina                 |      185688 |           7665
 New Hampshire                  |       12488 |           1274
 Oregon                         |       50448 |           5019
 Vermont                        |        2392 |            213
 Hawaii                         |       15834 |            759
 Florida                        |      832525 |          35723
 Texas                          |      956234 |          51379
 Virgin Islands                 |        1405 |             27
 North Carolina                 |      293339 |          16647
 Oklahoma                       |      146692 |          16151
 Missouri                       |      209197 |          23662
 Rhode Island                   |       36380 |           3506
```

The sindex selected for this query is used for subsequent queries in the session unless you reset the property or set another sindex within the same session. You can reset it using:

```plaintext
trino:test> reset session aerospike.sindex_name;
RESET SESSION
```

The following example shows multiple sindexes matching the same query's predicates. Although a sindex is available for the `case_last_week` column, the sindex for `probable_dd` is used because that is the hint you provided.

```plaintext
trino:test> select state_ter, total_cases, case_last_week from covid_stats where probable_dd < 300 and case_last_week > 1000;

   state_ter    | total_cases | case_last_week
----------------+-------------+----------------
 Wyoming        |       17310 |           3587
 Pennsylvania   |      230894 |          20958
 Alaska         |       18716 |           3094
 Minnesota      |      180862 |          30190
 Tennessee      |      281851 |          20425
 Utah           |      132621 |          16111
 Montana        |       39679 |           6184
```

## Querying namespaces that do not include sets

Suppose that you want to run queries against a namespace whose records are not stored in a set. Use the `__default` keyword, which queries the null set. The schema is inferred.

Here is an example which returns all records that are in the namespace `test`:

```plaintext
trino:test> select * from __default;
```

The `aerospike.default-set-name` configuration property lets you give the null set a name of your choice. For example, if you set the value to `set0`, you can run this statement against your namespace:

```plaintext
trino:test> select * from set0;
```
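Like other connector settings, `aerospike.default-set-name` is configured in the catalog properties file. A sketch, assuming the file is `etc/catalog/aerospike.properties`:

```plaintext
# etc/catalog/aerospike.properties
aerospike.default-set-name=set0
```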

## Using JSON functions for CDT access

```plaintext
trino:test> select json_extract(map, '$.k2') as k2 from trino where json_extract(map, '$.k2') is not null limit 2;

 k2
-----
 "2"
 "2"
(2 rows)

Query 20201020_155215_00001_umepu, FINISHED, 1 node
Splits: 37 total, 35 done (94.59%)
0:01 [8.31K rows, 177KB] [6.42K rows/s, 136KB/s]
```
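Trino's `json_extract_scalar` function returns the matched value as a plain varchar rather than as quoted JSON text. A variant of the query above, under the same table and bin assumptions:

```plaintext
trino:test> select json_extract_scalar(map, '$.k2') as k2 from trino where json_extract(map, '$.k2') is not null limit 2;
```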

## Insert query

If the list of column names is specified, they must exactly match the list of columns produced by the query. Each column in the table not present in the column list will be filled with a `null` value. Otherwise, if the list of columns is not specified, the columns produced by the query must exactly match the columns in the table being inserted into.

::: note
The connector only supports `READ UNCOMMITTED` transaction isolation level.
:::

```plaintext
trino:test> insert into trino (__key, id, int, bin1) values ('abcd', null, null, 11111);

INSERT: 1 row

Query 20201103_160726_00004_3pbwu, FINISHED, 1 node
Splits: 51 total, 51 done (100.00%)
0:00 [0 rows, 0B] [0 rows/s, 0B/s]
```
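When the column list is omitted, the query's columns must match the table's columns in number and order. A hypothetical sketch, assuming the column order shown by the earlier `select *` output (`bin1`, `__key`, `id`, `list`, `map`, `int`):

```plaintext
trino:test> insert into trino values (11112, 'efgh', 'id_11112', null, null, null);
```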

## Federated query

```plaintext
trino:test> select aerospike.test.trino.id, aerospike2.test.trino.id from aerospike.test.trino inner join aerospike2.test.trino on aerospike.test.trino.bin1=aerospike2.test.trino.bin1 where aerospike.test.trino.bin1<5;

  id  |  id
------+------
 id_1 | id_1
 id_3 | id_3
(2 rows)

Query 20201020_155814_00003_umepu, FINISHED, 1 node
Splits: 73 total, 73 done (100.00%)
0:03 [4 rows, 64B] [1 rows/s, 24B/s]
```

## Aggregate functions

The Trino connector supports Trino’s aggregate functions. For a full list, see [Aggregate functions](https://trino.io/docs/current/functions/aggregate.html).

The following example uses the `count()` aggregate function.

```plaintext
trino:test> select trino.id, count(trino2.id) from trino2, trino where trino2.bin1=trino.bin1 group by trino.id order by count(trino2.id) desc;

  id  | _col1
------+-------
 id_7 |     1
 id_3 |     1
 id_5 |     1
 id_1 |     1
 id_9 |     1
(5 rows)

Query 20201014_165146_00000_jfivj, FINISHED, 1 node
Splits: 106 total, 106 done (100.00%)
0:04 [10K rows, 213KB] [2.27K rows/s, 48.5KB/s]
```

## Cost-Based Optimization (CBO)

### CBO enabled

The Trino connector supports the row-count statistic only.
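To see the statistics the optimizer has available, you can run Trino's standard `SHOW STATS` statement; with this connector, only the row count is expected to be populated:

```plaintext
trino:test> show stats for trino;
```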

```plaintext
trino:test> explain select trino.id, count(trino2.id) from trino2, trino where trino2.bin1=trino.bin1 group by trino.id order by count(trino2.id) desc;

------------------------------------------------------------------------------------
 Fragment 0 [SINGLE]
     Output layout: [id_1, count_5]
     Output partitioning: SINGLE []
     Stage Execution Strategy: UNGROUPED_EXECUTION
     Output[id, _col1]
     │   Layout: [id_1:varchar, count_5:bigint]
     │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}
     │   id := id_1
     │   _col1 := count_5
     └─ RemoteMerge[1]
            Layout: [id_1:varchar, count_5:bigint]

 Fragment 1 [ROUND_ROBIN]
     Output layout: [id_1, count_5]
     Output partitioning: SINGLE []
     Stage Execution Strategy: UNGROUPED_EXECUTION
     LocalMerge[count_5 DESC_NULLS_LAST]
     │   Layout: [id_1:varchar, count_5:bigint]
     │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}
     └─ PartialSort[count_5 DESC_NULLS_LAST]
        │   Layout: [id_1:varchar, count_5:bigint]
        └─ RemoteSource[2]
               Layout: [id_1:varchar, count_5:bigint]

 Fragment 2 [HASH]
     Output layout: [id_1, count_5]
     Output partitioning: ROUND_ROBIN []
     Stage Execution Strategy: UNGROUPED_EXECUTION
     Project[]
     │   Layout: [id_1:varchar, count_5:bigint]
     │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}
     └─ Aggregate(FINAL)[id_1][$hashvalue]
        │   Layout: [id_1:varchar, $hashvalue:bigint, count_5:bigint]
        │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}
        │   count_5 := count("count_15")
        └─ LocalExchange[HASH][$hashvalue] ("id_1")
           │   Layout: [id_1:varchar, count_15:bigint, $hashvalue:bigint]
           │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}
           └─ RemoteSource[3]
                  Layout: [id_1:varchar, count_15:bigint, $hashvalue_16:bigint]

 Fragment 3 [SOURCE]
     Output layout: [id_1, count_15, $hashvalue_21]
     Output partitioning: HASH [id_1][$hashvalue_21]
     Stage Execution Strategy: UNGROUPED_EXECUTION
     Aggregate(PARTIAL)[id_1][$hashvalue_21]
     │   Layout: [id_1:varchar, $hashvalue_21:bigint, count_15:bigint]
     │   count_15 := count("id")
     └─ Project[]
        │   Layout: [id_1:varchar, id:varchar, $hashvalue_21:bigint]
        │   Estimates: {rows: ? (?), cpu: ?, memory: 730B, network: 730B}
        │   $hashvalue_21 := combine_hash(bigint '0', COALESCE("$operator$hash_code"
        └─ InnerJoin[("bin1_0" = "bin1")][$hashvalue_17, $hashvalue_18]
           │   Layout: [id_1:varchar, id:varchar]
           │   Estimates: {rows: ? (?), cpu: ?, memory: 730B, network: 730B}
           │   Distribution: REPLICATED
           │   dynamicFilterAssignments = {bin1 -> df_382}
           ├─ ScanFilterProject[table = aerospike:aerospike:test:trino, grouped = f
           │      Layout: [bin1_0:bigint, id_1:varchar, $hashvalue_17:bigint]
           │      Estimates: {rows: 10000 (712.89kB), cpu: 625k, memory: 0B, network
           │      $hashvalue_17 := combine_hash(bigint '0', COALESCE("$operator$hash
           │      bin1_0 := AerospikeColumnHandle{name=bin1, type=bigint, hidden=fal
           │      id_1 := AerospikeColumnHandle{name=id, type=varchar, hidden=false}
           └─ LocalExchange[HASH][$hashvalue_18] ("bin1")
              │   Layout: [bin1:bigint, id:varchar, $hashvalue_18:bigint]
              │   Estimates: {rows: 10 (730B), cpu: 2.05k, memory: 0B, network: 730B
              └─ RemoteSource[4]
                     Layout: [bin1:bigint, id:varchar, $hashvalue_19:bigint]

 Fragment 4 [SOURCE]
     Output layout: [bin1, id, $hashvalue_20]
     Output partitioning: BROADCAST []
     Stage Execution Strategy: UNGROUPED_EXECUTION
     ScanProject[table = aerospike:aerospike:test:trino2, grouped = false]
         Layout: [bin1:bigint, id:varchar, $hashvalue_20:bigint]
         Estimates: {rows: 10 (730B), cpu: 640, memory: 0B, network: 0B}/{rows: 10 (
         $hashvalue_20 := combine_hash(bigint '0', COALESCE("$operator$hash_code"("b
         bin1 := AerospikeColumnHandle{name=bin1, type=bigint, hidden=false}
         id := AerospikeColumnHandle{name=id, type=varchar, hidden=false}
```

```plaintext
trino:test> select trino.id, count(trino2.id) from trino2, trino where trino2.bin1=trino.bin1 group by trino.id order by count(trino2.id) desc;

  id  | _col1
------+-------
 id_7 |     1
 id_3 |     1
 id_5 |     1
 id_9 |     1
 id_1 |     1
(5 rows)

Query 20201014_165011_00003_qmngg, FINISHED, 1 node
Splits: 90 total, 90 done (100.00%)
0:03 [10K rows, 214KB] [3.06K rows/s, 65.4KB/s]
```

### CBO disabled

```plaintext
trino:test> explain select trino.id, count(trino2.id) from trino2, trino where trino2.bin1=trino.bin1 group by trino.id order by count(trino2.id) desc;

------------------------------------------------------------------------------------
 Fragment 0 [SINGLE]
     Output layout: [id_1, count_5]
     Output partitioning: SINGLE []
     Stage Execution Strategy: UNGROUPED_EXECUTION
     Output[id, _col1]
     │   Layout: [id_1:varchar, count_5:bigint]
     │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}
     │   id := id_1
     │   _col1 := count_5
     └─ RemoteMerge[1]
            Layout: [id_1:varchar, count_5:bigint]

 Fragment 1 [ROUND_ROBIN]
     Output layout: [id_1, count_5]
     Output partitioning: SINGLE []
     Stage Execution Strategy: UNGROUPED_EXECUTION
     LocalMerge[count_5 DESC_NULLS_LAST]
     │   Layout: [id_1:varchar, count_5:bigint]
     │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}
     └─ PartialSort[count_5 DESC_NULLS_LAST]
        │   Layout: [id_1:varchar, count_5:bigint]
        └─ RemoteSource[2]
               Layout: [id_1:varchar, count_5:bigint]

 Fragment 2 [HASH]
     Output layout: [id_1, count_5]
     Output partitioning: ROUND_ROBIN []
     Stage Execution Strategy: UNGROUPED_EXECUTION
     Project[]
     │   Layout: [id_1:varchar, count_5:bigint]
     │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}
     └─ Aggregate(FINAL)[id_1][$hashvalue]
        │   Layout: [id_1:varchar, $hashvalue:bigint, count_5:bigint]
        │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}
        │   count_5 := count("count_15")
        └─ LocalExchange[HASH][$hashvalue] ("id_1")
           │   Layout: [id_1:varchar, count_15:bigint, $hashvalue:bigint]
           │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}
           └─ RemoteSource[3]
                  Layout: [id_1:varchar, count_15:bigint, $hashvalue_16:bigint]

 Fragment 3 [HASH]
     Output layout: [id_1, count_15, $hashvalue_22]
     Output partitioning: HASH [id_1][$hashvalue_22]
     Stage Execution Strategy: UNGROUPED_EXECUTION
     Aggregate(PARTIAL)[id_1][$hashvalue_22]
     │   Layout: [id_1:varchar, $hashvalue_22:bigint, count_15:bigint]
     │   count_15 := count("id")
     └─ Project[]
        │   Layout: [id:varchar, id_1:varchar, $hashvalue_22:bigint]
        │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}
        │   $hashvalue_22 := combine_hash(bigint '0', COALESCE("$operator$hash_code"
        └─ InnerJoin[("bin1" = "bin1_0")][$hashvalue_17, $hashvalue_19]
           │   Layout: [id:varchar, id_1:varchar]
           │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}
           │   Distribution: PARTITIONED
           │   dynamicFilterAssignments = {bin1_0 -> df_387}
           ├─ RemoteSource[4]
           │      Layout: [bin1:bigint, id:varchar, $hashvalue_17:bigint]
           └─ LocalExchange[HASH][$hashvalue_19] ("bin1_0")
              │   Layout: [bin1_0:bigint, id_1:varchar, $hashvalue_19:bigint]
              │   Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: ?}
              └─ RemoteSource[5]
                     Layout: [bin1_0:bigint, id_1:varchar, $hashvalue_20:bigint]

 Fragment 4 [SOURCE]
     Output layout: [bin1, id, $hashvalue_18]
     Output partitioning: HASH [bin1][$hashvalue_18]
     Stage Execution Strategy: UNGROUPED_EXECUTION
     ScanFilterProject[table = aerospike:aerospike:test:trino2, grouped = false, fi
         Layout: [bin1:bigint, id:varchar, $hashvalue_18:bigint]
         Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: 0B}/{rows: ? (?), cpu
         $hashvalue_18 := combine_hash(bigint '0', COALESCE("$operator$hash_code"("b
         bin1 := AerospikeColumnHandle{name=bin1, type=bigint, hidden=false}
         id := AerospikeColumnHandle{name=id, type=varchar, hidden=false}

 Fragment 5 [SOURCE]
     Output layout: [bin1_0, id_1, $hashvalue_21]
     Output partitioning: HASH [bin1_0][$hashvalue_21]
     Stage Execution Strategy: UNGROUPED_EXECUTION
     ScanProject[table = aerospike:aerospike:test:trino, grouped = false]
         Layout: [bin1_0:bigint, id_1:varchar, $hashvalue_21:bigint]
         Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: 0B}/{rows: ? (?), cpu
         $hashvalue_21 := combine_hash(bigint '0', COALESCE("$operator$hash_code"("b
         bin1_0 := AerospikeColumnHandle{name=bin1, type=bigint, hidden=false}
         id_1 := AerospikeColumnHandle{name=id, type=varchar, hidden=false}
```

```plaintext
trino:test> select trino.id, count(trino2.id) from trino2, trino where trino2.bin1=trino.bin1 group by trino.id order by count(trino2.id) desc;

  id  | _col1
------+-------
 id_7 |     1
 id_3 |     1
 id_5 |     1
 id_1 |     1
 id_9 |     1
(5 rows)

Query 20201014_165146_00000_jfivj, FINISHED, 1 node
Splits: 106 total, 106 done (100.00%)
0:04 [10K rows, 213KB] [2.27K rows/s, 48.5KB/s]
```

::: note
With CBO enabled, the same query ran more quickly because 15% fewer splits were processed and one fewer fragment was executed for a 10K-record table. Had the table been larger, the performance gain would have been greater.
:::