# Examples of querying Aerospike Database

The following examples illustrate querying Aerospike data using the Trino Command Line Interface (CLI).

## Simple select query

```plaintext
trino:test> select * from trino where bin1>9881 and bin1<9900;

 bin1 | __key |   id    |     list     | map  | int
------+-------+---------+--------------+------+------
 9893 | NULL  | id_9893 | NULL         | NULL | NULL
 9883 | NULL  | id_9883 | NULL         | NULL | NULL
 9899 | NULL  | id_9899 | NULL         | NULL | NULL
 9891 | NULL  | id_9891 | ["str_9891"] | NULL | NULL
 9885 | NULL  | id_9885 | ["str_9885"] | NULL | NULL
 9895 | NULL  | id_9895 | NULL         | NULL | NULL
 9887 | NULL  | id_9887 | NULL         | NULL | NULL
 9897 | NULL  | id_9897 | ["str_9897"] | NULL | NULL
 9889 | NULL  | id_9889 | NULL         | NULL | NULL
(9 rows)

Query 20201020_155406_00002_umepu, FINISHED, 1 node
Splits: 20 total, 20 done (100.00%)
0:01 [9 rows, 168B] [7 rows/s, 132B/s]
```

## Secondary index (sindex) query

In this example, secondary indexes were created on the two bins `case_last_week` and `probable_dd` in the Aerospike database. The secondary indexes are named `case_last_week_idx` and `probable_dd_idx`, respectively.

### Step 1 (OPTIONAL)

This step lists all the sindexes that are available for your query.

Secondary indexes are listed per Aerospike namespace (Trino schema). You can either run a `USE` statement to select the schema or prefix `__sindex` with the schema name; for example, `test.__sindex`, where `test` is the name of the schema.
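For example, assuming the connector's catalog is named `aerospike` (a hypothetical name; use your own catalog name), you can select the schema first and then query the sindex table:

```plaintext
trino> use aerospike.test;
trino:test> select * from __sindex;
```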

You can optionally change the table name from the default `__sindex` to a name of your choice using the `aerospike.index-table-name` property.
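If you do rename it, the property belongs in the connector's catalog properties file. A sketch, assuming the file `etc/catalog/aerospike.properties` and the hypothetical table name `sindex_catalog`:

```plaintext
# etc/catalog/aerospike.properties
aerospike.index-table-name=sindex_catalog
```

You would then query `sindex_catalog` instead of `__sindex`.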

```plaintext
trino:test> select * from __sindex;

  column_name   |    sindex_name     | sindex_type | schema_name | table_name
----------------+--------------------+-------------+-------------+-------------
 case_last_week | case_last_week_idx | NUMERIC     | test        | covid_stats
 probable_dd    | probable_dd_idx    | NUMERIC     | test        | covid_stats
(2 rows)

Query 20220420_190120_00005_78a96, FINISHED, 1 node
Splits: 9 total, 9 done (100.00%)
0.21 [2 rows, 102B] [9 rows/s, 477B/s]
```

### Step 2 (OPTIONAL)

In this step, you give the Trino connector a hint about which sindex you plan to query, using the `sindex_name` session property. Aerospike supports only one sindex per query. If more than one sindex is available, choose the one with the highest cardinality for the best performance.

If you skip this step, the connector picks a sindex by the lexical order of the sindex names. This can lead to an inefficient query if a sindex with lower cardinality ranks higher lexically.

We recommend providing a sindex hint wherever possible.

```plaintext
trino:test> set session aerospike.sindex_name='probable_dd_idx';
SET SESSION
```

### Step 3

Although your query predicates can match more than one sindex, only one sindex is used per query. If multiple sindexes match, the one that ranks highest in lexical order is selected. We recommend that you not skip the previous two steps if the sindex cardinalities are known to you _a priori_.

```plaintext
trino:test> select state_ter, total_cases, case_last_week from covid_stats where probable_dd < 300;

           state_ter            | total_cases | case_last_week
--------------------------------+-------------+----------------
 North Dakota                   |       54305 |           8290
 Wisconsin                      |      281910 |          40487
 South Carolina                 |      185688 |           7665
 New Hampshire                  |       12488 |           1274
 Oregon                         |       50448 |           5019
 Vermont                        |        2392 |            213
 Hawaii                         |       15834 |            759
 Florida                        |      832525 |          35723
 Texas                          |      956234 |          51379
 Virgin Islands                 |        1405 |             27
 North Carolina                 |      293339 |          16647
 Oklahoma                       |      146692 |          16151
 Missouri                       |      209197 |          23662
 Rhode Island                   |       36380 |           3506
```

The sindex selected for this query is used for subsequent queries in the session unless you reset the property or set another sindex within the same session. You can reset it using:

```plaintext
trino:test> reset session aerospike.sindex_name;
RESET SESSION
```

The following example shows multiple sindexes matching the same query's predicates. Although a sindex is available for the `case_last_week` column, the sindex for `probable_dd` is used because that is the hint you provided.

```plaintext
trino:test> select state_ter, total_cases, case_last_week from covid_stats where probable_dd < 300 and case_last_week > 1000;

   state_ter    | total_cases | case_last_week
----------------+-------------+----------------
 Wyoming        |       17310 |           3587
 Pennsylvania   |      230894 |          20958
 Alaska         |       18716 |           3094
 Minnesota      |      180862 |          30190
 Tennessee      |      281851 |          20425
 Utah           |      132621 |          16111
 Montana        |       39679 |           6184
```

## Querying namespaces that do not include sets

Suppose that you want to run queries against a namespace whose records are not stored in a set. Use the `__default` keyword, which queries the null set. The schema is inferred.

Here is an example which returns all records that are in the namespace `test`:

```plaintext
trino:test> select * from __default;
```

The `aerospike.default-set-name` configuration property lets you give the null set a name of your choice. For example, if you set the value to `set0`, you can run this statement against your namespace:

```plaintext
trino:test> select * from set0;
```
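Like other connector settings, `aerospike.default-set-name` is configured in the catalog properties file. A sketch, assuming the file is `etc/catalog/aerospike.properties`:

```plaintext
# etc/catalog/aerospike.properties
aerospike.default-set-name=set0
```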

## Using JSON functions for CDT access

```plaintext
trino:test> select json_extract(map, '$.k2') as k2 from trino where json_extract(map, '$.k2') is not null limit 2;

 k2
-----
 "2"
 "2"
(2 rows)

Query 20201020_155215_00001_umepu, FINISHED, 1 node
Splits: 37 total, 35 done (94.59%)
0:01 [8.31K rows, 177KB] [6.42K rows/s, 136KB/s]
```
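Trino's `json_extract_scalar` function returns the matched value as a plain varchar rather than as quoted JSON text. A variant of the query above, under the same table and bin assumptions:

```plaintext
trino:test> select json_extract_scalar(map, '$.k2') as k2 from trino where json_extract(map, '$.k2') is not null limit 2;
```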

## Insert query

If the list of column names is specified, they must exactly match the list of columns produced by the query. Each column in the table not present in the column list will be filled with a `null` value. Otherwise, if the list of columns is not specified, the columns produced by the query must exactly match the columns in the table being inserted into.

::: note
The connector only supports `READ UNCOMMITTED` transaction isolation level.
:::

```plaintext
trino:test> insert into trino (__key, id, int, bin1) values ('abcd', null, null, 11111);

INSERT: 1 row

Query 20201103_160726_00004_3pbwu, FINISHED, 1 node
Splits: 51 total, 51 done (100.00%)
0:00 [0 rows, 0B] [0 rows/s, 0B/s]
```
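When the column list is omitted, the query's columns must match the table's columns in number and order. A hypothetical sketch, assuming the column order shown by the earlier `select *` output (`bin1`, `__key`, `id`, `list`, `map`, `int`):

```plaintext
trino:test> insert into trino values (11112, 'efgh', 'id_11112', null, null, null);
```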

## Federated query

```plaintext
trino:test> select aerospike.test.trino.id, aerospike2.test.trino.id from aerospike.test.trino inner join aerospike2.test.trino on aerospike.test.trino.bin1=aerospike2.test.trino.bin1 where aerospike.test.trino.bin1<5;

  id  |  id
------+------
 id_1 | id_1
 id_3 | id_3
(2 rows)

Query 20201020_155814_00003_umepu, FINISHED, 1 node
Splits: 73 total, 73 done (100.00%)
0:03 [4 rows, 64B] [1 rows/s, 24B/s]
```

## Aggregate functions

The Trino connector supports Trino’s aggregate functions. For a full list, see [Aggregate functions](https://trino.io/docs/current/functions/aggregate.html).

The following example uses the `count()` aggregate function.

```plaintext
trino:test> select trino.id, count(trino2.id) from trino2, trino where trino2.bin1=trino.bin1 group by trino.id order by count(trino2.id) desc;

  id  | _col1
------+-------
 id_7 |     1
 id_3 |     1
 id_5 |     1
 id_1 |     1
 id_9 |     1
(5 rows)

Query 20201014_165146_00000_jfivj, FINISHED, 1 node
Splits: 106 total, 106 done (100.00%)
0:04 [10K rows, 213KB] [2.27K rows/s, 48.5KB/s]
```

## Cost-Based Optimization (CBO)

### CBO enabled

The Trino connector supports the row-count statistic only.
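To see the statistics the optimizer has available, you can run Trino's standard `SHOW STATS` statement; with this connector, only the row count is expected to be populated:

```plaintext
trino:test> show stats for trino;
```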

```plaintext
trino:test> explain select trino.id, count(trino2.id) from trino2, trino where trino2.bin1=trino.bin1 group by trino.id order by count(trino2.id) desc;

------------------------------------------------------------------------------------
 Fragment 0 [SINGLE]
     Output layout: [id_1, count_5]
     Output partitioning: SINGLE []
     Stage Execution Strategy: UNGROUPED_EXECUTION
     Output[id, _col1]
     │   Layout: [id_1:varchar, count_5:bigint]
     │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}
     │   id := id_1
     │   _col1 := count_5
     └─ RemoteMerge[1]
            Layout: [id_1:varchar, count_5:bigint]

 Fragment 1 [ROUND_ROBIN]
     Output layout: [id_1, count_5]
     Output partitioning: SINGLE []
     Stage Execution Strategy: UNGROUPED_EXECUTION
     LocalMerge[count_5 DESC_NULLS_LAST]
     │   Layout: [id_1:varchar, count_5:bigint]
     │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}
     └─ PartialSort[count_5 DESC_NULLS_LAST]
        │   Layout: [id_1:varchar, count_5:bigint]
        └─ RemoteSource[2]
               Layout: [id_1:varchar, count_5:bigint]

 Fragment 2 [HASH]
     Output layout: [id_1, count_5]
     Output partitioning: ROUND_ROBIN []
     Stage Execution Strategy: UNGROUPED_EXECUTION
     Project[]
     │   Layout: [id_1:varchar, count_5:bigint]
     │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}
     └─ Aggregate(FINAL)[id_1][$hashvalue]
        │   Layout: [id_1:varchar, $hashvalue:bigint, count_5:bigint]
        │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}
        │   count_5 := count("count_15")
        └─ LocalExchange[HASH][$hashvalue] ("id_1")
           │   Layout: [id_1:varchar, count_15:bigint, $hashvalue:bigint]
           │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}
           └─ RemoteSource[3]
                  Layout: [id_1:varchar, count_15:bigint, $hashvalue_16:bigint]

 Fragment 3 [SOURCE]
     Output layout: [id_1, count_15, $hashvalue_21]
     Output partitioning: HASH [id_1][$hashvalue_21]
     Stage Execution Strategy: UNGROUPED_EXECUTION
     Aggregate(PARTIAL)[id_1][$hashvalue_21]
     │   Layout: [id_1:varchar, $hashvalue_21:bigint, count_15:bigint]
     │   count_15 := count("id")
     └─ Project[]
        │   Layout: [id_1:varchar, id:varchar, $hashvalue_21:bigint]
        │   Estimates: {rows: ? (?), cpu: ?, memory: 730B, network: 730B}
        │   $hashvalue_21 := combine_hash(bigint '0', COALESCE("$operator$hash_code"
        └─ InnerJoin[("bin1_0" = "bin1")][$hashvalue_17, $hashvalue_18]
           │   Layout: [id_1:varchar, id:varchar]
           │   Estimates: {rows: ? (?), cpu: ?, memory: 730B, network: 730B}
           │   Distribution: REPLICATED
           │   dynamicFilterAssignments = {bin1 -> df_382}
           ├─ ScanFilterProject[table = aerospike:aerospike:test:trino, grouped = f
           │      Layout: [bin1_0:bigint, id_1:varchar, $hashvalue_17:bigint]
           │      Estimates: {rows: 10000 (712.89kB), cpu: 625k, memory: 0B, network
           │      $hashvalue_17 := combine_hash(bigint '0', COALESCE("$operator$hash
           │      bin1_0 := AerospikeColumnHandle{name=bin1, type=bigint, hidden=fal
           │      id_1 := AerospikeColumnHandle{name=id, type=varchar, hidden=false}
           └─ LocalExchange[HASH][$hashvalue_18] ("bin1")
              │   Layout: [bin1:bigint, id:varchar, $hashvalue_18:bigint]
              │   Estimates: {rows: 10 (730B), cpu: 2.05k, memory: 0B, network: 730B
              └─ RemoteSource[4]
                     Layout: [bin1:bigint, id:varchar, $hashvalue_19:bigint]

 Fragment 4 [SOURCE]
     Output layout: [bin1, id, $hashvalue_20]
     Output partitioning: BROADCAST []
     Stage Execution Strategy: UNGROUPED_EXECUTION
     ScanProject[table = aerospike:aerospike:test:trino2, grouped = false]
         Layout: [bin1:bigint, id:varchar, $hashvalue_20:bigint]
         Estimates: {rows: 10 (730B), cpu: 640, memory: 0B, network: 0B}/{rows: 10 (
         $hashvalue_20 := combine_hash(bigint '0', COALESCE("$operator$hash_code"("b
         bin1 := AerospikeColumnHandle{name=bin1, type=bigint, hidden=false}
         id := AerospikeColumnHandle{name=id, type=varchar, hidden=false}
```

```plaintext
trino:test> select trino.id, count(trino2.id) from trino2, trino where trino2.bin1=trino.bin1 group by trino.id order by count(trino2.id) desc;

  id  | _col1
------+-------
 id_7 |     1
 id_3 |     1
 id_5 |     1
 id_9 |     1
 id_1 |     1
(5 rows)

Query 20201014_165011_00003_qmngg, FINISHED, 1 node
Splits: 90 total, 90 done (100.00%)
0:03 [10K rows, 214KB] [3.06K rows/s, 65.4KB/s]
```

### CBO disabled

```plaintext
trino:test> explain select trino.id, count(trino2.id) from trino2, trino where trino2.bin1=trino.bin1 group by trino.id order by count(trino2.id) desc;

------------------------------------------------------------------------------------
 Fragment 0 [SINGLE]
     Output layout: [id_1, count_5]
     Output partitioning: SINGLE []
     Stage Execution Strategy: UNGROUPED_EXECUTION
     Output[id, _col1]
     │   Layout: [id_1:varchar, count_5:bigint]
     │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}
     │   id := id_1
     │   _col1 := count_5
     └─ RemoteMerge[1]
            Layout: [id_1:varchar, count_5:bigint]

 Fragment 1 [ROUND_ROBIN]
     Output layout: [id_1, count_5]
     Output partitioning: SINGLE []
     Stage Execution Strategy: UNGROUPED_EXECUTION
     LocalMerge[count_5 DESC_NULLS_LAST]
     │   Layout: [id_1:varchar, count_5:bigint]
     │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}
     └─ PartialSort[count_5 DESC_NULLS_LAST]
        │   Layout: [id_1:varchar, count_5:bigint]
        └─ RemoteSource[2]
               Layout: [id_1:varchar, count_5:bigint]

 Fragment 2 [HASH]
     Output layout: [id_1, count_5]
     Output partitioning: ROUND_ROBIN []
     Stage Execution Strategy: UNGROUPED_EXECUTION
     Project[]
     │   Layout: [id_1:varchar, count_5:bigint]
     │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}
     └─ Aggregate(FINAL)[id_1][$hashvalue]
        │   Layout: [id_1:varchar, $hashvalue:bigint, count_5:bigint]
        │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}
        │   count_5 := count("count_15")
        └─ LocalExchange[HASH][$hashvalue] ("id_1")
           │   Layout: [id_1:varchar, count_15:bigint, $hashvalue:bigint]
           │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}
           └─ RemoteSource[3]
                  Layout: [id_1:varchar, count_15:bigint, $hashvalue_16:bigint]

 Fragment 3 [HASH]
     Output layout: [id_1, count_15, $hashvalue_22]
     Output partitioning: HASH [id_1][$hashvalue_22]
     Stage Execution Strategy: UNGROUPED_EXECUTION
     Aggregate(PARTIAL)[id_1][$hashvalue_22]
     │   Layout: [id_1:varchar, $hashvalue_22:bigint, count_15:bigint]
     │   count_15 := count("id")
     └─ Project[]
        │   Layout: [id:varchar, id_1:varchar, $hashvalue_22:bigint]
        │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}
        │   $hashvalue_22 := combine_hash(bigint '0', COALESCE("$operator$hash_code"
        └─ InnerJoin[("bin1" = "bin1_0")][$hashvalue_17, $hashvalue_19]
           │   Layout: [id:varchar, id_1:varchar]
           │   Estimates: {rows: ? (?), cpu: ?, memory: ?, network: ?}
           │   Distribution: PARTITIONED
           │   dynamicFilterAssignments = {bin1_0 -> df_387}
           ├─ RemoteSource[4]
           │      Layout: [bin1:bigint, id:varchar, $hashvalue_17:bigint]
           └─ LocalExchange[HASH][$hashvalue_19] ("bin1_0")
              │   Layout: [bin1_0:bigint, id_1:varchar, $hashvalue_19:bigint]
              │   Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: ?}
              └─ RemoteSource[5]
                     Layout: [bin1_0:bigint, id_1:varchar, $hashvalue_20:bigint]

 Fragment 4 [SOURCE]
     Output layout: [bin1, id, $hashvalue_18]
     Output partitioning: HASH [bin1][$hashvalue_18]
     Stage Execution Strategy: UNGROUPED_EXECUTION
     ScanFilterProject[table = aerospike:aerospike:test:trino2, grouped = false, fi
         Layout: [bin1:bigint, id:varchar, $hashvalue_18:bigint]
         Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: 0B}/{rows: ? (?), cpu
         $hashvalue_18 := combine_hash(bigint '0', COALESCE("$operator$hash_code"("b
         bin1 := AerospikeColumnHandle{name=bin1, type=bigint, hidden=false}
         id := AerospikeColumnHandle{name=id, type=varchar, hidden=false}

 Fragment 5 [SOURCE]
     Output layout: [bin1_0, id_1, $hashvalue_21]
     Output partitioning: HASH [bin1_0][$hashvalue_21]
     Stage Execution Strategy: UNGROUPED_EXECUTION
     ScanProject[table = aerospike:aerospike:test:trino, grouped = false]
         Layout: [bin1_0:bigint, id_1:varchar, $hashvalue_21:bigint]
         Estimates: {rows: ? (?), cpu: ?, memory: 0B, network: 0B}/{rows: ? (?), cpu
         $hashvalue_21 := combine_hash(bigint '0', COALESCE("$operator$hash_code"("b
         bin1_0 := AerospikeColumnHandle{name=bin1, type=bigint, hidden=false}
         id_1 := AerospikeColumnHandle{name=id, type=varchar, hidden=false}
```

```plaintext
trino:test> select trino.id, count(trino2.id) from trino2, trino where trino2.bin1=trino.bin1 group by trino.id order by count(trino2.id) desc;

  id  | _col1
------+-------
 id_7 |     1
 id_3 |     1
 id_5 |     1
 id_1 |     1
 id_9 |     1
(5 rows)

Query 20201014_165146_00000_jfivj, FINISHED, 1 node
Splits: 106 total, 106 done (100.00%)
0:04 [10K rows, 213KB] [2.27K rows/s, 48.5KB/s]
```

::: note
With CBO enabled, the same query ran more quickly because 15% fewer splits were processed and one fewer fragment was executed for a 10K-record table. Had the table been larger, the performance gain would have been greater.
:::