# Indexing

## Overview

This page describes how to create indexes. Indexes can make graph database queries faster and more efficient. You can create vertex property and label indexes either with the Gremlin `call` API or with [configuration options](https://aerospike.com/docs/graph/reference/config). Configuration options may be specified either with a [properties file](https://aerospike.com/docs/graph/2.5.0/install/docker#specify-ags-configurations-using-environment-variables) or with [command-line options](https://aerospike.com/docs/graph/2.5.0/install/docker#configuration-options).

## Impact of indexes on traversals

A vertex property index affects only the first step of a traversal; subsequent steps do not use it. However, if a traversal’s initial steps filter on both an indexed property and a non-indexed property, Aerospike Graph Service (AGS) automatically reorders them so that the indexed step runs first.

For maximum benefit, index the vertex properties that a query can use to narrow the dataset down to the fewest vertices possible and start the traversal there. Properties that tend to have distinct values and a low level of duplication throughout the dataset are best to index.

For worked examples, see [Example traversals](#example-traversals).

## Manage indexes

Aerospike Graph Service (AGS) supports [secondary index](https://aerospike.com/docs/database/learn/architecture/data-storage/secondary-index) management using the Gremlin `call` API. You can create and drop secondary indexes, as well as get index status information.

### Create a secondary index

To create a secondary index on a vertex property, use the following command in the Gremlin console:

```plaintext
g.call("aerospike.graph.admin.index.create").
     with("element_type", "vertex").
     with("property_key", "<key>").next()
```

-   The `property_key` value must be the name of the property you want to index.
-   You can index any user-defined property or the `~label` field. The `~id` field is indexed automatically.

#### Create index examples

In the following example, a graph contains a user-defined vertex property called `name`. The following command creates a secondary index on the `name` property:

```plaintext
g.call("aerospike.graph.admin.index.create").
     with("element_type", "vertex").
     with("property_key", "name").next()
```

Expected output:

```txt
Vertex index creation of property key 'name' in progress.
```

The following example creates a secondary index on the vertex label:

```plaintext
g.call("aerospike.graph.admin.index.create").
     with("element_type", "vertex").
     with("property_key", "~label").next()
```

Creating an index on a property key that already has an index returns an error.

### Drop a secondary index

To drop an existing secondary index, use the following command in the Gremlin console:

```plaintext
g.call("aerospike.graph.admin.index.drop").
     with("element_type", "vertex").
     with("property_key", "<key>").next()
```

-   The `property_key` value must be the name of the property whose index you want to drop.

::: note
When you drop an index, any query that would have used that index is briefly unavailable while AGS rebuilds its index list.
:::

#### Index drop examples

In the following example, a graph contains a user-defined vertex property called `name` with a secondary index. The following command drops the secondary index on the `name` property:

```plaintext
g.call("aerospike.graph.admin.index.drop").
     with("element_type", "vertex").
     with("property_key", "name").next()
```

Expected output:

```txt
Vertex index of property key 'name' dropped.
```

The following example drops a secondary index on the vertex label:

```plaintext
g.call("aerospike.graph.admin.index.drop").
     with("element_type", "vertex").
     with("property_key", "~label").next()
```

### Index status

To get the status of a secondary index on a vertex property, use the following command in the Gremlin console:

```plaintext
g.call("aerospike.graph.admin.index.status").
     with("element_type", "vertex").
     with("property_key", "<key>").next()
```

-   The `property_key` value must be the name of the indexed property whose status you want.

The output contains the following fields:

-   `percent_complete`: Build progress of the index as a percentage from 0 to 100. Returns 100 when the index is complete and ready to use.
-   `total_entries`: Total number of entries in the index across all Aerospike nodes.
-   `total_used_bytes`: Total RAM used by the index, in bytes, across all Aerospike nodes.
-   `load_time`: Time in milliseconds taken to create the index.
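
Because `percent_complete` only reaches 100 once the index is ready, a client may want to poll the status before relying on the index. The following Python sketch shows one way to do that. It is not an AGS client: `wait_for_index` and `fetch_status` are hypothetical names, `fetch_status` stands in for whatever code submits the status command shown earlier, and the canned responses are made-up values used only to exercise the loop.

```python
import time
from typing import Callable, Dict


def wait_for_index(fetch_status: Callable[[], Dict[str, int]],
                   poll_seconds: float = 1.0,
                   max_polls: int = 60) -> Dict[str, int]:
    """Poll a status source until percent_complete reaches 100.

    fetch_status is a hypothetical callable that returns one status
    response as a dict keyed by the field names listed above.
    """
    for _ in range(max_polls):
        status = fetch_status()
        if status["percent_complete"] >= 100:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError("index did not finish building within the polling budget")


# Made-up responses standing in for three consecutive status calls.
canned = iter([
    {"percent_complete": 40, "total_entries": 400, "total_used_bytes": 8192, "load_time": 0},
    {"percent_complete": 85, "total_entries": 850, "total_used_bytes": 16384, "load_time": 0},
    {"percent_complete": 100, "total_entries": 1000, "total_used_bytes": 20480, "load_time": 1200},
])

final = wait_for_index(lambda: next(canned), poll_seconds=0.0)
print(final["percent_complete"], final["total_entries"])
```

Passing the status source in as a callable keeps the polling logic independent of any particular Gremlin driver.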
    

### List indexed property keys

To get a list of all property keys with existing secondary indexes, use the following command in the Gremlin console:

```plaintext
g.call("aerospike.graph.admin.index.list").next()
```

If successful, AGS returns a list of indexed property keys.

### Cardinality

In general, indexes with higher cardinality are more effective. To see examples of how indexes affect graph queries, see the [Impact of indexes on traversals](#impact-of-indexes-on-traversals) section of this page.

To get the cardinality of existing secondary indexes, use the following command in the Gremlin console:

```plaintext
g.call("aerospike.graph.admin.index.cardinality").next()
```

If successful, AGS returns a list of indexed property keys and the cardinality of each one. The cardinality of an index is the number of unique entries in that index.
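
To make the definition concrete, the following Python sketch counts unique values per property over a small in-memory sample. It only illustrates what "number of unique entries" means; it does not query AGS, the `cardinality` helper is a hypothetical name, and the sample vertices are invented for the example.

```python
from typing import Any, Dict, Iterable


def cardinality(vertices: Iterable[Dict[str, Any]], property_key: str) -> int:
    """Count the distinct values of property_key across the given vertices."""
    return len({v[property_key] for v in vertices if property_key in v})


# Invented sample data: every name is different, every country is the same.
people = [
    {"name": "Lyndon", "country": "USA"},
    {"name": "Simon", "country": "USA"},
    {"name": "Ana", "country": "USA"},
]

print(cardinality(people, "name"), cardinality(people, "country"))
```

In this sample, `name` has a higher cardinality than `country`, so an index on `name` would narrow a traversal to fewer starting vertices.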

## Index creation with configuration options

To create an index on a vertex property or label in AGS with configuration options, you can either:

-   Edit the [properties file](https://aerospike.com/docs/graph/2.5.0/install/docker#use-a-properties-file) you use to start the AGS Docker image.
-   Use the `-e` flag to specify configuration options as command-line arguments in the Docker command you use to start the AGS Docker image.

### Vertex property index creation

To create an index on a vertex property, add the configuration parameter `aerospike.graph.index.vertex.properties` to the properties file and assign it a comma-separated list of vertex property keys to index. In the following example, vertex properties `property_key1` and `property_key2` are specified for indexing:

```bash
aerospike.graph.index.vertex.properties=property_key1,property_key2
```

Vertex property indexes are taken as a union from all AGS instances. This means that if one AGS instance has an index on vertex property `property_key1` and another has an index on vertex property `property_key2`, AGS creates indexes for both properties. If an index is created on any AGS instance in a cluster, the other instances detect it and leverage it as well.
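
The union behavior can be pictured with plain sets. This Python sketch is only a model of the rule described above, not AGS code; `effective_indexes` is a hypothetical helper, and the two instance configurations reuse the placeholder keys from the example.

```python
from typing import Iterable, Set


def effective_indexes(per_instance: Iterable[Iterable[str]]) -> Set[str]:
    """Return the union of the property keys configured on each instance."""
    result: Set[str] = set()
    for keys in per_instance:
        result.update(keys)
    return result


instance_a = ["property_key1"]  # keys configured on one AGS instance
instance_b = ["property_key2"]  # keys configured on another AGS instance

combined = effective_indexes([instance_a, instance_b])
print(sorted(combined))
```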

When a vertex property index is first created on a dataset, the time it takes to create the index is proportional to the amount of data in the Aerospike database. Larger amounts of data take longer to index. You can create a property index either before or after populating the database with data, but creating it before is faster.

::: note
Vertex property indexes have a value limit of 2k bytes. Property values larger than 2k bytes cannot be indexed.
:::
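
If you want to check values against this limit before loading data, a pre-flight test along the following lines may help. This Python sketch rests on two assumptions that the text above does not confirm: that "2k bytes" means 2048 bytes, and that string values are measured by their UTF-8 encoding. `MAX_INDEXED_VALUE_BYTES` and `is_indexable` are hypothetical names.

```python
# Assumption: "2k bytes" is read as 2048 bytes.
MAX_INDEXED_VALUE_BYTES = 2048


def is_indexable(value: str, limit: int = MAX_INDEXED_VALUE_BYTES) -> bool:
    """Return True if the UTF-8 encoding of value fits within the limit."""
    return len(value.encode("utf-8")) <= limit


print(is_indexable("94105"))
print(is_indexable("x" * 3000))
```

Measuring encoded bytes rather than characters matters for non-ASCII text, where one character can occupy several bytes.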

### Vertex label index creation

To create indexes on all vertex labels, add the configuration parameter `aerospike.graph.index.vertex.label.enabled` to the properties file and set it to `true`.

```bash
aerospike.graph.index.vertex.label.enabled=true
```

If you create a label index on one AGS instance, all the other AGS instances in the cluster detect the change and use the same index.

#### Vertex label index creation example

Consider an Aerospike Graph database with the following schema:

```txt
VERTICES:
label: "Person"
{
    "name": "John Doe",
    "age": 30,
    "address": "123 Main St",
    "city": "San Francisco",
    "state": "CA",
    "country": "USA",
    "zip": "94105"
}

EDGES:
label: "knows"
{
}
```

To create an index on the `name` and `age` fields, as well as a vertex label index, add the following lines to the [properties file](https://aerospike.com/docs/graph/2.5.0/install/docker#use-a-properties-file):

```txt
aerospike.graph.index.vertex.properties=name,age
aerospike.graph.index.vertex.label.enabled=true
```
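
If the properties file is generated by a deployment script, the two settings can be rendered from a list of keys. The following Python sketch uses only the two parameter names documented on this page; `index_properties` is a hypothetical helper, not part of any AGS tooling.

```python
from typing import Sequence


def index_properties(property_keys: Sequence[str], label_index: bool) -> str:
    """Render the index-related lines of an AGS properties file."""
    lines = []
    if property_keys:
        # Comma-separated list of vertex property keys to index.
        lines.append("aerospike.graph.index.vertex.properties=" + ",".join(property_keys))
    if label_index:
        lines.append("aerospike.graph.index.vertex.label.enabled=true")
    return "\n".join(lines)


rendered = index_properties(["name", "age"], label_index=True)
print(rendered)
```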

### Example traversals

The following traversals use the schema and indexes shown in the [vertex label index creation example](#vertex-label-index-creation-example).

#### Single indexed vertex property

This traversal uses the index on the `name` field:

```txt
       ______ The first step uses the index, so it is fast and efficient.
       |
       |                      _______________ Subsequent steps do not use
       |                      |    |     |    the index because they are not at the
       |                      |    |     |    start of the traversal.
       v                      v    v     v
g.V().has("name", "Lyndon").out().in().has("name", "Simon").toList()
```

#### Single non-indexed vertex property

This traversal does not use an index and may perform poorly.

```txt
       ______ This step does not use an index and must scan the entire database
       |       for the `country` property.
       |
       |                      __________ These steps do not use the index because they
       |                     |     |     are not at the start of the traversal.
       v                     v     v
g.V().has("country", "USA").out().has("name", "Lyndon").toList()
```

#### One indexed and one unindexed vertex property

This traversal begins with two `has` steps, one on the unindexed `country` field and one on the indexed `name` field. AGS combines the two `has` steps and runs the indexed one first, improving the traversal’s performance.

```txt
g.V().has("country", "USA").has("name", "Lyndon").out().has("name", "Simon").toList()
```

#### Two indexed vertex properties

This traversal performs two initial `has` steps, both on indexed properties. AGS uses cardinality metadata from the Aerospike database to determine which step to run first for maximum efficiency.

::: note
Cardinality metadata in Aerospike is updated once per hour, so index efficiency information may not always be current.
:::

```txt
g.V().has("age", 29).has("name", "Lyndon").out().has("name", "Simon").toList()
```

#### Label index and indexed vertex property

This traversal’s first two steps are a `hasLabel` step which uses the instance’s label index, and a `has` step which uses the `name` property index. AGS performs the `has` step first, because property indexes usually have higher cardinality than label indexes.

```txt
g.V().hasLabel("Person").has("name", "Lyndon").out().has("name", "Simon").toList()
```

#### Label index and unindexed vertex property

This traversal begins with a `hasLabel` step which uses the instance’s label index, and a `has` step which involves the unindexed `country` property. AGS performs the `hasLabel` step first and uses the index, but the `country` step may be slow and inefficient.

```txt
g.V().hasLabel("Person").has("country", "USA").out().has("name", "Simon").toList()
```
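
The ordering behavior described in these examples can be summarized as a toy model: indexed filters run before unindexed ones, and among indexed filters the one with the highest cardinality runs first. The following Python sketch implements that simplified rule only. It is not how AGS plans traversals internally; `Filter`, `order_filters`, and the cardinality numbers are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class Filter:
    key: str       # property key, or "~label" for a hasLabel step
    value: object  # value the step filters on


def order_filters(filters: Sequence[Filter],
                  index_cardinality: Dict[str, int]) -> List[Filter]:
    """Order initial filter steps: indexed first, highest cardinality first.

    index_cardinality maps each indexed key to its cardinality; a key that
    is absent from the mapping is treated as unindexed.
    """
    def rank(item):
        position, flt = item
        card = index_cardinality.get(flt.key)
        if card is None:
            return (1, 0, position)   # unindexed: after all indexed filters
        return (0, -card, position)   # indexed: higher cardinality sorts earlier

    return [flt for _, flt in sorted(enumerate(filters), key=rank)]


# Made-up cardinalities for the indexes on name, age, and the vertex label.
index_cardinality = {"name": 5000, "age": 90, "~label": 1}

mixed = order_filters([Filter("country", "USA"), Filter("name", "Lyndon")], index_cardinality)
both = order_filters([Filter("age", 29), Filter("name", "Lyndon")], index_cardinality)
label = order_filters([Filter("~label", "Person"), Filter("name", "Lyndon")], index_cardinality)

print([f.key for f in mixed])
print([f.key for f in both])
print([f.key for f in label])
```

In all three cases the `name` filter is ordered first, matching the traversals above under the assumed cardinalities.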