Indexing
Overview
This page describes how to create indexes.
Indexes can make graph database queries faster and more efficient.
You can create vertex property and label indexes either with the Gremlin call API or with configuration options. You can specify configuration options either in a properties file or with command-line options.
Manage indexes
Aerospike Graph Service (AGS) supports secondary index management using the Gremlin call API. You can create and drop secondary indexes, as well as get index status information.
Create a secondary index
To create a secondary index on a vertex property, use the following command in the Gremlin console:
g.call("aerospike.graph.admin.index.create").
with("element_type", "vertex").
with("property_key", "<key>").next()
- The property_key value must be the name of the property you want to index.
- You can index any user-defined property or the ~label field. The ~id field is indexed automatically.
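To show why an indexed property lookup is cheaper than an unindexed one, here is a toy in-memory model in Python. It is a hypothetical illustration only: the vertex store, build_index, and scan are made up for this sketch and are not how AGS or the Aerospike database store or query secondary indexes.

```python
from collections import defaultdict

# Toy vertex store: vertex ID -> property map (hypothetical data layout).
vertices = {
    1: {"~label": "Person", "name": "Lyndon", "country": "USA"},
    2: {"~label": "Person", "name": "Simon", "country": "USA"},
    3: {"~label": "Person", "name": "Ana", "country": "Brazil"},
}

def build_index(store, key):
    """Map each value of `key` to the set of vertex IDs that hold it."""
    index = defaultdict(set)
    for vid, props in store.items():
        if key in props:
            index[props[key]].add(vid)
    return index

def scan(store, key, value):
    """Unindexed lookup: inspects every vertex; returns (matches, vertices inspected)."""
    inspected = 0
    hits = set()
    for vid, props in store.items():
        inspected += 1
        if props.get(key) == value:
            hits.add(vid)
    return hits, inspected

name_index = build_index(vertices, "name")
label_index = build_index(vertices, "~label")
print(sorted(name_index["Lyndon"]))      # [1]: answered by a direct lookup
print(scan(vertices, "name", "Lyndon"))  # ({1}, 3): every vertex was inspected
```

The same dictionary shape works for the ~label field, which is why a label index groups vertices by label in this model.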
Create index examples
- In the following example, a graph contains a user-defined vertex property called name. The following command creates a secondary index on the name property:
g.call("aerospike.graph.admin.index.create").
with("element_type", "vertex").
with("property_key", "name").next()
Expected output:
Vertex index creation of property key 'name' in progress.
- The following example creates a secondary index on the vertex label:
g.call("aerospike.graph.admin.index.create").
with("element_type", "vertex").
with("property_key", "~label").next()
Creating an index on a property key that already has an index returns an error.
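Because a duplicate create returns an error, a setup script may want to check the list of indexed keys before creating one. The following Python sketch models that pattern with a hypothetical in-memory IndexRegistry class standing in for the index.create and index.list calls; the class and the ensure_index helper are assumptions for illustration, not part of the AGS API.

```python
class IndexRegistry:
    """Hypothetical stand-in for the AGS admin index calls (not the real API)."""

    def __init__(self):
        self._keys = set()

    def create(self, key):
        # Mirrors the documented behavior: creating a duplicate index is an error.
        if key in self._keys:
            raise ValueError(f"index on property key '{key}' already exists")
        self._keys.add(key)

    def list(self):
        # Mirrors index.list: the property keys that currently have an index.
        return sorted(self._keys)


def ensure_index(registry, key):
    """Create an index only if list() does not already report it.

    Returns True if an index was created, False if it already existed.
    """
    if key in registry.list():
        return False
    registry.create(key)
    return True


registry = IndexRegistry()
print(ensure_index(registry, "name"))  # True: index created
print(ensure_index(registry, "name"))  # False: already indexed, no error raised
```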
Drop a secondary index
To drop an existing secondary index, use the following command in the Gremlin console:
g.call("aerospike.graph.admin.index.drop").
with("element_type", "vertex").
with("property_key", "<key>").next()
- The property_key value must be the name of the property whose index you want to drop.
When you drop an index, queries that would have used it are briefly unavailable while AGS rebuilds its index list.
Index drop examples
- In the following example, a graph contains a user-defined vertex property called name with a secondary index. The following command drops the secondary index on the name property:
g.call("aerospike.graph.admin.index.drop").
with("element_type", "vertex").
with("property_key", "name").next()
Expected output:
Vertex index of property key 'name' dropped.
- The following example drops a secondary index on the vertex label:
g.call("aerospike.graph.admin.index.drop").
with("element_type", "vertex").
with("property_key", "~label").next()
Index status
To get the status of a secondary index on a vertex property, use the following command in the Gremlin console:
g.call("aerospike.graph.admin.index.status").
with("element_type", "vertex").
with("property_key", "<key>").next()
- The property_key value must be the name of the indexed property whose status you want to get.
The output contains the following fields:
- percent_complete: Percentage of index creation completed, from 0 to 100. Returns 100 when the index is complete and ready to use.
- total_entries: Total number of entries in the index across all Aerospike nodes.
- total_used_bytes: Total RAM usage, in bytes, of the index across all Aerospike nodes.
- load_time: Time, in milliseconds, taken to create the index.
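Since an index is usable only once percent_complete reaches 100, a client may poll the status call until then. Below is a minimal Python sketch of such a loop, assuming the status comes back as a map of the fields above. fetch_status is a hypothetical caller-supplied function that would run the index.status call; the sleep and clock parameters exist only so the loop can be exercised without waiting.

```python
import time

def wait_for_index(fetch_status, poll_seconds=1.0, timeout_seconds=600.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll until percent_complete reaches 100, then return the final status.

    fetch_status: hypothetical function returning the status fields as a dict.
    Raises TimeoutError if the index is not complete before the deadline.
    """
    deadline = clock() + timeout_seconds
    while True:
        status = fetch_status()
        if status["percent_complete"] >= 100:
            return status
        if clock() >= deadline:
            raise TimeoutError(
                f"index only {status['percent_complete']}% complete at timeout")
        sleep(poll_seconds)


# Made-up snapshots standing in for two successive index.status calls.
snapshots = iter([
    {"percent_complete": 40, "total_entries": 400},
    {"percent_complete": 100, "total_entries": 1000},
])
final = wait_for_index(lambda: next(snapshots), sleep=lambda s: None)
print(final["total_entries"])  # 1000
```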
List indexed property keys
To get a list of all property keys with existing secondary indexes, use the following command in the Gremlin console:
g.call("aerospike.graph.admin.index.list").next()
If successful, AGS returns a list of indexed property keys.
Cardinality
In general, indexes with higher cardinality are more effective. To see examples of how indexes affect graph queries, see the Impact of indexes on traversals section of this page.
To get the cardinality of existing secondary indexes, use the following command at the Gremlin console:
g.call("aerospike.graph.admin.index.cardinality").next()
If successful, AGS returns a list of indexed property keys and the cardinality of each one. The cardinality of an index is the number of unique entries in that index.
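As a quick illustration of that definition, the Python sketch below counts distinct values for two made-up property columns. The ratio of entries to cardinality, the average number of vertices one lookup matches, is added here only to show why higher cardinality tends to make an index more selective. It is not a figure AGS reports.

```python
def cardinality(values):
    """Number of distinct values, matching the definition above."""
    return len(set(values))

# Made-up property values for four vertices.
names = ["Lyndon", "Simon", "Ana", "Mei"]
countries = ["USA", "USA", "Brazil", "USA"]

print(cardinality(names))      # 4
print(cardinality(countries))  # 2
# Average vertices matched per lookup value: lower means more selective.
print(len(names) / cardinality(names))          # 1.0
print(len(countries) / cardinality(countries))  # 2.0
```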
Index creation with configuration options
To create an index on a vertex property or label in AGS with configuration options, you can either:
- Edit the properties file you use to start the AGS Docker image.
- Use the -e flag to specify configuration options as command-line arguments in the Docker command you use to start the AGS Docker image.
Vertex property index creation
To create an index on a vertex property, add the configuration parameter aerospike.graph.index.vertex.properties to the properties file and assign it a comma-separated list of vertex property keys to index. In the following example, vertex properties property_key1 and property_key2 are specified for indexing:
aerospike.graph.index.vertex.properties=property_key1,property_key2
AGS takes the union of the vertex property indexes configured on all AGS instances. For example, if one AGS instance has an index on vertex property property_key1 and another has an index on vertex property property_key2, AGS creates indexes for both properties. If an index is created on any AGS instance in a cluster, the other instances detect it and use it as well.
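The union rule can be sketched in a few lines of Python, assuming each instance's aerospike.graph.index.vertex.properties setting is a plain comma-separated string. parse_index_keys is a hypothetical helper; its trimming of spaces and skipping of empty items are choices made for this sketch, not documented AGS parsing behavior.

```python
def parse_index_keys(value):
    """Split a comma-separated vertex.properties value into a set of keys.

    Assumption: surrounding spaces are trimmed and empty items ignored.
    """
    return {key.strip() for key in value.split(",") if key.strip()}

# Hypothetical settings on two instances of the same AGS cluster.
instance_a = "property_key1"
instance_b = "property_key2"

# The cluster-wide set of indexed keys is the union of every instance's keys.
effective = parse_index_keys(instance_a) | parse_index_keys(instance_b)
print(sorted(effective))  # ['property_key1', 'property_key2']
```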
When a vertex property index is first created on a dataset, the time it takes to build is proportional to the amount of data in the Aerospike database; larger datasets take longer to index. You can create a property index either before or after populating the database, but creating it before loading data is faster.
Vertex property indexes have a value size limit of 2k bytes. Property values larger than 2k bytes cannot be indexed.
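If you load data from your own code, a client-side pre-check can flag string values that exceed the limit. The Python sketch below is a rough approximation under two assumptions not stated on this page: that "2k bytes" means 2048 bytes, and that a string's size is its UTF-8 encoded length. INDEX_VALUE_LIMIT_BYTES and is_indexable are hypothetical names.

```python
# Assumption: "2k bytes" is read as 2048 bytes.
INDEX_VALUE_LIMIT_BYTES = 2 * 1024

def is_indexable(value: str) -> bool:
    """Rough pre-check: is the UTF-8 size of a string within the limit (assumption)?"""
    return len(value.encode("utf-8")) <= INDEX_VALUE_LIMIT_BYTES

print(is_indexable("123 Main St"))  # True
print(is_indexable("x" * 3000))     # False
```

Note that multi-byte characters count for more than one byte each under this measure, so a string can exceed the limit with fewer than 2048 characters.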
Vertex label index creation
To create indexes on all vertex labels, add the configuration parameter aerospike.graph.index.vertex.label.enabled to the properties file and set it to true.
aerospike.graph.index.vertex.label.enabled=true
If you create a label index on one AGS instance, all the other AGS instances in the cluster detect the change and use the same index.
Vertex label index creation example
Consider an Aerospike Graph database with the following schema:
VERTICES:
label: "Person"
{
"name": "John Doe",
"age": 30,
"address": "123 Main St",
"city": "San Francisco",
"state": "CA",
"country": "USA",
"zip": "94105"
}
EDGES:
label: "knows"
{
}
To create indexes on the name and age fields, as well as a vertex label index, add the following lines to the properties file:
aerospike.graph.index.vertex.properties=name,age
aerospike.graph.index.vertex.label.enabled=true
Impact of indexes on traversals
A vertex property index affects only the first step of a traversal. Subsequent steps are not affected. However, if a traversal's initial steps involve an indexed property and a non-indexed property, AGS reorders the steps automatically to perform the indexed property step first to obtain its benefit.
For maximum benefit, the best vertex properties to index are ones that a query can use to narrow the dataset down to one or very few vertices which the traversal can start from. Properties that tend to have distinct values and a low level of duplication throughout the dataset are best to index.
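The ordering rules on this page (indexed steps before unindexed ones, and among indexed steps the one with higher cardinality first) can be modeled with a small Python function. This is a toy sketch, not the AGS query planner: order_has_steps is hypothetical, a hasLabel step is modeled as a ("~label", value) pair, and the cardinality figures in the usage are made up.

```python
def order_has_steps(steps, indexed, cardinality):
    """Toy model of how leading has()/hasLabel() filters might be ordered.

    steps: (property_key, value) pairs from the traversal's initial filters.
    indexed: property keys that have an index ("~label" for the label index).
    cardinality: property_key -> distinct-value count for indexed keys.
    """
    def rank(step):
        key = step[0]
        if key in indexed:
            # Indexed steps first; among them, higher cardinality earlier.
            return (0, -cardinality.get(key, 0))
        # Unindexed steps go last; the stable sort keeps their relative order.
        return (1, 0)

    return sorted(steps, key=rank)


indexed = {"name", "age", "~label"}
card = {"name": 1000, "age": 80, "~label": 1}  # made-up cardinalities
print(order_has_steps([("country", "USA"), ("name", "Lyndon")], indexed, card))
# [('name', 'Lyndon'), ('country', 'USA')]
```

The same function reproduces the orderings described for the example traversals that follow, given cardinalities where property indexes outrank the label index.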
Example traversals
The following traversals use the schema and indexes shown in the vertex label index creation example.
Single indexed vertex property
This traversal uses the index on the name field:
g.V().has("name", "Lyndon").out().in().has("name", "Simon").toList()
- has("name", "Lyndon"): the first step uses the index, so it is fast and efficient.
- out(), in(), has("name", "Simon"): subsequent steps do not use the index because they are not at the start of the traversal.
Single non-indexed vertex property
This traversal does not use an index and may perform poorly:
g.V().has("country", "USA").out().has("name", "Lyndon").toList()
- has("country", "USA"): this step does not use an index and must scan the entire database for the country property.
- out(), has("name", "Lyndon"): these steps do not use an index because they are not at the start of the traversal.
One indexed and one unindexed vertex property
This traversal performs two has steps, one on the unindexed country field and one on the indexed name field. AGS compounds the two has steps and runs the indexed one first, improving the traversal's performance.
g.V().has("country", "USA").has("name", "Lyndon").out().has("name", "Simon").toList()
Two indexed vertex properties
This traversal performs two initial has steps, both on indexed properties. AGS uses cardinality metadata from the Aerospike database to determine which step to run first for maximum efficiency.
Cardinality metadata in Aerospike is updated once per hour, so index efficiency information may not always be current.
g.V().has("age", 29).has("name", "Lyndon").out().has("name", "Simon").toList()
Label index and indexed vertex property
This traversal's first two steps are a hasLabel step, which uses the instance's label index, and a has step, which uses the name property index. AGS performs the has step first, because property indexes usually have higher cardinality than label indexes.
g.V().hasLabel("Person").has("name", "Lyndon").out().has("name", "Simon").toList()
Label index and unindexed vertex property
This traversal begins with a hasLabel step, which uses the instance's label index, and a has step on the unindexed country property. AGS performs the hasLabel step first and uses the index, but the has step on country may be slow and inefficient.
g.V().hasLabel("Person").has("country", "USA").out().has("name", "Simon").toList()