Indexing
Indexes can make graph database queries faster and more efficient. To create an index on a vertex property or label in Aerospike Graph, edit the configuration file you use to start the Aerospike Graph Service (AGS) Docker image.
Vertex property index creation
To create an index on a vertex property, add the configuration parameter aerospike.graph.index.vertex.properties to the file and assign it a comma-separated list of vertex property keys to index. In the following example, vertex properties property_key1 and property_key2 are specified for indexing:
aerospike.graph.index.vertex.properties=property_key1,property_key2
Vertex property indexes are taken as a union from all AGS instances. This means that if one AGS instance has an index on vertex property property_key1 and another has an index on vertex property property_key2, AGS creates indexes for both properties. If an index is created on any AGS instance in a cluster, the other instances detect it and leverage it as well.
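The union rule can be pictured with a small Python sketch. This is a toy model and not AGS code: the function names parse_indexed_properties and effective_indexes are hypothetical, and the only details taken from the text above are the aerospike.graph.index.vertex.properties key and its comma-separated value format.

```python
# Toy model of the union rule; not AGS code. Function names are hypothetical.
PROPERTY_INDEX_KEY = "aerospike.graph.index.vertex.properties"


def parse_indexed_properties(config_text):
    """Return the set of property keys listed under the index parameter."""
    keys = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines in the properties-style file.
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep and name.strip() == PROPERTY_INDEX_KEY:
            keys.update(k.strip() for k in value.split(",") if k.strip())
    return keys


def effective_indexes(instance_configs):
    """Union the property indexes requested by every instance."""
    result = set()
    for config_text in instance_configs:
        result |= parse_indexed_properties(config_text)
    return result


instance_a = "aerospike.graph.index.vertex.properties=property_key1"
instance_b = "aerospike.graph.index.vertex.properties=property_key2"
print(sorted(effective_indexes([instance_a, instance_b])))
```

In this model, each instance lists only one key, yet the effective set contains both, which mirrors the behavior described above.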
When a vertex property index is first created on a dataset, the time it takes to create the index is proportional to the amount of data in the Aerospike database, so larger datasets take longer to index. You can create a property index either before or after populating the database, but creating it before loading data is faster.
Vertex property indexes have a value limit of 2k bytes. Property values larger than 2k bytes cannot be indexed.
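A client-side pre-check for oversized values might look like the following Python sketch. It rests on assumptions that the text above does not confirm: that "2k bytes" means 2048 bytes, and that a string's size is measured as its UTF-8 encoded length. The function name fits_index_limit is hypothetical.

```python
# Hypothetical pre-check; the limit and the way size is measured are assumptions.
ASSUMED_LIMIT_BYTES = 2048  # assumes "2k bytes" means 2048 bytes


def fits_index_limit(value, limit=ASSUMED_LIMIT_BYTES):
    """Return True if the value's encoded size is within the assumed limit."""
    if isinstance(value, str):
        size = len(value.encode("utf-8"))
    elif isinstance(value, (bytes, bytearray)):
        size = len(value)
    else:
        # Simplification: numbers and other scalars are treated as fitting.
        return True
    return size <= limit


print(fits_index_limit("San Francisco"))  # a short string
print(fits_index_limit("x" * 5000))       # an oversized string
```

Because the limit is a parameter, the check can be adjusted if the actual threshold or size measurement differs from these assumptions.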
Vertex label index creation
To create indexes on all vertex labels, add the configuration parameter aerospike.graph.index.vertex.label.enabled to the configuration file and set it to true:
aerospike.graph.index.vertex.label.enabled=true
If you create a label index on one AGS instance, all the other AGS instances in the cluster detect the change and leverage the same index.
Example
Consider an Aerospike Graph database with the following schema:
VERTICES:
  label: "Person"
  {
    "name": "John Doe",
    "age": 30,
    "address": "123 Main St",
    "city": "San Francisco",
    "state": "CA",
    "country": "USA",
    "zip": "94105"
  }
EDGES:
  label: "knows"
  {
  }
To create an index on the name and age fields, as well as a vertex label index, add the following lines to the Aerospike Graph configuration file:
aerospike.graph.index.vertex.properties=name,age
aerospike.graph.index.vertex.label.enabled=true
Impact of indexes on traversals
A vertex property index affects only the first step of a traversal. Subsequent steps are not affected. However, if a traversal's initial steps involve an indexed property and a non-indexed property, Graph reorders the steps automatically to perform the indexed property step first to obtain its benefit.
For maximum benefit, the best vertex properties to index are ones that a query can use to narrow the dataset down to one or very few vertices which the traversal can start from. Properties that tend to have distinct values and a low level of duplication throughout the dataset are best to index.
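One rough way to judge whether a property is a good index candidate is to measure how distinct its values are in a sample of vertices. The Python sketch below is illustrative only: the helper names distinct_ratio and largest_group are hypothetical, and the sample records are made up to resemble the Person schema shown earlier.

```python
from collections import Counter


def distinct_ratio(vertices, key):
    """Fraction of values for `key` that are distinct (1.0 means no duplicates)."""
    values = [v[key] for v in vertices if key in v]
    if not values:
        return 0.0
    return len(set(values)) / len(values)


def largest_group(vertices, key):
    """Size of the largest group of vertices that share a single value."""
    counts = Counter(v[key] for v in vertices if key in v)
    return max(counts.values(), default=0)


# Made-up sample data for illustration.
sample = [
    {"name": "Lyndon", "age": 29, "country": "USA"},
    {"name": "Simon", "age": 29, "country": "USA"},
    {"name": "John Doe", "age": 30, "country": "USA"},
    {"name": "Ada", "age": 41, "country": "USA"},
]

for key in ("name", "age", "country"):
    print(key, distinct_ratio(sample, key), largest_group(sample, key))
```

A ratio near 1.0 with small groups, as for name in this sample, suggests a property that narrows a traversal's starting set well. A low ratio with one large group, as for country, suggests an index on that property would filter out very little.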
Example traversals
The following traversals use the schema and indexes shown in the index example.
Single indexed vertex property
This traversal uses the index on the name field:
      ______ The first step uses the index, so it is fast and efficient.
      |
      |                     _______________ Subsequent steps do not use
      |                     |     |    |    the index because they are not at the
      |                     |     |    |    start of the traversal.
      v                     v     v    v
g.V().has("name", "Lyndon").out().in().has("name", "Simon").toList()
Single non-indexed vertex property
This traversal does not use an index and may perform poorly.
      ______ This step does not use an index and must scan the entire database
      |      for the `country` property.
      |
      |                     __________ These steps do not use the index because they
      |                     |     |    are not at the start of the traversal.
      v                     v     v
g.V().has("country", "USA").out().has("name", "Lyndon").toList()
One indexed and one unindexed vertex property
This traversal performs two has steps, one on the unindexed country field and one on the indexed name field. Graph compounds the two has steps together and runs the indexed one first, improving the traversal's performance.
g.V().has("country", "USA").has("name", "Lyndon").out().has("name", "Simon").toList()
Two indexed vertex properties
This traversal performs two initial has steps, both on indexed properties. AGS uses cardinality metadata from the Aerospike database to determine which step to run first for maximum efficiency.
Cardinality metadata in Aerospike is updated once per hour, so index efficiency information may not always be current.
g.V().has("age", 29).has("name", "Lyndon").out().has("name", "Simon").toList()
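The ordering behavior described for leading has steps can be summarized in a toy Python model. It is a sketch under assumptions, not the AGS planner: the function order_initial_steps and the estimated_matches map are hypothetical, the numbers are made up, and the rule that the indexed step with fewer estimated matches runs first is an assumed reading of "maximum efficiency".

```python
def order_initial_steps(steps, indexed_keys, estimated_matches):
    """Order leading has() filters: indexed first, then fewest estimated matches.

    steps: list of (property_key, value) pairs from consecutive has() steps.
    indexed_keys: set of property keys that have an index.
    estimated_matches: hypothetical map of property key to the estimated
        number of matching vertices, standing in for cardinality metadata.
    """
    def sort_key(step):
        key, _value = step
        is_unindexed = key not in indexed_keys
        # Keys without an estimate sort after keys that have one.
        estimate = estimated_matches.get(key, float("inf"))
        return (is_unindexed, estimate)

    # sorted() is stable, so steps with equal sort keys keep their original order.
    return sorted(steps, key=sort_key)


indexed = {"name", "age"}
estimates = {"name": 3, "age": 1200}  # made-up numbers for illustration

# One unindexed and one indexed property: the indexed step moves to the front.
print(order_initial_steps([("country", "USA"), ("name", "Lyndon")], indexed, estimates))

# Two indexed properties: the step with fewer estimated matches goes first.
print(order_initial_steps([("age", 29), ("name", "Lyndon")], indexed, estimates))
```

With these made-up estimates, the name filter is placed first in both cases, because it is indexed and is expected to match the fewest vertices.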
Label index and indexed vertex property
This traversal's first two steps are a hasLabel step, which uses the instance's label index, and a has step, which uses the name property index. AGS performs the has step first, because property indexes usually have higher cardinality than label indexes.
g.V().hasLabel("Person").has("name", "Lyndon").out().has("name", "Simon").toList()
Label index and unindexed vertex property
This traversal begins with a hasLabel step, which uses the instance's label index, and a has step, which involves the unindexed country property. AGS performs the hasLabel step first and uses the index, but the country step may be slow and inefficient.
g.V().hasLabel("Person").has("country", "USA").out().has("name", "Simon").toList()