Supernodes
Overview
This page describes what supernodes are in the Aerospike Graph Service (AGS) and how to designate and manage them.
What is a supernode?
A supernode is a vertex with a disproportionately high number of incoming or outgoing edges. The exact number of edges that makes a vertex a supernode depends on the storage engine configuration used by the Aerospike database associated with AGS, and on the value of the max-record-size configuration option.
Under the Hybrid Memory Model (the default for Aerospike database namespaces):
- If max-record-size is set to 1MiB (the default), any vertex with approximately 6,500 or more edges is a supernode.
- If max-record-size is set to 128KiB, any vertex with approximately 800 or more edges is a supernode.
For in-memory namespaces:
- If max-record-size is set to 8MiB, any vertex with approximately 50,000 or more edges is a supernode.
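The three thresholds listed above imply a fairly steady amount of record space per edge. The following is a minimal sketch of that arithmetic in plain Java. The class and method names are hypothetical, and the per-edge figure is derived only from the approximate numbers on this page; it is not a documented AGS constant.

```java
// Back-of-the-envelope arithmetic only. The thresholds are the approximate
// values quoted on this page; nothing here calls AGS or Aerospike.
final class SupernodeThresholdSketch {
    static final long KIB = 1024L;
    static final long MIB = 1024L * KIB;

    // Implied average record space per edge at a given supernode threshold.
    static double impliedBytesPerEdge(long maxRecordSizeBytes, long edgeThreshold) {
        return (double) maxRecordSizeBytes / edgeThreshold;
    }

    public static void main(String[] args) {
        // {max-record-size in bytes, approximate edge threshold}
        long[][] rows = {
            {128 * KIB, 800},
            {1 * MIB, 6_500},
            {8 * MIB, 50_000},
        };
        for (long[] row : rows) {
            System.out.printf("max-record-size=%d bytes, threshold=%d edges, ~%.0f bytes/edge%n",
                    row[0], row[1], impliedBytesPerEdge(row[0], row[1]));
        }
    }
}
```

In all three cases the result falls between roughly 160 and 170 bytes per edge, which suggests that the threshold tracks max-record-size closely. Treat this as a rough planning aid rather than a guarantee, since actual edge size may vary.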
Because supernodes are connected to so many other vertices in the graph, traversing them can cause performance problems. Whether your data model contains supernodes and, more importantly, whether your queries traverse them should be a conscious decision.
Designating supernodes
AGS makes a clear distinction between regular vertices and supernodes. Regular vertices maintain inline record edge lists (adjacency lists) to optimize database lookups and improve query performance. Supernodes are maintained in multi-record edge lists that allow for lazy composition of edges as necessary for traversal.
The ~supernode flag is a virtual property that denotes that a vertex is or will become a supernode. Set it manually when you know that a vertex will be a supernode, so that AGS doesn't populate the record edge list for a vertex that can't use it optimally.
In addition to manual setting, AGS automatically assigns the ~supernode flag to vertices that become supernodes through the addition of many edges.
Example
The following example demonstrates how to set the ~supernode flag on a newly created vertex. The vertex with the label test is internally marked as a supernode and its record edge list is ignored.
Vertex v = g.addV("test").next();
g.V(v.id()).property("~supernode", true).iterate();
Considerations
- Once the ~supernode flag is set, it cannot be unset. It remains in place for the life of the vertex.
- You cannot read the value of the ~supernode flag. If you read it back, it returns nothing.
- The value you assign to the flag doesn't matter. Any value assigned to the ~supernode flag is treated as true.
Filtering out supernodes
When composing queries, you can check for the ~supernode property at read time to filter supernodes out. The property has no readable value, so you can only check for its existence.
g.V().hasNot("~supernode").outE().inV() // <- Correct usage
g.V().has("~supernode", false).outE().inV() // <- Incorrect
g.V().hasNot("~supernode", true).outE().inV() // <- Also incorrect
Traversing the edges of supernodes
To optimize query performance, include property filters when traversing supernode vertices. These filters reduce the query scope, minimizing data retrieval from the storage layer to AGS and improving performance.
Example:
g.V().hasLabel("potentialSupernodes").outE().has("propertyFoo", "valueFoo").inV()
AGS supports equality comparisons on strings and numbers (integers and longs):
- P.eq (=)
It also supports the following comparison operators for numbers:
- P.gt (>)
- P.gte (>=)
- P.lt (<)
- P.lte (<=)
Limits on compound predicates
Compound predicates such as within, between, and and/or degrade query performance.
Wherever possible, expand your .has() step to use multiple single predicates rather than compound predicates. Using multiple .has() steps, each with a single predicate, results in better query performance than a single .has() step that uses compound predicates.
g.V().hasLabel("potentialSupernodes").outE().has("foo", P.between(1, 5)).inV() // Compound predicate - optimization won't apply
g.V().hasLabel("potentialSupernodes").outE().has("foo", P.gte(1)).has("foo", P.lt(5)).inV() // Equivalent single predicates - optimized
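When splitting a compound predicate into single predicates, it is worth checking that the rewrite matches the same values. The sketch below models the relevant semantics in plain Java with no TinkerPop dependency, so all helper names are hypothetical stand-ins. It relies on the TinkerPop definitions: P.between(a, b) includes a and excludes b, while P.within(...) tests membership in a fixed set of values.

```java
import java.util.Set;
import java.util.function.IntPredicate;

// Plain-Java model of predicate semantics; not the TinkerPop P class.
final class PredicateSemanticsSketch {
    // Models P.between(lo, hi): lo is inclusive, hi is exclusive.
    static IntPredicate between(int lo, int hi) {
        return v -> v >= lo && v < hi;
    }

    // Models two chained steps: .has(key, P.gte(lo)).has(key, P.lt(hi)).
    static IntPredicate gteThenLt(int lo, int hi) {
        IntPredicate gte = v -> v >= lo;
        IntPredicate lt = v -> v < hi;
        return gte.and(lt);
    }

    // Models P.within(values): true only for values in the given set.
    static IntPredicate within(Integer... values) {
        Set<Integer> set = Set.of(values);
        return v -> set.contains(v);
    }

    public static void main(String[] args) {
        for (int v = 0; v <= 6; v++) {
            System.out.printf("v=%d between=%b gteThenLt=%b within=%b%n",
                    v,
                    between(1, 5).test(v),
                    gteThenLt(1, 5).test(v),
                    within(1, 5).test(v));
        }
    }
}
```

In this model the range form and the chained single predicates agree on every input, whereas a set-membership predicate over 1 and 5 matches only those two values, so it cannot be replaced by a gte/lt pair.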
Log warning for supernode traversals
An unoptimized query traversing over supernodes may result in drastically degraded performance because the query may retrieve a large number of outgoing/incoming vertices from the database.
To assist in diagnosing performance issues, AGS logs queries in which a traversal encounters one or more supernodes and identifies the ID of the first supernode.
Example log entry:
12:00:00.000 [main] WARN c.a.f.p.t.step.util.TraversalUtil - The traversal, ".V().hasLabel("potentialSupernode").outE()", walks over the Edges of an existing supernode in the Graph which may cause unexpected performance.
Consider adjusting the traversal to filter out supernode Vertices or adding filters to the Edges if required.
ID of first supernode Vertex encountered by this traversal: "supernode1"
The log does not record every occurrence of an identical supernode traversal. After the initial warning, AGS logs another warning for the same traversal only after 10 additional, different supernode traversals have occurred.
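One way to picture this throttling is a small bounded cache of recently warned traversals. The sketch below is an illustration only, in plain Java: the class name, the method name, and the choice of an insertion-ordered map are all assumptions, and it is not taken from the AGS implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the warning throttle described above; not AGS code.
final class SupernodeWarningThrottleSketch {
    // Number of distinct traversals remembered (assumed from the "10" above).
    private static final int WINDOW = 10;

    // Insertion-ordered map used as a bounded set: once it holds more than
    // WINDOW entries, the oldest remembered traversal is dropped.
    private final Map<String, Boolean> recent = new LinkedHashMap<String, Boolean>() {
        @Override
        protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
            return size() > WINDOW;
        }
    };

    // Returns true if a warning should be logged for this traversal string.
    boolean shouldWarn(String traversal) {
        return recent.putIfAbsent(traversal, Boolean.TRUE) == null;
    }

    public static void main(String[] args) {
        SupernodeWarningThrottleSketch throttle = new SupernodeWarningThrottleSketch();
        System.out.println(throttle.shouldWarn("g.V().outE()")); // first sighting
        System.out.println(throttle.shouldWarn("g.V().outE()")); // repeat, suppressed
    }
}
```

In this model a repeated traversal stays silent until 10 other distinct traversals have pushed it out of the cache, after which the next occurrence is logged again.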
Supernode traversal warnings in the server log are enabled by default.
To turn them off, set the property key aerospike.graph.log.supernode.warning to false.