Modeling relationships
Aerospike has no foreign keys, no joins, and no multi-collection transactions that span arbitrary records. You express relationships through key design, collection data type (CDT) bins, and secondary indexes — choosing the pattern based on cardinality, who drives the read, and the combined size of the related data.
1:1 relationships
The default for a 1:1 relationship is the simplest: store both entities in the same record, as separate bins or as a nested CDT structure. Both entities share the same key, the same primary index entry, and the same lifecycle.
Use separate records when the two entities have different requirements:
- Different TTLs. Record-level time to live applies to the entire record. If one entity expires after 30 days and the other is permanent, they cannot share a record and rely on automatic expiration.
- Different access frequencies. If one entity is read on every request and the other is read rarely, co-locating them wastes I/O on the hot path.
- Different storage tiers. The split-namespace technique (described in Key design) lets two records share the same logical key across namespaces with different storage policies.
When you split a 1:1 relationship into separate records, design the keys
so that one is derivable from the other — for example, the same user key
in two different sets or namespaces. This keeps the lookup a single get
per entity, with no index required.
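A minimal sketch of derivable keys, using Python tuples in the client's (namespace, set, user key) shape. The namespace and set names here are illustrative, not from the source:

```python
# Sketch: companion keys for a split 1:1 relationship. Both keys are
# derived from the same user ID, so each lookup is a single get with
# no index required. Namespace/set names are hypothetical examples.

def profile_key(user_id: str) -> tuple:
    """Hot entity: read on every request, kept in a fast namespace."""
    return ("cache", "profile", user_id)

def prefs_key(user_id: str) -> tuple:
    """Cold entity: read rarely, kept in a disk-backed namespace."""
    return ("disk", "preferences", user_id)

hot = profile_key("u1001")
cold = prefs_key("u1001")
```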
1:N relationships
A 1:N relationship has a parent and multiple children. The modeling pattern depends on three factors: how many children a parent has, who drives the read (parent or child), and how large each child is.
Pattern 1: parent-held list of child keys
Store a List bin on the parent record containing the keys of the child records. The application reads the parent, extracts the key list, and issues a batch read to fetch the children.
This works well when:
- The child count is modest (tens to low hundreds).
- The parent drives the read (“give me this order and its line items”).
- Each child is a standalone record with its own lifecycle or access pattern.
The parent record grows with each new child key. Monitor the list size against the record sizing guidelines.
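The read path for pattern 1 can be sketched as follows, with a plain dict standing in for the client; `get` and `batch_read` here are stand-ins for the client's single-record get and batch-read calls, and the keys are illustrative:

```python
# Sketch of pattern 1: the parent holds a List bin of child keys.
# A dict stands in for the Aerospike client and its record store.

store = {
    "order:9001": {"status": "shipped", "items": ["item:1", "item:2"]},
    "item:1": {"sku": "A-100", "qty": 2},
    "item:2": {"sku": "B-200", "qty": 1},
}

def get(key):
    return store[key]                     # stand-in for client.get

def batch_read(keys):
    return [store[k] for k in keys]       # stand-in for a batch read

# Read the parent, extract the key list, batch-read the children.
parent = get("order:9001")
children = batch_read(parent["items"])
```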
Pattern 2: consolidation into one record
Store all children as elements in a CDT bin on the parent record. The parent and all its children are a single record.
This is the default for high-cardinality, small-child relationships where the parent drives the read. A user profile with 200 segment tags, a post with its comments, or a product with its reviews — all fit this pattern when the combined size stays in the 1–128 KiB sweet spot.
The record count becomes O(parents) instead of O(parents + children), which reduces primary index cost dramatically. CDT operations (List, Map) let you read, update, and remove individual children without fetching the full record.
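The per-child operations can be sketched with a plain dict standing in for a Map bin; in a real client these would be map_put, map_get_by_key, and map_remove_by_key operations applied server-side to the one consolidated record:

```python
# Sketch of pattern 2 semantics: children live as entries in a Map bin
# on the parent record, addressed individually. A dict models the bin.

reviews = {}  # Map bin on a product record: reviewId -> review

def put_child(child_id, value):
    reviews[child_id] = value        # models map_put: upsert one child

def get_child(child_id):
    return reviews.get(child_id)     # models map_get_by_key: one child,
                                     # without fetching the full record

def remove_child(child_id):
    reviews.pop(child_id, None)      # models map_remove_by_key

put_child("r1", {"stars": 5})
put_child("r2", {"stars": 3})
remove_child("r1")
```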
Use the crossover heuristic to decide: multiply the p99 child count by the average child size. If the result is under ~128 KiB, consolidation is the default.
child_count_p99 × avg_child_bytes < ~128 KiB → consolidate
Pattern 3: child-held reference with secondary index
Store a reference to the parent (or a shared attribute) in each child record, and create a secondary index on that bin. The application queries the index to find all children for a given parent.
This pattern is necessary when you need inverse access — “find all parents that contain this child” or “find all children belonging to this parent” when you start from the child side. A secondary index (SI) query scatters to all nodes, so it has higher latency than a key lookup. Use this pattern when the access path is not the dominant one, or when the child set is too large or too diverse to consolidate.
Mixed access
Many relationships need to serve both a dominant read path and an inverse path. The patterns above are not mutually exclusive.
A social application that stores comments consolidated in a post record
(pattern 2) also needs to delete all comments by a specific user when
that user deletes their account. For the dominant path (display a
post’s comments), consolidation is efficient. For the inverse path
(find all posts a user commented on), you maintain a commenters bin
or a separate reference set with a secondary index.
Design the dominant path first, then add the inverse infrastructure only if the application requires it.
N:M relationships
An N:M relationship connects many entities on both sides — users follow other users, products belong to multiple categories, students enroll in multiple courses.
The standard Aerospike pattern uses bidirectional key lists: each entity holds an ordered List bin containing the keys of the entities it is related to on the other side. Traversal in either direction is one record read plus a batch read of the referenced keys. There is no join table and no secondary index on the hot path.
Use an ordered List with ADD_UNIQUE and NO_FAIL write flags for each
side. On an ordered List, ADD_UNIQUE uses binary search for deduplication
(O(log n)), and NO_FAIL makes a duplicate insert succeed silently. See
Collections for details.
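The ADD_UNIQUE-on-an-ordered-List semantics can be sketched with the standard library; a real client performs this server-side via list write flags, but the logic is the same O(log n) position search:

```python
import bisect

# Sketch of ADD_UNIQUE | NO_FAIL semantics on an ordered List.
# bisect models the server-side binary search on the ordered List.

def add_unique(ordered: list, value) -> bool:
    """Insert keeping order; a duplicate succeeds silently (NO_FAIL)."""
    i = bisect.bisect_left(ordered, value)    # O(log n) position search
    if i < len(ordered) and ordered[i] == value:
        return False                          # already present: no-op
    ordered.insert(i, value)
    return True

following = []
add_unique(following, "bob")
add_unique(following, "alice")
add_unique(following, "bob")   # duplicate: silently ignored
```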
For example, a follow relationship between users:
Record key: user:{userId}
Bin "following": ordered List of userIds this user follows
Bin "followers": ordered List of userIds who follow this user
“Does Alice follow Bob?” is a single list_get_by_value on Alice’s
following bin. “Who does Alice follow?” reads Alice’s following bin
and batch-reads the referenced user records. Both directions resolve
without a secondary index.
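A sketch of the bidirectional maintenance, with dicts standing in for the two user records; a real implementation would issue one list write per side with the flags described above:

```python
# Sketch of a follow edge: each write updates both sides, and both
# query directions resolve without a secondary index.

users = {
    "user:alice": {"following": [], "followers": []},
    "user:bob":   {"following": [], "followers": []},
}

def follow(a, b):
    """a follows b: one list write per side, lists kept ordered."""
    fa = users[f"user:{a}"]["following"]
    fb = users[f"user:{b}"]["followers"]
    if b not in fa:
        fa.append(b); fa.sort()
    if a not in fb:
        fb.append(a); fb.sort()

def is_following(a, b):
    # equivalent of list_get_by_value on a's "following" bin
    return b in users[f"user:{a}"]["following"]

follow("alice", "bob")
```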
Pagination
For large relationship lists, use list_get_by_index_range with an
offset and count to return pages. The second argument to
list_get_by_index_range is the count (number of elements to return),
not an end index.
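A Python slice models the offset/count semantics exactly:

```python
# Sketch of list_get_by_index_range pagination: the call takes an
# index (offset) and a count of elements to return, not an end index.

def page(ordered: list, offset: int, count: int) -> list:
    return ordered[offset : offset + count]

follows = [f"user{i}" for i in range(10)]
first  = page(follows, 0, 3)   # elements 0..2
second = page(follows, 3, 3)   # elements 3..5
```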
Write contention
Aerospike applies a record-level lock during writes. When too many concurrent writers queue on the same record's lock, the server fails the excess writes with KEY_BUSY (error code 14).
Consolidation patterns are vulnerable to this when the consolidated record receives a high sustained write rate. The threshold depends on the operation and hardware, but as a guideline: if you expect more than roughly 50 sustained writes per second to a single record, evaluate contention before committing to consolidation.
Options for handling contention:
- Hot-key sharding. Distribute the workload across sub-records, as described in Key design. Writers pick a shard; readers batch-read all shards and combine.
- Shard-on-demand (described in the following section). Start with one record and split only when the record exceeds a size or contention threshold.
- Separate records. If the children have independent lifecycles and the primary access pattern reads them individually, separate records may be the better default.
Companion record overflow and shard-on-demand
When a consolidated record grows past the sweet spot — whether from accumulating relationship edges, appended events, or a growing list of children — you distribute the overflow across sub-records while keeping the original record as the entry point.
Structure
The original record gains a subkeys bin that describes the sharding
scheme. For example:
subkeys: ["hash-shards", 8]
This tells readers and writers that the data is distributed across 8 hash-sharded companion records. The sub-record keys follow a predictable pattern:
{originalKey}:{shardIndex}
Writers hash the child identifier to select a shard, then write to the corresponding sub-record. Readers batch-read all shards and merge the results.
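The routing can be sketched as follows. Note the choice of a stable hash: Python's built-in `hash()` is salted per process, so something deterministic like `zlib.crc32` is needed when readers and writers must derive the same shard independently:

```python
import zlib

# Sketch of hash-shard routing for companion records.
# zlib.crc32 serves as a stable, process-independent hash.

def shard_key(original_key: str, child_id: str, num_shards: int) -> str:
    """Writers hash the child identifier to select a shard."""
    shard = zlib.crc32(child_id.encode()) % num_shards
    return f"{original_key}:{shard}"

def all_shard_keys(original_key: str, num_shards: int) -> list:
    """Readers batch-read every shard and merge the results."""
    return [f"{original_key}:{i}" for i in range(num_shards)]

k = shard_key("user:1001", "follower:77", 8)
```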
When companion records use ordered Lists (for example, ordered Lists of
reference IDs with ADD_UNIQUE), set persistIndex in the list policy.
With a persisted index, paginated reads with list_get_by_index_range are O(M)
for M returned elements instead of O(N + M), and ADD_UNIQUE inserts
remain O(log N) instead of requiring an O(N) index rebuild on each
operation. This is particularly relevant for companion records that grow
into the thousands of entries. Persist-index is supported for top-level
Lists only. See
Collections
for details.
Shard types
| Shard type | Key pattern | Best for |
|---|---|---|
| Hash | {key}:{hash(childId) % S} | Even distribution, full-set reads |
| Time-based (hourly) | {key}:{YYYY-MM-DD-HH} | Recent-window reads, time-bounded cleanup |
| Time-based (daily) | {key}:{YYYY-MM-DD} | Daily rollups, TTL-based expiry of old shards |
Hash shards are the default when the application reads the full set of records. Time-based shards are preferred when the common read targets a recent window (“last 24 hours of followers”), because the application can batch-read only the relevant time buckets.
Activation
You do not need to start with sharding. Begin with a single consolidated
record. Use a filter expression on the write path to check whether the
subkeys bin exists:
- If subkeys is absent, write directly to the record.
- If subkeys is present, read its value and route the write to the appropriate sub-record.
This makes the non-sharded path zero-overhead. The application activates
sharding for a specific record by writing the subkeys bin when the
record reaches a size or contention threshold.
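The routing decision can be sketched in a few lines; a dict stands in for the record, and the `subkeys` check models the filter-expression test on the write path:

```python
import zlib

# Sketch of shard-on-demand activation: route a write based on
# whether the record carries a subkeys bin.

def route_write(record: dict, original_key: str, child_id: str) -> str:
    """Return the key the write should target."""
    scheme = record.get("subkeys")
    if scheme is None:
        return original_key               # non-sharded path: write in place
    kind, num_shards = scheme             # e.g. ["hash-shards", 8]
    shard = zlib.crc32(child_id.encode()) % num_shards
    return f"{original_key}:{shard}"

plain   = route_write({}, "user:1001", "c1")
sharded = route_write({"subkeys": ["hash-shards", 8]}, "user:1001", "c1")
```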
Sizing shards
For hash shards, choose the shard count so that each sub-record stays in the 1–128 KiB target range under expected growth. If each relationship entry is roughly 15 bytes and you expect up to 300,000 entries:
300,000 entries × 15 bytes = ~4.5 MiB total
4.5 MiB ÷ 128 KiB per shard ≈ 35 shards
For time-based shards, choose the time boundary so that the per-period entry count produces records in the target range. A daily shard for a user with 100 new followers per day at 15 bytes each is roughly 1.5 KiB per shard — well within range.
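The hash-shard calculation above, as a small worked function:

```python
import math

# Worked shard-count calculation: total bytes divided by the
# 128 KiB per-shard target, rounded up.

def hash_shard_count(entries: int, bytes_per_entry: int,
                     target_bytes: int = 128 * 1024) -> int:
    total = entries * bytes_per_entry
    return math.ceil(total / target_bytes)

shards = hash_shard_count(300_000, 15)   # ~4.5 MiB total -> 35 shards
```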
Choosing the right pattern
| Factor | Pattern 1 (key list) | Pattern 2 (consolidation) | Pattern 3 (SI) | N:M (bidirectional lists) |
|---|---|---|---|---|
| Child count | Tens to low hundreds | Up to p99 × size ≤ ~128 KiB | Unbounded | Per side: ordered list |
| Who reads | Parent | Parent | Child or inverse | Both sides |
| Inverse access | Requires separate infrastructure | Requires separate infrastructure | Native | Native (one read + batch) |
| Write contention | Low (parent record only) | Risk at high concurrency | Low (children are separate) | Moderate (both sides update) |
| Index cost | One PI entry per child | One PI entry per parent | One PI entry per child + SI entries | One PI entry per entity |