Modeling relationships
Aerospike has no foreign keys, no joins, and no multi-collection transactions that span arbitrary records. You express relationships through key design, collection data type (CDT) bins, and secondary indexes — choosing the pattern based on cardinality, who drives the read, and the combined size of the related data.
1:1 relationships
The default for a 1:1 relationship is the simplest: store both entities in the same record, as separate bins or as a nested CDT structure. Both entities share the same key, the same primary index entry, and the same lifecycle.
Use separate records when the two entities have different requirements:
- Different TTLs. Record-level time to live applies to the entire record. If one entity expires after 30 days and the other is permanent, they cannot share a record and rely on automatic expiration.
- Different access frequencies. If one entity is read on every request and the other is read rarely, co-locating them wastes I/O on the hot path.
- Different storage tiers. The split-namespace technique (described in Key design) lets two records share the same logical key across namespaces with different storage policies.
When you split a 1:1 relationship into separate records, design the keys
so that one is derivable from the other — for example, the same user key
in two different sets or namespaces. This keeps the lookup a single get
per entity, with no index required.
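A minimal sketch of derivable keys, using Python tuples in the client's (namespace, set, user key) shape. The namespace and set names here are illustrative, not from the source:

```python
# Sketch: companion keys for a split 1:1 relationship. Both keys are
# derived from the same user ID, so each lookup is a single get with
# no index required. Namespace/set names are hypothetical examples.

def profile_key(user_id: str) -> tuple:
    """Hot entity: read on every request, kept in a fast namespace."""
    return ("cache", "profile", user_id)

def prefs_key(user_id: str) -> tuple:
    """Cold entity: read rarely, kept in a disk-backed namespace."""
    return ("disk", "preferences", user_id)

hot = profile_key("u1001")
cold = prefs_key("u1001")
```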
1:N relationships
A 1:N relationship has a parent and multiple children. The modeling pattern depends on three factors: how many children a parent has, who drives the read (parent or child), and how large each child is.
Pattern 1: parent-held list of child keys
Store a List bin on the parent record containing the keys of the child records. The application reads the parent, extracts the key list, and issues a batch read to fetch the children.
This works well when:
- The child count is modest (tens to low hundreds).
- The parent drives the read (“give me this order and its line items”).
- Each child is a standalone record with its own lifecycle or access pattern.
The parent record grows with each new child key. Monitor the list size against the record sizing guidelines.
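The read path for pattern 1 can be sketched as follows, with a plain dict standing in for the client; `get` and `batch_read` here are stand-ins for the client's single-record get and batch-read calls, and the keys are illustrative:

```python
# Sketch of pattern 1: the parent holds a List bin of child keys.
# A dict stands in for the Aerospike client and its record store.

store = {
    "order:9001": {"status": "shipped", "items": ["item:1", "item:2"]},
    "item:1": {"sku": "A-100", "qty": 2},
    "item:2": {"sku": "B-200", "qty": 1},
}

def get(key):
    return store[key]                     # stand-in for client.get

def batch_read(keys):
    return [store[k] for k in keys]       # stand-in for a batch read

# Read the parent, extract the key list, batch-read the children.
parent = get("order:9001")
children = batch_read(parent["items"])
```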
Pattern 2: consolidation into one record
Store all children as elements in a CDT bin on the parent record. The parent and all its children are a single record.
This is the default for high-cardinality, small-child relationships where the parent drives the read. A user profile with 200 segment tags, a post with its comments, or a product with its reviews — all fit this pattern when the combined size stays in the 1–128 KiB sweet spot.
The record count becomes O(parents) instead of O(parents + children), which reduces primary index cost dramatically. CDT operations (List, Map) let you read, update, and remove individual children without fetching the full record.
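The per-child operations can be sketched with a plain dict standing in for a Map bin; in a real client these would be map_put, map_get_by_key, and map_remove_by_key operations applied server-side to the one consolidated record:

```python
# Sketch of pattern 2 semantics: children live as entries in a Map bin
# on the parent record, addressed individually. A dict models the bin.

reviews = {}  # Map bin on a product record: reviewId -> review

def put_child(child_id, value):
    reviews[child_id] = value        # models map_put: upsert one child

def get_child(child_id):
    return reviews.get(child_id)     # models map_get_by_key: one child,
                                     # without fetching the full record

def remove_child(child_id):
    reviews.pop(child_id, None)      # models map_remove_by_key

put_child("r1", {"stars": 5})
put_child("r2", {"stars": 3})
remove_child("r1")
```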
Use the crossover heuristic to decide: multiply the p99 child count by the average child size. If the result is under ~128 KiB, consolidation is the default.
child_count_p99 × avg_child_bytes < ~128 KiB → consolidate
Pattern 3: child-held reference with secondary index
Store a reference to the parent (or a shared attribute) in each child record, and create a secondary index on that bin. The application queries the index to find all children for a given parent.
This pattern is necessary when you need inverse access — “find all parents that contain this child” or “find all children belonging to this parent” when you start from the child side. A secondary index (SI) query scatters to all nodes, so it has higher latency than a key lookup. Use this pattern when the access path is not the dominant one, or when the child set is too large or too diverse to consolidate.
Mixed access
Many relationships need to serve both a dominant read path and an inverse path. The patterns above are not mutually exclusive.
A social application that stores comments consolidated in a post record
(pattern 2) also needs to delete all comments by a specific user when
that user deletes their account. For the dominant path (display a
post’s comments), consolidation is efficient. For the inverse path
(find all posts a user commented on), you maintain a commenters bin
or a separate reference set with a secondary index.
Design the dominant path first, then add the inverse infrastructure only if the application requires it.
N:M relationships
An N:M relationship connects many entities on both sides — users follow other users, products belong to multiple categories, students enroll in multiple courses.
The standard Aerospike pattern uses bidirectional key lists: each entity holds an ordered List bin containing the keys of the entities it is related to on the other side. Traversal in either direction is one record read plus a batch read of the referenced keys. There is no join table and no secondary index on the hot path.
Use an ordered List with ADD_UNIQUE and NO_FAIL write flags for each
side. On an ordered List, ADD_UNIQUE uses binary search for deduplication
(O(log n)), and NO_FAIL makes a duplicate insert succeed silently. See
Collections for details.
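The ADD_UNIQUE-on-an-ordered-List semantics can be sketched with the standard library; a real client performs this server-side via list write flags, but the logic is the same O(log n) position search:

```python
import bisect

# Sketch of ADD_UNIQUE | NO_FAIL semantics on an ordered List.
# bisect models the server-side binary search on the ordered List.

def add_unique(ordered: list, value) -> bool:
    """Insert keeping order; a duplicate succeeds silently (NO_FAIL)."""
    i = bisect.bisect_left(ordered, value)    # O(log n) position search
    if i < len(ordered) and ordered[i] == value:
        return False                          # already present: no-op
    ordered.insert(i, value)
    return True

following = []
add_unique(following, "bob")
add_unique(following, "alice")
add_unique(following, "bob")   # duplicate: silently ignored
```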
For example, a follow relationship between users:
Record key: user:{userId}
Bin "following": ordered List of userIds this user follows
Bin "followers": ordered List of userIds who follow this user
“Does Alice follow Bob?” is a single list_get_by_value on Alice’s
following bin. “Who does Alice follow?” reads Alice’s following bin
and batch-reads the referenced user records. Both directions resolve
without a secondary index.
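A sketch of the bidirectional maintenance, with dicts standing in for the two user records; a real implementation would issue one list write per side with the flags described above:

```python
# Sketch of a follow edge: each write updates both sides, and both
# query directions resolve without a secondary index.

users = {
    "user:alice": {"following": [], "followers": []},
    "user:bob":   {"following": [], "followers": []},
}

def follow(a, b):
    """a follows b: one list write per side, lists kept ordered."""
    fa = users[f"user:{a}"]["following"]
    fb = users[f"user:{b}"]["followers"]
    if b not in fa:
        fa.append(b); fa.sort()
    if a not in fb:
        fb.append(a); fb.sort()

def is_following(a, b):
    # equivalent of list_get_by_value on a's "following" bin
    return b in users[f"user:{a}"]["following"]

follow("alice", "bob")
```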
Pagination
For large relationship lists, use list_get_by_index_range with an
offset and count to return pages. The second argument to
list_get_by_index_range is the count (number of elements to return),
not an end index.
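A Python slice models the offset/count semantics exactly:

```python
# Sketch of list_get_by_index_range pagination: the call takes an
# index (offset) and a count of elements to return, not an end index.

def page(ordered: list, offset: int, count: int) -> list:
    return ordered[offset : offset + count]

follows = [f"user{i}" for i in range(10)]
first  = page(follows, 0, 3)   # elements 0..2
second = page(follows, 3, 3)   # elements 3..5
```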
Write contention
Aerospike applies a record-level lock during writes. When too many concurrent writers queue on the same record's lock, the server fails the excess writes with KEY_BUSY (error code 14).
Consolidation patterns are vulnerable to this when the consolidated record receives a high sustained write rate. The threshold depends on the operation and hardware, but as a guideline: if you expect more than roughly 50 sustained writes per second to a single record, evaluate contention before committing to consolidation.
Options for handling contention:
- Hot-key sharding. Distribute the workload across sub-records, as described in Key design. Writers pick a shard; readers batch-read all shards and combine.
- Shard-on-demand (described in the following section). Start with one record and split only when the record exceeds a size or contention threshold.
- Separate records. If the children have independent lifecycles and the primary access pattern reads them individually, separate records may be the better default.
Companion record overflow and shard-on-demand
When a consolidated record grows past the sweet spot — whether from accumulating relationship edges, appended events, or a growing list of children — you distribute the overflow across sub-records while keeping the original record as the entry point.
Structure
The original record gains a subkeys bin that describes the sharding
scheme. For example:
subkeys: ["hash-shards", 8]
This tells readers and writers that the data is distributed across 8 hash-sharded companion records. The sub-record keys follow a predictable pattern:
{originalKey}:{shardIndex}
Writers hash the child identifier to select a shard, then write to the corresponding sub-record. Readers batch-read all shards and merge the results.
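The routing can be sketched as follows. Note the choice of a stable hash: Python's built-in `hash()` is salted per process, so something deterministic like `zlib.crc32` is needed when readers and writers must derive the same shard independently:

```python
import zlib

# Sketch of hash-shard routing for companion records.
# zlib.crc32 serves as a stable, process-independent hash.

def shard_key(original_key: str, child_id: str, num_shards: int) -> str:
    """Writers hash the child identifier to select a shard."""
    shard = zlib.crc32(child_id.encode()) % num_shards
    return f"{original_key}:{shard}"

def all_shard_keys(original_key: str, num_shards: int) -> list:
    """Readers batch-read every shard and merge the results."""
    return [f"{original_key}:{i}" for i in range(num_shards)]

k = shard_key("user:1001", "follower:77", 8)
```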
When companion records use ordered Lists (for example, ordered Lists of
reference IDs with ADD_UNIQUE), set persistIndex in the list policy.
With a persisted index, paginated reads with list_get_by_index_range are O(M)
for M returned elements instead of O(N + M), and ADD_UNIQUE inserts
remain O(log N) instead of requiring an O(N) index rebuild on each
operation. This is particularly relevant for companion records that grow
into the thousands of entries. Persist-index is supported for top-level
Lists only. See
Collections
for details.
Shard types
| Shard type | Key pattern | Best for |
|---|---|---|
| Hash | {key}:{hash(childId) % S} | Even distribution, full-set reads |
| Time-based (hourly) | {key}:{YYYY-MM-DD-HH} | Recent-window reads, time-bounded cleanup |
| Time-based (daily) | {key}:{YYYY-MM-DD} | Daily rollups, TTL-based expiry of old shards |
Hash shards are the default when the application reads the full set of records. Time-based shards are preferred when the common read targets a recent window (“last 24 hours of followers”), because the application can batch-read only the relevant time buckets.
Activation
You do not need to start with sharding. Begin with a single consolidated
record. Use a filter expression on the write path to check whether the
subkeys bin exists:
- If subkeys is absent, write directly to the record.
- If subkeys is present, read its value and route the write to the appropriate sub-record.
This makes the non-sharded path zero-overhead. The application activates
sharding for a specific record by writing the subkeys bin when the
record reaches a size or contention threshold.
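The routing decision can be sketched in a few lines; a dict stands in for the record, and the `subkeys` check models the filter-expression test on the write path:

```python
import zlib

# Sketch of shard-on-demand activation: route a write based on
# whether the record carries a subkeys bin.

def route_write(record: dict, original_key: str, child_id: str) -> str:
    """Return the key the write should target."""
    scheme = record.get("subkeys")
    if scheme is None:
        return original_key               # non-sharded path: write in place
    kind, num_shards = scheme             # e.g. ["hash-shards", 8]
    shard = zlib.crc32(child_id.encode()) % num_shards
    return f"{original_key}:{shard}"

plain   = route_write({}, "user:1001", "c1")
sharded = route_write({"subkeys": ["hash-shards", 8]}, "user:1001", "c1")
```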
Sizing shards
For hash shards, choose the shard count so that each sub-record stays in the 1–128 KiB target range under expected growth. If each relationship entry is roughly 15 bytes and you expect up to 300,000 entries:
300,000 entries × 15 bytes = ~4.5 MiB total
4.5 MiB ÷ 128 KiB per shard ≈ 35 shards
For time-based shards, choose the time boundary so that the per-period entry count produces records in the target range. A daily shard for a user with 100 new followers per day at 15 bytes each is roughly 1.5 KiB per shard — well within range.
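The hash-shard calculation above, as a small worked function:

```python
import math

# Worked shard-count calculation: total bytes divided by the
# 128 KiB per-shard target, rounded up.

def hash_shard_count(entries: int, bytes_per_entry: int,
                     target_bytes: int = 128 * 1024) -> int:
    total = entries * bytes_per_entry
    return math.ceil(total / target_bytes)

shards = hash_shard_count(300_000, 15)   # ~4.5 MiB total -> 35 shards
```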
Choosing the right pattern
| Factor | Pattern 1 (key list) | Pattern 2 (consolidation) | Pattern 3 (SI) | N:M (bidirectional lists) |
|---|---|---|---|---|
| Child count | Tens to low hundreds | Up to p99 × size ≤ ~128 KiB | Unbounded | Per side: ordered list |
| Who reads | Parent | Parent | Child or inverse | Both sides |
| Inverse access | Requires separate infrastructure | Requires separate infrastructure | Native | Native (one read + batch) |
| Write contention | Low (parent record only) | Risk at high concurrency | Low (children are separate) | Moderate (both sides update) |
| Index cost | One PI entry per child | One PI entry per parent | One PI entry per child + SI entries | One PI entry per entity |