# Modeling relationships

Aerospike has no foreign keys, no joins, and no multi-collection transactions that span arbitrary records. You express relationships through key design, collection data type (CDT) bins, and [secondary indexes](https://aerospike.com/docs/database/learn/architecture/data-storage/secondary-index) — choosing the pattern based on cardinality, who drives the read, and the combined size of the related data.

## 1:1 relationships

The default for a 1:1 relationship is the simplest: store both entities in the same record, as separate bins or as a nested CDT structure. Both entities share the same key, the same [primary index](https://aerospike.com/docs/database/learn/architecture/data-storage/primary-index) entry, and the same lifecycle.

Use separate records when the two entities have different requirements:

-   **Different TTLs.** Record-level [time to live](https://aerospike.com/docs/develop/data-modeling/record-sizing) applies to the entire record. If one entity expires after 30 days and the other is permanent, they cannot share a record and still rely on automatic expiration.
-   **Different access frequencies.** If one entity is read on every request and the other is read rarely, co-locating them wastes I/O on the hot path.
-   **Different storage tiers.** The split-namespace technique (described in [Key design](https://aerospike.com/docs/develop/data-modeling/key-design#split-namespace-technique)) lets two records share the same logical key across namespaces with different storage policies.

When you split a 1:1 relationship into separate records, design the keys so that one is derivable from the other — for example, the same user key in two different sets or namespaces. This keeps the lookup a single `get` per entity, with no index required.
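A minimal sketch of derivable keys, assuming hypothetical namespace and set names (`cache`, `archive`, `users`) for illustration:

```python
# Sketch: derivable keys for a split 1:1 relationship.
# Namespace and set names ("cache", "archive", "users") are
# illustrative assumptions, not part of any real schema.

def profile_key(user_id: str) -> tuple:
    # Hot entity: frequently read, in a fast-storage namespace.
    return ("cache", "users", user_id)

def history_key(user_id: str) -> tuple:
    # Cold entity: same logical user key, disk-backed namespace.
    return ("archive", "users", user_id)

# Both keys derive from the same user_id, so fetching either
# entity is a single get with no index lookup in between.
k_hot = profile_key("u:1001")
k_cold = history_key("u:1001")
```

Because each key is computable from the other, neither record needs to store a pointer to its counterpart.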

## 1:N relationships

A 1:N relationship has a parent and multiple children. The modeling pattern depends on three factors: how many children a parent has, who drives the read (parent or child), and how large each child is.

### Pattern 1: parent-held list of child keys

Store a List bin on the parent record containing the keys of the child records. The application reads the parent, extracts the key list, and issues a [batch read](https://aerospike.com/docs/develop/learn/batch/) to fetch the children.

This works well when:

-   The child count is modest (tens to low hundreds).
-   The parent drives the read (“give me this order and its line items”).
-   Each child is a standalone record with its own lifecycle or access pattern.

The parent record grows with each new child key. Monitor the list size against the [record sizing](https://aerospike.com/docs/develop/data-modeling/record-sizing) guidelines.
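The read path can be sketched with an in-memory dictionary standing in for the client; in a real application the dictionary lookup would be a `get` and the child loop a single batch read. All record contents and key names below are illustrative.

```python
# In-memory sketch of pattern 1: parent holds a list of child keys.
# `store` stands in for the database; keys follow the Aerospike
# (namespace, set, user key) shape but the values are made up.

store = {
    ("app", "orders", "o:1"): {
        "status": "shipped",
        "lines": [("app", "lines", "o:1:1"), ("app", "lines", "o:1:2")],
    },
    ("app", "lines", "o:1:1"): {"sku": "A", "qty": 2},
    ("app", "lines", "o:1:2"): {"sku": "B", "qty": 1},
}

def read_order_with_lines(order_key):
    parent = store[order_key]                       # one primary-key read
    children = [store[k] for k in parent["lines"]]  # one batch read
    return parent, children

parent, children = read_order_with_lines(("app", "orders", "o:1"))
```

The whole traversal is two round trips regardless of child count: one `get` for the parent, one batch for the children.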

### Pattern 2: consolidation into one record

Store all children as elements in a CDT bin on the parent record. The parent and all its children are a single record.

This is the default for high-cardinality, small-child relationships where the parent drives the read. A user profile with 200 segment tags, a post with its comments, or a product with its reviews — all fit this pattern when the combined size stays in the 1–128 KiB sweet spot.

The record count becomes O(parents) instead of O(parents + children), which reduces primary index cost dramatically. CDT operations ([List](https://aerospike.com/docs/develop/data-types/collections/list/), [Map](https://aerospike.com/docs/develop/data-types/collections/map/)) let you read, update, and remove individual children without fetching the full record.

Use the crossover heuristic to decide: multiply the p99 child count by the average child size. If the result is under ~128 KiB, consolidation is the default.

```plaintext
child_count_p99 × avg_child_bytes ≤ ~128 KiB → consolidate
```
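The heuristic reduces to a one-line predicate; the threshold below is the upper end of the sweet spot described above:

```python
# Sketch of the crossover heuristic for pattern 2.
SWEET_SPOT_BYTES = 128 * 1024  # upper end of the 1-128 KiB sweet spot

def should_consolidate(child_count_p99: int, avg_child_bytes: int) -> bool:
    return child_count_p99 * avg_child_bytes <= SWEET_SPOT_BYTES

should_consolidate(200, 40)       # 8,000 bytes: consolidate
should_consolidate(300_000, 15)   # ~4.5 MB: shard or separate records
```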

### Pattern 3: child-held reference with secondary index

Store a reference to the parent (or a shared attribute) in each child record, and create a [secondary index](https://aerospike.com/docs/database/learn/architecture/data-storage/secondary-index) on that bin. The application queries the index to find all children for a given parent.

This pattern is necessary when you need inverse access — “find all parents that contain this child” or “find all children belonging to this parent” when you start from the child side. A secondary index (SI) query scatters to all nodes, so it has higher latency than a key lookup. Use this pattern when the access path is not the dominant one, or when the child set is too large or too diverse to consolidate.

### Mixed access

Many relationships require both a dominant read path and an inverse path. The patterns above are not mutually exclusive.

A social application that stores comments consolidated in a post record (pattern 2) also needs to delete all comments by a specific user when that user deletes their account. For the dominant path (display a post’s comments), consolidation is efficient. For the inverse path (find all posts a user commented on), you maintain a `commenters` bin or a separate reference set with a secondary index.

Design the dominant path first, then add the inverse infrastructure only if the application requires it.
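The comment example can be sketched in memory; `posts` stands in for the consolidated post records and `user_posts` for the inverse reference infrastructure. Bin and structure names here are illustrative.

```python
# In-memory sketch of mixed access: comments consolidated per post
# (dominant path) plus an inverse reference per user.

posts = {}       # post_id -> {comment_id: {"user": ..., "text": ...}}
user_posts = {}  # user_id -> set of post_ids the user commented on

def add_comment(post_id, comment_id, user_id, text):
    posts.setdefault(post_id, {})[comment_id] = {"user": user_id, "text": text}
    user_posts.setdefault(user_id, set()).add(post_id)  # maintain inverse path

def delete_user_comments(user_id):
    # Inverse path: locate affected posts without scanning every post.
    for post_id in user_posts.pop(user_id, set()):
        comments = posts[post_id]
        doomed = [cid for cid, c in comments.items() if c["user"] == user_id]
        for cid in doomed:
            del comments[cid]

add_comment("p1", "c1", "alice", "first!")
add_comment("p1", "c2", "bob", "second")
delete_user_comments("alice")
```

Without `user_posts`, the account-deletion path would require scanning every post record; the inverse structure is extra write-path work bought to avoid that scan.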

## N:M relationships

An N:M relationship connects many entities on both sides — users follow other users, products belong to multiple categories, students enroll in multiple courses.

The standard Aerospike pattern uses bidirectional key lists: each entity holds an ordered List bin containing the keys of the entities it is related to on the other side. Traversal in either direction is one record read plus a batch read of the referenced keys. There is no join table and no secondary index on the hot path.

Use an ordered List with `ADD_UNIQUE` and `NO_FAIL` write flags for each side. On an ordered List, `ADD_UNIQUE` uses binary search for deduplication (O(log n)), and `NO_FAIL` makes a duplicate insert succeed silently. See [Collections](https://aerospike.com/docs/develop/data-modeling/collections#ordered-lists-with-add_unique) for details.
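The flag semantics can be illustrated in pure Python with `bisect`; the real behavior is enforced server-side by the List write flags, so this is a model of the semantics, not client code.

```python
import bisect

# Pure-Python sketch of ordered-List ADD_UNIQUE + NO_FAIL semantics.
# In Aerospike this logic runs server-side via list write flags.

def add_unique_no_fail(ordered: list, value) -> bool:
    """Insert keeping the list ordered; a duplicate is a silent no-op."""
    i = bisect.bisect_left(ordered, value)  # O(log n) binary search
    if i < len(ordered) and ordered[i] == value:
        return False                        # NO_FAIL: no error raised
    ordered.insert(i, value)                # ADD_UNIQUE: keep sorted, unique
    return True

following = []
add_unique_no_fail(following, "bob")
add_unique_no_fail(following, "ann")
add_unique_no_fail(following, "bob")  # duplicate: silently ignored
```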

For example, a follow relationship between users:

```plaintext
Record key: user:{userId}

Bin "following": ordered List of userIds this user follows

Bin "followers": ordered List of userIds who follow this user
```

“Does Alice follow Bob?” is a single `list_get_by_value` on Alice’s `following` bin. “Who does Alice follow?” reads Alice’s `following` bin and batch-reads the referenced user records. Both directions resolve without a secondary index.

### Pagination

For large relationship lists, use `list_get_by_index_range` with an offset and count to return pages. Note that the second argument is the count (the number of elements to return), not an end index.
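The offset-and-count semantics can be mirrored on a plain Python list, assuming an illustrative page size of 100:

```python
# Sketch of list_get_by_index_range pagination semantics on a plain
# Python list: a start offset plus a count, not a start/end pair.

def page(ordered: list, offset: int, count: int) -> list:
    return ordered[offset:offset + count]

followers = [f"user:{i:03d}" for i in range(250)]
first_page = page(followers, 0, 100)
third_page = page(followers, 200, 100)  # final page is short: 50 elements
```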

## Write contention

Aerospike applies a record-level lock during writes. When many concurrent writers update the same record, the server returns `KEY_BUSY` (error code 14) for the writes that cannot acquire the lock.

Consolidation patterns are vulnerable to this when the consolidated record receives a high sustained write rate. The threshold depends on the operation and hardware, but as a guideline: if you expect more than roughly 50 writes per second sustained to a single record, evaluate contention before committing to consolidation.

Options for handling contention:

-   **Hot-key sharding.** Distribute the workload across sub-records, as described in [Key design](https://aerospike.com/docs/develop/data-modeling/key-design#hot-key-sharding). Writers pick a shard; readers batch-read all shards and combine.
-   **Shard-on-demand** (described in the following section). Start with one record and split only when the record exceeds a size or contention threshold.
-   **Separate records.** If the children have independent lifecycles and the primary access pattern reads them individually, separate records may be the better default.

## Companion record overflow and shard-on-demand

When a consolidated record grows past the sweet spot — whether from accumulating relationship edges, appended events, or a growing list of children — you distribute the overflow across sub-records while keeping the original record as the entry point.

### Structure

The original record gains a `subkeys` bin that describes the sharding scheme. For example:

```plaintext
subkeys: ["hash-shards", 8]
```

This tells readers and writers that the data is distributed across 8 hash-sharded companion records. The sub-record keys follow a predictable pattern:

```plaintext
{originalKey}:{shardIndex}
```

Writers hash the child identifier to select a shard, then write to the corresponding sub-record. Readers batch-read all shards and merge the results.
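Shard selection can be sketched as follows; `md5` stands in for any stable hash, since Python's built-in `hash()` is randomized per process and must not be used for routing:

```python
import hashlib

# Sketch of writer-side shard selection and reader-side key
# enumeration for hash-sharded companion records.

def shard_key(original_key: str, child_id: str, num_shards: int) -> str:
    # Stable hash of the child identifier picks the shard index.
    digest = hashlib.md5(child_id.encode()).digest()
    shard = int.from_bytes(digest[:4], "big") % num_shards
    return f"{original_key}:{shard}"

def all_shard_keys(original_key: str, num_shards: int) -> list:
    # Readers batch-read every shard and merge the results.
    return [f"{original_key}:{i}" for i in range(num_shards)]

target = shard_key("user:42:followers", "follower:9001", 8)
```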

When companion records use ordered Lists (for example, ordered Lists of reference IDs with `ADD_UNIQUE`), set `persistIndex` in the list policy. With a persisted index, paginated reads with `list_get_by_index_range` are O(M) for M returned elements instead of O(N + M), and `ADD_UNIQUE` inserts remain O(log N) instead of requiring an O(N) index rebuild on each operation. This matters most for companion records that grow into the thousands of entries. Persisted indexes are supported for top-level Lists only. See [Collections](https://aerospike.com/docs/develop/data-modeling/collections#ordered-lists-with-add_unique) for details.

### Shard types

| Shard type | Key pattern | Best for |
| --- | --- | --- |
| Hash | `{key}:{hash(childId) % S}` | Even distribution, full-set reads |
| Time-based (hourly) | `{key}:{YYYY-MM-DD-HH}` | Recent-window reads, time-bounded cleanup |
| Time-based (daily) | `{key}:{YYYY-MM-DD}` | Daily rollups, TTL-based expiry of old shards |

Hash shards are the default when the application reads the full set of records. Time-based shards are preferred when the common read targets a recent window (“last 24 hours of followers”), because the application can batch-read only the relevant time buckets.
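For a recent-window read, time-based shard keys can be enumerated directly from the clock; this sketch assumes hourly buckets and the key pattern from the table above:

```python
from datetime import datetime, timedelta, timezone

# Sketch of time-based shard keys: readers of a recent window
# enumerate only the relevant hourly buckets.

def hourly_shard_key(original_key: str, ts: datetime) -> str:
    return f"{original_key}:{ts:%Y-%m-%d-%H}"

def last_hours_keys(original_key: str, now: datetime, hours: int) -> list:
    return [hourly_shard_key(original_key, now - timedelta(hours=h))
            for h in range(hours)]

now = datetime(2024, 5, 1, 12, tzinfo=timezone.utc)
keys = last_hours_keys("user:42:followers", now, 3)
```

A "last 24 hours" read batch-reads 24 such keys instead of every shard of the relationship.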

### Activation

You do not need to start with sharding. Begin with a single consolidated record. Use a filter expression on the write path to check whether the `subkeys` bin exists:

-   If `subkeys` is absent, write directly to the record.
-   If `subkeys` is present, read its value and route the write to the appropriate sub-record.

This makes the non-sharded path zero-overhead. The application activates sharding for a specific record by writing the `subkeys` bin when the record reaches a size or contention threshold.
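The routing logic can be sketched in memory; in Aerospike the "`subkeys` absent" branch would be a filter expression attached to the write, so the non-sharded path stays a single operation. The stand-in hash and bin contents below are illustrative.

```python
# In-memory sketch of shard-on-demand write routing. `store` stands
# in for the database; the subkeys check would be a filter expression
# on the write in a real deployment.

store = {"user:42": {"edges": ["a", "b"]}}

def route_write(key: str, child_id: str, value):
    rec = store.setdefault(key, {})
    subkeys = rec.get("subkeys")
    if subkeys is None:
        rec.setdefault("edges", []).append(value)  # non-sharded fast path
        return key
    scheme, shard_count = subkeys                  # e.g. ["hash-shards", 8]
    shard = sum(child_id.encode()) % shard_count   # stand-in stable hash
    target = f"{key}:{shard}"
    store.setdefault(target, {}).setdefault("edges", []).append(value)
    return target

route_write("user:42", "c", "c")                    # direct write
store["user:42"]["subkeys"] = ["hash-shards", 8]    # activate sharding
target = route_write("user:42", "d", "d")           # routed to a sub-record
```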

### Sizing shards

For hash shards, choose the shard count so that each sub-record stays in the 1–128 KiB target range under expected growth. If each relationship entry is roughly 15 bytes and you expect up to 300,000 entries:

```plaintext
300,000 entries × 15 bytes = 4,500,000 bytes ≈ 4.3 MiB total

4.3 MiB ÷ 128 KiB per shard ≈ 35 shards
```
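The calculation generalizes to a one-line function; the 128 KiB target is the upper end of the sweet spot:

```python
import math

# Sketch of the hash-shard sizing calculation from the example above.
TARGET_SHARD_BYTES = 128 * 1024  # upper end of the 1-128 KiB sweet spot

def hash_shard_count(expected_entries: int, entry_bytes: int) -> int:
    total = expected_entries * entry_bytes
    return max(1, math.ceil(total / TARGET_SHARD_BYTES))

hash_shard_count(300_000, 15)  # 35 shards for the worked example
```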

For time-based shards, choose the time boundary so that the per-period entry count produces records in the target range. A daily shard for a user with 100 new followers per day at 15 bytes each is roughly 1.5 KiB per shard — well within range.

## Choosing the right pattern

| Factor | Pattern 1 (key list) | Pattern 2 (consolidation) | Pattern 3 (SI) | N:M (bidirectional lists) |
| --- | --- | --- | --- | --- |
| Child count | Tens to low hundreds | p99 count × avg size ≤ ~128 KiB | Unbounded | Per side: ordered list |
| Who reads | Parent | Parent | Child or inverse | Both sides |
| Inverse access | Requires separate infrastructure | Requires separate infrastructure | Native | Native (one read + batch) |
| Write contention | Low (parent record only) | Risk at high concurrency | Low (children are separate) | Moderate (both sides update) |
| Index cost | One PI entry per child | One PI entry per parent | One PI entry per child + SI entries | One PI entry per entity |