# Leveraging collections

Lists and Maps are the building blocks you use to consolidate data into well-sized records. Instead of spreading each child entity across separate records, each costing 64 bytes of [primary index](https://aerospike.com/docs/database/learn/architecture/data-storage/primary-index) overhead, you pack related data into [collection data type](https://aerospike.com/docs/develop/data-types/collections/) bins and operate on individual elements in place.

This page covers the modeling patterns that Lists and Maps enable. For the full API reference of each operation, see the [List operations](https://aerospike.com/docs/develop/data-types/collections/list/operations) and [Map operations](https://aerospike.com/docs/develop/data-types/collections/map/operations) pages.

## Map subtypes and ordering

Starting with Aerospike Database 7.0, all Maps are stored in key order on the server regardless of the order hint the client uses when creating them. Maps have three subtypes that differ in which internal indexes they maintain. All three share the same API and support key, index, value, and rank operations. You can convert between subtypes with `set_type()`.

-   **Unordered** — no internal indexes. All lookups scan elements. Lowest storage overhead.
-   **K-ordered** — maintains a key offset index that maps each key position to a byte offset within the packed map.
-   **KV-ordered** — maintains both the key offset index and a value order index that maps rank to element.

Two access dimensions are available on every Map:

-   **Index** is the position in key order (0-based; negative values count back from the end). Index 0 is the entry with the smallest key; index -1 is the entry with the largest.
-   **Rank** is the position in value order (0 = smallest value). When two entries have the same value, the tie is broken by key order: the entry with the lower key-order position gets the lower rank.
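
The two dimensions can be illustrated with a small pure-Python model. This is only a sketch of the ordering rules described above, not the Aerospike client API; the `scores` data and variable names are made up for illustration.

```python
# Pure-Python model of Map index and rank; not the Aerospike client API.
scores = {"carol": 320, "alice": 780, "dave": 150, "bob": 320}

# Index order: entries sorted by key.
by_index = sorted(scores.items())

# Rank order: entries sorted by value; equal values fall back to key order.
by_rank = sorted(scores.items(), key=lambda kv: (kv[1], kv[0]))

print(by_index[0])               # ('alice', 780): smallest key
print(by_index[-1])              # ('dave', 150): largest key
print([k for k, _ in by_rank])   # ['dave', 'bob', 'carol', 'alice']
```

Note how `bob` and `carol` share the value 320, and `bob` gets the lower rank because his key sorts first.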

Starting with Aerospike Database 7.1, Map keys are restricted to integer, string, or blob types.

### Persisting internal indexes

By default, the server rebuilds a Map’s internal indexes for each operation and discards them afterwards. Setting `PERSIST_INDEX` in the map policy stores the indexes on disk inside the map particle, so subsequent operations load them directly instead of rebuilding.

There are two persistence levels:

-   **Persisted offset index** (`PERSIST_INDEX` without `V_ORDERED`) — eliminates the per-operation O(N) index rebuild cost. Key lookups become O(log N) and index-based access becomes O(1).
-   **Persisted full index** (`PERSIST_INDEX` with `V_ORDERED`) — persists both the key offset and value order indexes. Rank lookups become O(1) and value-based searches become O(log N), in addition to the offset index benefits.

Persist-index is supported only for top-level Maps. The server silently ignores the flag on nested Maps. For Maps accessed frequently or holding thousands of entries, `PERSIST_INDEX` can reduce operation latency significantly.

For the full performance characteristics of each subtype and persistence level, see the [Map performance](https://aerospike.com/docs/develop/data-types/collections/map/performance/) page.

## Composite-key technique

When you encode the desired sort dimension directly in the map key, key order becomes your display order, and index-based operations serve as rank operations without additional storage.

A leaderboard Map keyed by zero-padded `score-playerId` strings illustrates this:

```plaintext
{"00150-alice": {...}, "00320-bob": {...}, "00320-carol": {...}, "00780-dave": {...}}
```

The entry at index 0 is the lowest score. The entry at index -1 is the highest. The player ID portion breaks ties deterministically. To get the top 50 scores, call `get_by_index_range` with index -50 and count 50 — a single CDT operation on one record, with no secondary structure.

To break ties by time instead of player ID, extend the key to `score-timestamp-playerId`. The key format controls the sort and the tie-breaking, and you define it once at write time.

For large Maps using this technique, `PERSIST_INDEX` makes `get_by_index_range` O(M) instead of O(N), where M is the number of returned entries and N is the total element count. A “top 50” query on a Map with 10,000 entries then scales with the 50 entries returned, not with the 10,000 stored.
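
As a rough sketch of the key format, the following pure-Python model builds zero-padded composite keys and reads a range from the end of the key order. The helper names `composite_key` and `get_by_index_range` are hypothetical stand-ins written for this example (the second only imitates the shape of the server operation), and the clamping of out-of-range indexes is an assumption of the model.

```python
def composite_key(score: int, player_id: str, width: int = 5) -> str:
    """Zero-pad the score so that string order matches numeric order."""
    if score < 0 or score >= 10 ** width:
        raise ValueError("score does not fit the padded width")
    return f"{score:0{width}d}-{player_id}"

def get_by_index_range(ordered_keys, index, count):
    """Model of an index-range read; a negative index counts from the end."""
    n = len(ordered_keys)
    start = index if index >= 0 else n + index
    end = start + count
    # Assumption of this model: positions outside the map are clamped.
    return ordered_keys[max(start, 0):max(end, 0)]

entries = [(150, "alice"), (320, "bob"), (780, "dave"), (320, "carol")]
keys = sorted(composite_key(score, player) for score, player in entries)

top2 = get_by_index_range(keys, -2, 2)
print(top2)  # ['00320-carol', '00780-dave']
```

The padding width is the main design decision: it must cover the largest score you ever expect, because changing it later means rewriting every key.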

## Ordered lists with ADD_UNIQUE

When you need a set-like collection — a list of unique values with efficient membership checks — use an ordered List with the `ADD_UNIQUE` write flag.

On an ordered List, `ADD_UNIQUE` uses binary search to check for duplicates: O(log n). On an unordered List, the same flag falls back to a linear scan: O(n). For relationship lists with thousands of entries (follower IDs, tag collections, reference lists), the difference is significant: at 10,000 entries, a binary search needs about 14 comparisons where a linear scan may need up to 10,000.

Combine `ADD_UNIQUE` with the `NO_FAIL` write flag when you want the write to succeed silently if the value already exists. This lets you treat the operation as an idempotent “ensure present” without error handling for duplicates.

For ordered Lists that grow into the thousands of entries, set `persistIndex` in the list policy to store the offset index on disk. Without persist, the offset index is rebuilt per operation — an O(N) walk through packed elements. With persist, `ADD_UNIQUE` inserts stay at O(log N) and paginated reads with `list_get_by_index_range` become O(M) for M returned elements instead of O(N + M). In an ordered List, index and rank are equivalent, so rank-based operations also benefit. Persist-index is supported only for top-level Lists; the server silently ignores it on nested Lists.
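
The duplicate check on an ordered List can be modeled in a few lines of Python with the standard `bisect` module. This is a sketch of the semantics only: `add_unique`, its `no_fail` parameter, and `DuplicateValue` are hypothetical names that mirror the write flags described above, not client API calls.

```python
from bisect import bisect_left

class DuplicateValue(Exception):
    """Raised by the model when the value already exists and no_fail is off."""

def add_unique(ordered, value, no_fail=False):
    """Insert value into a sorted list unless it is already present.

    The membership check is a binary search: O(log n) comparisons.
    Returns True if the value was inserted, False if it was already there.
    """
    pos = bisect_left(ordered, value)
    if pos < len(ordered) and ordered[pos] == value:
        if no_fail:
            return False  # idempotent "ensure present"
        raise DuplicateValue(value)
    ordered.insert(pos, value)  # keeps the list sorted
    return True

followers = []
add_unique(followers, 42)
add_unique(followers, 7)
add_unique(followers, 42, no_fail=True)  # duplicate, silently ignored
print(followers)  # [7, 42]
```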

## Value-based access with list tuples

Maps let you look up entries by key, but they do not support wildcard matching or interval queries on values. When you need value-range or wildcard-based selection, consider storing structured data as list tuples instead of map entries.

A list of `[timestamp, deviceId, reading]` tuples supports queries like “all readings where the first element is between T1 and T2” using `get_by_value_interval` with list-to-list bounds. The same query on map values would require iterating all entries.

The trade-off is that tuples are positional: element 0 is the timestamp, element 1 is the device ID, and so on. Adding or reordering fields requires migrating existing data. For structures that change shape over time, named fields in a Map or the list-of-structs pattern (described later on this page) give safer evolution.
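
To make the list-to-list bounds concrete, here is a pure-Python model of an interval query over tuples. It assumes a begin-inclusive, end-exclusive interval, and it relies on Python comparing lists element by element, with a shorter list sorting before any longer list that shares its prefix. The model sorts the data so that it can use `bisect`. The function below is a stand-in, not the client call, and the readings are invented.

```python
from bisect import bisect_left

# [timestamp, deviceId, reading] tuples, kept in sorted order for this model.
readings = sorted([
    [1120, "dev-a", 21.7],
    [1000, "dev-a", 21.5],
    [1180, "dev-c", 30.2],
    [1060, "dev-b", 19.0],
])

def get_by_value_interval(ordered, begin, end):
    """Return elements v with begin <= v < end under list ordering."""
    lo = bisect_left(ordered, begin)
    hi = bisect_left(ordered, end)
    return ordered[lo:hi]

# Bounds are one-element lists: [T1] sorts before every tuple starting with T1.
window = get_by_value_interval(readings, [1060], [1180])
print([r[0] for r in window])  # [1060, 1120]
```

Because the timestamp is element 0, it alone decides which tuples fall inside the interval; the device ID and reading only matter for ordering among equal timestamps.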

## Nested context

You operate on elements deep inside a collection by providing a context path — a stack of selectors that navigate from the bin to the target element. Each selector drills one level deeper:

-   `BY_LIST_INDEX(n)` / `BY_MAP_KEY(k)` — select by position or key.
-   `BY_LIST_RANK(r)` / `BY_MAP_RANK(r)` — select by value order.
-   `BY_LIST_VALUE(v)` / `BY_MAP_VALUE(v)` — select by value match.
-   `MAP_KEY_CREATE(k)` / `LIST_INDEX_CREATE(n)` — select or create if missing (Aerospike Database 4.9+).

For example, to increment a counter inside a nested Map structure like `{stats: {accolades: {jokes: 317}}}`, you provide a context of `[MAP_KEY_CREATE("stats"), MAP_KEY_CREATE("accolades")]` and then call `map_increment` on key `"jokes"`. The `CREATE` variants ensure the path exists even on the first write, eliminating the need for a separate initialization step.
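
The effect of the `CREATE` selectors can be modeled with nested Python dicts. This is a minimal sketch of the semantics, assuming every context step is a map key; `map_increment` here is a local helper written for the example, not the client operation.

```python
def map_increment(bin_value, ctx, key, delta):
    """Walk the context path, creating missing maps, then increment key."""
    node = bin_value
    for map_key in ctx:
        # Equivalent in spirit to MAP_KEY_CREATE: select, or create if missing.
        node = node.setdefault(map_key, {})
    node[key] = node.get(key, 0) + delta
    return node[key]

bin_value = {}  # first write: nothing exists yet
map_increment(bin_value, ["stats", "accolades"], "jokes", 317)
map_increment(bin_value, ["stats", "accolades"], "jokes", 1)
print(bin_value)  # {'stats': {'accolades': {'jokes': 318}}}
```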

The maximum nesting depth for context operations is 15 levels. In practice, most models stay within 2–4 levels. If you find yourself reaching deeper, that is often a signal to flatten the structure or split it across records.

For the full context API and multi-language examples, see [Context for operations on nested elements](https://aerospike.com/docs/develop/data-types/collections/context/).

## Multiple sort dimensions

A Map’s key order gives you one native sort dimension, and rank gives you a second (value order). When you need additional sort dimensions — for example, a leaderboard sorted by score that also supports “most recent entries” queries — you maintain auxiliary sorted structures alongside the primary one.

A common pattern: the primary Map bin is keyed by `score-playerId` for rank access, and a second bin holds an ordered List of `[timestamp, playerId]` tuples for recency queries. Both bins update atomically in a single `operate()` call, so they stay consistent.

The cost is write amplification: every insert or update touches multiple bins. The benefit is that each read pattern resolves to a single CDT operation on the appropriate bin, with no client-side sorting or post-processing.
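
A sketch of the two-structure pattern, modeled in plain Python: one write function updates both structures together, and each read uses only the structure built for it. The bin names `by_score` and `by_time` and the helper `record_score` are hypothetical, and a Python function call only stands in for the atomicity that a single `operate()` call provides.

```python
from bisect import insort

def record_score(record, score, player_id, timestamp):
    """Update both sort dimensions in one step (insert-only model)."""
    record["by_score"][f"{score:05d}-{player_id}"] = {"ts": timestamp}
    insort(record["by_time"], [timestamp, player_id])  # keeps the list ordered

record = {"by_score": {}, "by_time": []}
record_score(record, 320, "bob", 1711929600)
record_score(record, 780, "dave", 1711933200)
record_score(record, 150, "alice", 1711936800)

top = sorted(record["by_score"])[-1]  # highest score: last key in key order
latest = record["by_time"][-1]        # most recent: last tuple in the list
print(top)     # 00780-dave
print(latest)  # [1711936800, 'alice']
```

An update to an existing player's score would also need to remove the old entries from both structures, which this insert-only sketch leaves out.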

## List-of-structs with path expressions

[Path expressions](https://aerospike.com/docs/develop/expressions/path/) let you treat a List of Maps as a collection of typed structures with named fields. Each Map in the list represents one entity (a notification, a line item, a sensor reading), and path expression operations select or modify entries based on field values. Path expressions require Aerospike Database 8.1.2 or later. (Database 8.1.1 introduced preview support; 8.1.2 adds `mapKeysIn` and `andFilter` context types and is the production prerequisite.)

A notification timeline illustrates the pattern. Each user has one record per day, keyed `notif:{userId}:{YYYY-MM-DD}`, with a bin `items` holding a List of Maps:

```json
[
  {"type": "like",    "from": "bob",   "read": false, "ts_ms": 1711929600000},
  {"type": "comment", "from": "carol", "read": true,  "ts_ms": 1711933200000},
  {"type": "follow",  "from": "dave",  "read": false, "ts_ms": 1711936800000}
]
```
```

With path expressions you can:

-   Select all unread notifications with a single `selectByPath` call that filters on `read == false`.
-   Mark a specific notification as read with `modifyByPath`, targeting the entry where `from == "carol"` and `type == "comment"`.
-   Remove all notifications from a blocked user with `modifyByPath` and a filter on the `from` field.

These operations execute server-side in one round trip, without transferring the full list to the client. The named fields (`type`, `from`, `read`, `ts_ms`) make the structure self-describing and evolvable — adding a new field does not require migrating existing entries.

Baseline CDT operations still handle the common cases: `list_append` to add new notifications, `list_get_by_index_range` to paginate, and `list_size` to get the count. Path expressions extend the model with field-level filtering and mutation when you need them.
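
The following pure-Python model shows what field-level select, modify, and remove mean on the sample data. It runs entirely on the client and does not use `selectByPath` or `modifyByPath`; `select_where` is a hypothetical helper, and the choice to mark bob's like as read is only for illustration.

```python
# Pure-Python model of field-level filtering on a List of Maps.
items = [
    {"type": "like",    "from": "bob",   "read": False, "ts_ms": 1711929600000},
    {"type": "comment", "from": "carol", "read": True,  "ts_ms": 1711933200000},
    {"type": "follow",  "from": "dave",  "read": False, "ts_ms": 1711936800000},
]

def select_where(entries, fields):
    """Return the entries whose named fields all equal the given values."""
    return [e for e in entries if all(e.get(k) == v for k, v in fields.items())]

# Select: all unread notifications.
unread_senders = [e["from"] for e in select_where(items, {"read": False})]
print(unread_senders)  # ['bob', 'dave']

# Modify: mark one specific notification as read.
for entry in select_where(items, {"from": "bob", "type": "like"}):
    entry["read"] = True

# Remove: drop everything from a blocked user.
blocked = "dave"
items = [e for e in items if e["from"] != blocked]
print([e["from"] for e in items])  # ['bob', 'carol']
```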

Starting with Database 8.1.2, you can also use `mapKeysIn` to select map entries by key — equivalent to SQL `WHERE key IN (k1, k2, ...)` — and `andFilter` to apply an additional filter at the same context level. Both use the map’s internal index for efficient lookup.

For the full path expression API, worked examples, and performance guidance, see the [path expressions](https://aerospike.com/docs/develop/expressions/path/) section.

## Choosing between Lists and Maps

The choice depends on how you access the data:

| Access pattern | Preferred CDT | Why |
| --- | --- | --- |
| Look up by unique key | Map | Direct key lookup: O(log n) |
| Maintain a ranked or sorted collection | Map (composite key) | Key order = sort order |
| Deduplicated set of values | Ordered List + `ADD_UNIQUE` | Binary search dedup: O(log n) |
| Value-range or wildcard queries | List of tuples | `get_by_value_interval` with bounds |
| Key-based access with occasional value/rank queries | Map + persisted full index | O(log N) key and value access, O(1) rank |
| Append-heavy, ordered by insertion | Unordered List | Append is O(1) |
| Named-field structures with filtering | List of Maps + path expressions | Field-level select/modify (DB 8.1.2+) |

When the access pattern is primarily key-based with occasional value-range or rank queries, a Map with a persisted full index (`PERSIST_INDEX` + `V_ORDERED`) may be preferable to list tuples. The persisted full index gives you O(log N) value-based access and O(1) rank lookups — comparable to or better than an ordered List with persisted index for those operations — while retaining the self-describing key-value structure.

When neither a pure List nor a pure Map fits, combine them: a Map bin for the primary lookup path and a List bin for a secondary access pattern, updated atomically in a single `operate()` call.

### Equality comparison caveat

Only ordered Maps (K-ordered or KV-ordered) can be reliably compared for equality. Unordered Maps have no canonical byte ordering, so two unordered Maps with the same logical content can have different wire representations. Comparisons involving unordered Maps — whether through an expression `eq` operator or through `*_by_value` operations — may return false even when the Maps contain the same elements.

This matters when you store a List of Maps and use `ADD_UNIQUE` for deduplication. If the Map elements are unordered, `ADD_UNIQUE` may fail to detect duplicates because the byte representations differ. When you use `ADD_UNIQUE` on a List of Maps, make sure the Map elements are K-ordered or KV-ordered so that comparison works correctly.
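
The effect can be sketched in Python by serializing two logically equal maps. JSON is used here purely as a stand-in for the server's packed representation, and `pack` is a hypothetical helper; the point is only that byte-level comparison depends on element order unless the order is canonical.

```python
import json

# Same logical content, different insertion order.
a = {"type": "like", "from": "bob"}
b = {"from": "bob", "type": "like"}

def pack(m, key_ordered):
    """Serialize a map; key_ordered=True models a K-ordered map."""
    return json.dumps(m, sort_keys=key_ordered).encode()

print(a == b)                            # True: logically equal
print(pack(a, False) == pack(b, False))  # False: unordered bytes differ
print(pack(a, True) == pack(b, True))    # True: ordered bytes match
```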

## Sizing collections

The same sizing principles from [Record sizing](https://aerospike.com/docs/develop/data-modeling/record-sizing) apply to collections within a record. Estimate the element count at p99 and multiply by the average element size to get the expected bin size.

A Map bin with 1,000 entries of 50 bytes each is roughly 50 KiB — well within the 1–128 KiB sweet spot. A Map bin with 100,000 entries of 50 bytes each is roughly 5 MiB — functional but approaching the territory where the overflow strategies described in [Relationships](https://aerospike.com/docs/develop/data-modeling/relationships) become relevant.
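
The arithmetic behind these two figures, as a small sketch. The function name is made up for this example, and the estimate ignores per-element and index overhead.

```python
KIB = 1024
MIB = 1024 * KIB

def estimated_bin_size(p99_elements, avg_element_bytes):
    """Expected bin size in bytes: p99 element count times average size."""
    return p99_elements * avg_element_bytes

small = estimated_bin_size(1_000, 50)    # 50,000 bytes
large = estimated_bin_size(100_000, 50)  # 5,000,000 bytes

print(f"{small / KIB:.1f} KiB")  # 48.8 KiB
print(f"{large / MIB:.2f} MiB")  # 4.77 MiB
print(1 * KIB <= small <= 128 * KIB)  # True: inside the sweet spot
print(1 * KIB <= large <= 128 * KIB)  # False: consider overflow strategies
```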

When a collection grows without bound (relationship edges, event logs, accumulated readings), plan for the eventual size, not the launch size. Time-slice the key to cap per-record growth (see [Key design](https://aerospike.com/docs/develop/data-modeling/key-design#entity--time-slice)), or use the shard-on-demand pattern to distribute overflow across sub-records.
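
As one possible illustration of time-slicing, the sketch below derives a per-day record key from an event timestamp, following the `notif:{userId}:{YYYY-MM-DD}` shape used earlier on this page. The helper name and the choice of UTC day boundaries are assumptions made for this example.

```python
from datetime import datetime, timezone

def time_sliced_key(prefix, entity_id, ts_ms):
    """Build a record key that rolls over to a new record each UTC day."""
    day = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)
    return f"{prefix}:{entity_id}:{day:%Y-%m-%d}"

one_day_ms = 86_400_000
print(time_sliced_key("notif", "alice", 1711929600000))
# notif:alice:2024-04-01
print(time_sliced_key("notif", "alice", 1711929600000 + one_day_ms))
# notif:alice:2024-04-02
```

Each day's events land in their own record, so per-record growth is bounded by one day's volume rather than by the lifetime of the entity.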