# Key design

Choosing the record key is the most consequential decision you make in Aerospike data modeling. The key you choose controls which partition owns the record, which node stores it, and how your application reaches it. A well-chosen key lets common reads resolve to a single `get` or a bounded `batch get`. A poorly chosen key forces scans, secondary index queries, or client-side assembly that you could have avoided.

## Your key choice controls everything

When a client writes or reads a record, it hashes the set name and user key through RIPEMD-160 to produce a 20-byte _digest_; the namespace is not part of the hash. Twelve bits of that digest select one of 4,096 [partitions](https://aerospike.com/docs/database/learn/architecture/clustering/data-distribution), and that partition assignment decides which node in the cluster owns the record.

This means two things for data modeling:

**There is no server-side collocation.** You cannot force two records onto the same node by choosing similar keys. Aerospike distributes partitions uniformly across nodes for load balancing. If the application needs data from multiple records, it uses [batch reads](https://aerospike.com/docs/develop/learn/batch/) to fetch them in parallel — the scatter/gather pattern described in the [overview](https://aerospike.com/docs/develop/data-modeling/#batch-reads-replace-joins).

**You control the primary access path.** The [primary index](https://aerospike.com/docs/database/learn/architecture/data-storage/primary-index) maps every digest to a record location in constant time. When you design keys so that your application can derive them from the data it already has — without a lookup — you get the fastest path to any record.
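The key-to-partition mapping can be sketched in a few lines. The partition ID is carved from the low bits of the 20-byte digest; the byte order shown below, and the use of SHA-1 as a stand-in for RIPEMD-160 (both produce 20 bytes, and `ripemd160` is not always available in Python's `hashlib`), are illustrative assumptions, not the client's exact wire format.

```python
import hashlib

NUM_PARTITIONS = 4096  # fixed partition count in an Aerospike cluster

def fake_digest(set_name: str, user_key: str) -> bytes:
    # Illustrative stand-in: real clients run RIPEMD-160 over the set name
    # plus an encoded key. SHA-1 is used here only because it also yields
    # 20 bytes and is always available in hashlib.
    return hashlib.sha1(f"{set_name}:{user_key}".encode()).digest()

def partition_id(digest: bytes) -> int:
    # Twelve bits of the digest select one of 4,096 partitions.
    # (The exact bits and byte order are an assumption for illustration.)
    return int.from_bytes(digest[:2], "little") & 0x0FFF

pid = partition_id(fake_digest("users", "user:alice"))
print(pid)  # always in [0, 4096); the same key maps to the same partition
```

The point of the sketch is the determinism: the same `(set, userKey)` pair always lands in the same partition, which is why the client can route any single-record request directly to the owning node.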

## Key design patterns

The following patterns appear in different Aerospike data modeling approaches. Each one addresses a different access shape.

### Entity + time slice

When data accumulates over time, encode the entity identifier and a time boundary in the key so that each record covers a bounded period.

```plaintext
sensor:{sensorId}:{YYYY-MM-DD}
```

A sensor reporting every minute produces 1,440 readings per day. Storing one record per reading creates many tiny records, each costing 64 bytes of primary index overhead (see [Record sizing](https://aerospike.com/docs/develop/data-modeling/record-sizing)). Rolling up to one record per sensor per day produces a single record in the low-KiB range. The application reads today’s record with a single `get`, or fetches the last seven days with a batch of seven keys — all derivable from the sensor ID and the date range.

The same pattern applies to any time-bucketed data: transactions per account per hour, page views per URL per day, or metrics per service per minute. Choose the time boundary so that the resulting record stays in the 1–128 KiB sweet spot under realistic load.
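Key construction for this pattern is plain string formatting, which is what makes the keys derivable without a lookup. A minimal Python sketch (the key layout and sensor ID are illustrative):

```python
from datetime import date, timedelta

def day_key(sensor_id: str, day: date) -> str:
    # One record per sensor per day: sensor:{sensorId}:{YYYY-MM-DD}
    return f"sensor:{sensor_id}:{day.isoformat()}"

def last_n_days_keys(sensor_id: str, today: date, n: int) -> list[str]:
    # The batch-read key list is derived from the sensor ID and date
    # range alone -- no index or lookup table required.
    return [day_key(sensor_id, today - timedelta(days=i)) for i in range(n)]

keys = last_n_days_keys("s42", date(2025, 3, 20), 7)
print(keys[0])  # sensor:s42:2025-03-20
```

The application passes `keys` to a batch read to fetch the last week of readings in one round of parallel requests.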

### Hot-key sharding

When a single record receives more concurrent writes than the server can serialize — signaled by `KEY_BUSY` errors — distribute the workload across multiple sub-records.

```plaintext
counter:{entityId}:{shardIndex}
```

Each sub-record holds a part of the complete value (a partial counter, a slice of a list, a subset of a map). Writers pick a shard by hashing or round-robin; readers batch-read all shards and combine the results. This trades one hot record for a small, bounded set of warm records.
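The client-side math is small. A sketch for a sharded counter, where the shard count and the hash used to pick a shard are illustrative assumptions:

```python
import hashlib

NUM_SHARDS = 8  # illustrative; size to the observed write concurrency

def shard_key(entity_id: str, writer_id: str) -> str:
    # Writers hash their own identity to a stable shard index:
    # counter:{entityId}:{shardIndex}
    shard = int(hashlib.md5(writer_id.encode()).hexdigest(), 16) % NUM_SHARDS
    return f"counter:{entity_id}:{shard}"

def all_shard_keys(entity_id: str) -> list[str]:
    # Readers batch-read every shard and combine the partial values.
    return [f"counter:{entity_id}:{i}" for i in range(NUM_SHARDS)]

# Simulated batch-read result: each shard holds a partial counter.
partials = {k: n for n, k in enumerate(all_shard_keys("video123"))}
total = sum(partials.values())  # combine client-side
print(total)
```

Each write now contends only with writers that hashed to the same shard, and the read cost is one bounded batch of `NUM_SHARDS` keys.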

The full shard-on-demand mechanism, including activation via filter expressions and dynamic scaling, is described in [Relationships](https://aerospike.com/docs/develop/data-modeling/relationships).

### Tenant-prefixed

In multi-tenant deployments, prefixing the key with a tenant identifier makes ownership explicit and prevents collisions across tenants.

```plaintext
tenant:{tenantId}:order:{orderId}
```

This is a naming convention, not a performance optimization — the hash distributes tenant-prefixed keys uniformly just like any other key. The benefit is operational clarity: every key is self-describing, and tenant-scoped batch reads or scans can be constructed from the tenant ID without an external index.
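A sketch of the convention (the key layout and IDs are illustrative):

```python
def tenant_order_key(tenant_id: str, order_id: str) -> str:
    # tenant:{tenantId}:order:{orderId} -- self-describing and
    # collision-free across tenants.
    return f"tenant:{tenant_id}:order:{order_id}"

# A tenant-scoped batch is built from the tenant ID plus known order IDs.
keys = [tenant_order_key("acme", oid) for oid in ["o1", "o2", "o3"]]
print(keys[0])  # tenant:acme:order:o1
```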

### Composite keys for sort order

Inside a [Map](https://aerospike.com/docs/develop/data-types/collections/map/) bin, Aerospike stores entries in key order. When you encode the desired sort dimension in the map key, key order equals display order, and you can use index-based operations as rank operations without additional storage.

```plaintext
{score}-{playerId}
```

A leaderboard map keyed by zero-padded `score-playerId` strings gives you rank lookup by index position: the entry at index 0 is the lowest score, and the entry at index -1 is the highest. The player ID portion breaks ties. To break ties by time instead, extend the key to `score-timestamp-playerId`.

This technique eliminates the need for a separate value-ordered index on the map and makes range queries (for example, “top 50 scores”) a single `get_by_index_range` call.
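The padding is what makes this work: zero-padding the score to a fixed width makes lexicographic order match numeric order (for non-negative scores). A sketch, with the sort simulated client-side since a Map bin keeps entries in key order:

```python
def score_key(score: int, player_id: str) -> str:
    # Zero-pad so lexicographic order equals numeric order;
    # the player ID suffix breaks ties. Width is an assumption.
    return f"{score:010d}-{player_id}"

entries = {score_key(s, p): p for s, p in
           [(250, "carol"), (1200, "alice"), (980, "bob")]}
ordered = sorted(entries)   # stands in for the Map bin's key order
print(ordered[0])   # lowest score at index 0
print(ordered[-1])  # highest score at index -1
```

Without the padding, `"980-bob"` would sort after `"1200-alice"` lexicographically; with it, index positions are rank positions.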

### Object ID embeds record key

When objects live inside a [collection data type](https://aerospike.com/docs/develop/data-types/collections/) and must be reachable by object ID, embed the record key (or a derivable identifier) in the object ID.

```plaintext
{regionId}:{storeId}
```

If a Map bin holds all stores in a region and the record key is the region ID, a store ID of `regionId:storeId` lets the application derive the record key from any store ID by splitting on the delimiter. One `get` by key, then a `get_by_key` on the Map, reaches the store — no secondary index, no lookup table.

You can apply this pattern to any structure where you consolidate objects into a parent record: comments keyed by `postId:commentId`, line items keyed by `orderId:itemId`, and so on.
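The derivation step is a single split on the delimiter. A sketch (IDs and delimiter are illustrative; the region ID must not itself contain the delimiter):

```python
def store_object_id(region_id: str, store_id: str) -> str:
    # The object ID embeds the parent record key (the region ID).
    return f"{region_id}:{store_id}"

def record_key_from(object_id: str) -> str:
    # Split once on the delimiter to recover the record key --
    # no secondary index, no lookup table.
    return object_id.split(":", 1)[0]

oid = store_object_id("us-west", "store-7")
print(record_key_from(oid))  # us-west
```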

## Choosing an identifier format

Not every key needs the same format. The choice between human-readable composite identifiers and compact hashed identifiers depends on the entity’s access pattern and how often the identifier appears in the data model.

**Cleartext composite** identifiers (for example, `alice-1742468400000`) keep their components visible. They are easy to construct, easy to debug with command-line tools, and self-documenting. The trade-off is variable length.

**Hashed** identifiers (for example, a 16-character hex digest of the same components) are compact and fixed-size. When an identifier is repeated heavily — as a map key in thousands of records, or as an element in long lists — fixed-size identifiers make capacity planning simpler and reduce storage at scale.

The decision driver is **repetition pressure**. If an identifier is mostly a record key with low repetition elsewhere, cleartext composites give better operational visibility at negligible cost. If it appears as a nested value in many records, a fixed-size hash saves measurable space.

Choose the format per entity, not globally. A `userId` that serves primarily as a record key can be cleartext, while a `commentId` that appears in list bins across many post records might benefit from a compact hash.
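The two formats can be compared side by side. The sketch below uses BLAKE2b truncated to 8 bytes for the hashed variant; the hash choice and digest size are illustrative assumptions, and any fixed-size digest serves the same purpose.

```python
import hashlib

def cleartext_id(user: str, ts_millis: int) -> str:
    # Human-readable composite: easy to debug, variable length.
    return f"{user}-{ts_millis}"

def hashed_id(user: str, ts_millis: int) -> str:
    # Fixed 16-hex-character identifier from the same components.
    # (BLAKE2b with an 8-byte digest is an illustrative choice.)
    return hashlib.blake2b(f"{user}-{ts_millis}".encode(),
                           digest_size=8).hexdigest()

print(cleartext_id("alice", 1742468400000))      # alice-1742468400000
print(len(hashed_id("alice", 1742468400000)))    # always 16
```

The hashed form costs the same 16 characters for every entity, which is what makes capacity planning simple when the identifier is repeated across thousands of records.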

## When to denormalize

Denormalization — storing the same data in multiple records — is a standard technique in Aerospike. When a second copy of a value serves a different access pattern, duplicating it avoids a cross-record read on the hot path. The question is not whether to denormalize, but which values are safe to copy.

Categorize each candidate value before duplicating it:

**Immutable values** such as creation timestamps, original content, and user handles at the time of an action never change after they are written. Copies never go stale. These are always safe to duplicate.

**Slowly-changing values** such as display names, profile photo URLs, and cached aggregate counts change infrequently. Most applications tolerate bounded staleness — a display name that is a few minutes out of date does not break the user experience. These are usually safe to duplicate, with updates propagated on the next write or by a background job.

**Frequently-changing or consistency-critical values** such as account balances, inventory counts, and authorization roles are risky to duplicate. A stale copy can cause incorrect behavior. For these values, prefer a single authoritative record and read it when needed — batch reads make “just read the source” cheap.

A practical rule of thumb: if a copy that is five minutes stale would break something, do not duplicate it. Read the authoritative record instead.

## Growth behavior: the primary split signal

Deciding what belongs in a single record is not a one-time choice. Records that are well-sized at launch can outgrow their design as data accumulates.

The default is to keep related data together: immutable fields, slowly-changing fields, and small slowly-growing lists belong in the same record. The signal to split is when a portion of the record grows by accumulation — list appends, map additions, new relationship edges — at a rate or eventual size that would degrade performance or approach the 8 MiB limit.

Evaluate the size _trajectory_, not just the current snapshot. A record that holds 10 KiB today but gains 500 bytes per day will reach 128 KiB in eight months. If the access pattern requires the full record on every read, that growth may be acceptable. If the access pattern reads only the most recent entries, that growth is a signal to partition the data by time slice and read only the relevant bucket.
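The trajectory arithmetic from the example above is worth making explicit, since it is the calculation you run for each accumulating record:

```python
KIB = 1024

def days_until(limit_kib: float, current_kib: float,
               growth_bytes_per_day: float) -> float:
    # Days before an accumulating record crosses a size threshold,
    # assuming a constant growth rate.
    return (limit_kib - current_kib) * KIB / growth_bytes_per_day

days = days_until(128, 10, 500)
print(round(days))  # ~242 days, roughly eight months
```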

For the detailed sizing methodology and worked examples, see [Record sizing](https://aerospike.com/docs/develop/data-modeling/record-sizing).

## Avoiding read amplification

Consolidation reduces record count and index cost, but it can introduce read amplification when the common read needs only a small slice of a large record. Before splitting, consider three techniques that extract a subset without transferring the full record to the client:

**Bin projection.** Request only specific bins in the read call. If a record has a 100 KiB `comments` bin and a 200-byte `metadata` bin, and the hot path reads only `metadata`, bin projection avoids transferring `comments` entirely.

**Operation projection.** Use List or Map operations to extract specific elements from within a bin. A `map_get_by_key_list` call on a Map bin returns only the requested entries, not the full map.

A record holding a large `settings` Map bin with hundreds of entries does not need to be split if most reads need only a few keys. A single `map_get_by_key_list` call returns the requested entries without transferring the rest:

```java
Key key = new Key("app", "users", "user:alice");
List<Value> wanted = Arrays.asList(
    Value.get("locale"), Value.get("theme"), Value.get("timezone"));
Record record = client.operate(null, key,
    MapOperation.getByKeyList("settings", wanted, MapReturnType.KEY_VALUE));
```

```python
import aerospike
from aerospike_helpers.operations import map_operations

key = ("app", "users", "user:alice")
ops = [
    map_operations.map_get_by_key_list(
        "settings", ["locale", "theme", "timezone"],
        aerospike.MAP_RETURN_KEY_VALUE)
]
_, _, bins = client.operate(key, ops)
```

```go
key, _ := as.NewKey("app", "users", "user:alice")
wanted := []interface{}{"locale", "theme", "timezone"}
record, err := client.Operate(nil, key,
    as.MapGetByKeyListOp("settings", wanted, as.MapReturnType.KEY_VALUE),
)
```

```c
as_key key;
as_key_init_str(&key, "app", "users", "user:alice");

as_arraylist key_list;
as_arraylist_inita(&key_list, 3);
as_arraylist_append_str(&key_list, "locale");
as_arraylist_append_str(&key_list, "theme");
as_arraylist_append_str(&key_list, "timezone");

as_operations ops;
as_operations_inita(&ops, 1);
as_operations_add_map_get_by_key_list(&ops, "settings",
    (as_list*)&key_list, AS_MAP_RETURN_KEY_VALUE);

as_record* rec = NULL;
aerospike_key_operate(&as, &err, NULL, &key, &ops, &rec);
```

```csharp
Key key = new Key("app", "users", "user:alice");
IList wanted = new List<Value> {
    Value.Get("locale"), Value.Get("theme"), Value.Get("timezone")
};
Record record = client.Operate(null, key,
    MapOperation.GetByKeyList("settings", wanted, MapReturnType.KEY_VALUE));
```

```js
const key = new Aerospike.Key('app', 'users', 'user:alice')
const result = await client.operate(key, [
  maps.getByKeyList('settings', ['locale', 'theme', 'timezone'],
    maps.returnType.KEY_VALUE)
])
```

**Paginated reads.** Return bounded pages from a large List or Map using index-range or rank-range operations. A `list_get_by_index_range` call with an offset and count returns a page of elements without reading the entire list.
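The paging arithmetic can be modeled client-side. The function below stands in for the server-side index-range operation, which would return only the slice, not the whole list; names and page sizes are illustrative.

```python
def get_by_index_range(lst: list, index: int, count: int) -> list:
    # Client-side model of a List index-range operation: the server
    # returns only this bounded slice of the stored list.
    return lst[index:index + count]

events = [f"event-{i}" for i in range(1000)]

PAGE_SIZE = 20
page_number = 2  # zero-based; offset = page_number * PAGE_SIZE
page = get_by_index_range(events, page_number * PAGE_SIZE, PAGE_SIZE)
print(page[0], page[-1])  # event-40 event-59
```

The caller advances `page_number` until a short (or empty) page comes back, never transferring more than `PAGE_SIZE` elements per read.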

If none of these projections fit the access pattern — for example, the hot path needs a small, unrelated piece of data that shares a record with a large collection — that is a signal to split the data into separate records. The goal is not to avoid consolidation, but to ensure that the common read path transfers only what it needs.

## Split-namespace technique

Aerospike computes the same digest for the same `(set, userKey)` pair regardless of which namespace the record lives in. This means two records in different namespaces can share the same logical key while having different storage policies, retention periods, or replication factors.

A common application is separating hot and cold data. The current version of an object lives in a namespace backed by fast NVMe storage with a short retention window. Archived versions live in a namespace backed by high-capacity storage with longer retention. The application uses the same key to read from either namespace, choosing based on whether it needs the current or historical version.
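A sketch of the key construction, with namespace and set names as illustrative assumptions. Only the namespace in the key tuple changes; the `(set, userKey)` pair, and therefore the digest, stays the same.

```python
def versioned_key(namespace: str, object_id: str) -> tuple:
    # Same (set, userKey) in either namespace -> same digest; only
    # storage policy, retention, and replication differ.
    return (namespace, "objects", object_id)

def key_for(object_id: str, current: bool) -> tuple:
    # Route to fast storage for the current version, high-capacity
    # storage for the archive. Namespace names are hypothetical.
    return versioned_key("hot" if current else "cold", object_id)

print(key_for("doc-9", current=True))   # ('hot', 'objects', 'doc-9')
print(key_for("doc-9", current=False))  # ('cold', 'objects', 'doc-9')
```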

This technique is useful when entities have genuinely different storage requirements but share a natural key. It avoids the need for a lookup table to map between namespaces.