# Record sizing

Every Aerospike record carries a fixed overhead in the [primary index](https://aerospike.com/docs/database/learn/architecture/data-storage/primary-index): **64 bytes of metadata** per record, typically stored in memory. This metadata gives you consistent, single-hop reads — but it also means that record count has a direct cost. Record sizing is the practice of choosing how much data to pack into each record so that the index-to-data ratio, I/O characteristics, and defragmentation behavior all stay in a healthy range.

## The 64-byte cost

The primary index stores one entry per record in the namespace. Each entry holds the 20-byte digest, generation counter, void-time (expiration), last-update time, replication state, and a pointer to the record’s location on storage. The total is approximately 64 bytes per record, multiplied by the [replication factor](https://aerospike.com/docs/database/learn/architecture/clustering/data-distribution) for the namespace.

For a namespace with 100 million records at replication factor 2 (RF2), the primary index alone consumes roughly:

```plaintext
100,000,000 records × 64 bytes × 2 (RF2) = 12.8 GB (≈ 11.9 GiB)
```

That memory is well spent when each record holds a meaningful amount of data. But if those 100 million records each hold only 50 bytes of payload, the cluster spends more memory on index metadata than on actual data. Consolidating them into 1 million records of ~5 KiB each would reduce the primary index to ~128 MB while storing the same data.
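The comparison above can be sketched as a quick calculation (a minimal illustration of the arithmetic, not an Aerospike API):

```python
# Sketch: primary index memory for two candidate layouts at RF2,
# using the 64-byte-per-record metadata cost described above.
INDEX_BYTES_PER_RECORD = 64
REPLICATION_FACTOR = 2

def index_memory_bytes(record_count: int) -> int:
    """Total primary index memory across all replica copies."""
    return record_count * INDEX_BYTES_PER_RECORD * REPLICATION_FACTOR

tiny = index_memory_bytes(100_000_000)        # 100M records of 50 B each
consolidated = index_memory_bytes(1_000_000)  # 1M records of ~5 KiB each

print(f"tiny layout:         {tiny / 1e9:.1f} GB of index")  # 12.8 GB
print(f"consolidated layout: {consolidated / 1e6:.0f} MB of index")  # 128 MB
```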

## The sweet spot: 1–128 KiB

Aerospike records perform best when they fall in the **1–128 KiB** range. This is sometimes called the Goldilocks Principle:

**Too small** — When records are much smaller than 1 KiB, the ratio of useful data to index overhead is poor. In most deployments the primary index lives in memory while data lives on SSD, so many tiny records increase memory cost without using storage efficiently.

**Too large** — Records above 128 KiB begin to affect read and write latency because each I/O operation transfers more data. Very large records also increase the cost of defragmentation: Aerospike reclaims storage by copying live records out of partially-stale write blocks, and large records make each copy more expensive.

**In range** — Records between 1 and 128 KiB balance index-to-data ratio, I/O size, and defragmentation cost. [Batch reads](https://aerospike.com/docs/develop/learn/batch/) make fetching many records in this range efficient, so you do not sacrifice read performance by distributing data across multiple well-sized records instead of packing everything into one.

The maximum record size is **8 MiB**. Records can exceed the 128 KiB guideline when the access pattern justifies it — for example, a consolidated list that grows into the hundreds of KiB but is always read as a whole. The 1–128 KiB band is a design target, not a hard boundary.

## Consolidation: the primary technique

Consolidation means combining data that would be separate rows in a relational database into a single Aerospike record, using [collection data types](https://aerospike.com/docs/develop/data-types/collections/) (Lists and Maps) to structure the data within bins.

The two most common forms of consolidation are:

**Roll up by time slice.** Instead of one record per event, store one record per entity per time period. An IoT sensor that reports every minute produces 1,440 readings per day. One record per reading would create 1,440 tiny records, each costing 64 bytes of index metadata. Rolling up to one record per sensor per day — with a List bin holding `[minute, value]` tuples — produces a single record of roughly 10 KiB and one index entry.

The key encodes the sensor ID and date, and each reading is a compact `[minute, value]` tuple appended to the `readings` bin:


```java
Key key = new Key("iot", "sensors", "sensor:4910:2026-03-30");
List<Value> tuple = Arrays.asList(Value.get(723), Value.get(22.5));

Record record = client.operate(null, key,
    ListOperation.append("readings", Value.get(tuple)));
```

```python
from aerospike_helpers.operations import list_operations

key = ("iot", "sensors", "sensor:4910:2026-03-30")
ops = [
    list_operations.list_append("readings", [723, 22.5])
]
_, _, bins = client.operate(key, ops)
```

```go
key, _ := as.NewKey("iot", "sensors", "sensor:4910:2026-03-30")
tuple := []interface{}{723, 22.5}

record, err := client.Operate(nil, key,
    as.ListAppendOp("readings", tuple),
)
```

```c
as_key key;
as_key_init_str(&key, "iot", "sensors", "sensor:4910:2026-03-30");

as_arraylist tuple;
as_arraylist_inita(&tuple, 2);
as_arraylist_append_int64(&tuple, 723);
as_arraylist_append_double(&tuple, 22.5);

as_operations ops;
as_operations_inita(&ops, 1);
as_operations_list_append(&ops, "readings", NULL, NULL, (as_val*)&tuple);

as_error err;
as_record* rec = NULL;
aerospike_key_operate(&as, &err, NULL, &key, &ops, &rec);
```

```csharp
Key key = new Key("iot", "sensors", "sensor:4910:2026-03-30");
IList tuple = new List<Value> { Value.Get(723), Value.Get(22.5) };

Record record = client.Operate(null, key,
    ListOperation.Append("readings", Value.Get(tuple)));
```

```js
const lists = Aerospike.lists

const key = new Aerospike.Key('iot', 'sensors', 'sensor:4910:2026-03-30')
const ops = [
  lists.append('readings', [723, 22.5])
]

const result = await client.operate(key, ops)
```

**Aggregate by entity.** Instead of one record per sub-entity, store one record per parent entity with a Map or List bin holding the children. A user profile with 200 segment tags can store them in a single Map bin rather than in 200 separate records. The Map supports in-place operations (put, remove, get by key, get by value range) so the application can read or update individual segments without fetching the entire record.

You add a segment tag with `map_put` and read a subset of tags with `map_get_by_key_list` — both operate on the `segments` Map bin in place, without transferring the full record:


```java
Key key = new Key("app", "users", "user:alice");

client.operate(null, key,
    MapOperation.put(MapPolicy.Default, "segments",
        Value.get("premium"), Value.get(true)));

List<Value> wanted = Arrays.asList(
    Value.get("premium"), Value.get("early_adopter"));

Record record = client.operate(null, key,
    MapOperation.getByKeyList("segments", wanted, MapReturnType.KEY_VALUE));
```

```python
import aerospike
from aerospike_helpers.operations import map_operations

key = ("app", "users", "user:alice")

ops = [map_operations.map_put("segments", "premium", True)]
client.operate(key, ops)

ops = [
    map_operations.map_get_by_key_list(
        "segments", ["premium", "early_adopter"],
        aerospike.MAP_RETURN_KEY_VALUE)
]
_, _, bins = client.operate(key, ops)
```

```go
key, _ := as.NewKey("app", "users", "user:alice")

_, err := client.Operate(nil, key,
    as.MapPutOp(as.DefaultMapPolicy(), "segments", "premium", true),
)

wanted := []interface{}{"premium", "early_adopter"}
record, err := client.Operate(nil, key,
    as.MapGetByKeyListOp("segments", wanted, as.MapReturnType.KEY_VALUE),
)
```

```c
as_key key;
as_key_init_str(&key, "app", "users", "user:alice");

as_operations ops;
as_operations_inita(&ops, 1);

as_boolean val;
as_boolean_init(&val, true);
as_operations_add_map_put(&ops, "segments", NULL,
    (as_val*)as_string_new("premium", false), (as_val*)&val);

as_error err;
aerospike_key_operate(&as, &err, NULL, &key, &ops, NULL);

as_operations read_ops;
as_operations_inita(&read_ops, 1);

as_arraylist key_list;
as_arraylist_inita(&key_list, 2);
as_arraylist_append_str(&key_list, "premium");
as_arraylist_append_str(&key_list, "early_adopter");

as_operations_add_map_get_by_key_list(&read_ops, "segments",
    (as_list*)&key_list, AS_MAP_RETURN_KEY_VALUE);

as_record* rec = NULL;
aerospike_key_operate(&as, &err, NULL, &key, &read_ops, &rec);
```

```csharp
Key key = new Key("app", "users", "user:alice");

client.Operate(null, key,
    MapOperation.Put(MapPolicy.Default, "segments",
        Value.Get("premium"), Value.Get(true)));

IList wanted = new List<Value> {
    Value.Get("premium"), Value.Get("early_adopter")
};

Record record = client.Operate(null, key,
    MapOperation.GetByKeyList("segments", wanted, MapReturnType.KEY_VALUE));
```

```js
const maps = Aerospike.maps

const key = new Aerospike.Key('app', 'users', 'user:alice')

await client.operate(key, [
  maps.put('segments', 'premium', true)
])

const result = await client.operate(key, [
  maps.getByKeyList('segments', ['premium', 'early_adopter'],
    maps.returnType.KEY_VALUE)
])
```

## Sizing a record: what to measure

When deciding whether to consolidate or split, estimate the record size under realistic conditions. The inputs are:

-   **Child count distribution** — not the average, but the p50, p95, and p99. A comments-on-a-post model where the median post has 12 comments but the p99 has 500 comments needs a design that handles the tail.
-   **Child size distribution** — the typical and maximum size of each child element. A comment might average 400 bytes but can reach 2,800 bytes.
-   **Growth trajectory** — whether the record grows over time (list appends, map additions) or is written once and rarely updated.
-   **Access pattern** — whether the common read fetches the entire record or a subset. If most reads need all the children, consolidation works well. If most reads need a single child, bin projection or CDT operations can extract it without transferring the full record.

Multiply child count by child size at the p99 to estimate the worst-case record size. If that stays within the 1–128 KiB band, consolidation is the default choice. If it exceeds that band but stays well under 8 MiB and the access pattern reads the full record, consolidation is still viable — the record is larger than ideal but functional.
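The estimate above can be sketched as a small helper (a minimal illustration; the band thresholds come from this page, the sample inputs are hypothetical):

```python
# Sketch: worst-case record size from p99 child count × p99 child size,
# checked against the 1–128 KiB design band and the 8 MiB record limit.
KIB = 1024
MIB = 1024 * KIB

def worst_case_record_bytes(p99_child_count: int, p99_child_bytes: int) -> int:
    """Worst-case consolidated record size at the tail of both distributions."""
    return p99_child_count * p99_child_bytes

def classify(size_bytes: int) -> str:
    if size_bytes <= 128 * KIB:
        return "in band: consolidate"
    if size_bytes < 8 * MIB:
        return "above band: viable if reads fetch the full record"
    return "over limit: split across sub-records"

# Roll-up example: 1,440 readings/day at ~8 bytes per [minute, value] tuple.
size = worst_case_record_bytes(1_440, 8)
print(size, classify(size))  # 11520 in band: consolidate
```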

### Worked example: comments on a post

Consider a social application where posts receive comments. A sample of posts from a popular account shows:

| Metric | Value |
| --- | --- |
| Comments per post (sample) | 350, 480, 590 |
| Maximum comment length | ~2,800 characters |
| Typical comment length | Under 400 characters |
| Aggregate comment payload per post | ~175 KiB |

The aggregate payload per post is in the low hundreds of KiB — above the sweet spot but well below the 8 MiB limit. One record per comment would create hundreds of tiny records per post, each costing 64 bytes of primary index overhead, and would require assembling the comment thread from many separate reads.

Consolidating all comments for a post into a single record (or a small bounded set of records) gives one read for the full thread, atomic ordering operations, and far lower index cost. If a post later exceeds a size threshold, an overflow strategy (such as the shard-on-demand pattern described in [Relationships](https://aerospike.com/docs/develop/data-modeling/relationships)) distributes the data across sub-records.

The key insight: high child count alone does not mean you need separate records. Use child count, child size distribution, and access pattern together.
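Written out as arithmetic, using the sample figures above (the ~300-byte typical comment size is derived from the ~175 KiB aggregate divided by the sample counts):

```python
# Sketch: sizing check for the comments-on-a-post worked example.
KIB = 1024
MIB = 1024 * KIB

p99_comments = 590           # largest sample count above
max_comment_bytes = 2_800    # maximum comment length
typical_comment_bytes = 300  # ~175 KiB aggregate / ~590 comments

typical_payload = p99_comments * typical_comment_bytes  # ~173 KiB
worst_payload = p99_comments * max_comment_bytes        # ~1.6 MiB

assert typical_payload > 128 * KIB  # above the sweet spot...
assert worst_payload < 8 * MIB      # ...but well under the record limit

# Versus one record per comment: 590 primary index entries per post.
index_cost_per_post = p99_comments * 64  # 37,760 bytes of index metadata
```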

## When not to consolidate

Consolidation is the default, but some situations call for separate records:

-   **Different lifecycles.** Record-level time to live (TTL) applies to the entire record, so it cannot expire individual children within a consolidated structure. However, children stored in a Map bin can be cleaned up by date using [Map API](https://aerospike.com/docs/develop/data-types/collections/map/) operations (for example, `remove_by_value_range` on a timestamp field) or [path expressions](https://aerospike.com/docs/develop/expressions/path/) that filter on an expiry value. If the application can tolerate this application-managed cleanup instead of automatic TTL expiration, consolidation still works. Use separate records when you need the server to expire individual children automatically via TTL.
-   **Independent access frequency.** If the child is read far more often (or far less often) than the parent, co-locating them wastes I/O on the common path.
-   **Inverse access.** If the application needs to find all parents that contain a given child (“which posts has this user commented on?”), a child-held reference with a [secondary index](https://aerospike.com/docs/database/learn/architecture/data-storage/secondary-index) query is often more practical than scanning consolidated parent records.
-   **Write contention.** If many concurrent writers update the same consolidated record, the record becomes a hot key. Evaluate whether expected concurrent writes exceed roughly 50 per second sustained before committing to consolidation. See [Relationships](https://aerospike.com/docs/develop/data-modeling/relationships) for contention patterns.
-   **Irrelevant data on the hot path.** If the common read needs a small piece of data that shares a record with a large CDT, and neither bin projection nor CDT sub-operations can efficiently extract it, splitting the data into separate records may reduce read amplification.

## Many medium records over one giant record

When consolidation would produce a record larger than the sweet spot, prefer splitting the data across multiple medium-sized records rather than accepting one large record. [Batch reads](https://aerospike.com/docs/develop/learn/batch/) fetch multiple records in parallel across the cluster, so the latency of reading 10 records of 10 KiB each is comparable to reading one record of 100 KiB — but the smaller records give finer-grained access, better defragmentation behavior, and lower risk of hot-key contention.

This applies at every level of the model. A user with 50,000 followers does not need all follower IDs in a single record. Distributing them across time-bucketed or hash-sharded sub-records keeps each record in the target size range and allows bounded reads (for example, “most recent 100 followers”) without transferring the full list.
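One way to sketch the time-bucketed split described above (the `user:{id}:followers:{YYYY-MM}` key scheme is a hypothetical convention, not an Aerospike requirement):

```python
# Sketch: distribute a large follower list across monthly sub-records
# so each record stays in the 1–128 KiB band; a bounded read then
# batch-fetches only the most recent buckets.
from datetime import date

def follower_bucket_key(user_id: str, joined: date) -> str:
    """Sub-record key for the month in which the follow occurred."""
    return f"user:{user_id}:followers:{joined:%Y-%m}"

def recent_bucket_keys(user_id: str, months: list[str]) -> list[str]:
    """Keys to batch-read for a bounded 'most recent followers' query."""
    return [f"user:{user_id}:followers:{m}" for m in months]

key = follower_bucket_key("4910", date(2026, 3, 30))
print(key)  # user:4910:followers:2026-03
```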

## Index cost as a design input

Use the 64-byte-per-record cost as a concrete input to your sizing decisions. For any candidate record layout, estimate:

```plaintext
total_index_memory = record_count × 64 bytes × replication_factor
```

Compare two designs side by side. If a one-record-per-child layout creates 10 million records at 50 bytes each, that is 640 MB of index memory (per copy, before replication) for 500 MB of data. Consolidating to 100,000 records at 5 KiB each reduces index memory to 6.4 MB for the same data, a 100x reduction.

This calculation is especially important for data that accumulates over time (events, readings, relationship edges). The record count, and therefore the index cost, grows with every new item. A consolidated layout grows the record size, not the record count, until you reach the threshold for an overflow strategy.

For detailed capacity planning formulas, including storage overhead, secondary index memory, and provisioning guidance, see the [capacity planning guide](https://aerospike.com/docs/database/manage/planning/capacity/).