Data modeling

You drive Aerospike data modeling from access patterns, not from entity normalization or document embedding. This section explains the principles and patterns you need to design an effective data model on Aerospike.

A different mental model

If you come from relational databases, your instinct is to normalize: one table per entity, small rows, joins at query time. If you come from document databases, your instinct is to embed: nest related data in a single document for atomic writes and single-read access.

Aerospike works differently. There are no joins, no server-side $lookup, and no multi-document transactions that span arbitrary collections. Instead, Aerospike provides efficient batch reads that scatter/gather across nodes in parallel. Together, these architectural characteristics (no joins, the 8 MiB record-size limit, the memory cost of primary index metadata, and the efficiency of batch reads) give you a distinct set of modeling trade-offs.

The three approaches, compared:

Approach   | Optimizes for                      | Reassembly tool                           | Record shape
Relational | Storage efficiency (normalization) | Server-side joins                         | Small rows, many tables
Document   | Read convenience (embedding)       | Server-side $lookup or client round-trips | Large documents, few collections
Aerospike  | Latency and memory cost            | Batch reads (scatter/gather)              | Medium records (1–128 KiB), shaped by access patterns

Access patterns drive design

Every Aerospike modeling decision follows from one question: how does the application read and write this data?

Records are the unit of I/O. A read fetches one record; a write updates one record atomically. When the application needs data from multiple records, batch reads fetch them in parallel across the cluster. Designing your model so that common reads resolve to a single get or a bounded batch get is the primary goal.

This means:

  • Consolidate related data into a single record when it is read together and the combined size stays in the 1–128 KiB sweet spot.
  • Split data across multiple records when entities have different lifecycles, different access frequencies, or when consolidation would create records that are too large or too hot.
  • Denormalize when a copy of data in a second record serves a different access pattern and the application tolerates bounded staleness.

Records, not rows

An Aerospike record is not a relational row. A record contains bins (named, typed fields), and each bin can hold a scalar value, a List, a Map, or a deeply nested combination of both. One bin holding a Map of 1,000 entries is often better than 1,000 records with one value each.

This flexibility means you can model a “user profile with 200 segment tags” as a single record with a Map bin, or a “sensor with a day’s readings” as a single record with a List bin — patterns that would require separate tables or embedded arrays in other systems.
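To make the record shape concrete, here is a sketch using plain Java collections rather than the Aerospike client API, so it stays self-contained. The bin names ("userId", "segments") and tag values are illustrative, not part of any Aerospike API; the point is that one Map bin carries what would otherwise be hundreds of rows.

```java
import java.util.HashMap;
import java.util.Map;

// Conceptual shape of a single user-profile record: one scalar bin plus
// one Map bin holding many segment-tag entries. In a real application the
// outer map's entries become bins written through the Aerospike client.
public class ProfileRecord {

    public static Map<String, Object> buildProfile() {
        Map<String, Object> bins = new HashMap<>();
        bins.put("userId", "alice");

        // One Map bin with many tag -> score entries replaces hundreds of
        // tiny records (one per tag) that would each pay index overhead.
        Map<String, Integer> segments = new HashMap<>();
        segments.put("sports", 87);
        segments.put("travel", 42);
        bins.put("segments", segments);

        return bins;
    }

    public static void main(String[] args) {
        System.out.println(buildProfile());
    }
}
```

Individual tags remain accessible without rewriting the whole record, because Aerospike's collection data type operations can read or update single Map entries server-side.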

The trade-off is that Aerospike has no server-enforced schema. The data model is an application-level contract: the team agrees on namespace, set, key format, bin names, and bin types so that all clients read and write the same logical structure.
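Because the schema lives in the application, teams often centralize the contract in shared helpers so every client builds keys the same way. A minimal sketch follows; the "user:" and "order:" key formats are hypothetical conventions chosen here for illustration, not anything Aerospike mandates.

```java
// Sketch of an application-level key contract: one place that defines how
// logical identifiers map to user keys, so all clients agree on the format.
public class KeyContract {

    // Users live in set "users" with keys shaped like "user:<id>".
    static String userKey(String userId) {
        return "user:" + userId;
    }

    // Orders live in set "orders" with keys shaped like "order:<id>".
    static String orderKey(long orderId) {
        return "order:" + orderId;
    }

    public static void main(String[] args) {
        System.out.println(userKey("alice"));
        System.out.println(orderKey(1001));
    }
}
```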

The cost of records

Every record in the cluster carries a fixed overhead: 64 bytes of primary index metadata, typically stored in memory. This metadata enables consistent, single-hop access to any record by key — it is what makes Aerospike fast.

But it also means that many tiny records carry a disproportionate memory cost. A design with millions of 50-byte records spends more memory on index metadata than on data. Consolidating those into fewer, medium-sized records (each in the 1–128 KiB range) gives a much better index-to-data ratio while still allowing fine-grained access through collection data type operations.
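The arithmetic is worth doing back-of-envelope. The 64-byte figure comes from this page; the record counts and the 50-byte payload below are illustrative numbers, not measurements.

```java
// Back-of-envelope index cost: 64 bytes of primary index metadata per record.
public class IndexCost {

    static final long INDEX_BYTES_PER_RECORD = 64;

    // Memory spent on primary index metadata for a given record count.
    static long indexBytes(long recordCount) {
        return recordCount * INDEX_BYTES_PER_RECORD;
    }

    public static void main(String[] args) {
        long mib = 1024 * 1024;

        // 100 million 50-byte records: index metadata (~6.1 GiB) exceeds
        // the data itself (~4.8 GiB).
        long tinyRecords = 100_000_000L;
        System.out.println("tiny index MiB: " + indexBytes(tinyRecords) / mib);
        System.out.println("tiny data MiB:  " + (tinyRecords * 50) / mib);

        // Consolidated 100:1 into 1 million ~5 KiB records: the index cost
        // drops 100x while the data volume stays the same.
        long mediumRecords = 1_000_000L;
        System.out.println("medium index MiB: " + indexBytes(mediumRecords) / mib);
    }
}
```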

Batch reads replace joins

In a relational database, you normalize and then join. In Aerospike, you design keys so that the data you need is either in one record or reachable through a bounded batch read.

The following example fetches a user profile and three recent order records in a single batch call. The application derives all four keys from values it already has — no index required.

// Namespace "app"; user profile in set "users", orders in set "orders".
Key[] keys = new Key[] {
    new Key("app", "users", "user:alice"),
    new Key("app", "orders", "order:1001"),
    new Key("app", "orders", "order:1002"),
    new Key("app", "orders", "order:1003")
};
// One batch call; the client scatters sub-requests to the owning nodes.
Record[] records = client.get(null, keys);

Batch reads scatter the request across the nodes that own the relevant partitions and gather the results. For a batch of 10–100 keys, latency is typically close to a single-record read because the requests execute in parallel. This makes “many medium records, fetched together” a practical and performant pattern.
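The latency behavior falls out of the parallelism. The following sketch uses plain Java futures (not the Aerospike client, which does this internally) to show why a gather over parallel sub-requests tracks the slowest one rather than the sum; the fetch stub is hypothetical.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

// Illustrative scatter/gather: issue all sub-requests in parallel, then
// gather. Total latency approximates the slowest fetch, not the sum.
public class ScatterGather {

    // Stand-in for a per-node fetch; in Aerospike this is a network call.
    static CompletableFuture<String> fetch(String key) {
        return CompletableFuture.supplyAsync(() -> "record(" + key + ")");
    }

    public static List<String> batchGet(List<String> keys) {
        // Scatter: start every fetch before waiting on any of them.
        List<CompletableFuture<String>> futures = keys.stream()
                .map(ScatterGather::fetch)
                .collect(Collectors.toList());

        // Gather: join preserves request order regardless of completion order.
        return futures.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(batchGet(List.of("user:alice", "order:1001")));
    }
}
```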

What this section covers

The remaining pages in this section walk through the modeling principles and patterns in detail:

  • Record sizing — the 64-byte index cost, the 1–128 KiB sweet spot, and how to consolidate effectively.
  • Key design — how you design keys to control partition placement, access paths, and denormalization.
  • Collections — List and Map patterns, nested operations, and list-of-structs with path expressions.
  • Relationships — 1:1, 1:N, and N:M patterns, write contention, and overflow strategies.
  • Indexes — secondary indexes, set indexes, expression indexes, and when not to index.
  • Conventions — bin naming, identifier formats, and timestamp contracts.