# Data model

Learn how Aerospike organizes data and how the Developer SDK maps these concepts to intuitive APIs.

## Core concepts

Aerospike uses a hierarchical data model:

```plaintext
Cluster
└── Namespace (like a database)
    └── Set (like a table)
        └── Record (like a row)
            └── Bins (like columns)
```

| Concept | Analogous To | Description |
| --- | --- | --- |
| **Namespace** | Database | Top-level container, defines storage and replication |
| **Set** | Table | Logical grouping of records within a namespace |
| **Record** | Row | Individual data entry identified by a key |
| **Bin** | Column | Named field within a record |
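To make the hierarchy concrete, here is a minimal sketch that models it with plain Python dictionaries. It is only a mental model, not how the server or the SDK stores data. The names `cluster`, `app`, `users`, `orders`, and `lookup` are made up for illustration.

```python
# Mental model only: namespace -> set -> key -> bins, as nested dicts.
cluster = {
    "app": {                      # namespace (like a database)
        "users": {                # set (like a table)
            "user-1": {"name": "Alice", "age": 28},              # record -> bins
            "user-2": {"name": "Bob", "email": "bob@example.com"},
        },
        "orders": {
            "order-9": {"user": "user-1", "total": 42.5},
        },
    },
}


def lookup(namespace: str, set_name: str, key: str) -> dict:
    """Return the bins of one record by walking the hierarchy top-down."""
    return cluster[namespace][set_name][key]


alice = lookup("app", "users", "user-1")
bob = lookup("app", "users", "user-2")
```

Note that `alice` and `bob` live in the same set but carry different bins, which is the schema-free behavior described in the sections that follow.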

## Namespaces

A **namespace** is the top-level data container in Aerospike. Each namespace has its own:

-   Storage configuration (memory, SSD, or hybrid)
-   Replication factor
-   TTL (time-to-live) defaults
-   Data retention policies

::: namespace configuration
Namespaces are configured in `aerospike.conf`, not created at runtime. Work with your database administrator to set up namespaces before using them.
:::

Common namespace patterns:

| Namespace | Typical Use |
| --- | --- |
| `test` | Development and testing |
| `production` | Production application data |
| `cache` | Ephemeral, memory-only data |
| `archive` | Long-term storage on SSD |

## Sets

A **set** is a logical grouping of records within a namespace—similar to a table in relational databases. Unlike tables, sets:

-   Don’t require a schema definition
-   Can be created implicitly when you write your first record
-   Have no enforced structure—each record can have different bins

```java
// Records in different sets within the same namespace
DataSet users = DataSet.of("app", "users");
DataSet orders = DataSet.of("app", "orders");
DataSet sessions = DataSet.of("app", "sessions");
```

```python
# Records in different sets within the same namespace
users = DataSet.of("app", "users")
orders = DataSet.of("app", "orders")
sessions = DataSet.of("app", "sessions")
```

::: set naming
Use descriptive, plural names for sets: `users`, `orders`, `events`. Avoid special characters and keep names concise.
:::

## Records

A **record** is a single data entry, identified by a unique key within its set. Each record contains:

-   **Key**: The unique identifier you provide
-   **Digest**: A 20-byte hash of the key (what Aerospike actually stores)
-   **Bins**: The data fields
-   **Metadata**: Generation count, TTL, last update time

### Keys and digests

When you create a record with a key like `"user-123"`, Aerospike:

1.  Hashes your key into a 20-byte digest
2.  Optionally stores the original key alongside the digest. The `send_key` setting on the active Behavior controls this: the Java default stores the digest only, while the Python default also keeps the user key
3.  Uses the digest for all lookups

```java
// Your key: "user-123"
// Aerospike stores: digest (20-byte hash)
session.insert(users)
    .bins("name")
    .id("user-123").values("Alice")
    .execute();

// To also store the original key for later retrieval:
session.insert(users)
    .bins("name")
    .id("user-123").values("Alice")
    .sendKey()  // Now Aerospike stores both digest AND "user-123"
    .execute();
```

```python
from aerospike_sdk import Behavior
from aerospike_sdk.policy import Settings

# Whether to store the original user key alongside the digest is a Behavior
# setting (`send_key`), not a per-call builder method. Behavior.DEFAULT already
# enables send_key=True, so the original key is stored by default.

# Default session: original key is stored (digest + "user-123")
session_with_key = cluster.create_session(Behavior.DEFAULT)
await session_with_key.insert(key=users.id("user-123")).put({"name": "Alice"}).execute()

# Opt out: derive a behavior with send_key disabled (digest only)
digest_only = Behavior.DEFAULT.derive_with_changes(
    "DIGEST_ONLY",
    all=Settings(send_key=False),
)
session_digest_only = cluster.create_session(digest_only)
await session_digest_only.insert(key=users.id("user-123")).put(
    {"name": "Alice"}
).execute()
```
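The hashing step can be pictured with a short standard-library sketch. Aerospike derives the digest with RIPEMD-160 from the set name and the user key (the real input also encodes type information). Not every Python build exposes RIPEMD-160, so this sketch swaps in SHA-1, which also yields 20 bytes. `sketch_digest` is a hypothetical helper, and its output will not match real Aerospike digests.

```python
import hashlib


def sketch_digest(set_name: str, user_key: str) -> bytes:
    """Illustrative 20-byte digest of (set name, user key).

    SHA-1 is a stand-in: it is always available in hashlib and yields
    20 bytes, but it is NOT the hash Aerospike uses.
    """
    hasher = hashlib.sha1()
    hasher.update(set_name.encode("utf-8"))
    hasher.update(user_key.encode("utf-8"))
    return hasher.digest()


d1 = sketch_digest("users", "user-123")
d2 = sketch_digest("users", "user-123")   # same inputs, same digest
d3 = sketch_digest("orders", "user-123")  # same key, different set
```

The point of the sketch is the shape of the operation: the digest is fixed-size and deterministic, and the same user key in a different set maps to a different digest. That is also why the original key cannot be recovered from the digest unless it is stored alongside it.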

**Key types supported**:

| Type | Example | Notes |
| --- | --- | --- |
| String | `"user-123"` | Most common, up to 1KB |
| Integer | `12345` | 64-bit signed integer |
| Bytes | `byte[]` / `bytes` | Raw binary data |

### Generation count

Every record has a **generation** number that increments on each update. Use it for optimistic concurrency control:

```java
import com.aerospike.client.sdk.Record;
import com.aerospike.client.sdk.RecordStream;

// Read current generation
Record record;
try (RecordStream readStream = session.query(users.id("user-1")).execute()) {
    record = readStream.getFirstRecord();
}
int currentGen = record.getGeneration();
long newBalance = record.getLong("balance") + 100;  // recompute from current state

// Update only if generation matches (optimistic locking)
session.update(users.id("user-1"))
    .bin("balance").setTo(newBalance)
    .ensureGenerationIs(currentGen)  // Fails if record was modified
    .execute();
```

```python
# Read current generation
stream = await session.query(users.id("user-1")).execute()
row = await stream.first_or_raise()
record = row.record_or_raise()
stream.close()

current_gen = record.generation
new_balance = record.bins["balance"] + 100  # recompute from current state

# Update only if generation matches (optimistic locking)
await (
    session.update(users.id("user-1"))
    .bin("balance").set_to(new_balance)
    .ensure_generation_is(current_gen)
    .execute()
)
```
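When the generation check fails, the usual response is to re-read the record and try again. The following is a minimal in-memory sketch of that read-modify-write loop. `FakeStore`, `GenerationMismatch`, and `add_to_balance` are hypothetical stand-ins written for this example; they are not SDK classes, and the SDK's actual error type for a generation conflict is not shown here.

```python
class GenerationMismatch(Exception):
    """Raised when the stored generation differs from the expected one."""


class FakeStore:
    """In-memory stand-in for a single record with a generation counter."""

    def __init__(self, bins: dict):
        self.bins = dict(bins)
        self.generation = 1  # assumption for this sketch: first write yields 1

    def read(self) -> tuple[dict, int]:
        return dict(self.bins), self.generation

    def update_if_generation(self, new_bins: dict, expected_gen: int) -> None:
        if expected_gen != self.generation:
            raise GenerationMismatch(
                f"expected {expected_gen}, found {self.generation}"
            )
        self.bins.update(new_bins)
        self.generation += 1


def add_to_balance(store: FakeStore, amount: int, max_attempts: int = 3) -> int:
    """Read-modify-write with retry; returns the new balance."""
    for _ in range(max_attempts):
        bins, gen = store.read()
        new_balance = bins["balance"] + amount
        try:
            store.update_if_generation({"balance": new_balance}, gen)
            return new_balance
        except GenerationMismatch:
            continue  # another writer got there first; re-read and retry
    raise RuntimeError("gave up after repeated generation conflicts")


store = FakeStore({"balance": 100})
_, stale_gen = store.read()
result = add_to_balance(store, 50)  # succeeds and bumps the generation

try:
    store.update_if_generation({"balance": 0}, stale_gen)  # stale generation
    stale_write_rejected = False
except GenerationMismatch:
    stale_write_rejected = True
```

Recomputing `new_balance` inside the loop matters: retrying with a value derived from the stale read would silently overwrite the other writer's change.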

## Bins

**Bins** are the named fields within a record—similar to columns, but with key differences:

-   **Schema-free**: Different records in the same set can have different bins
-   **Typed per-value**: The same bin name can hold different types in different records
-   **Short names**: Bin names are limited to 15 bytes, so keep them short

### Supported data types

| Type | Java | Python | Notes |
| --- | --- | --- | --- |
| String | `String` | `str` | UTF-8, up to 128KB |
| Integer | `long` | `int` | 64-bit signed |
| Double | `double` | `float` | 64-bit IEEE 754 |
| Boolean | `boolean` | `bool` | Stored as integer (0/1) |
| Bytes | `byte[]` | `bytes` | Raw binary, up to 128KB |
| List | `List<?>` | `list` | Ordered, mixed types |
| Map | `Map<?, ?>` | `dict` | Key-value pairs |
| GeoJSON | `GeoJSON` | `GeoJSON` | Geographic data |
| Null | `null` | `None` | Removes the bin |
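The Python column of the table can be read as a type-dispatch rule. The sketch below classifies a Python value into the type names used above. `bin_type_name` and `fits_int64` are hypothetical helpers for illustration, not SDK functions, and GeoJSON is left out because it depends on the SDK's own class.

```python
def bin_type_name(value) -> str:
    """Return the data-model type name from the table for a Python value."""
    if value is None:
        return "Null"
    # bool must be checked before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "Boolean"
    if isinstance(value, int):
        return "Integer"
    if isinstance(value, float):
        return "Double"
    if isinstance(value, str):
        return "String"
    if isinstance(value, (bytes, bytearray)):
        return "Bytes"
    if isinstance(value, list):
        return "List"
    if isinstance(value, dict):
        return "Map"
    raise TypeError(f"no mapping in this sketch for {type(value).__name__}")


def fits_int64(value: int) -> bool:
    """Python ints are unbounded; the Integer type above is 64-bit signed."""
    return -(2**63) <= value <= 2**63 - 1
```

The range check is worth keeping in mind on the Python side, since a Python `int` can exceed what a 64-bit signed integer holds.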

### Working with bins

```java
import java.util.List;
import java.util.Map;
import com.aerospike.client.sdk.Record;
import com.aerospike.client.sdk.RecordStream;

// Different data types
session.insert(users)
    .bins("name", "age", "balance", "verified", "tags", "preferences")
    .id("user-1").values(
        "Alice",
        28,
        150.50,
        true,
        List.of("premium", "active"),
        Map.of(
            "theme", "dark",
            "notifications", true
        ))
    .execute();

// Reading typed values
Record record;
try (RecordStream stream = session.query(users.id("user-1")).execute()) {
    record = stream.getFirstRecord();
}

String name = record.getString("name");
long age = record.getLong("age");
double balance = record.getDouble("balance");
boolean verified = record.getBoolean("verified");
List<String> tags = record.getList("tags");
Map<String, Object> prefs = record.getMap("preferences");
```

```python
# Different data types
await session.insert(key=users.id("user-1")).put(
    {
        "name": "Alice",
        "age": 28,
        "balance": 150.50,
        "verified": True,
        "tags": ["premium", "active"],
        "preferences": {
            "theme": "dark",
            "notifications": True,
        },
    }
).execute()

# Reading typed values
stream = await session.query(users.id("user-1")).execute()
row = await stream.first_or_raise()
record = row.record_or_raise()
stream.close()

bins = record.bins
name = bins.get("name")
age = bins.get("age")
balance = bins.get("balance")
verified = bins.get("verified")
tags = bins.get("tags")
prefs = bins.get("preferences")
```

### Nested data

Lists and maps can contain other lists and maps, enabling complex document structures:

```java
session.insert(users)
    .bins("profile")
    .id("user-1").values(Map.of(
        "name", "Alice Smith",
        "addresses", List.of(
            Map.of("type", "home", "city", "San Francisco"),
            Map.of("type", "work", "city", "Palo Alto")
        ),
        "scores", Map.of(
            "math", List.of(95, 87, 92),
            "science", List.of(88, 91, 89)
        )
    ))
    .execute();
```

```python
await session.insert(key=users.id("user-1")).put(
    {
        "profile": {
            "name": "Alice Smith",
            "addresses": [
                {"type": "home", "city": "San Francisco"},
                {"type": "work", "city": "Palo Alto"},
            ],
            "scores": {
                "math": [95, 87, 92],
                "science": [88, 91, 89],
            },
        }
    }
).execute()
```
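Once a nested bin like `profile` has been read back, it is an ordinary Python structure of dicts and lists. The sketch below walks such a value on the client side. `get_path` is a hypothetical helper for this example and says nothing about how the server or the SDK addresses nested elements.

```python
def get_path(document, *path):
    """Follow map keys (str) and list indexes (int) into a nested value."""
    current = document
    for step in path:
        current = current[step]  # works for both dict keys and list indexes
    return current


# Same shape as the `profile` bin written above.
profile = {
    "name": "Alice Smith",
    "addresses": [
        {"type": "home", "city": "San Francisco"},
        {"type": "work", "city": "Palo Alto"},
    ],
    "scores": {
        "math": [95, 87, 92],
        "science": [88, 91, 89],
    },
}

work_city = get_path(profile, "addresses", 1, "city")
best_math = max(get_path(profile, "scores", "math"))
```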

## DataSet: the SDK abstraction

The **DataSet** class is the Developer SDK’s way of representing a namespace + set combination. It provides a clean API for identifying where records live:

```java
// Create a DataSet reference
DataSet users = DataSet.of("app", "users");

// Use it to identify records
RecordId userId = users.id("user-123");

// All operations use the same pattern
session.insert(users).bins("name").id("user-123").values("Alice").execute();
session.query(userId).execute().close();
session.delete(userId).execute().close();
session.query(users).where("$.age > 21").execute().close();
```

```python
# Create a DataSet reference
users = DataSet.of("app", "users")

# Use it to identify records
user_id = users.id("user-123")

# All operations use the same pattern
await session.insert(key=user_id).put({"name": "Alice"}).execute()
(await session.query(user_id).execute()).close()
(await session.delete(key=user_id).execute()).close()
(await session.query(users).where("$.age > 21").execute()).close()
```

### DataSet vs RecordId

| Class | Represents | Used For |
| --- | --- | --- |
| `DataSet` | Namespace + Set | Queries, scans, set-level operations |
| `RecordId` | Namespace + Set + Key | Single-record operations (get, insert, update, delete) |

```java
DataSet users = DataSet.of("app", "users");  // Set reference
RecordId alice = users.id("alice");           // Record reference

// Query the whole set
session.query(users).where("$.active == true").execute();

// Operate on a specific record
session.query(alice).execute();
```

```python
users = DataSet.of("app", "users")  # Set reference
alice = users.id("alice")           # Record reference

# Query the whole set
(await session.query(users).where("$.active == true").execute()).close()

# Operate on a specific record
(await session.query(alice).execute()).close()
```
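The relationship between the two classes can be sketched with two small immutable value types. `SketchDataSet` and `SketchRecordId` are hypothetical stand-ins written for this example; they only mirror the "namespace + set" and "namespace + set + key" split from the table and are not the SDK's implementation.

```python
from dataclasses import dataclass
from typing import Union

UserKey = Union[str, int, bytes]  # the key types listed earlier


@dataclass(frozen=True)
class SketchDataSet:
    """Namespace + set: identifies where records live."""

    namespace: str
    set_name: str

    def id(self, key: UserKey) -> "SketchRecordId":
        # A record reference is the set reference plus one user key.
        return SketchRecordId(self.namespace, self.set_name, key)


@dataclass(frozen=True)
class SketchRecordId:
    """Namespace + set + key: identifies exactly one record."""

    namespace: str
    set_name: str
    key: UserKey


users_ref = SketchDataSet("app", "users")
alice_ref = users_ref.id("alice")
```

Because both types are frozen value objects, two references built from the same parts compare equal and can be reused freely across operations.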

## Data modeling best practices

### 1\. Design for access patterns

Unlike a relational database, where you normalize first and join at query time, Aerospike works best when you model data around how you will read it:

```plaintext
❌ Relational approach:
   users table + orders table + JOIN

✅ Aerospike approach:
   users set (with embedded recent_orders list)
   orders set (with denormalized user_name)
```
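As a rough illustration of the denormalized layout, the sketch below builds both records with plain dictionaries: the order carries a copy of the user's name, and the user carries a short list of recent orders. `record_order` and `MAX_RECENT_ORDERS` are hypothetical names chosen for this example, and the cap of three is arbitrary.

```python
MAX_RECENT_ORDERS = 3  # hypothetical cap for this sketch


def record_order(user_bins: dict, order_id: str, total: float) -> dict:
    """Build the order's bins and update the user's embedded recent_orders."""
    order_bins = {
        "order_id": order_id,
        "total": total,
        "user_name": user_bins["name"],  # denormalized copy, no join on read
    }
    summary = {"order_id": order_id, "total": total}
    recent = [summary] + user_bins.get("recent_orders", [])
    user_bins["recent_orders"] = recent[:MAX_RECENT_ORDERS]  # newest first
    return order_bins


user = {"name": "Alice"}
orders = [record_order(user, f"order-{n}", 10.0 * n) for n in range(1, 5)]
recent_ids = [entry["order_id"] for entry in user["recent_orders"]]
```

The trade-off is the usual one for denormalization: reads need a single record, while a change to the user's name would have to be propagated to the copies stored on orders.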

### 2\. Use sets for entity types

```java
// Good: Different sets for different entities
DataSet users = DataSet.of("app", "users");
DataSet orders = DataSet.of("app", "orders");
DataSet products = DataSet.of("app", "products");
```

```python
# Good: Different sets for different entities
users = DataSet.of("app", "users")
orders = DataSet.of("app", "orders")
products = DataSet.of("app", "products")
```

### 3\. Keep bin names short

Bin names are stored with every record. Use concise names:

```plaintext
❌ "email_address"     → 13 bytes per record
✅ "email"             → 5 bytes per record
```
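The per-record cost adds up across a large set. The sketch below is back-of-the-envelope arithmetic that assumes, as stated above, that each record carries its own copy of every bin name; it ignores any other per-bin overhead. `bin_name_bytes` is a hypothetical helper, and the record count is an arbitrary example.

```python
def bin_name_bytes(bin_names: list[str], record_count: int) -> int:
    """Total bytes spent on bin names if each record carries its own copy."""
    per_record = sum(len(name.encode("utf-8")) for name in bin_names)
    return per_record * record_count


records = 100_000_000  # arbitrary example size
long_names = bin_name_bytes(["email_address"], records)
short_names = bin_name_bytes(["email"], records)
saved = long_names - short_names  # 8 bytes per record times the record count
```

For this example the shorter name saves 800,000,000 bytes (about 0.8 GB) for a single bin, before replication is taken into account.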

### 4\. Consider key design

Choose keys that distribute data evenly and support your access patterns:

| Pattern | Example Key | Use Case |
| --- | --- | --- |
| Natural ID | `"user-12345"` | When you have a business identifier |
| UUID | `"550e8400-e29b..."` | When you need uniqueness without coordination |
| Composite | `"user-123:order-456"` | When combining entities |
| Time-based | `"events:2026-01-20"` | For time-series data |
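The patterns in the table are plain string conventions, so they can be captured in a few small helpers. The function names below are hypothetical and the formats simply mirror the example keys above.

```python
import uuid
from datetime import date


def natural_key(user_number: int) -> str:
    """Natural ID: reuse an existing business identifier."""
    return f"user-{user_number}"


def random_key() -> str:
    """UUID: unique without coordinating with other writers."""
    return str(uuid.uuid4())


def composite_key(user_id: str, order_id: str) -> str:
    """Composite: join two entity identifiers with a separator."""
    return f"{user_id}:{order_id}"


def daily_bucket_key(prefix: str, day: date) -> str:
    """Time-based: one key per calendar day, in ISO 8601 form."""
    return f"{prefix}:{day.isoformat()}"


example_composite = composite_key("user-123", "order-456")
example_bucket = daily_bucket_key("events", date(2026, 1, 20))
```

Building keys through helpers like these keeps the format in one place, which matters because a record can only be looked up with exactly the same key it was written with.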

## Next steps

-   [Behaviors](https://aerospike.com/docs/develop/client/sdk/concepts/behaviors): Configure how operations execute, including timeouts, retries, and consistency.
-   [Create Records](https://aerospike.com/docs/develop/client/sdk/usage/create): Start writing data with insert and upsert.
-   [Query with AEL](https://aerospike.com/docs/develop/client/sdk/concepts/ael): Search and filter your data.
-   [Error Handling](https://aerospike.com/docs/develop/client/sdk/concepts/errors): Handle errors gracefully.