Primary index

This page describes the purpose of the primary index.

This page explains how the Aerospike Database automatically creates and structures primary indexes using metadata, distributed partitions, and “sprigs” to ensure consistent, low-latency data access.

What is the primary index?

A primary index is a specific type of indexing where the index is defined on the primary key of a table. Because the primary key is unique and not null, the primary index allows the database engine to locate a specific record with maximum efficiency.

Aerospike Database creates a primary index automatically for each namespace with a metadata entry for every record in the namespace. This allows Aerospike to provide consistent low-latency access to any record in the database, regardless of the number of records or the total size of the data.

Index metadata

Each primary index metadata entry consumes 64 bytes of storage. In Aerospike Enterprise Edition (EE), primary index storage is configurable: in memory (DRAM), in persistent memory (PMem), or on an NVMe SSD. By default, the primary index lives in memory.
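
As a rough illustration of what the 64-byte entry size means for capacity planning, the following Python sketch estimates the cluster-wide primary index footprint. It assumes that every replica copy of a record has its own index entry, and it ignores any additional overhead such as sprig bookkeeping. The function name and the example figures are hypothetical.

```python
INDEX_ENTRY_BYTES = 64  # size of one primary index entry, as stated above


def primary_index_bytes(record_count: int, replication_factor: int = 1) -> int:
    """Estimate the cluster-wide primary index size in bytes.

    Assumption: each replica copy of a record carries its own 64-byte
    index entry. Other overheads are not modeled here.
    """
    if record_count < 0:
        raise ValueError("record_count must be non-negative")
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return record_count * replication_factor * INDEX_ENTRY_BYTES


# Illustrative figures: 1 billion records with two copies of each record.
total = primary_index_bytes(1_000_000_000, replication_factor=2)
print(f"{total} bytes (about {total / 2**30:.1f} GiB)")
```

Dividing the result by the number of nodes gives an approximate per-node figure, provided partitions are evenly distributed.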

In addition to a 20-byte record digest, record metadata includes the following:

The generation count increments each time the record is written; used for resolving conflicting updates.
The void-time tracks when a record expires; used by the expiration and eviction subsystem.
The Last Update Time (LUT) tracks when the record was last written; used for conflict resolution during cold restart and migration (depending on your configuration settings), to filter records using expressions, for incremental backup scans, for truncate commands, for cross-datacenter replication (XDR), and more.
Replication state information; used for strong-consistency namespaces.
The exact location of the record in the data storage device; this supports retrieving the record with a single read IO.
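
To make the fields above concrete, here is a simplified Python model of an index entry. It is a teaching sketch only: the field names, the types, the `device_id`/`offset` split for the storage location, and the convention that a void-time of 0 means "never expires" are assumptions for illustration. It does not reproduce the packed 64-byte layout the server uses, and it omits replication state.

```python
from dataclasses import dataclass


@dataclass
class IndexEntry:
    """Hypothetical, unpacked view of one primary index entry."""

    digest: bytes          # 20-byte record digest
    generation: int        # incremented on every write
    void_time: int         # expiry timestamp in seconds; 0 = never (assumed)
    last_update_time: int  # LUT, timestamp of the last write
    device_id: int         # hypothetical: which storage device holds the record
    offset: int            # hypothetical: byte offset of the record on that device

    def __post_init__(self) -> None:
        if len(self.digest) != 20:
            raise ValueError("digest must be exactly 20 bytes")


def is_expired(entry: IndexEntry, now: int) -> bool:
    """Return True if the record's void-time has passed."""
    return entry.void_time != 0 and entry.void_time <= now


def apply_write(entry: IndexEntry, now: int, ttl_seconds: int) -> None:
    """Update metadata the way a write would: bump generation, set LUT and void-time."""
    entry.generation += 1
    entry.last_update_time = now
    entry.void_time = now + ttl_seconds if ttl_seconds > 0 else 0


entry = IndexEntry(bytes(20), generation=1, void_time=0,
                   last_update_time=1_000, device_id=0, offset=4096)
apply_write(entry, now=2_000, ttl_seconds=60)
print(entry.generation, entry.void_time, is_expired(entry, now=2_100))
```

Because the entry records where the data lives, a lookup that finds the entry can go straight to that location, which is the single-read behavior described in the last list item.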

Index structure

Aerospike’s primary index is a blend of a distributed hash table with a distributed tree structure in each server. The entire keyspace in the namespace is separated using a robust hash function into partitions. A total of 4096 partitions are evenly distributed across cluster nodes. The replication-factor namespace configuration determines how many copies are kept in replica partitions, which are never on the same node as the master partition. See data-distribution for details on hashing and partitioning.
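
The mapping from key to partition to owning nodes can be sketched in Python as follows. Several parts are stand-ins and should be read as assumptions: SHA-1 is used only because it produces a 20-byte digest and ships with `hashlib`, taking the low 12 bits of the first two digest bytes is one plausible way to select among 4096 partitions, and the round-robin `owners` function is a hypothetical placement rule, not the server's actual partition assignment algorithm.

```python
import hashlib

N_PARTITIONS = 4096  # fixed partition count, as stated above


def record_digest(set_name: str, user_key: str) -> bytes:
    """Return a 20-byte digest for a record key (SHA-1 as a stand-in hash)."""
    return hashlib.sha1(set_name.encode() + b"\x00" + user_key.encode()).digest()


def partition_id(digest: bytes) -> int:
    """Map a digest to a partition: 12 bits are enough to address 4096 partitions."""
    return int.from_bytes(digest[:2], "little") & (N_PARTITIONS - 1)


def owners(pid: int, nodes: list[str], replication_factor: int) -> list[str]:
    """Hypothetical placement: master first, then replicas, all on distinct nodes."""
    if replication_factor > len(nodes):
        raise ValueError("replication factor cannot exceed the node count")
    return [nodes[(pid + i) % len(nodes)] for i in range(replication_factor)]


digest = record_digest("users", "alice")
pid = partition_id(digest)
print(pid, owners(pid, ["node-a", "node-b", "node-c"], replication_factor=2))
```

Whatever the real placement rule is, the property the sketch preserves is the one described above: a replica never lands on the same node as its master.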

When a cluster node fails, the indexes on other nodes, where the replica partitions live, are immediately available. If the failed node remains down, data starts rebalancing through data migration, and indexes are built on the new partitions in each node.

Sprigs

Aerospike’s primary index uses red–black trees. Instead of using a single index tree, Aerospike breaks the partition index into multiple smaller red–black trees per partition, called sprigs.

By replacing one massive tree with many smaller ones, this design addresses two specific performance bottlenecks:

Latency from traversals: On nodes exceeding 1 million records per partition and experiencing throughput of over 1 million transactions per second, traversal on a 20-or-more-level index tree could be the single greatest cause of transaction latency.
Access locks: Because each sprig (or group of sprigs) can be locked independently, the database can perform parallel index operations, ensuring that a lock on one section of the data doesn’t block transactions in another.
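
The idea of splitting one partition index into independently locked sprigs can be sketched in Python. This is a simplified model under stated assumptions: a `dict` stands in for each red–black tree, one lock is used per sprig (the text above notes that a lock may also cover a group of sprigs), and choosing the sprig from digest bytes 2 and 3 is a hypothetical selection rule.

```python
import hashlib
import threading


class SprigIndex:
    """Toy partition index split into independently locked sprigs."""

    def __init__(self, n_sprigs: int = 256) -> None:
        if n_sprigs < 1 or n_sprigs > 65536 or n_sprigs & (n_sprigs - 1):
            raise ValueError("n_sprigs must be a power of two up to 65536")
        self._mask = n_sprigs - 1
        # A dict stands in for each sprig's red-black tree.
        self._sprigs = [dict() for _ in range(n_sprigs)]
        self._locks = [threading.Lock() for _ in range(n_sprigs)]

    def _sprig_of(self, digest: bytes) -> int:
        # Hypothetical rule: pick the sprig from two digest bytes.
        return int.from_bytes(digest[2:4], "little") & self._mask

    def put(self, digest: bytes, entry: object) -> None:
        i = self._sprig_of(digest)
        with self._locks[i]:  # only this sprig is locked
            self._sprigs[i][digest] = entry

    def get(self, digest: bytes):
        i = self._sprig_of(digest)
        with self._locks[i]:
            return self._sprigs[i].get(digest)

    def sizes(self) -> list[int]:
        return [len(s) for s in self._sprigs]


index = SprigIndex(n_sprigs=16)
for n in range(1000):
    d = hashlib.sha1(str(n).encode()).digest()  # stand-in 20-byte digest
    index.put(d, {"generation": 1})
print(sum(index.sizes()), max(index.sizes()))
```

Two writers whose digests map to different sprigs never contend for the same lock, which is the parallelism benefit described in the second list item.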

You can configure the number of sprigs per partition with the partition-tree-sprigs parameter. Configuring the right number of sprigs is a trade-off between extra space overhead and optimized parallel access.
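
To see how the sprig count affects traversal depth, the following Python sketch computes the minimum depth of a balanced binary tree holding the records of one sprig. It assumes records are spread evenly across sprigs. A red–black tree can be deeper than this lower bound, and the sprig counts shown are illustrative values, not recommendations. The memory cost of additional sprigs is not modeled.

```python
import math


def min_tree_depth(records_per_partition: int, sprigs: int) -> int:
    """Lower bound on tree depth per sprig, assuming an even spread of records."""
    if records_per_partition < 0 or sprigs < 1:
        raise ValueError("invalid record or sprig count")
    per_sprig = records_per_partition / sprigs
    # A balanced binary tree with n nodes has at least ceil(log2(n + 1)) levels.
    return math.ceil(math.log2(per_sprig + 1))


for sprigs in (1, 256, 4096):
    depth = min_tree_depth(1_000_000, sprigs)
    print(f"{sprigs:>5} sprigs -> at least {depth} levels")
```

With 1 million records in a partition, a single tree needs at least 20 levels, matching the figure quoted above, while 256 sprigs bring the bound down to 12 and 4096 sprigs to 8.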

Index persistence

The primary index is derived from the data itself and can be rebuilt from that data. Whether a restart requires a rebuild depends on the configuration setting for fast restart (also known as warm start). Fast restart enables upgrades with minimal downtime in Aerospike EE variants.

Fast restart allocates index memory from a shared memory segment (shmem). For planned shutdowns and restarts, such as an upgrade, the server re-attaches to the shared memory segment on restart and activates the primary indexes without scanning the data in storage.
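
The general pattern of detaching from a shared memory segment and re-attaching to it later can be illustrated with Python's `multiprocessing.shared_memory` module. This is only an analogy for the concept: it runs in a single process, the segment name and sizes are hypothetical, and it says nothing about how the Aerospike server itself manages its segments.

```python
import os
from multiprocessing import shared_memory

ENTRY_BYTES = 64
SEGMENT_NAME = f"demo_index_{os.getpid()}"  # hypothetical segment name


def create_segment(name: str, n_entries: int) -> shared_memory.SharedMemory:
    """Create a named segment large enough for n_entries index entries."""
    return shared_memory.SharedMemory(name=name, create=True,
                                      size=n_entries * ENTRY_BYTES)


# "Before shutdown": write index data into the segment, then detach.
segment = create_segment(SEGMENT_NAME, n_entries=4)
segment.buf[0:5] = b"entry"
segment.close()  # detach only; the segment itself stays alive

# "After restart": re-attach by name and read the data back without rebuilding it.
reattached = shared_memory.SharedMemory(name=SEGMENT_NAME)
recovered = bytes(reattached.buf[0:5])
reattached.close()
reattached.unlink()  # clean up the demo segment

print(recovered)
```

The key point is that closing a handle does not destroy the segment, so the contents survive until something explicitly removes them or the host reboots.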

Index storage

Where the server stores a primary index is determined by the index-type configuration parameter. The following options are available:

| Type | Description |
| --- | --- |
| `shmem` | Linux shared memory. |
| `flash` | A block storage device, typically NVMe SSD. |
| `pmem` | Persistent memory, such as Intel Optane DC Persistent Memory. |

For more information about primary index storage methods, see Configure the primary index.