
Configure Aerospike Database for AVS

Overview

This page describes how to configure the Aerospike Database (ASDB) for use with Aerospike Vector Search (AVS).

note

AVS requires ASDB version 7.x. For more information, see the FAQ.

Configure the namespace for index metadata

AVS relies on a small namespace to store metadata about your indexes, such as each index's name, its distance calculation, and its dimensionality. By default this namespace is named proximus-meta, and it must have NSUP (the namespace supervisor) enabled with a nonzero nsup-period. We recommend the following configuration in your aerospike.conf files.

namespace proximus-meta {
    replication-factor 2
    # ASDB 7.x sizes in-memory namespaces with data-size; memory-size was removed in 7.0
    storage-engine memory {
        data-size 5G
    }
    nsup-period 10
}
note

See Configuring AVS for details about changing the metadata namespace name.
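Because AVS needs NSUP on the metadata namespace, it can help to check a configuration file before starting AVS. The Python sketch below is a minimal, hypothetical helper (nsup_period is not part of any Aerospike tool). It assumes one statement per line and # comments, as in the examples on this page, and returns the namespace's top-level nsup-period. A value of 0 disables NSUP, so only a positive value satisfies the requirement. Pass a different name if you renamed the metadata namespace.

```python
import re
from typing import Optional


def nsup_period(conf_text: str, namespace: str = "proximus-meta") -> Optional[int]:
    """Hypothetical helper: return the nsup-period set directly inside
    `namespace` in aerospike.conf text, or None if it is absent.
    Minimal parser: assumes one statement per line and '#' comments."""
    header = re.compile(r"namespace\s+" + re.escape(namespace) + r"\s*\{")
    depth = 0  # brace depth inside the target namespace (0 = outside it)
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if depth == 0:
            if header.match(line):
                depth = 1  # entered the target namespace stanza
            continue
        # only accept nsup-period at the namespace's own level, not in sub-stanzas
        if depth == 1 and line.split()[:1] == ["nsup-period"]:
            return int(line.split()[1])
        depth += line.count("{") - line.count("}")
        if depth == 0:
            return None  # namespace closed without an nsup-period
    return None


conf = """
namespace proximus-meta {
    replication-factor 2
    storage-engine memory {
        data-size 5G
    }
    nsup-period 10
}
"""
period = nsup_period(conf)
nsup_enabled = period is not None and period > 0
```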

Specify a namespace for vector data

You need at least one namespace for storing vector records. For example, the default aerospike.conf file includes the test namespace. To use the test namespace for storing vector data, specify it when inserting vectors.

Create a namespace for index storage (optional)

You can create a separate namespace for index records. A dedicated index namespace lets you give index records a different SLA than data records. For example, you might replicate record data and store it on SSD for reliability, while keeping index data in memory for better performance.

To use a unique namespace for an index, you must add the namespace to your aerospike.conf files and restart the Aerospike process on each Aerospike Database node.

namespace vector-data {
    replication-factor 2
    # Aerospike on SSD
    storage-engine device {
        device /dev/nvme2
        flush-size 128K
    }
    nsup-period 10
}

namespace vector-index {
    replication-factor 1
    # Aerospike in memory
    storage-engine memory {
        data-size 16G
    }
    nsup-period 10
}

To add a set, include the set when creating your index. This creates the set dynamically. No extra configuration to the Aerospike Database is required to use a set.

Monitor index storage using sets

With a unique set for your index data, you can better monitor the storage used specifically by the index. Sets are created dynamically within the Aerospike Database and require no intervention by an administrator.

tip

By default, the Python client creates a set based on the index name.
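To see how much storage an index set consumes, you can query per-set statistics, for example with asinfo -v sets. That info command returns one entry per set, separated by semicolons, and each entry is a colon-separated list of key=value fields. The Python sketch below parses such a response. parse_sets_info is a hypothetical helper, the sample response is made up for illustration, and the field names objects and data_used_bytes are assumptions to verify against the info reference for your server version.

```python
from typing import Dict


def parse_sets_info(response: str) -> Dict[str, Dict[str, str]]:
    """Hypothetical helper: parse a 'sets' info response into
    {"namespace/set": {field: value}}. Values are kept as strings."""
    stats: Dict[str, Dict[str, str]] = {}
    for entry in response.strip().split(";"):  # one entry per set
        entry = entry.strip()
        if not entry:
            continue
        # each entry is a colon-separated list of key=value pairs
        fields = dict(pair.split("=", 1) for pair in entry.split(":") if "=" in pair)
        stats[f"{fields.get('ns')}/{fields.get('set')}"] = fields
    return stats


# Made-up sample response for illustration only; not real server output.
sample = (
    "ns=vector-index:set=my-index:objects=1060:data_used_bytes=3786320;"
    "ns=vector-data:set=my-vectors:objects=1000:data_used_bytes=5120000;"
)
stats = parse_sets_info(sample)
index_bytes = int(stats["vector-index/my-index"]["data_used_bytes"])
```

Keying the result by namespace and set keeps the index set's usage separate from the data set's, which is the point of giving the index its own set.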

Estimate total storage requirements

To estimate total storage requirements, you must know the following:

  • vectors - number of vectors
  • dim - number of dimensions of your vectors
  • metadata - size in bytes of additional data stored with your vector record
  1. Calculate the number of HNSW index entries, which can be estimated using the number of vectors:

    graph-nodes = vectors * 1.06
  2. To calculate the size of the index in bytes, you need the number of dimensions of your vectors:

    size-of-vector = dim * 4                              // each dimension is stored as a 4-byte float
    size-of-index-record = 500 + size-of-vector           // approx. 500 bytes of HNSW neighborhood info per record
    total-index-size = graph-nodes * size-of-index-record
  3. Consider the additional metadata and vector storage on each of your vector records:

    size-of-vector = dim * 4                              // each dimension is stored as a 4-byte float
    size-of-vector-record = size-of-vector + metadata     // typical metadata size is 1-2 KB, for example 2048
    total-data-size = vectors * size-of-vector-record
  4. Add index and data sizes to determine your total storage needs:

    total-unique-data = total-index-size + total-data-size
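The steps above can be combined into one function. The Python sketch below uses the constants given in those steps (1.06 index entries per vector, 4 bytes per dimension, about 500 bytes of HNSW neighborhood info per index record) and assumes one data record per vector that holds the vector plus metadata bytes. estimate_storage is a hypothetical helper, not part of an AVS client.

```python
BYTES_PER_DIMENSION = 4      # each dimension is stored as a 4-byte float
HNSW_OVERHEAD_BYTES = 500    # approx. HNSW neighborhood info per index record


def estimate_storage(vectors: int, dim: int, metadata: int) -> dict:
    """Hypothetical helper: estimate unique (pre-replication) storage in bytes."""
    # graph-nodes = vectors * 1.06, rounded up with integer math to avoid float error
    graph_nodes = (vectors * 106 + 99) // 100
    size_of_vector = dim * BYTES_PER_DIMENSION
    size_of_index_record = HNSW_OVERHEAD_BYTES + size_of_vector
    total_index_size = graph_nodes * size_of_index_record
    # assumption: one data record per vector, holding the vector plus metadata
    size_of_vector_record = size_of_vector + metadata
    total_data_size = vectors * size_of_vector_record
    return {
        "graph_nodes": graph_nodes,
        "total_index_size": total_index_size,
        "total_data_size": total_data_size,
        "total_unique_data": total_index_size + total_data_size,
    }


example = estimate_storage(vectors=1_000_000, dim=768, metadata=2048)
```

For 1,000,000 vectors with 768 dimensions and 2,048 bytes of metadata per record, this gives about 3.79 GB of index and 5.12 GB of data, roughly 8.9 GB of unique data. The figure does not include replication; multiply each part by the replication-factor of the namespace that stores it.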