Configure Aerospike Database for AVS
Overview
This page describes how to configure the Aerospike Database (ASDB) for use with Aerospike Vector Search (AVS).
AVS requires ASDB version 7.x. For more information, see the FAQ.
Configure the namespace for index metadata
AVS relies on a small namespace for storing metadata about your indexes. The metadata includes the name of your index, details about the distance calculation, and dimensionality. By default, this namespace is named avs-meta, and it must have NSUP enabled (a nonzero nsup-period). We recommend the following configuration in your aerospike.conf files.
namespace avs-meta {
    replication-factor 2
    memory-size 5G
    storage-engine memory
    nsup-period 10
}
See Configuring AVS for details about changing the metadata namespace name.
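Because the metadata namespace must have NSUP set, it can be useful to check a configuration file before restarting nodes. The following is a minimal sketch, not an Aerospike tool: the function name nsup_periods and the sample text are hypothetical, and the parser assumes the simple brace-delimited layout shown in the examples on this page.

```python
def nsup_periods(conf_text):
    """Map each namespace name to its nsup-period token (None if not set)."""
    periods = {}
    current = None  # namespace block currently being read
    depth = 0       # brace nesting depth
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        tokens = line.split()
        if depth == 0 and tokens[0] == "namespace" and len(tokens) > 1 and line.endswith("{"):
            current = tokens[1]
            periods[current] = None
        elif depth == 1 and current is not None and tokens[0] == "nsup-period" and len(tokens) > 1:
            periods[current] = tokens[1]
        depth += line.count("{") - line.count("}")
        if depth == 0:
            current = None
    return periods


# Hypothetical sample; in practice, read the text of your aerospike.conf file.
SAMPLE_CONF = """
namespace avs-meta {
    replication-factor 2
    memory-size 5G
    storage-engine memory
    nsup-period 10
}
"""

for name, period in nsup_periods(SAMPLE_CONF).items():
    status = "NSUP set" if period not in (None, "0") else "NSUP not set"
    print(f"{name}: nsup-period={period} ({status})")
```

The sketch only reports nsup-period values that appear directly inside a namespace block. It does not validate any other setting.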
Specify a namespace for vector data
You need at least one namespace for storing vector records. For example, the default aerospike.conf file includes the test namespace. To use the test namespace for storing vector data, specify it when inserting vectors.
Create a namespace for index storage
To create a unique namespace for index records, add the namespace to your aerospike.conf files and restart the Aerospike process on each Aerospike Database node.
With a unique namespace for index records, you can enable different storage features independently across your data and index. The following example configures replication on record data but not on index data.
namespace avs-index {
    replication-factor 1
    nsup-period 60
    storage-engine device {
        file /opt/aerospike/data/index.dat
        filesize 8G
    }
}

namespace avs-data {
    replication-factor 2
    nsup-period 60
    storage-engine device {
        file /opt/aerospike/data/data.dat
        filesize 8G
    }
}
To add a set, include the set when creating your index. This creates the set dynamically. No extra configuration to the Aerospike Database is required to use a set.
Monitor index storage using sets
With a unique set for your index data, you can better monitor the storage used specifically by the index. Sets are created dynamically within the Aerospike Database and require no intervention by an administrator.
By default, the Python client creates a set based on the index name.
Estimate total storage requirements
To estimate total storage requirements, you must know the following:
- vectors - number of vectors
- dim - number of dimensions of your vectors
- metadata - size in bytes of additional data stored with your vector record
The following calculations assume the default index configuration settings.
Calculate the number of HNSW index entries, which can be estimated using the number of vectors:
graph-nodes = vectors * 1.06
To calculate the size of an index in bytes, you need the number of dimensions of your vectors:
size-of-vector = dim * 4 // each dimension is stored as a 4 byte float
size-of-index-record = 500 + size-of-vector // approx. number of bytes for storing HNSW neighborhood info
total-index-size = graph-nodes * size-of-index-record
Consider additional metadata and vector storage on each of your vector records:
size-of-vector = dim * 4 // each dimension is stored as a 4 byte float
size-of-vector-record = size-of-vector + metadata // typical metadata size is 1-2k, for example 2048
total-data-size = vectors * size-of-vector-record
Add index and data sizes to determine your total storage needs:
total-unique-data = total-index-size + total-data-size
The following table provides some example values. You can use the Vector Sizing spreadsheet to calculate the total storage requirements for your own indexes.
Description | Vectors | Dimensions | Metadata (bytes) | Index Size (TB) | Total Data (TB) |
---|---|---|---|---|---|
1 billion, low-dimension | 1,000,000,000 | 128 | 2048 | 1.0 | 3.3 |
1 billion, medium-dimension | 1,000,000,000 | 768 | 2048 | 3.5 | 8.1 |
1 billion, high-dimension | 1,000,000,000 | 3072 | 2048 | 12.4 | 25.4 |
Total | 3,000,000,000 | | | 16.9 | 36.8 |
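The sizing steps above can be sketched as a small Python function. This is an illustrative sketch under stated assumptions, not the Vector Sizing spreadsheet: the function name estimate_storage is hypothetical. It assumes that each vector record stores the vector plus its metadata, and that there is one data record per vector. It also assumes the default index settings behind the 1.06 and 500-byte constants.

```python
def estimate_storage(vectors, dim, metadata=2048):
    """Estimate index, data, and total storage in bytes (default index settings)."""
    graph_nodes = vectors * 1.06                    # estimated HNSW index entries
    size_of_vector = dim * 4                        # each dimension is a 4-byte float
    size_of_index_record = 500 + size_of_vector     # ~500 bytes of neighborhood info
    total_index_size = graph_nodes * size_of_index_record

    # Assumption: a data record holds the vector plus its metadata,
    # and there is one data record per vector.
    size_of_vector_record = size_of_vector + metadata
    total_data_size = vectors * size_of_vector_record

    return {
        "index": total_index_size,
        "data": total_data_size,
        "total": total_index_size + total_data_size,
    }


TIB = 2 ** 40  # bytes per binary terabyte

for dim in (128, 768, 3072):
    est = estimate_storage(1_000_000_000, dim)
    print(f"dim={dim}: index={est['index'] / TIB:.2f} TiB, total={est['total'] / TIB:.2f} TiB")
```

For 1 billion 128-dimension vectors with 2048 bytes of metadata, this gives a total of about 3.3 when divided by 2^40. That is in line with the first row of the table, which suggests the table values are binary terabytes. The spreadsheet may round differently, so treat small differences as expected.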