
Configure Aerospike Database for AVS

This page describes how to configure the Aerospike Database for use with Aerospike Vector Search (AVS).

Configure the namespace for index metadata

AVS relies on a small namespace for storing metadata about your indexes. The metadata includes the name of each index, details about its distance calculation, and its dimensionality. This namespace must be named avs-meta in Aerospike and must have NSUP enabled by setting nsup-period. We recommend the following configuration in your aerospike.conf files.

namespace avs-meta {
    replication-factor 1
    nsup-period 100
    storage-engine memory {
        data-size 1G
    }
    # To use file storage backing, comment out the line above and use the
    # following lines instead.
    # storage-engine device {
    #     file /opt/aerospike/data/bar.dat
    #     filesize 16G
    #     data-in-memory true # Store data in memory in addition to file.
    # }
}
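
Before restarting a node, it can help to confirm that the avs-meta namespace really does set nsup-period. The following Python sketch is one way to do that. It is not an Aerospike tool: the function names are hypothetical, the parser only understands the simple block layout shown above, and it assumes that "NSUP set" means a nsup-period value other than 0.

```python
import re


def namespace_settings(conf_text: str, namespace: str) -> dict[str, str]:
    """Return the top-level "key value" settings of one namespace block.

    Hypothetical helper: handles only the simple layout shown above
    (one setting per line, "{" ending a line, "}" on its own line).
    """
    settings: dict[str, str] = {}
    depth = 0
    inside = False
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if not inside:
            # Look for the opening line, e.g. "namespace avs-meta {"
            if re.fullmatch(rf"namespace\s+{re.escape(namespace)}\s*\{{", line):
                inside = True
                depth = 1
            continue
        if line.endswith("{"):
            depth += 1  # nested block such as storage-engine
        elif line == "}":
            depth -= 1
            if depth == 0:
                break  # end of the namespace block
        elif depth == 1:
            parts = line.split(None, 1)
            settings[parts[0]] = parts[1] if len(parts) > 1 else ""
    return settings


def nsup_enabled(conf_text: str, namespace: str = "avs-meta") -> bool:
    """Assumption: NSUP counts as set when nsup-period is present and not 0."""
    return namespace_settings(conf_text, namespace).get("nsup-period", "0") != "0"


if __name__ == "__main__":
    with open("/etc/aerospike/aerospike.conf") as f:  # path is an assumption
        print("avs-meta has NSUP set:", nsup_enabled(f.read()))
```

The check reads only settings directly inside the namespace block, so values nested in storage-engine are ignored.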

Specify a namespace for vector data

You need at least one namespace for storing vector records. For example, the default aerospike.conf file includes the test namespace. To use the test namespace for storing vector data, specify it when inserting vectors.

Create a namespace for index storage

To create a unique namespace for index records, add the namespace to your aerospike.conf files and restart the Aerospike process on each Aerospike Database node.

With a unique namespace for index records, you can enable different storage features independently across your data and index. The following example configures replication on record data but not on index data.

namespace avs-index {
    replication-factor 1
    nsup-period 60
    storage-engine device {
        file /opt/aerospike/data/index.dat
        filesize 8G
    }
}
namespace avs-data {
    replication-factor 2
    nsup-period 60
    storage-engine device {
        file /opt/aerospike/data/data.dat
        filesize 8G
    }
}

To store index records in a set, specify the set when you create your index. Aerospike creates the set dynamically, so no extra Aerospike Database configuration is required.

Monitor index storage using sets

With a unique set for your index data, you can better monitor the storage used specifically by the index. Sets are created dynamically within the Aerospike Database and require no intervention by an administrator.

Estimate total storage requirements

To estimate total storage requirements, you must know the following:

  • vectors - number of vectors
  • dim - number of dimensions of your vectors
  • metadata - size in bytes of additional data stored with your vector record
  1. Calculate the number of HNSW index entries, which can be estimated using the number of vectors:

    graph-nodes = vectors * 1.06
  2. To calculate the size of an index in bytes, you need the number of dimensions of your vectors:

    size-of-vector = dim * 4 // each dimension is stored as a 4-byte float
    size-of-index-record = 500 + size-of-vector // approx. 500 bytes of HNSW neighborhood info plus the vector
    total-index-size = graph-nodes * size-of-index-record
  3. Consider additional metadata and vector storage on each of your vector records:

    size-of-vector = dim * 4 // each dimension is stored as a 4-byte float
    size-of-vector-record = size-of-vector + metadata // typical metadata size is 1-2k, for example 2048
    total-data-size = vectors * size-of-vector-record
  4. Add index and data sizes to determine your total storage needs:

    total-unique-data = total-index-size + total-data-size
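
The four steps above can be sketched as a short Python function. This is only an illustration, not the Vector Sizing spreadsheet: the function and class names are hypothetical, and it assumes that each data record holds the vector plus its metadata and that there is one data record per vector.

```python
from dataclasses import dataclass

BYTES_PER_DIMENSION = 4    # each dimension is stored as a 4-byte float
HNSW_OVERHEAD_BYTES = 500  # approx. bytes of HNSW neighborhood info per index record
GRAPH_NODE_FACTOR = 1.06   # estimated index entries per vector


@dataclass
class StorageEstimate:
    index_bytes: float
    data_bytes: float

    @property
    def total_bytes(self) -> float:
        # Step 4: total unique data is index plus data.
        return self.index_bytes + self.data_bytes


def estimate_storage(vectors: int, dim: int, metadata: int) -> StorageEstimate:
    """Estimate unique storage in bytes, before any replication."""
    # Step 1: number of HNSW index entries.
    graph_nodes = vectors * GRAPH_NODE_FACTOR
    # Step 2: index size.
    size_of_vector = dim * BYTES_PER_DIMENSION
    size_of_index_record = HNSW_OVERHEAD_BYTES + size_of_vector
    index_bytes = graph_nodes * size_of_index_record
    # Step 3: data size (assumption: vector plus metadata, one record per vector).
    size_of_vector_record = size_of_vector + metadata
    data_bytes = vectors * size_of_vector_record
    return StorageEstimate(index_bytes, data_bytes)


if __name__ == "__main__":
    est = estimate_storage(vectors=1_000_000_000, dim=128, metadata=2048)
    tib = 2**40
    print(f"index: {est.index_bytes / tib:.2f} TiB")
    print(f"total: {est.total_bytes / tib:.2f} TiB")
```

For 1 billion 128-dimension vectors with 2048 bytes of metadata, this sketch reports about 0.98 TiB of index and about 3.30 TiB in total, counting 2^40 bytes per TiB.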

The following table provides some example values. You can use the Vector Sizing spreadsheet to calculate the total storage requirements for your own indexes.

Description                  Vectors        Dimensions  Metadata (bytes)  Index Size (TB)  Total Data (TB)
1 billion, low-dimension     1,000,000,000  128         2048              1.0              3.3
1 billion, medium-dimension  1,000,000,000  768         2048              3.5              8.1
1 billion, high-dimension    1,000,000,000  3072        2048              12.4             25.4
Total                        3,000,000,000                                16.9             36.8