
Manage AVS indexes

Overview

This page describes how to create an index and associate it with a set of records. Aerospike Vector Search (AVS) uses an index to traverse the Hierarchical Navigable Small Worlds (HNSW) neighborhoods to perform queries. Indexes are independent from your records, and you can create an index at any time.

Required configuration

The following configuration parameters affect the storage and graph of your index. They cannot be updated after the index is created and, except where noted, do not have defaults.

| Parameter | Description |
| --- | --- |
| Namespace | Namespace for creating the index, which must already exist in your Aerospike cluster. See configuring Aerospike Database for details about setting up a namespace for your index. |
| Sets | Specifies where your data is stored in ASDB. Uses the null set by default. |
| Index name | Name of the index. Used primarily for performing searches. |
| Dimensions | Number of dimensions (length) of the vector. See generating vector embeddings for details about determining the number of dimensions in your vector. |
| Vector distance metric | Distance calculation used by your index. The options are listed after this table. |
| Vector field | Field name of the vector in the record. See Adding records to your index for details. |

Vector distance metric options:

- SQUARED EUCLIDEAN - Reports the squared Euclidean (L2) distance, avoiding the square root. It ranks results in the same order as the Euclidean distance; if you need exact distances, take the square root of the result.
- COSINE - Measures the cosine of the angle between two vectors to determine how similar their direction is, regardless of magnitude.
- DOT PRODUCT - Takes into account angle similarity and vector magnitude.
- MANHATTAN - Sums the absolute values of the differences between vector components.
- HAMMING - Measures the number of dimensions where vectors differ.
Caution: The value of vector field is limited to 15 characters. Using a name longer than 15 characters will cause errors.
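To make the distance metric options above concrete, the following is a minimal pure-Python sketch of each calculation. It is illustrative only and is not how AVS computes distances internally. The function names are hypothetical, and the conventions chosen here (cosine distance as 1 minus cosine similarity, dot product returned as the raw value) are assumptions that may differ from what AVS reports.

```python
import math

def squared_euclidean(a, b):
    # Sum of squared component differences; no square root is taken.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine_distance(a, b):
    # Assumed convention: 1 - cosine similarity, so 0 means same direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def dot_product(a, b):
    # Raw dot product; it grows with both alignment and magnitude.
    return sum(x * y for x, y in zip(a, b))

def manhattan(a, b):
    # Sum of absolute component differences.
    return sum(abs(x - y) for x, y in zip(a, b))

def hamming(a, b):
    # Number of positions where the components differ.
    return sum(1 for x, y in zip(a, b) if x != y)

query = [1.0, 0.0, 2.0]
near = [1.0, 1.0, 2.0]
far = [4.0, 0.0, 6.0]

# Squared Euclidean keeps the same ranking as Euclidean distance.
ranked = sorted([far, near], key=lambda v: squared_euclidean(query, v))
exact_far = math.sqrt(squared_euclidean(query, far))  # exact L2 distance
```

In this sketch `near` ranks ahead of `far` under both squared and exact Euclidean distance, which is why the square root can be skipped when only the ordering matters.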

HNSW parameters

The following parameters control the Hierarchical Navigable Small World (HNSW) algorithm, which is used for approximate nearest neighbor search. They have default values, but they must be configured at creation time and cannot be updated afterward.

| Parameter | Description | Default |
| --- | --- | --- |
| M (max edges) | Number of bidirectional links created per level during construction. Larger values lead to higher recall but slower construction. | 16 |
| EF | Size of the dynamic list for the nearest neighbors (candidates) during the search phase. Larger values lead to higher recall but slower search. | 100 |
| EF construction | Size of the dynamic list for the nearest neighbors (candidates) during index construction. Larger values lead to higher recall but slower construction. | 100 |
| Max records | Maximum number of records to fit in a batch. | 10000 |
| Interval | Maximum amount of time in milliseconds to wait before finalizing a batch. | 10000 |
| Disabled | Disables batching for index updates. | False |
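The last three rows of the table describe how index updates are batched. As a rough illustration, the sketch below assumes that a batch is finalized as soon as either the record limit or the time limit is reached, and that disabling batching means pending updates are applied without waiting. The page does not spell out these rules, so treat them as assumptions. `BatchingParams` and `should_finalize` are hypothetical names, not part of any AVS client.

```python
from dataclasses import dataclass

@dataclass
class BatchingParams:
    max_records: int = 10000   # default "Max records" from the table
    interval_ms: int = 10000   # default "Interval" from the table
    disabled: bool = False     # default "Disabled" from the table

def should_finalize(pending_records, elapsed_ms, params):
    """Return True if the pending batch would be finalized (assumed rule)."""
    if pending_records == 0:
        return False  # nothing to write
    if params.disabled:
        return True   # assumption: no batching, apply updates immediately
    # Assumption: whichever limit is reached first closes the batch.
    return pending_records >= params.max_records or elapsed_ms >= params.interval_ms

defaults = BatchingParams()
full_batch = should_finalize(10000, 50, defaults)    # record limit reached
timed_out = should_finalize(3, 10000, defaults)      # time limit reached
still_open = should_finalize(3, 50, defaults)        # neither limit reached
```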

Example index creation

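from aerospike_vector_search import types

# Assumes avs_admin_client is an already-connected AVS admin client.
avs_admin_client.index_create(
    namespace="avs-index",
    name="search-space",
    vector_field="img_vector",  # must be 15 characters or fewer
    dimensions=768,
    vector_distance_metric=types.VectorDistanceMetric.COSINE,
    sets="index-set",
    # HNSW parameters cannot be changed after the index is created.
    index_params=types.HnswParams(
        m=32,
        ef_construction=200,
        ef=400,
    ),
    labels={"model-used": "CLIP"},
)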

See more details about the Python client.
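Because the required parameters cannot be changed after creation, it can help to check them before calling the client. The following is a minimal sketch of such a pre-flight check, based only on the constraints stated on this page (the 15-character vector field limit, a positive number of dimensions, and the listed distance metrics). `validate_index_config` is a hypothetical helper, not part of the Python client, and the metric names are spelled as they appear on this page rather than as client enum members.

```python
# Metric names as listed on this page (not client enum identifiers).
ALLOWED_METRICS = {
    "SQUARED EUCLIDEAN",
    "COSINE",
    "DOT PRODUCT",
    "MANHATTAN",
    "HAMMING",
}
MAX_VECTOR_FIELD_LENGTH = 15  # limit stated in the caution on this page

def validate_index_config(namespace, name, vector_field, dimensions, metric):
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    if not namespace:
        problems.append("namespace is required")
    if not name:
        problems.append("index name is required")
    if not vector_field:
        problems.append("vector field is required")
    elif len(vector_field) > MAX_VECTOR_FIELD_LENGTH:
        problems.append(
            f"vector field '{vector_field}' is longer than "
            f"{MAX_VECTOR_FIELD_LENGTH} characters"
        )
    if not isinstance(dimensions, int) or dimensions <= 0:
        problems.append("dimensions must be a positive integer")
    if metric not in ALLOWED_METRICS:
        problems.append(f"unknown distance metric '{metric}'")
    return problems

ok = validate_index_config("avs-index", "search-space", "img_vector", 768, "COSINE")
bad = validate_index_config(
    "avs-index", "search-space", "image_embedding_vector", 768, "COSINE"
)
```

Here `ok` is an empty list, while `bad` contains one message about the vector field name exceeding 15 characters.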

Configurations available as updates

The following settings do not affect index construction or storage, so you can modify them after an index is created. Use them to tune the performance of your index by adjusting the cache and healer settings.

Tip: We recommend using the AVS CLI tool to tune these parameters for your performance needs.

Labels

Labels store information about the index (for example, the model used to create the vector embedding). They do not affect search behavior.

Max mem queue size

Size of the ingest queue for your index.

Record caching

Number of records to keep in memory. The default is 0 (off).

Healer

These configurations are relevant to the healing process used to manage updates, deletes, and repairs to the HNSW graph. With the default settings, the healer has a small (<20%) impact on performance and throughput.

| Parameter | Description | Default |
| --- | --- | --- |
| Max scan rate per | Number of records for each scan of the healer. | 500 |
| Max scan page size | Maximum number of records in a single scanned page. | 5000 |
| Parallelism | Specifies additional threads used by the healer on a single node. | 1 |
| Reindex percent | Percentage of records randomly selected for reindexing in a healer cycle. | - |
| Schedule | A cron expression for running the healer process. | Every 15 minutes |
| Index parallelism | Specifies additional threads used when merging your index. | 1 |
| Re-index parallelism | Specifies additional threads used for re-indexing, which fixes errors and handles deletions. | 1 |

Caching

Settings for the caching of index records.

| Parameter | Description | Default |
| --- | --- | --- |
| Index cache max entries | Maximum number of index records held in the cache. | - |
| Expiry | Cache expiration time in milliseconds. Set to -1 to never expire. | -1 (no expiry) |
| Merging | Adjusts the specifics for merging changes into the HNSW graph. | - |
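To show how a maximum entry count and an expiry time interact, here is a small in-memory cache sketch. It is not the AVS cache implementation. The eviction rule (drop the oldest inserted entry when the cache is full) is an assumption made for illustration, and `ExpiringCache` is a hypothetical name. Only the meaning of -1 as "never expire" comes from the table above.

```python
from collections import OrderedDict

NO_EXPIRY = -1  # per the table: -1 means entries never expire

class ExpiringCache:
    def __init__(self, max_entries, expiry_ms=NO_EXPIRY):
        self.max_entries = max_entries
        self.expiry_ms = expiry_ms
        self._entries = OrderedDict()  # key -> (value, inserted_at_ms)

    def put(self, key, value, now_ms):
        # Re-inserting a key refreshes its position and timestamp.
        self._entries.pop(key, None)
        self._entries[key] = (value, now_ms)
        # Assumed eviction rule: remove the oldest inserted entry first.
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)

    def get(self, key, now_ms):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, inserted_at_ms = entry
        expired = (
            self.expiry_ms != NO_EXPIRY
            and now_ms - inserted_at_ms >= self.expiry_ms
        )
        if expired:
            del self._entries[key]
            return None
        return value

timed = ExpiringCache(max_entries=2, expiry_ms=100)
timed.put("a", "vec-a", now_ms=0)
fresh = timed.get("a", now_ms=50)    # still within the expiry window
stale = timed.get("a", now_ms=100)   # expired, so the entry is dropped

forever = ExpiringCache(max_entries=2)  # expiry_ms defaults to -1
for i, key in enumerate(["a", "b", "c"]):
    forever.put(key, f"vec-{key}", now_ms=i)
```

With no expiry, entries leave the cache only when the entry limit forces an eviction, so in the second cache "a" is gone while "b" and "c" remain available at any later time.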

Updating an index with AVS CLI

The AVS CLI tool, asvec, can create or update indexes from the command line or from a YAML configuration file. The following example sets the index cache and the record cache to hold up to 1,000,000 entries each with no expiry, and overrides the healer schedule with a custom cron expression.

asvec index update \
  --index-name search-space \
  --namespace avs-index \
  --hnsw-index-cache-expiry -1 \
  --hnsw-index-cache-max-entries 1000000 \
  --hnsw-record-cache-expiry -1 \
  --hnsw-record-cache-max-entries 1000000 \
  --hnsw-healer-schedule '* * * ? * * *'
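If you apply the same update from a script, one option is to assemble the command as an argument list so that values such as the cron expression are quoted correctly. The sketch below uses only the flags shown in the example above. `build_index_update_command` is a hypothetical helper, and the sketch only builds and prints the command rather than running it.

```python
import shlex

def build_index_update_command(index_name, namespace, options):
    """Build the argv list for `asvec index update` from a dict of flag -> value."""
    argv = [
        "asvec", "index", "update",
        "--index-name", index_name,
        "--namespace", namespace,
    ]
    for flag, value in options.items():
        # Each option becomes "--flag value", matching the example above.
        argv += [f"--{flag}", str(value)]
    return argv

argv = build_index_update_command(
    "search-space",
    "avs-index",
    {
        "hnsw-index-cache-expiry": -1,
        "hnsw-index-cache-max-entries": 1000000,
        "hnsw-record-cache-expiry": -1,
        "hnsw-record-cache-max-entries": 1000000,
        "hnsw-healer-schedule": "* * * ? * * *",
    },
)

# shlex.join quotes the cron expression so the shell does not expand "*".
command_line = shlex.join(argv)
print(command_line)
```

To execute the command, you could pass `argv` to `subprocess.run(argv, check=True)`, assuming asvec is installed and on your PATH. Passing a list avoids shell interpretation of the cron expression entirely.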

Dropping an index

If you no longer need to search across your data, you can drop an index to free up storage space with asvec index drop:

asvec index drop --index-name INDEX_NAME --namespace NAMESPACE

Note: Dropping an index only frees up space for the index records. To delete all underlying vector records, you must delete individual records or delete the set that stores your data.