Manage AVS indexes
Overview
This page describes how to create an index and associate it with a set of records. Aerospike Vector Search (AVS) uses an index to traverse the Hierarchical Navigable Small World (HNSW) neighborhoods when it performs queries. Indexes are independent of your records, and you can create an index at any time.
Required configuration
The following configuration parameters affect the storage and graph of your index. They cannot be changed after the index is created, and they do not have defaults.
Parameter | Description |
---|---|
Namespace | Namespace for creating the index, which must already exist in your Aerospike cluster. See configuring Aerospike Database for details about setting up a namespace for your index. |
Sets | Specifies where your data is stored in ASDB. Uses the null set by default. |
Index name | Name of the index. Used primarily for performing searches. |
Dimensions | Number of dimensions (length) of the vector. See generating vector embeddings for details about determining the number of dimensions in your vector. |
Vector distance metric | Distance calculation used by your index. Options include: SQUARED EUCLIDEAN - Reports squared Euclidean (L2) distance, avoiding the square root. This preserves the same ordering as Euclidean distance, but you must take the square root of the result if you need exact distances. COSINE - Measures the cosine of the angle between two vectors to determine how similar their direction is, regardless of magnitude. DOT PRODUCT - Takes into account both angle similarity and vector magnitude. MANHATTAN - Sums the absolute values of the differences between vector components. HAMMING - Counts the number of dimensions where the vectors differ. |
Vector field | Field name of the vector in the record. See Adding records to your index for details. |
The value of vector field is limited to 15 characters. Using a name longer than 15 characters will cause errors.
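To make the distance metrics in the table above concrete, here is a minimal pure-Python sketch of each calculation. The function names are illustrative and are not part of the AVS client. The exact values AVS reports may differ, for example in whether cosine is returned as a similarity or converted to a distance.

```python
import math

def squared_euclidean(a, b):
    # Sum of squared component differences; no square root is taken.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine_similarity(a, b):
    # Cosine of the angle between a and b; undefined for zero vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def dot_product(a, b):
    # Sensitive to both direction and magnitude.
    return sum(x * y for x, y in zip(a, b))

def manhattan(a, b):
    # Sum of absolute component differences.
    return sum(abs(x - y) for x, y in zip(a, b))

def hamming(a, b):
    # Number of positions where the components differ.
    return sum(1 for x, y in zip(a, b) if x != y)
```

Taking `math.sqrt(squared_euclidean(a, b))` recovers the exact Euclidean distance when it is needed.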
HNSW parameters
The following parameters configure the Hierarchical Navigable Small World (HNSW) algorithm, which is used for approximate nearest neighbor search. They have default values, but they cannot be updated after the index is created, so set any non-default values at creation time.
Parameter | Description | Default |
---|---|---|
M (max edges) | Number of bidirectional links created per level during construction. Larger values lead to higher recall but slower construction. | 16 |
EF | Size of the dynamic list for the nearest neighbors (candidates) during the search phase. Larger values lead to higher recall but slower search. | 100 |
EF construction | Size of the dynamic list for the nearest neighbors (candidates) during index construction. Larger values lead to higher recall but slower construction. | 100 |
Max records | Maximum number of records to fit in a batch. | 10000 |
Interval | Maximum amount of time in milliseconds to wait before finalizing a batch. | 10000 |
Disabled | Disables batching for index updates. | False |
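The last three rows of the table describe batching for index updates. The following sketch illustrates one plausible reading of them: a batch is finalized when it reaches the maximum record count or when the interval has elapsed, whichever comes first. The class and its method are hypothetical and only model the described behavior. They are not how AVS implements batching, and a real implementation would likely finalize on a timer rather than only when a record arrives.

```python
from dataclasses import dataclass, field

@dataclass
class BatchSketch:
    max_records: int = 10000   # "Max records" default from the table
    interval_ms: int = 10000   # "Interval" default from the table
    disabled: bool = False     # "Disabled" default from the table
    _pending: list = field(default_factory=list)
    _started_ms: int = 0

    def add(self, record, now_ms):
        """Add a record; return a finalized batch, or None if still waiting."""
        if self.disabled:
            # With batching disabled, every record is its own batch.
            return [record]
        if not self._pending:
            self._started_ms = now_ms  # first record opens a new batch
        self._pending.append(record)
        full = len(self._pending) >= self.max_records
        timed_out = now_ms - self._started_ms >= self.interval_ms
        if full or timed_out:
            batch, self._pending = self._pending, []
            return batch
        return None
```

With `BatchSketch(max_records=3, interval_ms=1000)`, the third record added closes a batch immediately, while a slow trickle of records closes a batch once 1000 ms have passed since the batch was opened.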
Example index creation
Python client:

    # Assumes avs_admin_client is an already connected AVS admin client and
    # that `types` is imported from the aerospike_vector_search package.
    avs_admin_client.index_create(
        namespace="avs-index",
        name="search-space",
        vector_field="img_vector",  # must be 15 characters or fewer
        dimensions=768,
        vector_distance_metric=types.VectorDistanceMetric.COSINE,
        sets="index-set",
        index_params=types.HnswParams(
            m=32,
            ef_construction=200,
            ef=400,
        ),
        labels={"model-used": "CLIP"},
    )

Java client:

    client
        .indexCreate(
            IndexId.newBuilder().setName("search-space").setNamespace("avs-index").build(),
            "img_vector", // vector field
            768, // dimensions
            VectorDistanceMetric.COSINE,
            null,
            HnswParams.newBuilder().setM(32).setEfConstruction(200).setEf(400).build(),
            IndexStorage.newBuilder().setNamespace("avs-index").setSet("index-set").build(),
            Map.of("model-used", "CLIP"),
            60_000,
            1_000);
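Because the required parameters cannot be changed later, it can help to validate them before creating the index. The following is a minimal sketch of such a check. The helper `build_index_args` is hypothetical and not part of the AVS client. It only enforces the 15-character limit on the vector field stated above and a positive dimension count, and it returns keyword arguments named after those in the Python example.

```python
MAX_VECTOR_FIELD_LENGTH = 15  # limit stated on this page

def build_index_args(namespace, name, vector_field, dimensions, sets=None):
    """Validate required index settings and return them as keyword arguments."""
    if len(vector_field) > MAX_VECTOR_FIELD_LENGTH:
        raise ValueError(
            f"vector_field {vector_field!r} is longer than "
            f"{MAX_VECTOR_FIELD_LENGTH} characters"
        )
    if not isinstance(dimensions, int) or dimensions <= 0:
        raise ValueError("dimensions must be a positive integer")
    args = {
        "namespace": namespace,
        "name": name,
        "vector_field": vector_field,
        "dimensions": dimensions,
    }
    if sets is not None:
        args["sets"] = sets  # omit to leave the set unspecified
    return args
```

The returned dictionary could then be unpacked into the create call alongside the remaining arguments, for example `avs_admin_client.index_create(**args, vector_distance_metric=...)`.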
Configurations available as updates
The following settings do not affect index construction or storage, so you can modify them after an index is created. Use them to tune the performance of your index by adjusting the cache and healer settings.
We recommend using the AVS CLI tool to tune these parameters for your performance needs.
Labels
Labels store information about the index, such as the model used to create the vector embeddings. They do not affect search behavior.
Max memory queue size
Size of the ingest queue for your index.
Record caching
Number of records to keep in memory. Default is 0 (off).
Healer
These configurations are relevant to the healing process used to manage updates, deletes, and repairs to the HNSW graph. With the default settings, the healer has a small (<20%) impact on performance and throughput.
Parameter | Description | Default |
---|---|---|
Max scan rate per | Number of records for each scan of the healer. | 500 |
Max scan page size | Maximum number of records in a single scanned page. | 5000 |
Parallelism | Specifies additional threads used by the healer on a single node. | 1 |
Reindex percent | Percentage of records randomly selected for reindexing in a healer cycle. | - |
Schedule | A cron expression that sets when the healer process runs. | Every 15 minutes |
Index parallelism | Specifies additional threads used when merging your index. | 1 |
Re-index parallelism | Specifies additional threads used for re-indexing, which fixes errors and handles deletions. | 1 |
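The healer schedule is given as a cron expression. The CLI example later on this page uses a seven-field expression containing `?`, which suggests a Quartz-style layout. That is an assumption, not something this page states. Under that assumption, the following sketch labels each field of an expression so it is easier to read. The helper name and the sample expression `0 0/15 * ? * * *` are illustrative and are not confirmed as the value AVS stores for its default.

```python
# Field order assumed to follow the Quartz cron layout (year is optional).
QUARTZ_FIELDS = (
    "seconds", "minutes", "hours",
    "day_of_month", "month", "day_of_week", "year",
)

def label_cron_fields(expression):
    """Map each whitespace-separated cron field to its assumed meaning."""
    parts = expression.split()
    if len(parts) not in (6, 7):
        raise ValueError(f"expected 6 or 7 fields, got {len(parts)}")
    return dict(zip(QUARTZ_FIELDS, parts))

# Hypothetical expression: at second 0 of every 15th minute.
fields = label_cron_fields("0 0/15 * ? * * *")
print(fields["seconds"], fields["minutes"])
```

Labeling the fields this way makes it easier to see which position controls seconds, minutes, and hours before passing an expression to the healer schedule setting.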
Caching
Settings for the caching of index records.
Parameter | Description | Default |
---|---|---|
Index cache max entries | Maximum number of index records held in the cache. | - |
Expiry | Cache expiration time in milliseconds. Set to -1 to never expire. | -1 (no expiry) |
Merging | Adjusts the specifics for merging changes into the HNSW graph. | - |
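The interaction between a maximum entry count and an expiry time, including the special value -1 for no expiry, can be sketched with a small in-memory cache. This is a conceptual model only. The class name is hypothetical, and evicting the oldest inserted entry when the cache is full is an assumed policy, since this page does not describe how AVS chooses entries to evict.

```python
from collections import OrderedDict

class ExpiringCacheSketch:
    def __init__(self, max_entries, expiry_ms=-1):
        self.max_entries = max_entries
        self.expiry_ms = expiry_ms      # -1 means entries never expire
        self._entries = OrderedDict()   # key -> (value, inserted_at_ms)

    def put(self, key, value, now_ms):
        self._entries.pop(key, None)    # re-inserting refreshes the entry
        self._entries[key] = (value, now_ms)
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # assumed: evict oldest entry

    def get(self, key, now_ms):
        item = self._entries.get(key)
        if item is None:
            return None
        value, inserted_at = item
        if self.expiry_ms != -1 and now_ms - inserted_at >= self.expiry_ms:
            del self._entries[key]      # entry has expired
            return None
        return value
```

With `expiry_ms=-1`, an entry stays readable until it is evicted by the entry limit, which mirrors the "never expire" setting in the table.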
Vector integrity check
Vector integrity checks verify that a record has not been updated or deleted before including it in the results. This check is enabled by default to ensure accurate, up-to-date results. You can disable it to improve query performance if updates are infrequent.
Parameter | Description | Default |
---|---|---|
Vector integrity check | Check whether records have been modified before including them in results. | true |
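Conceptually, an integrity check filters search candidates against the current state of their records. The sketch below models that idea with a per-record version number. The function, the version counter, and the data shapes are all hypothetical, because this page does not describe the mechanism AVS actually uses to detect modifications.

```python
def filter_unmodified(candidates, current_versions):
    """Keep only candidates whose record is unchanged since it was indexed.

    candidates: list of (key, version_when_indexed) pairs from the index.
    current_versions: dict of key -> current version; deleted keys are absent.
    """
    results = []
    for key, indexed_version in candidates:
        current = current_versions.get(key)  # None if the record was deleted
        if current is not None and current == indexed_version:
            results.append(key)
    return results

candidates = [("rec1", 1), ("rec2", 1), ("rec3", 2)]
current = {"rec1": 1, "rec2": 3}  # rec2 was updated, rec3 was deleted
print(filter_unmodified(candidates, current))
```

Skipping this filter saves a lookup per candidate, which is why disabling the check can improve query performance when records rarely change.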
AVS CLI index update
This section shows how to update an index with the AVS CLI tool, asvec. The AVS CLI can create or update indexes from the command line or from a YAML configuration file. The following example maximizes caching: it sets the maximum number of entries for the index and record caches, never expires cache entries, and overrides the healer schedule.
    asvec index update \
        --index-name search-space \
        --namespace avs-index \
        --hnsw-index-cache-expiry -1 \
        --hnsw-index-cache-max-entries 1000000 \
        --hnsw-record-cache-expiry -1 \
        --hnsw-record-cache-max-entries 1000000 \
        --hnsw-healer-schedule '* * * ? * * *'
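If you script index updates, you can assemble the same command as an argument list so that values such as the cron expression do not need shell quoting. This is a minimal sketch: the helper `build_index_update_command` is hypothetical, and it uses only the `asvec index update` flags that appear in the example above.

```python
def build_index_update_command(index_name, namespace, options):
    """Return the asvec argument list for an index update.

    options maps a flag name (without the leading dashes) to its value.
    """
    cmd = [
        "asvec", "index", "update",
        "--index-name", index_name,
        "--namespace", namespace,
    ]
    for flag, value in options.items():
        cmd += [f"--{flag}", str(value)]
    return cmd

cmd = build_index_update_command(
    "search-space",
    "avs-index",
    {
        "hnsw-index-cache-expiry": -1,
        "hnsw-index-cache-max-entries": 1000000,
        "hnsw-healer-schedule": "* * * ? * * *",
    },
)
print(cmd)
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where asvec is installed and configured to reach your AVS cluster.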
Dropping an index
If you no longer need to search across your data, you can drop an index to free up storage space with asvec index drop:

    asvec index drop --index-name INDEX_NAME --namespace NAMESPACE
Dropping an index only frees up space for the index records. To delete all underlying vector records, you must delete individual records or delete the set that stores your data.