Manage AVS indexes
AVS offers an extensive set of configuration options for tuning the performance, storage, and recall
of your index. This guide offers examples of the most common configurations to consider. For a complete
set of configuration options you can either review the types package in our Python documentation or
review the help output by running asvec index create --help.
Index Modes
A key configuration setting for your index is the mode you want your index to operate in. There are two specific modes to utilize.
- Distributed - This is the default index type. It keeps your index searchable while you stream in changes to your index. Use this index type if you’re not sure what mode you need.
- Standalone - Standalone indexing builds your entire index in memory. Once the standalone build finishes, the index is automatically set to distributed mode.
Fixed configurations
The following configuration parameters affect the storage and graph of your index. They cannot be updated once the index is created. You will need to create a new index if you want to change any of these values.
Parameter | Description |
---|---|
Namespace | Namespace for creating the index, which must already exist in your Aerospike cluster. See configuring Aerospike Database for details about setting up a namespace for your index. |
Sets | Specifies where your data is stored in Aerospike Database. Uses the null set by default. |
Index name | Name of the index. Used primarily for performing searches. |
Dimensions | Number of dimensions (length) of the vector. See generating vector embeddings for details about determining the number of dimensions in your vector. |
Vector distance metric | Distance calculation used by your index. Options include: SQUARED EUCLIDEAN - Reports squared Euclidean (L2) distance, avoiding the square root. This preserves the ordering of Euclidean distance, but if exact distances are needed, take the square root of the result. COSINE - Measures the cosine of the angle between two vectors to determine how similar their direction is, regardless of magnitude. DOT PRODUCT - Takes into account angle similarity and vector magnitude. MANHATTAN - Sums the absolute values of the differences between vector components. HAMMING - Measures the number of dimensions where vectors differ. |
Vector field | Field name of the vector in the record. |
Mode | Indicates if the initial index should be built standalone. |
from aerospike_vector_search import types

avs_client.index_create(
    namespace="avs-index",
    name="search-space",
    vector_field="img_vector",
    dimensions=768,
    vector_distance_metric=types.VectorDistanceMetric.COSINE,
    sets="index-set",
    mode=types.IndexMode.STANDALONE,
)

asvec index create \
  --index-name search-space \
  --namespace avs-index \
  --set index-set \
  --vector-field img_vector \
  --dimension 768 \
  --distance-metric COSINE \
  --index-mode STANDALONE
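To make the distance metrics in the table above concrete, here is a minimal sketch in plain Python that computes each quantity for two small vectors. The function names are illustrative and are not part of the AVS client. The sketch shows the underlying math only, and how AVS turns these quantities into a reported distance (for example, for COSINE) is not covered here.

```python
import math

def squared_euclidean(a, b):
    # Sum of squared component differences (no square root taken).
    return sum((x - y) ** 2 for x, y in zip(a, b))

def dot_product(a, b):
    # Sensitive to both direction and magnitude.
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    # Cosine of the angle between the vectors; magnitude is normalized away.
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    return dot_product(a, b) / (norm_a * norm_b)

def manhattan(a, b):
    # Sum of absolute component differences.
    return sum(abs(x - y) for x, y in zip(a, b))

def hamming(a, b):
    # Number of positions where the components differ.
    return sum(1 for x, y in zip(a, b) if x != y)

a = [1.0, 0.0, 2.0]
b = [2.0, 0.0, 4.0]  # same direction as a, twice the magnitude

sq = squared_euclidean(a, b)
euclidean = math.sqrt(sq)  # exact L2 distance needs the extra square root
cos = cosine_similarity(a, b)
dot = dot_product(a, b)
l1 = manhattan(a, b)
ham = hamming(a, b)
```

Because `b` points in the same direction as `a`, the cosine similarity is 1 even though every other metric reports a difference, which is why the choice of metric should match how your embedding model was trained.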
HNSW parameters
Adjusting HNSW parameters is the primary way to improve the recall of your searches. Keep in mind that improving recall comes at the expense of higher latency and larger storage requirements. Because these parameters affect the graph, you cannot update them after creating your index.
Parameter | Description | Default |
---|---|---|
M (max edges) | Number of bidirectional links created per level during construction. Larger values lead to higher recall but slower construction and larger storage requirements. | 16 |
EF | Size of the dynamic list for the nearest neighbors (candidates) during the search phase. Larger values lead to higher recall but slower searches and higher resource utilization. | 100 |
EF construction | Size of the dynamic list for the nearest neighbors (candidates) during index construction. Larger values lead to higher recall but slower index build and higher resource utilization. | 100 |
avs_client.index_create(
    namespace="avs-index",
    name="search-space",
    vector_field="img_vector",
    dimensions=768,
    vector_distance_metric=types.VectorDistanceMetric.COSINE,
    sets="index-set",
    mode=types.IndexMode.STANDALONE,
    index_params=types.HnswParams(
        m=32,
        ef_construction=200,
        ef=400,
    ),
)

asvec index create \
  --index-name search-space \
  --namespace avs-index \
  --set index-set \
  --vector-field img_vector \
  --dimension 768 \
  --distance-metric COSINE \
  --index-mode STANDALONE \
  --hnsw-m 16 \
  --hnsw-ef 100 \
  --hnsw-ef-construction 10000
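When tuning these parameters it helps to measure recall rather than guess at it. The following is a minimal sketch, in plain Python, of recall@k: the fraction of the true nearest neighbors that an approximate search returned. The helper names and the sample data are hypothetical. In practice the approximate IDs would come from a search against your index, and the exact IDs from a brute-force scan over a sample of your data.

```python
def brute_force_top_k(query, vectors, k):
    # Exact nearest neighbors by squared Euclidean distance (ground truth).
    ranked = sorted(
        range(len(vectors)),
        key=lambda i: sum((q - v) ** 2 for q, v in zip(query, vectors[i])),
    )
    return ranked[:k]

def recall_at_k(approx_ids, exact_ids):
    # Fraction of the exact neighbors that appear in the approximate result.
    if not exact_ids:
        raise ValueError("exact_ids must not be empty")
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0], [5.0, 5.0]]
query = [0.1, 0.1]

exact = brute_force_top_k(query, vectors, k=3)
approx = [0, 1, 3]  # hypothetical result from an approximate search
recall = recall_at_k(approx, exact)
```

Running the same measurement before and after changing M, EF, or EF construction shows whether the extra latency and storage are buying a meaningful recall gain for your data.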
Dynamic Configurations
The following settings do not affect index construction or storage, so you can update them after an index is created. Use them to tune the performance of your index by adjusting the cache and healer settings.
Caching
The primary way to reduce the latency of your searches is to adjust the cache settings of your index.
Parameter | Description | Default |
---|---|---|
Index cache max entries | Maximum number of index records held in the cache. | 2,000,000 |
Index cache expiry | Cache expiration time in milliseconds. Set to -1 to never expire. | -1 (no expiry) |
Record cache max entries | Maximum number of vector records held in the cache. | 0 (off) |
Record cache expiry | Seconds to keep record values in the cache. Set to -1 to never expire. | 0 (off) |
Vector integrity check | Check whether cached vector records have been modified before including them in results. Setting this to false will improve latency but could include stale results. | true |
avs_client.index_update(
    namespace="avs-index",
    name="search-space",
    hnsw_update_params=types.HnswIndexUpdate(
        index_caching_params=types.HnswCachingParams(
            max_entries=2000000,
            expiry=-1,
            vector_integrity_check=True,
        ),
        record_caching_params=types.HnswCachingParams(
            max_entries=2000000,
            expiry=-1,
        ),
    ),
)

asvec index update \
  --index-name search-space \
  --namespace avs-index \
  --hnsw-index-cache-expiry -1 \
  --hnsw-index-cache-max-entries 2000000 \
  --hnsw-record-cache-expiry -1 \
  --hnsw-record-cache-max-entries 2000000 \
  --hnsw-vector-integrity-check true
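Before raising the record cache size, it can be useful to estimate how much memory the cached vectors alone might need. The sketch below is a back-of-the-envelope calculation, not a description of how AVS stores cached records: it assumes 4-byte (float32) vector components and ignores all per-entry overhead, so treat the result as a rough lower bound. The function name is hypothetical.

```python
def raw_vector_bytes(max_entries, dimensions, bytes_per_component=4):
    # Raw payload only: entries x dimensions x bytes per component.
    # Assumes float32 components; real usage will be higher due to overhead.
    return max_entries * dimensions * bytes_per_component

# Values taken from the examples in this guide: 2,000,000 entries, 768 dimensions.
estimate_bytes = raw_vector_bytes(max_entries=2_000_000, dimensions=768)
estimate_gib = estimate_bytes / 1024 ** 3

print(f"{estimate_bytes} bytes (about {estimate_gib:.2f} GiB)")
```

If the estimate exceeds the memory available on your nodes, a lower maximum entry count or a finite expiry may be a better starting point.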
Healer
These values control the healing process associated with your index on indexer or mixed nodes. They tell the cluster how aggressively to manage the healing process for a particular index. By default, these values have minimal impact (<20%) on throughput and query performance.
Parameter | Description | Default |
---|---|---|
Parallelism | Specifies additional threads used by the healer on a single node. Increasing this will increase the amount of CPU spent towards healing. | 1 |
Schedule | A cron expression for running the healer process, set to run every fifteen minutes by default. | Every 15 minutes |
avs_client.index_update(
    namespace="avs-index",
    name="search-space",
    hnsw_update_params=types.HnswIndexUpdate(
        healer_params=types.HnswHealerParams(
            parallelism=1,
            schedule="*/15 * * * *",
        ),
    ),
)

asvec index update \
  --index-name search-space \
  --namespace avs-index \
  --hnsw-healer-parallelism 1 \
  --hnsw-healer-schedule "*/15 * * * *"
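To check what a schedule expression means before applying it, you can expand its minute field by hand. The sketch below assumes the five-field cron layout used in the example above (minute first) and handles only `*`, `*/N`, single values, and comma-separated lists. It is an illustration with a hypothetical helper name, not the parser AVS uses, so confirm the supported syntax in the AVS reference before relying on anything more elaborate.

```python
def expand_minute_field(field):
    # Expand a cron minute field such as "*/15", "0", or "5,35"
    # into the sorted list of minutes within each hour that it matches.
    minutes = set()
    for part in field.split(","):
        if part == "*":
            minutes.update(range(60))
        elif part.startswith("*/"):
            step = int(part[2:])
            minutes.update(range(0, 60, step))
        else:
            minutes.add(int(part))
    return sorted(minutes)

schedule = "*/15 * * * *"
minute_field = schedule.split()[0]  # first field is the minute
runs = expand_minute_field(minute_field)
```

For the default-style schedule shown above, `runs` contains the four quarter-hour marks, which matches the "every fifteen minutes" description in the table.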
Additional configuration
The following are additional configurations that can be helpful.
Parameter | Description | Default |
---|---|---|
Labels | Stores information about the index (for example, the model used to create the vector embedding). It is not relevant to search behavior. | N/A |
avs_client.index_update(
    namespace="avs-index",
    name="search-space",
    index_labels={"model-used": "CLIP"},
)

asvec index update \
  --index-name search-space \
  --namespace avs-index \
  --index-labels model=CLIP
Dropping an index
If you no longer need to search across your data, you can drop an index to free up storage. The delete process is handled by the healer based on the original configuration of the index:
avs_client.index_drop(
    namespace="avs-index",
    name="search-space",
)

asvec index drop --index-name INDEX_NAME --namespace NAMESPACE
Troubleshooting
We recommend that you use the asvec CLI tool for troubleshooting your index.
Standalone indexing is not finishing
Standalone indexing requires the following two things to succeed:
- A standalone indexer node role available in your cluster, and
- Sufficient resources on that node to fit the index in memory.
If either of these conditions is not met, the index status will not become READY.
- Verify that there is a STANDALONE_INDEXER node.

asvec node ls
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Nodes                                                                                                                                          │
├───┬────────────────┬────────────────────────┬─────────────────────┬─────────────────────┬──────────┬───────────────────────────────────────────┤
│   │ NODE           │ ROLES                  │ ENDPOINT            │ CLUSTER ID          │ VERSION  │ VISIBLE NODES                             │
├───┼────────────────┼────────────────────────┼─────────────────────┼─────────────────────┼──────────┼───────────────────────────────────────────┤
│ 1 │ 37627287442312 │ [INDEX_QUERY]          │ 34.56.201.26:5000   │ 1685524865657087021 │ 1.1.0    │ {                                         │
│   │                │                        │                     │                     │          │     37912119677832: [34.123.26.109:5000]  │
│   │                │                        │                     │                     │          │     37956902785928: [34.133.135.181:5000] │
│   │                │                        │                     │                     │          │     39450770215816: [35.225.89.37:5000]   │
│   │                │                        │                     │                     │          │ }                                         │
├───┼────────────────┼────────────────────────┼─────────────────────┤                     ├──────────┼───────────────────────────────────────────┤
│ 2 │ 37912119677832 │ [INDEX_QUERY]          │ 34.123.26.109:5000  │                     │ 1.1.0    │ {                                         │
│   │                │                        │                     │                     │          │     37627287442312: [34.56.201.26:5000]   │
│   │                │                        │                     │                     │          │     37956902785928: [34.133.135.181:5000] │
│   │                │                        │                     │                     │          │     39450770215816: [35.225.89.37:5000]   │
│   │                │                        │                     │                     │          │ }                                         │
├───┼────────────────┼────────────────────────┼─────────────────────┤                     ├──────────┼───────────────────────────────────────────┤
│ 3 │ 37956902785928 │ [STANDALONE_INDEXER]   │ 34.133.135.181:5000 │                     │ 1.1.0    │ {                                         │
│   │                │                        │                     │                     │          │     37627287442312: [34.56.201.26:5000]   │
│   │                │                        │                     │                     │          │     37912119677832: [34.123.26.109:5000]  │
│   │                │                        │                     │                     │          │     39450770215816: [35.225.89.37:5000]   │
│   │                │                        │                     │                     │          │ }                                         │
├───┼────────────────┼────────────────────────┼─────────────────────┤                     ├──────────┼───────────────────────────────────────────┤
│ 4 │ 39450770215816 │ [INDEXER INDEX_UPDATE] │ 35.225.89.37:5000   │                     │ 1.1.0    │ {                                         │
│   │                │                        │                     │                     │          │     37627287442312: [34.56.201.26:5000]   │
│   │                │                        │                     │                     │          │     37912119677832: [34.123.26.109:5000]  │
│   │                │                        │                     │                     │          │     37956902785928: [34.133.135.181:5000] │
│   │                │                        │                     │                     │          │ }                                         │
╰───┴────────────────┴────────────────────────┴─────────────────────┴─────────────────────┴──────────┴───────────────────────────────────────────╯

- If you have a STANDALONE_INDEXER node, use the following command to confirm the status of your index. Check the STATUS column for NOT-READY.

asvec index ls
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Indexes                                                                                                                                                                                    │
├───┬────────────────────────┬───────────┬────────────────────┬───────────────┬────────────┬───────────────────┬──────────┬────────────────┬───────────┬────────────┬─────────────┬──────────┤
│   │ NAME                   │ NAMESPACE │ SET                │ FIELD         │ DIMENSIONS │ DISTANCE METRIC   │ UNMERGED │ VECTOR RECORDS │ SIZE      │ UNMERGED % │ MODE*       │ STATUS   │
├───┼────────────────────────┼───────────┼────────────────────┼───────────────┼────────────┼───────────────────┼──────────┼────────────────┼───────────┼────────────┼─────────────┼──────────┤
│ 1 │ sift-128-euclidean_Idx │ avs-data  │ sift-128-euclidean │ HDF_embedding │ 128        │ SQUARED_EUCLIDEAN │ 0        │ 0              │ 0         │ 0          │ STANDALONE  │ NOT-READY│
╰───┴────────────────────────┴───────────┴────────────────────┴───────────────┴────────────┴───────────────────┴──────────┴────────────────┴───────────┴────────────┴─────────────┴──────────╯

- When the status says READY, run the following command to switch your index mode to DISTRIBUTED to preserve your index updates.

asvec index update \
  --index-name <INDEX-NAME> \
  --namespace <NAMESPACE> \
  --index-mode DISTRIBUTED

- After updating to distributed mode, monitor the status of your index by checking how much remains unmerged.

asvec index ls --seeds 35.225.89.37
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Indexes                                                                                                                                                                                  │
├───┬────────────────────────┬───────────┬────────────────────┬───────────────┬────────────┬───────────────────┬──────────┬────────────────┬───────────┬────────────┬─────────────┬────────┤
│   │ NAME                   │ NAMESPACE │ SET                │ FIELD         │ DIMENSIONS │ DISTANCE METRIC   │ UNMERGED │ VECTOR RECORDS │ SIZE      │ UNMERGED % │ MODE*       │ STATUS │
├───┼────────────────────────┼───────────┼────────────────────┼───────────────┼────────────┼───────────────────┼──────────┼────────────────┼───────────┼────────────┼─────────────┼────────┤
│ 1 │ sift-128-euclidean_Idx │ avs-data  │ sift-128-euclidean │ HDF_embedding │ 128        │ SQUARED_EUCLIDEAN │ 464334   │ 548302         │ 549.43 MB │ 81.56%     │ DISTRIBUTED │ READY  │
╰───┴────────────────────────┴───────────┴────────────────────┴───────────────┴────────────┴───────────────────┴──────────┴────────────────┴───────────┴────────────┴─────────────┴────────╯
Switching between index modes
As shown in the previous example, switching from STANDALONE mode to DISTRIBUTED mode hands over index construction to the index healer, resulting in an eventually consistent index state. Switching from DISTRIBUTED to STANDALONE forces an entire rebuild of your index.