
Manage AVS indexes

AVS offers an extensive set of configuration options for tuning the performance, storage, and recall of your index. This guide gives examples of the most common configurations to consider. For the complete set of configuration options, review the types package in our Python documentation or run asvec index create --help.

Index modes

A key configuration setting for your index is the mode you want your index to operate in. An index can operate in one of two modes:

  • Distributed: This is the default index mode. It keeps your index searchable while you stream in changes to your index. Use this mode if you’re not sure which mode you need.
  • Standalone: Standalone indexing builds your entire index in memory. After the standalone build finishes, the index is automatically set to distributed.

Fixed configurations

The configuration parameters in the following table affect the storage and graph of your index. These parameters cannot be updated after an index is created. You must create a new index if you want to change any of these values.

The syntax of parameter names differs slightly between the Python client and asvec. Refer to the relevant tab below the table to see the specific syntax.

Parameter | Description
Namespace | Namespace for creating the index, which must already exist in your Aerospike cluster. See configuring Aerospike Database for details about setting up a namespace for your index.
Sets | Specifies where your data is stored in Aerospike Database. Uses the null set by default.
Index name | Name of the index. Used primarily for performing searches.
Dimensions | Number of dimensions (length) of the vector. See generating vector embeddings for details about determining the number of dimensions in your vector.
Vector distance metric | Distance calculation used by your index. Options include:
  • SQUARED EUCLIDEAN - Reports the squared Euclidean (L2) distance, which skips the square root. Results are ordered the same way as with Euclidean distance, but you must take the square root of the result if you need exact distances.
  • COSINE - Measures the cosine of the angle between two vectors to determine how similar their direction is, regardless of magnitude.
  • DOT PRODUCT - Takes into account angle similarity and vector magnitude.
  • MANHATTAN - Sums the absolute values of the differences between vector components.
  • HAMMING - Measures the number of dimensions where vectors differ.
Vector field | Field name of the vector in the record.
Mode | Indicates if the initial index should be built standalone.

from aerospike_vector_search import types

# avs_client is assumed to be an already-connected AVS client.
avs_client.index_create(
    namespace="avs-index",
    sets="index-set",
    name="search-space",
    dimensions=768,
    vector_distance_metric=types.VectorDistanceMetric.COSINE,
    vector_field="img_vector",
    mode=types.IndexMode.STANDALONE,
)

See index_create() for complete details about creating indexes with the Python client.
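
To make the metric descriptions above concrete, the following is a minimal pure-Python sketch of the underlying formulas. The function names are illustrative and are not part of the AVS client, and the sketch does not claim to show how AVS scales or converts these values before reporting a distance.

```python
import math
from typing import Sequence


def squared_euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of squared component differences (no square root taken)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dot_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of component-wise products; grows with both alignment and magnitude."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b; ignores magnitude."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    return dot_product(a, b) / (norm_a * norm_b)


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute component differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def hamming(a: Sequence[float], b: Sequence[float]) -> int:
    """Number of positions where the two vectors differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


# Two vectors that point in the same direction but differ in length.
a = [1.0, 0.0, 2.0]
b = [2.0, 0.0, 4.0]

print(squared_euclidean(a, b))             # squared L2 distance
print(math.sqrt(squared_euclidean(a, b)))  # exact L2 distance needs the extra sqrt
print(cosine_similarity(a, b))             # close to 1.0: same direction
print(dot_product(a, b))                   # also reflects magnitude
print(manhattan(a, b))
print(hamming(a, b))
```

Because b is a scaled copy of a, the cosine measure treats the two vectors as equivalent in direction while the other metrics still report a difference.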

HNSW parameters

Adjust HNSW parameters to improve the recall of your searches or to control how many relevant items in the search space your search returns. Improving recall comes at the expense of higher latency and more storage requirements. Because these parameters affect the graph, you cannot update them after creating your index.

Parameter | Description | Default
M (max edges) | Number of bidirectional links created per level during construction. Larger values lead to higher recall but slower construction and larger storage requirements. | 16
EF (search) | The default size of the “exploration factor” for the nearest neighbors (candidates) during the search phase. Larger values lead to higher recall but slower searches and higher resource utilization. This can be adjusted after the fact in the search call. See HnswSearchParams for more information. | 100
EF construction | Size of the dynamic list for the nearest neighbors (candidates) stored during index construction. Larger values lead to larger indexes with higher recall but higher resource utilization. | 100

avs_client.index_create(
    namespace="avs-index",
    name="search-space",
    vector_field="img_vector",
    dimensions=768,
    vector_distance_metric=types.VectorDistanceMetric.COSINE,
    sets="index-set",
    mode=types.IndexMode.STANDALONE,
    # HNSW graph settings; these cannot be changed after the index is created.
    index_params=types.HnswParams(
        m=32,
        ef_construction=200,
        ef=400,
    ),
)

See HnswParams for complete details.
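
When you tune these parameters, it helps to measure recall directly instead of guessing. The following is a minimal sketch, assuming you already have the IDs returned by an index search and the IDs from an exact (brute-force) search over the same data. The helper name recall_at_k is hypothetical and not part of the AVS client.

```python
from typing import Hashable, Sequence


def recall_at_k(
    approx_ids: Sequence[Hashable],
    exact_ids: Sequence[Hashable],
    k: int,
) -> float:
    """Fraction of the exact top-k neighbors that also appear in the approximate top-k."""
    if k <= 0:
        raise ValueError("k must be positive")
    exact_top = set(exact_ids[:k])
    if not exact_top:
        return 0.0
    found = exact_top.intersection(approx_ids[:k])
    return len(found) / len(exact_top)


# Hypothetical result IDs: exact search vs. an approximate index search.
exact = ["a", "b", "c", "d"]
approx = ["a", "c", "x", "b"]

score = recall_at_k(approx, exact, k=4)
print(f"recall@4 = {score:.2f}")
```

Averaging this score over a sample of queries for each candidate setting of M, EF, and EF construction gives a basis for weighing recall against latency and storage.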

Dynamic configurations

The following settings do not affect index construction or storage, so you can update them after an index is created. Use them to change the performance of your index by adjusting the cache and healer settings.

Caching

The primary way to reduce the latency of your searches is to adjust the cache settings of your index.

Parameter | Description | Default
Index cache max entries | Maximum number of index records held in the cache. | 2,000,000
Index cache expiry | Cache expiration time in milliseconds. Set to -1 to never expire. | 3,600,000 (1 hour)
Record cache max entries | Maximum number of vector records held in the cache. | 0 (off)
Record cache expire | Seconds to keep record values in cache. (-1 for infinity) | 0 (off)

avs_client.index_update(
    namespace="avs-index",
    name="search-space",
    hnsw_update_params=types.HnswIndexUpdate(
        index_caching_params=types.HnswCachingParams(
            max_entries=2000000,
            expiry=-1,
        ),
        record_caching_params=types.HnswCachingParams(
            max_entries=2000000,
            expiry=-1,
        ),
    ),
)

See HnswIndexUpdate for complete details.
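
Raw millisecond values such as 3,600,000 are easy to mistype. The following is a small sketch of a conversion helper for the index cache expiry, which the table above gives in milliseconds with -1 meaning never expire. The name expiry_millis is hypothetical and not part of the AVS client.

```python
from datetime import timedelta
from typing import Optional

NEVER_EXPIRE = -1  # sentinel from the caching table: entries never expire


def expiry_millis(ttl: Optional[timedelta]) -> int:
    """Convert a time-to-live into whole milliseconds; None means never expire."""
    if ttl is None:
        return NEVER_EXPIRE
    if ttl < timedelta(0):
        raise ValueError("ttl must not be negative; pass None to never expire")
    return int(ttl / timedelta(milliseconds=1))


one_hour = expiry_millis(timedelta(hours=1))
forever = expiry_millis(None)
print(one_hour, forever)
```

The resulting integer can be passed as the expiry value in the index caching parameters.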

Vector integrity check

The vector integrity check flag checks whether cached vector records have been modified before including them in the results. This flag is true by default. Setting this flag to false can reduce latency, but your searches might then return stale results.

avs_client.index_update(
    namespace="avs-index",
    name="search-space",
    hnsw_update_params=types.HnswIndexUpdate(
        enable_vector_integrity_check=False,
    ),
)

See HnswIndexUpdate for complete details.

Healer

These values control the healing process associated with your index on indexer or mixed nodes. The values tell the cluster how aggressively to manage the healing process for a particular index. By default these values have minimal impact (<20%) on throughput and query performance.

Parameter | Description | Default
Parallelism | Specifies additional threads used by the healer on a single node. Increasing this value increases the amount of CPU spent on healing. | 1
Schedule | A cron expression that sets when the healer process runs. | Every 15 minutes

avs_client.index_update(
    namespace="avs-index",
    name="search-space",
    hnsw_update_params=types.HnswIndexUpdate(
        healer_params=types.HnswHealerParams(
            parallelism=1,
            schedule="*/15 * * * *",
        ),
    ),
)

See HnswHealerParams for complete details.

Additional configuration

The following are additional configurations that can be helpful.

Parameter | Description | Default
Labels | Stores information about the index (for example, the model used to create the vector embedding). It is not relevant to search behavior. | N/A

avs_client.index_update(
    namespace="avs-index",
    name="search-space",
    index_labels={"model-used": "CLIP"},
)

See IndexDefinition for complete details.

Dropping an index

If you no longer need to search across your data, you can drop, or delete, an index to free up storage. The delete process is handled by the healer based on the original configuration of the index:

avs_client.index_drop(
    namespace="avs-index",
    name="search-space",
)

Troubleshooting

We recommend that you use the asvec CLI tool for troubleshooting your index.

Standalone indexing is not finishing

Standalone indexing requires the following two things to succeed:

  • A standalone indexer node role available in your cluster, and

  • Sufficient resources on that node to fit the index in memory.

    If either of these conditions is not met, the index status will not become READY.

  1. Verify that there is a STANDALONE_INDEXER node.

    asvec node ls
    ╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
    │ Nodes │
    ├───┬────────────────┬────────────────────────┬─────────────────────┬─────────────────────┬──────────┬───────────────────────────────────────────┤
    │ │ NODE │ ROLES │ ENDPOINT │ CLUSTER ID │ VERSION │ VISIBLE NODES │
    ├───┼────────────────┼────────────────────────┼─────────────────────┼─────────────────────┼──────────┼───────────────────────────────────────────┤
    │ 1 │ 37627287442312 │ [INDEX_QUERY] │ 34.56.201.26:5000 │ 1685524865657087021 │ 1.1.0 │ { │
    │ │ │ │ │ │ │ 37912119677832: [34.123.26.109:5000] │
    │ │ │ │ │ │ │ 37956902785928: [34.133.135.181:5000] │
    │ │ │ │ │ │ │ 39450770215816: [35.225.89.37:5000] │
    │ │ │ │ │ │ │ } │
    ├───┼────────────────┼────────────────────────┼─────────────────────┤ ├──────────┼───────────────────────────────────────────┤
    │ 2 │ 37912119677832 │ [INDEX_QUERY] │ 34.123.26.109:5000 │ │ 1.1.0 │ { │
    │ │ │ │ │ │ │ 37627287442312: [34.56.201.26:5000] │
    │ │ │ │ │ │ │ 37956902785928: [34.133.135.181:5000] │
    │ │ │ │ │ │ │ 39450770215816: [35.225.89.37:5000] │
    │ │ │ │ │ │ │ } │
    ├───┼────────────────┼────────────────────────┼─────────────────────┤ ├──────────┼───────────────────────────────────────────┤
    │ 3 │ 37956902785928 │ [STANDALONE_INDEXER] │ 34.133.135.181:5000 │ │ 1.1.0 │ { │
    │ │ │ │ │ │ │ 37627287442312: [34.56.201.26:5000] │
    │ │ │ │ │ │ │ 37912119677832: [34.123.26.109:5000] │
    │ │ │ │ │ │ │ 39450770215816: [35.225.89.37:5000] │
    │ │ │ │ │ │ │ } │
    ├───┼────────────────┼────────────────────────┼─────────────────────┤ ├──────────┼───────────────────────────────────────────┤
    │ 4 │ 39450770215816 │ [INDEXER INDEX_UPDATE] │ 35.225.89.37:5000 │ │ 1.1.0 │ { │
    │ │ │ │ │ │ │ 37627287442312: [34.56.201.26:5000] │
    │ │ │ │ │ │ │ 37912119677832: [34.123.26.109:5000] │
    │ │ │ │ │ │ │ 37956902785928: [34.133.135.181:5000] │
    │ │ │ │ │ │ │ } │
    ╰───┴────────────────┴────────────────────────┴─────────────────────┴─────────────────────┴──────────┴───────────────────────────────────────────╯
  2. If you have a STANDALONE_INDEXER node, use the following command to confirm the status of your index. Check the STATUS column for NOT-READY.

    asvec index ls
    ╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
    │ Indexes │
    ├───┬────────────────────────┬───────────┬────────────────────┬───────────────┬────────────┬───────────────────┬──────────┬────────────────┬───────────┬────────────┬─────────────┬──────────┤
    │ │ NAME │ NAMESPACE │ SET │ FIELD │ DIMENSIONS │ DISTANCE METRIC │ UNMERGED │ VECTOR RECORDS │ SIZE │ UNMERGED % │ MODE* │ STATUS │
    ├───┼────────────────────────┼───────────┼────────────────────┼───────────────┼────────────┼───────────────────┼──────────┼────────────────┼───────────┼────────────┼─────────────┼──────────┤
    │ 1 │ sift-128-euclidean_Idx │ avs-data │ sift-128-euclidean │ HDF_embedding │ 128 │ SQUARED_EUCLIDEAN │ 0 │ 0 │ 0 │ 0 │ STANDALONE │NOT-READY │
    ╰───┴────────────────────────┴───────────┴────────────────────┴───────────────┴────────────┴───────────────────┴──────────┴────────────────┴───────────┴────────────┴─────────────┴──────────╯
  3. When the status says READY, run the following command to switch your index mode to DISTRIBUTED to preserve your index updates.

    asvec index update \
    --index-name <INDEX-NAME> \
    --namespace <NAMESPACE> \
    --index-mode DISTRIBUTED
  4. After updating to distributed mode, monitor the status of your index by checking to see how much remains unmerged.

    asvec index ls --seeds 35.225.89.37
    ╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
    │ Indexes │
    ├───┬────────────────────────┬───────────┬────────────────────┬───────────────┬────────────┬───────────────────┬──────────┬────────────────┬───────────┬────────────┬─────────────┬────────┤
    │ │ NAME │ NAMESPACE │ SET │ FIELD │ DIMENSIONS │ DISTANCE METRIC │ UNMERGED │ VECTOR RECORDS │ SIZE │ UNMERGED % │ MODE* │ STATUS │
    ├───┼────────────────────────┼───────────┼────────────────────┼───────────────┼────────────┼───────────────────┼──────────┼────────────────┼───────────┼────────────┼─────────────┼────────┤
    │ 1 │ sift-128-euclidean_Idx │ avs-data │ sift-128-euclidean │ HDF_embedding │ 128 │ SQUARED_EUCLIDEAN │ 464334 │ 548302 │ 549.43 MB │ 81.56% │ DISTRIBUTED │ READY │
    ╰───┴────────────────────────┴───────────┴────────────────────┴───────────────┴────────────┴───────────────────┴──────────┴────────────────┴───────────┴────────────┴─────────────┴────────╯
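
If you want to run the status checks above from a script, one option is to parse the table text that asvec index ls prints. The following is a minimal sketch that assumes the output uses the box-drawing layout shown in these examples. The helper name parse_index_table is hypothetical, and the sketch does not assume that asvec offers any other output format.

```python
from typing import Dict, List

# Sample output copied from the example above (borders omitted for brevity).
SAMPLE = """
│ Indexes │
│ │ NAME │ NAMESPACE │ SET │ FIELD │ DIMENSIONS │ DISTANCE METRIC │ UNMERGED │ VECTOR RECORDS │ SIZE │ UNMERGED % │ MODE* │ STATUS │
│ 1 │ sift-128-euclidean_Idx │ avs-data │ sift-128-euclidean │ HDF_embedding │ 128 │ SQUARED_EUCLIDEAN │ 464334 │ 548302 │ 549.43 MB │ 81.56% │ DISTRIBUTED │ READY │
"""


def parse_index_table(output: str) -> List[Dict[str, str]]:
    """Return one dict per index row, keyed by the column headers."""
    header: List[str] = []
    rows: List[Dict[str, str]] = []
    for line in output.splitlines():
        line = line.strip()
        if not line.startswith("│"):
            continue  # skip border lines, blank lines, and the command itself
        cells = [cell.strip() for cell in line.strip("│").split("│")]
        if "NAME" in cells and "STATUS" in cells:
            header = cells  # remember the column names
        elif header and len(cells) == len(header):
            rows.append(dict(zip(header, cells)))
    return rows


rows = parse_index_table(SAMPLE)
not_ready = [row["NAME"] for row in rows if row["STATUS"] != "READY"]
for row in rows:
    print(row["NAME"], row["MODE*"], row["STATUS"], row["UNMERGED %"])
```

A script can rerun the command periodically and stop when not_ready is empty and the UNMERGED % value has dropped to an acceptable level.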

Switching between index modes

As shown in the previous example, switching from STANDALONE mode to DISTRIBUTED mode hands over index construction to the index healer, resulting in an eventually consistent index state. Switching from DISTRIBUTED to STANDALONE forces an entire rebuild of your index.
