Manage AVS indexes

Overview

This page describes how to create an index and associate it with a set of records. Aerospike Vector Search (AVS) uses an index to traverse Hierarchical Navigable Small World (HNSW) neighborhoods when it performs queries. Indexes are independent of your records, and you can create an index at any time.

We recommend that you manage and monitor indexes in AVS with the asvec tool. If you prefer, you can use the same settings to manage indexes using the Python admin client.

tip

The syntax of the index properties described in the following sections is specific to the tool in use. The code examples illustrate the syntax. For full details, see the AVS CLI tool documentation, the developer docs for the Python client, and the developer docs for the Java client.

Required configuration

When creating an index, use the following required fields to provide details about your Aerospike Database (ASDB) cluster and specifics about your vector embeddings:

  • Namespace - Namespace for creating the index, which must already exist in your Aerospike cluster. See configuring Aerospike Database for details about setting up a namespace for your index.
  • Index name - Name of the index. Used primarily for performing searches.
  • Dimensions - Number of dimensions (length) of the vector. See generating vector embeddings for details about determining the number of dimensions in your vector.
  • Vector distance metric - Distance calculation used by your index. Options include:
    • SQUARED EUCLIDEAN - Reports squared Euclidean (L2) distance, which avoids computing the square root. Ordering results by squared distance gives the same ranking as ordering by Euclidean distance. If you need exact distances, take the square root of the reported value.
    • COSINE
    • DOT PRODUCT
    • MANHATTAN
    • HAMMING
  • Vector field - Field name of the vector in the record. See Adding records to your index for details.
caution

The vector field name is limited to 15 characters. Using a name longer than 15 characters causes errors.
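To illustrate the note on SQUARED EUCLIDEAN, the following minimal sketch in plain Python (it does not call AVS) compares a ranking by squared distance with a ranking by true Euclidean distance. The function names and sample vectors are made up for illustration.

```python
import math


def squared_euclidean(a, b):
    """Sum of squared component differences (no square root)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def euclidean(a, b):
    """Exact L2 distance: the square root of the squared distance."""
    return math.sqrt(squared_euclidean(a, b))


# Hypothetical query vector and candidates, for illustration only.
query = [0.0, 0.0]
candidates = {"near": [1.0, 1.0], "mid": [2.0, 2.0], "far": [3.0, 4.0]}

by_squared = sorted(candidates, key=lambda k: squared_euclidean(query, candidates[k]))
by_exact = sorted(candidates, key=lambda k: euclidean(query, candidates[k]))

# Both orderings agree; only the reported distance values differ.
print(by_squared, by_exact)
```

Because the square root is monotonic, the two orderings always match. The square root only matters when you need the distance value itself.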

Optional configuration

The following fields are optional when creating an index. Setting them overrides the default values and affects the performance and quality of your index results.

Configure upon creation

The following items affect index construction and must be set during index creation. They are not available as updates. You must create a new index if you wish to change these values after creation.

  • Sets - Specifies where your data is stored in ASDB. Uses the null set by default.
  • HNSW params - Parameters for the Hierarchical Navigable Small World (HNSW) algorithm, used for approximate nearest neighbor search.
    • M (max edges) - Number of bi-directional links created per level during construction. Larger M values lead to higher recall but slower construction. Defaults to 16.
    • EF - Size of the dynamic list for the nearest neighbors (candidates) during the search phase. Larger EF values lead to higher recall but slower search. Defaults to 100.
    • EF construction - Size of the dynamic list for the nearest neighbors (candidates) during the index construction. Larger EF construction values lead to higher recall but slower construction. Defaults to 100.
    • Batching params - Parameters related to configuring batch processing, such as the maximum number of records per batch and batching interval.
      • Max records - Maximum number of records to fit in a batch. Defaults to 10000.
      • Interval - Maximum amount of time in milliseconds to wait before finalizing a batch. Defaults to 10000.
      • Disabled - Disables batching for index updates. Default is False.
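The batching parameters above can be read as "finalize a batch when it is full or when the interval has elapsed, whichever comes first". The following sketch shows that reading in plain Python. The BatchBuffer class and its methods are hypothetical. This is not how AVS implements batching internally, and treating a disabled buffer as "one record per batch" is an assumption.

```python
class BatchBuffer:
    """Illustrative only: collects records and decides when a batch is final."""

    def __init__(self, max_records=10000, interval_ms=10000, disabled=False):
        self.max_records = max_records
        self.interval_ms = interval_ms
        self.disabled = disabled
        self.records = []
        self.started_ms = None

    def add(self, record, now_ms):
        """Add a record; return a finished batch (a list) or None."""
        if self.disabled:
            return [record]  # assumption: without batching, each record stands alone
        if not self.records:
            self.started_ms = now_ms  # the interval starts with the first record
        self.records.append(record)
        return self._flush_if_ready(now_ms)

    def tick(self, now_ms):
        """Call periodically so a partial batch is finalized after the interval."""
        return self._flush_if_ready(now_ms)

    def _flush_if_ready(self, now_ms):
        if not self.records:
            return None
        full = len(self.records) >= self.max_records
        expired = now_ms - self.started_ms >= self.interval_ms
        if full or expired:
            batch, self.records = self.records, []
            return batch
        return None


# Small limits so the behavior is easy to follow.
buf = BatchBuffer(max_records=3, interval_ms=1000)
first = buf.add("r1", now_ms=0)
second = buf.add("r2", now_ms=10)
third = buf.add("r3", now_ms=20)    # batch is full here
fourth = buf.add("r4", now_ms=30)
timed_out = buf.tick(now_ms=1030)   # interval elapsed for the partial batch
```

In this reading, a larger max records value or a longer interval means fewer, larger batches, and newly written records wait longer before they are processed.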

Available as updates

The following items do not affect index construction, and can be modified as updates after an index is created. You can update an index using asvec.

  • Labels - Stores information about the index (for example, the model used to create the vector embedding). Do not use for searching data.
  • Max mem queue size - Size of the ingest queue for your index. Use this to override the index configuration value set on the node.
  • Cache max entries - Number of index records held in the cache. Use this to override the cache configuration value set on the node.
  • Cache expiry - Cache expiration time in milliseconds. Use this to override the cache configuration value set on the node.
  • Healer max scan rate per node - Number of records for each scan of the healer. Use this to override the healer configuration value set on the node.
  • Healer max scan page size - Maximum number of records in a single scanned page. Use this to override the healer configuration value set on the node.
  • Healer reindex percent - Percentage of records randomly selected for reindexing in a healer cycle. Use this to override the healer configuration value set on the node.
  • Healer scheduler delay - Delay between healer cycles, which can be increased to limit healer CPU consumption. Default is 0.
  • Healer parallelism - Specifies additional threads used by the healer on a single node. Default is 1.
  • Merge parallelism - Specifies additional threads used when merging your index. Default is 1.
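Several of the items above override a value that is otherwise set on the node. One way to picture this is as a layered lookup, where an index-level value wins when it is set and the node-level value applies otherwise. The sketch below uses hypothetical key names and numbers. It illustrates the idea only and does not show the actual AVS configuration schema.

```python
def effective_settings(node_config, index_overrides):
    """Return node-level values, replaced by any index-level value that is set."""
    merged = dict(node_config)
    for key, value in index_overrides.items():
        if value is not None:  # None means "no override for this index"
            merged[key] = value
    return merged


# Hypothetical key names and numbers, for illustration only.
node_config = {"cache_max_entries": 1000, "cache_expiry_ms": 60000}
index_overrides = {"cache_max_entries": 5000, "cache_expiry_ms": None}

settings = effective_settings(node_config, index_overrides)
print(settings)
```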

Create an index

The following example creates a basic index. For details, see AVS CLI tool or the Python API documentation.

asvec index create \
--index-name search-space \
--namespace test \
--sets index-set \
--dimension 8 \
--distance-metric COSINE \
--vector-field img_vector
note

When creating an index, you must define the Aerospike namespace where the data will be stored. Indexes can grow quite large and take time to build. For more details about namespace and set configurations on an index, see Configure Aerospike Database for AVS.
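If you script index creation, you can check the required fields before calling the CLI. The following sketch assembles the same asvec index create command shown above as an argument list and enforces the 15-character limit on the vector field name. The helper build_create_command is hypothetical. It uses only the flags that appear in the example above, and it builds the command without running it.

```python
import shlex

MAX_VECTOR_FIELD_LEN = 15  # limit stated in the caution under required configuration


def build_create_command(index_name, namespace, dimension, distance_metric,
                         vector_field, sets=None):
    """Return the asvec argument list for creating an index (does not run it)."""
    if len(vector_field) > MAX_VECTOR_FIELD_LEN:
        raise ValueError(
            f"vector field {vector_field!r} is longer than "
            f"{MAX_VECTOR_FIELD_LEN} characters"
        )
    if dimension <= 0:
        raise ValueError("dimension must be a positive integer")

    args = ["asvec", "index", "create",
            "--index-name", index_name,
            "--namespace", namespace]
    if sets is not None:
        args += ["--sets", sets]
    args += ["--dimension", str(dimension),
             "--distance-metric", distance_metric,
             "--vector-field", vector_field]
    return args


command = build_create_command(
    "search-space", "test", 8, "COSINE", "img_vector", sets="index-set"
)
printable = shlex.join(command)  # shell-safe string, handy for logging
print(printable)
```

You can pass the returned list to subprocess.run on a machine where asvec is installed. Keeping it as a list avoids shell-quoting mistakes.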

Add records to your index

To make your records searchable, specify the appropriate set and vector field when upserting records. For example, to add vector records to the example index created in the previous section, specify the vector field as img_vector.

Upsert example:

# Assumes avs_client is an already-connected AVS Python client.
avs_client.upsert(
    namespace="test",
    key="b24f7e3a-9c38-4b0e-b5e8-d6f5b8f0a921",
    record_data={
        # The vector field must match the one defined when the index was created.
        "img_vector": [1, 2, 3, 4, 5, 6, 7, 8],

        # Optional vector metadata.
        "image_path": "b24f7e3a-9c38-4b0e-b5e8-d6f5b8f0a921.jpg",
        "map": {"a": "A", "inlist": [1, 2, 3]},
        "list": ["a", 1, "c", {"a": "A"}],
    },
    # Optional set specification.
    set_name="index-set",
)
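A record is only useful to the index if it carries the configured vector field with the expected number of dimensions. The following sketch is a client-side sanity check you could run before upserting. The helper find_problems is hypothetical and is not part of the AVS client. It assumes that the index from the earlier example uses the vector field img_vector with 8 dimensions.

```python
def find_problems(record_data, vector_field, dimensions):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    vector = record_data.get(vector_field)
    if vector is None:
        problems.append(f"missing vector field {vector_field!r}")
    elif len(vector) != dimensions:
        problems.append(
            f"{vector_field!r} has {len(vector)} values, index expects {dimensions}"
        )
    return problems


good = {"img_vector": [1, 2, 3, 4, 5, 6, 7, 8], "image_path": "a.jpg"}
wrong_name = {"image_vector": [1, 2, 3, 4, 5, 6, 7, 8]}
wrong_size = {"img_vector": [1, 2, 3]}

for record in (good, wrong_name, wrong_size):
    print(find_problems(record, "img_vector", 8))
```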

Dropping an index

If you no longer need to search across your data, you can drop an index to free up storage space. The following example illustrates how to drop an index:

asvec index drop --index-name <index-name> --namespace <namespace>
note

Dropping an index only frees up space for the index records. To delete all of the underlying vector records, you must delete individual records or delete the set that stores your data.