Ingestion and indexing

Overview

This page highlights key concepts related to ingesting and indexing new data in Aerospike Vector Search (AVS). AVS updates vector records in real time, but index records are updated asynchronously. This means that records may be available for retrieval right away, but the same records may not appear immediately in search results.

Supported indexes - HNSW

AVS supports only the Hierarchical Navigable Small World (HNSW) index type. An HNSW index is a multi-layer graph in which nodes represent data points and edges connect each node to its nearest neighbors. Nearest neighbors are calculated using the distance metric chosen for your index, relative to a given vector embedding. The neighborhood is the set of nodes closest to a given node within the graph. The world is the entire set of nodes and edges in the graph, representing the high-dimensional space. HNSW makes search efficient by navigating through hierarchical levels: higher levels contain fewer nodes, so a search can narrow in on the right region of the graph before descending to the denser lower levels.
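To make the idea of a neighborhood concrete, here is a minimal sketch that finds the closest nodes to a given node by brute force. It is not how AVS or HNSW computes neighbors (HNSW avoids comparing against every node); the function names, the sample vectors, and the two metrics shown are illustrative assumptions. It only demonstrates that the neighborhood depends on the distance metric you choose.

```python
import math

def squared_euclidean(a, b):
    """Squared straight-line distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine_distance(a, b):
    """1 - cosine similarity; ignores vector magnitude."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def neighborhood(node_id, vectors, k, distance):
    """Return the ids of the k nodes closest to node_id (brute force)."""
    target = vectors[node_id]
    scored = [
        (distance(target, vec), other_id)
        for other_id, vec in vectors.items()
        if other_id != node_id
    ]
    scored.sort()  # smallest distance first
    return [other_id for _, other_id in scored[:k]]

# Hypothetical sample data: four 2-dimensional vectors.
vectors = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.0, 1.0],
    "d": [5.0, 0.0],
}

print(neighborhood("a", vectors, 2, squared_euclidean))  # ['b', 'c']
print(neighborhood("a", vectors, 2, cosine_distance))    # ['d', 'b']
```

Node "d" points in the same direction as "a" but is far away, so it is the closest neighbor under cosine distance and the farthest under squared Euclidean distance.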

Because each record affects other records in its neighborhood, AVS performs HNSW queries during ingestion to pre-hydrate the index cache. These queries are not reported as query requests, but they do show as reads against the storage layer.

Record data updates

When you insert or update records in AVS, the data passes through several processing steps. First, record data, including the vector, is written to the Aerospike Database (ASDB), where you can see it immediately. To be indexed, each record must contain at least one vector in the vector field specified by an index. Specifying multiple vectors and indexes creates multiple index processes for a single record, but it enables multiple search approaches on the same data.
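The relationship between a record's vector fields and the indexes it feeds can be sketched as follows. This is a toy model, not the AVS client API: the function `indexes_for_record`, the index names, and the field names are all hypothetical and exist only to illustrate that a record is picked up by an index when it carries a vector in that index's vector field.

```python
def indexes_for_record(record, index_vector_fields):
    """Return the names of the indexes that would pick up this record.

    index_vector_fields maps an index name to the record field that
    index reads its vector from (hypothetical structure for illustration).
    """
    matched = []
    for index_name, vector_field in index_vector_fields.items():
        value = record.get(vector_field)
        # A record qualifies only if the field holds a non-empty vector.
        if isinstance(value, list) and len(value) > 0:
            matched.append(index_name)
    return sorted(matched)

# Hypothetical setup: two indexes over two different vector fields.
index_vector_fields = {
    "title_idx": "title_vec",
    "image_idx": "image_vec",
}

text_only = {"title": "Red shoes", "title_vec": [0.1, 0.2]}
text_and_image = {
    "title": "Red shoes",
    "title_vec": [0.1, 0.2],
    "image_vec": [0.7, 0.3, 0.5],
}
no_vectors = {"title": "Red shoes"}

print(indexes_for_record(text_only, index_vector_fields))       # ['title_idx']
print(indexes_for_record(text_and_image, index_vector_fields))  # ['image_idx', 'title_idx']
print(indexes_for_record(no_vectors, index_vector_fields))      # []
```

The record with two vectors feeds two indexes, which means two indexing passes for one record in exchange for two ways of searching it.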

tip

Aerospike recommends that when you upsert a record, you assign it to a specific set. This helps with monitoring and operations.

Index construction

After a record is updated, AVS processes the vector for indexing by assembling the neighborhoods for each vector and committing those to ASDB. An index record contains a copy of the vector itself along with the associated neighbors for that vector at a given layer of the HNSW graph. Index construction takes advantage of Advanced Vector Extensions (AVX), which allow single instruction, multiple data (SIMD) parallel processing.
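The shape of an index record described above can be sketched as a small data structure. The class and field names here are assumptions made for illustration; the actual storage layout AVS uses in ASDB is not described on this page.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class IndexRecord:
    """Hypothetical model of one index record: one node at one HNSW layer."""
    node_id: str
    layer: int
    vector: List[float]                                 # copy of the record's vector
    neighbors: List[str] = field(default_factory=list)  # neighbor ids at this layer

def build_index_records(node_id: str, vector: List[float],
                        neighbors_by_layer: Dict[int, List[str]]) -> List[IndexRecord]:
    """Create one IndexRecord per layer the node appears in, lowest layer first."""
    return [
        IndexRecord(node_id, layer, list(vector), list(neighbors))
        for layer, neighbors in sorted(neighbors_by_layer.items())
    ]

# A node present on layers 0 and 1, with more neighbors on the denser layer 0.
records = build_index_records(
    "n1", [0.1, 0.2], {1: ["n7"], 0: ["n2", "n3", "n4"]}
)
for rec in records:
    print(rec.layer, rec.neighbors)
```

Because each index record carries its own copy of the vector, distances to a node's neighbors can be computed from index records alone, at the cost of storing the vector more than once.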

tip

You can monitor index construction with two metrics: indexing_queue_size tracks your ingest queue, and requests_metric tracks your total indexed records.

Asynchronous parallel index construction

A unique aspect of AVS is its ability to manage index construction asynchronously. While vector record updates are committed directly to ASDB, index records are processed asynchronously. This is done in batches, and index construction is spread across all AVS nodes to maximize the use of CPU cores in your AVS cluster. This allows you to scale up for specific ingestion needs. Keep in mind that ingestion performance is highly dependent on host memory and storage layer configuration.
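The split between synchronous record writes and asynchronous, batched indexing can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions: `ToyVectorStore` and its methods are hypothetical, threads stand in for AVS nodes, and "indexing" is reduced to marking a key as indexed. The method name `indexing_queue_size` merely echoes the metric of the same name.

```python
import queue
import threading

class ToyVectorStore:
    """Toy model: records are visible at once, indexing happens in the background."""

    def __init__(self, workers=2, batch_size=3):
        self.records = {}        # stands in for record data in ASDB
        self.indexed = set()     # keys whose index entries have been built
        self._pending = queue.Queue()
        self._lock = threading.Lock()
        self._batch_size = batch_size
        for _ in range(workers):  # each thread stands in for one indexing node
            threading.Thread(target=self._index_worker, daemon=True).start()

    def upsert(self, key, vector):
        self.records[key] = vector  # record write: available immediately
        self._pending.put(key)      # index update: queued for later

    def indexing_queue_size(self):
        return self._pending.qsize()

    def _index_worker(self):
        while True:
            batch = [self._pending.get()]  # block until work arrives
            while len(batch) < self._batch_size:
                try:
                    batch.append(self._pending.get_nowait())
                except queue.Empty:
                    break
            with self._lock:
                self.indexed.update(batch)  # "build" index entries for the batch
            for _ in batch:
                self._pending.task_done()

    def join(self):
        """Block until every queued key has been indexed."""
        self._pending.join()

store = ToyVectorStore()
for i in range(10):
    store.upsert(f"k{i}", [float(i)])

print(len(store.records))  # all 10 records are retrievable right away
store.join()
print(store.indexed == set(store.records))  # True once the queue has drained
```

Between the last `upsert` and the end of `join`, a record can be read by key while still being absent from the index, which mirrors the gap between retrieval and search visibility described in the overview.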

Waiting for index construction

While index construction can run continuously as a background process, in some circumstances it can be helpful to wait for index construction to complete. This functionality is built into the client, or you can monitor progress using the indexing_queue_size metric.
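If you choose to watch the metric rather than rely on the client, the wait can be sketched as a simple polling loop. The helper below is hypothetical: `get_queue_size` is an assumed callable that returns the current indexing_queue_size value, and how you read that metric from your deployment is not shown here. The `sleep` and `clock` parameters exist only so the loop can be exercised without real delays.

```python
import time

def wait_until_indexed(get_queue_size, timeout_s=30.0, poll_interval_s=0.5,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll until the queue size reaches zero or the timeout expires.

    Returns True if the queue drained, False if the timeout was hit first.
    """
    deadline = clock() + timeout_s
    while True:
        if get_queue_size() == 0:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval_s)

# Simulated metric readings: the queue shrinks on each poll.
readings = iter([3, 1, 0])
drained = wait_until_indexed(lambda: next(readings), sleep=lambda s: None)
print(drained)  # True

# A queue that never drains gives up once the timeout has passed.
stuck = wait_until_indexed(lambda: 5, timeout_s=0.0, sleep=lambda s: None)
print(stuck)  # False
```

A reading of zero only reflects the moment it was taken, so if other writers are still ingesting, the queue can grow again right after the function returns.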