Using Aerospike Vector Search standalone indexes
Optimize your data indexing with Aerospike Vector Search's standalone mode. Learn how to set up, index, and perform vector searches efficiently.
By default, Aerospike Vector Search (AVS) indexes operate in distributed mode, honoring queries as requested and indexing records as they are written. AVS 1.1.0 now includes support for a standalone index mode, allowing users to build their indexes more efficiently in memory before transitioning them into distributed mode. This blog post covers how to leverage this new feature using the AVS Python client.
What is a standalone index?
The standalone index mode in AVS enables rapid index construction by temporarily storing the index in memory on a single node. Once the construction is complete, the index is automatically switched to distributed mode, allowing real-time updates and queries across the cluster.
Key differences between standalone and distributed indexes
The table below compares standalone and distributed indexes in AVS 1.1.0. Standalone mode enables fast bulk indexing in memory but is not searchable until it transitions to distributed mode. Distributed mode supports real-time indexing and queries, handling continuous updates across multiple nodes.
Feature | Standalone mode | Distributed mode |
---|---|---|
Use case | Bulk data ingestion before search is needed | Real-time search and indexing |
Index construction | Built entirely in memory on a single node, taking advantage of the full hardware. | Built in small batches, with real-time updates persisted to Aerospike storage. |
Performance constraints | Much faster initial indexing, but limited to indexes that fit within the RAM of a single node. | Slower due to distributed real-time updates, but able to handle very large index updates. |
Queries | Not possible until processing is finished and the index has transitioned to distributed mode. | Searching is always supported and does not affect indexing performance. |
Failure modes | After a failure, the index build must be completely reprocessed. | Incremental updates are guaranteed, with the index being eventually consistent. |
Monitoring | Index status is either READY or NOT_READY. | Monitor the percentage of unmerged records. |
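Because a standalone index has to fit in the RAM of a single node, it can help to estimate the footprint before choosing this mode. The sketch below is a rough lower bound only: it assumes 4-byte float values and counts just the raw vectors, ignoring the index graph and any server overhead. The function names and the 50% headroom figure are illustrative assumptions, not AVS settings.

```python
def estimate_raw_vector_bytes(num_vectors: int, dimensions: int, bytes_per_value: int = 4) -> int:
    """Lower bound on memory for the raw vectors alone (no graph, no overhead)."""
    if num_vectors < 0 or dimensions <= 0 or bytes_per_value <= 0:
        raise ValueError("counts and sizes must be positive")
    return num_vectors * dimensions * bytes_per_value


def fits_in_ram(num_vectors: int, dimensions: int, node_ram_bytes: int, headroom: float = 0.5) -> bool:
    """True if the raw vectors use at most `headroom` (an assumed fraction) of node RAM."""
    return estimate_raw_vector_bytes(num_vectors, dimensions) <= node_ram_bytes * headroom


# Example: one million 256-dimensional vectors on a node with 4 GiB of RAM.
raw_bytes = estimate_raw_vector_bytes(1_000_000, 256)
print(f"raw vectors: {raw_bytes / 2**20:.1f} MiB")
print("fits with headroom:", fits_in_ram(1_000_000, 256, 4 * 2**30))
```

If the estimate is already close to the node's RAM, distributed mode is likely the safer choice, since the real index needs more memory than the raw vectors.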
Prerequisites
Before using the standalone index, ensure that you have:
A local Python 3.9 environment (or use our basic search Python notebook).
The aerospike-vector-search Python package version 4.1.1 or newer installed:
pip install aerospike-vector-search
A running, accessible AVS cluster on version 1.1.0 or newer, with at least one node that has the standalone-indexer node role configured.
Adding data to be indexed
Unlike distributed indexes, standalone indexes immediately begin to process data that has already been written. Therefore, the records you want to index should be written before the standalone index is created.
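You can insert vector records using the upsert method. The example below relies on the client object and the NAMESPACE, VECTOR_FIELD, and DIMENSIONS constants defined in the next section.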
# Write 1,000 example records, each holding a constant-valued vector
for i in range(1000):
    key = f"vec_{i}"
    vector_data = [float(i) for _ in range(DIMENSIONS)]
    client.upsert(
        namespace=NAMESPACE,
        set_name="vector_set",
        key=key,
        record_data={VECTOR_FIELD: vector_data},
    )
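Since the standalone build processes whatever has already been written, it is worth checking vector lengths before the bulk write rather than discovering a mismatch afterward. The following is a minimal sketch: validate_vectors and upsert_vectors are hypothetical helpers, and the client is only assumed to expose the upsert call shown above. A small stand-in client is used so the snippet runs without a cluster.

```python
from typing import Iterable, List, Sequence, Tuple

Record = Tuple[str, Sequence[float]]


def validate_vectors(records: Iterable[Record], dimensions: int) -> List[Record]:
    """Return the records as a list, raising if any vector has the wrong length."""
    checked = []
    for key, vector in records:
        if len(vector) != dimensions:
            raise ValueError(f"{key}: expected {dimensions} values, got {len(vector)}")
        checked.append((key, vector))
    return checked


def upsert_vectors(client, namespace, set_name, vector_field, records, dimensions) -> int:
    """Validate every record first, then write each one with client.upsert."""
    checked = validate_vectors(records, dimensions)
    for key, vector in checked:
        client.upsert(
            namespace=namespace,
            set_name=set_name,
            key=key,
            record_data={vector_field: list(vector)},
        )
    return len(checked)


class RecordingClient:
    """Stand-in for the AVS client (hypothetical); it only records upsert calls."""

    def __init__(self):
        self.upserts = []

    def upsert(self, **kwargs):
        self.upserts.append(kwargs)


demo_client = RecordingClient()
demo_records = [(f"vec_{i}", [float(i)] * 4) for i in range(3)]
written = upsert_vectors(demo_client, "test", "vector_set", "vecdata", demo_records, 4)
print(f"wrote {written} records")
```

Validation runs over the whole batch before the first write, so a single bad vector stops the load before anything reaches the cluster.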
Creating a standalone index
Now that the data has been written, create a standalone index to quickly index the data in bulk. Using the AVS Python client, create a standalone index by specifying the IndexMode.STANDALONE mode parameter.
from aerospike_vector_search import Client, types
client = Client(seeds=types.HostPort(host="AVS_HOST", port=5000))
# Define index parameters
INDEX_NAME = "stdalone_index"
NAMESPACE = "test"
VECTOR_FIELD = "vecdata"
DIMENSIONS = 256
# Create a standalone index
client.index_create(
namespace=NAMESPACE,
name=INDEX_NAME,
vector_field=VECTOR_FIELD,
dimensions=DIMENSIONS,
mode=types.IndexMode.STANDALONE,
)
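The ordering matters here: records first, index second. One way to make that hard to get wrong is to wrap both steps in a single function. This is only a sketch; bulk_load_then_index is a hypothetical helper, the client is assumed to expose the upsert and index_create calls used in this post, and the stand-in client below exists just so the snippet runs without a cluster. With a real client you would pass types.IndexMode.STANDALONE as standalone_mode.

```python
def bulk_load_then_index(client, records, *, namespace, set_name, index_name,
                         vector_field, dimensions, standalone_mode):
    """Write all records, then create the standalone index over them."""
    written = 0
    for key, vector in records:
        client.upsert(
            namespace=namespace,
            set_name=set_name,
            key=key,
            record_data={vector_field: vector},
        )
        written += 1
    # The index is created only after every record has been written.
    client.index_create(
        namespace=namespace,
        name=index_name,
        vector_field=vector_field,
        dimensions=dimensions,
        mode=standalone_mode,
    )
    return written


class CallLogClient:
    """Stand-in client (hypothetical) that logs the order of calls."""

    def __init__(self):
        self.calls = []
        self.index_kwargs = None

    def upsert(self, **kwargs):
        self.calls.append("upsert")

    def index_create(self, **kwargs):
        self.calls.append("index_create")
        self.index_kwargs = kwargs


log_client = CallLogClient()
sample = [(f"vec_{i}", [float(i)] * 4) for i in range(3)]
count = bulk_load_then_index(
    log_client, sample,
    namespace="test", set_name="vector_set", index_name="stdalone_index",
    vector_field="vecdata", dimensions=4,
    standalone_mode="STANDALONE",  # placeholder; use types.IndexMode.STANDALONE with the real client
)
print(count, log_client.calls)
```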
Checking index status
Once the index is created, check its status to monitor its progress.
index = client.index(namespace=NAMESPACE, name=INDEX_NAME)
status = index.status()
print("Index Status:", status)
# Will print something like...
# Index Status: IndexStatusResponse(unmerged_record_count=0, index_healer_vector_records_indexed=0, index_healer_vertices_valid=0, standalone_metrics=StandaloneIndexMetrics(index_id=IndexId(namespace=, name=), state=StandaloneIndexState.CREATING, inserted_record_count=0), readiness=IndexReadiness.NOT_READY)
The readiness field of the index status will transition from NOT_READY to READY once construction is complete and the index has transitioned to distributed mode. The inserted_record_count value will increase as the index makes progress. Once the index is finished and transitions to distributed mode, the standalone_metrics field should be ignored.
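In practice you will probably want to poll the status until the index is ready rather than checking it once. A minimal polling sketch follows. It assumes only that index.status() returns an object with a readiness attribute, as in the output above, and it compares the readiness by name so it works whether the value is an enum member or a plain string. The wait_until_ready helper, its defaults, and the fake index used for the demo are all illustrative.

```python
import time
from types import SimpleNamespace


def readiness_name(status) -> str:
    """Return 'READY' or 'NOT_READY' for an enum member or a plain string."""
    readiness = status.readiness
    return getattr(readiness, "name", str(readiness))


def wait_until_ready(index, poll_seconds=5.0, timeout_seconds=3600.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll index.status() until readiness is READY, or raise TimeoutError."""
    deadline = clock() + timeout_seconds
    while True:
        status = index.status()
        if readiness_name(status) == "READY":
            return status
        if clock() >= deadline:
            raise TimeoutError("index did not become READY in time")
        sleep(poll_seconds)


class FakeIndex:
    """Stand-in index (hypothetical) that reports a scripted sequence of states."""

    def __init__(self, states):
        self._states = list(states)
        self.polls = 0

    def status(self):
        self.polls += 1
        # Consume states in order; keep repeating the last one.
        state = self._states.pop(0) if len(self._states) > 1 else self._states[0]
        return SimpleNamespace(readiness=state)


fake = FakeIndex(["NOT_READY", "NOT_READY", "READY"])
final_status = wait_until_ready(fake, poll_seconds=0, sleep=lambda seconds: None)
print("ready after", fake.polls, "polls")
```

For a large build, a longer poll interval keeps the status calls from adding noise while the index is under construction.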
Using the asvec CLI tool
You can also interact with standalone indexes using the asvec CLI tool. To monitor the status of a standalone index using asvec, simply run the asvec index ls command.
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Indexes │
├───┬────────────────┬───────────┬────────┬────────────┬───────────────────┬──────────┬────────────────┬──────┬────────────┬────────────┬───────────┤
│ │ NAME │ NAMESPACE │ FIELD │ DIMENSIONS │ DISTANCE METRIC │ UNMERGED │ VECTOR RECORDS │ SIZE │ UNMERGED % │ MODE* │ STATUS │
├───┼────────────────┼───────────┼────────┼────────────┼───────────────────┼──────────┼────────────────┼──────┼────────────┼────────────┼───────────┤
│ 1 │ standalone_idx │ test │ vector │ 3 │ SQUARED_EUCLIDEAN │ 0 │ 0 │ 0 B │ 0% │ STANDALONE │ NOT_READY │
╰───┴────────────────┴───────────┴────────┴────────────┴───────────────────┴──────────┴────────────────┴──────┴────────────┴────────────┴───────────╯
To see more information about the standalone index, run the asvec index ls --verbose command. The output includes an additional standalone index metrics table like the one below.
╭───────────────────────────────────╮
│ Standalone Index Metrics │
├────────────────────────┬──────────┤
│ State │ CREATING │
│ Scanned Vector Records │ 731 │
│ Indexed Vector Records │ 0 │
╰────────────────────────┴──────────╯
Performing a query
Once the index is in distributed mode, you can execute vector searches against it.
query_vector = [0.1] * DIMENSIONS
results = index.vector_search(query=query_vector, limit=5)
for result in results:
    print(f"Result: {result}")
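Because a standalone index cannot be searched while it is building, a small guard around the search call can turn a confusing failure into a clear message. The sketch below assumes the status() and vector_search() calls used in this post; search_if_ready and the fake index are hypothetical and exist only so the example runs without a cluster.

```python
from types import SimpleNamespace


def search_if_ready(index, query, limit=5):
    """Run a vector search only if the index reports READY."""
    status = index.status()
    readiness = getattr(status.readiness, "name", str(status.readiness))
    if readiness != "READY":
        raise RuntimeError(f"index is {readiness}; wait for the standalone build to finish")
    return list(index.vector_search(query=query, limit=limit))


class FakeSearchIndex:
    """Stand-in index (hypothetical) with a fixed readiness and canned hits."""

    def __init__(self, readiness, hits):
        self._readiness = readiness
        self._hits = hits

    def status(self):
        return SimpleNamespace(readiness=self._readiness)

    def vector_search(self, query, limit):
        return self._hits[:limit]


ready_index = FakeSearchIndex("READY", ["vec_0", "vec_1", "vec_2"])
hits = search_if_ready(ready_index, [0.1] * 4, limit=2)
print(hits)
```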
Key takeaways
Standalone indexes are much faster at bulk indexing data than distributed indexes.
Standalone indexes are not searchable while they are building.
Standalone indexes are built on a single node, which must have the standalone-indexer node role configured.
Data to be indexed must be written before the standalone index is created.
When a standalone index finishes building, it automatically becomes a distributed index.
Accelerate bulk indexing with Aerospike Vector Search 1.1.0
With AVS 1.1.0 and Python client 4.1.1, developers can efficiently build standalone indexes in memory before transitioning them to distributed mode for real-time querying. This feature significantly speeds up bulk indexing while maintaining the robust capabilities of Aerospike Vector Search. Try it out today to accelerate your vector search workflows!