Use the Python client

This page describes how to create AI applications with the gRPC API and Python client provided with Aerospike Vector Search (AVS).

Overview

The AVS package includes a client class which is the entrypoint for all AVS operations. The package also provides a types module that contains classes necessary for interacting with the various client APIs.

  • The client performs database operations with vector data, RBAC admin functions, and record reading and writing.
  • The client supports Hierarchical Navigable Small World (HNSW) vector searches, so that users can find vectors similar to a given query vector within an index.
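As a point of reference for what a vector search returns, the following sketch ranks vectors with an exact, brute-force nearest-neighbor scan in plain Python. It is not how AVS works internally: HNSW approximates this ranking without scoring every vector. The function names, the sample data, and the choice of squared Euclidean distance are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def squared_euclidean(a: List[float], b: List[float]) -> float:
    """Return the squared Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_neighbors(
    vectors: Dict[str, List[float]], query: List[float], limit: int
) -> List[Tuple[str, float]]:
    """Score every vector against the query and return the `limit` closest keys."""
    scored = [(key, squared_euclidean(vec, query)) for key, vec in vectors.items()]
    # Sort by distance, then by key, so ties are ordered deterministically.
    scored.sort(key=lambda pair: (pair[1], pair[0]))
    return scored[:limit]


# Hypothetical sample data: ten 2-dimensional vectors keyed "r0" through "r9".
sample = {f"r{i}": [i * 1.0, i * 1.0] for i in range(10)}
print(nearest_neighbors(sample, query=[8.0, 8.0], limit=3))
# → [('r8', 0.0), ('r7', 2.0), ('r9', 2.0)]
```

The exhaustive scan costs one distance computation per stored vector, which is the cost an HNSW index is designed to avoid on large data sets.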

You can use all of the example code on this page with this interactive Jupyter notebook.

Prerequisites

  • Python 3.9 or later
  • pip 9.0.1 or later
  • A running AVS deployment (see Install AVS)

Set up AVS and Python

  1. Install the AVS package.

    pip install aerospike-vector-search
  2. Import the client.

    from aerospike_vector_search import Client, types
  3. Initialize a new client by providing one or more seed hosts for the client to connect to.

    # Admin client configuration
    # AVS_HOST and AVS_PORT identify a seed node. The values below are
    # placeholders; replace them with your deployment's address.
    AVS_HOST = "localhost"
    AVS_PORT = 5000
    # LISTENER_NAME corresponds to the AVS advertised_listener config.
    # https://aerospike.com/docs/vector/operate/configuration#advertised-listener
    # This is often needed when connecting to AVS clusters in the cloud.
    LISTENER_NAME = None
    # LOAD_BALANCED is True if the AVS cluster is load balanced.
    # Using a load balancer with AVS is best practice, and this setting also
    # works with a single-node AVS cluster that is not load balanced.
    LOAD_BALANCED = True

    client = Client(
        seeds=types.HostPort(host=AVS_HOST, port=AVS_PORT),
        listener_name=LISTENER_NAME,
        is_loadbalancer=LOAD_BALANCED,
    )
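The client configuration takes a seed host and port. One possible way to supply them without hardcoding is to read them from environment variables, as in the minimal sketch below. The helper name `load_avs_settings`, the variable names `AVS_HOST` and `AVS_PORT`, and the fallback values are assumptions for this example, not settings defined by AVS.

```python
import os
from typing import Mapping, Optional, Tuple


def load_avs_settings(env: Optional[Mapping[str, str]] = None) -> Tuple[str, int]:
    """Read the seed host and port from environment variables.

    The variable names and fallback values are assumptions for this sketch.
    """
    env = os.environ if env is None else env
    host = env.get("AVS_HOST", "localhost")
    port_text = env.get("AVS_PORT", "5000")
    try:
        port = int(port_text)
    except ValueError:
        raise ValueError(f"AVS_PORT must be an integer, got {port_text!r}") from None
    if not 0 < port < 65536:
        raise ValueError(f"AVS_PORT out of range: {port}")
    return host, port


# Pass an explicit mapping here so the example does not depend on the shell.
host, port = load_avs_settings({"AVS_HOST": "avs.example.internal", "AVS_PORT": "10000"})
print(host, port)
```

Validating the port up front surfaces a misconfigured environment before the client attempts to connect.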

Index your data

To search across a set of vectors, create an index associated with those vectors. AVS uses an index to traverse the HNSW neighborhoods to perform queries. See Manage AVS indexes for details about creating an index.

  1. Add vector entries.

    To take advantage of standalone indexing performance, add your vector data with the upsert method. Specify the following values when writing a record:

    • namespace - Namespace in which the index exists.
    • key - Primary identifier for your record.
    • record data - Map of any data you want to associate with your vector.
    • set_name (optional) - Set in which to place the record.

    # SET_NAME is the Aerospike set to write the records to
    SET_NAME = "basic-set"
    # VECTOR_FIELD is the Aerospike record bin that stores the vector embedding.
    # The created index uses the data in this bin to perform nearest neighbor searches.
    VECTOR_FIELD = "vector"
    # NAMESPACE is the Aerospike namespace where the data is stored
    NAMESPACE = "test"

    print("inserting vectors")
    for i in range(10):
        key = "r" + str(i)
        client.upsert(
            # namespace must match the namespace of the index
            namespace=NAMESPACE,
            set_name=SET_NAME,
            key=key,
            record_data={
                "url": f"http://host.com/data{i}",
                # record_data must include VECTOR_FIELD to be indexed
                VECTOR_FIELD: [i * 1.0, i * 1.0],
                "map": {"a": "A", "inlist": [1, 2, 3]},
                "list": ["a", 1, "c", {"a": "A"}],
            },
        )
  2. Create an index. The following example creates an index in standalone indexing mode and starts the indexing process for the data that was upserted in step 1.

    from aerospike_vector_search import AVSServerError

    # Index creation arguments
    # INDEX_NAME is the name of the HNSW index to create
    INDEX_NAME = "basic_index"
    # DIMENSIONS is the dimensionality of the vectors
    DIMENSIONS = 2

    try:
        print("creating index")
        client.index_create(
            namespace=NAMESPACE,
            name=INDEX_NAME,
            vector_field=VECTOR_FIELD,
            dimensions=DIMENSIONS,
            mode=types.IndexMode.STANDALONE,
        )
    except AVSServerError as e:
        print("failed creating index " + str(e) + ", it may already exist")
  3. Interact with an Index object.

    After creating an index, you can interact with it through an Index object.

    from aerospike_vector_search import Index

    # create an Index object to interact with the index
    index = client.index(namespace=NAMESPACE, name=INDEX_NAME)

    # get the status of the index
    print("index status: ", index.status())
  4. Wait for index construction. You can confirm that the index is ready by checking its status.

    # Wait for the index to finish indexing records
    def wait_for_indexing(index: Index, timeout=30):
        import time

        index_status = index.status()
        timeout = float(timeout)
        while index_status.readiness != types.IndexReadiness.READY:
            time.sleep(0.5)

            timeout -= 0.5
            if timeout <= 0:
                raise Exception("timed out waiting for indexing to complete, "
                                "maybe standalone indexing is not configured on this AVS cluster")

            index_status = index.status()

    wait_for_indexing(index)
    print("indexing complete")
  5. Use the following code to check if a vector has already been indexed.

    status = index.is_indexed(
        key=key,
        set_name=SET_NAME,
    )

    print("indexed: ", status)
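The wait loop in step 4 counts down a fixed 0.5 seconds per iteration, so time spent inside each status call is not charged against the timeout. A variation that tracks a wall-clock deadline could look like the sketch below. The helper `wait_until` and the stand-in readiness check are hypothetical and use only the standard library.

```python
import time
from typing import Callable


def wait_until(is_ready: Callable[[], bool], timeout: float = 30.0, interval: float = 0.5) -> float:
    """Poll is_ready() until it returns True or `timeout` seconds elapse.

    Returns the number of seconds waited; raises TimeoutError on expiry.
    """
    start = time.monotonic()
    deadline = start + timeout
    while not is_ready():
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        # Never sleep past the deadline.
        time.sleep(min(interval, remaining))
    return time.monotonic() - start


# Stand-in for a readiness check: reports ready on the third poll.
polls = {"count": 0}


def fake_ready() -> bool:
    polls["count"] += 1
    return polls["count"] >= 3


waited = wait_until(fake_ready, timeout=2.0, interval=0.01)
print(f"ready after {polls['count']} polls")
```

With the objects from the steps above, the readiness check could be `lambda: index.status().readiness == types.IndexReadiness.READY`.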

Searching

After your vectors are indexed, you can search them by providing a query vector.

  1. Run your machine learning model on user input, and then perform a search using the generated embedding.

    print("querying")
    for i in range(10):
        print(" query " + str(i))
        results = index.vector_search(
            query=[i * 1.0, i * 1.0],
            limit=3,
        )
        for result in results:
            print(str(result.key.key) + " -> " + str(result.fields))
  2. Results are a list of nearest neighbors. Loop through the results to extract the properties your application needs.

    for result in results:
        print(str(result.key.key) + " -> " + str(result.fields))

    Note: To save on network traffic and CPU resources, the vector field is excluded from results by default.

  3. To retrieve the vector data, include it in the include_fields argument.

    print("querying")
    for i in range(10):
        print(" query " + str(i))
        results = index.vector_search(
            query=[i * 1.0, i * 1.0],
            include_fields=[VECTOR_FIELD, "url"],
            limit=3,
        )
        for result in results:
            print(str(result.key.key) + " -> " + str(result.fields))
  4. Read a record from AVS.

    key = "r0"

    result = client.get(
        namespace=NAMESPACE,
        key=key,
        set_name=SET_NAME,
    )

    print(str(result.key.key) + " -> " + str(result.fields))
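Applications often convert search results into plain dictionaries before passing them on. The sketch below shows one way to do that, using stand-in classes so it runs without an AVS cluster. `StubKey`, `StubNeighbor`, and `to_rows` are hypothetical names; the `key.key` and `fields` attributes mirror the examples above, and the `distance` attribute is an assumption about what each result carries.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class StubKey:
    """Stand-in for a record key; only the `key` attribute is modeled."""
    key: str


@dataclass
class StubNeighbor:
    """Stand-in for one search result (the `distance` attribute is assumed)."""
    key: StubKey
    distance: float
    fields: Dict[str, Any] = field(default_factory=dict)


def to_rows(results: List[StubNeighbor], max_distance: float) -> List[Dict[str, Any]]:
    """Flatten results into plain dicts, dropping neighbors beyond max_distance."""
    rows = []
    for result in results:
        if result.distance > max_distance:
            continue
        rows.append({"id": result.key.key, "distance": result.distance, **result.fields})
    return rows


stub_results = [
    StubNeighbor(StubKey("r8"), 0.0, {"url": "http://host.com/data8"}),
    StubNeighbor(StubKey("r7"), 2.0, {"url": "http://host.com/data7"}),
    StubNeighbor(StubKey("r5"), 18.0, {"url": "http://host.com/data5"}),
]
for row in to_rows(stub_results, max_distance=5.0):
    print(row)
```

A distance cutoff like this is useful because a search returns up to `limit` neighbors regardless of how far away they are.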

AVS Python client using Asyncio

The aerospike-vector-search package provides an aio module with asynchronous clients whose methods are coroutines. Asynchronous clients are initialized in the same way as synchronous clients. To convert the synchronous examples on this page, add await in front of each client method call:

from aerospike_vector_search.aio import Client as asyncClient

async_client = asyncClient(
    seeds=types.HostPort(host=AVS_HOST, port=AVS_PORT),
    listener_name=LISTENER_NAME,
    is_loadbalancer=LOAD_BALANCED,
)

# Use await on client methods to await completion of the coroutine.
# Top-level await works in a Jupyter notebook; in a script, call these
# methods from inside an async function.
results = await async_client.vector_search(
    namespace=NAMESPACE,
    index_name=INDEX_NAME,
    query=[8.0, 8.0],
    limit=3,
)

for result in results:
    print(str(result.key.key) + " -> " + str(result.fields))
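A common reason to use coroutines is to run several searches concurrently. The sketch below shows the pattern with asyncio.gather and asyncio.run from the standard library. The coroutine `fake_vector_search` and its in-memory data are stand-ins invented for this example so that it runs without an AVS cluster; in real code the awaited call would be a client method.

```python
import asyncio
from typing import List

# Hypothetical in-memory data standing in for an index.
SAMPLE = {f"r{i}": [i * 1.0, i * 1.0] for i in range(10)}


async def fake_vector_search(query: List[float], limit: int) -> List[str]:
    """Stand-in coroutine for a search call; not part of the AVS client."""
    await asyncio.sleep(0.01)  # simulate network latency

    def dist(key: str) -> float:
        return sum((a - b) ** 2 for a, b in zip(SAMPLE[key], query))

    # Order by distance, then by key, so ties are deterministic.
    return sorted(SAMPLE, key=lambda k: (dist(k), k))[:limit]


async def search_many(queries: List[List[float]], limit: int) -> List[List[str]]:
    """Run all searches concurrently; results come back in query order."""
    return await asyncio.gather(*(fake_vector_search(q, limit) for q in queries))


# In a script, asyncio.run starts the event loop; a notebook can await directly.
all_results = asyncio.run(search_many([[0.0, 0.0], [8.0, 8.0]], limit=2))
print(all_results)
# → [['r0', 'r1'], ['r8', 'r7']]
```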

Close the clients

When you finish using the client and index objects, close the clients to release associated resources.

client.close()
# close() on the asynchronous client is a coroutine, so await it
await async_client.close()
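To make sure close() runs even when an error interrupts your code, one option for synchronous code is contextlib.closing from the standard library, which calls close() on any object when the with block exits. The sketch below uses a hypothetical `StubClient` class so that it runs without an AVS cluster; applying the same pattern to the AVS client is a suggestion, not something this page prescribes.

```python
from contextlib import closing


class StubClient:
    """Stand-in for any object with a close() method; not the AVS client."""

    def __init__(self) -> None:
        self.closed = False

    def search(self) -> str:
        if self.closed:
            raise RuntimeError("client is closed")
        return "ok"

    def close(self) -> None:
        self.closed = True


stub = StubClient()
try:
    with closing(stub) as c:
        c.search()
        raise ValueError("simulated failure mid-request")
except ValueError:
    pass

# close() ran even though the block raised.
print(stub.closed)
# → True
```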

Read the documentation