Estimate AVS scaling needs

Aerospike Vector Search (AVS) can be scaled horizontally to accommodate large indexes and high throughput SLAs. This page provides guidance for scaling your cluster based on your use case needs.

Choosing a specialized node role

By default, AVS nodes perform indexing and querying simultaneously. If you need to isolate and scale a node to meet indexing or query SLAs, you can configure that node to perform only one of the following roles:

Query and Caching - Nodes available for query perform HNSW traversals and cache the index and results in memory.
Online Indexing - Nodes available for indexing handle upsert calls and HNSW index healing. These nodes perform live indexing, which makes the index available for searching during updates.
Offline Indexing - Nodes available for offline indexing use dedicated hardware to build a complete index in memory, which is effective for batch ingestion jobs. An offline index is not available for search while it is being built.

Scaling for query latency

A common scaling scenario is to reduce search times by ensuring your index is fully available in the AVS cache. You can accomplish this by allocating enough memory to hold the entire index on each node, or across nodes. The following table shows the latency you can expect at each caching level.

Caching Level	Cache Hit Ratio	p50	p95	p99
Index cached on every AVS node	100%	2.5ms	5ms	7.5ms
Index cached across AVS nodes	>99%	10ms	25ms	50ms
Index mostly cached	>90%	10ms	150ms	200ms
Index partially cached	<50%	100ms	500ms	1.25s

Cache distribution

Each index has unique latency characteristics depending on its size and the memory capacity of each AVS node. You can adjust the cache expiration period to prevent cooling of the cache and improve performance.

Index cached on all nodes - For smaller indexes, AVS builds the cache automatically on every node and performs queries in memory without steering to another node.
Index cached across nodes - For larger indexes where performance is a concern, distributing your index across the nodes in your cluster ensures that AVS performs queries in memory. Steering queries to other nodes may be required, resulting in an increased likelihood of cache misses.
Partially cached indexes - For larger indexes where spikes are not a concern, the default cache settings distribute and expire the cache across your AVS nodes. This configuration optimizes memory use in AVS for frequent queries, but it does not have a minimum memory requirement.

Example index sizings

The following examples show in-memory deployment patterns for various index sizes, assuming 50% of total RAM is allocated to index caching. AVS requires additional RAM for process and node management.

Description	Caching Level	Records	Dimensions	Index Size	Node Size	Cluster Size
1M Low Dimension	On Node	1,000,000	128	1 GB	32 GB	N/A
1B Medium Dimension	Across Node	1,000,000,000	768	3.5 TB	512 GB	16
1B High Dimension	Across Node	1,000,000,000	3,072	12.4 TB	1,024 GB	32
Billions High Dimension	Partially Cached	1,000,000,000+	768+	35 TB+	N/A	N/A
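
The cluster sizes in the table can be sanity-checked with simple arithmetic. The sketch below is illustrative only and is not an AVS tool: it assumes binary units (1 TB = 1024 GB) and the 50% cache allocation stated above, and `min_nodes_for_full_cache` is a hypothetical helper name. It returns the smallest node count whose combined cache budget holds the whole index.

```python
import math

GB = 1024 ** 3  # assumption: binary units
TB = 1024 ** 4


def min_nodes_for_full_cache(index_bytes, node_ram_bytes, cache_fraction=0.5):
    """Smallest node count whose combined cache budget holds the index.

    cache_fraction is the share of each node's RAM reserved for index
    caching (50% in the sizing examples above).
    """
    cache_per_node = node_ram_bytes * cache_fraction
    return math.ceil(index_bytes / cache_per_node)


# Rows from the sizing table: (index size, node size)
medium = min_nodes_for_full_cache(int(3.5 * TB), 512 * GB)
high = min_nodes_for_full_cache(int(12.4 * TB), 1024 * GB)
print(medium, high)
```

Under these assumptions the minimum works out to 14 and 25 nodes for the two billion-record rows. The table lists 16 and 32, which presumably leaves headroom beyond the bare minimum.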

Scaling for query throughput

AVS is designed to handle high query throughput, but transactions are queued if the host machine does not have enough CPU resources. We recommend scaling up for periods of high query throughput.

Query throughput estimates

Description	Peak QPS	Dimensions	QPS / Core	Cluster Size*
Low Dimension	100,000	128	200	16
Medium Dimension	100,000	768	50	64
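
As a rough illustration of how these estimates translate into cluster size, the sketch below divides peak QPS by the per-core rate and then by the cores per node. The figure of 32 cores per node is an assumption that the source does not state for this table (it is the core count used in the indexing table on this page), and `nodes_for_peak_qps` is a hypothetical helper name.

```python
import math


def nodes_for_peak_qps(peak_qps, qps_per_core, cores_per_node=32):
    """Return (cores, nodes) needed to serve peak_qps.

    cores_per_node=32 is an assumption, not a documented AVS requirement.
    """
    cores = math.ceil(peak_qps / qps_per_core)
    nodes = math.ceil(cores / cores_per_node)
    return cores, nodes


low_dim = nodes_for_peak_qps(100_000, 200)    # 128 dimensions
medium_dim = nodes_for_peak_qps(100_000, 50)  # 768 dimensions
print(low_dim, medium_dim)
```

With these inputs the low-dimension case needs 500 cores (16 nodes) and the medium-dimension case needs 2,000 cores (63 nodes), close to the table's 16 and 64.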

Scaling for indexing

You can plan for indexing throughput by estimating your needs and assigning specific nodes in your cluster as standalone or distributed indexer nodes. By default, nodes perform distributed indexing, which is best for small updates. For large batch updates, consider adding a standalone indexer node.

Ingest Throughput Scaling	Dimensions	CPU Cores	Distributed Indexer per Hour	Standalone Indexer per Hour
Low Dimension	128	32	96M	1.152B
Medium Dimension	768	32	10M	120M
High Dimension	3,072	32	1.67M	20M
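
To turn these rates into a time estimate for a batch load, divide the number of records by the hourly rate. The sketch below assumes the table's figures are records indexed per hour on a 32-core node. The `RATES_PER_HOUR` dictionary and `hours_to_index` helper are hypothetical names used only for illustration.

```python
# Hourly rates copied from the table above, keyed by vector dimensions.
# Assumption: each figure is records indexed per hour on a 32-core node.
RATES_PER_HOUR = {
    128: {"distributed": 96_000_000, "standalone": 1_152_000_000},
    768: {"distributed": 10_000_000, "standalone": 120_000_000},
    3072: {"distributed": 1_670_000, "standalone": 20_000_000},
}


def hours_to_index(records, dimensions, mode):
    """Estimated hours to index `records` vectors at the tabulated rate."""
    return records / RATES_PER_HOUR[dimensions][mode]


for mode in ("distributed", "standalone"):
    hours = hours_to_index(1_000_000_000, 128, mode)
    print(f"{mode}: {hours:.2f} hours")
```

For one billion 128-dimension records, this gives roughly 10.4 hours with the distributed rate and under an hour with the standalone rate. The standalone rate is about 12 times the distributed rate in every row of the table.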