
Clustering and Horizontal Scaling


AVS scales horizontally to meet higher query throughput, heavier ingestion loads, and stricter performance SLAs. This section describes the scaling and tuning considerations specific to each of these three goals.


AVS nodes use heartbeats to discover and track one another. This makes AVS cluster nodes more resilient and simplifies scaling and upgrades: whenever nodes are added or removed, every node detects the change to the cluster through the heartbeat mechanism.
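The heartbeat idea can be illustrated with a minimal, generic failure detector. This is a sketch under stated assumptions, not the AVS implementation: the class name `HeartbeatTracker`, the `timeout_s` parameter, and the rule that a node counts as failed after `timeout_s` seconds without a heartbeat are all hypothetical. Timestamps are passed in explicitly so the logic is easy to test.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatTracker:
    """Hypothetical heartbeat-based membership tracker (not the AVS implementation)."""

    # Assumed rule: a node is treated as failed after this many seconds of silence.
    timeout_s: float
    # Maps node id -> timestamp of the most recent heartbeat received from it.
    last_seen: dict[str, float] = field(default_factory=dict)

    def record_heartbeat(self, node_id: str, now: float) -> bool:
        """Record a heartbeat; return True if the node was not known before (discovery)."""
        is_new = node_id not in self.last_seen
        self.last_seen[node_id] = now
        return is_new

    def expire(self, now: float) -> list[str]:
        """Remove and return nodes whose last heartbeat is older than the timeout."""
        failed = [n for n, t in self.last_seen.items() if now - t > self.timeout_s]
        for n in failed:
            del self.last_seen[n]
        return failed

    def members(self) -> list[str]:
        """Current live members, sorted for stable output."""
        return sorted(self.last_seen)
```

For example, with `timeout_s=3.0`, a node last heard from at t=0 is expired by a check at t=4, while a node heard from at t=2.5 remains a member.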

Node failure and cluster growth

When new nodes join the cluster, or when a cluster node fails, AVS must re-approximate the cache distances. It does this by load-balancing queries to the new cluster nodes, which warms the caches on those nodes.

When a node fails, AVS reorganizes the cache among the remaining nodes based on recomputed centroids. As a result, whenever you scale up or down, the cluster experiences a series of cache misses until the cache has been rebalanced.
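To see why scaling causes a burst of cache misses, consider a toy model in which each node owns a centroid and every cached index region lives on the node whose centroid is nearest. This is a hypothetical illustration, not how AVS actually partitions its cache: the function names are invented, and for brevity the model simply drops a failed node's centroid instead of recomputing all centroids. Any region whose owner changes starts cold on its new node, and those regions are the source of the cache misses.

```python
import math

Vector = tuple[float, ...]


def nearest(point: Vector, centroids: dict[str, Vector]) -> str:
    """Return the id of the node whose centroid is closest (Euclidean) to point."""
    return min(centroids, key=lambda node: math.dist(point, centroids[node]))


def assign(regions: dict[str, Vector], centroids: dict[str, Vector]) -> dict[str, str]:
    """Map each cached index region (by id) to the node with the nearest centroid."""
    return {rid: nearest(vec, centroids) for rid, vec in regions.items()}


def moved_regions(before: dict[str, str], after: dict[str, str]) -> set[str]:
    """Regions whose owner changed; in this model they start cold (cache misses)."""
    return {rid for rid, node in after.items() if before.get(rid) != node}
```

With node centroids at (0, 0), (10, 0), and (0, 10), removing the third node moves only the region near (0, 10) to the first node; the other regions keep their owners and stay warm.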

Scaling for index throughput

When sizing your AVS cluster, consider the throughput your searches require. Sizing depends on the required throughput, the number of vector dimensions, the data type of each dimension value, and other factors.

These factors help you determine how many CPU cores your index needs. The following table lists approximate performance figures.

| Dimensions | QPS requirement | QPS per core | Node recommendation |
|---|---|---|---|
| 128 | 50k | 200 | 8 x 32-core instances |
| 768 | 50k | 50 | 16 x 64-core instances |

Actual numbers depend on the CPU chipset and other factors, so we recommend benchmarking your own instance type.
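The node recommendations in the table follow from simple arithmetic: divide the required QPS by the QPS per core to get a core count, then divide by the cores per instance and round up. The sketch below assumes throughput scales linearly with core count, which real hardware only approximates; `size_cluster` is a hypothetical helper, not part of AVS.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division without floating-point rounding."""
    return -(-a // b)


def size_cluster(target_qps: int, qps_per_core: int, cores_per_node: int) -> tuple[int, int]:
    """Return (cores_needed, nodes_needed) to sustain target_qps.

    Assumes query throughput scales linearly with core count (an approximation).
    """
    cores_needed = ceil_div(target_qps, qps_per_core)
    nodes_needed = ceil_div(cores_needed, cores_per_node)
    return cores_needed, nodes_needed


# Reproduce the two rows of the sizing table.
print(size_cluster(50_000, 200, 32))  # (250, 8)
print(size_cluster(50_000, 50, 64))   # (1000, 16)
```

Rounding up leaves some headroom: 8 x 32 = 256 cores covers the 250 required, and 16 x 64 = 1,024 cores covers the 1,000 required.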

Scaling for ingestion (index construction)

Because AVS builds HNSW indexes in parallel, you can scale out to absorb ingest load from batch jobs or spikes in streaming data. Monitor the index healing queue to determine whether you need to scale for ingestion.
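One way to act on the healing-queue guidance is to sample the queue depth at a fixed interval and scale out when the backlog trends upward. The sketch below assumes you already have those samples (reading them from AVS is not shown); the function names and the `max_slope` threshold are hypothetical. It fits a least-squares line to the samples rather than comparing only the first and last values, so a single noisy reading does not trigger scaling on its own.

```python
from statistics import fmean


def backlog_slope(depths: list[float]) -> float:
    """Least-squares slope of queue depth per sampling interval (0.0 if < 2 samples)."""
    n = len(depths)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2  # mean of sample indices 0..n-1
    y_mean = fmean(depths)
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(depths))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


def should_scale_for_ingest(depths: list[float], max_slope: float = 0.0) -> bool:
    """Hypothetical rule: scale out if the backlog grows faster than max_slope per interval."""
    return backlog_slope(depths) > max_slope
```

For instance, depths of 100, 200, 300, and 400 give a slope of 100 items per interval, which signals a growing backlog; a flat or shrinking series does not.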

Scaling for performance

To minimize latency, tune your cluster so that index reads are served from the in-memory cache as often as possible.

To do this, monitor the cache hit ratio. Maximize the memory on each node, and add nodes to your cluster until the cache hit ratio approaches 100%.
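A minimal sketch of the hit-ratio check, assuming you can collect cache hit and miss counts per node over the same interval. The function names and the 0.99 target are hypothetical, not AVS settings. The sketch sums raw counts across nodes instead of averaging per-node ratios, so nodes that serve more lookups carry proportionally more weight.

```python
def hit_ratio(hits: int, misses: int) -> float:
    """Fraction of lookups served from cache; 0.0 when there were no lookups yet."""
    total = hits + misses
    return hits / total if total else 0.0


def cluster_hit_ratio(per_node: dict[str, tuple[int, int]]) -> float:
    """Aggregate (hits, misses) counts across all nodes, then compute one ratio."""
    hits = sum(h for h, _ in per_node.values())
    misses = sum(m for _, m in per_node.values())
    return hit_ratio(hits, misses)


def needs_more_nodes(per_node: dict[str, tuple[int, int]], target: float = 0.99) -> bool:
    """Hypothetical rule: add nodes while the cluster-wide hit ratio is below target."""
    return cluster_hit_ratio(per_node) < target
```

For example, two nodes reporting (990 hits, 10 misses) and (900 hits, 100 misses) give a cluster-wide ratio of 1,890 / 2,000 = 0.945, below the assumed 0.99 target, which suggests adding memory or nodes.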