To improve search performance, the neighborhoods for a search are cached on the search nodes. This means that similar searches that target the same neighborhoods can be served from the cache rather than making a round trip to the storage layer. This caching approach, based on locality, uses the geometric distance between neighborhoods and is referred to as "geometric caching." In addition to geometric caching, AVS directs each search to the node most likely to have the relevant part of the index cached. This prevents additional network hops on queries and further improves performance.

Geometric allocation

The HNSW index is cached on the AVS nodes based on the location of each neighborhood layer. This is accomplished using the Voronoi technique, which allocates metric space based on the distance between the vectors in the neighborhood layer. This approach allows for a cache distribution that can scale effectively both horizontally and vertically.
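
As an illustration of the Voronoi idea only, the following minimal sketch assigns each neighborhood to the node whose reference point is nearest. It is not the AVS implementation: the names `nearest_site`, `allocate`, `node_sites`, and `neighborhood_centers` are hypothetical, the vectors are two-dimensional for readability, and Euclidean distance is an assumption.

```python
import math

def nearest_site(point, sites):
    """Return the index of the site closest to point (Euclidean distance)."""
    return min(range(len(sites)), key=lambda i: math.dist(point, sites[i]))

def allocate(neighborhood_centers, node_sites):
    """Map each node index to the neighborhood ids inside its Voronoi cell."""
    cells = {i: [] for i in range(len(node_sites))}
    for neighborhood_id, center in neighborhood_centers.items():
        cells[nearest_site(center, node_sites)].append(neighborhood_id)
    return cells

# Hypothetical reference points, one per node.
node_sites = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.0)]

# Hypothetical neighborhood centers keyed by neighborhood id.
neighborhood_centers = {
    "a": (1.0, 1.0),
    "b": (9.0, 1.0),
    "c": (5.0, 7.0),
    "d": (4.0, 0.5),
}

cells = allocate(neighborhood_centers, node_sites)
print(cells)
```

In this sketch, adding a node adds a reference point, and only the neighborhoods that are now closer to the new point change owner, which is one way to picture why a distance-based partition can scale horizontally.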


Query steering

To take advantage of the geometric distribution of the HNSW caching layers, a load balancing algorithm determines the optimal “location” to perform the query. This improves the cache hit ratio and is done using one of two approaches:

  1. Each node calculates a map of the cache and determines which area of the cache is closest to the query vector. AVS combines this map with recent query performance on the node to forward the query to the predicted fastest node. This approach works with deployments of any size and with traditional load balancers.

  2. Clients can be enabled to steer queries directly to the appropriate node. This approach requires specific client versions and is best done within the same network without a load balancer.
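
The first approach can be pictured with a small sketch that scores each node by how close its cached regions are to the query vector, plus a penalty for recent latency. Everything here is an assumption made for illustration: `pick_node`, `cache_map`, `recent_latency_ms`, and the `latency_weight` value are hypothetical, and the source does not describe how AVS actually weighs distance against recent performance.

```python
import math

def pick_node(query, cache_map, recent_latency_ms, latency_weight=0.1):
    """Return the node id with the lowest combined score.

    The score is the distance from the query to the node's nearest cached
    region, plus a weighted penalty for the node's recent latency.
    The linear combination is an assumption for this sketch.
    """
    def score(node_id):
        nearest = min(math.dist(query, center) for center in cache_map[node_id])
        return nearest + latency_weight * recent_latency_ms[node_id]

    return min(cache_map, key=score)

# Hypothetical cache map: node id -> centers of cached regions.
cache_map = {
    "node-1": [(0.0, 0.0), (1.0, 1.0)],
    "node-2": [(10.0, 0.0)],
}

# With equal recent latency, the geometrically closest cache wins.
recent_latency_ms = {"node-1": 5.0, "node-2": 5.0}
target = pick_node((9.0, 0.5), cache_map, recent_latency_ms)

# If the closest node has been slow recently, the query goes elsewhere.
busy_latency_ms = {"node-1": 5.0, "node-2": 100.0}
fallback = pick_node((9.0, 0.5), cache_map, busy_latency_ms)
```

Combining both signals in one score keeps the routing decision cheap: a node that holds the right cache but is overloaded can lose to a slightly more distant node that is responding quickly.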

All in-memory caching

For optimal performance, scale your cache size so that the cache hit ratio approaches 100%. Whether this is practical depends on the nature of your dataset and data pipeline; it is best suited to smaller datasets that are relatively static.

For more details on cache configuration options, see Configuring AVS.