AVS search caching

To improve search performance, the neighborhoods for a search are cached on the search nodes. This means that similar searches (searches that target the same neighborhoods) can be served from the cache rather than making a round trip to the storage layer. This caching approach, based on locality, uses the geometric distance between neighborhoods and is referred to as "geometric caching."

In addition to geometric caching, AVS directs searches to the node most likely to have the index cached. This prevents additional network hops on queries and further improves performance.

Geometric allocation

The HNSW index is cached on the AVS nodes based on the location of each neighborhood layer. This is accomplished using the Voronoi technique, which allocates metric space based on the distance between the vectors in the neighborhood layer. This approach allows for a cache distribution that can scale effectively both horizontally and vertically.
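The idea of allocating metric space by distance can be illustrated with a minimal sketch. It is not AVS's implementation: the site coordinates, neighborhood ids, and the `allocate` helper are hypothetical, and Euclidean distance in two dimensions is assumed purely for readability. Each neighborhood goes to the Voronoi cell of its nearest site.

```python
import math

def nearest_site(point, sites):
    """Return the index of the site closest to point (Euclidean distance)."""
    return min(range(len(sites)), key=lambda i: math.dist(point, sites[i]))

def allocate(neighborhoods, sites):
    """Group neighborhood ids by the Voronoi cell (site index) they fall into."""
    cells = {i: [] for i in range(len(sites))}
    for nid, center in neighborhoods.items():
        cells[nearest_site(center, sites)].append(nid)
    return cells

# Hypothetical example: three sites, one per caching node (an assumption).
sites = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
neighborhoods = {"a": (1.0, 1.0), "b": (9.0, 1.0), "c": (1.0, 8.0), "d": (2.0, 0.5)}

cells = allocate(neighborhoods, sites)
print(cells)
```

In this sketch, adding a site splits an existing cell, which is one way to picture how a distance-based allocation can be redistributed as nodes are added.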

Query steering

To take advantage of the geometric distribution of the HNSW caching layers, each AVS node is aware of the best location in the index to perform the query. AVS nodes route the query to the appropriate node when performing an HNSW traversal.
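As a rough illustration of the routing decision, the sketch below picks the node whose region is closest to the query vector and reports whether the receiving node would have to forward the query. The node names, the `NODE_SITES` mapping, and the `steer` function are assumptions made for this example and do not reflect AVS internals.

```python
import math

# Hypothetical node names mapped to the site representing their cached region.
NODE_SITES = {
    "node-1": (0.0, 0.0),
    "node-2": (10.0, 0.0),
    "node-3": (0.0, 10.0),
}

def steer(query_vector, receiving_node, node_sites=NODE_SITES):
    """Return (target_node, proxied) for a query that arrived at receiving_node.

    target_node is the node whose site is nearest to the query vector;
    proxied is True when the query must be forwarded to another node.
    """
    target = min(node_sites, key=lambda n: math.dist(query_vector, node_sites[n]))
    return target, target != receiving_node

local = steer((1.0, 2.0), "node-1")      # nearest site belongs to node-1
forwarded = steer((9.0, 1.0), "node-1")  # nearest site belongs to node-2
print(local, forwarded)
```

The `proxied` flag in this sketch corresponds loosely to a query that leaves the receiving node for the node more likely to hold the relevant cache.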

Note: Query steering occurs whether or not you use a load balancer. For details about using a load balancer, see our system diagram.

Query steering allows AVS to scale horizontally for performance and throughput without having to add more nodes.

Tip: To monitor query steering, filter for proxy-out requests on an AVS node.

All in-memory caching

For optimal performance, size your cache so that the cache hit rate approaches 100%. Whether this is practical depends on the nature of your dataset and data pipeline; it works best for smaller datasets that are relatively static.
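To see why cache size drives the hit rate, the sketch below replays an access trace through a small cache and reports the fraction of hits. The least-recently-used eviction policy, the trace, and the `hit_rate` helper are all assumptions chosen for illustration; the source does not state which eviction policy AVS uses.

```python
from collections import OrderedDict

def hit_rate(accesses, capacity):
    """Replay accesses through an LRU cache of the given capacity.

    Returns the fraction of accesses that were served from the cache.
    """
    cache = OrderedDict()
    hits = 0
    for key in accesses:
        if key in cache:
            hits += 1
            cache.move_to_end(key)         # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used entry
    return hits / len(accesses)

# Hypothetical trace: four neighborhoods accessed in a cycle, five rounds.
trace = ["n1", "n2", "n3", "n4"] * 5

small = hit_rate(trace, capacity=2)  # cache smaller than the working set
full = hit_rate(trace, capacity=4)   # cache holds the whole working set
print(small, full)
```

With the whole working set in memory, only the initial cold reads miss, so the hit rate climbs toward 100% as the trace grows. With an undersized cache and this cyclic trace, every access misses.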

For details about cache configuration options, see Configuring AVS.