
Maximize your infrastructure and speed up HNSW index construction with new standalone indexing

Discover how Aerospike's new standalone indexing feature speeds up HNSW index construction, ideal for batch updates and testing new settings.

March 20, 2025 | 4 min read
Adam Hevenor
Director of Product Management, AI

When Aerospike Vector Search (AVS) launched in 2024, it could scale HNSW (Hierarchical Navigable Small World) index construction and maintenance out across multiple compute nodes. Because of this, AVS can handle indexes that exceed the memory of a single node, whereas most vector search systems must keep a copy of the full index on every node. That design suits large indexes that are constantly being updated. However, customers told us they wanted a faster option for smaller indexes built from scratch.

To meet that need, we are launching a new standalone indexing option with Aerospike Vector Search 1.1, which builds an index on a single node in the cluster. This “standalone” node can use all of its own memory and CPU without affecting the rest of your cluster, which makes it well suited to batch updates or full index rebuilds when you are testing new index settings or embedding models. Depending on your ingestion needs, operators can scale the designated standalone node up or down. Once the build completes, AVS automatically makes your index available for search and streaming updates.

The flexibility to build in either standalone or distributed indexing modes gives you the best of both worlds. You can jump-start your index construction on a single node without impacting the performance of your cluster and then stream incremental updates while your index remains available for search. 

Standalone indexing: When should you use it?

Standalone indexing is ideal when you apply many updates in batches and want to build the index in bulk. As long as the index fits in the memory of a single node, standalone mode processes those updates much more quickly, which makes it faster to test changes to your embedding model and to the search application as a whole. Learn more about how to take advantage of standalone indexing in our developer blog.

How do you know if your index will fit in memory?

Index size can be estimated with a simple formula. With the default index settings, 1 million vectors of 768 dimensions require around 3.5GB of index storage; best practice is to add about 10GB of overhead for the operating system and AVS itself. That total fits comfortably in the memory of a typical server, so you can build this index on a single node with standalone indexing. Many indexes fit in memory this way, but the largest indexes, which exceed the memory of a single node, still require distributed indexing.
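Since the post does not spell out the formula, here is a rough sketch under stated assumptions. It assumes vectors are stored as 32-bit floats (768 × 4 bytes = 3,072 bytes per vector, about 3.07GB for 1 million vectors) and adds a per-vector allowance for HNSW graph edges and metadata. The 428-byte allowance is back-solved from the 3.5GB figure above; it is not a documented AVS constant, and the real value depends on your HNSW settings. The helper names are hypothetical.

```python
GB = 10**9              # decimal gigabytes, matching the rough figures in this post
BYTES_PER_FLOAT32 = 4   # assumption: vectors are stored as 32-bit floats

def estimate_index_bytes(num_vectors: int, dimensions: int, extra_bytes_per_vector: int) -> int:
    """Raw vector storage plus an assumed per-vector allowance for graph edges and metadata."""
    return num_vectors * (dimensions * BYTES_PER_FLOAT32 + extra_bytes_per_vector)

def fits_on_one_node(index_bytes: int, node_memory_bytes: int, reserved_bytes: int = 10 * GB) -> bool:
    """True if the index plus ~10GB of OS/AVS headroom fits in a single node's memory."""
    return index_bytes + reserved_bytes <= node_memory_bytes

# 1M vectors x 768 dimensions; 428 bytes/vector is back-solved from ~3.5GB, not documented.
index_bytes = estimate_index_bytes(1_000_000, 768, extra_bytes_per_vector=428)
print(index_bytes / GB)                        # 3.5
print(fits_on_one_node(index_bytes, 64 * GB))  # True for a hypothetical 64GB node
```

Treat the result as a first-pass check; measure actual memory use on a test build before sizing production nodes.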

How much faster?

The following table compares initial index build times for distributed and standalone indexing, using an equivalent number of cores, on two dataset sizes.

Dataset     Node size   Time to index (distributed)   Time to index (standalone)
SIFT 10M    32 cores    11.5 hours                    27 minutes
SIFT 1M     32 cores    30 minutes                    2 minutes

Benchmarks were run on 3rd Generation Intel Xeon Scalable processors (Ice Lake).
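To answer "how much faster" as a ratio, the published build times can be converted to minutes and divided. This is plain arithmetic on the figures in the table; it does not rerun or reproduce the benchmark, and the `speedup` helper is just an illustrative name.

```python
# Build times from the benchmark table, converted to minutes.
benchmarks = {
    "SIFT 10M": {"distributed_min": 11.5 * 60, "standalone_min": 27},
    "SIFT 1M": {"distributed_min": 30, "standalone_min": 2},
}

def speedup(distributed_min: float, standalone_min: float) -> float:
    """How many times faster the standalone build was than the distributed build."""
    return distributed_min / standalone_min

for name, t in benchmarks.items():
    print(f"{name}: {speedup(t['distributed_min'], t['standalone_min']):.1f}x faster")
```

Running it prints roughly a 25.6x speedup for SIFT 10M and a 15.0x speedup for SIFT 1M.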

Choosing the right indexing strategy for your needs

When building out your data pipeline and search application, consider your embedding model, index settings, and recall tradeoffs. For some use cases, you'll want to fully rebuild your index periodically, while others will demand continuous updates at scale. 

A fraud detection system, for example, might use standalone indexing to quickly rebuild fraud pattern models offline while relying on distributed indexing to continuously update live transaction data and catch emerging threats in real time. Similarly, an e-commerce platform could process bulk product catalog updates in standalone mode while maintaining distributed indexing for real-time search and personalization.

With the flexibility to mix and match, teams can iterate faster, scale efficiently, and keep their search and retrieval systems optimized. Rebuilding your HNSW indexes no longer requires advance planning; choosing between batch updates and streaming is simply a design decision based on your use case.
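The guidance in this post can be condensed into a rough rule of thumb. The sketch is hypothetical: `choose_indexing_mode`, the `Job` categories, and the returned labels are illustrative names, not part of the AVS API or its configuration. It simply encodes the criteria described above: bulk work that fits in one node's memory goes to standalone mode, while oversized indexes and continuous updates use distributed indexing.

```python
from enum import Enum

class Job(Enum):
    FULL_BUILD = "full build or rebuild"       # e.g. a new embedding model or index settings
    BATCH_UPDATE = "large batch of updates"    # e.g. a bulk product catalog refresh
    STREAMING = "continuous incremental updates"

def choose_indexing_mode(job: Job, fits_on_one_node: bool) -> str:
    """Hypothetical rule of thumb distilled from this post; not an AVS API."""
    if not fits_on_one_node:
        # Indexes larger than a single node's memory need distributed indexing.
        return "distributed"
    if job in (Job.FULL_BUILD, Job.BATCH_UPDATE):
        # Bulk work that fits in memory builds fastest on one standalone node.
        return "standalone"
    # Ongoing updates are streamed into the live, searchable index.
    return "distributed"
```

For the fraud detection example, an offline rebuild (`Job.FULL_BUILD`) that fits in memory maps to standalone mode, while live transaction updates (`Job.STREAMING`) map to distributed indexing.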
