
Vector Data in Aerospike

Aerospike Vector Search (AVS) uses a simple data model for working with data inside Aerospike, which makes it easy to add AVS to an existing Aerospike deployment. A record in AVS is an ordinary Aerospike record that carries one or more additional bins (analogous to additional columns). Those additional bins are the only data change AVS requires before it can start building an index.
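
To make the data model concrete, the following Python sketch treats a record as a plain dictionary of bin names to values and makes it indexable by adding a single bin that holds a list of floats. It is only a model of the idea, not the Aerospike or AVS client API, and the bin names (`sku`, `title`, `price`, `title_embedding`) are hypothetical.

```python
from typing import Any, Dict, List

# Model only: a record is a mapping of bin name -> value.
Record = Dict[str, Any]


def add_vector_bin(record: Record, bin_name: str, vector: List[float]) -> Record:
    """Return a copy of `record` with one extra vector bin.

    Existing bins are left untouched; the added bin is the only change
    this model needs before a record could be picked up by an index.
    """
    if bin_name in record:
        raise ValueError(f"bin {bin_name!r} already exists")
    if not vector or not all(isinstance(x, (int, float)) for x in vector):
        raise ValueError("vector must be a non-empty list of numbers")
    updated = dict(record)  # shallow copy so the original record is unchanged
    updated[bin_name] = [float(x) for x in vector]
    return updated


# Hypothetical existing record from an application dataset.
product = {"sku": "A-1001", "title": "Trail running shoe", "price": 129.0}
product_with_vector = add_vector_bin(product, "title_embedding", [0.12, -0.48, 0.33])
print(sorted(product_with_vector))
```

The existing bins pass through unchanged, which is the property that lets AVS sit alongside an existing deployment.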

Index partitioning in Aerospike

Because the index is stored in Aerospike, it is partitioned automatically by the same mechanism Aerospike uses to distribute data across the nodes of a cluster. As a result, a dataset can grow as large as your storage layer allows, and the entire index does not need to fit in memory.
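
As an illustration of hash-based partitioning, here is a Python sketch that maps record keys to one of 4096 partitions and assigns partitions to nodes. It is a simplified model, not Aerospike's implementation: Aerospike derives a RIPEMD-160 digest from the set name and user key, whereas this sketch substitutes SHA-256 from `hashlib`, and the round-robin `owner_node` function is a hypothetical stand-in for the cluster's real partition map. Because HNSW index entries are stored as records, the same kind of mapping spreads them across nodes as well.

```python
import hashlib
from collections import Counter
from typing import List

N_PARTITIONS = 4096  # Aerospike splits each namespace into 4096 partitions


def partition_id(set_name: str, user_key: str) -> int:
    # Stand-in digest: Aerospike uses RIPEMD-160; SHA-256 is used here
    # only because it is always available in hashlib.
    digest = hashlib.sha256(set_name.encode() + b"\x00" + user_key.encode()).digest()
    # Keep 12 bits from the first two digest bytes (4096 == 2**12).
    return int.from_bytes(digest[:2], "little") & (N_PARTITIONS - 1)


def owner_node(pid: int, nodes: List[str]) -> str:
    # Hypothetical round-robin ownership; a real cluster computes its own
    # partition map and rebalances when nodes join or leave.
    return nodes[pid % len(nodes)]


nodes = ["node-a", "node-b", "node-c"]
load = Counter(
    owner_node(partition_id("vectors", f"key-{i}"), nodes) for i in range(30_000)
)
print(load)  # roughly one third of the keys land on each node
```

Since no single node owns every partition, both the data and the index grow by adding nodes rather than by enlarging one machine's memory.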


Namespace and set considerations

Before deciding on a data model in Aerospike, consider the following constraints.

  • Namespaces - Aerospike supports up to 32 namespaces, and adding a namespace requires updating the configuration on every Aerospike node. Use namespaces to define the storage model for your datasets, and treat each namespace as an abstraction over one storage type.

  • Sets - Aerospike supports up to 4096 sets, which can be distributed across namespaces in any way. Creating a set does not require updating any Aerospike node, which makes sets well suited to separating datasets. Additionally, an index can include multiple sets, so you can combine their data for administrative tasks or other scenarios where filtering is useful.
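
One way to apply these constraints is to check a planned layout against them before configuring anything. The Python sketch below takes a hypothetical plan (namespace name mapped to its set names, with one namespace per storage type and one set per dataset) and reports any violations of the limits stated above. The namespace, set, and `index_scope` names are illustrative assumptions, not AVS configuration.

```python
from typing import Dict, List

MAX_NAMESPACES = 32  # limit stated in this section
MAX_SETS = 4096      # total across all namespaces, as stated in this section


def check_plan(plan: Dict[str, List[str]]) -> List[str]:
    """Return human-readable limit violations for a namespace -> sets plan."""
    issues = []
    if len(plan) > MAX_NAMESPACES:
        issues.append(f"{len(plan)} namespaces exceeds the limit of {MAX_NAMESPACES}")
    total_sets = sum(len(sets) for sets in plan.values())
    if total_sets > MAX_SETS:
        issues.append(f"{total_sets} sets exceeds the limit of {MAX_SETS}")
    return issues


# Hypothetical plan: namespaces model storage types, sets separate datasets.
plan = {
    "vectors_ssd": [f"tenant_{i}" for i in range(500)],
    "vectors_mem": ["hot_products", "hot_queries"],
}
print(check_plan(plan))

# Hypothetical index scope that combines several sets from one namespace.
index_scope = {"namespace": "vectors_ssd", "sets": ["tenant_1", "tenant_2"]}
assert all(s in plan[index_scope["namespace"]] for s in index_scope["sets"])
```

Keeping the namespace count small and pushing per-dataset separation into sets avoids node-wide configuration changes as new datasets arrive.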

Vector bins + index storage

An Aerospike bin is similar to a column in a relational database. AVS can index any number of vector bins on a single record. When creating an index, specify the bin (field) that contains the vector you want to index, and AVS will create the bin and start building the index.
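
The relationship between indexes and vector bins can be sketched in Python: each index names one vector bin, so a record with several vector bins can be covered by several indexes. The `IndexDefinition` class and its fields are hypothetical and only model the idea; the actual AVS clients define their own index parameters.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class IndexDefinition:
    # Hypothetical shape; not the AVS client API.
    name: str
    vector_field: str  # the bin that holds the vector to index
    dimensions: int


def indexes_for_record(record: Dict[str, Any], indexes: List[IndexDefinition]) -> List[str]:
    """Names of indexes whose vector bin is present with the expected length."""
    matches = []
    for idx in indexes:
        vector = record.get(idx.vector_field)
        if isinstance(vector, list) and len(vector) == idx.dimensions:
            matches.append(idx.name)
    return matches


indexes = [
    IndexDefinition("title_idx", "title_embedding", 3),
    IndexDefinition("image_idx", "image_embedding", 4),
]
record = {
    "sku": "A-1001",
    "title_embedding": [0.12, -0.48, 0.33],
    "image_embedding": [0.9, 0.1, 0.0, -0.2],
}
print(indexes_for_record(record, indexes))  # prints ['title_idx', 'image_idx']
```

Keying each index on a bin name, rather than on the record as a whole, is what lets one record participate in several independent vector indexes.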

In addition to the vector bin, the HNSW index itself is persisted in Aerospike: new index records are written to the namespace where your dataset resides. Each index you create on a dataset therefore increases the storage requirements of that namespace.
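
For capacity planning, a rough back-of-the-envelope estimate can help. The Python sketch below adds the size of the raw vector bins to an approximate size for HNSW index records. Every parameter is an assumption rather than a documented AVS figure: float32 vector components, one index record per vector, up to `2 * hnsw_m` neighbor references of `id_bytes` each on the base layer, and a fixed per-record overhead. Treat the result as an order-of-magnitude guide and calibrate it against measurements from a real cluster.

```python
from typing import Dict


def estimate_storage_bytes(
    n_vectors: int,
    dimensions: int,
    bytes_per_component: int = 4,  # assumption: float32 components
    hnsw_m: int = 16,              # assumption: HNSW max-neighbors parameter
    id_bytes: int = 20,            # assumption: size of one neighbor reference
    record_overhead: int = 64,     # assumption: per-index-record metadata
) -> Dict[str, int]:
    """Rough storage estimate for one vector index on one dataset."""
    vector_bins = n_vectors * dimensions * bytes_per_component
    # Assumption: one index record per vector, holding up to 2*M neighbors.
    neighbor_lists = n_vectors * 2 * hnsw_m * id_bytes
    index_records = n_vectors * record_overhead + neighbor_lists
    return {
        "vector_bins": vector_bins,
        "index_records": index_records,
        "total": vector_bins + index_records,
    }


est = estimate_storage_bytes(n_vectors=1_000_000, dimensions=768)
print({k: f"{v / 1e9:.3f} GB" for k, v in est.items()})
```

Because the index term is computed per index, creating a second index on the same dataset adds another `index_records` term (plus its own vector bin, if it indexes a different bin).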