# Secondary index capacity planning
## Overview

This page describes how to plan for the cost of using secondary indexes (SIs). You can either calculate an estimate of the space required, or extrapolate from a proof of concept.
Not all records are indexed. Only records in which the specified bin exists and has the correct data type are indexed.
## Capacity planning for Database 6.0 and later
In this section, the meaning of the word "space" depends on the `sindex-type` configured for the namespace:

- It refers to memory when the secondary index is in memory (whether `sindex-type shmem`, or process memory in Aerospike Database Community Edition).
- It refers to persistent memory (PMem) when configured as `sindex-type pmem`.
- It refers to disk storage when using a secondary index in flash (`sindex-type flash`).
Secondary indexes use a B-tree structure for each of the 4096 partitions, collocated with the primary index partitions on the node. Each B-tree node takes 4KiB of space.
### Secondary index spatial cost
- Each secondary index entry costs 14 bytes: 8 bytes for the relevant bin value (or a hash thereof), plus 6 bytes to reference the record in the primary index.
- Scalar data (integer, string) is indexed as a single entry in the secondary index.
- When the bin being indexed is a list, each list element matching the defined data type is indexed as a separate entry in the secondary index.
- For map data types, each of the map keys or values being indexed has a separate entry in the secondary index.
- A GeoJSON bin containing a Point is stored as a single secondary index entry.
- A GeoJSON bin containing a Polygon or AeroCircle uses `max-cells` entries in the secondary index.
- Due to B-tree characteristics, the average-case sparseness factor is 1.5 times the space used for all entries; the worst case is twice the space used for all entries. For example, 10 million entries in a secondary index use on average `10^7 x 14 bytes x 1.5 = 200MiB` of space, and at worst `10^7 x 14 bytes x 2 = 267MiB` of space.
- Multiply the total from the previous step by the replication factor of the namespace to get a cluster-wide total.
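The arithmetic above is easy to script. The sketch below is illustrative only: the constants come from the figures in this section, while the function and variable names are hypothetical, not part of any Aerospike tool.

```python
# Rough sizing sketch for Database 6.0+ secondary indexes.
# Constants come from the documentation above; names are illustrative.

ENTRY_BYTES = 14        # 8-byte bin value (or hash) + 6-byte primary index reference
AVG_SPARSENESS = 1.5    # average-case B-tree sparseness factor
WORST_SPARSENESS = 2.0  # worst-case B-tree sparseness factor

def si_space_bytes(entries: int, replication_factor: int = 1,
                   sparseness: float = AVG_SPARSENESS) -> float:
    """Cluster-wide space estimate for one secondary index."""
    return entries * ENTRY_BYTES * sparseness * replication_factor

# 10 million entries, replication factor 2, average case:
avg = si_space_bytes(10_000_000, replication_factor=2)
print(f"{avg / 2**20:.0f} MiB")  # roughly double the 200 MiB single-copy figure
```

With replication factor 1 this reproduces the worked example in the list above (about 200 MiB average, 267 MiB worst case).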
### Secondary index space allocation
So far, we've discussed the spatial cost of the entries in a secondary index. However, memory allocations don't occur in 14-byte increments.

- Each secondary index requires an initial memory allocation of 16MiB on each cluster node. For example, four secondary indexes defined in a namespace require an initial allocation of `4 x 16MiB = 64MiB` of space on each node.
- From Database 6.1 onward, secondary index space allocation is controlled by the namespace `sindex-stage-size` configuration parameter. This can be set as low as 128MiB, with a default of 1GiB. So when the first of the four secondary indexes from the example above is defined, the server immediately allocates the first `sindex-stage-size` stage. The secondary indexes grow into this allocation, and when it is full the next stage is allocated. This means you should round up to `sindex-stage-size` increments when performing capacity planning.
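Assuming Database 6.1 or later, the rounding-up step can be sketched as follows. The helper is hypothetical, for illustration only; the 1GiB default stage size comes from the text above.

```python
# Sketch: roll a space estimate up to whole sindex-stage-size stages.
import math

MIB = 2**20
GIB = 2**30

def allocated_bytes(estimated_bytes: int, stage_size: int = 1 * GIB) -> int:
    """Space actually allocated: whole stages, at least one."""
    stages = max(1, math.ceil(estimated_bytes / stage_size))
    return stages * stage_size

# A 2.5 GiB estimate lands in three 1 GiB stages:
print(allocated_bytes(int(2.5 * GIB)) // GIB)  # 3
```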
### Max secondary index size

Secondary index space is allocated in slices (arenas) whose size is defined by the `sindex-stage-size` configuration parameter. The maximum number of arenas is 2048, so with the default `sindex-stage-size` of 1GiB a secondary index cannot exceed `2048 x 1GiB = 2TiB`. If a secondary index needs to be bigger than 2TiB, you must increase `sindex-stage-size` from its default value.
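The relationship between projected index size and the minimum workable stage size can be sketched like this. The power-of-two stepping is an assumption made for illustration; only the 2048-arena limit and the 128MiB lower bound come from the text above.

```python
# Sketch: capacity is bounded by 2048 arenas x sindex-stage-size, so a
# projected index size implies a minimum stage size (illustrative helper).

MIB = 2**20
GIB = 2**30
MAX_ARENAS = 2048

def min_stage_size(projected_bytes: int) -> int:
    stage = 128 * MIB                      # documented lower bound
    while stage * MAX_ARENAS < projected_bytes:
        stage *= 2                         # assumed power-of-two stepping
    return stage

# A 4 TiB index needs at least 2 GiB stages (2048 x 2 GiB = 4 TiB):
print(min_stage_size(4 * 2**40) // GIB)  # 2
```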
## Capacity planning for Database 5.7 and earlier
### Extrapolating from server statistics
If you are running a proof of concept on Aerospike, you can use the server statistics for SIs to extrapolate your expected memory consumption.

- The metric `ibtr_memory_used` tracks the memory consumption of the SI keys.
- The metric `nbtr_memory_used` tracks the memory consumed by the record metadata entries in the SI.
- The metric `si_accounted_memory` combines the `ibtr_memory_used` and `nbtr_memory_used` metrics. Removed in Database 5.7.
- The metric `heap_efficiency_pct` provides insight into potential fragmentation (it also includes potential data memory usage if running with data in memory).

The server statistics method gives better capacity planning estimates than calculating from the formulas below. Extrapolate from these values, scaling them up linearly to your projected dataset.
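The linear extrapolation can be sketched as below. The metric names match the statistics listed above; the function itself is a hypothetical helper, not an Aerospike tool.

```python
# Sketch: scale measured SI memory from the proof-of-concept record count
# to the projected record count.

def project_si_memory(ibtr_memory_used: int, nbtr_memory_used: int,
                      poc_records: int, projected_records: int) -> float:
    measured = ibtr_memory_used + nbtr_memory_used  # key + record metadata memory
    return measured * projected_records / poc_records

# 300 MiB measured over 1M records, projected to 50M records:
gib = project_si_memory(200 * 2**20, 100 * 2**20, 1_000_000, 50_000_000) / 2**30
print(f"{gib:.2f} GiB")  # ~14.65 GiB
```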
### Calculating an estimate

An SI has a key for each unique value encountered during indexing. Associated with each SI key is metadata about the records that have this value in their bin, stored in one of two in-memory structures.

- K - the number of SI keys in the SI.
- R - the number of records indexed in the SI.

The formula for the estimate changes based on whether the number of records indexed per unique value is lower or higher than a threshold. Below the threshold, arrays are used for compact storage and cache friendliness; above it, a tree is used for optimal lookup.

- THRESHOLD (referred to in the rest of this article):
  - 64 if the server version is 5.7 or later
  - 32 otherwise
#### Fewer than THRESHOLD

If there are fewer than THRESHOLD records indexed per unique value (SI key), the following formulas are used.
##### For Database 5.7 and later
| Type | Min. mem usage | Max. mem usage | Avg. case mem usage |
|---|---|---|---|
| Integer | (16.25 x 1 x K) + (8 x 1 x R) | (16.25 x 2 x K) + (8 x 3.5 x R) | (16.25 x 1.5 x K) + (8 x 2 x R) |
| String | (28.44 x 1 x K) + (8 x 1 x R) | (28.44 x 2 x K) + (8 x 3.5 x R) | (28.44 x 1.5 x K) + (8 x 2 x R) |
##### For Database 5.6 and earlier
| Type | Min. mem usage | Max. mem usage | Avg. case mem usage |
|---|---|---|---|
| Integer | (16.25 x 1 x K) + (20 x 1 x R) | (16.25 x 2 x K) + (20 x 3.5 x R) | (16.25 x 1.5 x K) + (20 x 2 x R) |
| String | (28.44 x 1 x K) + (20 x 1 x R) | (28.44 x 2 x K) + (20 x 3.5 x R) | (28.44 x 1.5 x K) + (20 x 2 x R) |
#### THRESHOLD or more

When there are THRESHOLD or more records indexed per unique value, the following calculation is used instead.
##### For Database 5.7 and later
| Type | Min. mem usage | Max. mem usage | Avg. case mem usage |
|---|---|---|---|
| Integer | (16.25 x 1 x K) + (8.25 x 1 x R) | (16.25 x 2 x K) + (8.25 x 2 x R) + (16.25 x K) | (16.25 x 1.5 x K) + (8.25 x 1.5 x R) |
| String | (28.44 x 1 x K) + (8.25 x 1 x R) | (28.44 x 2 x K) + (8.25 x 2 x R) + (28.44 x K) | (28.44 x 1.5 x K) + (8.25 x 1.5 x R) |
##### For Database 5.6 and earlier
| Type | Min. mem usage | Max. mem usage | Avg. case mem usage |
|---|---|---|---|
| Integer | (16.25 x 1 x K) + (20.25 x 1 x R) | (16.25 x 2 x K) + (20.25 x 2 x R) + (16.25 x K) | (16.25 x 1.5 x K) + (20.25 x 1.5 x R) |
| String | (28.44 x 1 x K) + (20.25 x 1 x R) | (28.44 x 2 x K) + (20.25 x 2 x R) + (28.44 x K) | (28.44 x 1.5 x K) + (20.25 x 1.5 x R) |
The cost of an index built over list values depends on the type of the elements (Integer or String, see above), multiplied by the average number of unique values in the list.

The cost of an index built over map keys depends on the type of the map keys (Integer or String, see above), multiplied by the average number of keys in the map.

The cost of an index built over map values depends on the type of the map values (Integer or String, see above), multiplied by the average number of unique values in the map.
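The average-case formulas for Database 5.7 and later can be collected into a single sketch. The function name and signature are hypothetical; the constants and the threshold logic come from the tables and notes above, and `per_value` stands for the list/map multiplier just described (1 for a scalar bin).

```python
# Sketch: average-case SI memory estimate, Database 5.7+ constants only.

KEY_M = {"integer": 16.25, "string": 28.44}  # per-key overhead, from the tables
THRESHOLD = 64                               # records per key, server 5.7+

def avg_mem_57(bin_type: str, K: int, R: int, per_value: int = 1) -> float:
    """K = unique SI keys, R = records indexed, per_value = list/map multiplier."""
    if R < K * THRESHOLD:     # fewer than THRESHOLD records per key: arrays
        key_cost = KEY_M[bin_type] * 1.5 * K
        rec_cost = 8.0 * 2.0 * R
    else:                     # THRESHOLD or more records per key: tree
        key_cost = KEY_M[bin_type] * 1.5 * K
        rec_cost = 8.25 * 1.5 * R
    return (key_cost + rec_cost) * per_value
```

For example, 100,000 unique integer keys over 1 million records (10 records per key, below the threshold) comes to roughly 18.4 MB.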
Extreme values may vary by a few hundred KiB.

SI garbage collection may affect this value by a few hundred MiB.
Each formula can be expressed as:

Sum of (m x a x n x s) over n in {keys, records}, plus a constant

where:

- n - the number of keys or records in the SI.
- m - the memory overhead per key/record in a B-tree or array.
- a - the memory allocated per unit of memory used by the SI.
- s - the size factor of the data type in the bin. For scalar values, s = 1; for lists, it's the number of unique values in the list; for maps, it's the number of keys or unique map values, depending on what is being indexed.
The following tables explain the different values of m and a.
| Keys/Records | Type | Records per SI Key | Value of m (5.7 and later) | Value of m (5.6 and earlier) |
|---|---|---|---|---|
| Keys | Integer | Any | 16.25 | 16.25 |
| Keys | String | Any | 28.44 | 28.44 |
| Records | Any | < THRESHOLD | 8 | 20 |
| Records | Any | >= THRESHOLD | 8.25 | 20.25 |
| Scenario | Keys/Records | Records per SI Key | Value of a | Note |
|---|---|---|---|---|
| Best case | Any | Any | 1 | In the best case, all allocated memory is used. |
| Worst case | Keys | Any | 2 | Keys are always stored in a B-tree. In the worst case, every B-tree node is half full. |
| Worst case | Records | < THRESHOLD | 3.5 | For fewer than THRESHOLD records per SI key, arrays are used to store record metadata. In the worst case, 3.5 times the needed memory is allocated. |
| Worst case | Records | >= THRESHOLD | 2 | For THRESHOLD or more records per SI key, record metadata is stored similarly to keys. |
| Avg case | Keys | Any | 1.5 | Generally, the memory efficiency of a B-tree lies between 0.5 and 1. |
| Avg case | Records | < THRESHOLD | 2 | For fewer than THRESHOLD records per SI key, record metadata is stored in arrays, which are generally allocated at twice the needed size. |
| Avg case | Records | >= THRESHOLD | 1.5 | Same as the average case for keys. |
Note: in the worst case, each B-tree can also have an extra allocated but unused B-tree node. This is why an extra per-key term ((16.25 x K) for integers, (28.44 x K) for strings) is added into the worst-case calculations.
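The generalized formula and the two tables above can be combined into one sketch. This is illustrative only, using the Database 5.7+ values of m; the function name and signature are hypothetical.

```python
# Sketch: sum of (m x a x n x s) plus the worst-case constant, with m and a
# taken from the tables above (Database 5.7+ values).

def si_estimate(key_type: str, K: int, R: int, s: int = 1,
                case: str = "avg") -> float:
    m_key = {"integer": 16.25, "string": 28.44}[key_type]
    below = R < 64 * K                       # fewer than THRESHOLD records per key
    m_rec = 8.0 if below else 8.25
    a_key = {"best": 1.0, "avg": 1.5, "worst": 2.0}[case]
    a_rec = ({"best": 1.0, "avg": 2.0, "worst": 3.5} if below
             else {"best": 1.0, "avg": 1.5, "worst": 2.0})[case]
    total = (m_key * a_key * K + m_rec * a_rec * R) * s
    if case == "worst" and not below:
        total += m_key * K                   # extra unused B-tree node allowance
    return total
```

For the worst case at or above the threshold, this reproduces the extra (16.25 x K) or (28.44 x K) term from the tables.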
- GeoJSON data types: bins containing Points use the same sizing as one integer bin. Bins containing Polygons or AeroCircles use `max-cells` integer bin value entries. `max-cells` is defined in the server configuration as an integer between 1 and 256, and defaults to 12. Each cell entry is sized the same as an integer. Refer to the `max-cells` configuration reference for more information.
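Entry counts for a GeoJSON index follow directly from the rule above, as in this sketch. The helper is hypothetical; the default of 12 cells comes from the text.

```python
# Sketch: total SI entries for a GeoJSON bin. A Point indexes as one entry;
# a Polygon or AeroCircle uses up to max-cells entries, each sized like an
# integer entry.

def geo_entries(points: int, regions: int, max_cells: int = 12) -> int:
    """points/regions = record counts by shape kind; returns total SI entries."""
    return points + regions * max_cells

# 1M Point records plus 500k Polygon records at the default max-cells:
print(geo_entries(points=1_000_000, regions=500_000))  # 7000000
```

Feed this entry count into the per-entry sizing from the Database 6.0+ section (14 bytes per entry, times the sparseness factor) to get a space estimate.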