# Secondary index capacity planning
## Overview

This page describes how to plan for the cost of using secondary indexes (SIs). You can either calculate an estimate of the space required, or extrapolate from a proof of concept.
Not all records are indexed. Only records in which the specified bin exists and has the correct data type are indexed.
## Capacity planning for Database 6.0 and later
In this section, the meaning of the word "space" depends on the `sindex-type` configured for the namespace:

- It refers to memory when the secondary index is in memory (whether `sindex-type shmem`, or process memory in Aerospike Database Community Edition).
- It refers to persistent memory (PMem) when configured as `sindex-type pmem`.
- It refers to disk storage when using a secondary index in flash (`sindex-type flash`).
Secondary indexes use a B-tree structure for each of the 4096 partitions, collocated with the primary index partitions on the node. Each B-tree node takes 4KiB of space.
### Secondary index spatial cost
- Each secondary index entry costs 14 bytes: 8 bytes for the relevant bin value (or a hash thereof), plus 6 bytes to reference the record in the primary index.
- Scalar data (integer, string) is indexed as a single entry in the secondary index.
- When the bin being indexed is a list, each list element matching the defined data type is indexed as a separate entry in the secondary index.
- For map data types, each of the map keys or values being indexed has a separate entry in the secondary index.
- A GeoJSON bin containing a Point is stored as a single secondary index entry.
- A GeoJSON bin containing a Polygon or AeroCircle uses `max-cells` entries in the secondary index.
- Due to B-tree characteristics, the average-case sparseness factor is 1.5 times the space used for all entries; the worst case is twice the space used for all entries. For example, 10 million entries in a secondary index use on average `10^7 x 14 bytes x 1.5 = 200MiB` of space, and at worst `10^7 x 14 bytes x 2 = 267MiB` of space.
- Multiply the total from the previous step by the replication factor of the namespace to get a cluster-wide total.
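The arithmetic above is easy to script. The sketch below is illustrative only: the constants come from the figures in this section, while the function and variable names are hypothetical, not part of any Aerospike tool.

```python
# Rough sizing sketch for Database 6.0+ secondary indexes.
# Constants come from the documentation above; names are illustrative.

ENTRY_BYTES = 14        # 8-byte bin value (or hash) + 6-byte primary index reference
AVG_SPARSENESS = 1.5    # average-case B-tree sparseness factor
WORST_SPARSENESS = 2.0  # worst-case B-tree sparseness factor

def si_space_bytes(entries: int, replication_factor: int = 1,
                   sparseness: float = AVG_SPARSENESS) -> float:
    """Cluster-wide space estimate for one secondary index."""
    return entries * ENTRY_BYTES * sparseness * replication_factor

# 10 million entries, replication factor 2, average case:
avg = si_space_bytes(10_000_000, replication_factor=2)
print(f"{avg / 2**20:.0f} MiB")  # roughly double the 200 MiB single-copy figure
```

With replication factor 1 this reproduces the worked example in the list above (about 200 MiB average, 267 MiB worst case).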
### Secondary index space allocation
So far, we've discussed the spatial cost of the entries in a secondary index. However, memory allocations don't occur in 14-byte increments.

- Each secondary index requires an initial memory allocation of 16MiB on each cluster node. For example, four secondary indexes defined in a namespace require an initial allocation of `4 x 16MiB = 64MiB` of space on each node.
- From Database 6.1 onward, secondary index space allocation is controlled by the namespace `sindex-stage-size` configuration parameter. This can be set as low as 128MiB, with a default of 1GiB. So when the first of the four secondary indexes from the example above is defined, the server immediately allocates the first `sindex-stage-size` stage. The secondary indexes grow into this allocation, and when it is full the next stage is allocated. This means you should round up to `sindex-stage-size` increments when performing capacity planning.
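Assuming Database 6.1 or later, the rounding-up step can be sketched as follows. The helper is hypothetical, for illustration only; the 1GiB default stage size comes from the text above.

```python
# Sketch: roll a space estimate up to whole sindex-stage-size stages.
import math

MIB = 2**20
GIB = 2**30

def allocated_bytes(estimated_bytes: int, stage_size: int = 1 * GIB) -> int:
    """Space actually allocated: whole stages, at least one."""
    stages = max(1, math.ceil(estimated_bytes / stage_size))
    return stages * stage_size

# A 2.5 GiB estimate lands in three 1 GiB stages:
print(allocated_bytes(int(2.5 * GIB)) // GIB)  # 3
```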
### Max secondary index size

Secondary index space is allocated in slices (arenas) whose size is defined by the `sindex-stage-size` configuration parameter. The maximum number of arenas is 2048, so with the default `sindex-stage-size` of 1GiB a secondary index cannot exceed `2048 x 1GiB = 2TiB`. If a secondary index needs to be bigger than 2TiB, you must increase `sindex-stage-size` from its default value.
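The relationship between projected index size and the minimum workable stage size can be sketched like this. The power-of-two stepping is an assumption made for illustration; only the 2048-arena limit and the 128MiB lower bound come from the text above.

```python
# Sketch: capacity is bounded by 2048 arenas x sindex-stage-size, so a
# projected index size implies a minimum stage size (illustrative helper).

MIB = 2**20
GIB = 2**30
MAX_ARENAS = 2048

def min_stage_size(projected_bytes: int) -> int:
    stage = 128 * MIB                      # documented lower bound
    while stage * MAX_ARENAS < projected_bytes:
        stage *= 2                         # assumed power-of-two stepping
    return stage

# A 4 TiB index needs at least 2 GiB stages (2048 x 2 GiB = 4 TiB):
print(min_stage_size(4 * 2**40) // GIB)  # 2
```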
## Capacity planning for Database 5.7 and earlier
### Extrapolating from server statistics
If you are running a proof of concept on Aerospike, you can use the server statistics for SIs to extrapolate your expected memory consumption.

- The metric `ibtr_memory_used` tracks the memory consumption of the SI keys.
- The metric `nbtr_memory_used` tracks the memory consumed by the record metadata entries in the SI.
- The metric `si_accounted_memory` combines the `ibtr_memory_used` and `nbtr_memory_used` metrics. Removed in Database 5.7.
- The metric `heap_efficiency_pct` provides insight into potential fragmentation (it also includes potential data memory usage if running with data in memory).

The server statistics method gives better capacity planning estimates than calculating from the formulas below. Extrapolate from these values, scaling them up linearly to your projected dataset.
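The linear extrapolation can be sketched as below. The metric names match the statistics listed above; the function itself is a hypothetical helper, not an Aerospike tool.

```python
# Sketch: scale measured SI memory from the proof-of-concept record count
# to the projected record count.

def project_si_memory(ibtr_memory_used: int, nbtr_memory_used: int,
                      poc_records: int, projected_records: int) -> float:
    measured = ibtr_memory_used + nbtr_memory_used  # key + record metadata memory
    return measured * projected_records / poc_records

# 300 MiB measured over 1M records, projected to 50M records:
gib = project_si_memory(200 * 2**20, 100 * 2**20, 1_000_000, 50_000_000) / 2**30
print(f"{gib:.2f} GiB")  # ~14.65 GiB
```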
### Calculating an estimate

An SI has a key for each unique value encountered during indexing. Associated with each SI key is metadata about the records that have this value in their bin, stored in one of two in-memory structures.

- K - the number of SI keys in the SI.
- R - the number of records indexed in the SI.

The formula for the estimate changes based on whether the number of records indexed per unique value is lower or higher than a threshold. Below the threshold, arrays are used for compact storage and cache friendliness; above it, a tree is used for optimal lookup.

- THRESHOLD (referred to in the rest of this article):
  - 64 if the server version is 5.7 or later
  - 32 otherwise
#### Fewer than THRESHOLD

If there are fewer than THRESHOLD records indexed per unique value (SI key), the following formulas are used.
##### For Database 5.7 and later
| Type | Min. mem usage | Max. mem usage | Avg. case mem usage |
|---|---|---|---|
| Integer | (16.25 x 1 x K) + (8 x 1 x R) | (16.25 x 2 x K) + (8 x 3.5 x R) | (16.25 x 1.5 x K) + (8 x 2 x R) |
| String | (28.44 x 1 x K) + (8 x 1 x R) | (28.44 x 2 x K) + (8 x 3.5 x R) | (28.44 x 1.5 x K) + (8 x 2 x R) |
##### For Database 5.6 and earlier
| Type | Min. mem usage | Max. mem usage | Avg. case mem usage |
|---|---|---|---|
| Integer | (16.25 x 1 x K) + (20 x 1 x R) | (16.25 x 2 x K) + (20 x 3.5 x R) | (16.25 x 1.5 x K) + (20 x 2 x R) |
| String | (28.44 x 1 x K) + (20 x 1 x R) | (28.44 x 2 x K) + (20 x 3.5 x R) | (28.44 x 1.5 x K) + (20 x 2 x R) |
#### THRESHOLD or more

When there are THRESHOLD or more records indexed per unique value, the following calculation is used instead.
##### For Database 5.7 and later
| Type | Min. mem usage | Max. mem usage | Avg. case mem usage |
|---|---|---|---|
| Integer | (16.25 x 1 x K) + (8.25 x 1 x R) | (16.25 x 2 x K) + (8.25 x 2 x R) + (16.25 x K) | (16.25 x 1.5 x K) + (8.25 x 1.5 x R) |
| String | (28.44 x 1 x K) + (8.25 x 1 x R) | (28.44 x 2 x K) + (8.25 x 2 x R) + (28.44 x K) | (28.44 x 1.5 x K) + (8.25 x 1.5 x R) |
##### For Database 5.6 and earlier
| Type | Min. mem usage | Max. mem usage | Avg. case mem usage |
|---|---|---|---|
| Integer | (16.25 x 1 x K) + (20.25 x 1 x R) | (16.25 x 2 x K) + (20.25 x 2 x R) + (16.25 x K) | (16.25 x 1.5 x K) + (20.25 x 1.5 x R) |
| String | (28.44 x 1 x K) + (20.25 x 1 x R) | (28.44 x 2 x K) + (20.25 x 2 x R) + (28.44 x K) | (28.44 x 1.5 x K) + (20.25 x 1.5 x R) |
The cost of an index built over list values depends on the type of the elements (Integer or String, see above), multiplied by the average number of unique values in the list.

The cost of an index built over map keys depends on the type of the map keys (Integer or String, see above), multiplied by the average number of keys in the map.

The cost of an index built over map values depends on the type of the map values (Integer or String, see above), multiplied by the average number of unique values in the map.
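The average-case formulas for Database 5.7 and later can be collected into a single sketch. The function name and signature are hypothetical; the constants and the threshold logic come from the tables and notes above, and `per_value` stands for the list/map multiplier just described (1 for a scalar bin).

```python
# Sketch: average-case SI memory estimate, Database 5.7+ constants only.

KEY_M = {"integer": 16.25, "string": 28.44}  # per-key overhead, from the tables
THRESHOLD = 64                               # records per key, server 5.7+

def avg_mem_57(bin_type: str, K: int, R: int, per_value: int = 1) -> float:
    """K = unique SI keys, R = records indexed, per_value = list/map multiplier."""
    if R < K * THRESHOLD:     # fewer than THRESHOLD records per key: arrays
        key_cost = KEY_M[bin_type] * 1.5 * K
        rec_cost = 8.0 * 2.0 * R
    else:                     # THRESHOLD or more records per key: tree
        key_cost = KEY_M[bin_type] * 1.5 * K
        rec_cost = 8.25 * 1.5 * R
    return (key_cost + rec_cost) * per_value
```

For example, 100,000 unique integer keys over 1 million records (10 records per key, below the threshold) comes to roughly 18.4 MB.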
Extreme values may vary by a few hundred KiB.

SI garbage collection may affect this value by a few hundred MiB.
Each formula can be expressed as:

Sum of (m x a x n x s) over n in {keys, records}, plus a constant

where:

- n - the number of keys or records in the SI.
- m - the memory overhead per key/record in a B-tree or array.
- a - the memory allocated per unit of memory used by the SI.
- s - the size factor of the data type in the bin. For scalar values, s = 1; for lists, it's the number of unique values in the list; for maps, it's the number of keys or unique map values, depending on what is being indexed.
The following tables explain the different values of m and a.
| Keys/Records | Type | Records per SI Key | Value of m (5.7 and later) | Value of m (5.6 and earlier) |
|---|---|---|---|---|
| Keys | Integer | Any | 16.25 | 16.25 |
| Keys | String | Any | 28.44 | 28.44 |
| Records | Any | < THRESHOLD | 8 | 20 |
| Records | Any | >= THRESHOLD | 8.25 | 20.25 |
| Scenario | Keys/Records | Records per SI Key | Value of a | Note |
|---|---|---|---|---|
| Best case | Any | Any | 1 | In the best case, all allocated memory is used. |
| Worst case | Keys | Any | 2 | Keys are always stored in a B-tree. In the worst case, every B-tree node is half full. |
| Worst case | Records | < THRESHOLD | 3.5 | For fewer than THRESHOLD records per SI key, arrays are used to store record metadata. In the worst case, 3.5 times the needed memory is allocated. |
| Worst case | Records | >= THRESHOLD | 2 | For THRESHOLD or more records per SI key, record metadata is stored similarly to keys. |
| Avg case | Keys | Any | 1.5 | Generally, the memory efficiency of a B-tree lies between 0.5 and 1. |
| Avg case | Records | < THRESHOLD | 2 | For fewer than THRESHOLD records per SI key, record metadata is stored in arrays, which are generally allocated at twice the needed size. |
| Avg case | Records | >= THRESHOLD | 1.5 | Same as the average case for keys. |
Note: in the worst case, each B-tree can also have an extra allocated but unused B-tree node. This is why an extra per-key term ((16.25 x K) for integers, (28.44 x K) for strings) is added into the worst-case calculations.
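The generalized formula and the two tables above can be combined into one sketch. This is illustrative only, using the Database 5.7+ values of m; the function name and signature are hypothetical.

```python
# Sketch: sum of (m x a x n x s) plus the worst-case constant, with m and a
# taken from the tables above (Database 5.7+ values).

def si_estimate(key_type: str, K: int, R: int, s: int = 1,
                case: str = "avg") -> float:
    m_key = {"integer": 16.25, "string": 28.44}[key_type]
    below = R < 64 * K                       # fewer than THRESHOLD records per key
    m_rec = 8.0 if below else 8.25
    a_key = {"best": 1.0, "avg": 1.5, "worst": 2.0}[case]
    a_rec = ({"best": 1.0, "avg": 2.0, "worst": 3.5} if below
             else {"best": 1.0, "avg": 1.5, "worst": 2.0})[case]
    total = (m_key * a_key * K + m_rec * a_rec * R) * s
    if case == "worst" and not below:
        total += m_key * K                   # extra unused B-tree node allowance
    return total
```

For the worst case at or above the threshold, this reproduces the extra (16.25 x K) or (28.44 x K) term from the tables.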
- GeoJSON data types: bins containing Points use the same sizing as one integer bin. Bins containing Polygons or AeroCircles use `max-cells` integer bin value entries. `max-cells` is defined in the server configuration as an integer between 1 and 256, and defaults to 12. Each cell entry is sized the same as an integer. Refer to the `max-cells` configuration reference for more information.
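Entry counts for a GeoJSON index follow directly from the rule above, as in this sketch. The helper is hypothetical; the default of 12 cells comes from the text.

```python
# Sketch: total SI entries for a GeoJSON bin. A Point indexes as one entry;
# a Polygon or AeroCircle uses up to max-cells entries, each sized like an
# integer entry.

def geo_entries(points: int, regions: int, max_cells: int = 12) -> int:
    """points/regions = record counts by shape kind; returns total SI entries."""
    return points + regions * max_cells

# 1M Point records plus 500k Polygon records at the default max-cells:
print(geo_entries(points=1_000_000, regions=500_000))  # 7000000
```

Feed this entry count into the per-entry sizing from the Database 6.0+ section (14 bytes per entry, times the sparseness factor) to get a space estimate.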