Frequently Asked Questions
Can the same storage device be used for the primary index (PI) and the data?
Answer
You should store the PI and the data on separate devices.
Can two namespaces share the same location to store indexes?
Answer
Although not recommended, a mount may be shared with other namespaces.
See the mount configuration reference for further details. For sizing details when using index-type flash, see Capacity Planning.
Mounts can be shared across namespaces because a mount is a directory, and the actual files are the index arena stages.
The names of these files have the namespace and instance IDs built in.
Files for different namespaces and instances can coexist in a directory.
For example, namespace1 uses mount /mnt/nvme with size 4 GiB and namespace2 also uses mount /mnt/nvme with size 8 GiB, assuming the NVMe device is at least 12 GiB.
The configuration item that sets an index device quota for a namespace is mounts-size-limit. This limit is enforced only through eviction, for which there is a configurable threshold: mounts-high-water-pct. Although mounts-size-limit is not a hard limit (expiration and eviction are not required), it must nevertheless be configured. The minimum allowed value is 4 GiB, and the maximum may not exceed the actual space available on the mounts.
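For illustration, a minimal configuration sketch of the shared-mount example above; the namespace names, sizes, and the /mnt/nvme device are assumptions carried over from that example:
namespace namespace1 {
    index-type flash {
        mount /mnt/nvme
        mounts-size-limit 4G
    }
}
namespace namespace2 {
    index-type flash {
        mount /mnt/nvme
        mounts-size-limit 8G
    }
}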
Sharing mounts across namespaces is possible, but it is not recommended.
It may be better for performance to use multiple mounts, and underlying devices, for one namespace.
How do I monitor space used for the index?
Answer
Monitor the percentage used with index_flash_used_pct. Monitor the usage in bytes with index_flash_used_bytes.
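One way to read these values from a node, assuming a namespace named test; the same statistics are available through any monitoring pipeline that scrapes namespace statistics:
asinfo -v 'namespace/test' -l | grep index_flash_used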
It's essential to understand that All-Flash configurations pre-allocate the index space.
For example, in an 8-node cluster configured with 32,768 partition-tree-sprigs (sprigs per partition) and a replication factor of 2, the sprigs pre-allocate 128 GiB of index device space on each node:
(4096 partitions × 2 replication factor × 32,768 sprigs per partition × 4 KiB) / 8 nodes = 128 GiB per node
The amount of memory consumed by those 32,768 sprigs per partition is 3.25 GiB across the cluster on the Enterprise Edition, as each sprig has an overhead of 13 bytes.
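As a sanity check, the same arithmetic can be reproduced in shell; both numbers come from the example above:
echo $(( 4096 * 2 * 32768 * 4 / 8 ))   # per-node device pre-allocation in KiB: 134217728 KiB = 128 GiB
echo $(( 4096 * 2 * 32768 * 13 ))      # cluster-wide sprig memory in bytes: 3489660928 = 3.25 GiB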
These statistics show usage based on the number of records rather than the number of sprigs instantiated. A sprig is instantiated with the first record it contains, so the primary index mount fills up at a pace of roughly 4 KiB per inserted record until all configured sprigs have been instantiated. The primary index mount usage can also be checked directly on the system, as shown below.
Once all sprigs are instantiated, primary index disk usage remains stable until sprigs fill their initial 4 KiB allocation (64 records each) and overflow into a second 4 KiB block, requiring more than one device read per record lookup. The resulting performance impact would likely require re-sizing the number of sprigs.
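For example, assuming the /mnt/nvme mount from the earlier example, standard filesystem tools show the space consumed by the index arena stage files:
df -h /mnt/nvme
du -sh /mnt/nvme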
Server Logs
There is an INFO log entry for index-flash-usage for each namespace:
{ns_name} index-flash-usage: used-bytes 5502926848 used-pct
This is printed every ten seconds for each namespace configured with index-type flash.
See Server Log Reference for more details on the log line.
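To follow this line on a node, assuming the default log location of /var/log/aerospike/aerospike.log:
grep index-flash-usage /var/log/aerospike/aerospike.log | tail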
How important is it to set partition-tree-sprigs and determine the fill factor based on estimated/current records?
Answer
If the namespace is projected to grow rapidly, a lower fill fraction is appropriate to leave room for future records. Full sprigs span more than a single 4 KiB index block and then require more than a single index device read, impacting performance. It is essential to determine an adequate fill factor in advance, because modifying the number of sprigs to mitigate this situation requires a cold start to rebuild the primary index.
See Capacity Planning for All-Flash for more details.
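As a sketch, partition-tree-sprigs is set per namespace and must be a power of 2; the value below is simply the one from the sizing example above:
namespace namespace1 {
    # power of 2; derive from estimated record count and target fill fraction
    partition-tree-sprigs 32768
}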
Another important point to consider is a reduction in cluster size, due for example to planned maintenance, an unexpected shutdown, or a network partition split. The min-cluster-size parameter prevents sub-clusters below the configured minimum size from forming, which avoids a quick proliferation of sprigs from previous partitions filling up the primary index mounts.
See Index Device Space for All-Flash for further details.
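min-cluster-size belongs to the service context; a minimal sketch, with 5 as an illustrative floor for an 8-node cluster:
service {
    min-cluster-size 5
}
On recent server versions it can also be changed dynamically:
asinfo -v 'set-config:context=service;min-cluster-size=5'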
If the number of records is not expected to drastically change, a higher fill fraction would improve index traversal time for queries, migrations, NSUP cycles, and other operations that traverse the full primary index. Fuller sprigs mean fewer device read operations for a given number of records: each read simply fetches more records into memory.
The configuration documentation states that it requires a feature-key file. Is there an additional cost for this feature?
Answer
The All-Flash feature does require a feature-key file and has an additional cost. Contact your Aerospike account owner for pricing.
How do I get information about dirty pages of the primary index?
Answer
Run index-pressure for detailed information about primary index memory usage.
asinfo -l -v index-pressure
test:1630212096:69632
The two returned numbers indicate, in bytes, the amount of RAM occupied by the primary index and the amount in the primary index that is "dirty" (not written back to the index drive). In the example above, ~1.5 GiB is used in total for the index and around 68 KiB is dirty.
The command works for both hybrid storage and all-flash but gains particular utility when the index is on disk.
In all-flash mode, index-pressure indicates how far the index write-back is lagging.
With the index on disk, the primary index files are memory-mapped with mmap() and modified as if they were in RAM.
When an index entry is touched, the kernel brings the corresponding page from the index drive to RAM.
If an index entry is modified, the kernel writes the corresponding modified page from RAM back to the index drive.
The RAM the page used then becomes available again for other purposes.
When the write-back process cannot keep up with index modifications, dirty pages pile up in RAM to a point where the system may run out of memory.
The higher the dirty value returned by index-pressure, the more the write-back is lagging.
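For ad-hoc monitoring of that lag, the same info command can simply be polled; a sketch using the standard watch utility:
watch -n 10 "asinfo -l -v index-pressure"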