Metrics Reference
See the Metrics command examples for information on usage.
Namespace
aerospike_namespace_appeals_records_exonerated
Number of records that were marked replicated as a result of an appeal. Partition appeals will happen for namespaces operating under the strong-consistency mode when a node needs to validate the records it has when joining the cluster.
counter
integer
aerospike_namespace_appeals_rx_active
Number of partition appeals currently being received. Partition appeals will happen for namespaces operating under the strong-consistency mode when a node needs to validate the records it has when joining the cluster.
gauge
integer
aerospike_namespace_appeals_tx_active
Number of partition appeals currently being sent. Partition appeals will happen for namespaces operating under the strong-consistency mode when a node needs to validate the records it has when joining the cluster.
gauge
integer
aerospike_namespace_appeals_tx_remaining
Number of partition appeals not yet sent. Partition appeals will happen for namespaces operating under the strong-consistency mode when a node needs to validate the records it has when joining the cluster. Appeals occur after a node has been cold-started. The replication state of each record is lost on cold-start and all records must assume an unreplicated state. An appeal resolves replication state from the partition’s acting master. Appeals are important for performance: an unreplicated record must re-replicate before it can be read, which adds latency. During a rolling cold-restart, an operator may want to wait for the appeal phase to complete after each restart to minimize the performance impact of the procedure.
gauge
integer
aerospike_namespace_auto_revived_partitions
Number of partitions that the auto-revive feature revived at startup.
gauge
integer
aerospike_namespace_available_bin_names
Remaining number of unique bins that the user can create for this namespace.
The formula for the associated metrics is as follows:
bin_names_quota - bin_names = available_bin_names
gauge
integer
aerospike_namespace_batch_sub_delete_error
Number of batch-index delete sub-batches that failed with an error. For example: invalid set name, unavailable (if SC), failure to apply a predexp filter, key mismatch (if key was sent), device error (I/O error), key busy (duplicate resolution or if SC), or a problem during a bitwise, HLL, or CDT operation.
counter
integer
aerospike_namespace_batch_sub_delete_filtered_out
Number of batch-index delete sub-batches that did not happen because the record was filtered out with Filter Expressions.
counter
integer
aerospike_namespace_batch_sub_delete_not_found
Number of batch-index delete sub-batches that resulted in not found.
counter
integer
aerospike_namespace_batch_sub_delete_success
Number of records successfully deleted by batch-index sub-batches.
counter
integer
aerospike_namespace_batch_sub_delete_timeout
Number of batch-index delete sub-batches that timed out.
counter
integer
aerospike_namespace_batch_sub_lang_delete_success
Number of successful batch-index UDF delete sub-batches.
counter
integer
aerospike_namespace_batch_sub_lang_error
Number of language (Lua) batch-index errors for UDF sub-transactions.
counter
integer
aerospike_namespace_batch_sub_lang_read_success
Number of successful batch-index UDF read sub-batches.
counter
integer
aerospike_namespace_batch_sub_lang_write_success
Number of successful batch-index UDF write sub-batches.
counter
integer
aerospike_namespace_batch_sub_proxy_complete
Number of proxied batch-index sub-batches that completed.
counter
integer
aerospike_namespace_batch_sub_proxy_error
Number of proxied batch-index subtransactions that failed with an error.
counter
integer
aerospike_namespace_batch_sub_proxy_timeout
Number of proxied batch-index sub-batches that timed out.
counter
integer
aerospike_namespace_batch_sub_read_error
Number of batch-index read subtransactions that failed with an error. For example: invalid set name, unavailable (if SC), failure to apply a predexp filter, key mismatch (if key was sent), device error (I/O error), key busy (duplicate resolution or if SC), or a problem during a bitwise, HLL, or CDT operation.
counter
integer
aerospike_namespace_batch_sub_read_filtered_out
Number of batch-index read sub-batches that were skipped because the record was filtered out with Filter Expressions.
counter
integer
aerospike_namespace_batch_sub_read_not_found
Number of batch-index read subtransactions that resulted in not found.
counter
integer
aerospike_namespace_batch_sub_read_success
Number of records successfully read by batch-index sub-batches.
counter
integer
aerospike_namespace_batch_sub_read_timeout
Number of batch-index read sub-batches that timed out.
counter
integer
aerospike_namespace_batch_sub_tsvc_error
Number of batch-index read sub-batches that failed with an error in the transaction service, before attempting to handle the transaction. For example, protocol errors or security permission mismatches. In strong-consistency enabled namespaces, this includes transactions against unavailable_partitions and dead_partitions.
The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc errors happen before records are accessed for reads or writes, and they are counted separately from tsvc timeouts.
counter
integer
aerospike_namespace_batch_sub_tsvc_timeout
Number of batch-index read sub-batches that timed out in the transaction service, before attempting to handle the transaction.
The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc errors happen before records are accessed for reads or writes, and they are counted separately from tsvc timeouts.
counter
integer
aerospike_namespace_batch_sub_udf_complete
Number of completed batch-index UDF sub-batches for scan/query background UDF jobs. See the following statistics for the underlying operation statuses: batch_sub_lang_delete_success, batch_sub_lang_error, batch_sub_lang_read_success, batch_sub_lang_write_success.
counter
integer
aerospike_namespace_batch_sub_udf_error
Number of failed batch-index UDF sub-batches for scan/query background UDF jobs. Does not include timeouts. See the following statistics for the underlying operation statuses: batch_sub_lang_delete_success, batch_sub_lang_error, batch_sub_lang_read_success, batch_sub_lang_write_success.
counter
integer
aerospike_namespace_batch_sub_udf_filtered_out
Number of batch-index UDF sub-batches that did not happen because the record was filtered out with Filter Expressions.
counter
integer
aerospike_namespace_batch_sub_udf_timeout
Number of batch-index UDF sub-batches that timed out for scan/query background UDF jobs. See the following statistics for the underlying operation statuses: batch_sub_lang_delete_success, batch_sub_lang_error, batch_sub_lang_read_success, batch_sub_lang_write_success.
counter
integer
aerospike_namespace_batch_sub_write_error
Number of batch-index write sub-batches that failed with an error. For example: invalid set name, unavailable (if SC), failure to apply a predexp filter, key mismatch (if key was sent), device error (I/O error), key busy (duplicate resolution or if SC), or a problem during a bitwise, HLL, or CDT operation.
counter
integer
aerospike_namespace_batch_sub_write_filtered_out
Number of batch-index write sub-batches that did not happen because the record was filtered out with Filter Expressions.
counter
integer
aerospike_namespace_batch_sub_write_success
Number of records successfully written by batch-index sub-batches.
counter
integer
aerospike_namespace_batch_sub_write_timeout
Number of batch-index write sub-batches that timed out.
counter
integer
aerospike_namespace_bin_names
Number of bin names used for the namespace.
The formula for the associated metrics is as follows:
bin_names_quota - bin_names = available_bin_names
gauge
integer
aerospike_namespace_bin_names_quota
Quota of bin names for the namespace. Starting with Database 7.0, there is no limit on bin names per namespace. In Database 5.0 and 6.0, the limit was 65,535.
The formula for the associated metrics is as follows:
bin_names_quota - bin_names = available_bin_names
If you have met the quota, see KB article How to clear up bin names when they exceed the limits.
gauge
integer
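The relationship between the three bin-name metrics can be sketched in a few lines of Python. This is a minimal illustration, not an Aerospike API: the function name and the sample bin count are assumptions, and the inputs are expected to come from the bin_names_quota and bin_names metrics.

```python
def available_bin_names(bin_names_quota: int, bin_names: int) -> int:
    """Remaining bin names, following bin_names_quota - bin_names."""
    return bin_names_quota - bin_names


# Hypothetical sample: 65,535 is the quota cited for Database 5.0 and 6.0;
# the bin_names value of 1,200 is made up for illustration.
remaining = available_bin_names(bin_names_quota=65_535, bin_names=1_200)
print(remaining)
```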
aerospike_namespace_cache_read_pct
Percentage of read commands that hit the post-write-cache or the blocks in the max-write-cache, saving an I/O to the underlying storage device.
See the post-write-cache and read-page-cache documentation for ways to improve the latency of read-intensive workloads by leveraging these two caching options.
Reads from update commands as well as migrations, scans, XDR reads, and anything else that tries to load a record off the device are accounted for in the cache_read_pct figures.
gauge
integer
aerospike_namespace_client_delete_error
Number of client delete commands that failed with an error.
Compare client_delete_error to client_delete_success.
If the ratio is higher than acceptable, alert operations to investigate.
counter
integer
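Because client_delete_error and client_delete_success are cumulative counters, the comparison is most meaningful over the interval between two samples. The following Python sketch shows one way to compute that ratio. The helper name, the sample dictionaries, and the 1% threshold are all hypothetical; the acceptable ratio depends on your workload.

```python
def error_ratio(prev, curr, error_key, success_key):
    """Errors per success over the interval between two counter samples."""
    errors = curr[error_key] - prev[error_key]
    successes = curr[success_key] - prev[success_key]
    if successes <= 0:
        # No successes in the interval: any error counts as an unbounded ratio.
        return float("inf") if errors > 0 else 0.0
    return errors / successes


# Hypothetical samples taken at two points in time.
prev = {"client_delete_error": 10, "client_delete_success": 1000}
curr = {"client_delete_error": 15, "client_delete_success": 2000}

ratio = error_ratio(prev, curr, "client_delete_error", "client_delete_success")

ACCEPTABLE_RATIO = 0.01  # assumption: pick a value that fits your workload
if ratio > ACCEPTABLE_RATIO:
    print("alert operations: delete error ratio is", ratio)
```

The same helper could be applied to the other error and success pairs, such as client_read_error and client_read_success.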
aerospike_namespace_client_delete_filtered_out
Number of client delete commands that did not happen because the record was filtered out with Filter Expressions.
counter
integer
aerospike_namespace_client_delete_not_found
Number of client delete commands that resulted in a not found.
counter
integer
aerospike_namespace_client_delete_success
Number of successful client delete commands.
counter
integer
aerospike_namespace_client_delete_timeout
Number of client delete commands that timed out.
counter
integer
aerospike_namespace_client_lang_delete_success
Number of UDF commands that successfully deleted a record.
counter
integer
aerospike_namespace_client_lang_error
Number of UDF commands that failed with a language (Lua) error during UDF execution.
counter
integer
aerospike_namespace_client_lang_read_success
Number of successful record reads caused by a UDF command.
counter
integer
aerospike_namespace_client_lang_write_success
Number of successful record writes caused by a UDF command.
counter
integer
aerospike_namespace_client_proxy_complete
Number of client commands proxied to another node.
counter
integer
aerospike_namespace_client_proxy_error
Number of client commands that failed to proxy to another node.
counter
integer
aerospike_namespace_client_proxy_timeout
Number of client commands that timed out while being proxied to another node.
counter
integer
aerospike_namespace_client_read_error
Number of read commands that failed with an error. For example: invalid set name, unavailable (if SC), failure to apply a predexp filter, key mismatch (if key was sent), device error (I/O error), key busy (duplicate resolution or if SC), or a problem during a bitwise, HLL, or CDT operation.
Compare client_read_error to client_read_success.
If the ratio is higher than acceptable, alert operations to investigate.
counter
integer
aerospike_namespace_client_read_filtered_out
Number of read commands that did not happen because they were filtered out.
counter
integer
aerospike_namespace_client_read_not_found
Number of client read commands that resulted in not found.
counter
integer
aerospike_namespace_client_read_success
Number of successful client read commands. Does not include records read by batch-reads or scans. Batch-reads have the separate batch_sub_read_success metric. Scans have separate metrics depending on the type of scan: scan_basic_complete, scan_aggr_complete, scan_ops_bg_complete, and scan_udf_bg_complete.
counter
integer
aerospike_namespace_client_read_timeout
Number of client read commands that timed out.
counter
integer
aerospike_namespace_client_tsvc_error
Number of client commands that failed in the transaction service, before attempting to handle the transaction. For example, protocol errors or security permission mismatch. In strong-consistency enabled namespaces, this includes commands against unavailable_partitions and dead_partitions.
The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc errors happen before records are accessed for reads or writes. They’re counted separately from tsvc timeouts.
counter
integer
aerospike_namespace_client_tsvc_timeout
Number of client commands that timed out while in the transaction service, before attempting to handle the command. At this stage the command has not yet been identified as a read or a write, but the namespace is known. A likely cause is that there are not enough service threads to keep pace with the workload. Other common situations in this category are commands that have to be retried after waiting in the rw-hash (for example, hot keys) and use cases where the timeout set by the client is too aggressive.
The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc errors happen before records are accessed for reads or writes. They’re counted separately from tsvc timeouts.
counter
integer
aerospike_namespace_client_udf_complete
Number of completed UDF commands initiated by the client.
counter
integer
aerospike_namespace_client_udf_error
Number of failed UDF commands initiated by the client. Does not include timeouts. Error is also returned to the client.
Compare client_udf_error to client_udf_complete.
If the ratio is higher than acceptable, alert operations to investigate.
counter
integer
aerospike_namespace_client_udf_filtered_out
Number of client UDF commands that did not happen because the record was filtered out with Filter Expressions.
counter
integer
aerospike_namespace_client_udf_timeout
Number of UDF commands initiated by the client that timed out. The timeout error is returned to the client.
counter
integer
aerospike_namespace_client_write_error
Number of client write commands that failed with an error. Includes common errors like fail_generation, fail_key_busy, fail_record_too_big, fail_xdr_forbidden, and some less common errors. Includes xdr_client_write_error.
Compare client_write_error to client_write_success.
If the ratio is higher than acceptable, alert operations to investigate.
For details on the types of errors that increment this statistic, see the knowledge base article Why is my client_write_error metrics incrementing?.
counter
integer
aerospike_namespace_client_write_filtered_out
Number of client write commands that did not happen because the record was filtered out with Filter Expressions.
counter
integer
aerospike_namespace_client_write_success
Number of successful client write commands. Includes xdr_client_write_success
.
counter
integer
aerospike_namespace_client_write_timeout
Number of client write commands that timed out on the server. On a stable cluster with no migrations in progress, this metric indicates the number of replica write timeouts. A timeout error is returned to the client. In strong-consistency enabled namespaces, the record is marked as unreplicated and will re-replicate. Includes xdr_client_write_timeout.
counter
integer
The following conditions can cause this metric to increment:
- Every single write replica failure (master failing to replicate) increments the client_write_timeout metric.
- If duplicate resolution is enabled for writes (default), during migrations, the client_write_timeout metric also increments if there is a timeout during duplicate resolution, which could occur before the write is applied on the master side.
- See transaction-max-ms for details on when the server checks for timeout. Transactions can also time out earlier in the transaction flow, in which case the client_tsvc_timeout statistic increments.
aerospike_namespace_clock_skew_stop_writes
Namespace will stop accepting client writes when true.
For strong-consistency enabled namespaces, will be true if the clock skew is outside of tolerance, typically 20 seconds.
For Available mode (AP) namespaces running Database 4.5.1 or later, and where NSUP is enabled (nsup-period not zero), will be true if the cluster clock skew exceeds 40 seconds. In such occurrences, NSUP will also not run, disabling record expirations and evictions until the clock skew falls back in the tolerated range.
If clock_skew_stop_writes is true, it is a critical ALERT.
Verify that clocks are synchronized across the cluster.
gauge
boolean
aerospike_namespace_current_time
Current time represented as Aerospike epoch time.
If cluster_max(current_time) and cluster_min(current_time) differ by more than 10 seconds, critical ALERT.
Server time skew might indicate that NTP or similar service is not running on this node.
gauge
integer
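The cluster-wide comparison described above can be sketched as follows. This is an illustrative Python snippet, not part of any Aerospike tool: the node names and sample values are hypothetical, and each value is assumed to be the current_time metric collected from one node at roughly the same moment.

```python
def time_skew_seconds(current_times):
    """Difference between the largest and smallest current_time in the cluster."""
    return max(current_times) - min(current_times)


def is_critical_skew(current_times, threshold_seconds=10):
    """True when the spread exceeds the 10-second threshold described above."""
    return time_skew_seconds(current_times) > threshold_seconds


# Hypothetical per-node samples of current_time (Aerospike epoch seconds).
node_times = {"node-a": 500_000_000, "node-b": 500_000_004, "node-c": 500_000_013}

skew = time_skew_seconds(node_times.values())
if is_critical_skew(node_times.values()):
    print("critical ALERT: clock skew of", skew, "seconds")
```

Collection delay between nodes inflates the apparent skew, so samples should be gathered as close together as possible.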
aerospike_namespace_data_avail_pct
Measures the minimum contiguous storage-engine device, pmem, or memory storage file space across all such files in a namespace. The namespace is read-only if this value falls below stop-writes-avail-pct. It is important for all configured storage files in a namespace to have the same size; otherwise, data_avail_pct could be low even when a lot of space is available across other files.
gauge
integer
Example: Given 5 files of 96MiB each for a given namespace, where each file has 24MiB of data spread across 6 write blocks (with the 8MiB write-block size):
- The data_used_pct is 25% (24MiB of data in each 96MiB file).
- The data_avail_pct is 50% (6 of the 12 write blocks in each file are free).
- If the distribution is not perfectly uniform (which is usual), data_avail_pct represents the file that has the fewest free blocks.
Warn your operations group about any of the following conditions:
- If data_avail_pct drops below 20%, the defrag may not be able to keep up with the current load.
- If data_avail_pct drops below 15%, this is a critical ALERT.
- If data_avail_pct drops below 5%, this condition might result in stop_writes.
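The example and alert thresholds for data_avail_pct can be sketched in Python. This is a simplified model under stated assumptions: it treats the share of free write blocks in each file as that file's available space, it takes the minimum across files, and the function names and alert labels are hypothetical.

```python
MIB = 1024 * 1024


def file_avail_pct(file_bytes, blocks_in_use, write_block_bytes):
    """Percentage of write blocks in one file that are still free."""
    total_blocks = file_bytes // write_block_bytes
    free_blocks = total_blocks - blocks_in_use
    return free_blocks * 100 // total_blocks


def namespace_avail_pct(files, write_block_bytes):
    """The namespace value is driven by the file with the fewest free blocks."""
    return min(
        file_avail_pct(size, in_use, write_block_bytes) for size, in_use in files
    )


def avail_alert_level(avail_pct):
    """Map data_avail_pct to the thresholds listed above (labels are made up)."""
    if avail_pct < 5:
        return "stop_writes_risk"
    if avail_pct < 15:
        return "critical"
    if avail_pct < 20:
        return "warning"
    return "ok"


# Five 96MiB files, each with data spread across 6 write blocks of 8MiB.
files = [(96 * MIB, 6)] * 5
avail = namespace_avail_pct(files, 8 * MIB)
print(avail, avail_alert_level(avail))
```

With a non-uniform distribution, a single nearly full file pulls the namespace value down even when the other files have plenty of free blocks.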
aerospike_namespace_data_compression_ratio
Measures the average compressed size to uncompressed size ratio. Thus 1.000 indicates no compression and 0.100 indicates a 1:10 compression ratio (90% reduction in size). data_compression_ratio is not included if the compression configuration parameter is set to none.
gauge
integer
The compression ratio is a moving average calculated based on the most recently written records. Read records do not factor into the ratio. Records that don’t try to compress are not included in the moving average. If the written data changes over time, then the compression ratio changes with it. In case of a sudden change in data, the indicated compression ratio may lag. As a rule of thumb, assume that the compression ratio covers the most recently written 100,000 to 1,000,000 records.
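To relate the reported ratio to a size reduction, the arithmetic can be sketched in Python. The function name is hypothetical and the snippet only restates the interpretation given above, where 1.000 means no compression and 0.100 means a 90% reduction.

```python
def size_reduction_pct(compression_ratio):
    """Convert a compressed/uncompressed ratio into a percentage size reduction."""
    return round((1 - compression_ratio) * 100, 1)


# 0.100 corresponds to a 1:10 ratio; 1.000 corresponds to no compression.
print(size_reduction_pct(0.100))
print(size_reduction_pct(1.000))
```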
aerospike_namespace_data_total_bytes
Regardless of storage-engine, the total allocated storage.
gauge
integer
aerospike_namespace_data_used_bytes
Regardless of storage-engine, the total storage allocated is data_total_bytes, and the amount of data used in that storage is data_used_bytes.
gauge
integer
aerospike_namespace_data_used_pct
Percentage of used storage capacity for this namespace. Calculated as data_used_bytes * 100 / data_total_bytes. Evictions will be triggered when this percentage crosses the configured evict-used-pct.
gauge
integer
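The calculation can be sketched in Python as follows. This is an illustration under assumptions: the byte values and the evict-used-pct setting of 70 are hypothetical, integer truncation is assumed because the metric is reported as an integer, and "crosses" is modeled as strictly greater than.

```python
GIB = 1024 ** 3


def data_used_pct(data_used_bytes, data_total_bytes):
    """data_used_bytes * 100 / data_total_bytes, truncated to an integer."""
    return data_used_bytes * 100 // data_total_bytes


def crosses_evict_threshold(used_pct, evict_used_pct):
    """Assumption: eviction is considered once used_pct exceeds the setting."""
    return used_pct > evict_used_pct


# Hypothetical namespace: 300GiB used out of 1024GiB allocated.
used_pct = data_used_pct(300 * GIB, 1024 * GIB)
print(used_pct, crosses_evict_threshold(used_pct, evict_used_pct=70))
```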
aerospike_namespace_dead_partitions
Number of dead partitions for this namespace when using strong-consistency. This is the number of partitions that are unavailable when all roster nodes are present. Requires the use of the revive command to make them available again. Revived nodes restore availability only when all nodes are trusted.
If dead_partitions is not zero, critical ALERT. If you are certain that there are no potential data inconsistencies or if data inconsistencies are acceptable, consider issuing revive and recluster commands.
gauge
integer
aerospike_namespace_deleted_last_bin
Number of objects deleted because their last bin was deleted.
counter
integer
aerospike_namespace_device_available_pct
Measures the minimum contiguous disk space across all devices in a namespace. The namespace will be read only (stop writes) if this value falls below min-avail-pct. It is important for all configured devices in a namespace to have the same size; otherwise, the device_available_pct could be low even when a lot of space is available across other devices.
- If device_available_pct drops below 20%, warn your operations group. This condition might indicate that defrag is unable to keep up with the current load.
- If device_available_pct drops below 15%, critical ALERT.
- If device_available_pct drops below 5%, usable disk resources are critically low. This condition might result in stop_writes.
gauge
integer
Not to be confused with device_free_pct, which represents the amount of free space across all devices in a namespace and does not take fragmentation into account. Here is an example to illustrate the difference between device_free_pct and device_available_pct. Assume 5 devices of 100MiB each for a given namespace, where each device has 20MiB of data spread across 5 write-blocks (where each write-block is 8MiB):
- The device_free_pct would be 80%.
- The device_available_pct would be 60%.
- If the distribution is not uniform (it usually is not perfectly uniform), the device_available_pct would represent the device that has the fewest free blocks.
aerospike_namespace_device_compression_ratio
Measures the average compressed size to uncompressed size ratio. 1.000 indicates no compression and 0.100 indicates a 1:10 compression ratio (90% reduction in size). device_compression_ratio will not be included if compression is set to none.
moving average
decimal
The compression ratio is a moving average. It is calculated based on the most recently written records. Read records do not factor into the ratio. Records that don’t try to compress are not included in the moving average. If the written data changes over time then the compression ratio will change with it. In case of a sudden change in data, the indicated compression ratio may lag behind a bit. As a rule of thumb, assume that the compression ratio covers the most recently written 100,000 to 1,000,000 records.
aerospike_namespace_device_free_pct
Percentage of disk capacity free for this namespace. This is the amount of free storage across all devices in the namespace. Evictions will be triggered when the used percentage across all devices (which is represented by 100 - device_free_pct) crosses the configured high-water-disk-pct.
gauge
integer
Not to be confused with device_available_pct, which represents the amount of free contiguous space on the device that has the least contiguous free space across the namespace. Here is an example to illustrate the difference between device_free_pct and device_available_pct. Assume 5 devices of 100MB each for a given namespace, where each device has 25MB of data spread across 50 write blocks (let’s assume a 1MB write-block-size):
- The device_free_pct would be 75%.
- The device_available_pct would be 50%.
- If the distribution is not uniform (it usually is not perfectly uniform), the device_available_pct would represent the device that has the fewest free blocks.
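The difference between the two metrics in this example can be sketched in Python. This is a simplified model, not the server's implementation: device_free_pct is computed from bytes across all devices, device_available_pct is approximated as the share of free write blocks on the worst device, and the dictionary keys and function names are hypothetical.

```python
def device_free_pct(devices):
    """Free space across all devices, ignoring fragmentation."""
    total = sum(d["size_mb"] for d in devices)
    used = sum(d["data_mb"] for d in devices)
    return (total - used) * 100 // total


def device_available_pct(devices, write_block_mb):
    """Share of free write blocks on the device that has the fewest of them."""
    pcts = []
    for d in devices:
        total_blocks = d["size_mb"] // write_block_mb
        free_blocks = total_blocks - d["blocks_in_use"]
        pcts.append(free_blocks * 100 // total_blocks)
    return min(pcts)


# Five 100MB devices, each holding 25MB of data spread across 50 1MB write blocks.
devices = [{"size_mb": 100, "data_mb": 25, "blocks_in_use": 50} for _ in range(5)]

free_pct = device_free_pct(devices)
available_pct = device_available_pct(devices, write_block_mb=1)
print(free_pct, available_pct)
```

The gap between the two values reflects fragmentation: the data occupies only 25MB per device but is scattered across 50 partially filled write blocks.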
aerospike_namespace_device_total_bytes
Total bytes of disk space allocated to this namespace on this node.
gauge
integer
aerospike_namespace_device_used_bytes
Total bytes of disk space used by this namespace on this node.
Trending device_used_bytes provides operations insight into how disk usage changes over time for this namespace.
gauge
integer
aerospike_namespace_dup_res_ask
Number of duplicate resolution requests made by the node to other individual nodes.
counter
integer
aerospike_namespace_dup_res_respond_no_read
Number of duplicate resolution requests handled by the node without reading the record.
counter
integer
aerospike_namespace_dup_res_respond_read
Number of duplicate resolution requests handled by the node where the record was read.
counter
integer
aerospike_namespace_effective_active_rack
The effective active-rack for the namespace. The configured active rack owns all of the master partition copies.
For strong consistency-enabled namespaces, this is the roster’s current active rack. Otherwise, it is the configured active-rack.
gauge
integer
aerospike_namespace_effective_is_quiesced
Reports ‘true’ when the namespace has rebalanced after previously receiving a quiesce info request.
gauge
integer
aerospike_namespace_effective_prefer_uniform_balance
Applies only to Enterprise Edition. Value can be true or false. If Aerospike applied the uniform balance algorithm for the current cluster state, the value returned is true. If any node having this namespace isn’t configured with prefer-uniform-balance true, the value returned is false and the uniform balance algorithm is disabled for this namespace on all participating nodes.
gauge
integer
aerospike_namespace_effective_replication_factor
The effective replication factor for the namespace, included with the namespace info command metrics.
The effective replication factor is less than the replication-factor if the cluster size is smaller than the RF, in which case the effective replication factor would match the cluster size.
In Database 5.7 and earlier, if the paxos-single-replica-limit size is reached, the effective replication factor is 1.
The effective replication factor is 0 for a node that has been orphaned by the cluster. For example, if a node tries to join a cluster but that node is unable to communicate with every other node in the cluster, the principal node rejects the request and the node marks itself as an orphan.
gauge
integer
For AP namespaces in Database 7.1 and earlier, the effective replication factor drops when a node is shut down or crashes and the remaining nodes are fewer than the RF.
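The basic rule for the effective replication factor can be sketched in Python. This is a deliberately simplified model: the function name and parameters are hypothetical, it covers only the cluster-size and orphaned-node cases described for this metric, and it ignores paxos-single-replica-limit and version-specific behavior.

```python
def effective_replication_factor(configured_rf, cluster_size, orphaned=False):
    """Simplified model: capped by cluster size, and 0 for an orphaned node."""
    if orphaned:
        return 0
    return min(configured_rf, cluster_size)


print(effective_replication_factor(configured_rf=3, cluster_size=2))
print(effective_replication_factor(configured_rf=2, cluster_size=5))
print(effective_replication_factor(configured_rf=2, cluster_size=5, orphaned=True))
```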
aerospike_namespace_evict_ttl
The current eviction depth, or the highest ttl of records that have been evicted, in seconds.
gauge
integer
aerospike_namespace_evict_void_time
The current eviction depth, expressed as a void time in seconds since 1 January 2010 UTC.
gauge
integer
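Because evict_void_time is expressed in seconds since 1 January 2010 UTC, it can be converted to a calendar timestamp with a short Python helper. The function name and sample value below are hypothetical; only the epoch comes from the description above.

```python
from datetime import datetime, timedelta, timezone

# Aerospike epoch as stated in the metric description: 1 January 2010 UTC.
AEROSPIKE_EPOCH = datetime(2010, 1, 1, tzinfo=timezone.utc)


def void_time_to_datetime(void_time_seconds):
    """Convert a void time in Aerospike epoch seconds to a UTC datetime."""
    return AEROSPIKE_EPOCH + timedelta(seconds=void_time_seconds)


# Hypothetical sample: 31,536,000 seconds is exactly 365 days after the epoch.
print(void_time_to_datetime(31_536_000).isoformat())
```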
aerospike_namespace_evicted_objects
Number of objects evicted from this namespace on this node since the server started.
counter
integer
aerospike_namespace_fail_client_lost_conflict
Number of non-XDR write commands that failed because some bin’s last-update-time is greater than the write command’s time. Error code 28 is returned. This can happen only when the XDR bin convergence feature is enabled. This can happen due to either:
- a clock skew across DCs causing XDR write commands to write bins with a future timestamp compared to local time.
- a race condition between an incoming XDR write command and a local client write command.
See fail_xdr_lost_conflict and cluster_max_compatibility_id.
counter
integer
aerospike_namespace_fail_generation
Number of read/write commands failed on generation check.
counter
integer
aerospike_namespace_fail_key_busy
Number of read/write commands that failed on ‘hot keys’, meaning there were already a number of commands queued up higher than transaction-pending-limit for the same record waiting in the rw-hash or rw_in_progress. For reads, this can only happen when duplicate resolution is necessary.
If the application is not expected to have hot keys and the fail_key_busy rate of change exceeds expectations, this condition might indicate a problem with the application.
counter
integer
Detail level logging for the rw context will log transactions (digest) triggering this error. Read transactions would only fail if they had to go through the rw-hash (for example, if duplicate resolution is in effect).
aerospike_namespace_fail_mrt_blocked
Number of transactions or read/write commands blocked by an ongoing transaction.
gauge
integer
aerospike_namespace_fail_mrt_version_mismatch
Number of version mismatches - usually in verify reads, but also individual commands (reads/writes/deletes/UDFs) where version checks occur if the record had previously been read in the transaction.
gauge
integer
aerospike_namespace_fail_record_too_big
Number of write commands that failed because a record was larger than max-record-size. Only counts client write failures on the master side.
counter
integer
Detail level logging for the rw context will log transactions (digest) triggering this error (originating from client side master writes). Enabling detail level logging for the drv_ssd context will log all attempts at writing records that are too big, including replica-writes, immigration (migrations) writes and applying duplicate resolution winners. See “How do I change the write-block-size configuration?” for more information.
aerospike_namespace_fail_xdr_forbidden
Number of read/write commands that failed due to configuration restriction. Error code 22 is returned. This counts any of the traffic rejected due to either of the following:
- incoming XDR traffic (xdr-write stat) and allow-xdr-writes set to false.
- non-XDR write traffic and allow-nonxdr-writes set to false.
counter
integer
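The two rejection conditions can be expressed as a small decision function. This Python sketch only mirrors the rule stated for this metric; the function and parameter names are hypothetical and it is not how the server implements the check.

```python
def is_write_forbidden(is_xdr_write, allow_xdr_writes, allow_nonxdr_writes):
    """True when a write would be rejected by the configuration (error code 22)."""
    if is_xdr_write:
        # Incoming XDR traffic is rejected when allow-xdr-writes is false.
        return not allow_xdr_writes
    # Non-XDR write traffic is rejected when allow-nonxdr-writes is false.
    return not allow_nonxdr_writes


# Hypothetical namespace that accepts only XDR writes.
print(is_write_forbidden(is_xdr_write=True, allow_xdr_writes=True, allow_nonxdr_writes=False))
print(is_write_forbidden(is_xdr_write=False, allow_xdr_writes=True, allow_nonxdr_writes=False))
```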
aerospike_namespace_fail_xdr_key_busy
Number of XDR key-busy errors (code 32) that have occurred. This error is raised if either of the following occurs:
- ship-versions-policy is all and a new write is attempted before the most recent update to the record successfully shipped to the destination.
- ship-versions-policy is interval and a new write is attempted before at least one version has shipped in the most recent ship-versions-interval.
counter
integer
aerospike_namespace_fail_xdr_lost_conflict
Number of XDR write commands that did not succeed in updating all the attempted bins. Only a subset of bin updates might have failed or all the bin updates might have failed. This can happen only when the XDR bin convergence feature is enabled. If a conflicting write happens on the same record across two or more data centers, the bin with the earlier last update time will lose during XDR shipping. An XDR retry due to a timeout, where a record that has already been successfully updated at a destination is received again, would fail and this metric will be updated. In other retry scenarios, such as key busy or device busy, the remote record will not be updated. Only a timeout-based retry can lead to this situation. See fail_client_lost_conflict.
counter
integer
aerospike_namespace_from_proxy_batch_sub_delete_error
Number of batch-index delete subtransactions proxied from another node that failed with an error.
counter
integer
aerospike_namespace_from_proxy_batch_sub_delete_filtered_out
Number of batch-index delete subtransactions proxied from another node that did not happen because the record was filtered out with Filter Expressions.
counter
integer
aerospike_namespace_from_proxy_batch_sub_delete_not_found
Number of batch-index delete subtransactions proxied from another node that resulted in not found.
counter
integer
aerospike_namespace_from_proxy_batch_sub_delete_success
Number of records successfully deleted by batch-index subtransactions proxied from another node.
counter
integer
aerospike_namespace_from_proxy_batch_sub_delete_timeout
Number of batch-index delete subtransactions proxied from another node that timed out.
counter
integer
aerospike_namespace_from_proxy_batch_sub_lang_delete_success
Number of successful batch-index UDF delete subtransactions proxied from another node.
counter
integer
aerospike_namespace_from_proxy_batch_sub_lang_error
Number of language (Lua) batch-index errors for UDF subtransactions proxied from another node.
counter
integer
aerospike_namespace_from_proxy_batch_sub_lang_read_success
Number of successful batch-index UDF read subtransactions proxied from another node.
counter
integer
aerospike_namespace_from_proxy_batch_sub_lang_write_success
Number of successful batch-index UDF write subtransactions proxied from another node.
counter
integer
aerospike_namespace_from_proxy_batch_sub_read_error
Number of batch-index read subtransactions proxied from another node that failed with an error.
counter
integer
aerospike_namespace_from_proxy_batch_sub_read_filtered_out
Number of batch-index read subtransactions proxied from another node that did not happen because the record was filtered out with Filter Expressions.
counter
integer
aerospike_namespace_from_proxy_batch_sub_read_not_found
Number of batch-index read subtransactions proxied from another node that resulted in not found.
counter
integer
aerospike_namespace_from_proxy_batch_sub_read_success
Number of records successfully read by batch-index subtransactions proxied from another node.
counter
integer
aerospike_namespace_from_proxy_batch_sub_read_timeout
Number of batch-index read subtransactions proxied from another node that timed out.
counter
integer
aerospike_namespace_from_proxy_batch_sub_tsvc_error
Number of batch-index read subtransactions proxied from another node that failed with an error in the transaction service, before attempting to handle the transaction. For example, protocol errors or security permission mismatch. In strong-consistency
enabled namespaces, this will include transactions against unavailable_partitions
and dead_partitions
.
The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc
errors happen before records are accessed for reads or writes. They’re counted separately from tsvc
timeouts.
counter
integer
aerospike_namespace_from_proxy_batch_sub_tsvc_timeout
Number of batch-index read subtransactions proxied from another node that timed out in the transaction service, before attempting to handle the transaction.
The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc
errors happen before records are accessed for reads or writes. They’re counted separately from tsvc
timeouts.
counter
integer
aerospike_namespace_from_proxy_batch_sub_udf_complete
Number of completed batch-index UDF subtransactions proxied from another node for scan/query background UDF jobs. See the following statistics for the underlying operation statuses: from_proxy_batch_sub_lang_delete_success
, from_proxy_batch_sub_lang_error
, from_proxy_batch_sub_lang_read_success
, from_proxy_batch_sub_lang_write_success
.
counter
integer
aerospike_namespace_from_proxy_batch_sub_udf_error
Number of failed batch-index UDF subtransactions proxied from another node for scan/query background UDF jobs. Does not include timeouts. See the following statistics for the underlying operation statuses: from_proxy_batch_sub_lang_delete_success
, from_proxy_batch_sub_lang_error
, from_proxy_batch_sub_lang_read_success
, from_proxy_batch_sub_lang_write_success
.
counter
integer
aerospike_namespace_from_proxy_batch_sub_udf_filtered_out
Number of batch-index UDF subtransactions proxied from another node that did not happen because the record was filtered out with Filter Expressions.
counter
integer
aerospike_namespace_from_proxy_batch_sub_udf_timeout
Number of batch-index UDF subtransactions proxied from another node that timed out for scan/query background UDF jobs. See the following statistics for the underlying operation statuses: from_proxy_batch_sub_lang_delete_success
, from_proxy_batch_sub_lang_error
, from_proxy_batch_sub_lang_read_success
, from_proxy_batch_sub_lang_write_success
.
counter
integer
aerospike_namespace_from_proxy_batch_sub_write_error
Number of batch-index write subtransactions proxied from another node that failed with an error.
counter
integer
aerospike_namespace_from_proxy_batch_sub_write_filtered_out
Number of batch-index write subtransactions proxied from another node that did not happen because the record was filtered out with Filter Expressions.
counter
integer
aerospike_namespace_from_proxy_batch_sub_write_success
Number of records successfully written by batch-index subtransactions proxied from another node.
counter
integer
aerospike_namespace_from_proxy_batch_sub_write_timeout
Number of batch-index write subtransactions proxied from another node that timed out.
counter
integer
aerospike_namespace_from_proxy_delete_error
Number of errors for delete transactions proxied from another node. This includes xdr_from_proxy_delete_error
.
counter
integer
aerospike_namespace_from_proxy_delete_filtered_out
Number of delete transactions proxied from another node that did not happen because the record was filtered out with Filter Expressions.
counter
integer
aerospike_namespace_from_proxy_delete_not_found
Number of delete transactions proxied from another node that resulted in not found. This includes xdr_from_proxy_delete_not_found
.
counter
integer
aerospike_namespace_from_proxy_delete_success
Number of successful delete transactions proxied from another node. This includes xdr_from_proxy_delete_success
.
counter
integer
aerospike_namespace_from_proxy_delete_timeout
Number of timeouts for delete transactions proxied from another node. This includes xdr_from_proxy_delete_timeout
.
counter
integer
aerospike_namespace_from_proxy_lang_delete_success
Number of successful UDF delete transactions proxied from another node.
counter
integer
aerospike_namespace_from_proxy_lang_error
Number of language (Lua) errors for UDF transactions proxied from another node.
counter
integer
aerospike_namespace_from_proxy_lang_read_success
Number of successful UDF read commands proxied from another node.
counter
integer
aerospike_namespace_from_proxy_lang_write_success
Number of successful UDF write commands proxied from another node.
counter
integer
aerospike_namespace_from_proxy_read_error
Number of errors for read commands proxied from another node.
counter
integer
aerospike_namespace_from_proxy_read_filtered_out
Number of read commands proxied from another node that did not happen because they were filtered out with Filter Expressions.
counter
integer
aerospike_namespace_from_proxy_read_not_found
Number of read commands proxied from another node that resulted in not found.
counter
integer
aerospike_namespace_from_proxy_read_success
Number of successful read commands proxied from another node.
counter
integer
aerospike_namespace_from_proxy_read_timeout
Number of timeouts for read commands proxied from another node.
counter
integer
aerospike_namespace_from_proxy_tsvc_error
Number of commands proxied from another node that failed in the transaction service, before attempting to handle the commands. For example, protocol errors or security permission mismatch. In strong-consistency
enabled namespaces, this will include commands against unavailable_partitions
and dead_partitions
.
The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc
errors happen before records are accessed for reads or writes. They’re counted separately from tsvc
timeouts.
counter
integer
aerospike_namespace_from_proxy_tsvc_timeout
Number of commands proxied from another node that timed out while in the transaction service, before attempting to handle the commands. At this stage the command has not yet been identified as a read or a write, but the namespace is known. There could be congestion in the internal transaction queue, or the timeout set by the client could be too aggressive.
The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc
errors happen before records are accessed for reads or writes. They’re counted separately from tsvc
timeouts.
counter
integer
aerospike_namespace_from_proxy_udf_complete
Number of successful UDF commands proxied from another node.
counter
integer
aerospike_namespace_from_proxy_udf_error
Number of errors for UDF commands proxied from another node.
counter
integer
aerospike_namespace_from_proxy_udf_filtered_out
Number of UDF commands proxied from another node that did not happen because the record was filtered out with Filter Expressions.
counter
integer
aerospike_namespace_from_proxy_udf_timeout
Number of timeouts for UDF commands proxied from another node.
counter
integer
aerospike_namespace_from_proxy_write_error
Number of errors for write commands proxied from another node. This includes xdr_from_proxy_write_error
.
counter
integer
aerospike_namespace_from_proxy_write_filtered_out
Number of write commands proxied from another node that did not happen because the record was filtered out with Filter Expressions.
counter
integer
aerospike_namespace_from_proxy_write_success
Number of successful write commands proxied from another node. This includes xdr_from_proxy_write_success
.
counter
integer
aerospike_namespace_from_proxy_write_timeout
Number of timeouts for write commands proxied from another node. This includes xdr_from_proxy_write_timeout
.
counter
integer
aerospike_namespace_geo_region_query_cells
Number of cell coverings for the queried region.
counter
integer
aerospike_namespace_geo_region_query_falsepos
Number of points outside the region. Total query result points is geo_region_query_points
+ geo_region_query_falsepos
.
gauge
integer
aerospike_namespace_geo_region_query_points
Number of points within the region. Total query result points is geo_region_query_points
+ geo_region_query_falsepos
.
gauge
integer
aerospike_namespace_geo_region_query_reqs
Number of geo queries on the system since the node started.
counter
integer
aerospike_namespace_hwm_breached
If true, Aerospike has breached ‘high-water-[disk|memory]-pct’ for this namespace.
If hwm_breached
is true, alert your operations group that memory or disk resources are strained. This condition might indicate the need to increase cluster capacity.
gauge
boolean
aerospike_namespace_index-type.mount[ix].age
Applies only to Enterprise Edition configured to index-type
flash
. This shows the percentage of lifetime (total usage) claimed by OEM for underlying device. Value is -1 unless underlying device is NVMe and may exceed 100. ‘ix’ is the device index. For example, storage-engine.file[0]=/opt/aerospike/test0.dat and storage-engine.file[1]=/opt/aerospike/test2.dat for 2 files specified in the configuration.
gauge
integer
aerospike_namespace_index_flash_alloc_bytes
Applies only to Enterprise Edition configured with index-type
flash
. Total bytes allocated on the mount(s) for the primary index used by this namespace on this node. This statistic represents entire 4KiB chunks which have at least one element in use. Also available in the log on the index-flash-usage ticker entry.
gauge
integer
aerospike_namespace_index_flash_alloc_pct
Applies only to Enterprise Edition configured with index-type
flash
. Percentage of the mount(s) allocated for the primary index used by this namespace on this node. Prior to Database 7.0, calculated as (index_flash_alloc_bytes
/ index-type.mounts-size-limit
) * 100. In Database 7.0 and later, calculated as (index_flash_alloc_bytes
/ index-type.mounts-budget
) * 100. This statistic represents entire 4KiB chunks which have at least one element in use. Also available in the log on the index-flash-usage ticker entry.
If index_flash_alloc_pct
gets close to or greater than 100%, alert operations to review the sizing of the namespace.
gauge
integer
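As a rough illustration of the calculation described for index_flash_alloc_pct, the following Python sketch computes the percentage from a byte count and a configured budget. The function name and the sample values are hypothetical. The divisor stands for index-type.mounts-budget in Database 7.0 and later, or index-type.mounts-size-limit in earlier versions. The server reports an integer, and its rounding is not described here, so the sketch returns a float.

```python
def index_flash_alloc_pct(alloc_bytes: int, budget_bytes: int) -> float:
    """Return (alloc_bytes / budget_bytes) * 100.

    budget_bytes stands for index-type.mounts-budget (Database 7.0 and later)
    or index-type.mounts-size-limit (earlier versions).
    """
    if budget_bytes <= 0:
        raise ValueError("budget_bytes must be positive")
    return alloc_bytes / budget_bytes * 100


# Hypothetical sample: 3 GiB allocated against a 4 GiB budget.
GIB = 1024 ** 3
print(index_flash_alloc_pct(3 * GIB, 4 * GIB))  # 75.0
```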
aerospike_namespace_index_flash_used_bytes
Applies only to Enterprise Edition configured with index-type
flash
. Total bytes in-use on the mount(s) for the primary index used by this namespace on this node. This is the same value memory_used_index_bytes
would have if the index were not persisted.
gauge
integer
aerospike_namespace_index_flash_used_pct
Applies only to Enterprise Edition configured with index-type
flash
. Percentage of the mount(s) in-use for the primary index used by this namespace on this node. Calculated as (index_flash_used_bytes
/ index-type.mounts-size-limit
) * 100.
gauge
integer
aerospike_namespace_index_mounts_used_pct
Applies only to Enterprise Edition configured with index-type
pmem
or flash
. Percentage of the mount(s) in-use for the primary index used by this namespace on this node.
gauge
integer
aerospike_namespace_index_pmem_used_bytes
Applies only to Enterprise Edition configured with index-type
pmem
. Total bytes in-use on the mount(s) for the primary index used by this namespace on this node. This is the same value memory_used_index_bytes
would have if the index were not persisted.
gauge
integer
aerospike_namespace_index_pmem_used_pct
Applies only to Enterprise Edition configured with index-type
pmem
. Percentage of the mount(s) in-use for the primary index used by this namespace on this node. Calculated as (index_pmem_used_bytes
/ index-type.mounts-size-limit
) * 100.
gauge
integer
aerospike_namespace_index_used_bytes
Amount of memory occupied by the primary index for this namespace. Applies to all types of index storage (index-type
).
gauge
integer
aerospike_namespace_indexes_memory_used_pct
Combined RAM indexes’ size as a percentage of indexes-memory-budget
when indexes-memory-budget
is configured nonzero.
gauge
integer
aerospike_namespace_master_tombstones
Number of tombstones on this node which are active masters.
gauge
integer
aerospike_namespace_max-evicted-ttl
The highest record TTL that Aerospike has evicted from this namespace.
gauge
integer
aerospike_namespace_max_void_time
Maximum record TTL ever inserted into this namespace.
gauge
integer
aerospike_namespace_memory_free_pct
Percentage of memory capacity free for this namespace.
If memory_free_pct
approaches the configured value for high-water-memory-pct
or stop-writes-pct
, alert operations to investigate the cause. Might indicate a need to reduce the object count or increase capacity and may require further investigation into memory_used_sindex_bytes
if secondary indexes are in use, into memory_used_set_index_bytes
if set indexes are used, or into heap_efficiency_pct
if data is stored in memory.
gauge
integer
aerospike_namespace_memory_used_bytes
Total bytes of memory used by this namespace on this node. Used against the high-water-memory-pct
and stop-writes-pct
thresholds. It represents the sum of the following values:
memory_used_data_bytes
memory_used_index_bytes
memory_used_set_index_bytes
(Database 5.6 and later)
memory_used_sindex_bytes
See heap_allocated_kbytes
for the total amount of memory allocated on a node other than primary index shared memory in Enterprise Edition and, for Database 6.1 and later, secondary index shared memory in Enterprise Edition.
Trending used-bytes-memory
provides operations insight into how memory usage changes over time for this namespace.
gauge
integer
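The sum described for memory_used_bytes can be sketched in Python as follows. This is only an illustration: the helper name and the sample values are hypothetical, and the input is assumed to be a dictionary of statistics already collected from a node. A component that is not reported (for example, memory_used_set_index_bytes before Database 5.6) is treated as 0.

```python
# Component statistics named in the memory_used_bytes description.
COMPONENTS = (
    "memory_used_data_bytes",
    "memory_used_index_bytes",
    "memory_used_set_index_bytes",  # reported in Database 5.6 and later
    "memory_used_sindex_bytes",
)


def sum_memory_components(stats: dict) -> int:
    """Add up the component values; a missing component counts as 0."""
    return sum(int(stats.get(name, 0)) for name in COMPONENTS)


# Hypothetical sample values, not taken from a real node.
sample = {
    "memory_used_data_bytes": 1_000_000,
    "memory_used_index_bytes": 640_000,
    "memory_used_set_index_bytes": 16_000,
    "memory_used_sindex_bytes": 44_000,
}
print(sum_memory_components(sample))  # 1700000
```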
aerospike_namespace_memory_used_data_bytes
Amount of memory occupied by data. See memory_used_bytes
for the total memory accounted for the namespace.
gauge
integer
aerospike_namespace_memory_used_index_bytes
Amount of memory occupied by the index for this namespace. Allocated in shared memory by default (index-type
shmem
) for the Enterprise Edition.
If your index is persisted, either in block storage (index-type
flash
), or in persistent memory (index-type
pmem
) (Database 4.5 and later), refer instead to index_flash_used_bytes
or index_pmem_used_bytes
. For these persisted index configurations, the value of memory_used_index_bytes
is 0
.
See memory_used_bytes
for the total memory accounted for the namespace.
gauge
integer
aerospike_namespace_memory_used_set_index_bytes
Amount of memory occupied by set indexes for this namespace on this node. See memory_used_bytes
for the total memory accounted for the namespace.
gauge
integer
aerospike_namespace_memory_used_sindex_bytes
Amount of memory occupied by secondary indexes for this namespace on this node. See memory_used_bytes
for the total memory accounted for the namespace.
gauge
integer
aerospike_namespace_migrate_fresh_partitions
Number of partitions that are created fresh or empty because a number of nodes, greater than the replication factor, have left the cluster. Applies to AP and SC namespaces.
gauge
integer
aerospike_namespace_migrate_record_receives
Number of record insert requests received by immigration.
counter
integer
aerospike_namespace_migrate_record_retransmits
Number of times emigration has retransmitted records.
counter
integer
Retransmission statistics are collected in the retransmits
ticker log line.
aerospike_namespace_migrate_records_skipped
Number of times emigration did not ship a record because the remote node was already up-to-date.
counter
integer
aerospike_namespace_migrate_records_transmitted
Number of records emigration has read and sent.
counter
integer
aerospike_namespace_migrate_records_unreadable
Number of records skipped during migration because they were unreadable when migrate-skip-unreadable
is enabled.
counter
integer
aerospike_namespace_migrate_rx_instance_count
Number of instance objects managing immigrations.
gauge
integer
aerospike_namespace_migrate_rx_partitions_active
Number of partitions currently immigrating to this node. If migrate_rx_partitions_active
is greater than 0 and cluster is not in maintenance, Operations needs to identify why migrations are running.
gauge
integer
aerospike_namespace_migrate_rx_partitions_initial
Total number of migrations this node will receive during the current migration cycle for this namespace.
gauge
integer
aerospike_namespace_migrate_rx_partitions_remaining
Number of migrations this node has not yet received during the current migration cycle for this namespace.
gauge
integer
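One way to turn migrate_rx_partitions_initial and migrate_rx_partitions_remaining into a progress figure is sketched below in Python. The percentage is a derived value, not a statistic the server reports, and the function name and sample numbers are hypothetical. When no migrations are scheduled, the sketch assumes the cycle counts as complete.

```python
def migrate_rx_progress_pct(initial: int, remaining: int) -> float:
    """Percentage of scheduled immigrations already received in this cycle."""
    if initial <= 0:
        # Assumption: nothing scheduled means nothing left to receive.
        return 100.0
    return (initial - remaining) / initial * 100


# Hypothetical sample: 4096 migrations scheduled, 1024 not yet received.
print(migrate_rx_progress_pct(4096, 1024))  # 75.0
```

The same arithmetic applies to the migrate_tx_partitions_initial and migrate_tx_partitions_remaining pair for emigrations.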
aerospike_namespace_migrate_signals_active
For finished partition migrations on this node, number of outstanding clean-up signals, sent to participating member nodes, waiting for clean-up acknowledgment. Signals are messages that are sent from a partition’s master node to all other nodes that currently have data for the partition. The signals are used to notify all nodes that migrations have completed for this partition and, if they aren’t a replica, that they can now drop the partition.
gauge
integer
aerospike_namespace_migrate_signals_remaining
For unfinished partition migrations on this node, number of clean-up signals to send to participating member nodes, as migration completes. Signals are messages that are sent from a partition’s master node to all other nodes that currently have data for the partition. The signals are used to notify all nodes that migrations have completed for this partition and, if they aren’t a replica, that they can now drop the partition.
gauge
integer
aerospike_namespace_migrate_tx_instance_count
Number of instance objects managing emigrations.
gauge
integer
aerospike_namespace_migrate_tx_partitions_active
Number of partitions currently emigrating from this node. If migrate_tx_partitions_active
is greater than 0 and cluster is not in maintenance, Operations needs to identify why migrations are running.
gauge
integer
aerospike_namespace_migrate_tx_partitions_imbalance
Number of partition migration failures which could lead to partitions being imbalanced. For each increment there will also be a warning logged.
counter
integer
aerospike_namespace_migrate_tx_partitions_initial
Total number of migrations this node will send during the current migration cycle for this namespace.
gauge
integer
aerospike_namespace_migrate_tx_partitions_lead_remaining
Number of initially scheduled emigrations which are not delayed by the migrate-fill-delay
configuration. Lead migrations are typically delta-migrations addressing non-empty partition replica nodes. Delta-migrations generally consume far less storage IO.
gauge
integer
aerospike_namespace_migrate_tx_partitions_remaining
Number of migrations this node has not yet sent during the current migration cycle for this namespace.
gauge
integer
aerospike_namespace_mrt_monitor_roll_back_error
Subset of mrt_roll_back_error
where monitor did the roll back.
gauge
integer
aerospike_namespace_mrt_monitor_roll_back_success
Subset of mrt_roll_back_success
where monitor did the roll back.
gauge
integer
aerospike_namespace_mrt_monitor_roll_back_timeout
Subset of mrt_roll_back_timeout
where monitor did the roll back.
gauge
integer
aerospike_namespace_mrt_monitor_roll_forward_error
Subset of mrt_roll_forward_error
where monitor did the roll forward.
gauge
integer
aerospike_namespace_mrt_monitor_roll_forward_success
Subset of mrt_roll_forward_success
where monitor did the roll forward.
gauge
integer
aerospike_namespace_mrt_monitor_roll_forward_timeout
Subset of mrt_roll_forward_timeout
where monitor did the roll forward.
gauge
integer
aerospike_namespace_mrt_monitor_roll_tombstone_creates
Number of times monitor transaction rolls (forward or back) generate tombstones from nothing. This is rare but normal.
gauge
integer
aerospike_namespace_mrt_monitors_active
Number of transactions currently being driven by a monitor.
gauge
integer
aerospike_namespace_mrt_provisionals
Number of provisional records in a transaction.
gauge
integer
aerospike_namespace_mrt_roll_back_error
Number of roll back transactions that failed.
gauge
integer
aerospike_namespace_mrt_roll_back_success
Number of roll back transactions that succeeded.
gauge
integer
aerospike_namespace_mrt_roll_back_timeout
Number of roll back transactions that timed out.
gauge
integer
aerospike_namespace_mrt_roll_forward_error
Number of roll forward transactions that failed.
gauge
integer
aerospike_namespace_mrt_roll_forward_success
Number of roll forward transactions that succeeded.
gauge
integer
aerospike_namespace_mrt_roll_forward_timeout
Number of roll forward transactions that timed out.
gauge
integer
aerospike_namespace_mrt_verify_read_error
Number of verify read commands that failed.
gauge
integer
aerospike_namespace_mrt_verify_read_success
Number of verify read commands that succeeded.
gauge
integer
aerospike_namespace_mrt_verify_read_timeout
Number of verify read commands that timed out.
gauge
integer
aerospike_namespace_nodes_quiesced
The number of nodes observed to be quiesced as of the most recent reclustering event. If a single node received the quiesce
command, on the subsequent reclustering event, all nodes return 1 for this metric, and when the quiesced node is shut down, triggering a new reclustering event, this metric returns to 0.
gauge
integer
aerospike_namespace_non_expirable_objects
Number of records in this namespace with non-expirable TTLs (TTLs of value 0).
gauge
integer
aerospike_namespace_non_replica_objects
Number of records on this node which are neither master nor replicas. This number is non-zero during migration, representing additional versions or copies of records. Those are records beyond the replication factor line and would be potentially used during migrations to duplicate resolve. This is not true for quiesced nodes, which retain their partitions after migrations have completed.
gauge
integer
aerospike_namespace_non_replica_tombstones
Number of tombstones on this node which are neither master nor replicas. This number is non-zero only during migration. This is not true for quiesced nodes, which retain their partitions after migrations have completed.
gauge
integer
aerospike_namespace_nsup_cycle_deleted_pct
Percent of records removed by NSUP in its last cycle.
gauge
float
nsup_cycle_deleted_pct
is calculated when the NSUP (Namespace SUPervisor) cycle finishes (nsup-done
is logged). It is calculated based on the total objects present at the beginning of the NSUP cycle and the number of objects that got deleted in that cycle (nsup_cycle_deleted_pct
= (objects removed by NSUP in its last cycle * 100) / number of total objects when the NSUP cycle started [expirable + non expirable]).
The calculation was different in older versions (it was changed in versions 6.3.0.21, 6.4.0.15, 7.0.0.8 and 7.1.0.0). In those older versions, nsup_cycle_deleted_pct
was calculated based on the total objects present after the NSUP cycle finished and the number of objects that got deleted in that cycle.
This led to 2 special cases when its value turned up to 100:
- When the number of objects is 0 after the NSUP cycle is finished, i.e., all objects get deleted, OR the number of objects deleted in the cycle is greater than or equal to the number of objects left.
- If NSUP is enabled for a namespace, i.e.
nsup-period
is greater than 0, and there are 0 records in the namespace.
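The current calculation of nsup_cycle_deleted_pct described above can be sketched in Python as follows. The function name and sample numbers are hypothetical, and returning 0 for an empty namespace is an assumption made for the sketch, not documented server behavior.

```python
def nsup_cycle_deleted_pct(deleted: int, objects_at_start: int) -> float:
    """(objects removed in the last cycle * 100) / objects at cycle start.

    objects_at_start counts expirable and non-expirable objects together.
    """
    if objects_at_start <= 0:
        # Assumption: nothing to delete, so report 0 rather than divide by 0.
        return 0.0
    return deleted * 100 / objects_at_start


# Hypothetical sample: 250 of 1000 objects removed in the last cycle.
print(nsup_cycle_deleted_pct(250, 1000))  # 25.0
```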
aerospike_namespace_nsup_cycle_duration
Length of the last NSUP cycle in seconds.
gauge
integer
aerospike_namespace_nsup_xdr_key_busy
Number of NSUP deletes (expirations and evictions) that had to wait for a previous version to ship. This error is raised if either of the following occurs:
- ship-versions-policy is all and the most recent update to the record has not yet successfully shipped to the destination.
- ship-versions-policy is interval and XDR hasn’t successfully shipped at least one version of the record in the most recent ship-versions-interval (in seconds).
counter
integer
aerospike_namespace_objects
Number of records in this namespace for this node. Includes non-replica. Does not include tombstones.
Trending objects
provides operations insight into this namespace’s record fluctuations over time.
gauge
integer
aerospike_namespace_ops_sub_tsvc_error
Number of times a background query operate command failed to access a record. For example, due to protocol or permission errors. Does not include timeouts. In strong-consistency
enabled namespaces, this includes attempts to access records in unavailable_partitions
and dead_partitions
.
The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc
errors happen before records are accessed for reads or writes. They’re counted separately from tsvc
timeouts.
counter
integer
aerospike_namespace_ops_sub_tsvc_timeout
Number of records accessed by a background query operate command that timed out in the transaction service.
The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc
errors happen before records are accessed for reads or writes. They’re counted separately from tsvc
timeouts.
counter
integer
aerospike_namespace_ops_sub_write_error
Number of records accessed by a background query operate command write subtransactions that failed with an error. Does not include timeouts.
counter
integer
aerospike_namespace_ops_sub_write_filtered_out
Number of records accessed by a background query operate command write subtransactions for which the write did not happen because the record was filtered out with Filter Expressions.
counter
integer
aerospike_namespace_ops_sub_write_success
Number of records successfully written by background query operate command write subtransactions.
counter
integer
aerospike_namespace_ops_sub_write_timeout
Number of records accessed by a background query operate command write subtransactions that timed out.
counter
integer
aerospike_namespace_pending_quiesce
Reports ‘true’ when the quiesce
info command has been received by a node, or if stay-quiesced
is true for the node. When true, the next clustering event will cause this node to quiesce. To trigger a clustering event, issue the recluster
info command. To disable, issue the quiesce-undo
info command.
gauge
integer
aerospike_namespace_pi_query_aggr_abort
Number of primary index query aggregations that were aborted.
counter
integer
aerospike_namespace_pi_query_aggr_complete
Number of primary index query aggregations that completed.
counter
integer
aerospike_namespace_pi_query_aggr_error
Number of primary index query aggregations that failed.
Compare pi_query_aggr_error
to pi_query_aggr_complete
.
If ratio is higher than acceptable, alert operations to investigate.
counter
integer
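Because the error and complete statistics are counters, a comparison is usually more meaningful over the change between two samples than over the lifetime totals. The Python sketch below shows one possible reading of "ratio": the share of errors among queries that finished between two samples. The function name, the sample values, and the threshold are hypothetical, and the sketch does not handle a counter reset after a node restart.

```python
def error_share(prev: dict, curr: dict, error_key: str, complete_key: str) -> float:
    """Share of errors among queries that finished between two samples."""
    errors = curr[error_key] - prev[error_key]
    completes = curr[complete_key] - prev[complete_key]
    finished = errors + completes
    return errors / finished if finished > 0 else 0.0


# Hypothetical counter samples taken some time apart.
prev = {"pi_query_aggr_error": 10, "pi_query_aggr_complete": 990}
curr = {"pi_query_aggr_error": 15, "pi_query_aggr_complete": 1085}

share = error_share(prev, curr, "pi_query_aggr_error", "pi_query_aggr_complete")
ACCEPTABLE_SHARE = 0.01  # hypothetical threshold; choose one that fits the workload
if share > ACCEPTABLE_SHARE:
    print(f"error share {share:.2%} is above the acceptable level")
```

The same helper works for the other error and complete pairs in this section, such as pi_query_long_basic_error and pi_query_long_basic_complete.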
aerospike_namespace_pi_query_long_basic_abort
Number of basic long primary index queries that were aborted.
counter
integer
aerospike_namespace_pi_query_long_basic_complete
Number of basic long primary index queries that completed.
counter
integer
aerospike_namespace_pi_query_long_basic_error
Number of basic long primary index queries that failed.
Compare pi_query_long_basic_error
to pi_query_long_basic_complete
.
If ratio is higher than acceptable, alert operations to investigate.
counter
integer
aerospike_namespace_pi_query_ops_bg_abort
Number of ops background primary index queries that were aborted.
counter
integer
aerospike_namespace_pi_query_ops_bg_complete
Number of ops background primary index queries that completed.
counter
integer
aerospike_namespace_pi_query_ops_bg_error
Number of ops background primary index queries that failed.
Compare pi_query_ops_bg_error
to pi_query_ops_bg_complete
.
If ratio is higher than acceptable, alert operations to investigate.
counter
integer
aerospike_namespace_pi_query_short_basic_complete
Number of basic short primary index queries that completed.
counter
integer
aerospike_namespace_pi_query_short_basic_error
Number of basic short primary index queries that failed.
Compare pi_query_short_basic_error
to pi_query_short_basic_complete
.
If ratio is higher than acceptable, alert operations to investigate.
counter
integer
aerospike_namespace_pi_query_short_basic_timeout
Short primary index queries are not monitored, so they cannot be aborted. They might time out, which is reflected in this statistic.
counter
integer
aerospike_namespace_pi_query_udf_bg_abort
Number of UDF background primary index queries that were aborted.
counter
integer
aerospike_namespace_pi_query_udf_bg_complete
Number of UDF background primary index queries that completed.
counter
integer
aerospike_namespace_pi_query_udf_bg_error
Number of UDF background primary index queries that failed.
Compare pi_query_udf_bg_error
to pi_query_udf_bg_complete
.
If ratio is higher than acceptable, alert operations to investigate.
counter
integer
aerospike_namespace_pmem_available_pct
Measures the minimum contiguous pmem storage file space across all such files in a namespace. The namespace will be read only (stop writes) if this value falls below min-avail-pct
. It is important for all configured pmem storage files in a namespace to have the same size, otherwise, the pmem_available_pct
could be low even when a lot of space is available across other files.
If pmem_available_pct
drops below 20%, warn your operations group.
This condition might indicate that defrag is unable to keep up with the current load.
If pmem_available_pct
drops below 15%, critical ALERT.
If pmem_available_pct
drops below 5%, usable PMem resources are critically low. This condition might result in stop_writes
.
gauge
integer
Not to be confused with pmem_free_pct
which represents the amount of free space across all PMem storage files in a namespace and does not take account of the fragmentation.
Here is an example to represent the difference between pmem_free_pct
and pmem_available_pct
. Assume 5 files of 96MiB each for a given namespace, where each file has 24MiB of data that are spread across 6 write-blocks (with the 8MiB write-block-size):
- The pmem_free_pct would be 75%.
- The pmem_available_pct would be 50%.
- If the distribution is not uniform (it usually is not perfectly uniform), the pmem_available_pct would represent the file that has the least free blocks.
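The arithmetic behind this example can be reproduced with a short Python sketch. It is a simplified model, not the server's implementation: it treats available space as the share of write-blocks in a file that hold no data, and takes the minimum across files. The function names and the dictionary layout are hypothetical.

```python
MIB = 1024 * 1024


def pmem_free_pct(files: list) -> float:
    """Free space across all files, ignoring fragmentation."""
    total = sum(f["size"] for f in files)
    used = sum(f["used"] for f in files)
    return (total - used) / total * 100


def pmem_available_pct(files: list, write_block_size: int) -> float:
    """Share of completely free write-blocks in the worst file (simplified)."""
    def available(f: dict) -> float:
        blocks = f["size"] // write_block_size
        return (blocks - f["blocks_in_use"]) / blocks * 100
    return min(available(f) for f in files)


# The example from the text: 5 files of 96 MiB, each holding 24 MiB of data
# spread across 6 write-blocks of 8 MiB.
files = [{"size": 96 * MIB, "used": 24 * MIB, "blocks_in_use": 6} for _ in range(5)]
print(pmem_free_pct(files))                # 75.0
print(pmem_available_pct(files, 8 * MIB))  # 50.0
```

Each file has 12 write-blocks, of which 6 hold data, so only half of the blocks are completely free even though three quarters of the bytes are unused.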
aerospike_namespace_pmem_compression_ratio
Measures the average compressed size to uncompressed size ratio for PMem storage. 1.000
indicates no compression and 0.100
indicates a 1:10
compression ratio (90% reduction in size). pmem_compression_ratio
is not included if the compression
configuration parameter is set to none
.
moving average
integer
The compression ratio is a moving average, calculated based on the most recently written records. Read records do not factor into the ratio. If the written data changes over time then the compression ratio will change with it. In case of a sudden change in data, the indicated compression ratio may lag behind a bit. As a rule of thumb, assume that the compression ratio covers the most recently written 100,000 to 1,000,000 records.
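To read a reported ratio as a size reduction, the relationship stated above (0.100 corresponds to a 90% reduction) can be written as a small Python helper. The function name is hypothetical.

```python
def size_reduction_pct(compression_ratio: float) -> float:
    """Convert a compressed-to-uncompressed ratio into a percent reduction."""
    if not 0.0 < compression_ratio <= 1.0:
        raise ValueError("compression_ratio is expected to be in (0, 1]")
    return (1.0 - compression_ratio) * 100


print(round(size_reduction_pct(0.100), 1))  # 90.0
print(round(size_reduction_pct(1.000), 1))  # 0.0
```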
aerospike_namespace_pmem_free_pct
Percentage of pmem storage capacity free for this namespace. This is the amount of free storage across all pmem storage files in the namespace. Evictions will be triggered when the used percentage across all storage files (which is represented by 100 - pmem_free_pct
) crosses the configured high-water-disk-pct
.
gauge
integer
Not to be confused with pmem_available_pct
which represents the amount of free contiguous space on the PMem storage file that has the least contiguous free space across the namespace.
Here is an example to represent the difference between pmem_free_pct
and pmem_available_pct
. Assume 5 files of 96MiB each for a given namespace, where each file has 24MiB of data that are spread across 6 write-blocks (with the 8MiB write-block size):
- The pmem_free_pct would be 75%.
- The pmem_available_pct would be 50%.
- If the distribution is not uniform (it usually is not perfectly uniform), the pmem_available_pct would represent the file that has the fewest free blocks.
aerospike_namespace_pmem_total_bytes
Total bytes of pmem storage file space allocated to this namespace on this node.
gauge
integer
aerospike_namespace_pmem_used_bytes
Total bytes of pmem storage file space used by this namespace on this node.
Trending pmem_used_bytes
provides operations insight into how pmem storage usage changes over time for this namespace.
gauge
aerospike_namespace_prole_objects
Number of records on this node which are proles (replicas). Does not include tombstones.
gauge
integer
aerospike_namespace_prole_tombstones
Number of tombstones on this node which are proles (replicas).
gauge
integer
aerospike_namespace_query_agg
Number of query aggregations attempted. Removed in Database 5.7. Use query_aggr_complete
+ query_aggr_error
+ query_aggr_abort
instead.
counter
integer
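For Database 5.7 deployments that previously charted query_agg, the replacement sum described above might be computed as follows. This is a sketch: the `stats` dictionary and its sample values are hypothetical and stand in for whatever your monitoring stack returns for a namespace.

```python
def query_aggregations_attempted(stats: dict) -> int:
    """Rebuild the removed query_agg total from its three replacement counters."""
    return (
        stats["query_aggr_complete"]
        + stats["query_aggr_error"]
        + stats["query_aggr_abort"]
    )


# Hypothetical sample values, for illustration only.
sample_stats = {"query_aggr_complete": 120, "query_aggr_error": 3, "query_aggr_abort": 2}
print(query_aggregations_attempted(sample_stats))  # 125
```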
aerospike_namespace_query_agg_abort
Number of query aggregations aborted by the user seen by this node. Renamed to query_aggr_abort
in Database 5.7.
counter
integer
aerospike_namespace_query_agg_avg_rec_count
Average number of records returned by the query underlying each aggregation. Renamed to query_aggr_avg_rec_count
in Database 5.7.
gauge
integer
aerospike_namespace_query_agg_error
Number of query aggregation errors due to an internal error. Renamed to query_aggr_error
in Database 5.7.
counter
integer
aerospike_namespace_query_agg_success
Number of query aggregations completed. Renamed to query_aggr_complete
in Database 5.7.
counter
integer
aerospike_namespace_query_aggr_abort
Number of query aggregations aborted by the user seen by this node. Removed in Database 6.0, use si_query_aggr_abort
.
counter
integer
aerospike_namespace_query_aggr_avg_rec_count
Average number of records returned by the query underlying each aggregation.
gauge
integer
aerospike_namespace_query_aggr_complete
Number of query aggregations completed. Removed in Database 6.0, use si_query_aggr_complete
.
counter
integer
aerospike_namespace_query_aggr_error
Number of query aggregation errors due to an internal error. Removed in Database 6.0, use si_query_aggr_error
.
counter
integer
aerospike_namespace_query_basic_abort
Number of secondary index basic queries that were aborted by a user. Removed in Database 6.0, use si_query_long_basic_abort
.
counter
integer
aerospike_namespace_query_basic_avg_rec_count
Average number of records returned by all secondary index basic queries.
gauge
integer
aerospike_namespace_query_basic_complete
Number of secondary index basic queries which completed successfully.
counter
integer
aerospike_namespace_query_basic_error
Number of secondary index basic queries that returned an error. Removed in Database 6.0, use si_query_long_basic_error
.
counter
integer
aerospike_namespace_query_fail
Number of queries that failed due to an internal error. These failures are not part of query lookup (see query_lookup_error
), query aggregation (see query_agg_error
) or query background UDF (see query_udf_bg_failure
).
counter
aerospike_namespace_query_false_positives
Number of entries that were shortlisted from the secondary index but whose bin values do not match the query clause. This might happen when the bin value changes during query execution.
counter
integer
aerospike_namespace_query_long_queue_full
Number of queue-full errors for long-running queries.
counter
integer
aerospike_namespace_query_long_reqs
Number of long-running queries currently in progress.
gauge
integer
aerospike_namespace_query_lookup_abort
Number of user aborted secondary index queries. Renamed to query_basic_abort
in Database 5.7.
counter
integer
aerospike_namespace_query_lookup_avg_rec_count
Average number of records returned by all secondary index query look-ups. Renamed to query_basic_avg_rec_count
in Database 5.7.
gauge
integer
aerospike_namespace_query_lookup_error
Number of secondary index query look-up errors. Renamed to query_basic_error
in Database 5.7.
counter
integer
aerospike_namespace_query_lookup_success
Number of secondary index look-ups which succeeded. Renamed to query_basic_complete
in Database 5.7.
counter
integer
aerospike_namespace_query_lookups
Number of secondary index lookups attempted. Removed in Database 5.7. Use query_basic_complete
+ query_basic_error
+ query_basic_abort
instead.
counter
integer
aerospike_namespace_query_ops_bg_abort
Number of ops background queries that were aborted. Removed in Database 6.0, use si_query_ops_bg_abort
.
counter
integer
aerospike_namespace_query_ops_bg_complete
Number of ops background queries that completed. Removed in Database 6.0, use si_query_ops_bg_complete
.
counter
integer
aerospike_namespace_query_ops_bg_error
Number of ops background queries that returned an error. Removed in Database 6.0, use si_query_ops_bg_error
.
counter
integer
aerospike_namespace_query_ops_bg_failure
Number of ops background queries that failed. Removed from Database 5.7 and later, use query_ops_bg_error
+ query_ops_bg_abort
instead.
counter
integer
aerospike_namespace_query_ops_bg_success
Number of ops background queries that completed. Renamed to query_ops_bg_complete
in Database 5.7.
counter
integer
aerospike_namespace_query_proto_compression_ratio
Measures the average compressed size to uncompressed size ratio for protocol message data in query responses to the client. Thus 1.000
indicates no compression and 0.100
indicates a 1:10
compression ratio (90% reduction in size).
moving average
decimal
The compression ratio is a moving average. It is calculated based on the most recent client responses. If the response message data changes over time then the compression ratio will change with it. In case of a sudden change in response data, the indicated compression ratio may lag behind a bit. As a rule of thumb, assume that the compression ratio covers the most recent 100,000 to 1,000,000 client responses.
aerospike_namespace_query_proto_uncompressed_pct
Measures the percentage of query responses to the client with uncompressed protocol message data. Thus 0.000
indicates all responses with compressed data, and 100.000
indicates no responses with compressed data. For example, if protocol message data compression is not used, this metric will remain set to 0.000
. If protocol message data compression is then turned on and all responses are compressed, this metric will remain set to 0.000
. The only way this metric will ever be set to a value different than 0.000
is if compression is used, but some responses are not compressed (which happens when the uncompressed size is so small that the server does not try to compress, or when the compression fails).
gauge
instantaneous
The percentage is a moving average. It is calculated based on the most recent client responses. If the response message data changes over time then the percentage will change with it. In case of a sudden change in response data, the indicated percentage may lag behind a bit. As a rule of thumb, assume that the percentage covers the most recent 100,000 to 1,000,000 client responses.
aerospike_namespace_query_reqs
Number of query requests ever attempted on this node. Even very early failures would be counted here, as opposed to query_short_running
and query_long_running
which would increment a bit later.
counter
aerospike_namespace_query_short_queue_full
Number of queue-full errors for short-running queries.
counter
integer
aerospike_namespace_query_short_reqs
Number of short-running queries currently in progress.
gauge
integer
aerospike_namespace_query_udf_bg_abort
Number of UDF background queries that were aborted. Removed in Database 6.0, use si_query_udf_bg_abort
.
counter
integer
aerospike_namespace_query_udf_bg_complete
Number of UDF background queries that completed. Removed in Database 6.0, use si_query_udf_bg_complete
.
counter
integer
aerospike_namespace_query_udf_bg_error
Number of UDF background queries that returned an error. Removed in Database 6.0, use si_query_udf_bg_error
.
counter
integer
aerospike_namespace_query_udf_bg_failure
Number of UDF background queries that failed. Removed from Database 5.7 and later, use query_udf_bg_error
+ query_udf_bg_abort
instead.
counter
integer
aerospike_namespace_query_udf_bg_success
Number of UDF background queries that completed. Renamed to query_udf_bg_complete
in Database 5.7.
counter
integer
aerospike_namespace_re_repl_error
Number of re-replication errors that were not timeouts. Re-replications would happen for namespaces operating under the strong-consistency
mode when a record does not successfully replicate on the initial attempt.
counter
integer
aerospike_namespace_re_repl_success
Number of successful re-replications. Re-replications would happen for namespaces operating under the strong-consistency
mode when a record does not successfully replicate on the initial attempt.
counter
integer
aerospike_namespace_re_repl_timeout
Number of re-replications that ended in timeout. Re-replications would happen for namespaces operating under the strong-consistency
mode when a record does not successfully replicate on the initial attempt. Starting with Database 6.3 this stat only counts timeouts that happened during the actual re-replication.
counter
integer
The transaction-ttl of a re-replication is 1 second by default (configurable through the transaction-max-ms
configuration parameter).
aerospike_namespace_re_repl_tsvc_error
Number of re-replication errors happening in the transaction queue which were not re_repl_tsvc_timeout
(before the re-replication attempt). Re-replications occur for namespaces operating under strong-consistency
mode when a record does not successfully replicate on the initial attempt.
The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc
errors happen before records are accessed for reads or writes. They’re counted separately from tsvc
timeouts.
counter
integer
aerospike_namespace_re_repl_tsvc_timeout
Number of re-replications that time out early in the internal transaction queue, while waiting to be picked up by a service thread. Re-replications occur for namespaces operating under strong-consistency
mode when a record does not successfully replicate on the initial attempt.
The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc
errors happen before records are accessed for reads or writes. They’re counted separately from tsvc
timeouts.
counter
integer
aerospike_namespace_record_proto_compression_ratio
Measures the average compressed size to uncompressed size ratio for protocol message data in single-record transaction client responses. Thus 1.000
indicates no compression and 0.100
indicates a 1:10
compression ratio (90% reduction in size).
gauge
decimal
The compression ratio is a moving average. It is calculated based on the most recent client responses. If the response message data changes over time then the compression ratio will change with it. In case of a sudden change in response data, the indicated compression ratio may lag behind a bit. As a rule of thumb, assume that the compression ratio covers the most recent 100,000 to 1,000,000 client responses.
aerospike_namespace_record_proto_uncompressed_pct
Measures the percentage of single-record transaction client responses with uncompressed protocol message data. Thus 0.000
indicates all responses with compressed data, and 100.000
indicates no responses with compressed data. For example, if protocol message data compression is not used, this metric will remain set to 0.000
. If protocol message data compression is then turned on and all responses are compressed, this metric will remain set to 0.000
. The only way this metric will ever be set to a value different than 0.000
is if compression is used, but some responses are not compressed (which happens when the uncompressed size is so small that the server does not try to compress, or when the compression fails).
moving average
decimal
The percentage is a moving average. It is calculated based on the most recent client responses. If the response message data changes over time then the percentage will change with it. In case of a sudden change in response data, the indicated percentage may lag behind a bit. As a rule of thumb, assume that the percentage covers the most recent 100,000 to 1,000,000 client responses.
aerospike_namespace_retransmit_all_batch_sub_delete_dup_res
Number of retransmits that occurred during batch delete subtransactions that were being duplicate-resolved. Includes retransmits originating on the client as well as proxying nodes.
counter
integer
Retransmission statistics are collected in the retransmits
ticker log line.
aerospike_namespace_retransmit_all_batch_sub_delete_repl_write
Number of retransmits that occurred during batch delete subtransactions that were being replica-written. Includes retransmits originating on the client as well as proxying nodes.
counter
integer
Retransmission statistics are collected in the retransmits
ticker log line.
aerospike_namespace_retransmit_all_batch_sub_dup_res
Obsolete as of Database 6.0. In case of a failure to replicate a write transaction across all replicas, the record will be left in the ‘un-replicated’ state, forcing a ‘re-replication’ transaction prior to any subsequent read or write transaction on the record.
Number of retransmits that occurred during batch subtransactions that were being duplicate-resolved. Includes retransmits originating on the client as well as proxying nodes.
counter
integer
Starting with Database 6.0 when batch-writes were introduced, “repl-write retransmits” for batch writes are counted as “dup-res retransmits” which are included in the metric retransmit_all_batch_sub_dup_res
.
aerospike_namespace_retransmit_all_batch_sub_read_dup_res
Number of retransmits that occurred during batch read subtransactions that were being duplicate-resolved. Includes retransmits originating on the client as well as proxying nodes.
counter
integer
Retransmission statistics are collected in the retransmits
ticker log line.
aerospike_namespace_retransmit_all_batch_sub_read_repl_ping
Number of retransmits that occurred during SC linearized read subtransactions within batched commands. Includes retransmits originating on the client as well as proxying nodes.
counter
integer
Retransmission statistics are collected in the retransmits
ticker log line.
aerospike_namespace_retransmit_all_batch_sub_udf_dup_res
Number of retransmits that occurred during batch UDF subtransactions that were being duplicate-resolved. Includes retransmits originating on the client as well as proxying nodes.
counter
integer
Retransmission statistics are collected in the retransmits
ticker log line.
aerospike_namespace_retransmit_all_batch_sub_udf_repl_write
Number of retransmits that occurred during batch UDF subtransactions that were being replica-written. Includes retransmits originating on the client as well as proxying nodes.
counter
integer
Retransmission statistics are collected in the retransmits
ticker log line.
aerospike_namespace_retransmit_all_batch_sub_write_dup_res
Number of retransmits that occurred during batch write subtransactions that were being duplicate-resolved. Includes retransmits originating on the client as well as proxying nodes.
counter
integer
Retransmission statistics are collected in the retransmits
ticker log line.
aerospike_namespace_retransmit_all_batch_sub_write_repl_write
Number of retransmits that occurred during batch write (insert/update/upsert/replace) subtransactions that were being replica-written. Includes retransmits originating on the client as well as proxying nodes.
counter
integer
Retransmission statistics are collected in the retransmits
ticker log line.
aerospike_namespace_retransmit_all_delete_dup_res
Number of retransmits that occurred during delete transactions that were being duplicate-resolved. Includes retransmits originating on the client as well as proxying nodes.
counter
integer
Retransmission statistics are collected in the retransmits
ticker log line.
aerospike_namespace_retransmit_all_delete_repl_write
Number of retransmits that occurred during delete transactions that were being replica-written. Includes retransmits originating on the client as well as proxying nodes.
counter
integer
Retransmission statistics are collected in the retransmits
ticker log line.
aerospike_namespace_retransmit_all_read_dup_res
Number of retransmits that occurred during read commands that were being duplicate-resolved. Includes retransmits originating on the client as well as proxying nodes.
counter
integer
Retransmission statistics are collected in the retransmits
ticker log line.
aerospike_namespace_retransmit_all_read_repl_ping
Number of retransmits that occurred during SC linearized reads. Includes retransmits originating on the client as well as proxying nodes.
counter
integer
Retransmission statistics are collected in the retransmits
ticker log line.
aerospike_namespace_retransmit_all_udf_dup_res
Number of retransmits that occurred during client-initiated UDF transactions that were being duplicate-resolved. Includes retransmits originating on the client as well as proxying nodes.
counter
integer
Retransmission statistics are collected in the retransmits
ticker log line.
aerospike_namespace_retransmit_all_udf_repl_write
Number of retransmits that occurred during client-initiated UDF transactions that were being replica-written. Includes retransmits originating on the client as well as proxying nodes.
counter
integer
Retransmission statistics are collected in the retransmits
ticker log line.
aerospike_namespace_retransmit_all_write_dup_res
Number of retransmits that occurred during write transactions that were being duplicate-resolved. Includes retransmits originating on the client as well as proxying nodes.
counter
integer
Retransmission statistics are collected in the retransmits
ticker log line.
aerospike_namespace_retransmit_all_write_repl_write
Number of retransmits that occurred during write transactions that were being replica-written. Includes retransmits originating on the client as well as proxying nodes.
counter
integer
Retransmission statistics are collected in the retransmits
ticker log line.
aerospike_namespace_retransmit_nsup_repl_write
Number of retransmits that occurred during NSUP-initiated delete transactions that were being replica-written.
counter
integer
Retransmission statistics are collected in the retransmits
ticker log line.
aerospike_namespace_retransmit_ops_sub_dup_res
Number of retransmits that occurred during write subtransactions of background ops scan/query jobs that were being duplicate-resolved.
counter
integer
Retransmission statistics are collected in the retransmits
ticker log line.
aerospike_namespace_retransmit_ops_sub_repl_write
Number of retransmits that occurred during write subtransactions of background ops scan/query jobs that were being replica-written.
counter
integer
Retransmission statistics are collected in the retransmits
ticker log line.
aerospike_namespace_retransmit_udf_sub_dup_res
Number of retransmits that occurred during UDF subtransactions of scan/query background UDF jobs that were being duplicate-resolved.
counter
integer
Retransmission statistics are collected in the retransmits
ticker log line.
aerospike_namespace_retransmit_udf_sub_repl_write
Number of retransmits that occurred during UDF subtransactions of scan/query background UDF jobs that were being replica-written.
counter
integer
Retransmission statistics are collected in the retransmits
ticker log line.
aerospike_namespace_scan_aggr_abort
Number of scan aggregations that were aborted. Removed in Database 6.0, use pi_query_aggr_abort
.
counter
integer
aerospike_namespace_scan_aggr_complete
Number of scan aggregations that completed. Removed in Database 6.0, use pi_query_aggr_complete
.
counter
integer
aerospike_namespace_scan_aggr_error
Number of scan aggregations that failed.
Compare scan_aggr_error
to scan_aggr_complete
.
If the ratio is higher than acceptable, alert operations to investigate. Removed in Database 6.0, use pi_query_aggr_error
.
counter
integer
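One way to compare an error counter with its matching complete counter is to look at the change between two samples, since both are cumulative counters. The sketch below is illustrative only: the function name, the sample dictionaries, and the choice of errors divided by errors plus completions as the ratio are assumptions, and the acceptable threshold is left to the operator.

```python
def error_ratio(prev: dict, curr: dict, error_key: str, complete_key: str) -> float:
    """Fraction of finished jobs that failed between two counter samples."""
    errors = curr[error_key] - prev[error_key]
    completes = curr[complete_key] - prev[complete_key]
    finished = errors + completes
    # No finished jobs in the interval means there is nothing to report.
    return errors / finished if finished > 0 else 0.0


# Hypothetical samples taken some interval apart.
earlier = {"scan_aggr_error": 10, "scan_aggr_complete": 90}
later = {"scan_aggr_error": 15, "scan_aggr_complete": 185}
ratio = error_ratio(earlier, later, "scan_aggr_error", "scan_aggr_complete")
print(f"{ratio:.2%}")
```

The same helper applies to the other error and complete pairs in this section by passing different keys.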
aerospike_namespace_scan_basic_abort
Number of basic scans that were aborted. Removed in Database 6.0, use pi_query_long_basic_abort
.
counter
integer
aerospike_namespace_scan_basic_complete
Number of basic scans that completed. Removed in Database 6.0, use pi_query_long_basic_complete
.
counter
integer
aerospike_namespace_scan_basic_error
Number of basic scans that failed.
Compare scan_basic_error
to scan_basic_complete
.
If the ratio is higher than acceptable, alert operations to investigate. Removed in Database 6.0, use pi_query_long_basic_error
.
counter
integer
aerospike_namespace_scan_ops_bg_abort
Number of ops background scans that were aborted. Removed in Database 6.0, use pi_query_ops_bg_abort
.
counter
integer
aerospike_namespace_scan_ops_bg_complete
Number of ops background scans that completed. Removed in Database 6.0, use pi_query_ops_bg_complete
.
counter
integer
aerospike_namespace_scan_ops_bg_error
Number of ops background scans that failed.
Compare scan_ops_bg_error
to scan_ops_bg_complete
. If the ratio is higher than acceptable, alert operations to investigate. Removed in Database 6.0, use pi_query_ops_bg_error
.
counter
integer
aerospike_namespace_scan_proto_compression_ratio
Measures the average compressed size to uncompressed size ratio for protocol message data in basic scan or aggregation scan client responses. Thus 1.000
indicates no compression and 0.100
indicates a 1:10
compression ratio (90% reduction in size).
moving average
decimal
The compression ratio is a moving average. It is calculated based on the most recent client responses. If the response message data changes over time then the compression ratio will change with it. In case of a sudden change in response data, the indicated compression ratio may lag behind a bit. As a rule of thumb, assume that the compression ratio covers the most recent 100,000 to 1,000,000 client responses.
aerospike_namespace_scan_proto_uncompressed_pct
Measures the percentage of basic scan or aggregation scan client responses with uncompressed protocol message data. Thus 0.000
indicates all responses with compressed data, and 100.000
indicates no responses with compressed data. For example, if protocol message data compression is not used, this metric will remain set to 0.000
. If protocol message data compression is then turned on and all responses are compressed, this metric will remain set to 0.000
. The only way this metric will ever be set to a value different than 0.000
is if compression is used, but some responses are not compressed (which happens when the uncompressed size is so small that the server does not try to compress, or when the compression fails).
gauge
decimal
The percentage is a moving average. It is calculated based on the most recent client responses. If the response message data changes over time then the percentage will change with it. In case of a sudden change in response data, the indicated percentage may lag behind a bit. As a rule of thumb, assume that the percentage covers the most recent 100,000 to 1,000,000 client responses.
aerospike_namespace_scan_udf_bg_abort
Number of UDF background scans that were aborted. Removed in Database 6.0, use pi_query_udf_bg_abort.
counter
integer
aerospike_namespace_scan_udf_bg_complete
Number of UDF background scans that completed. Removed in Database 6.0, use pi_query_udf_bg_complete
.
counter
integer
aerospike_namespace_scan_udf_bg_error
Number of UDF background scans that failed.
Compare scan_udf_bg_error
to scan_udf_bg_complete
.
If the ratio is higher than acceptable, alert operations to investigate. Removed in Database 6.0, use pi_query_udf_bg_error
.
counter
integer
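Dashboards that must work on both sides of the Database 6.0 upgrade may find a lookup table of the scan metric replacements useful. The mapping below is taken from the scan_* entries in this section; the helper name `replacement_for` is an assumption for this sketch.

```python
# scan_* metrics removed in Database 6.0, mapped to the replacements listed in this section.
SCAN_TO_PI_QUERY = {
    "scan_aggr_abort": "pi_query_aggr_abort",
    "scan_aggr_complete": "pi_query_aggr_complete",
    "scan_aggr_error": "pi_query_aggr_error",
    "scan_basic_abort": "pi_query_long_basic_abort",
    "scan_basic_complete": "pi_query_long_basic_complete",
    "scan_basic_error": "pi_query_long_basic_error",
    "scan_ops_bg_abort": "pi_query_ops_bg_abort",
    "scan_ops_bg_complete": "pi_query_ops_bg_complete",
    "scan_ops_bg_error": "pi_query_ops_bg_error",
    "scan_udf_bg_abort": "pi_query_udf_bg_abort",
    "scan_udf_bg_complete": "pi_query_udf_bg_complete",
    "scan_udf_bg_error": "pi_query_udf_bg_error",
}


def replacement_for(metric: str) -> str:
    """Return the 6.0 replacement for a scan_* metric, or the name unchanged."""
    return SCAN_TO_PI_QUERY.get(metric, metric)


print(replacement_for("scan_basic_error"))  # pi_query_long_basic_error
```

Note that the basic scan metrics map to the pi_query_long_basic_* names rather than to a plain pi_query_basic_* prefix.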
aerospike_namespace_set-evicted-objects
Number of records evicted by a set.
counter
integer
aerospike_namespace_set_index_used_bytes
Amount of memory occupied by set indexes for this namespace on this node. See Finding total namespace memory for the total memory accounted for the namespace.
gauge
integer
aerospike_namespace_si_query_aggr_abort
Number of secondary index query aggregations aborted by the user seen by this node.
counter
integer
aerospike_namespace_si_query_aggr_complete
Number of secondary index query aggregations completed.
counter
integer
aerospike_namespace_si_query_aggr_error
Number of secondary index query aggregation errors due to an internal error.
counter
integer
aerospike_namespace_si_query_ops_bg_abort
Number of ops background secondary index queries that were aborted.
counter
integer
aerospike_namespace_si_query_ops_bg_complete
Number of ops background secondary index queries that completed.
counter
integer
aerospike_namespace_si_query_ops_bg_error
Number of ops background secondary index queries that returned an error.
counter
integer
aerospike_namespace_si_query_udf_bg_abort
Number of UDF background secondary index queries that were aborted.
counter
integer
aerospike_namespace_si_query_udf_bg_complete
Number of UDF background secondary index queries that completed.
counter
integer
aerospike_namespace_si_query_udf_bg_error
Number of UDF background secondary index queries that returned an error.
counter
integer
aerospike_namespace_sindex-type.mount[ix].age
Applies only to Enterprise Edition configured to sindex-type
flash
. This shows the percentage of lifetime (total usage) claimed by OEM for underlying device. Value is -1 unless underlying device is NVMe and may exceed 100. ‘ix’ is the device index. For example, storage-engine.file[0]=/opt/aerospike/test0.dat and storage-engine.file[1]=/opt/aerospike/test2.dat for 2 files specified in the configuration.
gauge
integer
aerospike_namespace_sindex_flash_used_bytes
Applies only to Enterprise Edition configured with sindex-type
flash
. Total bytes in-use on the mount(s) for the secondary indexes used by this namespace on this node. This is the same value memory_used_sindex_bytes
would have if the secondary indexes were not persisted.
gauge
integer
aerospike_namespace_sindex_flash_used_pct
Applies only to Enterprise Edition configured with sindex-type
flash
. Percentage of the mount(s) in-use for the secondary indexes used by this namespace on this node. Calculated as (sindex_flash_used_bytes
/ sindex-type.mounts-size-limit
) * 100
gauge
integer
aerospike_namespace_sindex_gc_cleaned
Number of secondary index entries cleaned by sindex GC.
counter
integer
aerospike_namespace_sindex_mounts_used_pct
Applies only to Enterprise Edition configured with sindex-type
pmem
or flash
. Percentage of the mount(s) in-use for the secondary indexes used by this namespace on this node. Calculated as (sindex_used_bytes
/ sindex-type.mounts-budget
) * 100
gauge
integer
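The formula above can be sketched directly. The function and parameter names are hypothetical. The metric is reported as an integer, and how the server rounds is not stated here, so this sketch returns the unrounded value.

```python
def sindex_mounts_used_pct(sindex_used_bytes: int, mounts_budget_bytes: int) -> float:
    """Compute (sindex_used_bytes / sindex-type.mounts-budget) * 100."""
    if mounts_budget_bytes <= 0:
        raise ValueError("mounts budget must be a positive number of bytes")
    return sindex_used_bytes / mounts_budget_bytes * 100


GIB = 1024 ** 3
# Hypothetical values: 2 GiB of secondary index data against an 8 GiB budget.
print(sindex_mounts_used_pct(2 * GIB, 8 * GIB))  # 25.0
```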
aerospike_namespace_sindex_pmem_used_bytes
Applies only to Enterprise Edition configured with sindex-type
pmem
. Total bytes in-use on the mount(s) for the secondary indexes used by this namespace on this node. This is the same value memory_used_sindex_bytes
would have if the secondary indexes were not persisted.
gauge
integer
aerospike_namespace_sindex_pmem_used_pct
Applies only to Enterprise Edition configured with sindex-type
pmem
. Percentage of the mount(s) in-use for the secondary indexes used by this namespace on this node. Calculated as (sindex_pmem_used_bytes
/ sindex-type.mounts-size-limit
) * 100
gauge
integer
aerospike_namespace_sindex_used_bytes
Total bytes in-use on the mount(s) for the secondary indexes used by this namespace on this node.
gauge
integer
aerospike_namespace_smd_evict_void_time
The cluster-wide specified eviction depth, expressed as a void time in seconds since 1 January 2010 UTC. This is distributed to all nodes via SMD. This may be larger than evict_void_time, in which case evict_void_time will eventually advance to this value.
gauge
integer
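Because the void time counts seconds from 1 January 2010 UTC rather than from the Unix epoch, it needs a different base when converted to a calendar time. The following is a minimal sketch; the names `VOID_TIME_EPOCH` and `void_time_to_datetime` are assumptions for this example.

```python
from datetime import datetime, timedelta, timezone

# Base of the void-time scale described above: 1 January 2010, 00:00:00 UTC.
VOID_TIME_EPOCH = datetime(2010, 1, 1, tzinfo=timezone.utc)


def void_time_to_datetime(void_time: int) -> datetime:
    """Convert a void time (seconds since 1 January 2010 UTC) to an aware datetime."""
    return VOID_TIME_EPOCH + timedelta(seconds=void_time)


# One day after the base of the scale.
print(void_time_to_datetime(86_400).isoformat())  # 2010-01-02T00:00:00+00:00
```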
aerospike_namespace_stop_writes
If true, this namespace is currently not allowing client-originated writes. Migration writes and prole writes are still allowed. Error code 22 is returned if any one of the following are breached: Prior to Database 7.0:
If stop-writes
is true, raise a critical alert.
Until the cause is corrected, the system will reject all writes.
gauge
integer
aerospike_namespace_storage_engine_device_age
Shows percentage of lifetime (total usage) claimed by OEM for underlying storage-engine.device[ix] (may exceed 100). Value will be -1 unless underlying device is NVMe. It is a measure of how much of the drive’s projected lifetime according to the manufacturer has been used at any point in time. When the SSD is brand new, its value will report ‘0’ and when its projected lifetime has been reached, it shows ‘100’, reporting that 100% of the projected lifetime has been used. When the value gets over 100%, the SSD has reached the lifetime specified by the OEM.
gauge
integer
aerospike_namespace_storage_engine_device_defrag_partial_writes
The number of wblocks partially flushed to storage-engine.device[ix] by defrag.
counter
integer
aerospike_namespace_storage_engine_device_defrag_q
Number of wblocks queued to be defragged on storage-engine.device[ix].
Measured per-device or per-file depending on the storage configuration.
If storage-engine.device[ix].defrag_q or storage-engine.file[ix].defrag_q continues to increase over time, alert operations to investigate.
gauge
integer
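Detecting a defrag queue that "continues to increase over time" can be approximated by checking whether recent samples rise without interruption. This is a sketch of one possible heuristic, not a prescribed method: the function name, the window of five samples, and the strict-increase rule are all assumptions.

```python
def is_steadily_increasing(samples, window=5):
    """True if the last `window` defrag_q samples are strictly increasing."""
    if len(samples) < window:
        return False  # not enough history to judge a trend
    recent = samples[-window:]
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))


# Hypothetical defrag_q readings collected at a fixed interval.
print(is_steadily_increasing([120, 90, 140, 160, 200, 260, 330]))  # True
print(is_steadily_increasing([120, 130, 110, 140, 100]))           # False
```

A gauge that fluctuates but returns to a low value is normal; a strict rise over every recent sample is what this heuristic flags for investigation.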
aerospike_namespace_storage_engine_device_defrag_reads
The number of wblocks that have been sent to the defrag_q from storage-engine.device[ix]. Blocks are selected for defragmentation when their usage falls below the configured defrag-lwm-pct.
counter
integer
aerospike_namespace_storage_engine_device_defrag_writes
The number of wblocks defrag has written to storage-engine.device[ix].
counter
integer
aerospike_namespace_storage_engine_device_free_wblocks
The number of wblocks (write blocks) free on storage-engine.device[ix].
gauge
integer
aerospike_namespace_storage_engine_device_partial_writes
The number of wblocks partially flushed to storage-engine.device[ix] by writes.
counter
integer
aerospike_namespace_storage_engine_device_read_errors
Number of read errors encountered on storage-engine.device[ix].
counter
integer
aerospike_namespace_storage_engine_device_shadow_write_q
The number of wblocks queued to be written to the shadow device of storage-engine.device[ix].
gauge
integer
aerospike_namespace_storage_engine_device_used_bytes
The number of bytes used for data on storage-engine.device[ix].
gauge
integer
aerospike_namespace_storage_engine_device_write_q
The number of wblocks queued to be written to storage-engine.device[ix]. Includes blocks written by the defragmentation sub-system.
gauge
integer
aerospike_namespace_storage_engine_device_writes
Number of wblocks written to storage-engine.device[ix] since Aerospike started. Does not include defragmentation writes.
counter
integer
Label "device" and "device_index" in all aerospike_namespace_storage_engine_device_* metrics
The raw device that is configured in device configuration in namespace context and storage-engine subcontext. ‘ix’ is the device index. The index value starts from 0. For example, storage-engine.device[0]=/dev/xvd1 and storage-engine.device[1]=/dev/xvc1 for 2 devices specified in the configuration.
gauge
integer
aerospike_namespace_storage_engine_file_age
Shows the percentage of lifetime (total usage) claimed by the OEM for the underlying device of storage-engine.file[ix]. The value is -1 unless the underlying device is NVMe. The value may exceed 100.
gauge
integer
aerospike_namespace_storage_engine_file_defrag_partial_writes
The number of wblocks partially flushed to storage-engine.file[ix] by defrag.
counter
integer
aerospike_namespace_storage_engine_file_defrag_q
The number of wblocks queued to be defragged on storage-engine.file[ix].
gauge
integer
aerospike_namespace_storage_engine_file_defrag_reads
Number of wblocks that have been sent to the defrag_q from storage-engine.file[ix].
Blocks are selected for defragmentation when their usage falls below the configured defrag-lwm-pct.
counter
integer
aerospike_namespace_storage_engine_file_defrag_writes
The number of wblocks defrag has written to storage-engine.file[ix].
counter
integer
aerospike_namespace_storage_engine_file_free_wblocks
The number of wblocks (write blocks) free on storage-engine.file[ix].
gauge
integer
aerospike_namespace_storage_engine_file_partial_writes
The number of wblocks partially flushed to storage-engine.file[ix] by writes.
counter
integer
aerospike_namespace_storage_engine_file_shadow_write_q
The number of wblocks queued to be written to the shadow file of storage-engine.file[ix].
gauge
integer
aerospike_namespace_storage_engine_file_used_bytes
Number of bytes used for data on storage-engine.file[ix].
gauge
integer
aerospike_namespace_storage_engine_file_write_q
Number of wblocks queued to be written to storage-engine.file[ix].
Measured per-device or per-file depending on the storage configuration.
If storage-engine.device[ix].write_q or storage-engine.file[ix].write_q is greater than 1, alert operations to investigate.
gauge
integer
aerospike_namespace_storage_engine_file_writes
The number of wblocks written to storage-engine.file[ix] since Aerospike started. When running with commit-to-device set to true, this counter only accounts for full blocks written. In that mode it therefore only counts blocks written through the defragmentation process, because client writes are written to disk individually rather than at a block level. Includes defragmentation writes.
counter
integer
Label "file" and "file_index" in all aerospike_namespace_storage_engine_file_* metrics
The data file path that is configured in file configuration in namespace context and storage-engine subcontext. ‘ix’ is the file index. The index value starts from 0. For example, storage-engine.file[0]=/opt/aerospike/test0.dat and storage-engine.file[1]=/opt/aerospike/test2.dat for 2 files specified in the configuration.
gauge
integer
aerospike_namespace_storage_engine_stripe_age
Shows the percentage of lifetime (total usage) claimed by the OEM for the respective storage-backed persistence device of storage-engine.stripe[ix]. The value is -1 unless the underlying device is NVMe. The value may exceed 100. See also storage-engine.device[ix].age. This statistic is not available in the log ticker and is only applicable if a storage-backed persistence exists.
gauge
integer
More information about stripe allocation can be found on the “Configure Namespace Storage” page, under Setup for in-memory with storage-backed persistence and Setup for in-memory without storage-backed persistence.
aerospike_namespace_storage_engine_stripe_backing_write_q
The number of wblocks queued to be written to the respective storage-backed persistence of storage-engine.stripe[ix]. This statistic is available in the log ticker as write-q, and is only applicable if a storage-backed persistence exists.
gauge
integer
More information about stripe allocation can be found on the “Configure Namespace Storage” page, under Setup for in-memory with storage-backed persistence and Setup for in-memory without storage-backed persistence.
Log ticker example with storage-backed persistence:
INFO (drv-mem): (drv_mem.c:3158) {bar} stripe-0.0xad001000: used-bytes 146499360 free-wblocks 492 write (18,0.2) defrag-q 0 defrag-read (1,0.0) defrag-write (0,0.0) write-q 0
Log ticker example without storage-backed persistence:
INFO (drv-mem): (drv_mem.c:3158) {test} stripe-2.0xad002002: used-bytes 887120 free-wblocks 62 write (0,0.0) defrag-q 0 defrag-read (0,0.0) defrag-write (0,0.0)
INFO (drv-mem): (drv_mem.c:3158) {test} stripe-5.0xad002005: used-bytes 915280 free-wblocks 62 write (0,0.0) defrag-q 0 defrag-read (0,0.0) defrag-write (0,0.0)
INFO (drv-mem): (drv_mem.c:3158) {test} stripe-1.0xad002001: used-bytes 900080 free-wblocks 62 write (0,0.0) defrag-q 0 defrag-read (0,0.0) defrag-write (0,0.0)
INFO (drv-mem): (drv_mem.c:3158) {test} stripe-3.0xad002003: used-bytes 896720 free-wblocks 62 write (0,0.0) defrag-q 0 defrag-read (0,0.0) defrag-write (0,0.0)
INFO (drv-mem): (drv_mem.c:3158) {test} stripe-0.0xad002000: used-bytes 909120 free-wblocks 62 write (0,0.0) defrag-q 0 defrag-read (0,0.0) defrag-write (0,0.0)
INFO (drv-mem): (drv_mem.c:3158) {test} stripe-7.0xad002007: used-bytes 898960 free-wblocks 62 write (0,0.0) defrag-q 0 defrag-read (0,0.0) defrag-write (0,0.0)
INFO (drv-mem): (drv_mem.c:3158) {test} stripe-6.0xad002006: used-bytes 897040 free-wblocks 62 write (0,0.0) defrag-q 0 defrag-read (0,0.0) defrag-write (0,0.0)
INFO (drv-mem): (drv_mem.c:3158) {test} stripe-4.0xad002004: used-bytes 895680 free-wblocks 62 write (0,0.0) defrag-q 0 defrag-read (0,0.0) defrag-write (0,0.0)
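The ticker lines above can be turned into per-stripe values with a regular expression. This is a rough sketch based only on the two example formats shown here; the pattern, the function name `parse_stripe_tickers`, and the dictionary keys are hypothetical, and the parenthesized pairs after write, defrag-read, and defrag-write are skipped rather than interpreted.

```python
import re
from typing import Dict, List, Optional, Union

# Pattern derived from the example ticker lines only; other server versions
# may log a different layout.
STRIPE_TICKER_RE = re.compile(
    r"\{(?P<namespace>[^}]+)\} (?P<stripe>stripe-\d+\.0x[0-9a-fA-F]+): "
    r"used-bytes (?P<used_bytes>\d+) free-wblocks (?P<free_wblocks>\d+) "
    r"write \([^)]*\) defrag-q (?P<defrag_q>\d+) "
    r"defrag-read \([^)]*\) defrag-write \([^)]*\)"
    r"(?: write-q (?P<write_q>\d+))?"
)

Row = Dict[str, Union[str, int, None]]


def parse_stripe_tickers(text: str) -> List[Row]:
    """Extract one dictionary per stripe ticker entry found in `text`."""
    rows: List[Row] = []
    for match in STRIPE_TICKER_RE.finditer(text):
        write_q: Optional[str] = match.group("write_q")
        rows.append({
            "namespace": match.group("namespace"),
            "stripe": match.group("stripe"),
            "used_bytes": int(match.group("used_bytes")),
            "free_wblocks": int(match.group("free_wblocks")),
            "defrag_q": int(match.group("defrag_q")),
            # write-q only appears when storage-backed persistence exists.
            "write_q": int(write_q) if write_q is not None else None,
        })
    return rows
```

Because the function uses `finditer`, it also copes with several entries that arrive on one physical line.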
aerospike_namespace_storage_engine_stripe_defrag_partial_writes
The number of wblocks partially flushed to storage-engine.stripe[ix] by defrag.
counter
integer
aerospike_namespace_storage_engine_stripe_defrag_q
The number of wblocks queued to be defragged on storage-engine.stripe[ix].
gauge
integer
More information about stripe allocation can be found on the “Configure Namespace Storage” page, under Setup for in-memory with storage-backed persistence and Setup for in-memory without storage-backed persistence.
aerospike_namespace_storage_engine_stripe_defrag_reads
Number of wblocks that have been sent to the defrag_q from storage-engine.stripe[ix].
Blocks are selected for defragmentation when their usage falls below the configured defrag-lwm-pct.
counter
integer
More information about stripe allocation can be found on the “Configure Namespace Storage” page, under Setup for in-memory with storage-backed persistence and Setup for in-memory without storage-backed persistence.
aerospike_namespace_storage_engine_stripe_defrag_writes
The number of wblocks defrag has written to storage-engine.stripe[ix].
counter
integer
More information about stripe allocation can be found on the “Configure Namespace Storage” page, under Setup for in-memory with storage-backed persistence and Setup for in-memory without storage-backed persistence.
aerospike_namespace_storage_engine_stripe_free_wblocks
Number of wblocks (write blocks) free on storage-engine.stripe[ix].
gauge
integer
More information about stripe allocation can be found on the “Configure Namespace Storage” page, under Setup for in-memory with storage-backed persistence and Setup for in-memory without storage-backed persistence.
aerospike_namespace_storage_engine_stripe_partial_writes
The number of wblocks partially flushed to storage-engine.stripe[ix] by writes.
counter
integer
aerospike_namespace_storage_engine_stripe_used_bytes
Number of bytes used for data on storage-engine.stripe[ix].
gauge
integer
More information about stripe allocation can be found on the “Configure Namespace Storage” page, under Setup for in-memory with storage-backed persistence and Setup for in-memory without storage-backed persistence.
aerospike_namespace_storage_engine_stripe_writes
The number of wblocks written to storage-engine.stripe[ix] since Aerospike started.
When running with commit-to-device set to true, this counter only accounts for full blocks written. In that mode it therefore only counts blocks written through the defragmentation process, because client writes are written to disk individually rather than at a block level. Includes defragmentation writes.
counter
integer
More information about stripe allocation can be found on the “Configure Namespace Storage” page, under Setup for in-memory with storage-backed persistence and Setup for in-memory without storage-backed persistence.
Label "stripe" and "stripe_index" in all aerospike_namespace_storage_engine_stripe_* metrics
A stripe is a shared memory segment. Each stripe has its own shared memory key, which is internally determined by the server. ‘ix’ is the stripe index. For example, if there are eight stripes, the index (ix) value ranges from 0 to 7. So, storage-engine.stripe[0]=stripe-0.0xad002000 and storage-engine.stripe[1]=stripe-1.0xad002001 show two shared memory segments (stripes) and their keys. This statistic applies to namespaces configured with storage-engine memory.
gauge
integer
More information about stripe allocation can be found on the “Configure Namespace Storage” page, under Setup for in-memory with storage-backed persistence and Setup for in-memory without storage-backed persistence.
aerospike_namespace_sub_objects
Number of LDT sub objects. Also aggregated at the service statistic level under the same name.
counter
integer
aerospike_namespace_tombstones
Total number of tombstones in this namespace on this node.
gauge
integer
aerospike_namespace_truncate_lut
The most covering truncate_lut for this namespace. See truncate or truncate-namespace.
gauge
integer
aerospike_namespace_truncated_records
The total number of records deleted by truncation for this namespace (includes set truncations). See truncate or truncate-namespace.
counter
integer
aerospike_namespace_truncating
Indicates when the namespace is in the process of being truncated.
gauge
boolean
aerospike_namespace_udf_sub_lang_delete_success
Number of successful UDF delete sub-transactions for scan/query background UDF jobs. See the udf_sub_udf_complete, udf_sub_udf_error, udf_sub_udf_filtered_out, and udf_sub_udf_timeout statistics for the containing UDF operation statuses.
counter
integer
aerospike_namespace_udf_sub_lang_error
Number of UDF sub-transaction errors for scan/query background UDF jobs. See the udf_sub_udf_complete, udf_sub_udf_error, udf_sub_udf_filtered_out, and udf_sub_udf_timeout statistics for the containing UDF operation statuses.
counter
integer
aerospike_namespace_udf_sub_lang_read_success
Number of successful UDF read sub-transactions for scan/query background UDF jobs. See the udf_sub_udf_complete, udf_sub_udf_error, udf_sub_udf_filtered_out, and udf_sub_udf_timeout statistics for the containing UDF operation statuses.
counter
integer
aerospike_namespace_udf_sub_lang_write_success
Number of successful UDF write sub-transactions for scan/query background UDF jobs. See the udf_sub_udf_complete, udf_sub_udf_error, udf_sub_udf_filtered_out, and udf_sub_udf_timeout statistics for the containing UDF operation statuses.
counter
integer
aerospike_namespace_udf_sub_tsvc_error
Number of UDF subtransactions that failed with an error in the transaction service, before attempting to handle the transaction for scan/query background UDF jobs. Examples include protocol errors or a security permission mismatch. Does not include timeouts. In strong-consistency enabled namespaces, this includes transactions against unavailable_partitions and dead_partitions.
The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc errors happen before records are accessed for reads or writes. They’re counted separately from tsvc timeouts.
counter
integer
aerospike_namespace_udf_sub_tsvc_timeout
Number of UDF subtransactions that timed out in the transaction service, before attempting to handle the transaction for scan/query background UDF jobs.
The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc errors happen before records are accessed for reads or writes. They’re counted separately from tsvc timeouts.
counter
integer
aerospike_namespace_udf_sub_udf_complete
Number of completed UDF subtransactions for scan/query background UDF jobs. See the following statistics for the underlying operation statuses: udf_sub_lang_delete_success, udf_sub_lang_error, udf_sub_lang_read_success, udf_sub_lang_write_success.
counter
integer
aerospike_namespace_udf_sub_udf_error
Number of failed UDF subtransactions for scan/query background UDF jobs. Does not include timeouts. See the following statistics for the underlying operation statuses: udf_sub_lang_delete_success, udf_sub_lang_error, udf_sub_lang_read_success, udf_sub_lang_write_success.
counter
integer
aerospike_namespace_udf_sub_udf_filtered_out
Number of UDF subtransactions that did not happen because the record was filtered out with Filter Expressions.
counter
integer
aerospike_namespace_udf_sub_udf_timeout
Number of UDF subtransactions that timed out for scan/query background UDF jobs. See the following statistics for the underlying operation statuses: udf_sub_lang_delete_success, udf_sub_lang_error, udf_sub_lang_read_success, udf_sub_lang_write_success.
counter
integer
aerospike_namespace_unavailable_partitions
Number of unavailable partitions for this namespace (when using strong-consistency). This is the number of partitions that are unavailable when roster nodes are missing. They turn into dead_partitions if still unavailable when all roster nodes are present.
If unavailable_partitions is not zero, raise a critical alert.
Check for network issues and make sure the cluster forms properly.
gauge
integer
aerospike_namespace_unreplicated_records
Number of unreplicated records in the namespace. Applicable only for namespaces operating under the strong-consistency mode.
gauge
integer
aerospike_namespace_write-smoothing-period
Removed
gauge
integer
aerospike_namespace_xdr_bin_cemeteries
Number of tombstones with bin tombstones. They are generated when bin convergence is enabled and a record is durably deleted.
gauge
integer
aerospike_namespace_xdr_client_delete_error
Number of delete requests initiated by XDR that failed on the namespace on this node. For the total number of XDR-initiated delete requests against this namespace on this node (destination node), add up the relevant XDR client and from_proxy statistics: xdr_client_delete_success, xdr_client_delete_error, xdr_client_delete_timeout, xdr_client_delete_not_found, xdr_from_proxy_delete_success, xdr_from_proxy_delete_error, xdr_from_proxy_delete_timeout, xdr_from_proxy_delete_not_found.
counter
integer
aerospike_namespace_xdr_client_delete_not_found
Number of delete requests initiated by XDR that failed on the namespace on this node due to the record not being found. For the total number of XDR-initiated delete requests against this namespace on this node (destination node), add up the relevant XDR client and from_proxy statistics: xdr_client_delete_success, xdr_client_delete_error, xdr_client_delete_timeout, xdr_client_delete_not_found, xdr_from_proxy_delete_success, xdr_from_proxy_delete_error, xdr_from_proxy_delete_timeout, xdr_from_proxy_delete_not_found.
counter
integer
aerospike_namespace_xdr_client_delete_success
Number of delete requests initiated by XDR that succeeded on the namespace on this node. For the total number of XDR-initiated delete requests against this namespace on this node (destination node), add up the relevant XDR client and from_proxy statistics: xdr_client_delete_success, xdr_client_delete_error, xdr_client_delete_timeout, xdr_client_delete_not_found, xdr_from_proxy_delete_success, xdr_from_proxy_delete_error, xdr_from_proxy_delete_timeout, xdr_from_proxy_delete_not_found.
counter
integer
aerospike_namespace_xdr_client_delete_timeout
Number of delete requests initiated by XDR that timed out on the namespace on this node. For the total number of XDR-initiated delete requests against this namespace on this node (destination node), add up the relevant XDR client and from_proxy statistics: xdr_client_delete_success, xdr_client_delete_error, xdr_client_delete_timeout, xdr_client_delete_not_found, xdr_from_proxy_delete_success, xdr_from_proxy_delete_error, xdr_from_proxy_delete_timeout, xdr_from_proxy_delete_not_found.
counter
integer
aerospike_namespace_xdr_client_write_error
Number of write requests initiated by XDR that failed on the namespace on this node. For the total number of XDR-initiated write requests against this namespace on this node (destination node), add up the relevant XDR client and from_proxy statistics: xdr_client_write_success, xdr_client_write_error, xdr_client_write_timeout, xdr_from_proxy_write_success, xdr_from_proxy_write_error, xdr_from_proxy_write_timeout.
counter
integer
aerospike_namespace_xdr_client_write_success
Number of write requests initiated by XDR that succeeded on the namespace on this node. For the total number of XDR-initiated write requests against this namespace on this node (destination node), add up the relevant XDR client and from_proxy statistics: xdr_client_write_success, xdr_client_write_error, xdr_client_write_timeout, xdr_from_proxy_write_success, xdr_from_proxy_write_error, xdr_from_proxy_write_timeout.
counter
integer
aerospike_namespace_xdr_client_write_timeout
Number of write requests initiated by XDR that timed out on the namespace on this node. For the total number of XDR-initiated write requests against this namespace on this node (destination node), add up the relevant XDR client and from_proxy statistics: xdr_client_write_success, xdr_client_write_error, xdr_client_write_timeout, xdr_from_proxy_write_success, xdr_from_proxy_write_error, xdr_from_proxy_write_timeout.
counter
integer
aerospike_namespace_xdr_from_proxy_delete_error
Number of errors for XDR delete commands proxied from another node. For the total number of XDR-initiated delete requests against this namespace on this node (destination node), add up the relevant XDR client and from_proxy statistics: xdr_client_delete_success, xdr_client_delete_error, xdr_client_delete_timeout, xdr_client_delete_not_found, xdr_from_proxy_delete_success, xdr_from_proxy_delete_error, xdr_from_proxy_delete_timeout, xdr_from_proxy_delete_not_found.
counter
integer
aerospike_namespace_xdr_from_proxy_delete_not_found
Number of XDR delete commands proxied from another node that resulted in not found. For the total number of XDR-initiated delete requests against this namespace on this node (destination node), add up the relevant XDR client and from_proxy statistics: xdr_client_delete_success, xdr_client_delete_error, xdr_client_delete_timeout, xdr_client_delete_not_found, xdr_from_proxy_delete_success, xdr_from_proxy_delete_error, xdr_from_proxy_delete_timeout, xdr_from_proxy_delete_not_found.
counter
integer
aerospike_namespace_xdr_from_proxy_delete_success
Number of successful XDR delete commands proxied from another node. For the total number of XDR-initiated delete requests against this namespace on this node (destination node), add up the relevant XDR client and from_proxy statistics: xdr_client_delete_success, xdr_client_delete_error, xdr_client_delete_timeout, xdr_client_delete_not_found, xdr_from_proxy_delete_success, xdr_from_proxy_delete_error, xdr_from_proxy_delete_timeout, xdr_from_proxy_delete_not_found.
counter
integer
aerospike_namespace_xdr_from_proxy_delete_timeout
Number of timeouts for XDR delete commands proxied from another node. For the total number of XDR-initiated delete requests against this namespace on this node (destination node), add up the relevant XDR client and from_proxy statistics: xdr_client_delete_success, xdr_client_delete_error, xdr_client_delete_timeout, xdr_client_delete_not_found, xdr_from_proxy_delete_success, xdr_from_proxy_delete_error, xdr_from_proxy_delete_timeout, xdr_from_proxy_delete_not_found.
counter
integer
aerospike_namespace_xdr_from_proxy_write_error
Number of errors for XDR write commands proxied from another node. For the total number of XDR-initiated write requests against this namespace on this node (destination node), add up the relevant XDR client and from_proxy statistics: xdr_client_write_success, xdr_client_write_error, xdr_client_write_timeout, xdr_from_proxy_write_success, xdr_from_proxy_write_error, xdr_from_proxy_write_timeout.
counter
integer
aerospike_namespace_xdr_from_proxy_write_success
Number of successful XDR write commands proxied from another node. For the total number of XDR-initiated write requests against this namespace on this node (destination node), add up the relevant XDR client and from_proxy statistics: xdr_client_write_success, xdr_client_write_error, xdr_client_write_timeout, xdr_from_proxy_write_success, xdr_from_proxy_write_error, xdr_from_proxy_write_timeout.
counter
integer
aerospike_namespace_xdr_from_proxy_write_timeout
Number of timeouts for XDR write commands proxied from another node. For the total number of XDR-initiated write requests against this namespace on this node (destination node), add up the relevant XDR client and from_proxy statistics: xdr_client_write_success, xdr_client_write_error, xdr_client_write_timeout, xdr_from_proxy_write_success, xdr_from_proxy_write_error, xdr_from_proxy_write_timeout.
counter
integer
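The totals described in the XDR client and from_proxy entries are plain sums over the listed statistics. A minimal sketch follows; the function name `total_xdr_requests` is hypothetical, and `stats` is assumed to be a dictionary of already-collected counter values for one namespace on one node, keyed by the short statistic names used in the descriptions.

```python
from typing import Iterable, Mapping

# Statistic names exactly as listed in the descriptions above.
XDR_DELETE_STATS = (
    "xdr_client_delete_success",
    "xdr_client_delete_error",
    "xdr_client_delete_timeout",
    "xdr_client_delete_not_found",
    "xdr_from_proxy_delete_success",
    "xdr_from_proxy_delete_error",
    "xdr_from_proxy_delete_timeout",
    "xdr_from_proxy_delete_not_found",
)

XDR_WRITE_STATS = (
    "xdr_client_write_success",
    "xdr_client_write_error",
    "xdr_client_write_timeout",
    "xdr_from_proxy_write_success",
    "xdr_from_proxy_write_error",
    "xdr_from_proxy_write_timeout",
)


def total_xdr_requests(stats: Mapping[str, int], names: Iterable[str]) -> int:
    """Sum the named counters; a counter missing from `stats` counts as 0."""
    return sum(int(stats.get(name, 0)) for name in names)
```

Calling `total_xdr_requests(stats, XDR_DELETE_STATS)` gives the delete total and `total_xdr_requests(stats, XDR_WRITE_STATS)` gives the write total for the destination node.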
aerospike_namespace_xdr_tombstones
Number of tombstones on this node which are created by XDR for non-durable client deletes. This includes both master and prole.
gauge
integer
For namespaces configured with XDR, non-durable delete transactions create XDR tombstones (not to be confused with the durable delete tombstones).
XDR tombstones are deleted after they have been shipped via XDR. The XDR tomb raider runs as specified in xdr-tomb-raider-period and uses xdr-tomb-raider-threads to reduce the index and delete XDR tombstones where the last update time (LUT) is older than the current global last ship time (GLST). The GLST is computed as the lowest value across the last ship time (LST) of all the partitions for the namespace. This is done by having each node send the LST for each partition they own to the principal node which then determines the lowest value and sends it back to all nodes in the cluster via the system metadata (SMD) fabric channel.
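The GLST rule can be illustrated with a few lines of arithmetic. This is only an illustration of the description above, not the server's implementation: the function names are hypothetical and timestamps are assumed to be comparable integers in the same unit.

```python
from typing import Iterable, List


def global_last_ship_time(partition_lsts: Iterable[int]) -> int:
    """GLST is the lowest last ship time (LST) across all partitions."""
    return min(partition_lsts)


def removable_xdr_tombstones(tombstone_luts: Iterable[int], glst: int) -> List[int]:
    """Keep only tombstones whose last update time (LUT) is older than the GLST."""
    return [lut for lut in tombstone_luts if lut < glst]


if __name__ == "__main__":
    glst = global_last_ship_time([120, 95, 130])
    print(glst)
    print(removable_xdr_tombstones([90, 95, 100], glst))
```

Because the GLST is a minimum, a single partition that lags in shipping holds back tombstone removal for the whole namespace.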
Node_stats
aerospike_node_stats_batch_index_complete
Number of batch index requests completed.
counter
integer
aerospike_node_stats_batch_index_created_buffers
Number of 128KB response buffers created. Response buffers are created when there are no buffers left in the pool. If this number consistently increases and there is available memory, you should increase batch-max-unused-buffers.
counter
integer
aerospike_node_stats_batch_index_delay
Number of times a batch index response buffer has been delayed (WOULDBLOCK on the send). When a batch index transaction is completely abandoned because it exceeded its overall allocated time after being delayed, it is counted under the batch_index_error statistic and has a WARNING log message associated.
counter
integer
aerospike_node_stats_batch_index_destroyed_buffers
Number of 128KB response buffers destroyed. Response buffers are destroyed when there is no slot left to put the buffer back into the pool. The maximum response buffer pool size is batch-max-unused-buffers.
counter
integer
aerospike_node_stats_batch_index_error
Number of batch index requests that completed with an error. This happens, for example, when the client has timed out but the server is still attempting to send response buffers back. It also happens when the server abandons the transaction because delays encountered while sending response buffers back (WOULDBLOCK on send) add up to more than twice the total timeout set by the client, or more than 30 seconds if the client did not set a timeout. This is accompanied by a WARNING log message. Starting with version 6.4, the transaction is abandoned when the delays exceed the client timeout itself (a factor of 1 instead of 2). Each encountered delay is counted under the batch_index_delay statistic.
Compare batch_index_error to batch_index_complete. If the ratio is higher than acceptable, alert Operations to investigate.
counter
integer
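Since both statistics are counters, the comparison is most useful on the increase between two scrapes rather than on the lifetime totals. The following is a minimal sketch under that assumption; the function name `batch_index_error_ratio` is hypothetical, and what counts as an acceptable ratio is left to the operator.

```python
from typing import Mapping, Optional


def batch_index_error_ratio(
    previous: Mapping[str, int], current: Mapping[str, int]
) -> Optional[float]:
    """Ratio of new batch_index_error to new batch_index_complete between two scrapes.

    Returns None when no requests completed in the interval or when a counter
    went backwards (for example after a node restart).
    """
    errors = current["batch_index_error"] - previous["batch_index_error"]
    completed = current["batch_index_complete"] - previous["batch_index_complete"]
    if errors < 0 or completed <= 0:
        return None
    return errors / completed
```

An alerting job could call this once per scrape interval and compare the result with a locally chosen threshold.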
aerospike_node_stats_batch_index_huge_buffers
Number of temporary response buffers created that exceeded 128KB. Huge buffers are created when a retrieved record is larger than 128KB. Huge records do not benefit from batching and can result in excessive memory thrashing on the server. The batch_index_created_buffers and batch_index_destroyed_buffers statistics include the huge buffers created and destroyed.
counter
integer
aerospike_node_stats_batch_index_initiate
Number of batch index requests received.
counter
integer
aerospike_node_stats_batch_index_proto_compression_ratio
Measures the average compressed size to uncompressed size ratio for protocol message data in batch index responses. Thus 1.000 indicates no compression and 0.100 indicates a 1:10 compression ratio (90% reduction in size).
moving average
decimal
The compression ratio is a moving average. It is calculated based on the most recent client responses. If the response message data changes over time then the compression ratio will change with it. In case of a sudden change in response data, the indicated compression ratio may lag behind a bit. As a rule of thumb, assume that the compression ratio covers the most recent 100,000 to 1,000,000 client responses.
aerospike_node_stats_batch_index_proto_uncompressed_pct
Measures the percentage of batch index responses with uncompressed protocol message data. Thus 0.000 indicates all responses with compressed data, and 100.000 indicates no responses with compressed data. For example, if protocol message data compression is not used, this metric will remain set to 0.000. If protocol message data compression is then turned on and all responses are compressed, this metric will remain set to 0.000. The only way this metric will ever be set to a value different than 0.000 is if compression is used, but some responses are not compressed (which happens when the uncompressed size is so small that the server does not try to compress, or when the compression fails).
gauge
decimal
The percentage is a moving average. It is calculated based on the most recent client responses. If the response message data changes over time then the percentage will change with it. In case of a sudden change in response data, the indicated percentage may lag behind a bit. As a rule of thumb, assume that the percentage covers the most recent 100,000 to 1,000,000 client responses.
aerospike_node_stats_batch_index_queue
Number of batch index requests (transactions count) processed and response buffer blocks used on each batch queue.
Format: Q1_REQUESTS:Q1_BUFFERS, Q2_REQUESTS:Q2_BUFFERS, ...
The buffer block counter is actually decremented on batch responses before the transaction count is decremented. Therefore, it is possible for a buffer slot to become available on the queue and for a new batch transaction count to be incremented before the previous batch command count is decremented. It is also possible that multiple transactions came in for a thread for which none of the response buffers has been created yet. Finally, batch_index_huge_buffers are counted as part of the buffer blocks used on each batch queue.
gauge
integer
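The documented format, Q1_REQUESTS:Q1_BUFFERS, Q2_REQUESTS:Q2_BUFFERS, ..., can be split into per-queue pairs. A minimal sketch, assuming the value arrives as a single string; the function name `parse_batch_index_queue` and the sample value are made up for illustration.

```python
from typing import List, Tuple


def parse_batch_index_queue(value: str) -> List[Tuple[int, int]]:
    """Split 'REQUESTS:BUFFERS' pairs into (requests, buffers) tuples, one per queue."""
    pairs: List[Tuple[int, int]] = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing comma or an empty value
        requests, buffers = item.split(":")
        pairs.append((int(requests), int(buffers)))
    return pairs


if __name__ == "__main__":
    # Hypothetical value for a node with three batch queues.
    for queue_number, (requests, buffers) in enumerate(
        parse_batch_index_queue("0:0, 2:5, 1:0"), start=1
    ):
        print(f"queue {queue_number}: {requests} requests, {buffers} buffers")
```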
aerospike_node_stats_batch_index_timeout
Number of batch index requests that timed out on the server before being processed. These are caused by a batch subtransaction that has timed out for this batch index transaction. The overall time allowed for a batch-index transaction on the server is not bound, except if a delay is encountered (WOULDBLOCK on send).
For Database 4.1 through 6.3, the overall batch index transaction max delay time is twice the total timeout set by the client, or 30 seconds if there is no timeout set by the client.
For Database 6.4 and later, the overall batch index transaction max delay time is the same as set by the client, or 30 seconds if there is no timeout set by the client.
counter
integer
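The two version-dependent rules for the maximum delay time can be written as one small function. This is a sketch of the rules as stated above, not server code: the function name is hypothetical, the version is assumed to be a (major, minor) tuple, and treating a timeout of 0 as "no timeout set" is an assumption.

```python
from typing import Optional, Tuple

DEFAULT_MAX_DELAY_SEC = 30.0  # used when the client sets no timeout


def batch_index_max_delay_sec(
    server_version: Tuple[int, int], client_timeout_sec: Optional[float]
) -> float:
    """Return the overall batch index max delay time in seconds."""
    if server_version < (4, 1):
        raise ValueError("the rules above only cover Database 4.1 and later")
    if not client_timeout_sec:  # None or 0 (assumed to mean no timeout)
        return DEFAULT_MAX_DELAY_SEC
    if server_version >= (6, 4):
        return float(client_timeout_sec)   # same as the client timeout
    return 2.0 * client_timeout_sec        # Database 4.1 through 6.3


if __name__ == "__main__":
    print(batch_index_max_delay_sec((6, 3), 1.0))
    print(batch_index_max_delay_sec((6, 4), 1.0))
    print(batch_index_max_delay_sec((7, 0), None))
```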
aerospike_node_stats_batch_index_unused_buffers
Number of available 128 KB response buffers currently in buffer pool.
gauge
integer
aerospike_node_stats_client_connections
Number of active client connections to this node. Also available in the log on the fds proto ticker line.
- If client_connections is below an expected low value, then this condition might indicate a problem with the network between clients and server.
- If client_connections is greater than an expected high value, then this condition might indicate a problem with clients rapidly opening and closing sockets.
- If client_connections is at or near proto_fd_max, then the server is either currently unable to accept new connections or might soon be unable to do so.
gauge
integer
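The three conditions above map directly onto threshold checks. A minimal sketch follows; the function name, the warning labels, and the `near_fraction` definition of "near proto_fd_max" are hypothetical, and the expected low and high values must come from knowledge of the local workload.

```python
from typing import List


def client_connection_warnings(
    client_connections: int,
    proto_fd_max: int,
    expected_low: int,
    expected_high: int,
    near_fraction: float = 0.9,  # hypothetical definition of "near" proto_fd_max
) -> List[str]:
    """Return a label for each documented condition that currently holds."""
    warnings: List[str] = []
    if client_connections < expected_low:
        warnings.append("low")          # possible client-to-server network problem
    if client_connections > expected_high:
        warnings.append("high")         # clients may be churning sockets
    if client_connections >= near_fraction * proto_fd_max:
        warnings.append("near_fd_max")  # new connections may soon be refused
    return warnings
```

The conditions are not mutually exclusive, so the function returns a list rather than a single status.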
aerospike_node_stats_client_connections_closed
Number of client connections that have been closed. One of client_connections_opened or client_connections_closed should be closely monitored or alerted against. Also available in the log on the fds proto ticker line.
counter
integer
aerospike_node_stats_client_connections_opened
Number of client connections created to this node since the node was started. One of client_connections_opened or client_connections_closed should be closely monitored or alerted against. Also available in the log on the fds proto ticker line.
If client_connections_opened changes unexpectedly without clients having been added or removed, or a significant change in workload having occurred, this condition might indicate a slowdown on a node or a connectivity issue on the node.
counter
integer
aerospike_node_stats_cluster_clock_skew_ms
Current maximum clock skew in milliseconds between nodes in a cluster. Triggers clock_skew_stop_writes when it breaches the cluster_clock_skew_stop_writes_sec threshold. This threshold is normally 20 seconds for strong-consistency namespaces on any Aerospike version, or 40 seconds for AP namespaces where NSUP is enabled (nsup-period is not zero) in Database 4.5.1 or later.
gauge
integer
aerospike_node_stats_cluster_clock_skew_stop_writes_sec
The threshold at which any namespace that is set to strong-consistency
stops accepting writes due to clock skew (cluster_clock_skew_ms
).
This value is in seconds, not milliseconds.
Although this value shows as 0 for AP namespaces, starting with Database 4.5.1, these namespaces stop accepting writes if NSUP is enabled (nsup-period
is not zero) and the clock skew exceeds 40 seconds.
gauge
integer
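The relationship between cluster_clock_skew_ms (milliseconds) and cluster_clock_skew_stop_writes_sec (seconds) is easy to get wrong because of the unit mismatch. The sketch below is an assumption-laden illustration: the function name and flags are hypothetical, and it treats "breaching" as strictly exceeding the threshold.

```python
# 40 seconds, the AP limit described above when NSUP is enabled (4.5.1+).
AP_NSUP_SKEW_LIMIT_MS = 40_000


def clock_skew_blocks_writes(cluster_clock_skew_ms: int,
                             cluster_clock_skew_stop_writes_sec: int,
                             strong_consistency: bool,
                             nsup_enabled: bool = False) -> bool:
    """Estimate whether the current skew would stop writes for a namespace."""
    if strong_consistency:
        # Convert the threshold from seconds to milliseconds before comparing.
        return cluster_clock_skew_ms > cluster_clock_skew_stop_writes_sec * 1000
    # AP namespaces report a threshold of 0; the 40-second rule applies
    # only when NSUP is enabled (nsup-period is not zero).
    return nsup_enabled and cluster_clock_skew_ms > AP_NSUP_SKEW_LIMIT_MS
```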
aerospike_node_stats_cluster_generation
A 64 bit unsigned integer incremented on a node for every successful cluster partition re-balance or transition to orphan state. This is a node local value and does not need to be the same across the cluster.
counter
integer
aerospike_node_stats_cluster_integrity
When false
, indicates integrity issues within the cluster, meaning that some nodes are either faulty or dead. A node in the succession list is deemed faulty if the node is alive and it reports to be an orphan or is part of some other cluster. Another condition for a faulty node would be for it to be alive but having a clustering protocol identifier that does not match the rest of the cluster. When true
, indicates that the cluster is in a whole and complete state (as far as the nodes that it sees and is able to connect to are concerned). Information about a cluster integrity fault is also logged to the server log file repeatedly.
gauge
integer
aerospike_node_stats_cluster_is_member
When false
, indicates that the node is not joined to a cluster; that is, it is an orphan. When true
, indicates that the node is joined to a cluster.
gauge
integer
aerospike_node_stats_cluster_key
Randomly generated 64 bit hexadecimal string used to name the last Paxos cluster state agreement.
gauge
integer
aerospike_node_stats_cluster_max_compatibility_id
Each node has a compatibility ID that is an integer based on the node’s database version. During upgrades, this value is used to determine software compatibility. cluster_max_compatibility_id
indicates the cluster’s maximum software version. See cluster_min_compatibility_id
.
gauge
integer
aerospike_node_stats_cluster_min_compatibility_id
Each node has a compatibility ID that is an integer based on the node’s database version. During upgrades, this value is used to determine software compatibility. cluster_min_compatibility_id
indicates the cluster’s minimum software version. See cluster_max_compatibility_id
.
gauge
integer
aerospike_node_stats_cluster_principal
This specifies the Node ID of the current cluster principal. Will be ‘0’ on an orphan node.
gauge
integer
aerospike_node_stats_cluster_size
Size of the cluster. Can be checked to make sure the size of the cluster is the expected one after adding or removing a node. Check across all nodes in a cluster.
If cluster_size
does not equal the expected cluster size and the cluster is not undergoing maintenance, your operations group needs to investigate.
gauge
integer
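Checking cluster_size across all nodes can be sketched as follows. The function name and the node identifiers are hypothetical; the input is assumed to be the cluster_size value already collected from each node.

```python
from typing import Dict, List


def nodes_with_unexpected_size(cluster_size_by_node: Dict[str, int],
                               expected_size: int) -> List[str]:
    """Return the nodes whose reported cluster_size differs from the expected size."""
    return sorted(node for node, size in cluster_size_by_node.items()
                  if size != expected_size)
```

A non-empty result outside a maintenance window is the condition that warrants investigation.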
aerospike_node_stats_demarshal_error
Number of errors during the demarshal step.
counter
integer
aerospike_node_stats_early_tsvc_batch_sub_error
Number of errors early in the transaction for batch subtransactions. For example, bad/unknown namespace name or security authentication errors.
The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc
errors happen before records are accessed for reads or writes. They’re counted separately from tsvc
timeouts.
counter
integer
aerospike_node_stats_early_tsvc_client_error
Number of errors early in the transaction for direct client requests. Those include transactions hitting the proto-fd-max
, transactions with a bad/unknown namespace name or security authentication errors. Those also include cases where partitions are unavailable in AP mode, when clients attempt transactions against an orphan node.
The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc
errors happen before records are accessed for reads or writes. They’re counted separately from tsvc
timeouts.
counter
integer
aerospike_node_stats_early_tsvc_from_proxy_batch_sub_error
Number of errors early in the transaction for batch subtransactions proxied from another node. For example, bad or unknown namespace name or security authentication errors.
The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc
errors happen before records are accessed for reads or writes. They’re counted separately from tsvc
timeouts.
counter
integer
aerospike_node_stats_early_tsvc_from_proxy_error
Number of errors early in the transaction for commands, other than batch subtransactions, proxied from another node. For example, bad or unknown namespace name or security authentication errors.
The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc
errors happen before records are accessed for reads or writes. They’re counted separately from tsvc
timeouts.
counter
integer
aerospike_node_stats_early_tsvc_ops_sub_error
Number of errors early in an internal ops subtransaction (records accessed by a background query operate command). For example, bad or unknown namespace name or security authentication errors.
The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc
errors happen before records are accessed for reads or writes. They’re counted separately from tsvc
timeouts.
counter
integer
aerospike_node_stats_early_tsvc_udf_sub_error
Number of errors early in the transaction for UDF subtransactions. For example, bad or unknown namespace name or security authentication errors.
The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc
errors happen before records are accessed for reads or writes. They’re counted separately from tsvc
timeouts.
counter
integer
aerospike_node_stats_entries_per_bval
Ratio of entries to unique bvals (bin values) for a given secondary index on the node. The value is an integer (rounded to the nearest integer) and is calculated using hyperloglog estimates for unique bvals. The stat is generated by a background process. A value of 0 means the stat is not yet generated. The process runs at startup, every hour thereafter, and when a secondary index is created and populated.
This stat appears in the response to the ‘sindex-stat’ info command to retrieve statistics for a specified namespace and index. For example, asinfo -v 'sindex-stat:ns=namespace1;indexname=index21'
.
gauge
integer
aerospike_node_stats_entries_per_rec
Ratio of entries to unique records for a given secondary index on the node. This value will always be 1 if it is not a list or map secondary index. The value is an integer (rounded to the nearest integer) and is calculated using hyperloglog estimates for unique recs. The stat is generated by a background process. A value of 0 means the stat is not yet generated. The process runs at startup, every hour thereafter, and when a secondary index is created and populated.
This stat appears in the response to the ‘sindex-stat’ info command to retrieve statistics for a specified namespace and index. For example, asinfo -v 'sindex-stat:ns=namespace1;indexname=index21'
.
gauge
integer
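A response from the sindex-stat info command can be parsed before reading entries_per_bval or entries_per_rec. The sketch below assumes the response is a semicolon-separated list of name=value pairs; the helper names and the sample string in the usage note are hypothetical.

```python
from typing import Dict


def parse_info_pairs(response: str) -> Dict[str, str]:
    """Split a 'name=value;name=value' string into a dictionary of strings."""
    pairs = {}
    for item in response.strip().split(";"):
        if not item:
            continue  # tolerate a trailing separator
        key, _, value = item.partition("=")
        pairs[key] = value
    return pairs


def stat_is_ready(stats: Dict[str, str], name: str) -> bool:
    """A value of 0 means the background process has not generated the stat yet."""
    return int(stats.get(name, "0")) != 0
```

For instance, parsing the made-up string `entries=1200;entries_per_bval=3;entries_per_rec=1` and calling `stat_is_ready(stats, "entries_per_bval")` would report that the stat has been generated.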
aerospike_node_stats_err_storage_defrag_fd_get
Removed
counter
integer
aerospike_node_stats_err_sync_copy_null_node
Number of errors during cluster state exchange because of missing general node information.
counter
integer
aerospike_node_stats_fabric_bulk_recv_rate
Rate of traffic (bytes/sec) received by the fabric bulk channel during the last ticker-interval (every 10 seconds by default).
gauge
integer
aerospike_node_stats_fabric_bulk_send_rate
Rate of traffic (bytes/sec) sent by the fabric bulk channel during the last ticker-interval (every 10 seconds by default).
gauge
integer
aerospike_node_stats_fabric_connections
Number of active fabric connections to this node. Also available in the log on the fds proto ticker line.
gauge
integer
aerospike_node_stats_fabric_connections_closed
Number of fabric connections that have been closed. Also available in the log on the fds proto ticker line.
counter
integer
aerospike_node_stats_fabric_connections_opened
Number of fabric connections created to this node since the node was started. Also available in the log on the fds proto ticker line.
If fabric_connections_opened
is unexpectedly changing, alert as this condition would indicate a connectivity problem with a node or a cluster change.
counter
integer
aerospike_node_stats_fabric_ctrl_recv_rate
Rate of traffic (bytes/sec) received by the fabric ctrl channel during the last ticker-interval (every 10 seconds by default).
gauge
integer
aerospike_node_stats_fabric_ctrl_send_rate
Rate of traffic (bytes/sec) sent by the fabric ctrl channel during the last ticker-interval (every 10 seconds by default).
gauge
integer
aerospike_node_stats_fabric_meta_recv_rate
Rate of traffic (bytes/sec) received by the fabric meta channel during the last ticker-interval (every 10 seconds by default).
gauge
integer
aerospike_node_stats_fabric_meta_send_rate
Rate of traffic (bytes/sec) sent by the fabric meta channel during the last ticker-interval (every 10 seconds by default).
gauge
integer
aerospike_node_stats_fabric_rw_recv_rate
Rate of traffic (bytes/sec) received by the fabric rw channel during the last ticker-interval (every 10 seconds by default).
gauge
integer
aerospike_node_stats_fabric_rw_send_rate
Rate of traffic (bytes/sec) sent by the fabric rw channel during the last ticker-interval (every 10 seconds by default).
gauge
integer
aerospike_node_stats_failed_best_practices
Indicates true
if any of the best-practices, which are checked when the server starts, were violated, otherwise failed_best_practices
will indicate false
. Each failed best-practice will log a unique warning message and a list of failed best-practices can be queried using the best-practices
info command.
gauge
boolean
aerospike_node_stats_heap_active_kbytes
The amount of memory in in-use pages, in KiB. An in-use page is a page that has some allocated memory (either partial or full).
gauge
integer
aerospike_node_stats_heap_allocated_kbytes
The amount of memory, in KiB, allocated by the asd daemon. The heap_allocated_kbytes
/ heap_active_kbytes
ratio (6.0 or later) and heap_allocated_kbytes
/ heap_mapped_kbytes
ratio (prior to 6.0) (also provided under heap_efficiency_pct
) provide a picture of the fragmentation of the heap. This is for all memory usage except for the shared memory parts (for the primary index in the Enterprise Edition).
gauge
integer
aerospike_node_stats_heap_efficiency_pct
Provides an indication of the jemalloc heap fragmentation. This represents the heap_allocated_kbytes
/ heap_active_kbytes
ratio. A lower number indicates a higher fragmentation rate.
If heap_efficiency_pct
goes below 60% or 50% (depending on configuration), advise your operations group to investigate.
gauge
integer
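The heap_allocated_kbytes / heap_active_kbytes ratio can be computed directly from the two gauges. This is a minimal sketch: the function names are hypothetical, the result is left unrounded because the server's rounding is not described here, and the default threshold of 60 is one of the two example values mentioned above.

```python
def heap_efficiency_pct(allocated_kbytes: int, active_kbytes: int) -> float:
    """Return allocated / active as a percentage; lower means more fragmentation."""
    if active_kbytes <= 0:
        raise ValueError("active_kbytes must be positive")
    return 100.0 * allocated_kbytes / active_kbytes


def heap_needs_attention(efficiency_pct: float,
                         threshold_pct: float = 60.0) -> bool:
    """Flag the heap when efficiency drops below the chosen threshold."""
    return efficiency_pct < threshold_pct
```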
aerospike_node_stats_heap_mapped_kbytes
Amount of memory in mapped pages, in KiB; that is, the amount of memory that jemalloc (JEM) received from the Linux kernel. Should be a multiple of 4, because the typical page size is 4 KiB (4096 bytes).
gauge
integer
aerospike_node_stats_heap_site_count
Number of distinct sites in the server code (specific locations in server functions) that have allocated heap memory designated for tracking as governed by the debug-allocations
setting from the time when the server was started. The heap_site_count
is only nonzero when debug-allocations
is set to a value other than none. The heap_site_count
value can only increase.
counter
integer
aerospike_node_stats_heartbeat_connections
Number of active heartbeat connections to this node. Also available in the log on the fds proto ticker line.
gauge
integer
aerospike_node_stats_heartbeat_connections_closed
Number of heartbeat connections that have been closed. Also available in the log on the fds proto ticker line.
counter
integer
aerospike_node_stats_heartbeat_connections_opened
Number of heartbeat connections created to this node since the node was started. Also available in the log on the fds proto ticker line.
If heartbeat_connections_opened
is unexpectedly changing, alert as this condition would indicate a connectivity problem with a node or a cluster change.
counter
integer
aerospike_node_stats_heartbeat_received_foreign
Total number of heartbeats received from remote nodes.
counter
integer
aerospike_node_stats_heartbeat_received_self
Total number of multicast heartbeats from this node received by this node. Will be 0 for mesh.
counter
integer
aerospike_node_stats_info_complete
Number of info requests completed.
counter
integer
aerospike_node_stats_info_queue
Number of info requests pending in info queue.
gauge
integer
aerospike_node_stats_info_timeout
Tracks total timed-out info transactions. Related to info-max-ms.
counter
integer
aerospike_node_stats_long_queries_active
Number of queries currently active (formerly queries_active
or scans_active
). The long_queries_active
stat is shared by both primary index (PI) queries and secondary index (SI) queries. Only long queries are monitored.
gauge
integer
aerospike_node_stats_migrate_allowed
This indicates whether migrations are allowed or not on a node. true
when allowed, false
when not. When there is a change in a cluster, this statistic’s value changes to false until the rebalance is completed across all namespaces. The rebalance is the step that determines all partition migrations that need to be scheduled. The rebalance is not the migrations themselves but the process that precedes the partition migrations. A migrate_allowed
value of true
indicates that all migration-related statistics have been set and can be leveraged programmatically (for example, migrate_partitions_remaining
to check whether migrations are ongoing).
gauge
integer
aerospike_node_stats_migrate_partitions_remaining
This is the number of partitions remaining to migrate (in either direction). When migrate_allowed
is true
, this is the stat which will accurately determine if migrations are complete for a single node across all namespaces. There could be a short period after a reclustering event when this statistic shows 0
but the migrations have not started yet. During such time, migrate_allowed
would return false
.
gauge
integer
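The interaction between migrate_allowed and migrate_partitions_remaining can be sketched as a single check. The function name is hypothetical, and the input is assumed to be a mapping of statistic names to string values for one node.

```python
from typing import Mapping


def migrations_complete(node_stats: Mapping[str, str]) -> bool:
    """Return True only when migrations are allowed and none remain.

    A remaining count of 0 while migrate_allowed is false is not trusted,
    because migrations may not have been scheduled yet after a reclustering.
    """
    allowed = node_stats.get("migrate_allowed", "false").lower() == "true"
    remaining = int(node_stats.get("migrate_partitions_remaining", "0"))
    return allowed and remaining == 0
```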
aerospike_node_stats_objects
Total number of replicated objects on this node. Includes master and replica objects.
Trending objects
provides operations insight into object fluctuations over time.
gauge
integer
aerospike_node_stats_paxos_principal
Identifier of the node that this node believes to be the Paxos principal.
gauge
integer
aerospike_node_stats_process_cpu_pct
Percentage of CPU usage by the asd process.
gauge
integer
aerospike_node_stats_proxy_in_progress
Number of proxies in progress. Also called proxy hash. The command’s TTL (client-set timeout or transaction-max-ms
) is checked every 5 ms (Database 6.0 and later) while the command waits in the proxy hash.
gauge
integer
aerospike_node_stats_queries_active
Number of queries currently active (formerly scans_active
). The queries_active
stat is shared by both primary index (PI) queries and secondary index (SI) queries. Only long queries are monitored. Removed in Database 6.1, use long_queries_active
.
gauge
integer
aerospike_node_stats_query_bad_records
Number of false positive entries in secondary index queries.
counter
integer
aerospike_node_stats_query_long_running
Number of long running queries ever attempted in the system (queries that selected more records than query_threshold).
counter
integer
aerospike_node_stats_query_short_running
Number of short running queries ever attempted in the system (queries that selected fewer records than query_threshold).
counter
integer
aerospike_node_stats_query_tracked
Number of queries tracked by the system, that is, queries that ran longer than the query untracked_time (default 1 sec).
counter
integer
aerospike_node_stats_read_touch_error
Number of read touch errors which were not timeouts.
counter
integer
aerospike_node_stats_read_touch_skip
Number of touches abandoned upon finding that another write (including an earlier touch) has taken place or is taking place, removing the need to proceed with the touch.
counter
integer
aerospike_node_stats_read_touch_success
Number of successful read touches.
counter
integer
aerospike_node_stats_read_touch_timeout
Number of touches that ended in timeout.
counter
integer
aerospike_node_stats_read_touch_tsvc_error
Number of read touch subtransactions that failed with an error in the internal transaction queue. Does not include timeouts.
The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc
errors happen before records are accessed for reads or writes. They’re counted separately from tsvc
timeouts.
counter
integer
aerospike_node_stats_read_touch_tsvc_timeout
Number of read touches that time out early in the internal transaction queue, while waiting to be picked up by a service thread.
The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc
errors happen before records are accessed for reads or writes. They’re counted separately from tsvc
timeouts.
counter
integer
aerospike_node_stats_reaped_fds
Number of idle client connections closed.
If reaped_fds
is growing more rapidly than normal, it may indicate that clients are opening and closing sockets too rapidly, which is a potential application issue.
counter
integer
aerospike_node_stats_rw_err_dup_write_cluster_key
Removed
counter
integer
aerospike_node_stats_rw_err_dup_write_internal
Removed
counter
integer
aerospike_node_stats_rw_in_progress
Number of rw transactions in progress. Also called rw hash. This tracks transactions parked on the rw hash while processing on other nodes (all write replicas, read duplicate resolutions). The transaction’s TTL (client-set timeout or transaction-max-ms
) is checked every 5 ms in Database 6.0 and later while the transaction waits in the rw-hash.
Depends on expected workload.
If rw_in_progress
is higher than expected, or if it deviates more than is acceptable from the established baseline over time, alert operations to investigate the cause. This may indicate a slowdown on a particular node or overloading on the fabric.
gauge
integer
While a transaction is parked in the rw-hash, other transactions for the same record are queued (those queued transactions are not counted in this metric). Once a transaction completes, queued transactions for the same record are restarted, as tracked in the xxxx-restart benchmark histograms (such as write-restart). At that point, the first transaction to be processed takes the rw-hash slot and the others wait for the next round. Transactions that need to be serialized (such as writes for the same record, a read transaction in strong consistency mode while a write transaction is in progress, or any transaction requiring duplicate resolution) do not proceed until they get their slot in the rw-hash.
aerospike_node_stats_scans_active
Number of scans currently active. Removed in Database 6.0, use queries_active
.
gauge
integer
aerospike_node_stats_sindex_gc_garbage_cleaned
Sum of secondary index garbage entries cleaned by sindex GC. Moved to namespace level as sindex_gc_cleaned
in Database 5.7.
counter
integer
aerospike_node_stats_sindex_gc_garbage_found
Sum of secondary index garbage entries found by sindex GC.
counter
integer
aerospike_node_stats_sindex_gc_list_creation_time
Sum of time spent in finding secondary index garbage entries by sindex GC (millisecond).
counter
integer
aerospike_node_stats_sindex_gc_list_deletion_time
Sum of time spent in cleaning sindex garbage entries by sindex GC (millisecond).
counter
integer
aerospike_node_stats_sindex_gc_objects_validated
Number of secondary index entries processed by sindex GC.
counter
integer
aerospike_node_stats_sindex_gc_retries
Number of retries when sindex GC cannot get sprigs lock. Replaced sindex_gc_locktimedout.
counter
integer
aerospike_node_stats_sindex_ucgarbage_found
Number of un-cleanable garbage entries in the sindexes encountered through queries.
counter
integer
aerospike_node_stats_stat_cluster_key_err_ack_rw_trans_reenqueue
Number of read/write transactions re-enqueued because of a cluster key mismatch.
counter
integer
aerospike_node_stats_stat_cluster_key_partition_transaction_queue_count
Removed/unused
counter
integer
aerospike_node_stats_stat_cluster_key_prole_retry
Number of times a prole write was retried as a result of a cluster key mismatch.
counter
integer
aerospike_node_stats_stat_cluster_key_regular_processed
Number of successful transactions that passed the cluster key test.
counter
integer
aerospike_node_stats_stat_cluster_key_trans_to_proxy_retry
Number of times a proxy was redirected.
counter
integer
aerospike_node_stats_stat_cluster_key_transaction_reenqueue
Removed/unused
counter
integer
aerospike_node_stats_stat_evicted_set_objects
Number of objects evicted from a Set due to set limits defined in Aerospike configuration.
counter
integer
aerospike_node_stats_stat_single_bin_records
Removed: Number of single bin records.
counter
integer
aerospike_node_stats_stat_slow_trans_queue_batch_pop
Number of times a batch of transactions was moved from the slow queue to the fast queue.
counter
integer
aerospike_node_stats_stat_slow_trans_queue_pop
Number of transactions that were moved from the slow queue to the fast queue.
counter
integer
aerospike_node_stats_stat_slow_trans_queue_push
Number of transactions that were pushed onto the slow queue.
counter
integer
aerospike_node_stats_storage_defrag_wait
Number of times the defrag waited (called sleep).
counter
integer
aerospike_node_stats_sub_objects
Number of LDT sub objects. Aggregated over the sub_objects
stat at the namespace level.
counter
integer
aerospike_node_stats_system_free_mem_kbytes
Amount of free system memory in kilobytes. Includes buffers and caches, but not shared memory.
If system_free_mem_kbytes
is abnormally low, it could indicate that the server is approaching the limits of the available RAM. Operations should investigate and potentially add nodes or increase per-node RAM.
gauge
integer
aerospike_node_stats_system_free_mem_pct
Percentage of free system memory.
If system_free_mem_pct
is abnormally low, it could indicate that the server is approaching the limits of the available RAM. Operations should investigate and potentially add nodes or increase per-node RAM.
gauge
integer
aerospike_node_stats_system_kernel_cpu_pct
Percentage of CPU usage by processes running in kernel mode.
gauge
integer
aerospike_node_stats_system_thp_mem_kbytes
Amount of memory in use by the Transparent Huge Page mechanism, in kilobytes.
gauge
integer
aerospike_node_stats_system_total_cpu_pct
Percentage of CPU usage by all running processes. Equal to system_user_cpu_pct
+ system_kernel_cpu_pct
.
gauge
integer
aerospike_node_stats_system_user_cpu_pct
Percentage of CPU usage by processes running in user mode.
gauge
integer
aerospike_node_stats_threads_detached
Number of detached server threads currently running.
gauge
integer
aerospike_node_stats_threads_joinable
Number of joinable server threads currently running.
gauge
integer
aerospike_node_stats_threads_pool_active
Number of currently active threads in the server thread pool.
gauge
integer
aerospike_node_stats_threads_pool_total
Total number of threads in the server thread pool.
gauge
integer
aerospike_node_stats_time_since_rebalance
Number of seconds since the last reclustering event, either triggered by the recluster
info command or by a cluster disruption (such as a node being added or removed, or a network disruption).
gauge
integer
aerospike_node_stats_tree_gc_queue
This is the number of trees queued up, ready to be completely removed (partitions drop). Corresponds to the tree-gc-q
entry in the log ticker.
gauge
integer
aerospike_node_stats_tscan_aborted
Number of scans that were aborted. Removed as of 3.6.0.
counter
integer
aerospike_node_stats_tscan_initiate
Number of new scan requests initiated. Removed as of 3.6.0.
counter
integer
aerospike_node_stats_tscan_pending
Number of scan requests pending. Removed as of 3.6.0.
gauge
integer
aerospike_node_stats_tscan_succeeded
Number of scan requests that have successfully finished. Removed as of 3.6.0.
counter
integer
aerospike_node_stats_uptime
Time in seconds since last server restart.
If uptime
is below 300 and the cluster is not undergoing maintenance, this node restarted within the last 5 minutes. Advise operations to investigate.
gauge
integer
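The uptime rule above can be sketched as a small predicate. The function name and the maintenance flag are hypothetical; the 300-second window is the value given in the description.

```python
def restarted_recently(uptime_sec: int,
                       in_maintenance: bool,
                       window_sec: int = 300) -> bool:
    """Return True when a node restarted within the window outside of maintenance."""
    return not in_maintenance and uptime_sec < window_sec
```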
Sets
aerospike_sets_device_data_bytes
Device storage used by this set in bytes, for the data part (does not include index part). Value will be 0 if data is not stored on device. For size used in memory, see memory_data_bytes
.
gauge
integer
aerospike_sets_memory_data_bytes
Memory used by this set in bytes, for the data part (does not include index part). Value will be 0 if data is not stored in memory. For size used on disk, see device_data_bytes
(available in Database 5.2 and later), or the set level object size histogram.
gauge
integer
aerospike_sets_ns
Namespace name this set belongs to.
gauge
integer
aerospike_sets_objects
Total number of objects (master and all replicas) in this set on this node. This is updated in real time and is not dependent on the nsup-period
or nsup-hist-period
configurations.
gauge
integer
aerospike_sets_set
Name of this set.
gauge
integer
aerospike_sets_tombstones
Total number of tombstones (master and all replicas) in this set on this node.
gauge
integer
aerospike_sets_truncate_lut
The most covering truncate_lut for this set. See truncate or truncate-namespace.
gauge
integer
Sindex
aerospike_sindex_delete_error
Number of errors while processing a delete transaction for this secondary index.
counter
integer
aerospike_sindex_delete_success
Number of successful delete transactions processed for this secondary index.
counter
integer
aerospike_sindex_entries
Number of secondary index entries for this secondary index. This is the number of records that have been indexed by this secondary index.
gauge
integer
aerospike_sindex_ibtr_memory_used
Amount of memory, in bytes, the secondary index is consuming for the keys, as opposed to nbtr_memory_used
which is the amount of memory the secondary index is consuming for the entries. The total being reported by si_accounted_memory
.
gauge
integer
aerospike_sindex_keys
Number of secondary keys for this secondary index.
gauge
integer
aerospike_sindex_load_pct
Progress in percentage of the creation of secondary index.
gauge
integer
aerospike_sindex_load_time
Time it took for the secondary index to be fully created.
gauge
integer
aerospike_sindex_loadtime
Time it took for the secondary index to be fully created.
gauge
integer
aerospike_sindex_memory_used
Amount of memory, in bytes, consumed by the secondary index. Renamed to used_bytes
in Database 6.3. Do not use memory_used
in Database 6.3 and later.
gauge
integer
aerospike_sindex_nbtr_memory_used
Amount of memory, in bytes, the secondary index is consuming for the entries, as opposed to ibtr_memory_used
which is the amount of memory the secondary index is consuming for the keys. The total being reported by si_accounted_memory
.
gauge
integer
aerospike_sindex_query_agg
Number of query aggregations attempted for this secondary index on this node.
counter
integer
aerospike_sindex_query_agg_avg_rec_count
Average number of records returned by the aggregations underlying queries against this secondary index.
gauge
integer
aerospike_sindex_query_agg_avg_record_size
Average size of the records returned by the aggregations underlying queries against this secondary index.
gauge
integer
aerospike_sindex_query_avg_rec_count
Average number of records returned by all the queries against this secondary index (combines query_agg_avg_rec_count
and query_lookup_avg_rec_count
).
gauge
integer
aerospike_sindex_query_avg_record_size
Average size of the records returned by all the queries against this secondary index (combines query_agg_avg_record_size
and query_lookup_avg_record_size
).
gauge
integer
aerospike_sindex_query_basic_abort
Number of basic queries aborted for this secondary index. Removed in Database 6.0, use si_query_long_basic_abort
.
counter
integer
aerospike_sindex_query_basic_avg_rec_count
Average number of records returned by the lookup queries against this secondary index.
gauge
integer
aerospike_sindex_query_basic_complete
Number of basic queries completed for this secondary index. Removed in Database 6.0, use si_query_long_basic_complete
.
counter
integer
aerospike_sindex_query_basic_error
Number of basic queries that returned error for this secondary index. Removed in Database 6.0, use si_query_long_basic_error
.
counter
integer
aerospike_sindex_query_lookup_avg_rec_count
Average number of records returned by the lookup queries against this secondary index. Renamed to query_basic_avg_rec_count
in Database 5.7.
gauge
integer
aerospike_sindex_query_lookup_avg_record_size
Average size of the records returned by the lookup queries against this secondary index.
gauge
integer
aerospike_sindex_query_lookups
Number of lookup queries ever attempted for this secondary index on this node. Removed in Database 5.7. Use query_basic_complete
+ query_basic_error
+ query_basic_abort
instead.
counter
integer
aerospike_sindex_query_reqs
Number of query requests ever attempted for this secondary index on this node (combines query_lookups
and query_agg
).
counter
integer
aerospike_sindex_si_accounted_memory
Amount of memory, in bytes, the secondary index is consuming. This is the sum of ibtr_memory_used
and nbtr_memory_used
. Removed in Database 5.7.
gauge
integer
aerospike_sindex_si_query_long_basic_abort
Number of basic long secondary index queries aborted for this secondary index.
counter
integer
aerospike_sindex_si_query_long_basic_complete
Number of basic long secondary index queries completed for this secondary index.
counter
integer
aerospike_sindex_si_query_long_basic_error
Number of basic long secondary index queries that returned error for this secondary index.
counter
integer
aerospike_sindex_si_query_short_basic_complete
Number of basic short secondary index queries completed for this secondary index.
counter
integer
aerospike_sindex_si_query_short_basic_error
Number of basic short secondary index queries that returned error for this secondary index.
counter
integer
aerospike_sindex_si_query_short_basic_timeout
Short queries are not monitored, so they cannot be aborted. They might time out, which is reflected in this statistic.
counter
integer
aerospike_sindex_stat_gc_recs
Number of records that have been garbage collected out of the secondary index memory. See sindex-gc-period
and sindex-gc-max-rate
configuration parameters for tuning the secondary index garbage collection.
counter
integer
aerospike_sindex_stat_gc_time
Amount of time spent processing garbage collection for the secondary index. See sindex-gc-period
and sindex-gc-max-rate
configuration parameters for tuning the secondary index garbage collection.
counter
integer
aerospike_sindex_used_bytes
Amount of memory, in bytes, consumed by the secondary index.
NOTE: Renamed from memory_used
in Database 6.3.
gauge
integer
aerospike_sindex_write_error
Number of errors while processing a write transaction for this secondary index.
counter
integer
Users
aerospike_users_conns_in_use
Number of client connections for a given user.
To see metrics from asadm
use the command:
show users statistics
If you are using the Aerospike Prometheus Exporter these metrics are shown in the Users View.
gauge
integer
When security is enabled, per node user metrics are available from the security protocol.
aerospike_users_limitless_read_scan_query
Limitless read query requests per second for a given user.
To see metrics from asadm
use the command:
show users statistics
If you are using the Aerospike Prometheus Exporter these metrics are shown in the Users View.
moving average
When security is enabled and enable-quotas
is true, per node user metrics are available from the security protocol. For more information, see Enable access control.
aerospike_users_limitless_write_scan_query
Limitless write query requests per second for a given user.
To see metrics from asadm
use the command:
show users statistics
If you are using the Aerospike Prometheus Exporter these metrics are shown in the Users View.
moving average
integer
When security is enabled and enable-quotas
is true, per node user metrics are available from the security protocol. For more information, see Enable access control.
aerospike_users_read_scan_query_rps
Read query requests per second for a given user.
To see metrics from asadm
use the command:
show users statistics
If you are using the Aerospike Prometheus Exporter these metrics are shown in the Users View.
gauge
integer
When security is enabled and enable-quotas
is true, per node user metrics are available from the security protocol. See Enable access control for more information about these metrics.
aerospike_users_read_single_record_tps
Read transactions per second for a given user.
To see metrics from asadm
use the command:
show users statistics
If you are using the Aerospike Prometheus Exporter these metrics are shown in the Users View.
moving average
integer
When security is enabled and enable-quotas
is true, per node user metrics are available from the security protocol. For more information, see Enable access control.
aerospike_users_write_scan_query_rps
Write query requests per second for a given user.
To see metrics from asadm
use the command:
show users statistics
If you are using the Aerospike Prometheus Exporter these metrics are shown in the Users View.
moving average
integer
When security is enabled and enable-quotas
is true, per node user metrics are available from the security protocol. For more information, see Enable access control.
aerospike_users_write_single_record_tps
Write transactions per second for a given user.
To see metrics from asadm
use the command:
show users statistics
If you are using the Aerospike Prometheus Exporter these metrics are shown in the Users View.
moving average
integer
When security is enabled and enable-quotas
is true, per node user metrics are available from the security protocol. For more information, see Enable access control.
Xdr
aerospike_xdr_abandoned
Number of records abandoned because of permanent failure at the destination. The destination configuration must be changed for these records to be successfully shipped.
If abandoned
is consistently higher than expected, alert operations to investigate.
counter
integer
aerospike_xdr_active_failed_node_sessions
Number of active failed node sessions pending. A failed node session keeps track of nodes at the local cluster that have left the cluster and need other nodes to ship on their behalf until they rejoin.
gauge
integer
aerospike_xdr_active_link_down_sessions
Number of active link down sessions pending. A link down session keeps track of destination clusters that are not reachable for a given time window.
gauge
integer
aerospike_xdr_bytes_shipped
Number of bytes shipped for a namespace to a DC by XDR.
Use the asinfo
command get-stats
to report these metrics.
counter
decimal
aerospike_xdr_compression_ratio
Running average compression ratio. Example: asinfo -h localhost -l -v get-stats:context=xdr;dc=aerospike_b;namespace=test
moving average
decimal
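The asinfo example above returns its statistics as text. As a minimal sketch, assuming the response is a single line of semicolon-separated name=value pairs, the helper below turns it into a dictionary. The function name parse_info_stats and the sample string are illustrative and are not real server output.

```python
def parse_info_stats(response: str) -> dict:
    """Split a 'name=value;name=value' string into a dictionary of strings."""
    stats = {}
    for pair in response.strip().split(";"):
        if not pair:
            continue  # skip empty segments, e.g. from a trailing ';'
        name, _, value = pair.partition("=")
        stats[name] = value
    return stats


# Hypothetical sample, shaped like the assumed response format.
sample = "lag=0;in_queue=12;compression_ratio=1.000"
stats = parse_info_stats(sample)
print(stats["in_queue"])
```

Values are kept as strings so that the caller can decide whether a metric is an integer or a decimal.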
aerospike_xdr_dc_as_open_conn
Number of open connections to the Aerospike DC. If the DC accepts pipeline writes, there will be 64 connections per destination node. Replaced dc_open_conn
starting with Database 4.4.
gauge
integer
aerospike_xdr_dc_as_size
The cluster size of the destination Aerospike DC. Replaced by dc_size
starting with Database 4.4.
gauge
integer
aerospike_xdr_dc_http_good_locations
Number of URLs that are considered healthy and being used by the change notification system. Part of the change notification.
gauge
integer
aerospike_xdr_dc_http_locations
Number of URLs configured for the HTTP destination. Part of the change notification.
gauge
integer
aerospike_xdr_dc_ship_attempt
Number of records that XDR has attempted to ship, whether the attempt resulted in success or error. See dc_ship_success
for successfully shipped records.
counter
integer
aerospike_xdr_dc_ship_bytes
Number of bytes shipped for this DC.
counter
integer
aerospike_xdr_dc_ship_delete_success
Number of delete transactions that have been successfully shipped. This is the per DC statistic for xdr_ship_delete_success
.
counter
integer
aerospike_xdr_dc_ship_destination_error
Number of errors from the remote cluster(s) while shipping records for this DC. Errors include out-of-space, key-busy, etc. This is the per DC statistic for xdr_ship_destination_error
.
counter
integer
aerospike_xdr_dc_ship_idle_avg
Average number of ms of sleep for each record being shipped. 0.000 if there is no throttling. Throttling will occur if the set throughput limit (xdr-max-ship-throughput
) has been reached or in case of unexpected slowdown at the destination cluster. This is part of the rsas
entry in the logs (xdr context).
gauge
integer
aerospike_xdr_dc_ship_idle_avg_pct
Representation in percent of total time spent for dc_ship_idle_avg
. This is part of the rsas
entry in the logs (xdr context).
gauge
integer
aerospike_xdr_dc_ship_inflight_objects
Number of records that are inflight (which have been shipped but for which a response from the remote DC has not yet been received).
gauge
integer
aerospike_xdr_dc_ship_latency_avg
Moving average of shipping latency for the specific DC.
moving average
integer
aerospike_xdr_dc_ship_source_error
Number of client layer errors while shipping records for this DC. Errors include timeout, bad network fd, etc. This is the per DC statistic for xdr_ship_source_error
.
counter
integer
aerospike_xdr_dc_ship_success
Number of records that have been successfully shipped. This is the per DC statistic for xdr_ship_success
.
counter
integer
aerospike_xdr_dc_state
State of the DC. Here are the different statuses: CLUSTER_INACTIVE
, CLUSTER_UP
, CLUSTER_DOWN
, CLUSTER_WINDOW_SHIP
.
- The CLUSTER_INACTIVE
state is for a DC that has not been seeded (configured) in the XDR stanza and would be a placeholder for a future dynamic seeding.
- The CLUSTER_UP
state is the normal state for a DC that is able to receive records from an XDR client and is currently not having any records being shipped to it from a previous window where it was down (which would be the CLUSTER_WINDOW_SHIP
state).
- A cluster will be in CLUSTER_DOWN
when the source (XDR client) cannot connect to it for over 30 seconds. This prevents the entries in the digestlog from being reclaimed. The XDR client will periodically try to reconnect and, upon succeeding, will spawn a window shipper to ‘catch up’ on the entries in the digestlog that were missed. The DC specific lag (dc_timelag) will increase in such state but will not be accounted for in the overall XDR timelag (xdr_timelag).
- A cluster’s state switches to CLUSTER_WINDOW_SHIP
when it can be re-connected to after being in CLUSTER_DOWN
state. The DC specific lag (dc_timelag) will be accounted for in the overall XDR timelag (xdr_timelag).
gauge
string
aerospike_xdr_dc_timelag
Time lag for this specific DC. See xdr_timelag for details of how this is calculated.
If dc_timelag
is consistently greater than a few seconds, it may indicate network connectivity issues or errors writing at a destination cluster.
gauge
integer
aerospike_xdr_dlog_free_pct
Percentage of the digest log free and available for use.
gauge
integer
aerospike_xdr_dlog_logged
Number of records logged into digest log.
Trending stat_recs_logged
allows operations insight into how many records are being enqueued for shipment over time.
counter
integer
aerospike_xdr_dlog_overwritten_error
Number of digest log entries that got overwritten.
counter
integer
aerospike_xdr_dlog_processed_link_down
Number of linkdown that were processed.
counter
integer
aerospike_xdr_dlog_processed_main
Number of records processed on the local Aerospike server.
counter
integer
aerospike_xdr_dlog_processed_replica
Number of records processed for a node in the cluster that is not the local node.
counter
integer
aerospike_xdr_dlog_relogged
Number of records relogged by this node into the digest log due to temporary issues when attempting to ship. A relogged digest log entry would be caused by one of three potential conditions:
- An issue with the local client when attempting to ship (tracked by xdr_ship_source_error
).
- An issue with the network or the destination cluster itself (tracked by xdr_ship_destination_error
).
- An issue when reading the record on the local node (tracked by xdr_read_error
), but those would actually end up relogged on the node now owning the record (see relogged_outgoing
).
counter
integer
The XDR component typically processes only master record’s digest log entries on a given node (the exception being during failed node processing, when a node on the source cluster has failed). When relogging such master record’s dlog entry, the corresponding prole copy would also be relogged on the respective node holding the replicas. This would increment the relogged_outgoing
statistic on the current node and the relogged_incoming
on the receiving node. It is therefore expected to see the dlog_relogged
and relogged_outgoing
statistics matching for clusters that are stable (no migrations).
The relogs happening due to master partition ownership changes (migrations) are also tracked through relogged_incoming
and relogged_outgoing
.
Permanent errors will not be relogged but will have a WARNING log message at the destination cluster (for example, to name a few, invalid namespace, record too big if mismatched write-block-size between source and destination, authentication or permission error).
Some Permanent Errors: AEROSPIKE_ERR_RECORD_TOO_BIG, AEROSPIKE_ERR_REQUEST_INVALID, AEROSPIKE_ERR_ALWAYS_FORBIDDEN.
Some Transient Errors: AEROSPIKE_ERR_SERVER, AEROSPIKE_ERR_CLUSTER_CHANGE, AEROSPIKE_ERR_SERVER_FULL, AEROSPIKE_ERR_CLUSTER, AEROSPIKE_ERR_RECORD_BUSY, AEROSPIKE_ERR_DEVICE_OVERLOAD, AEROSPIKE_ERR_FAIL_FORBIDDEN.
See the C client errors for the exhaustive list.
aerospike_xdr_dlog_used_objects
Total number of record slots used in the digest log.
gauge
integer
aerospike_xdr_filtered_out
Number of local records that are skipped after having been read but before actual shipment. Such records might be skipped because of the configured shipping rules. For example, if the rules exclude all bins of a record, the record is skipped.
This counter does not include records not submitted to the XDR queue, such as a record that is not eligible for shipping because its set is disabled.
counter
integer
aerospike_xdr_global_lastshiptime
Minimum last ship time in milliseconds (epoch) for XDR across the cluster. Specifies up to what point slots in the digest log can be reclaimed, by tracking the oldest last ship time across all nodes in the cluster.
gauge
integer
aerospike_xdr_hot_keys
Number of times a record write is skipped from processing because that record is already pending processing. This value also includes the number of records skipped for replica partitions.
counter
integer
aerospike_xdr_hotkey_fetch
If there are hot keys in the system (same record updated quite frequently), XDR optimizes by not shipping all the updates. This stat represents the number of record digests that are actually shipped because their cache entries expired and were dirty. Interpret in conjunction with xdr_hotkey_skip
. The timeout of the cache entries is controlled by xdr-hotkey-time-ms.
counter
integer
aerospike_xdr_hotkey_skip
Replaces noship_recs_dup_intrabatch
and noship_recs_genmismatch
. If there are hot keys in the system (same record updated quite frequently), XDR optimizes by not shipping all the updates. This stat represents the number of record digests that are skipped due to an already existing entry in the reader’s thread cache (meaning a version of this record was just shipped). Interpret in conjunction with xdr_hotkey_fetch
. The timeout of the cache entries is controlled by xdr-hotkey-time-ms.
counter
integer
aerospike_xdr_in_progress
Number of records that are pending completion. Records can be in different stages like local read, network send, pending acknowledgment. If a record is being retried (see retry_conn_reset
, retry_dest
, and retry_no_node
), it is not considered complete and repeats the cycle.
gauge
integer
aerospike_xdr_in_queue
Number of records in the in-memory transaction queue still to be processed. These are the records that have been written into the XDR transaction queue but have not yet been picked up for further processing by XDR.
gauge
integer
aerospike_xdr_lag
Lag in seconds between the destination and the source datacenters. This indicates how far behind the source is in terms of shipping records or, in other words, how long records have been waiting at the source before being shipped to that DC.
In more detail:
The lag is the difference between the last update time of the records being shipped (called ‘last ship time’ or LST) and the current time. The LST is internally maintained per partition and aggregated at the namespace level (minimum across all partitions). The lag can seem unsettled (step function) while recoveries are in progress (see the recoveries_pending
statistic). This is because the recovery for a partition can take a while and the LST is updated only on completion of a recovery pass (as opposed to per record). A recovery pass is considered complete only after the batch of records for a given partition is completely and successfully shipped (no elements left in the retry queue).
If lag
is consistently greater than a few seconds, this condition might indicate network connectivity issues or errors writing at a destination cluster.
gauge
integer
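The lag calculation described above can be sketched as follows. This assumes that LST values are epoch timestamps in milliseconds, one per partition. The function name and the sample numbers are hypothetical, and this is not the server's implementation.

```python
def namespace_lag_seconds(partition_lst_ms, now_ms):
    """Lag = current time minus the minimum LST across partitions, in whole seconds."""
    # Namespace-level LST is the minimum across all partitions.
    namespace_lst_ms = min(partition_lst_ms)
    # Clamp at zero in case a clock reading is slightly ahead of 'now'.
    return max(0, (now_ms - namespace_lst_ms) // 1000)


# Three partitions; the slowest one last shipped at t = 1,000 ms.
print(namespace_lag_seconds([1_000, 5_000, 9_000], now_ms=10_000))  # prints 9
```

Because the minimum is used, a single partition that is waiting on a recovery pass holds the reported lag back until that pass completes, which matches the step-function behavior described above.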
aerospike_xdr_lap_us
Time in microseconds (μsecs) taken to process records across partitions in one lap (processing cycle). This is diagnostic information. A higher number indicates slowness of source in processing the records.
Available only at the dc
level, not namespace
level. Example: asinfo -h localhost -l -v get-stats:context=xdr;dc=aerospike_b
If lap_us
is consistently higher than expected, alert operations to investigate.
gauge
integer
aerospike_xdr_latency_ms
Average network latency for successfully shipped records. This value does not include timed-out shipment attempts or any other errors. Updated every log ticker interval (10 seconds by default).
Available only at the dc
level, not namespace
level. Example: asinfo -h localhost -l -v get-stats:context=xdr;dc=aerospike_b
Depending on configuration, latency_ms
should be within the latency of the link between the DCs.
If latency_ms
increases beyond the expectations based on the distance (or known link latency) between clusters, alert operations to investigate.
gauge
moving average
aerospike_xdr_local_recs_migration_retry
Number of records missing in a batch call, generally a result of migrations, but can also be caused by expiration and eviction.
counter
integer
aerospike_xdr_nodes
Number of nodes in the destination DC as seen by XDR. There may be some delay for the remote changes to be reflected in this stat, especially on node departure, as XDR gives some grace period before removing a node.
gauge
integer
aerospike_xdr_not_found
Number of local records not found by XDR when attempting to read them. Such records might have been expired, evicted, or deleted.
counter
integer
aerospike_xdr_queue_overflow_error
Number of XDR queue overflow errors. Typically happens when there is no physical space available on the storage holding the digest log, or if the writes are happening at such a rate that elements are not written fast enough to the digest log. The number of entries this queue can hold is 1 million.
counter
integer
aerospike_xdr_read_active_avg_pct
This statistic reflects how busy the XDR read threads are by calculating the average time, in percent of total time, that the XDR read threads spend actually processing something vs. waiting for a new digest log entry to arrive on their queues from the dlogreader / failed node shippers / window shippers.
moving average
integer
aerospike_xdr_read_error
Number of read requests initiated by XDR that failed. Those are rare, but if present, would typically be caused by reservation failures (node lost master and/or prole ownership of the partition the record belonged to during migrations). This will cause the record’s digest log entry to be relogged to the node now owning the partition (tracked under relogged_outgoing
). Other rare cases would be for example when running out of memory or failure to access the storage layer. For the total number of XDR initiated read requests, sum up the xdr_read_success
, xdr_read_notfound
and xdr_read_error
statistics.
counter
integer
aerospike_xdr_read_idle_avg_pct
This is a sister statistic to xdr_read_active_avg_pct
and represents the average time, in percent of total time, that the XDR read threads wait for a new digest log entry to arrive on their queues from the dlogreader / failed node shippers / window shippers.
moving average
integer
aerospike_xdr_read_latency_avg
Moving average latency in milliseconds for XDR to read a record.
moving average
integer
aerospike_xdr_read_notfound
Number of read requests initiated by XDR that were not found. These do not get relogged. This would typically happen if a record is updated and then deleted, but a lag caused the entry for the record update to be processed after the record has been deleted. For the total number of XDR initiated read requests, sum the xdr_read_success
, xdr_read_notfound
and xdr_read_error
statistics.
counter
integer
aerospike_xdr_read_reqq_used
How many digest log entries are currently in the XDR read threads queues. Each XDR read thread has an in-memory queue with a capacity of 1,000 log entries associated with it. See also related statistic xdr_read_reqq_used_pct
. When the dlogreader / failed node shipper / window shipper cannot write to a queue, because the queue is full, it blocks, until there’s space in the queue again.
gauge
integer
aerospike_xdr_read_reqq_used_pct
Sister statistic to xdr_read_reqq_used
to represent how full in percent the XDR read request queues are.
gauge
integer
aerospike_xdr_read_respq_used
How many entries are being used in the XDR read response queues. Those queues are used to hand back records after they have been locally fetched. Those queues are similar to the queues referred to in the xdr_read_reqq_used
stat except for the fact that they are not bounded. The throttling would happen at the XDR read request queues.
gauge
integer
aerospike_xdr_read_success
Number of read requests initiated by XDR that succeeded. For the total number of XDR initiated read requests, sum up the xdr_read_success
, xdr_read_notfound
and xdr_read_error
statistics.
counter
integer
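The total described above is a plain sum of the three read counters. A minimal sketch, assuming the statistics have already been collected into a dictionary keyed by the names used in these descriptions (the function name and the sample values are hypothetical):

```python
READ_STATS = ("xdr_read_success", "xdr_read_notfound", "xdr_read_error")


def total_xdr_reads(stats):
    """Sum the three read counters; a missing counter is treated as 0."""
    return sum(int(stats.get(name, 0)) for name in READ_STATS)


# Values may arrive as strings or integers, so each is passed through int().
sample = {"xdr_read_success": "90", "xdr_read_notfound": 7, "xdr_read_error": 3}
print(total_xdr_reads(sample))  # prints 100
```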
aerospike_xdr_read_txnq_used
Number of XDR read commands that are in flight in the local transaction queue. XDR limits to 10,000 the number of outstanding XDR read requests. The requests are placed in an internal transaction queue. See xdr_read_txnq_used_pct
for the percent used in this queue.
gauge
integer
aerospike_xdr_read_txnq_used_pct
Percent used of the XDR read commands that are in flight (out of a maximum allowed of 10,000) in the transaction queue. It is an internal transaction queue. See xdr_read_txnq_used
for the number of XDR issued reads that are in flight.
gauge
integer
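The relationship between xdr_read_txnq_used and xdr_read_txnq_used_pct follows from the 10,000 limit stated above. A minimal sketch of that arithmetic; the function name is hypothetical, and how the server rounds the reported integer is not specified here, so the result is left as a float:

```python
XDR_READ_TXNQ_LIMIT = 10_000  # maximum outstanding XDR read requests, per the description


def read_txnq_used_pct(read_txnq_used):
    """Percent of the in-flight read limit currently in use."""
    return 100.0 * read_txnq_used / XDR_READ_TXNQ_LIMIT


print(read_txnq_used_pct(2_500))  # prints 25.0
```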
aerospike_xdr_recoveries
Number of partitions that are recovered by reducing the primary index of that partition. Recovery is done when the in-memory transaction queue of the partition is full or when necessary records are not present in the in-memory transaction queue.
See also recoveries_pending.
If recoveries
is consistently increasing, alert operations to investigate.
counter
integer
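What counts as "consistently increasing" is left to the operator. One possible reading, offered only as an assumption, is that every sample in a recent window is strictly higher than the one before it. The function name, the window contents, and the minimum sample count below are all hypothetical:

```python
def is_consistently_increasing(samples, min_samples=3):
    """True if there are enough samples and each one is strictly above the previous."""
    if len(samples) < min_samples:
        return False  # too little history to call it a trend
    return all(later > earlier for earlier, later in zip(samples, samples[1:]))


# Counter values scraped at regular intervals (illustrative numbers).
print(is_consistently_increasing([10, 12, 15, 21]))  # prints True
print(is_consistently_increasing([10, 10, 10, 11]))  # prints False
```

A single jump followed by a flat line returns False under this definition, which keeps a one-off recovery burst from being treated as a trend.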
aerospike_xdr_recoveries_pending
Number of recoveries currently pending.
If recoveries_pending
is zero, there are no recoveries in progress. Non-zero indicates the number of recoveries in progress.
If recoveries_pending
is unexpectedly increasing, alert operations to investigate.
gauge
integer
aerospike_xdr_relogged_incoming
Number of records relogged into this node’s digest log by another node. This typically happens during the following situations:
- migrations at the source cluster, when there are outstanding digest log entries and the partition ownership changes by the time they are processed. If the local node does not own the master or prole copy of the partition such a record belongs to, the node now owning the master copy of the partition gets an incoming digest log entry relogged to it.
- when a node relogs record’s digest log entries to itself (dlog_relogged
), it will also relog those for the node owning the prole counterpart.
counter
integer
The sending node will then have its relogged_outgoing
statistic incremented.
aerospike_xdr_relogged_outgoing
Number of records relogged to another node’s digest log. This typically happens during the following situations:
- migrations at the source cluster, when there are outstanding digest log entries for which the local node does not own either master or prole partition for the record anymore (xdr_read_error
)
- when a node relogs record’s digest log entries to itself (dlog_relogged
), it will also relog those for the node owning the prole counterpart.
counter
integer
The receiving node will then have its relogged_incoming
statistic incremented.
aerospike_xdr_retry_conn_reset
Number of records whose shipment is retried due to a reset of the connection to the remote datacenter. A connection can be reset due to timeouts (10s), network problems, or destination node restarts.
This statistic can increase in bursts. Because of the XDR pipeline, there can be many records that are retried when a connection is reset.
If retry_conn_reset
is consistently higher than expected, alert operations to investigate.
counter
integer
aerospike_xdr_retry_dest
Number of records retried due to a temporary error returned by destination node. The destination node has responded with a specific error code; therefore, such errors are not related to the network. Such errors include key busy and device overload.
If retry_dest
is consistently higher than expected, alert operations to investigate.
counter
integer
aerospike_xdr_retry_no_node
Number of records retried because XDR cannot determine which destination node is the master.
This typically happens when XDR does not discover the full cluster of the destination, perhaps due to firewall settings. In such a case, the master for all partitions cannot be known. The other possibility is that the entire namespace is not present on the destination cluster.
If retry_no_node
is consistently higher than expected, alert operations to investigate.
counter
integer
aerospike_xdr_ship_bytes
Estimated number of bytes XDR has shipped to remote clusters.
counter
integer
aerospike_xdr_ship_compression_avg_pct
Used to determine how beneficial compression is (higher is better).
moving average
integer
aerospike_xdr_ship_delete_success
Number of delete operations that were successfully shipped.
aerospike_xdr_ship_destination_error
Number of errors from the remote cluster(s) while shipping records. Errors include timeout, out-of-space, key-busy, etc. Those would be typically relogged, except in case of permanent error (tracked under xdr_ship_destination_permanent_error
— for example records too big or some bad namespace configuration), in which case they trigger a WARNING log message at the destination. For the total number of records XDR attempted to ship, sum up xdr_ship_success
, xdr_ship_source_error
and xdr_ship_destination_error
. Those do not count errors while attempting to read the record locally, but only errors after a record to be shipped has been passed to XDR’s underlying C client. For errors reading records locally, see xdr_read_error
.
counter
integer
aerospike_xdr_ship_destination_permanent_error
Number of permanent errors from the remote cluster(s) while shipping records. Example errors include records too big or some bad namespace configuration, in which case they trigger a WARNING log message at the destination and will not be relogged. These do not count errors while attempting to read the record locally, but only errors after a record to be shipped has been passed to XDR’s underlying C client. For errors reading records locally, see xdr_read_error
. For all errors while shipping to a destination, see xdr_ship_destination_error
.
counter
integer
aerospike_xdr_ship_fullrecord
Number of records that did not take advantage of bin level shipping (see xdr-ship-bins).
gauge
integer
aerospike_xdr_ship_inflight_objects
Number of objects that are inflight (which have been shipped but for which a response from the remote DC has not yet been received).
gauge
integer
aerospike_xdr_ship_latency_avg
Moving average latency in milliseconds to ship a record to remote Aerospike clusters. This is computed by dividing time into 1 second intervals.
Depending on configuration, xdr_ship_latency_avg
should be within the latency of the link between the DCs.
If xdr_ship_latency_avg
increases beyond the expectations based on the distance (or known link latency) between clusters, alert operations to investigate.
gauge
integer
The average is calculated over each 1 second interval separately and then thrown into the exponential moving average. The exponential moving average is actually a moving average of independent 1-second averages. This is done to avoid having some time intervals where there is a much higher volume of transactions having a heavier weight compared to time intervals with much fewer transactions.
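The two-stage averaging described above can be sketched as follows: latencies are first averaged within each 1-second interval, and only those per-second averages feed the exponential moving average. The smoothing factor alpha is an assumption (no value is given here), and the function name and sample data are hypothetical.

```python
from collections import defaultdict


def ema_of_second_averages(samples, alpha=0.5):
    """samples: iterable of (timestamp_seconds, latency_ms) pairs.

    Returns the exponential moving average of per-second average latencies,
    or None if there are no samples.
    """
    buckets = defaultdict(list)
    for timestamp, latency_ms in samples:
        buckets[int(timestamp)].append(latency_ms)  # group by 1-second interval

    ema = None
    for second in sorted(buckets):
        second_avg = sum(buckets[second]) / len(buckets[second])
        # Each second contributes one value, however many records it carried.
        ema = second_avg if ema is None else alpha * second_avg + (1 - alpha) * ema
    return ema


# A busy second with three fast records, then a quiet second with one slow record.
samples = [(0.1, 10), (0.5, 10), (0.9, 10), (1.2, 100)]
print(ema_of_second_averages(samples))  # prints 55.0
```

A plain average over all four records would give 32.5, dominated by the busy second. Averaging per second first gives each interval the same weight, which is the effect the description above aims for.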
aerospike_xdr_ship_outstanding_objects
Number of outstanding records not yet processed. This only applies to the main thread and will not account for digest log entries pending window shipper or failed node processing. It represents the difference between the write pointer position and the read pointer position. It also does not account for entries pending in the queue prior to being flushed to the digest log, which can go up to 100 entries or 500ms if not full by that time (configurable through xdr-digestlog-iowait-ms
).
Trending xdr_ship_outstanding_objects
allows operations insight into how the XDR record transmit queue size changes over time.
gauge
integer
aerospike_xdr_ship_source_error
Number of client layer errors while shipping records. Errors include connection errors, bad network fd, etc. For the total number of records XDR attempted to ship, sum up xdr_ship_success
, xdr_ship_source_error
and xdr_ship_destination_error
. Those do not count errors while attempting to read the record locally, but only errors after a record to be shipped has been passed to XDR’s underlying C client. For errors reading records locally, see xdr_read_error
.
counter
integer
aerospike_xdr_ship_success
Number of records successfully shipped to remote Aerospike clusters (across all datacenters configured, meaning one record successfully shipped to 3 different datacenters will increment this counter by 3). Includes xdr_ship_delete_success
. For the total number of records XDR attempted to ship, sum up xdr_ship_success
, xdr_ship_source_error
and xdr_ship_destination_error
. Those do not count errors while attempting to read the record locally, but only errors after a record to be shipped has been passed to XDR’s underlying C client. For errors reading records locally, see xdr_read_error
.
counter
integer
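The three ship counters named above add up to the total number of ship attempts, which also makes it possible to derive a success percentage. A minimal sketch, assuming the counters are available in a dictionary keyed by the names used in these descriptions; the function name and sample values are hypothetical:

```python
def ship_summary(stats):
    """Return (attempted, success_pct); success_pct is None when nothing was attempted."""
    success = int(stats.get("xdr_ship_success", 0))
    source_error = int(stats.get("xdr_ship_source_error", 0))
    destination_error = int(stats.get("xdr_ship_destination_error", 0))

    attempted = success + source_error + destination_error
    success_pct = 100.0 * success / attempted if attempted else None
    return attempted, success_pct


sample = {
    "xdr_ship_success": 950,
    "xdr_ship_source_error": 20,
    "xdr_ship_destination_error": 30,
}
print(ship_summary(sample))  # prints (1000, 95.0)
```

Local read failures are deliberately left out, since the description states that they are tracked separately under xdr_read_error.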
aerospike_xdr_stat_pipe_reads_diginfo
Number of digest information read from the named pipe.
counter
integer
aerospike_xdr_success
Number of records successfully shipped to remote datacenters.
If success
is consistently lower than expected, alert operations to investigate.
counter
integer
aerospike_xdr_throughput
Number of records successfully shipped per second. Updated every log ticker interval (10 secs by default).
gauge
integer
aerospike_xdr_timelag
Time in seconds between the moment the latest shipped record was first written at the source and the moment XDR attempted to ship it to the destination cluster. This is equivalent to the time its digestlog entry waited in the digestlog before being processed. Each record written at the source is timestamped as it gets written into the XDR digestlog.
[Removed in 5.0] If xdr_timelag
is consistently greater than a few seconds, this condition might indicate network connectivity issues or errors writing at a destination cluster.
The knowledge base article on FAQ - What are the causes of XDR throttling might be helpful.
gauge
integer
When having multiple destination DCs, this represents the maximum time lag across all the remote DCs that are not in the CLUSTER_INACTIVE or CLUSTER_DOWN states (see dc_state
). Under normal operations, though, the timelag for each DC that is in the CLUSTER_UP state will be the same, given that XDR ships records in lock-step. The timelag at each DC would be different when a DC is in the CLUSTER_DOWN or in the CLUSTER_WINDOW_SHIP state. This does not represent the time it will take for XDR to ‘catch up’, nor does it necessarily relate to the number of outstanding digests in the digest log still to be processed. For per DC time lag, see dc_timelag
.
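The aggregation rule above, the maximum time lag across DCs that are not in the CLUSTER_INACTIVE or CLUSTER_DOWN states, can be sketched as follows. The input shape (a dictionary of DC name to a state and dc_timelag pair), the DC names, and the fallback of 0 when no DC qualifies are assumptions made for illustration:

```python
EXCLUDED_STATES = {"CLUSTER_INACTIVE", "CLUSTER_DOWN"}


def overall_timelag(dcs):
    """dcs maps a DC name to (dc_state, dc_timelag_seconds)."""
    lags = [lag for state, lag in dcs.values() if state not in EXCLUDED_STATES]
    # Assumption: report 0 when no DC is in a state that counts.
    return max(lags, default=0)


dcs = {
    "dc_a": ("CLUSTER_UP", 2),
    "dc_b": ("CLUSTER_DOWN", 900),        # ignored: DC is down
    "dc_c": ("CLUSTER_WINDOW_SHIP", 40),  # counted: catching up after an outage
}
print(overall_timelag(dcs))  # prints 40
```

In the sample, the down DC has the largest individual lag but does not affect the overall value, while the DC in window shipping does.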
aerospike_xdr_uncompressed_pct
Running average percentage of records not compressed because they are below the compression threshold (100) or failed to be compressed at all. See also related parameter enable-compression.
moving average
decimal
aerospike_xdr_uninitialized_destination_error
Number of records in the digest log not shipped because the destination cluster has not been initialized for a DC that is configured for a namespace. This should not happen. Those errors are not counted as xdr_ship_*_error
.
counter
integer
aerospike_xdr_unknown_namespace_error
Number of records in the digest log not shipped because they belong to an unknown namespace, on the source cluster. One situation where this would happen is if a namespace is removed (or the order of namespaces is changed in the configuration) while there are some entries in the digest log not processed yet. This should not happen in most cases. Those errors are not counted as xdr_ship_*_error
.
counter
integer