Metrics Reference

See the Metrics command examples for information on usage.

Namespace

`appeals_records_exonerated`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_appeals_records_exonerated
Datadog: aerospike.server.namespace.appeals_records_exonerated

Description

Number of records that were marked replicated as a result of an appeal. Partition appeals happen in namespaces operating in strong-consistency mode when a node joining the cluster needs to validate the records it holds.

Introduced

4.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`appeals_rx_active`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_appeals_rx_active
Datadog: aerospike.server.namespace.appeals_rx_active

Description

Number of partition appeals currently being received. Partition appeals happen in namespaces operating in strong-consistency mode when a node joining the cluster needs to validate the records it holds.

Introduced

4.0.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`appeals_tx_active`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_appeals_tx_active
Datadog: aerospike.server.namespace.appeals_tx_active

Description

Number of partition appeals currently being sent. Partition appeals happen in namespaces operating in strong-consistency mode when a node joining the cluster needs to validate the records it holds.

Introduced

4.0.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`appeals_tx_remaining`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_appeals_tx_remaining
Datadog: aerospike.server.namespace.appeals_tx_remaining

Description

Number of partition appeals not yet sent. Partition appeals happen in namespaces operating in strong-consistency mode when a node joining the cluster needs to validate the records it holds. Appeals occur after a node has been cold-started. The replication state of each record is lost on cold-start, so all records must assume an unreplicated state. An appeal resolves the replication state from the partition’s acting master. Appeals matter for performance: an unreplicated record must re-replicate before it can be read, which adds latency. During a rolling cold-restart, an operator may want to wait for the appeal phase to complete after each restart to minimize the performance impact of the procedure.
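The wait between restarts described above could be scripted roughly as follows. This is a minimal sketch under stated assumptions: `read_metric` is a hypothetical callable, supplied by the operator, that returns the current `appeals_tx_remaining` value for the restarted node and namespace (for example by querying a monitoring backend). Treating a value of 0 as "appeal phase complete" is also an assumption, and the poll interval and timeout are illustrative, not recommended values.

```python
import time
from typing import Callable


def wait_for_appeals(read_metric: Callable[[], int],
                     poll_seconds: float = 5.0,
                     timeout_seconds: float = 600.0,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until appeals_tx_remaining reaches 0 or the timeout elapses.

    read_metric is a hypothetical hook returning the current gauge value.
    Returns True if the gauge reached 0, False if the timeout was hit first.
    """
    waited = 0.0
    while True:
        if read_metric() == 0:
            return True
        if waited >= timeout_seconds:
            return False
        sleep(poll_seconds)
        waited += poll_seconds
```

An operator may also want to confirm that `appeals_tx_active` has dropped to 0 before restarting the next node; the same helper can be reused with a different `read_metric` hook.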

Introduced

4.0.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`auto_revived_partitions`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_auto_revived_partitions
Datadog: aerospike.server.namespace.auto_revived_partitions

Description

Number of partitions that the auto-revive feature revived at startup.

Introduced

7.1.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, ns

`available_bin_names`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_available_bin_names

Description

Remaining number of unique bins that the user can create for this namespace.

The formula for the associated metrics is as follows:

bin_names_quota - bin_names = available_bin_names
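As a quick worked instance of the formula, the sketch below computes the remaining bin names from the other two metrics. The function name and the sample values are illustrative assumptions; the quota of 65,535 is the limit this reference cites for Database 5.0.0 and 6.0.0, and 1,200 is an invented usage figure.

```python
def available_bin_names(bin_names_quota: int, bin_names: int) -> int:
    """Apply the documented relation: quota minus bin names already in use."""
    return bin_names_quota - bin_names


# Illustrative values only: a quota of 65,535 with 1,200 bin names in use.
remaining = available_bin_names(65535, 1200)
print(remaining)
```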

Introduced

3.9.0

Removed

7.0.0

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`batch_sub_delete_error`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_batch_sub_delete_error
Datadog: aerospike.server.namespace.batch_sub_delete_error

Description

Number of batch-index delete sub-batches that failed with an error. For example, invalid set name, unavailable (if SC), failure to apply a predexp filter, key mismatch (if key was sent), device error (I/O error), key busy (duplicate resolution or if SC), or a problem during a bitwise, HLL, or CDT operation.

Introduced

6.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`batch_sub_delete_filtered_out`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_batch_sub_delete_filtered_out
Datadog: aerospike.server.namespace.batch_sub_delete_filtered_out

Description

Number of batch-index delete sub-batches that did not happen because the record was filtered out with Filter Expressions.

Introduced

6.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`batch_sub_delete_not_found`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_batch_sub_delete_not_found
Datadog: aerospike.server.namespace.batch_sub_delete_not_found

Description

Number of batch-index delete sub-batches that resulted in not found.

Introduced

6.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`batch_sub_delete_success`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_batch_sub_delete_success
Datadog: aerospike.server.namespace.batch_sub_delete_success

Description

Number of records successfully deleted by batch-index sub-batches.

Introduced

6.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`batch_sub_delete_timeout`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_batch_sub_delete_timeout
Datadog: aerospike.server.namespace.batch_sub_delete_timeout

Description

Number of batch-index delete sub-batches that timed out.

Introduced

6.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`batch_sub_lang_delete_success`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_batch_sub_lang_delete_success
Datadog: aerospike.server.namespace.batch_sub_lang_delete_success

Description

Number of successful batch-index UDF delete sub-batches.

Introduced

6.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`batch_sub_lang_error`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_batch_sub_lang_error
Datadog: aerospike.server.namespace.batch_sub_lang_error

Description

Number of language (Lua) batch-index errors for UDF sub-transactions.

Introduced

6.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`batch_sub_lang_read_success`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_batch_sub_lang_read_success
Datadog: aerospike.server.namespace.batch_sub_lang_read_success

Description

Number of successful batch-index UDF read sub-batches.

Introduced

6.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`batch_sub_lang_write_success`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_batch_sub_lang_write_success
Datadog: aerospike.server.namespace.batch_sub_lang_write_success

Description

Number of successful batch-index UDF write sub-batches.

Introduced

6.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`batch_sub_proxy_complete`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_batch_sub_proxy_complete
Datadog: aerospike.server.namespace.batch_sub_proxy_complete

Description

Number of proxied batch-index sub-batches that completed.

Introduced

3.9.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`batch_sub_proxy_error`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_batch_sub_proxy_error
Datadog: aerospike.server.namespace.batch_sub_proxy_error

Description

Number of proxied batch-index subtransactions that failed with an error.

Introduced

3.9.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`batch_sub_proxy_timeout`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_batch_sub_proxy_timeout
Datadog: aerospike.server.namespace.batch_sub_proxy_timeout

Description

Number of proxied batch-index sub-batches that timed out.

Introduced

3.9.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`batch_sub_read_error`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_batch_sub_read_error
Datadog: aerospike.server.namespace.batch_sub_read_error

Description

Number of batch-index read subtransactions that failed with an error. For example, invalid set name, unavailable (if SC), failure to apply a predexp filter, key mismatch (if key was sent), device error (I/O error), key busy (duplicate resolution or if SC), or a problem during a bitwise, HLL, or CDT operation.

Introduced

3.9.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`batch_sub_read_filtered_out`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_batch_sub_read_filtered_out
Datadog: aerospike.server.namespace.batch_sub_read_filtered_out

Description

Number of batch-index read sub-batches that were skipped because the record was filtered out with Filter Expressions.

Introduced

4.7.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`batch_sub_read_not_found`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_batch_sub_read_not_found
Datadog: aerospike.server.namespace.batch_sub_read_not_found

Description

Number of batch-index read subtransactions that resulted in not found.

Introduced

3.9.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`batch_sub_read_success`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_batch_sub_read_success
Datadog: aerospike.server.namespace.batch_sub_read_success

Description

Number of records successfully read by batch-index sub-batches.

Introduced

3.9.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`batch_sub_read_timeout`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_batch_sub_read_timeout
Datadog: aerospike.server.namespace.batch_sub_read_timeout

Description

Number of batch-index read sub-batches that timed out.

Introduced

3.9.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`batch_sub_tsvc_error`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_batch_sub_tsvc_error
Datadog: aerospike.server.namespace.batch_sub_tsvc_error

Description

Number of batch-index sub-batches that failed with an error in the transaction service, before attempting to handle the transaction. For example, protocol errors or security permission mismatches. In strong-consistency enabled namespaces, this includes transactions against unavailable_partitions and dead_partitions.

The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc errors happen before records are accessed for reads or writes, and they are counted separately from tsvc timeouts.

Introduced

3.9.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`batch_sub_tsvc_timeout`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_batch_sub_tsvc_timeout
Datadog: aerospike.server.namespace.batch_sub_tsvc_timeout

Description

Number of batch-index sub-batches that timed out in the transaction service, before attempting to handle the transaction.

Introduced

3.9.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`batch_sub_udf_complete`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_batch_sub_udf_complete
Datadog: aerospike.server.namespace.batch_sub_udf_complete

Description

Number of completed batch-index UDF sub-batches for scan/query background UDF jobs. See the following statistics for the underlying operation statuses: batch_sub_lang_delete_success, batch_sub_lang_error, batch_sub_lang_read_success, batch_sub_lang_write_success.

Introduced

6.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`batch_sub_udf_error`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_batch_sub_udf_error
Datadog: aerospike.server.namespace.batch_sub_udf_error

Description

Number of failed batch-index UDF sub-batches for scan/query background UDF jobs. Does not include timeouts. See the following statistics for the underlying operation statuses: batch_sub_lang_delete_success, batch_sub_lang_error, batch_sub_lang_read_success, batch_sub_lang_write_success.

Introduced

6.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`batch_sub_udf_filtered_out`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_batch_sub_udf_filtered_out
Datadog: aerospike.server.namespace.batch_sub_udf_filtered_out

Description

Number of batch-index UDF sub-batches that did not happen because the record was filtered out with Filter Expressions.

Introduced

6.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`batch_sub_udf_timeout`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_batch_sub_udf_timeout
Datadog: aerospike.server.namespace.batch_sub_udf_timeout

Description

Number of batch-index UDF sub-batches that timed out for scan/query background UDF jobs. See the following statistics for the underlying operation statuses: batch_sub_lang_delete_success, batch_sub_lang_error, batch_sub_lang_read_success, batch_sub_lang_write_success.

Introduced

6.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`batch_sub_write_error`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_batch_sub_write_error
Datadog: aerospike.server.namespace.batch_sub_write_error

Description

Number of batch-index write sub-batches that failed with an error. For example, invalid set name, unavailable (if SC), failure to apply a predexp filter, key mismatch (if key was sent), device error (I/O error), key busy (duplicate resolution or if SC), or a problem during a bitwise, HLL, or CDT operation.

Introduced

6.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`batch_sub_write_filtered_out`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_batch_sub_write_filtered_out
Datadog: aerospike.server.namespace.batch_sub_write_filtered_out

Description

Number of batch-index write sub-batches that did not happen because the record was filtered out with Filter Expressions.

Introduced

6.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`batch_sub_write_success`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_batch_sub_write_success
Datadog: aerospike.server.namespace.batch_sub_write_success

Description

Number of records successfully written by batch-index sub-batches.

Introduced

6.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`batch_sub_write_timeout`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_batch_sub_write_timeout
Datadog: aerospike.server.namespace.batch_sub_write_timeout

Description

Number of batch-index write sub-batches that timed out.

Introduced

6.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`bin_names`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_bin_names

Description

Number of bin names used for the namespace.

The formula for the associated metrics is as follows:

bin_names_quota - bin_names = available_bin_names

Introduced

3.9.0

Removed

7.0.0

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`bin_names_quota`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_bin_names_quota

Description

Quota of bin names for the namespace. Starting with Database 7.0.0, there is no limit on bin names per namespace. In Database 5.0.0 and 6.0.0, the limit was 65,535. The formula for the associated metrics is as follows:

bin_names_quota - bin_names = available_bin_names

If you have reached the quota, see the KB article How to clear up bin names when they exceed the limits.

Introduced

3.9.0

Removed

7.0.0

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`cache_read_pct`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_cache_read_pct
Datadog: aerospike.server.namespace.cache_read_pct

Description

Percentage of read commands served from the post-write-cache or from blocks in the max-write-cache, which saves an I/O to the underlying storage device.

See the post-write-cache and read-page-cache documentation for ways to improve latency for read-intensive workloads using these two caching options.

Reads from update commands as well as migrations, scans, XDR reads and anything that tries to load a record off the device are accounted for in the cache_read_pct figures.

Introduced

3.9.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`client_delete_error`

warn

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_client_delete_error
Datadog: aerospike.server.namespace.client_delete_error

Description

Number of client delete commands that failed with an error.

Introduced

3.9.0

Removed

Measurement type

counter

Data type

integer

Monitoring

Compare client_delete_error to client_delete_success.

If the ratio is higher than acceptable, alert operations to investigate.
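Because both metrics are cumulative counters, one way to make the comparison is to take the ratio of their increases between two samples. The sketch below assumes the operator has already collected two readings of each counter; the function name, the sample readings, and the 1% threshold are hypothetical and would need to reflect what is acceptable for a given workload.

```python
def error_ratio(prev_error: int, curr_error: int,
                prev_success: int, curr_success: int) -> float:
    """Ratio of new errors to new successes between two counter samples."""
    errors = curr_error - prev_error
    successes = curr_success - prev_success
    if errors < 0 or successes < 0:
        # Assumption: a decreasing counter means it was reset between samples.
        raise ValueError("counter decreased between samples")
    if successes == 0:
        return float("inf") if errors > 0 else 0.0
    return errors / successes


# Hypothetical readings of client_delete_error and client_delete_success.
ratio = error_ratio(prev_error=10, curr_error=15,
                    prev_success=1000, curr_success=1500)
ACCEPTABLE_RATIO = 0.01  # illustrative threshold, not a recommendation
if ratio > ACCEPTABLE_RATIO:
    print("alert operations to investigate")
```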

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`client_delete_filtered_out`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_client_delete_filtered_out
Datadog: aerospike.server.namespace.client_delete_filtered_out

Description

Number of client delete commands that did not happen because the record was filtered out with Filter Expressions.

Introduced

4.7.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`client_delete_not_found`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_client_delete_not_found
Datadog: aerospike.server.namespace.client_delete_not_found

Description

Number of client delete commands that resulted in a not found.

Introduced

3.9.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`client_delete_success`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_client_delete_success
Datadog: aerospike.server.namespace.client_delete_success

Description

Number of successful client delete commands.

Introduced

3.9.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`client_delete_timeout`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_client_delete_timeout
Datadog: aerospike.server.namespace.client_delete_timeout

Description

Number of client delete commands that timed out.

Introduced

3.9.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`client_lang_delete_success`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_client_lang_delete_success
Datadog: aerospike.server.namespace.client_lang_delete_success

Description

Number of UDF commands that successfully deleted a record.

Introduced

3.9.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`client_lang_error`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_client_lang_error
Datadog: aerospike.server.namespace.client_lang_error

Description

Number of UDF commands that failed with a language (Lua) error during UDF execution.

Introduced

3.9.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`client_lang_read_success`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_client_lang_read_success
Datadog: aerospike.server.namespace.client_lang_read_success

Description

Number of successful record reads caused by a UDF command.

Introduced

3.9.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`client_lang_write_success`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_client_lang_write_success
Datadog: aerospike.server.namespace.client_lang_write_success

Description

Number of successful record writes caused by a UDF command.

Introduced

3.9.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`client_proxy_complete`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_client_proxy_complete
Datadog: aerospike.server.namespace.client_proxy_complete

Description

Number of client commands proxied to another node.

Introduced

3.9.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`client_proxy_error`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_client_proxy_error
Datadog: aerospike.server.namespace.client_proxy_error

Description

Number of client commands that failed to proxy to another node.

Introduced

3.9.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`client_proxy_timeout`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_client_proxy_timeout
Datadog: aerospike.server.namespace.client_proxy_timeout

Description

Number of client commands that timed out while being proxied to another node.

Introduced

3.9.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`client_read_error`

warn

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_client_read_error
Datadog: aerospike.server.namespace.client_read_error

Description

Number of read commands that failed with an error. For example, invalid set name, unavailable (if SC), failure to apply a predexp filter, key mismatch (if key was sent), device error (I/O error), key busy (duplicate resolution or if SC), or a problem during a bitwise, HLL, or CDT operation.

Introduced

3.9.0

Removed

Measurement type

counter

Data type

integer

Monitoring

Compare client_read_error to client_read_success.

If the ratio is higher than acceptable, alert operations to investigate.
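When the metrics are scraped in Prometheus text format, the comparison can be made per namespace using the `ns` label. The sketch below is an illustration only: the sample scrape text is invented, the parser handles just simple `name{labels} value` lines, and the ratio uses lifetime totals rather than increases over a window.

```python
import re
from collections import defaultdict

# Matches lines such as: metric_name{label="value",...} 123
SAMPLE_RE = re.compile(r'^(\w+)\{([^}]*)\}\s+([0-9.eE+-]+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def read_error_ratio_by_ns(exposition_text: str) -> dict:
    """Return client_read_error / client_read_success for each ns label."""
    errors = defaultdict(float)
    successes = defaultdict(float)
    for line in exposition_text.splitlines():
        match = SAMPLE_RE.match(line.strip())
        if not match:
            continue  # skip comments, blank lines, and unlabelled samples
        name, raw_labels, value = match.groups()
        ns = dict(LABEL_RE.findall(raw_labels)).get("ns")
        if ns is None:
            continue
        if name == "aerospike_namespace_client_read_error":
            errors[ns] += float(value)
        elif name == "aerospike_namespace_client_read_success":
            successes[ns] += float(value)
    # Namespaces with no successful reads are omitted to avoid dividing by zero.
    return {ns: errors[ns] / total for ns, total in successes.items() if total > 0}


# Invented sample scrape; real exporter output carries more labels and metrics.
SAMPLE = """
aerospike_namespace_client_read_error{cluster_name="demo",ns="test"} 5
aerospike_namespace_client_read_success{cluster_name="demo",ns="test"} 1000
"""
print(read_error_ratio_by_ns(SAMPLE))
```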

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`client_read_filtered_out`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_client_read_filtered_out
Datadog: aerospike.server.namespace.client_read_filtered_out

Description

Number of read commands that did not happen because they were filtered out.

Introduced

4.7.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`client_read_not_found`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_client_read_not_found
Datadog: aerospike.server.namespace.client_read_not_found

Description

Number of client read commands that resulted in not found.

Introduced

3.9.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`client_read_success`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_client_read_success
Datadog: aerospike.server.namespace.client_read_success

Description

Number of successful client read commands. Does not include records read by batch reads or scans. Batch reads have the separate batch_sub_read_success metric. Scans have separate metrics depending on the type of scan: scan_basic_complete, scan_aggr_complete, scan_ops_bg_complete, and scan_udf_bg_complete.

Introduced

3.9.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`client_read_timeout`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_client_read_timeout
Datadog: aerospike.server.namespace.client_read_timeout

Description

Number of client read commands that timed out.

Introduced

3.9.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`client_tsvc_error`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_client_tsvc_error
Datadog: aerospike.server.namespace.client_tsvc_error

Description

Number of client commands that failed in the transaction service, before attempting to handle the transaction. For example, protocol errors or security permission mismatch. In strong-consistency enabled namespaces, this includes commands against unavailable_partitions and dead_partitions.

The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc errors happen before records are accessed for reads or writes. They’re counted separately from tsvc timeouts.

Introduced

3.9.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`client_tsvc_timeout`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_client_tsvc_timeout
Datadog: aerospike.server.namespace.client_tsvc_timeout

Description

Number of client commands that timed out while in the transaction service, before attempting to handle the command. At this stage the command has not yet been identified as a read or a write, but the namespace is known. A likely cause is that there are not enough service threads to keep pace with the workload. Other common situations in this category are commands that have to be retried after waiting in the rw-hash (for example, hotkeys) and use cases where the timeout set by the client is too aggressive.

The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc errors happen before records are accessed for reads or writes. They’re counted separately from tsvc timeouts.

Introduced

3.9.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`client_udf_complete`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_client_udf_complete
Datadog: aerospike.server.namespace.client_udf_complete

Description

Number of completed UDF commands initiated by the client.

Introduced

3.9.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`client_udf_error`

warn

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_client_udf_error
Datadog: aerospike.server.namespace.client_udf_error

Description

Number of failed UDF commands initiated by the client. Does not include timeouts. The error is also returned to the client.

Introduced

3.9.0

Removed

Measurement type

counter

Data type

integer

Monitoring

Compare client_udf_error to client_udf_complete.

If the ratio is higher than acceptable, alert operations to investigate.

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`client_udf_filtered_out`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_client_udf_filtered_out
Datadog: aerospike.server.namespace.client_udf_filtered_out

Description

Number of client UDF commands that did not happen because the record was filtered out with Filter Expressions.

Introduced

4.7.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`client_udf_timeout`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_client_udf_timeout
Datadog: aerospike.server.namespace.client_udf_timeout

Description

Number of UDF commands initiated by the client that timed out. The timeout error is returned to the client.

Introduced

3.9.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`client_write_error`

warn

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_client_write_error
Datadog: aerospike.server.namespace.client_write_error

Description

Number of client write commands that failed with an error. Includes common errors like fail_generation, fail_key_busy, fail_record_too_big, fail_xdr_forbidden and some less common errors. Includes xdr_client_write_error. See the knowledge base article Why is my client_write_error metrics incrementing? for details on the types of errors that increment this statistic.

Introduced

3.9.0

Removed

Measurement type

counter

Data type

integer

Monitoring

Compare client_write_error to client_write_success.

If the ratio is higher than acceptable, alert operations to investigate.

For more details, see the knowledge base article Why is my client_write_error metrics incrementing?.

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`client_write_filtered_out`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_client_write_filtered_out
Datadog: aerospike.server.namespace.client_write_filtered_out

Description

Number of client write commands that did not happen because the record was filtered out with Filter Expressions.

Introduced

4.7.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`client_write_success`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_client_write_success
Datadog: aerospike.server.namespace.client_write_success

Description

Number of successful client write commands. Includes xdr_client_write_success.

Introduced

3.9.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`client_write_timeout`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_client_write_timeout
Datadog: aerospike.server.namespace.client_write_timeout

Description

Number of client write commands that timed out on the server. On a stable cluster with no migrations in progress, this metric indicates the number of replica write timeouts. A timeout error is returned to the client. In strong-consistency enabled namespaces, the record is marked as unreplicated and will re-replicate. Includes xdr_client_write_timeout.

Introduced

3.9.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

Detail

The following conditions can cause this metric to increment:

Every replica write failure (the master failing to replicate) increments the client_write_timeout metric.
If duplicate resolution is enabled for writes (default), the client_write_timeout metric also increments during migrations when duplicate resolution times out, which can occur before the write is applied on the master side.
See transaction-max-ms for details on when the server checks for timeout. Transactions can also time out earlier in the transaction flow, in which case the client_tsvc_timeout statistic increments.

`clock_skew_stop_writes`

critical

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_clock_skew_stop_writes
Datadog: aerospike.server.namespace.clock_skew_stop_writes

Description

Namespace will stop accepting client writes when true.

For strong-consistency enabled namespaces, will be true if the clock skew is outside of tolerance, typically 20 seconds.

For Available mode (AP) namespaces running Database 4.5.1 or later, and where NSUP is enabled (nsup-period not zero), will be true if the cluster clock skew exceeds 40 seconds. In such occurrences, NSUP will also not run, disabling record expirations and evictions until the clock skew falls back in the tolerated range.

Introduced

4.0.0

Removed

Measurement type

gauge

Data type

boolean

Monitoring

If clock_skew_stop_writes is true, it is a critical ALERT.

Verify that clocks are synchronized across the cluster.

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`current_time`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_current_time
Datadog: aerospike.server.namespace.current_time

Description

Current time represented as Aerospike epoch time.

Introduced

3.9.0

Removed

Measurement type

gauge

Data type

integer

Monitoring

If cluster_max(current_time) and cluster_min(current_time) differ by more than 10 seconds, critical ALERT.

Server time skew might indicate that NTP or similar service is not running on this node.
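
The cluster-wide check described above can be sketched as follows. This is a minimal example that assumes the current_time value of every node has already been collected into a mapping keyed by a node name; the function names and node names are hypothetical.

```python
from typing import Mapping


def clock_skew_seconds(current_times: Mapping[str, int]) -> int:
    """Difference between the largest and smallest current_time in the cluster."""
    values = list(current_times.values())
    return max(values) - min(values)


def is_critical_skew(current_times: Mapping[str, int],
                     threshold_seconds: int = 10) -> bool:
    """True when cluster_max(current_time) and cluster_min(current_time)
    differ by more than the threshold (10 seconds in the guidance above)."""
    return clock_skew_seconds(current_times) > threshold_seconds
```

If the values are sampled from each node at different moments, the sampling offset adds to the apparent skew, so the samples should be taken as close together as possible.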

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`data_avail_pct`

critical

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_data_avail_pct
Datadog: aerospike.server.namespace.data_avail_pct

Description

Measures the minimum contiguous storage-engine device, pmem, or memory storage file space across all such files in a namespace. The namespace is read-only if this value falls below stop-writes-avail-pct. It is important for all configured storage files in a namespace to have the same size, otherwise, data_avail_pct could be low even when a lot of space is available across other files.

Introduced

7.0.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

Detail

Example: Assume 5 files of 96 MiB each for a given namespace, where each file holds 24 MiB of data spread across 6 write blocks (with an 8 MiB write-block size):

The data_used_pct is 25%.
The data_avail_pct is 50%.
If the distribution is not perfectly uniform (which is usual), data_avail_pct represents the file that has the fewest free blocks.

`data_compression_ratio`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_data_compression_ratio
Datadog: aerospike.server.namespace.data_compression_ratio

Description

Measures the average compressed size to uncompressed size ratio. Thus 1.000 indicates no compression and 0.100 indicates a 1:10 compression ratio (90% reduction in size). data_compression_ratio is not included if the compression configuration parameter is set to none.

Introduced

7.0.0

Removed

Measurement type

gauge

Data type

decimal

Labels

cluster_name, job, service, instance, longitude, latitude, ns

Detail

The compression ratio is a moving average calculated based on the most recently written records. Read records do not factor into the ratio. Records that don’t try to compress are not included in the moving average. If the written data changes over time, then the compression ratio changes with it. In case of a sudden change in data, the indicated compression ratio may lag. As a rule of thumb, assume that the compression ratio covers the most recently written 100,000 to 1,000,000 records.
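
As a small worked example of how to read the ratio, the following sketch converts a reported compression ratio into a size reduction and an estimated compressed size. The function names are illustrative only.

```python
def size_reduction_pct(compression_ratio: float) -> float:
    """Percentage saved by compression: 1.000 -> 0.0, 0.100 -> 90.0."""
    return round((1.0 - compression_ratio) * 100.0, 1)


def estimated_compressed_bytes(uncompressed_bytes: int,
                               compression_ratio: float) -> int:
    """Estimated stored size for a given uncompressed size.

    The ratio is a moving average over recently written records, so this
    is an estimate, not an exact figure.
    """
    return round(uncompressed_bytes * compression_ratio)
```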

`data_total_bytes`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_data_total_bytes
Datadog: aerospike.server.namespace.data_total_bytes

Description

Total storage allocated to this namespace, regardless of storage-engine.

Introduced

7.0.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`data_used_bytes`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_data_used_bytes
Datadog: aerospike.server.namespace.data_used_bytes

Description

Regardless of storage-engine, the total storage allocated is data_total_bytes, and the amount of data used in that storage is data_used_bytes, which includes both user data and record overhead. For more details, see Calculating data storage.

Introduced

7.0.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`data_used_pct`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_data_used_pct
Datadog: aerospike.server.namespace.data_used_pct

Description

Percentage of used storage capacity for this namespace. Calculated as data_used_bytes * 100 / data_total_bytes. Evictions will be triggered when this percentage crosses the configured evict-used-pct.
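
The calculation can be sketched as follows. This is an illustration only: integer truncation, treating "crosses" as strictly greater than, and treating an evict-used-pct of 0 as disabled are all assumptions rather than documented server behavior.

```python
def data_used_pct(data_used_bytes: int, data_total_bytes: int) -> int:
    """data_used_bytes * 100 / data_total_bytes, truncated to an integer."""
    return data_used_bytes * 100 // data_total_bytes


def eviction_threshold_crossed(used_pct: int, evict_used_pct: int) -> bool:
    """True when the used percentage is above the configured evict-used-pct.

    A threshold of 0 is treated as "disabled" here (an assumption).
    """
    return evict_used_pct > 0 and used_pct > evict_used_pct
```

For example, 24 MiB used out of 96 MiB allocated gives a data_used_pct of 25.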

Introduced

7.0.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`dead_partitions`

critical

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_dead_partitions
Datadog: aerospike.server.namespace.dead_partitions

Description

Number of dead partitions for this namespace when using strong-consistency. This is the number of partitions that are unavailable when all roster nodes are present. Requires the use of the revive command to make them available again. Revived nodes restore availability only when all nodes are trusted.

Introduced

4.0.0

Removed

Measurement type

gauge

Data type

integer

Monitoring

If dead_partitions is not zero, critical ALERT. If you are certain that there are no potential data inconsistencies or if data inconsistencies are acceptable, consider issuing revive and recluster commands.

Labels

cluster_name, job, service, instance, longitude, latitude, ns

Detail

Note

A typical scenario in which partitions are marked as dead for a strong-consistency enabled namespace is when a number of nodes equal to or greater than the replication-factor are taken out of the cluster without a clean shutdown, or have their storage erased (even if migrations complete between each node). Even though the data is fully present in the cluster, the remaining nodes cannot know whether the departed nodes accepted any write commands, and therefore cannot guarantee the integrity of the partitions that had all their replicas on those nodes.

For example, for a replication factor 2 namespace configured with strong consistency on a 10-node cluster, shutting down one node, waiting for migrations to complete, then shutting down a second node, erasing storage, and bringing both nodes back in results in approximately 90 partitions [2x(4096/(10x9))] being marked as dead. Invoking the revive and recluster commands provides 100% availability and, in this particular case, no data inconsistencies.

Dead partitions turn into unavailable_partitions whenever the roster is not complete for a namespace. See Configuring strong consistency and Consistency management for further details.
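
The estimate in the note can be reproduced with a short calculation. This sketch assumes replication factor 2 and partitions spread evenly over the nodes, so each ordered (master, replica) pair of nodes holds 4096/(N x (N-1)) partitions; the function name is illustrative.

```python
N_PARTITIONS = 4096  # partitions per namespace, as used in the note's formula


def expected_dead_partitions(cluster_size: int,
                             n_partitions: int = N_PARTITIONS) -> float:
    """Expected number of partitions whose two replicas both live on one
    specific pair of nodes, for a replication factor 2 namespace."""
    if cluster_size < 2:
        raise ValueError("need at least two nodes for replication factor 2")
    ordered_pairs = cluster_size * (cluster_size - 1)
    # Factor 2: either node of the pair can be master while the other is replica.
    return 2 * n_partitions / ordered_pairs


print(round(expected_dead_partitions(10)))  # 10-node cluster from the note
```

For the 10-node cluster this evaluates to about 91, which matches the "approximately 90" figure quoted in the note.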

`deleted_last_bin`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_deleted_last_bin
Datadog: aerospike.server.namespace.deleted_last_bin

Description

Number of objects deleted because their last bin was deleted.

Introduced

3.9.1

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`device_available_pct`

critical

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_device_available_pct
Datadog: aerospike.server.namespace.device_available_pct

Description

Measures the minimum contiguous disk space across all devices in a namespace. The namespace will be read only (stop writes) if this value falls below min-avail-pct. It is important for all configured devices in a namespace to have the same size, otherwise, the device_available_pct could be low even when a lot of space is available across other devices.

Introduced

3.9.0

Removed

7.0.0

Measurement type

gauge

Data type

integer

Monitoring

If device_available_pct drops below 20%, warn your operations group. This condition might indicate that defrag is unable to keep up with the current load.
If device_available_pct drops below 15%, critical ALERT.
If device_available_pct drops below 5%, usable disk resources are critically low. This condition might result in stop_writes.
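
The three thresholds above can be expressed as a simple classifier. This is a sketch only; the severity labels returned here are hypothetical names, not values emitted by the server or by any monitoring product.

```python
def classify_device_available_pct(pct: int) -> str:
    """Map device_available_pct to a severity using the thresholds above."""
    if pct < 5:
        return "critically_low"  # usable disk is nearly exhausted, stop_writes may follow
    if pct < 15:
        return "critical"
    if pct < 20:
        return "warn"            # defrag may not be keeping up with the load
    return "ok"
```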

Labels

cluster_name, job, service, instance, longitude, latitude, ns

Detail

Not to be confused with device_free_pct which represents the amount of free space across all devices in a namespace and does not take account of the fragmentation. Here is an example to represent the difference between device_free_pct and device_available_pct. Assume 5 devices of 100MiB each for a given namespace, where each device has 20MiB of data that are spread across 5 write-blocks (where each write-block is 8MiB):

The device_free_pct would be 80%.
The device_available_pct would be 60%.
If the distribution is not uniform (it usually is not perfectly uniform) the device_available_pct would represent the device that has the least free blocks.

`device_compression_ratio`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_device_compression_ratio

Description

Measures the average compressed size to uncompressed size ratio. 1.000 indicates no compression and 0.100 indicates a 1:10 compression ratio (90% reduction in size). device_compression_ratio will not be included if compression is set to none.

Introduced

4.5.0.1

Removed

7.0.0

Measurement type

moving average

Data type

decimal

Labels

cluster_name, job, service, instance, longitude, latitude, ns

Detail

The compression ratio is a moving average. It is calculated based on the most recently written records. Read records do not factor into the ratio. Records that don’t try to compress are not included in the moving average. If the written data changes over time then the compression ratio will change with it. In case of a sudden change in data, the indicated compression ratio may lag behind a bit. As a rule of thumb, assume that the compression ratio covers the most recently written 100,000 to 1,000,000 records.

`device_free_pct`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_device_free_pct

Description

Percentage of disk capacity free for this namespace. This is the amount of free storage across all devices in the namespace. Evictions will be triggered when the used percentage across all devices (which is represented by 100 - device_free_pct) crosses the configured high-water-disk-pct.

Introduced

3.9.0

Removed

7.0.0

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

Detail

Not to be confused with device_available_pct which represents the amount of free contiguous space on the device that has the least contiguous free space across the namespace. Here is an example to represent the difference between device_free_pct and device_available_pct. Assume 5 devices of 100MB each for a given namespace, where each device has 25MB of data that are spread across 50 write blocks (let’s assume a 1MB write-block-size):

The device_free_pct would be 75%.
The device_available_pct would be 50%.
If the distribution is not uniform (it usually is not perfectly uniform) the device_available_pct would represent the device that has the least free blocks.
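
The example above can be worked through in code. This is a simplified model, not the server's implementation: it assumes that a write block counts as unavailable as soon as it holds any data, and the Device class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Device:
    size_mb: int        # total device size
    data_mb: int        # bytes of record data stored, in MB
    blocks_in_use: int  # write blocks that hold at least some data


def free_pct(devices: Sequence[Device]) -> int:
    """Free space across all devices, ignoring fragmentation."""
    total = sum(d.size_mb for d in devices)
    used = sum(d.data_mb for d in devices)
    return (total - used) * 100 // total


def available_pct(devices: Sequence[Device], write_block_mb: int = 1) -> int:
    """Share of wholly free write blocks on the worst device."""
    def one(d: Device) -> int:
        total_blocks = d.size_mb // write_block_mb
        return (total_blocks - d.blocks_in_use) * 100 // total_blocks
    return min(one(d) for d in devices)


# Five 100MB devices, each with 25MB of data spread over 50 1MB write blocks.
devices = [Device(size_mb=100, data_mb=25, blocks_in_use=50) for _ in range(5)]
print(free_pct(devices), available_pct(devices))
```

With these inputs the model gives 75 for the free percentage and 50 for the available percentage. If one device had 70 blocks in use instead of 50, the available percentage would drop to 30 while the free percentage stayed at 75.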

`device_total_bytes`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_device_total_bytes

Description

Total bytes of disk space allocated to this namespace on this node.

Introduced

3.9.0

Removed

7.0.0

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`device_used_bytes`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_device_used_bytes

Description

Total bytes of disk space used by this namespace on this node.

Introduced

3.9.0

Removed

7.0.0

Measurement type

gauge

Data type

integer

Monitoring

Trending device_used_bytes provides operations insight into how disk usage changes over time for this namespace.

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`dup_res_ask`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_dup_res_ask
Datadog: aerospike.server.namespace.dup_res_ask

Description

Number of duplicate resolution requests made by the node to other individual nodes.

Introduced

5.5.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`dup_res_respond_no_read`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_dup_res_respond_no_read
Datadog: aerospike.server.namespace.dup_res_respond_no_read

Description

Number of duplicate resolution requests handled by the node without reading the record.

Introduced

5.5.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`dup_res_respond_read`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_dup_res_respond_read
Datadog: aerospike.server.namespace.dup_res_respond_read

Description

Number of duplicate resolution requests handled by the node where the record was read.

Introduced

5.5.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`effective_active_rack`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_effective_active_rack
Datadog: aerospike.server.namespace.effective_active_rack

Description

The effective active-rack for the namespace. The configured active rack owns all of the master partition copies.

For strong consistency-enabled namespaces, this is the roster’s current active rack. Otherwise, it is the configured active-rack.

Introduced

7.2.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`effective_is_quiesced`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_effective_is_quiesced
Datadog: aerospike.server.namespace.effective_is_quiesced

Description

Reports ‘true’ when the namespace has rebalanced after previously receiving a quiesce info request.

Introduced

4.3.1

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`effective_prefer_uniform_balance`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_effective_prefer_uniform_balance
Datadog: aerospike.server.namespace.effective_prefer_uniform_balance

Description

Applies only to Enterprise Edition. Value can be true or false. If Aerospike applied the uniform balance algorithm for the current cluster state, the value returned is true. If any node having this namespace isn’t configured with prefer-uniform-balance true, the value returned is false and uniform balance algorithm is disabled for this namespace on all participating nodes.

Introduced

4.3.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`effective_replication_factor`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_effective_replication_factor
Datadog: aerospike.server.namespace.effective_replication_factor

Description

The effective replication factor for the namespace, included with the namespace info command metrics.

The effective replication factor is less than the replication-factor if the cluster size is smaller than the RF, in which case the effective replication factor would match the cluster size.

In Database 5.7.0 and earlier, if the paxos-single-replica-limit size is reached, the effective replication factor is 1.

The effective replication factor is 0 for a node that has been orphaned by the cluster. For example, if a node tries to join a cluster but that node is unable to communicate with every other node in the cluster, the principal node rejects the request and the node marks itself as an orphan.

Introduced

3.15.1.3

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

Detail

For AP namespaces in Database 7.1.0 and earlier, the effective replication factor drops when a node is shut down or crashes, and the remaining nodes are fewer than the RF.
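
The rules described for this metric can be summarized in a short sketch. It models only the cases stated in this entry (cluster smaller than the RF, and an orphaned node) and deliberately leaves out paxos-single-replica-limit; the function name and parameters are illustrative.

```python
def effective_replication_factor(configured_rf: int, cluster_size: int,
                                 orphaned: bool = False) -> int:
    """Simplified model of effective_replication_factor.

    - An orphaned node reports 0.
    - Otherwise the value is capped by the cluster size.
    """
    if orphaned:
        return 0
    return min(configured_rf, cluster_size)
```

For example, a namespace configured with replication-factor 3 on a 2-node cluster would report 2 under this model.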

`evict_ttl`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_evict_ttl
Datadog: aerospike.server.namespace.evict_ttl

Description

The current eviction depth, or the highest ttl of records that have been evicted, in seconds.

Introduced

3.9.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`evict_void_time`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_evict_void_time
Datadog: aerospike.server.namespace.evict_void_time

Description

The current eviction depth, expressed as a void time in seconds since 1 January 2010 UTC.
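
To read a void time as a calendar date, add it to the epoch named above. The following sketch uses only the Python standard library; the function names are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Epoch stated in the description: 1 January 2010 UTC.
AEROSPIKE_EPOCH = datetime(2010, 1, 1, tzinfo=timezone.utc)


def void_time_to_datetime(void_time_seconds: int) -> datetime:
    """Convert seconds since 1 January 2010 UTC to an aware datetime."""
    return AEROSPIKE_EPOCH + timedelta(seconds=void_time_seconds)


def void_time_to_unix(void_time_seconds: int) -> int:
    """Convert seconds since 1 January 2010 UTC to Unix epoch seconds."""
    return int(AEROSPIKE_EPOCH.timestamp()) + void_time_seconds
```

The same conversion applies to any other value expressed in this epoch.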

Introduced

4.5.1

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`evicted_objects`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_evicted_objects
Datadog: aerospike.server.namespace.evicted_objects

Description

Number of objects evicted from this namespace on this node since the server started.

Introduced

3.9.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`expired_objects`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_expired_objects
Datadog: aerospike.server.namespace.expired_objects

Description

The number of objects expired from this namespace on this node since the server started.

Introduced

3.9.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`fail_client_lost_conflict`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_fail_client_lost_conflict
Datadog: aerospike.server.namespace.fail_client_lost_conflict

Description

Number of non-XDR write commands that failed because some bin’s last-update-time is greater than the write command’s time. Error code 28 is returned. This can happen only when the XDR bin convergence feature is enabled. This can happen due to either:

a clock skew across DCs causing XDR write commands to write bins with a future timestamp compared to local time.
a race condition between an incoming XDR write command and a local client write command.

See fail_xdr_lost_conflict and cluster_max_compatibility_id.

Introduced

5.6.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`fail_generation`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_fail_generation
Datadog: aerospike.server.namespace.fail_generation

Description

Number of read/write commands that failed the generation check.

Introduced

3.9.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`fail_key_busy`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_fail_key_busy
Datadog: aerospike.server.namespace.fail_key_busy

Description

Number of read/write commands that failed on ‘hot keys’, meaning that more commands than transaction-pending-limit were already queued for the same record, waiting in the rw-hash or rw_in_progress. For reads, this can happen only when duplicate resolution is necessary.

Introduced

3.9.0

Removed

Measurement type

counter

Data type

integer

Monitoring

If the application is not expected to have hot keys and fail_key_busy rate of change exceeds expectations, this condition might indicate a problem with the application.
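
A rate of change can be derived from two samples of the counter. The sketch below is an illustration with hypothetical names; it assumes the counter restarts from zero when the server restarts, so a decrease between samples is treated as a reset.

```python
def counter_rate(prev_value: int, prev_ts: float,
                 curr_value: int, curr_ts: float) -> float:
    """Per-second rate of change between two samples of a counter.

    Timestamps are in seconds. If the counter went backwards, the server
    is assumed to have restarted and the current value is used as the delta.
    """
    elapsed = curr_ts - prev_ts
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = curr_value - prev_value
    if delta < 0:
        delta = curr_value  # counter reset between the two samples
    return delta / elapsed
```

What counts as a rate that "exceeds expectations" depends on the workload, so any alert threshold applied to this value has to be chosen per application.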

Labels

cluster_name, job, service, instance, longitude, latitude, ns

Detail

Detail level logging for the rw context logs the transactions (digest) that trigger this error. Read transactions fail only if they have to go through the rw-hash (for example, if duplicate resolution is in effect).

`fail_mrt_blocked`

enterprise watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_fail_mrt_blocked
Datadog: aerospike.server.namespace.fail_mrt_blocked

Description

Number of transactions or read/write commands blocked by an ongoing transaction.

Introduced

8.0.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, ns

`fail_mrt_version_mismatch`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_fail_mrt_version_mismatch
Datadog: aerospike.server.namespace.fail_mrt_version_mismatch

Description

Number of version mismatches. These usually occur in verify reads, but can also occur in individual commands (reads/writes/deletes/UDFs), where version checks occur if the record had previously been read in the transaction.

Introduced

8.0.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, ns

`fail_record_too_big`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_fail_record_too_big
Datadog: aerospike.server.namespace.fail_record_too_big

Description

Number of write commands that failed because a record was larger than max-record-size. Counts only client write failures on the master side.

Introduced

3.9.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

Detail

Detail level logging for the rw context will log transactions (digest) triggering this error (originating from client side master writes). Enabling detail level logging for the drv_ssd context will log all attempts at writing records that are too big, including replica-writes, immigration (migrations) writes and applying duplicate resolution winners. See “How do I change the write-block-size configuration?” for more information.

`fail_xdr_forbidden`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_fail_xdr_forbidden
Datadog: aerospike.server.namespace.fail_xdr_forbidden

Description

Number of read/write commands that failed due to configuration restriction. Error code 22 is returned. This counts any of the traffic rejected due to either of the following:

incoming XDR traffic (xdr-write stat) and allow-xdr-writes set to false.
non-XDR write traffic and allow-nonxdr-writes set to false.

Introduced

3.9.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`fail_xdr_key_busy`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_fail_xdr_key_busy
Datadog: aerospike.server.namespace.fail_xdr_key_busy

Description

Number of XDR key-busy errors (code 32) that have occurred. This error is raised if either of the following occurs:

ship-versions-policy is all and a new write is attempted before the most recent update to the record successfully shipped to the destination.
ship-versions-policy is interval and a new write is attempted before at least one version has shipped in the most recent ship-versions-interval.

Introduced

7.2.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`fail_xdr_lost_conflict`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_fail_xdr_lost_conflict
Datadog: aerospike.server.namespace.fail_xdr_lost_conflict

Description

Number of XDR write commands that did not succeed in updating all the attempted bins. Only a subset of bin updates might have failed or all the bin updates might have failed. This can happen only when the XDR bin convergence feature is enabled. If a conflicting write happens on the same record across two or more data centers, the bin with the earlier last update time will lose during XDR shipping. An XDR retry due to a timeout, where a record that has already been successfully updated at a destination is received again, would fail and this metric will be updated. In other retry scenarios, such as key busy or device busy, the remote record will not be updated. Only a timeout-based retry can lead to this situation. See fail_client_lost_conflict.

Introduced

5.6.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`from_proxy_batch_sub_delete_error`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_from_proxy_batch_sub_delete_error
Datadog: aerospike.server.namespace.from_proxy_batch_sub_delete_error

Description

Number of batch-index delete subtransactions proxied from another node that failed with an error.

Introduced

6.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`from_proxy_batch_sub_delete_filtered_out`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_from_proxy_batch_sub_delete_filtered_out
Datadog: aerospike.server.namespace.from_proxy_batch_sub_delete_filtered_out

Description

Number of batch-index delete subtransactions proxied from another node that did not happen because the record was filtered out with Filter Expressions.

Introduced

6.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`from_proxy_batch_sub_delete_not_found`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_from_proxy_batch_sub_delete_not_found
Datadog: aerospike.server.namespace.from_proxy_batch_sub_delete_not_found

Description

Number of batch-index delete subtransactions proxied from another node that resulted in not found.

Introduced

6.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`from_proxy_batch_sub_delete_success`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_from_proxy_batch_sub_delete_success
Datadog: aerospike.server.namespace.from_proxy_batch_sub_delete_success

Description

Number of records successfully deleted by batch-index subtransactions proxied from another node.

Introduced

6.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`from_proxy_batch_sub_delete_timeout`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_from_proxy_batch_sub_delete_timeout
Datadog: aerospike.server.namespace.from_proxy_batch_sub_delete_timeout

Description

Number of batch-index delete subtransactions proxied from another node that timed out.

Introduced

6.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`from_proxy_batch_sub_lang_delete_success`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_from_proxy_batch_sub_lang_delete_success
Datadog: aerospike.server.namespace.from_proxy_batch_sub_lang_delete_success

Description

Number of successful batch-index UDF delete subtransactions proxied from another node.

Introduced

6.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`from_proxy_batch_sub_lang_error`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_from_proxy_batch_sub_lang_error
Datadog: aerospike.server.namespace.from_proxy_batch_sub_lang_error

Description

Number of language (Lua) errors for batch-index UDF subtransactions proxied from another node.

Introduced

6.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`from_proxy_batch_sub_lang_read_success`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_from_proxy_batch_sub_lang_read_success
Datadog: aerospike.server.namespace.from_proxy_batch_sub_lang_read_success

Description

Number of successful batch-index UDF read subtransactions proxied from another node.

Introduced

6.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`from_proxy_batch_sub_lang_write_success`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_from_proxy_batch_sub_lang_write_success
Datadog: aerospike.server.namespace.from_proxy_batch_sub_lang_write_success

Description

Number of successful batch-index UDF write subtransactions proxied from another node.

Introduced

6.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`from_proxy_batch_sub_read_error`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_from_proxy_batch_sub_read_error
Datadog: aerospike.server.namespace.from_proxy_batch_sub_read_error

Description

Number of batch-index read subtransactions proxied from another node that failed with an error.

Introduced

4.5.1

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`from_proxy_batch_sub_read_filtered_out`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_from_proxy_batch_sub_read_filtered_out
Datadog: aerospike.server.namespace.from_proxy_batch_sub_read_filtered_out

Description

Number of batch-index read subtransactions proxied from another node that did not happen because the record was filtered out with Filter Expressions.

Introduced

4.7.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`from_proxy_batch_sub_read_not_found`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_from_proxy_batch_sub_read_not_found
Datadog: aerospike.server.namespace.from_proxy_batch_sub_read_not_found

Description

Number of batch-index read subtransactions proxied from another node that resulted in not found.

Introduced

4.5.1

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`from_proxy_batch_sub_read_success`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_from_proxy_batch_sub_read_success
Datadog: aerospike.server.namespace.from_proxy_batch_sub_read_success

Description

Number of records successfully read by batch-index subtransactions proxied from another node.

Introduced

4.5.1

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`from_proxy_batch_sub_read_timeout`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_from_proxy_batch_sub_read_timeout
Datadog: aerospike.server.namespace.from_proxy_batch_sub_read_timeout

Description

Number of batch-index read subtransactions proxied from another node that timed out.

Introduced

4.5.1

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`from_proxy_batch_sub_tsvc_error`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_from_proxy_batch_sub_tsvc_error
Datadog: aerospike.server.namespace.from_proxy_batch_sub_tsvc_error

Description

Number of batch-index subtransactions proxied from another node that failed with an error in the transaction service, before attempting to handle the transaction. Examples include protocol errors and security permission mismatches. In strong-consistency enabled namespaces, this includes transactions against unavailable_partitions and dead_partitions.

The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc errors happen before records are accessed for reads or writes. They’re counted separately from tsvc timeouts.

Introduced

4.5.1

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`from_proxy_batch_sub_tsvc_timeout`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_from_proxy_batch_sub_tsvc_timeout
Datadog: aerospike.server.namespace.from_proxy_batch_sub_tsvc_timeout

Description

Number of batch-index subtransactions proxied from another node that timed out in the transaction service, before attempting to handle the transaction.

The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc errors happen before records are accessed for reads or writes. They’re counted separately from tsvc timeouts.

Introduced

4.5.1

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`from_proxy_batch_sub_udf_complete`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_from_proxy_batch_sub_udf_complete
Datadog: aerospike.server.namespace.from_proxy_batch_sub_udf_complete

Description

Number of completed batch-index UDF subtransactions proxied from another node for scan/query background UDF jobs. See the following statistics for the underlying operation statuses: from_proxy_batch_sub_lang_delete_success, from_proxy_batch_sub_lang_error, from_proxy_batch_sub_lang_read_success, from_proxy_batch_sub_lang_write_success.

Introduced

6.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`from_proxy_batch_sub_udf_error`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_from_proxy_batch_sub_udf_error
Datadog: aerospike.server.namespace.from_proxy_batch_sub_udf_error

Description

Number of failed batch-index UDF subtransactions proxied from another node for scan/query background UDF jobs. Does not include timeouts. See the following statistics for the underlying operation statuses: from_proxy_batch_sub_lang_delete_success, from_proxy_batch_sub_lang_error, from_proxy_batch_sub_lang_read_success, from_proxy_batch_sub_lang_write_success.

Introduced

6.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`from_proxy_batch_sub_udf_filtered_out`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_from_proxy_batch_sub_udf_filtered_out
Datadog: aerospike.server.namespace.from_proxy_batch_sub_udf_filtered_out

Description

Number of batch-index UDF subtransactions proxied from another node that did not happen because the record was filtered out with Filter Expressions.

Introduced

6.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`from_proxy_batch_sub_udf_timeout`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_from_proxy_batch_sub_udf_timeout
Datadog: aerospike.server.namespace.from_proxy_batch_sub_udf_timeout

Description

Number of batch-index UDF subtransactions proxied from another node that timed out for scan/query background UDF jobs. See the following statistics for the underlying operation statuses: from_proxy_batch_sub_lang_delete_success, from_proxy_batch_sub_lang_error, from_proxy_batch_sub_lang_read_success, from_proxy_batch_sub_lang_write_success.

Introduced

6.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`from_proxy_batch_sub_write_error`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_from_proxy_batch_sub_write_error
Datadog: aerospike.server.namespace.from_proxy_batch_sub_write_error

Description

Number of batch-index write subtransactions proxied from another node that failed with an error.

Introduced

6.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`from_proxy_batch_sub_write_filtered_out`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_from_proxy_batch_sub_write_filtered_out
Datadog: aerospike.server.namespace.from_proxy_batch_sub_write_filtered_out

Description

Number of batch-index write subtransactions proxied from another node that did not happen because the record was filtered out with Filter Expressions.

Introduced

6.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`from_proxy_batch_sub_write_success`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_from_proxy_batch_sub_write_success
Datadog: aerospike.server.namespace.from_proxy_batch_sub_write_success

Description

Number of records successfully written by batch-index subtransactions proxied from another node.

Introduced

6.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`from_proxy_batch_sub_write_timeout`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_from_proxy_batch_sub_write_timeout
Datadog: aerospike.server.namespace.from_proxy_batch_sub_write_timeout

Description

Number of batch-index write subtransactions proxied from another node that timed out.

Introduced

6.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`from_proxy_delete_error`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_from_proxy_delete_error
Datadog: aerospike.server.namespace.from_proxy_delete_error

Description

Number of errors for delete transactions proxied from another node. This includes xdr_from_proxy_delete_error.

Introduced

4.5.1

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`from_proxy_delete_filtered_out`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_from_proxy_delete_filtered_out
Datadog: aerospike.server.namespace.from_proxy_delete_filtered_out

Description

Number of delete transactions proxied from another node that did not happen because the record was filtered out with Filter Expressions.

Introduced

4.7.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`from_proxy_delete_not_found`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_from_proxy_delete_not_found
Datadog: aerospike.server.namespace.from_proxy_delete_not_found

Description

Number of delete transactions proxied from another node that resulted in not found. This includes xdr_from_proxy_delete_not_found.

Introduced

4.5.1

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`from_proxy_delete_success`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_from_proxy_delete_success
Datadog: aerospike.server.namespace.from_proxy_delete_success

Description

Number of successful delete transactions proxied from another node. This includes xdr_from_proxy_delete_success.

Introduced

4.5.1

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`from_proxy_delete_timeout`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_from_proxy_delete_timeout
Datadog: aerospike.server.namespace.from_proxy_delete_timeout

Description

Number of timeouts for delete transactions proxied from another node. This includes xdr_from_proxy_delete_timeout.

Introduced

4.5.1

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`from_proxy_lang_delete_success`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_from_proxy_lang_delete_success
Datadog: aerospike.server.namespace.from_proxy_lang_delete_success

Description

Number of successful UDF delete transactions proxied from another node.

Introduced

4.5.1

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`from_proxy_lang_error`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_from_proxy_lang_error
Datadog: aerospike.server.namespace.from_proxy_lang_error

Description

Number of language (Lua) errors for UDF transactions proxied from another node.

Introduced

4.5.1

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`from_proxy_lang_read_success`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_from_proxy_lang_read_success
Datadog: aerospike.server.namespace.from_proxy_lang_read_success

Description

Number of successful UDF read commands proxied from another node.

Introduced

4.5.1

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`from_proxy_lang_write_success`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_from_proxy_lang_write_success
Datadog: aerospike.server.namespace.from_proxy_lang_write_success

Description

Number of successful UDF write commands proxied from another node.

Introduced

4.5.1

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`from_proxy_read_error`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_from_proxy_read_error
Datadog: aerospike.server.namespace.from_proxy_read_error

Description

Number of errors for read commands proxied from another node.

Introduced

4.5.1

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`from_proxy_read_filtered_out`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_from_proxy_read_filtered_out
Datadog: aerospike.server.namespace.from_proxy_read_filtered_out

Description

Number of read commands proxied from another node that did not happen because they were filtered out with Filter Expressions.

Introduced

4.7.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`from_proxy_read_not_found`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_from_proxy_read_not_found
Datadog: aerospike.server.namespace.from_proxy_read_not_found

Description

Number of read commands proxied from another node that resulted in not found.

Introduced

4.5.1

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`from_proxy_read_success`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_from_proxy_read_success
Datadog: aerospike.server.namespace.from_proxy_read_success

Description

Number of successful read commands proxied from another node.

Introduced

4.5.1

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`from_proxy_read_timeout`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_from_proxy_read_timeout
Datadog: aerospike.server.namespace.from_proxy_read_timeout

Description

Number of timeouts for read commands proxied from another node.

Introduced

4.5.1

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`from_proxy_tsvc_error`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_from_proxy_tsvc_error
Datadog: aerospike.server.namespace.from_proxy_tsvc_error

Description

Number of commands proxied from another node that failed in the transaction service, before attempting to handle the commands. Examples include protocol errors and security permission mismatches. In strong-consistency enabled namespaces, this includes commands against unavailable_partitions and dead_partitions.

The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc errors happen before records are accessed for reads or writes. They’re counted separately from tsvc timeouts.

Introduced

4.5.1

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`from_proxy_tsvc_timeout`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_from_proxy_tsvc_timeout
Datadog: aerospike.server.namespace.from_proxy_tsvc_timeout

Description

Number of commands proxied from another node that timed out while in the transaction service, before attempting to handle the commands. At this stage the command has not yet been identified as a read or a write, but the namespace is known. Possible causes include congestion in the internal transaction queue or a client timeout that is too aggressive.

The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc errors happen before records are accessed for reads or writes. They’re counted separately from tsvc timeouts.

Introduced

4.5.1

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`from_proxy_udf_complete`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_from_proxy_udf_complete
Datadog: aerospike.server.namespace.from_proxy_udf_complete

Description

Number of successful UDF commands proxied from another node.

Introduced

4.5.1

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`from_proxy_udf_error`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_from_proxy_udf_error
Datadog: aerospike.server.namespace.from_proxy_udf_error

Description

Number of errors for UDF commands proxied from another node.

Introduced

4.5.1

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`from_proxy_udf_filtered_out`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_from_proxy_udf_filtered_out
Datadog: aerospike.server.namespace.from_proxy_udf_filtered_out

Description

Number of UDF commands proxied from another node that did not happen because the record was filtered out with Filter Expressions.

Introduced

4.7.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`from_proxy_udf_timeout`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_from_proxy_udf_timeout
Datadog: aerospike.server.namespace.from_proxy_udf_timeout

Description

Number of timeouts for UDF commands proxied from another node.

Introduced

4.5.1

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`from_proxy_write_error`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_from_proxy_write_error
Datadog: aerospike.server.namespace.from_proxy_write_error

Description

Number of errors for write commands proxied from another node. This includes xdr_from_proxy_write_error.

Introduced

4.5.1

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`from_proxy_write_filtered_out`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_from_proxy_write_filtered_out
Datadog: aerospike.server.namespace.from_proxy_write_filtered_out

Description

Number of write commands proxied from another node that did not happen because the record was filtered out with Filter Expressions.

Introduced

4.7.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`from_proxy_write_success`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_from_proxy_write_success
Datadog: aerospike.server.namespace.from_proxy_write_success

Description

Number of successful write commands proxied from another node. This includes xdr_from_proxy_write_success.

Introduced

4.5.1

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`from_proxy_write_timeout`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_from_proxy_write_timeout
Datadog: aerospike.server.namespace.from_proxy_write_timeout

Description

Number of timeouts for write commands proxied from another node. This includes xdr_from_proxy_write_timeout.

Introduced

4.5.1

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns
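
The from_proxy_write_* statistics are cumulative counters, so they are usually read as differences between two samples. The following Python sketch shows one way to turn them into a failure ratio for an interval. The function names and the restart handling are assumptions for illustration (a counter that goes backwards is treated as a node restart), and the samples are plain dictionaries rather than the output of any particular client or exporter.

```python
from typing import Dict, Mapping

# Outcome counters documented above for write commands proxied from another node.
WRITE_OUTCOMES = (
    "from_proxy_write_success",
    "from_proxy_write_error",
    "from_proxy_write_timeout",
    "from_proxy_write_filtered_out",
)


def counter_deltas(prev: Mapping[str, int], curr: Mapping[str, int]) -> Dict[str, int]:
    """Increase of each counter between two samples (hypothetical helper)."""
    deltas = {}
    for name in WRITE_OUTCOMES:
        before, after = prev.get(name, 0), curr.get(name, 0)
        # Assumption: a lower value than before means the counter restarted from zero.
        deltas[name] = after - before if after >= before else after
    return deltas


def proxied_write_failure_ratio(prev: Mapping[str, int], curr: Mapping[str, int]) -> float:
    """Share of proxied writes in the interval that ended in an error or a timeout."""
    deltas = counter_deltas(prev, curr)
    total = sum(deltas.values())
    if total == 0:
        return 0.0
    failed = deltas["from_proxy_write_error"] + deltas["from_proxy_write_timeout"]
    return failed / total


earlier = {"from_proxy_write_success": 100, "from_proxy_write_error": 2,
           "from_proxy_write_timeout": 1, "from_proxy_write_filtered_out": 0}
later = {"from_proxy_write_success": 190, "from_proxy_write_error": 7,
         "from_proxy_write_timeout": 6, "from_proxy_write_filtered_out": 0}
ratio = proxied_write_failure_ratio(earlier, later)
```

The same pattern applies to the read, delete, and UDF counter families by swapping the tuple of counter names.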

`geo_region_query_cells`

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_geo_region_query_cells
Datadog: aerospike.server.namespace.geo_region_query_cells

Description

Number of cell coverings for the queried regions.

Introduced

3.9.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`geo_region_query_falsepos`

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_geo_region_query_falsepos
Datadog: aerospike.server.namespace.geo_region_query_falsepos

Description

Number of points outside the region. Total query result points is geo_region_query_points + geo_region_query_falsepos.

Introduced

3.9.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`geo_region_query_points`

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_geo_region_query_points
Datadog: aerospike.server.namespace.geo_region_query_points

Description

Number of points within the region. Total query result points is geo_region_query_points + geo_region_query_falsepos.

Introduced

3.9.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns
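
The relationship between geo_region_query_points and geo_region_query_falsepos can be illustrated with a small Python sketch. The helper name and the returned keys are hypothetical; only the sum comes from the descriptions above, and the false-positive share is an extra derived value added for illustration.

```python
from typing import Dict, Union


def geo_query_result_summary(points: int, falsepos: int) -> Dict[str, Union[int, float]]:
    """Combine the two statistics as described: total = points + falsepos."""
    total = points + falsepos
    # Share of examined points that fell outside the region (0.0 when nothing was examined).
    falsepos_ratio = falsepos / total if total else 0.0
    return {"total_points": total, "falsepos_ratio": falsepos_ratio}


summary = geo_query_result_summary(points=900, falsepos=100)
print(summary["total_points"])  # prints 1000
```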

`geo_region_query_reqs`

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_geo_region_query_reqs
Datadog: aerospike.server.namespace.geo_region_query_reqs

Description

Number of geo queries on the system since the node started.

Introduced

3.9.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`hwm_breached`

critical

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_hwm_breached
Datadog: aerospike.server.namespace.hwm_breached

Description

If true, Aerospike has breached ‘high-water-[disk|memory]-pct’ for this namespace.

Introduced

3.9.0

Removed

Measurement type

gauge

Data type

boolean

Monitoring

If hwm_breached is true, alert your operations group that memory or disk resources are strained. This condition might indicate the need to increase cluster capacity.

Labels

cluster_name, job, service, instance, longitude, latitude, ns
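
A monitoring check for hwm_breached might look like the following Python sketch. It assumes the raw value arrives either as the text true/false or as a numeric 1/0, depending on the collection path; that encoding, and the function name, are assumptions rather than something stated in this reference.

```python
from typing import Union


def hwm_is_breached(raw: Union[str, bool, int, float]) -> bool:
    """Normalize a collected hwm_breached value to a Python bool (hypothetical helper)."""
    if isinstance(raw, bool):
        return raw
    if isinstance(raw, (int, float)):
        # Assumption: numeric backends report 1 for true and 0 for false.
        return raw != 0
    return str(raw).strip().lower() == "true"


if hwm_is_breached("true"):
    print("hwm_breached: alert operations, memory or disk resources are strained")
```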

`index-type.mount[ix].age`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_index-type.mount[ix].age

Description

Applies only to Enterprise Edition configured with index-type flash. Shows the percentage of the device lifetime (total usage) claimed by the OEM for the underlying device. The value is -1 unless the underlying device is NVMe, and it may exceed 100. ‘ix’ is the device index. For example, storage-engine.file[0]=/opt/aerospike/test0.dat and storage-engine.file[1]=/opt/aerospike/test2.dat for 2 files specified in the configuration.

Introduced

4.3.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns
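
Because the age value uses -1 as a sentinel and can exceed 100, a consumer has to treat those cases explicitly. The Python sketch below shows one possible interpretation; both function names are hypothetical, and treating 100 or more as "rated lifetime consumed" is an assumption about how an operator might read the value.

```python
from typing import Optional


def mount_age_pct(raw: int) -> Optional[int]:
    """Return the reported lifetime percentage, or None when the value is the -1 sentinel."""
    return None if raw == -1 else raw


def rated_lifetime_consumed(raw: int) -> bool:
    """True when the device reports 100% or more of its OEM-claimed lifetime as used."""
    age = mount_age_pct(raw)
    return age is not None and age >= 100


for sample in (-1, 42, 117):
    print(sample, mount_age_pct(sample), rated_lifetime_consumed(sample))
```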

`index_flash_alloc_bytes`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_index_flash_alloc_bytes

Description

Applies only to Enterprise Edition configured with index-type flash. Total bytes allocated on the mount(s) for the primary index used by this namespace on this node. This statistic represents entire 4KiB chunks which have at least one element in use. Also available in the log on the index-flash-usage ticker entry.

Introduced

5.6.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`index_flash_alloc_pct`

warn

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_index_flash_alloc_pct
Datadog: aerospike.server.namespace.index_flash_alloc_pct

Description

Applies only to Enterprise Edition configured with index-type flash. Percentage of the mount(s) allocated for the primary index used by this namespace on this node. Prior to Database 7.0.0, calculated as (index_flash_alloc_bytes / index-type.mounts-size-limit) * 100. In Database 7.0.0 and later, calculated as (index_flash_alloc_bytes / index-type.mounts-budget) * 100. This statistic represents entire 4KiB chunks which have at least one element in use. Also available in the log on the index-flash-usage ticker entry.

Introduced

5.6.0

Removed

Measurement type

gauge

Data type

integer

Monitoring

If index_flash_alloc_pct approaches or exceeds 100%, alert operations to review the sizing of the namespace.

Labels

cluster_name, job, service, instance, longitude, latitude, ns
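
The percentage calculation described for index_flash_alloc_pct can be sketched in Python as follows. This is an illustrative helper, not server code: the function name and the choice to truncate to an integer are assumptions. The caller passes index-type.mounts-size-limit (before Database 7.0.0) or index-type.mounts-budget (Database 7.0.0 and later) as the limit, in bytes.

```python
def index_flash_alloc_pct(alloc_bytes: int, limit_bytes: int) -> int:
    """Hypothetical helper mirroring (index_flash_alloc_bytes / limit) * 100.

    limit_bytes is index-type.mounts-size-limit before Database 7.0.0 and
    index-type.mounts-budget in Database 7.0.0 and later, expressed in bytes.
    """
    if limit_bytes <= 0:
        raise ValueError("limit_bytes must be positive")
    # Assumption: the integer result is truncated rather than rounded.
    return alloc_bytes * 100 // limit_bytes


GIB = 1024 ** 3
print(index_flash_alloc_pct(3 * GIB, 4 * GIB))  # prints 75
```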

`index_flash_used_bytes`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_index_flash_used_bytes

Description

Applies only to Enterprise Edition configured with index-type flash. Total bytes in-use on the mount(s) for the primary index used by this namespace on this node. This is the same value memory_used_index_bytes would have if the index were not persisted.

Introduced

4.3.0

Removed

7.0.0

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`index_flash_used_pct`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_index_flash_used_pct

Description

Applies only to Enterprise Edition configured with index-type flash. Percentage of the mount(s) in-use for the primary index used by this namespace on this node. Calculated as (index_flash_used_bytes / index-type.mounts-size-limit) * 100.

Introduced

4.3.0

Removed

7.0.0

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`index_mounts_used_pct`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_index_mounts_used_pct

Description

Applies only to Enterprise Edition configured with index-type pmem or flash. Percentage of the mount(s) in-use for the primary index used by this namespace on this node.

Introduced

7.0.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`index_pmem_used_bytes`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_index_pmem_used_bytes

Description

Applies only to Enterprise Edition configured with index-type pmem. Total bytes in-use on the mount(s) for the primary index used by this namespace on this node. This is the same value memory_used_index_bytes would have if the index were not persisted.

Introduced

4.5.0

Removed

7.0.0

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`index_pmem_used_pct`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_index_pmem_used_pct

Description

Applies only to Enterprise Edition configured with index-type pmem. Percentage of the mount(s) in-use for the primary index used by this namespace on this node. Calculated as (index_pmem_used_bytes / index-type.mounts-size-limit) * 100.

Introduced

4.5.0

Removed

7.0.0

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`index_used_bytes`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_index_used_bytes
Datadog: aerospike.server.namespace.index_used_bytes

Description

Amount of memory occupied by the primary index for this namespace. Applies to all types of index storage (index-type).

Introduced

7.0.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`indexes_memory_used_pct`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_indexes_memory_used_pct

Description

Combined size of the RAM indexes as a percentage of indexes-memory-budget, when indexes-memory-budget is configured to a nonzero value.

Introduced

7.1.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, ns

`master_tombstones`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_master_tombstones
Datadog: aerospike.server.namespace.master_tombstones

Description

Number of tombstones on this node which are active masters.

Introduced

3.10.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`max-evicted-ttl`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_max-evicted-ttl

Description

The highest record TTL that Aerospike has evicted from this namespace.

Introduced

Removed

Yes

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`max_void_time`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_max_void_time
Datadog: aerospike.server.namespace.max_void_time

Description

Maximum record TTL ever inserted into this namespace.

Introduced

3.9.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`memory_free_pct`

critical

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_memory_free_pct

Description

Percentage of memory capacity free for this namespace.

Introduced

3.9.0

Removed

7.0.0

Measurement type

gauge

Data type

integer

Monitoring

If memory_free_pct approaches the configured value for high-water-memory-pct or stop-writes-pct, alert operations to investigate the cause. This might indicate a need to reduce the object count or increase capacity. It may also require further investigation into memory_used_sindex_bytes if secondary indexes are in use, into memory_used_set_index_bytes if set indexes are used, or into heap_efficiency_pct if data is stored in memory.

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`memory_used_bytes`

warn

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_memory_used_bytes

Description

Total bytes of memory used by this namespace on this node. Used against the high-water-memory-pct and stop-writes-pct thresholds. It represents the sum of the following values:

memory_used_data_bytes
memory_used_index_bytes
memory_used_sindex_bytes
memory_used_set_index_bytes (Database 5.6.0 and later)

See heap_allocated_kbytes for the total amount of memory allocated on a node other than primary index shared memory in Enterprise Edition and, for Database 6.1.0 and later, secondary index shared memory in Enterprise Edition.

Introduced

3.9.0

Removed

7.0.0

Measurement type

gauge

Data type

integer

Monitoring

Trending memory_used_bytes gives operations insight into how memory usage changes over time for this namespace.

Labels

cluster_name, job, service, instance, longitude, latitude, ns
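
The description of memory_used_bytes as a sum of component statistics can be cross-checked with a short Python sketch. The function name and the sample values are made up for illustration; a missing component is counted as 0, which is an assumption that covers servers earlier than Database 5.6.0 where memory_used_set_index_bytes does not exist.

```python
from typing import Mapping

# Component statistics listed in the memory_used_bytes description.
MEMORY_COMPONENTS = (
    "memory_used_data_bytes",
    "memory_used_index_bytes",
    "memory_used_sindex_bytes",
    "memory_used_set_index_bytes",  # Database 5.6.0 and later
)


def sum_memory_components(stats: Mapping[str, int]) -> int:
    """Add up the component statistics; absent ones contribute 0 (assumption)."""
    return sum(int(stats.get(name, 0)) for name in MEMORY_COMPONENTS)


# Hypothetical sample values, in bytes.
sample = {
    "memory_used_data_bytes": 1_000_000,
    "memory_used_index_bytes": 640_000,
    "memory_used_sindex_bytes": 50_000,
    "memory_used_set_index_bytes": 10_000,
}
print(sum_memory_components(sample))  # prints 1700000
```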

`memory_used_data_bytes`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_memory_used_data_bytes

Description

Amount of memory occupied by data. See memory_used_bytes for the total memory accounted for the namespace.

Introduced

3.9.0

Removed

7.0.0

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`memory_used_index_bytes`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_memory_used_index_bytes

Description

Amount of memory occupied by the index for this namespace. Allocated in shared memory by default (index-type shmem) for the Enterprise Edition.
If your index is persisted, either in block storage (index-type flash) or in persistent memory (index-type pmem, Database 4.5.0 and later), refer instead to index_flash_used_bytes or index_pmem_used_bytes. For these persisted index configurations, the value of memory_used_index_bytes is 0.

See memory_used_bytes for the total memory accounted for the namespace.

Introduced

3.9.0

Removed

7.0.0

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`memory_used_set_index_bytes`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_memory_used_set_index_bytes

Description

Amount of memory occupied by set indexes for this namespace on this node. See memory_used_bytes for the total memory accounted for the namespace.

Introduced

5.6.0

Removed

7.0.0

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`memory_used_sindex_bytes`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_memory_used_sindex_bytes

Description

Amount of memory occupied by secondary indexes for this namespace on this node. See memory_used_bytes for the total memory accounted for the namespace.

Introduced

3.9.0

Removed

7.0.0

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`migrate_fresh_partitions`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_migrate_fresh_partitions
Datadog: aerospike.server.namespace.migrate_fresh_partitions

Description

Number of partitions that are created fresh or empty because a number of nodes, greater than the replication factor, have left the cluster. Applies to AP and SC namespaces.

Introduced

7.1.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_namejobserviceinstancens

`migrate_record_receives`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_migrate_record_receives
Datadog: aerospike.server.namespace.migrate_record_receives

Description

Number of record insert requests received by immigration.

Introduced

3.9.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_namejobserviceinstancelongitudelatitudens

`migrate_record_retransmits`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_migrate_record_retransmits
Datadog: aerospike.server.namespace.migrate_record_retransmits

Description

Number of times emigration has retransmitted records.

Introduced

3.9.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_namejobserviceinstancelongitudelatitudens

Detail

Retransmission statistics are collected in the retransmits ticker log line.

`migrate_records_skipped`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_migrate_records_skipped
Datadog: aerospike.server.namespace.migrate_records_skipped

Description

Number of times emigration did not ship a record because the remote node was already up-to-date.

Introduced

3.9.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_namejobserviceinstancelongitudelatitudens

`migrate_records_transmitted`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_migrate_records_transmitted
Datadog: aerospike.server.namespace.migrate_records_transmitted

Description

Number of records emigration has read and sent.

Introduced

3.9.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_namejobserviceinstancelongitudelatitudens

`migrate_records_unreadable`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_migrate_records_unreadable
Datadog: aerospike.server.namespace.migrate_records_unreadable

Description

Number of records skipped during migration because they were unreadable when migrate-skip-unreadable is enabled.

Introduced

7.0.0.18, 7.1.0.9, 7.2.0.3

Removed

Measurement type

counter

Data type

integer

Labels

cluster_namejobserviceinstancelongitudelatitudens

`migrate_rx_instance_count`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_migrate_rx_instance_count

Description

Number of instance objects managing immigrations.

Introduced

3.9.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_namejobserviceinstancelongitudelatitudens

`migrate_rx_partitions_active`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_migrate_rx_partitions_active
Datadog: aerospike.server.namespace.migrate_rx_partitions_active

Description

Number of partitions currently immigrating to this node. If migrate_rx_partitions_active is greater than 0 and the cluster is not in maintenance, operations should identify why migrations are running.

Introduced

3.9.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_namejobserviceinstancelongitudelatitudens

`migrate_rx_partitions_initial`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_migrate_rx_partitions_initial
Datadog: aerospike.server.namespace.migrate_rx_partitions_initial

Description

Total number of migrations this node will receive during the current migration cycle for this namespace.

Introduced

3.9.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_namejobserviceinstancelongitudelatitudens

`migrate_rx_partitions_remaining`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_migrate_rx_partitions_remaining
Datadog: aerospike.server.namespace.migrate_rx_partitions_remaining

Description

Number of migrations this node has not yet received during the current migration cycle for this namespace.

Introduced

3.9.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_namejobserviceinstancelongitudelatitudens
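
As a rough illustration, migrate_rx_partitions_initial and migrate_rx_partitions_remaining can be combined into a progress figure for the current migration cycle. The sketch below is not part of Aerospike: the function name is hypothetical, and it assumes both values were sampled from the same node and namespace at the same time.

```python
def migration_progress_pct(initial: int, remaining: int) -> float:
    """Percent of this cycle's scheduled migrations that are no longer pending.

    initial   -> migrate_rx_partitions_initial (or the tx equivalent)
    remaining -> migrate_rx_partitions_remaining (or the tx equivalent)
    """
    if initial <= 0:
        # Assumption: nothing was scheduled, so treat the cycle as complete.
        return 100.0
    done = initial - remaining
    # Clamp to 0..100 in case the two gauges were read at slightly different moments.
    return max(0.0, min(100.0, done * 100 / initial))
```

For example, with 4096 partitions scheduled and 1024 remaining, the function reports 75.0.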

`migrate_signals_active`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_migrate_signals_active
Datadog: aerospike.server.namespace.migrate_signals_active

Description

For finished partition migrations on this node, number of outstanding clean-up signals, sent to participating member nodes, waiting for clean-up acknowledgment. Signals are messages sent from a partition’s master node to all other nodes that currently have data for the partition. They notify all nodes that migrations have completed for this partition, so nodes that aren’t a replica can now drop the partition.

Introduced

3.13.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_namejobserviceinstancelongitudelatitudens

`migrate_signals_remaining`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_migrate_signals_remaining
Datadog: aerospike.server.namespace.migrate_signals_remaining

Description

For unfinished partition migrations on this node, number of clean-up signals to send to participating member nodes as migration completes. Signals are messages sent from a partition’s master node to all other nodes that currently have data for the partition. They notify all nodes that migrations have completed for this partition, so nodes that aren’t a replica can now drop the partition.

Introduced

3.13.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_namejobserviceinstancelongitudelatitudens

`migrate_tx_instance_count`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_migrate_tx_instance_count
Datadog: aerospike.server.namespace.migrate_tx_instance_count

Description

Number of instance objects managing emigrations.

Introduced

3.9.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_namejobserviceinstancelongitudelatitudens

`migrate_tx_partitions_active`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_migrate_tx_partitions_active
Datadog: aerospike.server.namespace.migrate_tx_partitions_active

Description

Number of partitions currently emigrating from this node. If migrate_tx_partitions_active is greater than 0 and the cluster is not in maintenance, operations should identify why migrations are running.

Introduced

3.9.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_namejobserviceinstancelongitudelatitudens
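
The check described for migrate_rx_partitions_active and migrate_tx_partitions_active can be expressed as a small predicate. This is a minimal sketch with hypothetical names; how a deployment tracks its maintenance state is an assumption and is passed in as a plain boolean.

```python
def migrations_need_investigation(rx_active: int, tx_active: int,
                                  in_maintenance: bool) -> bool:
    """True when partitions are migrating outside a maintenance window.

    rx_active      -> migrate_rx_partitions_active
    tx_active      -> migrate_tx_partitions_active
    in_maintenance -> hypothetical flag supplied by the operator's own tooling
    """
    migrating = rx_active > 0 or tx_active > 0
    return migrating and not in_maintenance
```

A monitoring job could evaluate this per node and namespace and notify operations when it returns True.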

`migrate_tx_partitions_imbalance`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_migrate_tx_partitions_imbalance
Datadog: aerospike.server.namespace.migrate_tx_partitions_imbalance

Description

Number of partition migration failures that could lead to partitions being imbalanced. A warning is also logged for each increment.

Introduced

3.9.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_namejobserviceinstancelongitudelatitudens

`migrate_tx_partitions_initial`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_migrate_tx_partitions_initial
Datadog: aerospike.server.namespace.migrate_tx_partitions_initial

Description

Total number of migrations this node will send during the current migration cycle for this namespace.

Introduced

3.9.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_namejobserviceinstancelongitudelatitudens

`migrate_tx_partitions_lead_remaining`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_migrate_tx_partitions_lead_remaining
Datadog: aerospike.server.namespace.migrate_tx_partitions_lead_remaining

Description

Number of initially scheduled emigrations which are not delayed by the migrate-fill-delay configuration. Lead migrations are typically delta-migrations addressing non-empty partition replica nodes. Delta-migrations generally consume far less storage IO.

Introduced

4.3.1

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_namejobserviceinstancelongitudelatitudens

`migrate_tx_partitions_remaining`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_migrate_tx_partitions_remaining
Datadog: aerospike.server.namespace.migrate_tx_partitions_remaining

Description

Number of migrations this node has not yet sent during the current migration cycle for this namespace.

Introduced

3.9.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_namejobserviceinstancelongitudelatitudens

`mrt_monitor_roll_back_error`

enterprise watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_mrt_monitor_roll_back_error
Datadog: aerospike.server.namespace.mrt_monitor_roll_back_error

Description

Subset of mrt_roll_back_error where the monitor did the roll back.

Introduced

8.0.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_namejobserviceinstancens

`mrt_monitor_roll_back_success`

enterprise watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_mrt_monitor_roll_back_success
Datadog: aerospike.server.namespace.mrt_monitor_roll_back_success

Description

Subset of mrt_roll_back_success where the monitor did the roll back.

Introduced

8.0.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_namejobserviceinstancens

`mrt_monitor_roll_back_timeout`

enterprise watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_mrt_monitor_roll_back_timeout
Datadog: aerospike.server.namespace.mrt_monitor_roll_back_timeout

Description

Subset of mrt_roll_back_timeout where the monitor did the roll back.

Introduced

8.0.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_namejobserviceinstancens

`mrt_monitor_roll_forward_error`

enterprise watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_mrt_monitor_roll_forward_error
Datadog: aerospike.server.namespace.mrt_monitor_roll_forward_error

Description

Subset of mrt_roll_forward_error where the monitor did the roll forward.

Introduced

8.0.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_namejobserviceinstancens

`mrt_monitor_roll_forward_success`

enterprise watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_mrt_monitor_roll_forward_success
Datadog: aerospike.server.namespace.mrt_monitor_roll_forward_success

Description

Subset of mrt_roll_forward_success where the monitor did the roll forward.

Introduced

8.0.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_namejobserviceinstancens

`mrt_monitor_roll_forward_timeout`

enterprise watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_mrt_monitor_roll_forward_timeout
Datadog: aerospike.server.namespace.mrt_monitor_roll_forward_timeout

Description

Subset of mrt_roll_forward_timeout where the monitor did the roll forward.

Introduced

8.0.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_namejobserviceinstancens

`mrt_monitor_roll_tombstone_creates`

enterprise optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_mrt_monitor_roll_tombstone_creates
Datadog: aerospike.server.namespace.mrt_monitor_roll_tombstone_creates

Description

Number of times monitor transaction rolls (forward or back) generated tombstones from nothing. This is rare but normal.

Introduced

8.0.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_namejobserviceinstancens

`mrt_monitors`

enterprise watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_mrt_monitors

Description

The number of mrt_monitors records in a namespace.

Introduced

8.0.0

Removed

Measurement type

gauge

Data type

integer

`mrt_monitors_active`

enterprise watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_mrt_monitors_active
Datadog: aerospike.server.namespace.mrt_monitors_active

Description

Number of monitors currently driving roll forwards or roll backs after a transaction timeout.

Introduced

8.0.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_namejobserviceinstancens

`mrt_provisionals`

enterprise watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_mrt_provisionals
Datadog: aerospike.server.namespace.mrt_provisionals

Description

Number of provisional records in a transaction.

Introduced

8.0.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_namejobserviceinstancens

`mrt_roll_back_error`

enterprise watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_mrt_roll_back_error
Datadog: aerospike.server.namespace.mrt_roll_back_error

Description

Number of roll back transactions that failed.

Introduced

8.0.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_namejobserviceinstancens

`mrt_roll_back_success`

enterprise watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_mrt_roll_back_success
Datadog: aerospike.server.namespace.mrt_roll_back_success

Description

Number of roll back transactions that succeeded.

Introduced

8.0.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_namejobserviceinstancens

`mrt_roll_back_timeout`

enterprise watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_mrt_roll_back_timeout
Datadog: aerospike.server.namespace.mrt_roll_back_timeout

Description

Number of roll back transactions that timed out.

Introduced

8.0.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_namejobserviceinstancens

`mrt_roll_forward_error`

enterprise watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_mrt_roll_forward_error
Datadog: aerospike.server.namespace.mrt_roll_forward_error

Description

Number of roll forward transactions that failed.

Introduced

8.0.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_namejobserviceinstancens

`mrt_roll_forward_success`

enterprise watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_mrt_roll_forward_success
Datadog: aerospike.server.namespace.mrt_roll_forward_success

Description

Number of roll forward transactions that succeeded.

Introduced

8.0.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_namejobserviceinstancens

`mrt_roll_forward_timeout`

enterprise watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_mrt_roll_forward_timeout
Datadog: aerospike.server.namespace.mrt_roll_forward_timeout

Description

Number of roll forward transactions that timed out.

Introduced

8.0.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_namejobserviceinstancens

`mrt_verify_read_error`

enterprise watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_mrt_verify_read_error
Datadog: aerospike.server.namespace.mrt_verify_read_error

Description

Number of verify read commands that failed.

Introduced

8.0.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_namejobserviceinstancens

`mrt_verify_read_success`

enterprise watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_mrt_verify_read_success
Datadog: aerospike.server.namespace.mrt_verify_read_success

Description

Number of verify read commands that succeeded.

Introduced

8.0.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_namejobserviceinstancens

`mrt_verify_read_timeout`

enterprise watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_mrt_verify_read_timeout
Datadog: aerospike.server.namespace.mrt_verify_read_timeout

Description

Number of verify read commands that timed out.

Introduced

8.0.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_namejobserviceinstancens

`nodes_quiesced`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_nodes_quiesced
Datadog: aerospike.server.namespace.nodes_quiesced

Description

The number of nodes observed to be quiesced as of the most recent reclustering event. If a single node received the quiesce command, all nodes return 1 for this metric after the subsequent reclustering event. When the quiesced node is shut down, triggering a new reclustering event, this metric returns to 0.

Introduced

4.4.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_namejobserviceinstancelongitudelatitudens

`non_expirable_objects`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_non_expirable_objects
Datadog: aerospike.server.namespace.non_expirable_objects

Description

Number of records in this namespace with non-expirable TTLs (TTLs of value 0).

Introduced

3.9.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_namejobserviceinstancelongitudelatitudens

`non_replica_objects`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_non_replica_objects
Datadog: aerospike.server.namespace.non_replica_objects

Description

Number of records on this node which are neither master nor replicas. This number is non-zero during migration, representing additional versions or copies of records. These records lie beyond the replication factor and may be used for duplicate resolution during migrations. This is not true for quiesced nodes, which retain their partitions after migrations have completed. This metric immediately reflects a dropped partition, but the total number of objects (the objects stat) is updated only when the partition is purged.

Introduced

3.13.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_namejobserviceinstancelongitudelatitudens

`non_replica_tombstones`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_non_replica_tombstones
Datadog: aerospike.server.namespace.non_replica_tombstones

Description

Number of tombstones on this node which are neither master nor replicas. This number is non-zero only during migration. This is not true for quiesced nodes, which retain their partitions after migrations have completed.

Introduced

3.13.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_namejobserviceinstancelongitudelatitudens

`nsup_cycle_deleted_pct`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_nsup_cycle_deleted_pct
Datadog: aerospike.server.namespace.nsup_cycle_deleted_pct

Description

Percent of records removed by NSUP in its last cycle.

Introduced

6.3.0

Removed

Measurement type

gauge

Data type

float

Labels

cluster_namejobserviceinstancelongitudelatitudens

Detail

nsup_cycle_deleted_pct is calculated when the NSUP (Namespace SUPervisor) cycle finishes (nsup-done is logged). It is based on the total number of objects present at the beginning of the NSUP cycle and the number of objects deleted in that cycle: nsup_cycle_deleted_pct = (objects removed by NSUP in its last cycle * 100) / (total number of objects, expirable and non-expirable, when the NSUP cycle started).

Note

Some use cases for this metric:

If a namespace is expected to not have more than 30% of objects deleted by NSUP, alert if this metric is greater than that number (for example, due to an application bug suddenly setting the wrong TTL on some of the records).
If NSUP takes much longer than expected (for example, 1 hour instead of 10 minutes), the number of deleted objects in the next cycle may rise sharply (for example, more than 60% eligible compared with a usual value of 15%), which would indicate a lag in the NSUP cycle. The increase could also be genuine if that many objects became eligible for deletion in that cycle.
You can use the nsup_cycle_duration metric to alert on unexpected NSUP cycle times. Although this is also after the fact, it at least alerts that the last cycle took longer than expected.
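
The calculation described under Detail, together with the first use case, can be sketched as follows. This is an illustration only, not server code: the function names are hypothetical, the 30% default is taken from the example in the note, and returning 0.0 for an empty namespace is an assumption.

```python
def nsup_cycle_deleted_pct(objects_removed: int, objects_at_cycle_start: int) -> float:
    """Percent of objects removed by NSUP in its last cycle.

    objects_at_cycle_start counts expirable and non-expirable objects.
    """
    if objects_at_cycle_start <= 0:
        # Assumption: report 0.0 rather than dividing by zero.
        return 0.0
    return objects_removed * 100 / objects_at_cycle_start


def deleted_more_than_expected(pct: float, expected_max_pct: float = 30.0) -> bool:
    """True when the last cycle deleted a larger share than the namespace expects."""
    return pct > expected_max_pct
```

For example, a cycle that started with 1000 objects and removed 150 yields 15.0, which is under the 30% threshold.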

`nsup_cycle_duration`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_nsup_cycle_duration
Datadog: aerospike.server.namespace.nsup_cycle_duration

Description

Length of the last NSUP cycle in seconds.

Introduced

3.9.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_namejobserviceinstancelongitudelatitudens

`nsup_xdr_key_busy`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_nsup_xdr_key_busy
Datadog: aerospike.server.namespace.nsup_xdr_key_busy

Description

Number of NSUP deletes (expirations and evictions) that had to wait for a previous version to ship. This error is raised if either of the following occurs:

ship-versions-policy is all and the most recent update to the record has not yet successfully shipped to the destination.
ship-versions-policy is interval and XDR hasn’t successfully shipped at least one version of the record within the most recent ship-versions-interval (in seconds).

Introduced

7.2.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_namejobserviceinstancelongitudelatitudens
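
The two conditions listed in the nsup_xdr_key_busy description can be modeled as a simple predicate. This is a toy model for reasoning about the metric, not the server's implementation: the function and parameter names are hypothetical, and because the description names no wait condition for other policy values, the sketch returns False for them.

```python
def nsup_delete_must_wait(policy: str,
                          latest_update_shipped: bool,
                          shipped_in_last_interval: bool) -> bool:
    """Model of when an NSUP delete would count toward nsup_xdr_key_busy.

    policy                   -> value of ship-versions-policy
    latest_update_shipped    -> most recent update reached the destination
    shipped_in_last_interval -> at least one version shipped within the most
                                recent ship-versions-interval
    """
    if policy == "all":
        return not latest_update_shipped
    if policy == "interval":
        return not shipped_in_last_interval
    # Assumption: no wait condition is described for other policy values.
    return False
```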

`objects`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_objects
Datadog: aerospike.server.sets.objects

Description

Number of records in this namespace for this node. Includes non-replica. Does not include tombstones.

Introduced

Deprecated

8.1.0

Removed

Measurement type

gauge

Data type

integer

Monitoring

Trending objects provides operations insight into this namespace’s record fluctuations over time.

Labels

cluster_namejobserviceinstancelongitudelatitudens

`ops_sub_tsvc_error`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_ops_sub_tsvc_error
Datadog: aerospike.server.namespace.ops_sub_tsvc_error

Description

Number of times a background query operate command failed to access a record, for example due to protocol or permission errors. Does not include timeouts. In strong-consistency enabled namespaces, this includes attempts to access records in unavailable_partitions and dead_partitions.

The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc errors happen before records are accessed for reads or writes. They’re counted separately from tsvc timeouts.

Introduced

4.7.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_namejobserviceinstancelongitudelatitudens

`ops_sub_tsvc_timeout`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_ops_sub_tsvc_timeout
Datadog: aerospike.server.namespace.ops_sub_tsvc_timeout

Description

Number of records accessed by a background query operate command that timed out in the transaction service.

The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc errors happen before records are accessed for reads or writes. They’re counted separately from tsvc timeouts.

Introduced

4.7.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_namejobserviceinstancelongitudelatitudens

`ops_sub_write_error`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_ops_sub_write_error
Datadog: aerospike.server.namespace.ops_sub_write_error

Description

Number of records accessed by a background query operate command whose write subtransaction failed with an error. Does not include timeouts.

Introduced

4.7.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_namejobserviceinstancelongitudelatitudens

`ops_sub_write_filtered_out`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_ops_sub_write_filtered_out
Datadog: aerospike.server.namespace.ops_sub_write_filtered_out

Description

Number of records accessed by a background query operate command for which the write subtransaction did not write because the record was filtered out with Filter Expressions.

Introduced

4.7.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_namejobserviceinstancelongitudelatitudens

`ops_sub_write_success`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_ops_sub_write_success
Datadog: aerospike.server.namespace.ops_sub_write_success

Description

Number of records accessed by a background query operate command whose write subtransaction succeeded.

Introduced

4.7.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_namejobserviceinstancelongitudelatitudens

`ops_sub_write_timeout`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_ops_sub_write_timeout
Datadog: aerospike.server.namespace.ops_sub_write_timeout

Description

Number of records accessed by a background query operate command whose write subtransaction timed out.

Introduced

4.7.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_namejobserviceinstancelongitudelatitudens

`pending_quiesce`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_pending_quiesce
Datadog: aerospike.server.namespace.pending_quiesce

Description

Reports ‘true’ when the quiesce info command has been received by a node, or if stay-quiesced is true for the node. When true, the next clustering event will cause this node to quiesce. To trigger a clustering event, issue the recluster info command. To disable, issue the quiesce-undo info command.

Introduced

4.3.1

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_namejobserviceinstancelongitudelatitudens

`pi_query_aggr_abort`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_pi_query_aggr_abort
Datadog: aerospike.server.namespace.pi_query_aggr_abort

Description

Number of primary index query aggregations that were aborted.

Introduced

6.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_namejobserviceinstancelongitudelatitudens

`pi_query_aggr_complete`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_pi_query_aggr_complete
Datadog: aerospike.server.namespace.pi_query_aggr_complete

Description

Number of primary index query aggregations that completed.

Introduced

6.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_namejobserviceinstancelongitudelatitudens

`pi_query_aggr_error`

warn

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_pi_query_aggr_error
Datadog: aerospike.server.namespace.pi_query_aggr_error

Description

Number of primary index query aggregations that failed.

Introduced

6.0.0

Removed

Measurement type

counter

Data type

integer

Monitoring

Compare pi_query_aggr_error to pi_query_aggr_complete.

If the ratio is higher than acceptable, alert operations to investigate.

Labels

cluster_namejobserviceinstancelongitudelatitudens
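
The comparison of pi_query_aggr_error to pi_query_aggr_complete can be sketched as below. This is a minimal illustration, not an Aerospike API: the function names and the 1% default threshold are hypothetical, and the inputs are assumed to be the increases of the two counters over the same sampling window, since both are cumulative.

```python
def error_ratio(errors: int, completes: int) -> float:
    """Ratio of failed to completed operations over one sampling window."""
    if completes == 0:
        # Assumption: errors with no completions are treated as an unbounded ratio.
        return float("inf") if errors > 0 else 0.0
    return errors / completes


def should_alert(errors: int, completes: int, max_ratio: float = 0.01) -> bool:
    """True when the ratio is higher than the acceptable threshold."""
    return error_ratio(errors, completes) > max_ratio
```

The acceptable ratio depends on the workload, so max_ratio should be set per deployment rather than left at the placeholder default.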

`pi_query_long_basic_abort`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_pi_query_long_basic_abort
Datadog: aerospike.server.namespace.pi_query_long_basic_abort

Description

Number of basic long primary index queries that were aborted.

Introduced

6.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_namejobserviceinstancelongitudelatitudens

`pi_query_long_basic_complete`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_pi_query_long_basic_complete
Datadog: aerospike.server.namespace.pi_query_long_basic_complete

Description

Number of basic long primary index queries that completed.

Introduced

6.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_namejobserviceinstancelongitudelatitudens

`pi_query_long_basic_error`

warn

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_pi_query_long_basic_error
Datadog: aerospike.server.namespace.pi_query_long_basic_error

Description

Number of basic long primary index queries that failed.

Introduced

6.0.0

Removed

Measurement type

counter

Data type

integer

Monitoring

Compare pi_query_long_basic_error to pi_query_long_basic_complete.

If the ratio is higher than acceptable, alert operations to investigate.

Labels

cluster_namejobserviceinstancelongitudelatitudens

`pi_query_ops_bg_abort`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_pi_query_ops_bg_abort
Datadog: aerospike.server.namespace.pi_query_ops_bg_abort

Description

Number of ops background primary index queries that were aborted.

Introduced

6.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_namejobserviceinstancelongitudelatitudens

`pi_query_ops_bg_complete`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_pi_query_ops_bg_complete
Datadog: aerospike.server.namespace.pi_query_ops_bg_complete

Description

Number of ops background primary index queries that completed.

Introduced

6.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_namejobserviceinstancelongitudelatitudens

`pi_query_ops_bg_error`

warn

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_pi_query_ops_bg_error
Datadog: aerospike.server.namespace.pi_query_ops_bg_error

Description

Number of ops background primary index queries that failed.

Introduced

6.0.0

Removed

Measurement type

counter

Data type

integer

Monitoring

Compare pi_query_ops_bg_error to pi_query_ops_bg_complete.

If the ratio is higher than acceptable, alert operations to investigate.

Labels

cluster_namejobserviceinstancelongitudelatitudens

`pi_query_short_basic_complete`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_pi_query_short_basic_complete
Datadog: aerospike.server.namespace.pi_query_short_basic_complete

Description

Number of basic short primary index queries that completed.

Introduced

6.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_namejobserviceinstancelongitudelatitudens

`pi_query_short_basic_error`

warn

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_pi_query_short_basic_error
Datadog: aerospike.server.namespace.pi_query_short_basic_error

Description

Number of basic short primary index queries that failed.

Introduced

6.0.0

Removed

Measurement type

counter

Data type

integer

Monitoring

Compare pi_query_short_basic_error to pi_query_short_basic_complete.

If the ratio is higher than acceptable, alert operations to investigate.

Labels

cluster_namejobserviceinstancelongitudelatitudens

`pi_query_short_basic_timeout`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_pi_query_short_basic_timeout
Datadog: aerospike.server.namespace.pi_query_short_basic_timeout

Description

Short primary index queries are not monitored, so they cannot be aborted. They might time out, which is reflected in this statistic.

Introduced

6.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_namejobserviceinstancelongitudelatitudens

`pi_query_udf_bg_abort`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_pi_query_udf_bg_abort
Datadog: aerospike.server.namespace.pi_query_udf_bg_abort

Description

Number of UDF background primary index queries that were aborted.

Introduced

6.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`pi_query_udf_bg_complete`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_pi_query_udf_bg_complete
Datadog: aerospike.server.namespace.pi_query_udf_bg_complete

Description

Number of UDF background primary index queries that completed.

Introduced

6.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`pi_query_udf_bg_error`

warn

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_pi_query_udf_bg_error
Datadog: aerospike.server.namespace.pi_query_udf_bg_error

Description

Number of UDF background primary index queries that failed.

Introduced

6.0.0

Removed

Measurement type

counter

Data type

integer

Monitoring

Compare pi_query_udf_bg_error to pi_query_udf_bg_complete.

If ratio is higher than acceptable, alert operations to investigate.

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`pmem_available_pct`

critical

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_pmem_available_pct

Description

Measures the minimum contiguous free PMem storage file space, as a percentage, across all such files in a namespace. The namespace becomes read-only (stop writes) if this value falls below min-avail-pct. All PMem storage files configured for a namespace should be the same size; otherwise, pmem_available_pct can be low even when a lot of space is available in the other files.

Introduced

4.8.0

Removed

7.0.0

Measurement type

gauge

Data type

integer

Monitoring

If pmem_available_pct drops below 20%, warn your operations group.

This condition might indicate that defrag is unable to keep up with the current load.

If pmem_available_pct drops below 15%, raise a critical alert.

If pmem_available_pct drops below 5%, usable PMem resources are critically low. This condition might result in stop_writes.
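The thresholds above can be expressed as a small classifier. This is a sketch only: the function name and the returned level strings are hypothetical, and the boundaries treat "drops below" as a strict less-than comparison.

```python
def classify_pmem_available(pct: int) -> str:
    """Map a pmem_available_pct reading to an alert level (hypothetical names)."""
    if pct < 5:
        return "critically_low"  # stop_writes might follow
    if pct < 15:
        return "critical"
    if pct < 20:
        return "warn"  # defrag might not be keeping up with the load
    return "ok"
```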

Labels

cluster_name, job, service, instance, longitude, latitude, ns

Detail

Not to be confused with pmem_free_pct, which represents the amount of free space across all PMem storage files in a namespace and does not take fragmentation into account.
Here is an example that illustrates the difference between pmem_free_pct and pmem_available_pct. Assume 5 files of 96MiB each for a given namespace, where each file has 24MiB of data spread across 6 write-blocks (with an 8MiB write-block-size):
- The pmem_free_pct would be 75%.
- The pmem_available_pct would be 50%.
- If the distribution is not uniform (it usually is not perfectly uniform), the pmem_available_pct would represent the file that has the fewest free blocks.
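The arithmetic in this example can be reproduced with a short sketch. It uses a simplified model in which a write-block is either entirely free or in use, and the function names are hypothetical; the server's internal accounting may differ.

```python
MIB = 1024 * 1024


def free_pct(file_size: int, used_bytes_per_file: list) -> int:
    """Free space across all files, ignoring fragmentation."""
    total = file_size * len(used_bytes_per_file)
    used = sum(used_bytes_per_file)
    return 100 * (total - used) // total


def available_pct(file_size: int, write_block_size: int,
                  used_blocks_per_file: list) -> int:
    """Free write-blocks in the file that has the fewest of them."""
    blocks_per_file = file_size // write_block_size
    return min(
        100 * (blocks_per_file - used) // blocks_per_file
        for used in used_blocks_per_file
    )


# 5 files of 96MiB, each holding 24MiB of data spread over 6 blocks of 8MiB.
uniform_free = free_pct(96 * MIB, [24 * MIB] * 5)
uniform_available = available_pct(96 * MIB, 8 * MIB, [6] * 5)

# Non-uniform case: one file has 9 blocks in use, so it sets the minimum.
skewed_available = available_pct(96 * MIB, 8 * MIB, [6, 6, 6, 6, 9])
```

With the uniform layout the two functions give 75 and 50, matching the figures above. In the skewed layout the fullest file alone determines the available percentage.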

`pmem_compression_ratio`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_pmem_compression_ratio

Description

Measures the average compressed size to uncompressed size ratio for PMem storage. 1.000 indicates no compression and 0.100 indicates a 1:10 compression ratio (90% reduction in size). pmem_compression_ratio is not included if the compression configuration parameter is set to none.

Introduced

4.8.0

Removed

7.0.0

Measurement type

moving average

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

Detail

The compression ratio is a moving average, calculated based on the most recently written records. Read records do not factor into the ratio. If the written data changes over time then the compression ratio will change with it. In case of a sudden change in data, the indicated compression ratio may lag behind a bit. As a rule of thumb, assume that the compression ratio covers the most recently written 100,000 to 1,000,000 records.

`pmem_free_pct`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_pmem_free_pct

Description

Percentage of pmem storage capacity free for this namespace. This is the amount of free storage across all pmem storage files in the namespace. Evictions will be triggered when the used percentage across all storage files (which is represented by 100 - pmem_free_pct) crosses the configured high-water-disk-pct.
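As a rough illustration of the eviction condition, the used percentage is derived from pmem_free_pct and compared with high-water-disk-pct. The function names are hypothetical, and treating "crosses" as strictly greater than is an assumption.

```python
def used_pct(pmem_free_pct: int) -> int:
    """Used percentage across all storage files, as described above."""
    return 100 - pmem_free_pct


def evictions_expected(pmem_free_pct: int, high_water_disk_pct: int) -> bool:
    """True when usage exceeds the configured high-water mark (assumed strict)."""
    return used_pct(pmem_free_pct) > high_water_disk_pct
```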

Introduced

4.8.0

Removed

7.0.0

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

Detail

Not to be confused with pmem_available_pct, which represents the amount of free contiguous space on the PMem storage file that has the least contiguous free space across the namespace.
Here is an example that illustrates the difference between pmem_free_pct and pmem_available_pct. Assume 5 files of 96MiB each for a given namespace, where each file has 24MiB of data spread across 6 write-blocks (with an 8MiB write-block-size):
- The pmem_free_pct would be 75%.
- The pmem_available_pct would be 50%.
- If the distribution is not uniform (it usually is not perfectly uniform), the pmem_available_pct would represent the file that has the fewest free blocks.

`pmem_total_bytes`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_pmem_total_bytes

Description

Total bytes of pmem storage file space allocated to this namespace on this node.

Introduced

4.8.0

Removed

7.0.0

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

Detail

`pmem_used_bytes`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_pmem_used_bytes

Description

Total bytes of pmem storage file space used by this namespace on this node.

Introduced

4.8.0

Removed

7.0.0

Measurement type

gauge

Monitoring

Trending pmem_used_bytes provides operations insight into how pmem storage usage changes over time for this namespace.

Labels

cluster_name, job, service, instance, longitude, latitude, ns

Detail

`prole_objects`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_prole_objects
Datadog: aerospike.server.namespace.prole_objects

Description

Number of records on this node which are proles (replicas). Does not include tombstones.

Introduced

3.9.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`prole_tombstones`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_prole_tombstones
Datadog: aerospike.server.namespace.prole_tombstones

Description

Number of tombstones on this node which are proles (replicas).

Introduced

3.10.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`query_agg`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_query_agg

Description

Number of query aggregations attempted. Removed in Database 5.7.0. Use query_aggr_complete + query_aggr_error + query_aggr_abort instead.
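A minimal sketch of the suggested replacement, assuming the namespace statistics are already available as a dict of metric name to integer value (the dict and the function name are hypothetical):

```python
def query_aggregations_attempted(stats: dict) -> int:
    """Sum the three counters that replace query_agg from Database 5.7.0."""
    names = ("query_aggr_complete", "query_aggr_error", "query_aggr_abort")
    # A missing counter is treated as zero in this sketch.
    return sum(stats.get(name, 0) for name in names)
```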

Introduced

3.9.0

Removed

5.7.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`query_agg_abort`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_query_agg_abort

Description

Number of user-aborted query aggregations seen by this node. Renamed to query_aggr_abort in Database 5.7.0.

Introduced

3.9.0

Removed

5.7.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`query_agg_avg_rec_count`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_query_agg_avg_rec_count

Description

Average number of records returned by the aggregation's underlying query. Renamed to query_aggr_avg_rec_count in Database 5.7.0.

Introduced

3.9.0

Removed

5.7.0

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`query_agg_error`

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_query_agg_error

Description

Number of query aggregation errors due to an internal error. Renamed to query_aggr_error in Database 5.7.0.

Introduced

3.9.0

Removed

5.7.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`query_agg_success`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_query_agg_success

Description

Number of query aggregations completed. Renamed to query_aggr_complete in Database 5.7.0.

Introduced

3.9.0

Removed

5.7.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`query_aggr_abort`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_query_aggr_abort

Description

Number of user-aborted query aggregations seen by this node. Removed in Database 6.0.0; use si_query_aggr_abort instead.

Introduced

5.7.0

Removed

6.0.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`query_aggr_avg_rec_count`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_query_aggr_avg_rec_count

Description

Average number of records returned by the aggregation's underlying query.

Introduced

5.7.0

Removed

6.0.0

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`query_aggr_complete`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_query_aggr_complete

Description

Number of query aggregations completed. Removed in Database 6.0.0; use si_query_aggr_complete instead.

Introduced

5.7.0

Removed

6.0.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`query_aggr_error`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_query_aggr_error

Description

Number of query aggregation errors due to an internal error. Removed in Database 6.0.0; use si_query_aggr_error instead.

Introduced

5.7.0

Removed

6.0.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`query_basic_abort`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_query_basic_abort

Description

Number of secondary index basic queries that were aborted by a user. Removed in Database 6.0.0; use si_query_long_basic_abort instead.

Introduced

5.7.0

Removed

6.0.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`query_basic_avg_rec_count`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_query_basic_avg_rec_count

Description

Average number of records returned by all secondary index basic queries.

Introduced

5.7.0

Removed

6.0.0

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`query_basic_complete`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_query_basic_complete

Description

Number of secondary index basic queries which completed successfully.

Introduced

5.7.0

Removed

6.0.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`query_basic_error`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_query_basic_error

Description

Number of secondary index basic queries that returned an error. Removed in Database 6.0.0; use si_query_long_basic_error instead.

Introduced

5.7.0

Removed

6.0.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`query_fail`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_query_fail

Description

Number of queries that failed due to an internal error. These are failures that are not part of a query lookup (see query_lookup_error), a query aggregation (see query_agg_error), or a query background UDF (see query_udf_bg_failure).

Introduced

3.9.0

Removed

6.0.0

Measurement type

counter

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`query_false_positives`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_query_false_positives

Description

Number of entries that were shortlisted from the secondary index but whose bin values do not match the query clause. This might happen when the bin value changes during query execution.

Introduced

5.7.0

Removed

6.0.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`query_long_queue_full`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_query_long_queue_full

Description

Number of queue-full errors for long-running queries.

Introduced

3.9.0

Removed

6.0.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`query_long_reqs`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_query_long_reqs

Description

Number of long-running queries ever attempted in the system (queries that selected more records than query_threshold).

Introduced

3.9.0

Removed

6.0.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`query_lookup_abort`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_query_lookup_abort

Description

Number of user-aborted secondary index queries. Renamed to query_basic_abort in Database 5.7.0.

Introduced

3.9.0

Removed

5.7.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`query_lookup_avg_rec_count`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_query_lookup_avg_rec_count

Description

Average number of records returned by all secondary index query look-ups. Renamed to query_basic_avg_rec_count in Database 5.7.0.

Introduced

3.9.0

Removed

5.7.0

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`query_lookup_error`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_query_lookup_error

Description

Number of secondary index query look-up errors. Renamed to query_basic_error in Database 5.7.0.

Introduced

3.9.0

Removed

5.7.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`query_lookup_success`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_query_lookup_success

Description

Number of secondary index look-ups which succeeded. Renamed to query_basic_complete in Database 5.7.0.

Introduced

3.9.0

Removed

5.7.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`query_lookups`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_query_lookups

Description

Number of secondary index look-ups attempted. Removed in Database 5.7.0. Use query_basic_complete + query_basic_error + query_basic_abort instead.

Introduced

3.9.0

Removed

5.7.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`query_ops_bg_abort`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_query_ops_bg_abort

Description

Number of ops background queries that were aborted. Removed in Database 6.0.0; use si_query_ops_bg_abort instead.

Introduced

5.7.0

Removed

6.0.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`query_ops_bg_complete`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_query_ops_bg_complete

Description

Number of ops background queries that completed. Removed in Database 6.0.0; use si_query_ops_bg_complete instead.

Introduced

5.7.0

Removed

6.0.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`query_ops_bg_error`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_query_ops_bg_error

Description

Number of ops background queries that returned an error. Removed in Database 6.0.0; use si_query_ops_bg_error instead.

Introduced

5.7.0

Removed

6.0.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`query_ops_bg_failure`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_query_ops_bg_failure

Description

Number of ops background queries that failed. Removed in Database 5.7.0. Use query_ops_bg_error + query_ops_bg_abort instead.

Introduced

4.7.0

Removed

5.7.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`query_ops_bg_success`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_query_ops_bg_success

Description

Number of ops background queries that completed. Renamed to query_ops_bg_complete in Database 5.7.0.

Introduced

4.7.0

Removed

5.7.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`query_proto_compression_ratio`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_query_proto_compression_ratio
Datadog: aerospike.server.namespace.query_proto_compression_ratio

Description

Measures the average compressed size to uncompressed size ratio for protocol message data in query responses to the client. Thus 1.000 indicates no compression and 0.100 indicates a 1:10 compression ratio (90% reduction in size).
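To make the ratio easier to read, the sketch below converts a reported value into a percentage size reduction and into the "1:N" form used above. Both helper names are hypothetical.

```python
def size_reduction_pct(compression_ratio: float) -> float:
    """Percentage saved relative to the uncompressed size."""
    return (1.0 - compression_ratio) * 100.0


def one_to_n(compression_ratio: float) -> float:
    """N in a '1:N' compression ratio, i.e. uncompressed size / compressed size."""
    if compression_ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return 1.0 / compression_ratio
```

A ratio of 1.000 therefore corresponds to a 0% reduction, and a ratio of 0.100 to roughly a 90% reduction, or 1:10.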

Introduced

4.8.0

Removed

Measurement type

moving average

Data type

decimal

Labels

cluster_name, job, service, instance, longitude, latitude, ns

Detail

The compression ratio is a moving average. It is calculated based on the most recent client responses. If the response message data changes over time then the compression ratio will change with it. In case of a sudden change in response data, the indicated compression ratio may lag behind a bit. As a rule of thumb, assume that the compression ratio covers the most recent 100,000 to 1,000,000 client responses.

`query_proto_uncompressed_pct`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_query_proto_uncompressed_pct
Datadog: aerospike.server.namespace.query_proto_uncompressed_pct

Description

Measures the percentage of query responses to the client with uncompressed protocol message data. Thus 0.000 indicates all responses with compressed data, and 100.000 indicates no responses with compressed data. For example, if protocol message data compression is not used, this metric will remain set to 0.000. If protocol message data compression is then turned on and all responses are compressed, this metric will remain set to 0.000. The only way this metric will ever be set to a value different than 0.000 is if compression is used, but some responses are not compressed (which happens when the uncompressed size is so small that the server does not try to compress, or when the compression fails).
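The cases described above can be summarized in a small sketch. It works on plain response counts and is only meant to show the semantics; the server reports a moving average, and the function and parameter names here are hypothetical.

```python
def uncompressed_pct(compression_enabled: bool,
                     uncompressed_responses: int,
                     total_responses: int) -> float:
    """Illustrative percentage of responses sent uncompressed."""
    # With compression not in use, the metric stays at 0.000.
    if not compression_enabled or total_responses == 0:
        return 0.0
    # Nonzero only when compression is on but some responses were left
    # uncompressed (too small to try, or the compression failed).
    return 100.0 * uncompressed_responses / total_responses
```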

Introduced

4.8.0

Removed

Measurement type

gauge

Data type

decimal

Labels

cluster_name, job, service, instance, longitude, latitude, ns

Detail

The percentage is a moving average. It is calculated based on the most recent client responses. If the response message data changes over time then the percentage will change with it. In case of a sudden change in response data, the indicated percentage may lag behind a bit. As a rule of thumb, assume that the percentage covers the most recent 100,000 to 1,000,000 client responses.

`query_reqs`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_query_reqs

Description

Number of query requests ever attempted on this node. Even very early failures are counted here, unlike query_short_running and query_long_running, which increment a bit later.

Introduced

3.9.0

Removed

6.0.0

Measurement type

counter

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`query_short_queue_full`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_query_short_queue_full

Description

Number of queue-full errors for short-running queries.

Introduced

3.9.0

Removed

6.0.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`query_short_reqs`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_query_short_reqs

Description

Number of short-running queries ever attempted in the system (queries that selected fewer records than query_threshold).

Introduced

3.9.0

Removed

6.0.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`query_udf_bg_abort`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_query_udf_bg_abort

Description

Number of UDF background queries that were aborted. Removed in Database 6.0.0; use si_query_udf_bg_abort instead.

Introduced

5.7.0

Removed

6.0.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`query_udf_bg_complete`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_query_udf_bg_complete

Description

Number of UDF background queries that completed. Removed in Database 6.0.0; use si_query_udf_bg_complete instead.

Introduced

5.7.0

Removed

6.0.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`query_udf_bg_error`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_query_udf_bg_error

Description

Number of UDF background queries that returned an error. Removed in Database 6.0.0; use si_query_udf_bg_error instead.

Introduced

5.7.0

Removed

6.0.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`query_udf_bg_failure`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_query_udf_bg_failure

Description

Number of UDF background queries that failed. Removed in Database 5.7.0. Use query_udf_bg_error + query_udf_bg_abort instead.

Introduced

3.9.0

Removed

5.7.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`query_udf_bg_success`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_query_udf_bg_success

Description

Number of UDF background queries that completed. Renamed to query_udf_bg_complete in Database 5.7.0.

Introduced

3.9.0

Removed

5.7.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`re_repl_error`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_re_repl_error
Datadog: aerospike.server.namespace.re_repl_error

Description

Number of re-replication errors that were not timeouts. Re-replications happen for namespaces operating under the strong-consistency mode when a record does not successfully replicate on the initial attempt.

Introduced

4.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`re_repl_success`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_re_repl_success
Datadog: aerospike.server.namespace.re_repl_success

Description

Number of successful re-replications. Re-replications would happen for namespaces operating under the strong-consistency mode when a record does not successfully replicate on the initial attempt.

Introduced

4.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`re_repl_timeout`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_re_repl_timeout
Datadog: aerospike.server.namespace.re_repl_timeout

Description

Number of re-replications that ended in timeout. Re-replications would happen for namespaces operating under the strong-consistency mode when a record does not successfully replicate on the initial attempt. Starting with Database 6.3.0 this stat only counts timeouts that happened during the actual re-replication.

Introduced

4.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

Detail

The transaction-ttl of a re-replication is 1 second by default (configurable through the transaction-max-ms configuration parameter).

`re_repl_tsvc_error`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_re_repl_tsvc_error
Datadog: aerospike.server.namespace.re_repl_tsvc_error

Description

Number of re-replication errors in the transaction queue, before the re-replication attempt, that were not timeouts (see re_repl_tsvc_timeout). Re-replications occur for namespaces operating under strong-consistency mode when a record does not successfully replicate on the initial attempt.

The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc errors happen before records are accessed for reads or writes. They’re counted separately from tsvc timeouts.

Introduced

6.3.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`re_repl_tsvc_timeout`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_re_repl_tsvc_timeout
Datadog: aerospike.server.namespace.re_repl_tsvc_timeout

Description

Number of re-replications that time out early in the internal transaction queue, while waiting to be picked up by a service thread. Re-replications occur for namespaces operating under strong-consistency mode when a record does not successfully replicate on the initial attempt.

The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc errors happen before records are accessed for reads or writes. They’re counted separately from tsvc timeouts.

Introduced

6.3.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`record_proto_compression_ratio`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_record_proto_compression_ratio
Datadog: aerospike.server.namespace.record_proto_compression_ratio

Description

Measures the average compressed size to uncompressed size ratio for protocol message data in single-record transaction client responses. Thus 1.000 indicates no compression and 0.100 indicates a 1:10 compression ratio (90% reduction in size).

Introduced

4.8.0

Removed

Measurement type

gauge

Data type

decimal

Labels

cluster_name, job, service, instance, longitude, latitude, ns

Detail

`record_proto_uncompressed_pct`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_record_proto_uncompressed_pct
Datadog: aerospike.server.namespace.record_proto_uncompressed_pct

Description

Measures the percentage of single-record transaction client responses with uncompressed protocol message data. Thus 0.000 indicates all responses with compressed data, and 100.000 indicates no responses with compressed data. For example, if protocol message data compression is not used, this metric will remain set to 0.000. If protocol message data compression is then turned on and all responses are compressed, this metric will remain set to 0.000. The only way this metric will ever be set to a value different than 0.000 is if compression is used, but some responses are not compressed (which happens when the uncompressed size is so small that the server does not try to compress, or when the compression fails).

Introduced

4.8.0

Removed

Measurement type

moving average

Data type

decimal

Labels

cluster_name, job, service, instance, longitude, latitude, ns

Detail

`retransmit_all_batch_sub_delete_dup_res`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_retransmit_all_batch_sub_delete_dup_res
Datadog: aerospike.server.namespace.retransmit_all_batch_sub_delete_dup_res

Description

Number of retransmits that occurred during batch delete subtransactions that were being duplicate-resolved. Includes retransmits originating on the client as well as proxying nodes.

Introduced

6.1.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

Detail

Retransmission statistics are collected in the retransmits ticker log line.

`retransmit_all_batch_sub_delete_repl_write`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_retransmit_all_batch_sub_delete_repl_write
Datadog: aerospike.server.namespace.retransmit_all_batch_sub_delete_repl_write

Description

Number of retransmits that occurred during batch delete subtransactions that were being replica-written. Includes retransmits originating on the client as well as proxying nodes.

Introduced

6.1.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

Detail

Retransmission statistics are collected in the retransmits ticker log line.

`retransmit_all_batch_sub_dup_res`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_retransmit_all_batch_sub_dup_res
Datadog: aerospike.server.namespace.retransmit_all_batch_sub_dup_res

Description

Obsolete as of Database 6.0.0. In case of a failure to replicate a write transaction across all replicas, the record will be left in the ‘un-replicated’ state, forcing a ‘re-replication’ transaction prior to any subsequent read or write transaction on the record.

Number of retransmits that occurred during batch subtransactions that were being duplicate-resolved. Includes retransmits originating on the client as well as proxying nodes.

Introduced

4.5.1

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

Detail

Starting with Database 6.0.0 when batch-writes were introduced, “repl-write retransmits” for batch writes are counted as “dup-res retransmits” which are included in the metric retransmit_all_batch_sub_dup_res.

`retransmit_all_batch_sub_read_dup_res`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_retransmit_all_batch_sub_read_dup_res
Datadog: aerospike.server.namespace.retransmit_all_batch_sub_read_dup_res

Description

Number of retransmits that occurred during batch read subtransactions that were being duplicate-resolved. Includes retransmits originating on the client as well as proxying nodes.

Introduced

6.1.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

Detail

Retransmission statistics are collected in the retransmits ticker log line.

`retransmit_all_batch_sub_read_repl_ping`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_retransmit_all_batch_sub_read_repl_ping
Datadog: aerospike.server.namespace.retransmit_all_batch_sub_read_repl_ping

Description

Number of retransmits that occurred during SC linearized read subtransactions within batched commands. Includes retransmits originating on the client as well as proxying nodes.

Introduced

4.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

Detail

Retransmission statistics are collected in the retransmits ticker log line.

`retransmit_all_batch_sub_udf_dup_res`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_retransmit_all_batch_sub_udf_dup_res
Datadog: aerospike.server.namespace.retransmit_all_batch_sub_udf_dup_res

Description

Number of retransmits that occurred during batch UDF subtransactions that were being duplicate-resolved. Includes retransmits originating on the client as well as proxying nodes.

Introduced

6.1.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

Detail

Retransmission statistics are collected in the retransmits ticker log line.

`retransmit_all_batch_sub_udf_repl_write`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_retransmit_all_batch_sub_udf_repl_write
Datadog: aerospike.server.namespace.retransmit_all_batch_sub_udf_repl_write

Description

Number of retransmits that occurred during batch UDF subtransactions that were being replica-written. Includes retransmits originating on the client as well as proxying nodes.

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

Detail

Retransmission statistics are collected in the retransmits ticker log line.

`retransmit_all_batch_sub_write_dup_res`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_retransmit_all_batch_sub_write_dup_res
Datadog: aerospike.server.namespace.retransmit_all_batch_sub_write_dup_res

Description

Number of retransmits that occurred during batch write subtransactions that were being duplicate-resolved. Includes retransmits originating on the client as well as proxying nodes.

Introduced

6.1.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

Detail

Retransmission statistics are collected in the retransmits ticker log line.

`retransmit_all_batch_sub_write_repl_write`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_retransmit_all_batch_sub_write_repl_write
Datadog: aerospike.server.namespace.retransmit_all_batch_sub_write_repl_write

Description

Number of retransmits that occurred during batch write (insert/update/upsert/replace) subtransactions that were being replica-written. Includes retransmits originating on the client as well as proxying nodes.

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

Detail

Retransmission statistics are collected in the retransmits ticker log line.

`retransmit_all_delete_dup_res`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_retransmit_all_delete_dup_res
Datadog: aerospike.server.namespace.retransmit_all_delete_dup_res

Description

Number of retransmits that occurred during delete transactions that were being duplicate-resolved. Includes retransmits originating on the client as well as proxying nodes.

Introduced

4.5.1

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

Detail

Retransmission statistics are collected in the retransmits ticker log line.

`retransmit_all_delete_repl_write`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_retransmit_all_delete_repl_write
Datadog: aerospike.server.namespace.retransmit_all_delete_repl_write

Description

Number of retransmits that occurred during delete transactions that were being replica written. Includes retransmits originating on the client as well as proxying nodes.

Introduced

4.5.1

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

Detail

Retransmission statistics are collected in the retransmits ticker log line.

`retransmit_all_read_dup_res`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_retransmit_all_read_dup_res
Datadog: aerospike.server.namespace.retransmit_all_read_dup_res

Description

Number of retransmits that occurred during read commands that were being duplicate-resolved. Includes retransmits originating on the client as well as proxying nodes.

Introduced

4.5.1

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

Detail

Retransmission statistics are collected in the retransmits ticker log line.

`retransmit_all_read_repl_ping`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_retransmit_all_read_repl_ping
Datadog: aerospike.server.namespace.retransmit_all_read_repl_ping

Description

Number of retransmits that occurred during SC linearized reads. Includes retransmits originating on the client as well as proxying nodes.

Introduced

4.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

Detail

Retransmission statistics are collected in the retransmits ticker log line.

`retransmit_all_udf_dup_res`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_retransmit_all_udf_dup_res
Datadog: aerospike.server.namespace.retransmit_all_udf_dup_res

Description

Number of retransmits that occurred during client initiated UDF transactions that were being duplicate-resolved. Includes retransmits originating on the client as well as proxying nodes.

Introduced

4.5.1

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

Detail

Retransmission statistics are collected in the retransmits ticker log line.

`retransmit_all_udf_repl_write`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_retransmit_all_udf_repl_write
Datadog: aerospike.server.namespace.retransmit_all_udf_repl_write

Description

Number of retransmits that occurred during client initiated UDF transactions that were being replica written. Includes retransmits originating on the client as well as proxying nodes.

Introduced

4.5.1

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

Detail

Retransmission statistics are collected in the retransmits ticker log line.

`retransmit_all_write_dup_res`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_retransmit_all_write_dup_res
Datadog: aerospike.server.namespace.retransmit_all_write_dup_res

Description

Number of retransmits that occurred during write transactions that were being duplicate-resolved. Includes retransmits originating on the client as well as proxying nodes.

Introduced

4.5.1

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

Detail

Retransmission statistics are collected in the retransmits ticker log line.

`retransmit_all_write_repl_write`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_retransmit_all_write_repl_write
Datadog: aerospike.server.namespace.retransmit_all_write_repl_write

Description

Number of retransmits that occurred during write transactions that were being replica written. Includes retransmits originating on the client as well as proxying nodes.

Introduced

4.5.1

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

Detail

Retransmission statistics are collected in the retransmits ticker log line.

`retransmit_nsup_repl_write`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_retransmit_nsup_repl_write
Datadog: aerospike.server.namespace.retransmit_nsup_repl_write

Description

Number of retransmits that occurred during NSUP initiated delete transactions that were being replica written.

Introduced

3.10.1

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

Detail

Retransmission statistics are collected in the retransmits ticker log line.

`retransmit_ops_sub_dup_res`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_retransmit_ops_sub_dup_res
Datadog: aerospike.server.namespace.retransmit_ops_sub_dup_res

Description

Number of retransmits that occurred during write subtransactions of background ops scan/query jobs that were being duplicate-resolved.

Introduced

4.7.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

Detail

Retransmission statistics are collected in the retransmits ticker log line.

`retransmit_ops_sub_repl_write`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_retransmit_ops_sub_repl_write
Datadog: aerospike.server.namespace.retransmit_ops_sub_repl_write

Description

Number of retransmits that occurred during write subtransactions of background ops scan/query jobs that were being replica written.

Introduced

4.7.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

Detail

Retransmission statistics are collected in the retransmits ticker log line.

`retransmit_udf_sub_dup_res`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_retransmit_udf_sub_dup_res
Datadog: aerospike.server.namespace.retransmit_udf_sub_dup_res

Description

Number of retransmits that occurred during UDF subtransactions of scan/query background UDF jobs that were being duplicate-resolved.

Introduced

3.10.1

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

Detail

Retransmission statistics are collected in the retransmits ticker log line.

`retransmit_udf_sub_repl_write`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_retransmit_udf_sub_repl_write
Datadog: aerospike.server.namespace.retransmit_udf_sub_repl_write

Description

Number of retransmits that occurred during UDF subtransactions of scan/query background UDF jobs that were being replica written.

Introduced

3.10.1

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

Detail

Retransmission statistics are collected in the retransmits ticker log line.

`scan_aggr_abort`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_scan_aggr_abort

Description

Number of scan aggregations that were aborted. Removed in Database 6.0.0, use pi_query_aggr_abort.

Introduced

3.9.0

Removed

6.0.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`scan_aggr_complete`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_scan_aggr_complete

Description

Number of scan aggregations that completed. Removed in Database 6.0.0, use pi_query_aggr_complete.

Introduced

3.9.0

Removed

6.0.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`scan_aggr_error`

warn

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_scan_aggr_error

Description

Number of scan aggregations that failed.

Introduced

3.9.0

Removed

6.0.0

Measurement type

counter

Data type

integer

Monitoring

Compare scan_aggr_error to scan_aggr_complete.

If the ratio is higher than acceptable, alert operations to investigate. Removed in Database 6.0.0, use pi_query_aggr_error.
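The error-to-complete comparison can be sketched as below. This is a minimal illustration that assumes the two counters have already been collected as plain integers. The function name `error_ratio`, the sample values, and the 5% threshold are all hypothetical, and the ratio is defined here as errors divided by finished jobs (errors plus completes), which is one reasonable reading of "compare".

```python
def error_ratio(errors: int, completes: int) -> float:
    """Return errors as a fraction of finished jobs (errors + completes)."""
    finished = errors + completes
    if finished == 0:
        # Nothing has finished yet, so there is no error ratio to report.
        return 0.0
    return errors / finished


# Hypothetical counter values sampled from one namespace.
scan_aggr_error = 3
scan_aggr_complete = 97

ACCEPTABLE_RATIO = 0.05  # assumed threshold; pick one that fits the workload

ratio = error_ratio(scan_aggr_error, scan_aggr_complete)
should_alert = ratio > ACCEPTABLE_RATIO
print(f"error ratio = {ratio:.3f}, alert = {should_alert}")
```

Because both metrics are cumulative counters, applying the same check to the difference between two successive samples tends to be more useful than applying it to the raw totals.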

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`scan_basic_abort`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_scan_basic_abort

Description

Number of basic scans that were aborted. Removed in Database 6.0.0, use pi_query_long_basic_abort.

Introduced

3.9.0

Removed

6.0.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`scan_basic_complete`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_scan_basic_complete

Description

Number of basic scans that completed. Removed in Database 6.0.0, use pi_query_long_basic_complete.

Introduced

3.9.0

Removed

6.0.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`scan_basic_error`

warn

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_scan_basic_error

Description

Number of basic scans that failed.

Introduced

3.9.0

Removed

6.0.0

Measurement type

counter

Data type

integer

Monitoring

Compare scan_basic_error to scan_basic_complete.

If the ratio is higher than acceptable, alert operations to investigate. Removed in Database 6.0.0, use pi_query_long_basic_error.

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`scan_ops_bg_abort`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_scan_ops_bg_abort

Description

Number of ops background scans that were aborted. Removed in Database 6.0.0, use pi_query_ops_bg_abort.

Introduced

4.7.0

Removed

6.0.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`scan_ops_bg_complete`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_scan_ops_bg_complete

Description

Number of ops background scans that completed. Removed in Database 6.0.0, use pi_query_ops_bg_complete.

Introduced

4.7.0

Removed

6.0.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`scan_ops_bg_error`

warn

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_scan_ops_bg_error

Description

Number of ops background scans that failed.

Introduced

4.7.0

Removed

6.0.0

Measurement type

counter

Data type

integer

Monitoring

Compare scan_ops_bg_error to scan_ops_bg_complete. If the ratio is higher than acceptable, alert operations to investigate. Removed in Database 6.0.0, use pi_query_ops_bg_error.

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`scan_proto_compression_ratio`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_scan_proto_compression_ratio

Description

Measures the average compressed size to uncompressed size ratio for protocol message data in basic scan or aggregation scan client responses. Thus 1.000 indicates no compression and 0.100 indicates a 1:10 compression ratio (90% reduction in size).
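As a worked illustration of how to read this ratio, the sketch below converts a compressed-to-uncompressed size ratio into a percent size reduction. The helper name `size_reduction_pct` is hypothetical and only restates the arithmetic in the description.

```python
def size_reduction_pct(compression_ratio: float) -> float:
    """Convert a compressed/uncompressed size ratio into a percent size reduction."""
    if not 0.0 <= compression_ratio <= 1.0:
        raise ValueError("ratio is expected to lie between 0.0 and 1.0")
    return (1.0 - compression_ratio) * 100.0


# 1.000 means no compression; 0.100 means a 1:10 compression ratio.
for ratio in (1.0, 0.5, 0.1):
    print(f"ratio {ratio:.3f} -> {size_reduction_pct(ratio):.1f}% smaller")
```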

Introduced

4.8.0

Removed

6.0.0

Measurement type

moving average

Data type

decimal

Labels

cluster_name, job, service, instance, longitude, latitude, ns

Detail

`scan_proto_uncompressed_pct`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_scan_proto_uncompressed_pct

Description

Measures the percentage of basic scan or aggregation scan client responses with uncompressed protocol message data. Thus 0.000 indicates that all responses carried compressed data, and 100.000 indicates that none did. For example, if protocol message data compression is not used, this metric remains at 0.000. If compression is then turned on and all responses are compressed, it still remains at 0.000. The metric takes a non-zero value only when compression is in use but some responses are not compressed, which happens when the uncompressed size is so small that the server does not attempt compression, or when compression fails.

Introduced

4.8.0

Removed

6.0.0

Measurement type

gauge

Data type

decimal

Labels

cluster_name, job, service, instance, longitude, latitude, ns

Detail

`scan_udf_bg_abort`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_scan_udf_bg_abort

Description

Number of UDF background scans that were aborted. Removed in Database 6.0.0, use pi_query_udf_bg_abort.

Introduced

3.9.0

Removed

6.0.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`scan_udf_bg_complete`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_scan_udf_bg_complete

Description

Number of UDF background scans that completed. Removed in Database 6.0.0, use pi_query_udf_bg_complete.

Introduced

3.9.0

Removed

6.0.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`scan_udf_bg_error`

warn

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_scan_udf_bg_error

Description

Number of UDF background scans that failed.

Introduced

3.9.0

Removed

6.0.0

Measurement type

counter

Data type

integer

Monitoring

Compare scan_udf_bg_error to scan_udf_bg_complete.

If the ratio is higher than acceptable, alert operations to investigate. Removed in Database 6.0.0, use pi_query_udf_bg_error.

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`set-evicted-objects`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_set-evicted-objects

Description

Number of records evicted by a set.

Introduced

Removed

Yes

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`set_index_used_bytes`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_set_index_used_bytes
Datadog: aerospike.server.namespace.set_index_used_bytes

Description

Amount of memory occupied by set indexes for this namespace on this node. See Finding total namespace memory for the total memory accounted for the namespace.

Introduced

7.0.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`si_query_aggr_abort`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_si_query_aggr_abort
Datadog: aerospike.server.namespace.si_query_aggr_abort

Description

Number of secondary index query aggregations aborted by the user seen by this node.

Introduced

6.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`si_query_aggr_complete`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_si_query_aggr_complete
Datadog: aerospike.server.namespace.si_query_aggr_complete

Description

Number of secondary index query aggregations completed.

Introduced

6.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`si_query_aggr_error`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_si_query_aggr_error
Datadog: aerospike.server.namespace.si_query_aggr_error

Description

Number of secondary index query aggregation errors due to an internal error.

Introduced

6.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`si_query_long_basic_abort`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_si_query_long_basic_abort
Datadog: aerospike.server.namespace.si_query_long_basic_abort

Description

Number of basic long secondary index queries aborted for this namespace.

Introduced

6.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`si_query_long_basic_complete`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_si_query_long_basic_complete
Datadog: aerospike.server.namespace.si_query_long_basic_complete

Description

Number of basic long secondary index queries completed for this namespace.

Introduced

6.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`si_query_long_basic_error`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_si_query_long_basic_error
Datadog: aerospike.server.namespace.si_query_long_basic_error

Description

Number of basic long secondary index queries that returned an error for this namespace.

Introduced

6.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`si_query_ops_bg_abort`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_si_query_ops_bg_abort
Datadog: aerospike.server.namespace.si_query_ops_bg_abort

Description

Number of ops background secondary index queries that were aborted.

Introduced

6.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`si_query_ops_bg_complete`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_si_query_ops_bg_complete
Datadog: aerospike.server.namespace.si_query_ops_bg_complete

Description

Number of ops background secondary index queries that completed.

Introduced

6.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`si_query_ops_bg_error`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_si_query_ops_bg_error
Datadog: aerospike.server.namespace.si_query_ops_bg_error

Description

Number of ops background secondary index queries that returned an error.

Introduced

6.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`si_query_udf_bg_abort`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_si_query_udf_bg_abort
Datadog: aerospike.server.namespace.si_query_udf_bg_abort

Description

Number of UDF background secondary index queries that were aborted.

Introduced

6.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`si_query_udf_bg_complete`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_si_query_udf_bg_complete
Datadog: aerospike.server.namespace.si_query_udf_bg_complete

Description

Number of UDF background secondary index queries that completed.

Introduced

6.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`si_query_udf_bg_error`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_si_query_udf_bg_error
Datadog: aerospike.server.namespace.si_query_udf_bg_error

Description

Number of UDF background secondary index queries that returned an error.

Introduced

6.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`sindex-type.mount[ix].age`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_sindex-type.mount[ix].age

Description

Applies only to Enterprise Edition configured with sindex-type flash. Shows the percentage of lifetime (total usage) claimed by the OEM for the underlying device. The value is -1 unless the underlying device is NVMe, and may exceed 100. ‘ix’ is the device index. For example, storage-engine.file[0]=/opt/aerospike/test0.dat and storage-engine.file[1]=/opt/aerospike/test2.dat for 2 files specified in the configuration.

Introduced

6.4.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`sindex_flash_used_bytes`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_sindex_flash_used_bytes

Description

Applies only to Enterprise Edition configured with sindex-type flash. Total bytes in-use on the mount(s) for the secondary indexes used by this namespace on this node. This is the same value memory_used_sindex_bytes would have if the secondary indexes were not persisted.

Introduced

6.4.0

Removed

7.0.0

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

Detail

`sindex_flash_used_pct`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_sindex_flash_used_pct

Description

Applies only to Enterprise Edition configured with sindex-type flash. Percentage of the mount(s) in-use for the secondary indexes used by this namespace on this node. Calculated as (sindex_flash_used_bytes / sindex-type.mounts-size-limit) * 100.

Introduced

6.4.0

Removed

7.0.0

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

Detail

`sindex_gc_cleaned`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_sindex_gc_cleaned
Datadog: aerospike.server.namespace.sindex_gc_cleaned

Description

Number of secondary index entries cleaned by sindex GC.

Introduced

5.7.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`sindex_mounts_used_pct`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_sindex_mounts_used_pct

Description

Applies only to Enterprise Edition configured with sindex-type pmem or flash. Percentage of the mount(s) in-use for the secondary indexes used by this namespace on this node. Calculated as (sindex_used_bytes / sindex-type.mounts-budget) * 100
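The calculation can be sketched as follows, assuming both inputs are available as byte counts. The function name is hypothetical, and truncating to a whole number is an assumption made only because the metric's data type is integer; the server's exact rounding is not stated here.

```python
def sindex_mounts_used_pct(sindex_used_bytes: int, mounts_budget_bytes: int) -> int:
    """Percentage of the mounts budget in use, truncated to a whole number."""
    if mounts_budget_bytes <= 0:
        raise ValueError("mounts budget must be a positive number of bytes")
    if sindex_used_bytes < 0:
        raise ValueError("used bytes cannot be negative")
    # Integer arithmetic avoids floating-point error on large byte counts.
    return sindex_used_bytes * 100 // mounts_budget_bytes


# Hypothetical values: 3 GiB in use against a 16 GiB budget.
GIB = 1024 ** 3
print(sindex_mounts_used_pct(3 * GIB, 16 * GIB))
```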

Introduced

7.0.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

Detail

`sindex_pmem_used_bytes`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_sindex_pmem_used_bytes

Description

Applies only to Enterprise Edition configured with sindex-type pmem. Total bytes in-use on the mount(s) for the secondary indexes used by this namespace on this node. This is the same value memory_used_sindex_bytes would have if the secondary indexes were not persisted.

Introduced

6.3.0

Removed

7.0.0

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

Detail

`sindex_pmem_used_pct`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_sindex_pmem_used_pct

Description

Applies only to Enterprise Edition configured with sindex-type pmem. Percentage of the mount(s) in-use for the secondary indexes used by this namespace on this node. Calculated as (sindex_pmem_used_bytes / sindex-type.mounts-size-limit) * 100

Introduced

6.3.0

Removed

7.0.0

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

Detail

`sindex_used_bytes`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_sindex_used_bytes
Datadog: aerospike.server.namespace.sindex_used_bytes

Description

Total bytes in-use on the mount(s) for the secondary indexes used by this namespace on this node.

Introduced

7.0.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

Detail

`smd_evict_void_time`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_smd_evict_void_time
Datadog: aerospike.server.namespace.smd_evict_void_time

Description

The cluster-wide specified eviction depth, expressed as a void time in seconds since 1 January 2010 UTC. This is distributed to all nodes via SMD. This may be larger than evict_void_time, in which case evict_void_time will eventually advance to this value.
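Since the value counts seconds from 1 January 2010 UTC rather than from the Unix epoch, a small conversion helps when reading it. The sketch below uses only the Python standard library; the names `AEROSPIKE_EPOCH` and `void_time_to_datetime` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Reference point stated in the description: 1 January 2010 UTC.
AEROSPIKE_EPOCH = datetime(2010, 1, 1, tzinfo=timezone.utc)


def void_time_to_datetime(void_time: int) -> datetime:
    """Convert a void time (seconds since 1 January 2010 UTC) to an aware datetime."""
    return AEROSPIKE_EPOCH + timedelta(seconds=void_time)


# Hypothetical reading: exactly one day after the reference point.
print(void_time_to_datetime(86_400).isoformat())
```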

Introduced

4.5.1

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`stop_writes`

critical

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_stop_writes
Datadog: aerospike.server.namespace.stop_writes

Description

If true, this namespace is currently not allowing client-originated writes. Migration writes and prole writes are still allowed. Error code 22 is returned if any one of the following are breached: Prior to Database 7.0.0:

Introduced

3.9.0

Removed

Measurement type

gauge

Data type

integer

Monitoring

If stop_writes is true, raise a critical alert.

Until the cause is corrected, the system will reject all writes.
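A minimal sketch of this check is shown below. It assumes the per-namespace values have already been gathered into a dictionary, and that a value may arrive either as the text `true`/`false` or as a number such as 1/0, depending on the collection path. Both the input shape and the helper names are hypothetical.

```python
def is_stop_writes(value) -> bool:
    """Interpret a stop_writes reading given as text ('true'/'false') or a number."""
    if isinstance(value, str):
        return value.strip().lower() in {"true", "1"}
    return bool(value)


def namespaces_in_stop_writes(samples: dict) -> list:
    """Return the sorted names of namespaces currently rejecting client writes."""
    return sorted(ns for ns, value in samples.items() if is_stop_writes(value))


# Hypothetical readings keyed by namespace name.
readings = {"test": "false", "bar": "true", "cache": 0, "users": 1.0}
critical = namespaces_in_stop_writes(readings)
if critical:
    print("CRITICAL: stop_writes is true for", ", ".join(critical))
```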

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`storage-engine.device[ix].age`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_storage_engine_device_age

Description

Shows the percentage of lifetime (total usage) claimed by the OEM for the underlying storage-engine.device[ix]; the value may exceed 100. The value is -1 unless the underlying device is NVMe. It measures how much of the drive’s projected lifetime, according to the manufacturer, has been used at any point in time. A brand-new SSD reports 0, and an SSD that has reached its projected lifetime reports 100, meaning that 100% of the projected lifetime has been used. A value over 100 means the SSD has exceeded the lifetime specified by the OEM.
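The value ranges described above can be turned into a short status string, as in the sketch below. The function name and the wording of the messages are hypothetical; only the meaning of -1, 0 to 100, and values above 100 comes from the description.

```python
def describe_device_age(age: int) -> str:
    """Turn a storage-engine.device[ix].age reading into a short status string."""
    if age == -1:
        return "unavailable: underlying device is not NVMe"
    if age < 0:
        raise ValueError(f"unexpected age value: {age}")
    if age >= 100:
        return f"projected lifetime reached ({age}% used)"
    return f"{age}% of projected lifetime used"


# Hypothetical readings covering each case.
for reading in (-1, 0, 42, 100, 117):
    print(reading, "->", describe_device_age(reading))
```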

Introduced

4.3.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`storage-engine.device[ix].defrag_partial_writes`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_storage_engine_device_defrag_partial_writes

Description

The number of wblocks partially flushed to storage-engine.device[ix] by defrag.

Introduced

7.1.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, ns

`storage-engine.device[ix].defrag_q`

warn

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_storage_engine_device_defrag_q

Description

Number of wblocks queued to be defragged on storage-engine.device[ix].

Introduced

4.3.0

Removed

Measurement type

gauge

Data type

integer

Monitoring

Measured per-device or per-file depending on the storage configuration.

If storage-engine.device[ix].defrag_q or storage-engine.file[ix].defrag_q continues to increase over time, alert operations to investigate.
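One simple way to detect a queue that "continues to increase" is to check whether the most recent samples rise strictly from one to the next. The sketch below assumes the gauge has been sampled at regular intervals into a list, oldest first. The function name and the window of five samples are hypothetical choices, not a recommended setting.

```python
def is_steadily_increasing(samples: list, window: int = 5) -> bool:
    """True if the last `window` samples are strictly increasing."""
    if window < 2 or len(samples) < window:
        # Not enough data to judge a trend.
        return False
    recent = samples[-window:]
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))


# Hypothetical defrag_q readings, oldest first.
draining = [120, 80, 95, 60, 40, 35]
growing = [10, 40, 90, 150, 230, 400]

print(is_steadily_increasing(draining))
print(is_steadily_increasing(growing))
```

A strict comparison is deliberately conservative; a deployment with a noisy queue may prefer to compare averages over longer windows instead.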

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`storage-engine.device[ix].defrag_reads`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_storage_engine_device_defrag_reads

Description

The number of wblocks that have been sent to the defrag_q from storage-engine.device[ix]. Blocks are selected for defragmentation when their usage falls below the configured defrag-lwm-pct.

Introduced

4.3.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`storage-engine.device[ix].defrag_writes`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_storage_engine_device_defrag_writes

Description

The number of wblocks defrag has written to storage-engine.device[ix].

Introduced

4.3.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`storage-engine.device[ix].free_wblocks`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_storage_engine_device_free_wblocks

Description

The number of wblocks (write blocks) free on storage-engine.device[ix].

Introduced

4.3.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`storage-engine.device[ix].partial_writes`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_storage_engine_device_partial_writes

Description

The number of wblocks partially flushed to storage-engine.device[ix].

Introduced

7.1.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, ns

`storage-engine.device[ix].read_errors`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_storage_engine_device_read_errors

Description

Number of read errors encountered on storage-engine.device[ix].

Introduced

7.1.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, ns

`storage-engine.device[ix].shadow_write_q`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_storage_engine_device_shadow_write_q

Description

The number of wblocks queued to be written to the shadow device of storage-engine.device[ix].

Introduced

4.3.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`storage-engine.device[ix].used_bytes`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_storage_engine_device_used_bytes

Description

The number of bytes used for data on storage-engine.device[ix].

Introduced

4.3.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`storage-engine.device[ix].write_q`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_storage_engine_device_write_q

Description

The number of wblocks queued to be written to storage-engine.device[ix]. Includes blocks written by the defragmentation sub-system.

Introduced

4.3.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`storage-engine.device[ix].writes`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_storage_engine_device_writes

Description

Number of wblocks written to storage-engine.device[ix] since Aerospike started. Includes defragmentation writes.

Introduced

4.3.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`storage-engine.device[ix]`

optional

Context

namespace

Description

The raw device configured with the device parameter in the storage-engine subcontext of the namespace context. ‘ix’ is the device index, which starts from 0. For example, storage-engine.device[0]=/dev/xvd1 and storage-engine.device[1]=/dev/xvc1 for two devices specified in the configuration.

Introduced

4.3.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, ns, storage, storage-engine
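
The index-to-device mapping described above can be recovered from the statistics with a small parser. This is a sketch that assumes the statistics are already available as `name=value` strings, one per item; the helper name `parse_storage_entries` is hypothetical. Per-device statistics such as `storage-engine.device[0].write_q` are deliberately skipped.

```python
import re

# Matches lines such as "storage-engine.device[0]=/dev/xvd1".
# The "]=" requirement excludes per-device statistics like "...device[0].write_q=0".
_ENTRY_RE = re.compile(r"^storage-engine\.(device|file|stripe)\[(\d+)\]=(.+)$")


def parse_storage_entries(lines):
    """Map (kind, index) -> configured name for each matching line."""
    entries = {}
    for line in lines:
        match = _ENTRY_RE.match(line.strip())
        if match:
            kind, index, name = match.groups()
            entries[(kind, int(index))] = name
    return entries


stats = [
    "storage-engine.device[0]=/dev/xvd1",
    "storage-engine.device[1]=/dev/xvc1",
    "storage-engine.device[0].write_q=0",  # per-device statistic, ignored here
]
print(parse_storage_entries(stats))
```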

`storage-engine.file[ix].age`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_storage_engine_file_age

Description

Shows the percentage of lifetime (total usage) claimed by OEM for the underlying device of storage-engine.file[ix]. The value is -1 unless the underlying device is NVMe, and may exceed 100.

Introduced

4.3.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`storage-engine.file[ix].defrag_partial_writes`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_storage_engine_file_defrag_partial_writes

Description

The number of wblocks partially flushed to storage-engine.file[ix] by defrag.

Introduced

7.1.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, ns

`storage-engine.file[ix].defrag_q`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_storage_engine_file_defrag_q

Description

The number of wblocks queued to be defragged on storage-engine.file[ix].

Introduced

4.3.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`storage-engine.file[ix].defrag_reads`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_storage_engine_file_defrag_reads

Description

Number of wblocks that have been sent to the defrag_q from storage-engine.file[ix].

Blocks are selected for defragmentation when their usage falls below the configured defrag-lwm-pct.

Introduced

4.3.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`storage-engine.file[ix].defrag_writes`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_storage_engine_file_defrag_writes

Description

The number of wblocks defrag has written to storage-engine.file[ix].

Introduced

4.3.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`storage-engine.file[ix].free_wblocks`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_storage_engine_file_free_wblocks

Description

The number of wblocks (write blocks) free on storage-engine.file[ix].

Introduced

4.3.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`storage-engine.file[ix].partial_writes`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_storage_engine_file_partial_writes

Description

The number of wblocks partially flushed to storage-engine.file[ix] by writes.

Introduced

7.1.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, ns

`storage-engine.file[ix].shadow_write_q`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_storage_engine_file_shadow_write_q

Description

The number of wblocks queued to be written to the shadow file of storage-engine.file[ix].

Introduced

4.3.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`storage-engine.file[ix].used_bytes`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_storage_engine_file_used_bytes

Description

Number of bytes used for data on storage-engine.file[ix].

Introduced

4.3.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`storage-engine.file[ix].write_q`

warn

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_storage_engine_file_write_q

Description

Number of wblocks queued to be written to storage-engine.file[ix].

Introduced

4.3.0

Removed

Measurement type

gauge

Data type

integer

Monitoring

Measured per-device or per-file depending on the storage configuration.

If storage-engine.device[ix].write_q or storage-engine.file[ix].write_q is greater than 1, alert operations to investigate.

Labels

cluster_name, job, service, instance, longitude, latitude, ns
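
The write_q monitoring rule above is a plain threshold check. A minimal sketch, assuming the current gauge values have already been collected into a dictionary keyed by statistic name; the function name `write_q_alerts` and the input shape are hypothetical.

```python
def write_q_alerts(write_q_by_target, threshold=1):
    """Return the statistic names whose write_q exceeds the threshold, sorted by name."""
    return sorted(
        name for name, depth in write_q_by_target.items() if depth > threshold
    )


# Hypothetical readings for a namespace with two data files.
readings = {
    "storage-engine.file[0].write_q": 0,
    "storage-engine.file[1].write_q": 7,
}
print(write_q_alerts(readings))
```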

`storage-engine.file[ix].writes`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_storage_engine_file_writes

Description

The number of wblocks written to storage-engine.file[ix] since Aerospike started. When running with commit-to-device set to true, this counter will only account for full blocks written and therefore will only count blocks written through the defragmentation process as client writes would write to disk individually rather than at a block level. Includes defragmentation writes.

Introduced

4.3.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`storage-engine.file[ix]`

optional

Context

namespace

Description

The data file path configured with the file parameter in the storage-engine subcontext of the namespace context. ‘ix’ is the file index, which starts from 0. For example, storage-engine.file[0]=/opt/aerospike/test0.dat and storage-engine.file[1]=/opt/aerospike/test2.dat for two files specified in the configuration.

Introduced

4.3.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, ns, storage, storage-engine

`storage-engine.stripe[ix].age`

enterprise optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_storage_engine_stripe_age

Description

Shows the percentage of lifetime (total usage) claimed by OEM for the respective storage-backed persistence device of storage-engine.stripe[ix]. The value is -1 unless the underlying device is NVMe, and may exceed 100. See also storage-engine.device[ix].age. This statistic is not available in the log ticker and is only applicable if storage-backed persistence exists.

Introduced

7.0.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, ns, storage, storage-engine

Detail

More information about stripe allocation can be found on the “Configure Namespace Storage” page, under Setup for in-memory with storage-backed persistence and Setup for in-memory without storage-backed persistence.

`storage-engine.stripe[ix].backing_write_q`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_storage_engine_stripe_backing_write_q

Description

The number of wblocks queued to be written to the respective storage-backed persistence of storage-engine.stripe[ix]. This statistic is available in the log ticker as write-q, and is only applicable if a storage-backed persistence exists.

Introduced

7.0.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, ns, storage, namespace, storage-engine

Detail

Log ticker example with storage-backed persistence:

INFO (drv-mem): (drv_mem.c:3158) {bar} stripe-0.0xad001000: used-bytes 146499360 free-wblocks 492 write (18,0.2) defrag-q 0 defrag-read (1,0.0) defrag-write (0,0.0) write-q 0

Log ticker example without storage-backed persistence:

INFO (drv-mem): (drv_mem.c:3158) {test} stripe-2.0xad002002: used-bytes 887120 free-wblocks 62 write (0,0.0) defrag-q 0 defrag-read (0,0.0) defrag-write (0,0.0)
INFO (drv-mem): (drv_mem.c:3158) {test} stripe-5.0xad002005: used-bytes 915280 free-wblocks 62 write (0,0.0) defrag-q 0 defrag-read (0,0.0) defrag-write (0,0.0)
INFO (drv-mem): (drv_mem.c:3158) {test} stripe-1.0xad002001: used-bytes 900080 free-wblocks 62 write (0,0.0) defrag-q 0 defrag-read (0,0.0) defrag-write (0,0.0)
INFO (drv-mem): (drv_mem.c:3158) {test} stripe-3.0xad002003: used-bytes 896720 free-wblocks 62 write (0,0.0) defrag-q 0 defrag-read (0,0.0) defrag-write (0,0.0)
INFO (drv-mem): (drv_mem.c:3158) {test} stripe-0.0xad002000: used-bytes 909120 free-wblocks 62 write (0,0.0) defrag-q 0 defrag-read (0,0.0) defrag-write (0,0.0)
INFO (drv-mem): (drv_mem.c:3158) {test} stripe-7.0xad002007: used-bytes 898960 free-wblocks 62 write (0,0.0) defrag-q 0 defrag-read (0,0.0) defrag-write (0,0.0)
INFO (drv-mem): (drv_mem.c:3158) {test} stripe-6.0xad002006: used-bytes 897040 free-wblocks 62 write (0,0.0) defrag-q 0 defrag-read (0,0.0) defrag-write (0,0.0)
INFO (drv-mem): (drv_mem.c:3158) {test} stripe-4.0xad002004: used-bytes 895680 free-wblocks 62 write (0,0.0) defrag-q 0 defrag-read (0,0.0) defrag-write (0,0.0)
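
The ticker lines above can be turned into structured values with a regular expression. This is a sketch built only from the sample lines shown here, so the exact ticker format on other server versions is an assumption. The first number in each `(a,b)` pair is read as a count and the second is ignored, which is also an assumption, and the function name `parse_stripe_ticker` is hypothetical. The trailing `write-q` field is optional because it appears only with storage-backed persistence.

```python
import re

_TICKER_RE = re.compile(
    r"\{(?P<namespace>[^}]+)\}\s+(?P<stripe>stripe-\d+\.0x[0-9a-f]+):\s+"
    r"used-bytes (?P<used_bytes>\d+)\s+free-wblocks (?P<free_wblocks>\d+)\s+"
    r"write \((?P<writes>\d+),[\d.]+\)\s+"
    r"defrag-q (?P<defrag_q>\d+)\s+"
    r"defrag-read \((?P<defrag_reads>\d+),[\d.]+\)\s+"
    r"defrag-write \((?P<defrag_writes>\d+),[\d.]+\)"
    r"(?:\s+write-q (?P<backing_write_q>\d+))?"
)

_INT_FIELDS = (
    "used_bytes", "free_wblocks", "writes",
    "defrag_q", "defrag_reads", "defrag_writes",
)


def parse_stripe_ticker(line):
    """Parse one stripe ticker line into a dict, or return None if it does not match."""
    match = _TICKER_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    result = {"namespace": fields["namespace"], "stripe": fields["stripe"]}
    for key in _INT_FIELDS:
        result[key] = int(fields[key])
    # write-q is present only when storage-backed persistence exists.
    write_q = fields["backing_write_q"]
    result["backing_write_q"] = int(write_q) if write_q is not None else None
    return result


with_backing = (
    "INFO (drv-mem): (drv_mem.c:3158) {bar} stripe-0.0xad001000: "
    "used-bytes 146499360 free-wblocks 492 write (18,0.2) defrag-q 0 "
    "defrag-read (1,0.0) defrag-write (0,0.0) write-q 0"
)
print(parse_stripe_ticker(with_backing))
```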

`storage-engine.stripe[ix].defrag_partial_writes`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_storage_engine_stripe_defrag_partial_writes

Description

The number of wblocks partially flushed to storage-engine.stripe[ix] by defrag.

Introduced

7.1.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, ns

`storage-engine.stripe[ix].defrag_q`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_storage_engine_stripe_defrag_q

Description

The number of wblocks queued to be defragged on storage-engine.stripe[ix].

Introduced

7.0.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, ns, storage

Detail

`storage-engine.stripe[ix].defrag_reads`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_storage_engine_stripe_defrag_reads

Description

Number of wblocks that have been sent to the defrag_q from storage-engine.stripe[ix].

Blocks are selected for defragmentation when their usage falls below the configured defrag-lwm-pct.

Introduced

7.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, ns, storage, storage-engine

Detail

`storage-engine.stripe[ix].defrag_writes`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_storage_engine_stripe_defrag_writes

Description

The number of wblocks defrag has written to storage-engine.stripe[ix].

Introduced

7.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, ns, storage, storage-engine

Detail

`storage-engine.stripe[ix].free_wblocks`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_storage_engine_stripe_free_wblocks

Description

Number of wblocks (write blocks) free on storage-engine.stripe[ix].

Introduced

7.0.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, ns, storage, storage-engine

Detail

`storage-engine.stripe[ix].partial_writes`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_storage_engine_stripe_partial_writes

Description

The number of wblocks partially flushed to storage-engine.stripe[ix] by writes.

Introduced

7.1.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, ns

`storage-engine.stripe[ix].used_bytes`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_storage_engine_stripe_used_bytes

Description

Number of bytes used for data on storage-engine.stripe[ix].

Introduced

7.0.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, ns, storage, storage-engine

Detail

`storage-engine.stripe[ix].writes`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_storage_engine_stripe_writes

Description

The number of wblocks written to storage-engine.stripe[ix] since Aerospike started. When running with commit-to-device set to true, this counter will only account for full blocks written and therefore will only count blocks written through the defragmentation process as the client writes would write to disk individually rather than at a block level. Includes defragmentation writes.

Introduced

7.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, ns, storage, storage-engine

Detail

`storage-engine.stripe[ix]`

optional

Context

namespace

Description

A stripe is a shared memory segment. Each stripe has its own shared memory key, which is determined internally by the server. ‘ix’ is the stripe index. For example, if there are eight stripes, the index (ix) ranges from 0 to 7, so storage-engine.stripe[0]=stripe-0.0xad002000 and storage-engine.stripe[1]=stripe-1.0xad002001 show two shared memory segments (stripes) and their keys. This statistic applies to namespaces configured with storage-engine memory.

Introduced

7.0.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, ns, storage, storage-engine

Detail

`sub_objects`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_sub_objects
Datadog: aerospike.server.namespace.sub_objects

Description

Number of LDT sub objects. Also aggregated at the service statistic level under the same name.

Introduced

3.9.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`tombstones`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_tombstones
Datadog: aerospike.server.sets.tombstones

Description

Total number of tombstones in this namespace on this node.

Introduced

3.10.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`truncate_lut`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_truncate_lut
Datadog: aerospike.server.sets.truncate_lut

Description

The most covering truncate_lut for this namespace. See truncate or truncate-namespace.

Introduced

3.12.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`truncated_records`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_truncated_records
Datadog: aerospike.server.namespace.truncated_records

Description

The total number of records deleted by truncation for this namespace (includes set truncations). See truncate or truncate-namespace.

Introduced

3.12.0

Removed

6.3.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`truncating`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_truncating
Datadog: aerospike.server.sets.truncating

Description

Indicates when the namespace is in the process of being truncated.

Introduced

6.3.0

Removed

Measurement type

gauge

Data type

boolean

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`ttl_reductions_applied`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_ttl_reductions_applied

Description

Incremented when apply-ttl-reduction is true and a command reduces the TTL.

Introduced

8.1.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`ttl_reductions_ignored`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_ttl_reductions_ignored

Description

Incremented when apply-ttl-reduction is false and a command’s attempt to reduce the TTL is ignored. When the attempt is ignored, the transaction continues and the TTL remains unchanged on the resulting record update.

Introduced

8.1.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`udf_sub_lang_delete_success`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_udf_sub_lang_delete_success
Datadog: aerospike.server.namespace.udf_sub_lang_delete_success

Description

Number of successful UDF delete sub-transactions for scan/query background UDF jobs. See the udf_sub_udf_complete, udf_sub_udf_error, udf_sub_udf_filtered_out, udf_sub_udf_timeout statistics for the containing UDF operation statuses.

Introduced

3.9.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`udf_sub_lang_error`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_udf_sub_lang_error
Datadog: aerospike.server.namespace.udf_sub_lang_error

Description

Number of UDF sub-transaction errors for scan/query background UDF jobs. See the udf_sub_udf_complete, udf_sub_udf_error, udf_sub_udf_filtered_out, udf_sub_udf_timeout statistics for the containing UDF operation statuses.

Introduced

3.9.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`udf_sub_lang_read_success`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_udf_sub_lang_read_success
Datadog: aerospike.server.namespace.udf_sub_lang_read_success

Description

Number of successful UDF read sub-transactions for scan/query background UDF jobs. See the udf_sub_udf_complete, udf_sub_udf_error, udf_sub_udf_filtered_out, udf_sub_udf_timeout statistics for the containing UDF operation statuses.

Introduced

3.9.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`udf_sub_lang_write_success`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_udf_sub_lang_write_success
Datadog: aerospike.server.namespace.udf_sub_lang_write_success

Description

Number of successful UDF write sub-transactions for scan/query background UDF jobs. See the udf_sub_udf_complete, udf_sub_udf_error, udf_sub_udf_filtered_out, udf_sub_udf_timeout statistics for the containing UDF operation statuses.

Introduced

3.9.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`udf_sub_tsvc_error`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_udf_sub_tsvc_error
Datadog: aerospike.server.namespace.udf_sub_tsvc_error

Description

Number of UDF subtransactions that failed with an error in the transaction service, before attempting to handle the transaction for scan/query background UDF jobs. Examples include protocol errors and security permission mismatches. Does not include timeouts. In strong-consistency enabled namespaces, this includes transactions against unavailable_partitions and dead_partitions.

The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc errors happen before records are accessed for reads or writes. They’re counted separately from tsvc timeouts.

Introduced

3.9.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`udf_sub_tsvc_timeout`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_udf_sub_tsvc_timeout
Datadog: aerospike.server.namespace.udf_sub_tsvc_timeout

Description

Number of UDF subtransactions that timed out in the transaction service, before attempting to handle the transaction for scan/query background UDF jobs.

The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc errors happen before records are accessed for reads or writes. They’re counted separately from tsvc timeouts.

Introduced

3.9.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`udf_sub_udf_complete`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_udf_sub_udf_complete
Datadog: aerospike.server.namespace.udf_sub_udf_complete

Description

Number of completed UDF subtransactions for scan/query background UDF jobs. See the following statistics for the underlying operation statuses: udf_sub_lang_delete_success, udf_sub_lang_error, udf_sub_lang_read_success, udf_sub_lang_write_success.

Introduced

3.9.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`udf_sub_udf_error`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_udf_sub_udf_error
Datadog: aerospike.server.namespace.udf_sub_udf_error

Description

Number of failed UDF subtransactions for scan/query background UDF jobs. Does not include timeouts. See the following statistics for the underlying operation statuses: udf_sub_lang_delete_success, udf_sub_lang_error, udf_sub_lang_read_success, udf_sub_lang_write_success.

Introduced

3.9.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`udf_sub_udf_filtered_out`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_udf_sub_udf_filtered_out
Datadog: aerospike.server.namespace.udf_sub_udf_filtered_out

Description

Number of UDF subtransactions that did not happen because the record was filtered out with Filter Expressions.

Introduced

4.7.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`udf_sub_udf_timeout`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_udf_sub_udf_timeout
Datadog: aerospike.server.namespace.udf_sub_udf_timeout

Description

Number of UDF subtransactions that timed out for scan/query background UDF jobs. See the following statistics for the underlying operation statuses: udf_sub_lang_delete_success, udf_sub_lang_error, udf_sub_lang_read_success, udf_sub_lang_write_success.

Introduced

3.9.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`unavailable_partitions`

critical

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_unavailable_partitions
Datadog: aerospike.server.namespace.unavailable_partitions

Description

Number of unavailable partitions for this namespace (when using strong-consistency). This is the number of partitions that are unavailable when roster nodes are missing. Will turn into dead_partitions if still unavailable when all roster nodes are present.

Introduced

4.0.0

Removed

Measurement type

gauge

Data type

integer

Monitoring

If unavailable_partitions is not zero, raise a critical alert.

Check for network issues and make sure the cluster forms properly.

Labels

cluster_name, job, service, instance, longitude, latitude, ns

Detail

`unreplicated_records`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_unreplicated_records
Datadog: aerospike.server.namespace.unreplicated_records

Description

Number of unreplicated records in the namespace. Applicable only for namespaces operating under the strong-consistency mode.

Introduced

5.7.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

Detail

When a re-replication is triggered, the unreplicated_records stat is decremented as the record goes into the “replicating” state. It is incremented back if the re-replication attempt fails, and the record gets into an unreplicated state again.
Re-replication could have already been triggered even if a client tsvc timeout happens for the respective transaction that triggered it.

`write-smoothing-period`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_write-smoothing-period

Description

Removed

Introduced

Removed

Yes

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`xdr_bin_cemeteries`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_xdr_bin_cemeteries
Datadog: aerospike.server.namespace.xdr_bin_cemeteries

Description

Number of tombstones with bin tombstones. They are generated when bin convergence is enabled and a record is durably deleted.

Introduced

5.5.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`xdr_client_delete_error`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_xdr_client_delete_error
Datadog: aerospike.server.namespace.xdr_client_delete_error

Description

Number of delete requests initiated by XDR that failed on the namespace on this node. For the total number of XDR initiated delete requests against this namespace on this node (destination node), add up the relevant XDR client and from_proxy statistics: xdr_client_delete_success, xdr_client_delete_error, xdr_client_delete_timeout, xdr_client_delete_not_found, xdr_from_proxy_delete_success, xdr_from_proxy_delete_error, xdr_from_proxy_delete_timeout, xdr_from_proxy_delete_not_found.

Introduced

4.5.1

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns
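
The total described above is the sum of the eight XDR delete counters. A minimal sketch, assuming the namespace statistics have been loaded into a dictionary keyed by statistic name (values may be integers or numeric strings); the function name `total_xdr_deletes` is hypothetical.

```python
XDR_DELETE_STATS = (
    "xdr_client_delete_success",
    "xdr_client_delete_error",
    "xdr_client_delete_timeout",
    "xdr_client_delete_not_found",
    "xdr_from_proxy_delete_success",
    "xdr_from_proxy_delete_error",
    "xdr_from_proxy_delete_timeout",
    "xdr_from_proxy_delete_not_found",
)


def total_xdr_deletes(namespace_stats):
    """Sum the eight XDR delete counters; counters missing from the input count as 0."""
    return sum(int(namespace_stats.get(name, 0)) for name in XDR_DELETE_STATS)


# Hypothetical counter values for one namespace on one destination node.
stats = {
    "xdr_client_delete_success": 120,
    "xdr_client_delete_not_found": 5,
    "xdr_from_proxy_delete_success": "30",
    "xdr_from_proxy_delete_timeout": 1,
}
print(total_xdr_deletes(stats))  # 156
```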

`xdr_client_delete_not_found`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_xdr_client_delete_not_found
Datadog: aerospike.server.namespace.xdr_client_delete_not_found

Description

Number of delete requests initiated by XDR that failed on the namespace on this node due to the record not being found. For the total number of XDR initiated delete requests against this namespace on this node (destination node), add up the relevant XDR client and from_proxy statistics: xdr_client_delete_success, xdr_client_delete_error, xdr_client_delete_timeout, xdr_client_delete_not_found, xdr_from_proxy_delete_success, xdr_from_proxy_delete_error, xdr_from_proxy_delete_timeout, xdr_from_proxy_delete_not_found.

Introduced

4.5.1

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`xdr_client_delete_success`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_xdr_client_delete_success
Datadog: aerospike.server.namespace.xdr_client_delete_success

Description

Number of delete requests initiated by XDR that succeeded on the namespace on this node. For the total number of XDR initiated delete requests against this namespace on this node (destination node), add up the relevant XDR client and from_proxy statistics: xdr_client_delete_success, xdr_client_delete_error, xdr_client_delete_timeout, xdr_client_delete_not_found, xdr_from_proxy_delete_success, xdr_from_proxy_delete_error, xdr_from_proxy_delete_timeout, xdr_from_proxy_delete_not_found.

Introduced

4.5.1

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`xdr_client_delete_timeout`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_xdr_client_delete_timeout
Datadog: aerospike.server.namespace.xdr_client_delete_timeout

Description

Number of delete requests initiated by XDR that timed out on the namespace on this node. For the total number of XDR initiated delete requests against this namespace on this node (destination node), add up the relevant XDR client and from_proxy statistics: xdr_client_delete_success, xdr_client_delete_error, xdr_client_delete_timeout, xdr_client_delete_not_found, xdr_from_proxy_delete_success, xdr_from_proxy_delete_error, xdr_from_proxy_delete_timeout, xdr_from_proxy_delete_not_found.

Introduced

4.5.1

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`xdr_client_write_error`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_xdr_client_write_error
Datadog: aerospike.server.namespace.xdr_client_write_error

Description

Number of write requests initiated by XDR that failed on the namespace on this node. For the total number of XDR initiated write requests against this namespace on this node (destination node), add up the relevant XDR client and from_proxy statistics: xdr_client_write_success, xdr_client_write_error, xdr_client_write_timeout, xdr_from_proxy_write_success, xdr_from_proxy_write_error, xdr_from_proxy_write_timeout.

Introduced

4.5.1

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns
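
Because the six XDR write statistics are counters, activity over a period is the difference between two snapshots rather than the raw value. This is a sketch under two assumptions: snapshots are dictionaries keyed by statistic name, and a counter that goes down means the server restarted and the counter began again from zero. The function name `xdr_write_deltas` is hypothetical.

```python
XDR_WRITE_STATS = (
    "xdr_client_write_success",
    "xdr_client_write_error",
    "xdr_client_write_timeout",
    "xdr_from_proxy_write_success",
    "xdr_from_proxy_write_error",
    "xdr_from_proxy_write_timeout",
)


def xdr_write_deltas(previous, current):
    """Per-counter increase between two snapshots.

    If a counter decreased, assume a restart and use the current value as the delta.
    """
    deltas = {}
    for name in XDR_WRITE_STATS:
        before = previous.get(name, 0)
        after = current.get(name, 0)
        deltas[name] = after - before if after >= before else after
    return deltas


# Hypothetical snapshots taken one polling interval apart.
earlier = {"xdr_client_write_success": 100, "xdr_client_write_error": 2}
later = {
    "xdr_client_write_success": 160,
    "xdr_client_write_error": 2,
    "xdr_client_write_timeout": 1,
}
deltas = xdr_write_deltas(earlier, later)
print(sum(deltas.values()))  # 61 XDR write requests in the interval
```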

`xdr_client_write_success`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_xdr_client_write_success
Datadog: aerospike.server.namespace.xdr_client_write_success

Description

Number of write requests initiated by XDR that succeeded on the namespace on this node. For the total number of XDR initiated write requests against this namespace on this node (destination node), add up the relevant XDR client and from_proxy statistics: xdr_client_write_success, xdr_client_write_error, xdr_client_write_timeout, xdr_from_proxy_write_success, xdr_from_proxy_write_error, xdr_from_proxy_write_timeout.

Introduced

4.5.1

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`xdr_client_write_timeout`

watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_xdr_client_write_timeout
Datadog: aerospike.server.namespace.xdr_client_write_timeout

Description

Number of write requests initiated by XDR that timed out on the namespace on this node. For the total number of XDR initiated write requests against this namespace on this node (destination node), add up the relevant XDR client and from_proxy statistics: xdr_client_write_success, xdr_client_write_error, xdr_client_write_timeout, xdr_from_proxy_write_success, xdr_from_proxy_write_error, xdr_from_proxy_write_timeout.

Introduced

4.5.1

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`xdr_from_proxy_delete_error`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_xdr_from_proxy_delete_error
Datadog: aerospike.server.namespace.xdr_from_proxy_delete_error

Description

Number of errors for XDR delete commands proxied from another node. For the total number of XDR initiated delete requests against this namespace on this node (destination node), add up the relevant XDR client and from_proxy statistics: xdr_client_delete_success, xdr_client_delete_error, xdr_client_delete_timeout, xdr_client_delete_not_found, xdr_from_proxy_delete_success, xdr_from_proxy_delete_error, xdr_from_proxy_delete_timeout, xdr_from_proxy_delete_not_found.
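The eight delete counters can also be grouped by outcome before they are totaled. The following is a sketch under the assumption that the namespace statistics are available as a dictionary keyed by metric name; `xdr_delete_breakdown` and the sample values are hypothetical.

```python
XDR_DELETE_SOURCES = ("xdr_client_delete", "xdr_from_proxy_delete")
XDR_DELETE_OUTCOMES = ("success", "error", "timeout", "not_found")


def xdr_delete_breakdown(ns_stats):
    """Return {outcome: count}, summed over the client and from_proxy counters."""
    # Assumption: a counter missing from the input is treated as 0.
    return {
        outcome: sum(
            int(ns_stats.get(f"{source}_{outcome}", 0))
            for source in XDR_DELETE_SOURCES
        )
        for outcome in XDR_DELETE_OUTCOMES
    }


# Hypothetical sample values, for illustration only.
sample = {
    "xdr_client_delete_success": 40,
    "xdr_client_delete_not_found": 3,
    "xdr_from_proxy_delete_success": 5,
    "xdr_from_proxy_delete_timeout": 1,
}
breakdown = xdr_delete_breakdown(sample)
total_xdr_deletes = sum(breakdown.values())
print(breakdown, total_xdr_deletes)
```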

Introduced

4.5.1

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`xdr_from_proxy_delete_not_found`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_xdr_from_proxy_delete_not_found
Datadog: aerospike.server.namespace.xdr_from_proxy_delete_not_found

Description

Number of XDR delete commands proxied from another node that resulted in not found. For the total number of XDR initiated delete requests against this namespace on this node (destination node), add up the relevant XDR client and from_proxy statistics: xdr_client_delete_success, xdr_client_delete_error, xdr_client_delete_timeout, xdr_client_delete_not_found, xdr_from_proxy_delete_success, xdr_from_proxy_delete_error, xdr_from_proxy_delete_timeout, xdr_from_proxy_delete_not_found.

Introduced

4.5.1

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`xdr_from_proxy_delete_success`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_xdr_from_proxy_delete_success
Datadog: aerospike.server.namespace.xdr_from_proxy_delete_success

Description

Number of successful XDR delete commands proxied from another node. For the total number of XDR initiated delete requests against this namespace on this node (destination node), add up the relevant XDR client and from_proxy statistics: xdr_client_delete_success, xdr_client_delete_error, xdr_client_delete_timeout, xdr_client_delete_not_found, xdr_from_proxy_delete_success, xdr_from_proxy_delete_error, xdr_from_proxy_delete_timeout, xdr_from_proxy_delete_not_found.

Introduced

4.5.1

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`xdr_from_proxy_delete_timeout`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_xdr_from_proxy_delete_timeout
Datadog: aerospike.server.namespace.xdr_from_proxy_delete_timeout

Description

Number of timeouts for XDR delete commands proxied from another node. For the total number of XDR initiated delete requests against this namespace on this node (destination node), add up the relevant XDR client and from_proxy statistics: xdr_client_delete_success, xdr_client_delete_error, xdr_client_delete_timeout, xdr_client_delete_not_found, xdr_from_proxy_delete_success, xdr_from_proxy_delete_error, xdr_from_proxy_delete_timeout, xdr_from_proxy_delete_not_found.

Introduced

4.5.1

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`xdr_from_proxy_write_error`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_xdr_from_proxy_write_error
Datadog: aerospike.server.namespace.xdr_from_proxy_write_error

Description

Number of errors for XDR write commands proxied from another node. For the total number of XDR initiated write requests against this namespace on this node (destination node), add up the relevant XDR client and from_proxy statistics: xdr_client_write_success, xdr_client_write_error, xdr_client_write_timeout, xdr_from_proxy_write_success, xdr_from_proxy_write_error, xdr_from_proxy_write_timeout.

Introduced

4.5.1

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`xdr_from_proxy_write_success`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_xdr_from_proxy_write_success
Datadog: aerospike.server.namespace.xdr_from_proxy_write_success

Description

Number of successful XDR write commands proxied from another node. For the total number of XDR initiated write requests against this namespace on this node (destination node), add up the relevant XDR client and from_proxy statistics: xdr_client_write_success, xdr_client_write_error, xdr_client_write_timeout, xdr_from_proxy_write_success, xdr_from_proxy_write_error, xdr_from_proxy_write_timeout.

Introduced

4.5.1

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`xdr_from_proxy_write_timeout`

optional

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_xdr_from_proxy_write_timeout
Datadog: aerospike.server.namespace.xdr_from_proxy_write_timeout

Description

Number of timeouts for XDR write commands proxied from another node. For the total number of XDR initiated write requests against this namespace on this node (destination node), add up the relevant XDR client and from_proxy statistics: xdr_client_write_success, xdr_client_write_error, xdr_client_write_timeout, xdr_from_proxy_write_success, xdr_from_proxy_write_error, xdr_from_proxy_write_timeout.

Introduced

4.5.1

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`xdr_tombstones`

enterprise watch

Context

namespace

Backend-specific Name

Prometheus: aerospike_namespace_xdr_tombstones
Datadog: aerospike.server.namespace.xdr_tombstones

Description

Number of tombstones on this node which are created by XDR for non-durable client deletes. This includes both master and prole.

Introduced

5.0.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

Detail

For namespaces configured with XDR, non-durable delete transactions create XDR tombstones (not to be confused with the durable delete tombstones).

XDR tombstones are deleted after they have been shipped via XDR. The XDR tomb raider runs as specified in xdr-tomb-raider-period and uses xdr-tomb-raider-threads to reduce the index and delete XDR tombstones where the last update time (LUT) is older than the current global last ship time (GLST). The GLST is computed as the lowest value across the last ship time (LST) of all the partitions for the namespace. This is done by having each node send the LST for each partition they own to the principal node which then determines the lowest value and sends it back to all nodes in the cluster via the system metadata (SMD) fabric channel.
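The GLST rule above can be illustrated with a small sketch. It is not the server's implementation: the function names, the use of plain integers for timestamps, and the dictionary inputs (partition ID to LST, record key to LUT) are assumptions made for illustration.

```python
def global_last_ship_time(partition_lsts):
    """GLST: the lowest last ship time (LST) across all partitions of the namespace."""
    return min(partition_lsts.values())


def removable_xdr_tombstones(tombstone_luts, glst):
    """Keys of XDR tombstones whose last update time (LUT) is older than the GLST."""
    return [key for key, lut in tombstone_luts.items() if lut < glst]


# Hypothetical timestamps, for illustration only.
lsts = {0: 1500, 1: 1200, 2: 1800}
tombstones = {"rec-a": 1000, "rec-b": 1200, "rec-c": 1300}

glst = global_last_ship_time(lsts)
print(glst, removable_xdr_tombstones(tombstones, glst))
```

Because the GLST is a minimum, a single partition that is behind on shipping holds back tombstone removal for the whole namespace.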

Node_stats

`batch_index_complete`

watch

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_batch_index_complete
Datadog: aerospike.server.node_stats.batch_index_complete

Description

Number of batch index requests completed.

Introduced

3.6.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`batch_index_created_buffers`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_batch_index_created_buffers
Datadog: aerospike.server.node_stats.batch_index_created_buffers

Description

Number of 128KB response buffers created. Response buffers are created when there are no buffers left in the pool. If this number consistently increases and there is available memory, you should increase batch-max-unused-buffers.

Introduced

3.6.4

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`batch_index_delay`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_batch_index_delay
Datadog: aerospike.server.node_stats.batch_index_delay

Description

Number of times a batch index response buffer has been delayed (WOULDBLOCK on the send). The number of times a batch index transaction is completely abandoned because it went over its overall allocated time after being delayed is counted under the batch_index_error statistic and will have a WARNING log message associated.

Introduced

4.1.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`batch_index_destroyed_buffers`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_batch_index_destroyed_buffers
Datadog: aerospike.server.node_stats.batch_index_destroyed_buffers

Description

Number of 128KB response buffers destroyed. Response buffers are destroyed when there is no slot left to put the buffer back into the pool. The maximum response buffer pool size is batch-max-unused-buffers.

Introduced

3.6.4

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`batch_index_error`

warn

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_batch_index_error
Datadog: aerospike.server.node_stats.batch_index_error

Description

Number of batch index requests that completed with an error. This happens, for example, when the client has timed out but the server is still attempting to send response buffers back. It also happens when the server abandons the transaction because delays (WOULDBLOCK on send) while sending response buffers back exceeded twice the total timeout set by the client, or 30 seconds if no timeout is set. This is accompanied by a WARNING log message. Starting with Database 6.4.0, the limit is the total timeout set by the client rather than twice that value. Each encountered delay is counted under the batch_index_delay statistic.

Introduced

3.9.0

Removed

Measurement type

counter

Data type

integer

Monitoring

Compare batch_index_error to batch_index_complete. If the ratio is higher than acceptable, alert Operations to investigate.
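Because both statistics are cumulative counters, one way to make this comparison is over the change between two samples. The sketch below assumes two dictionaries of node statistics taken at different times. The helper name and the sample values are hypothetical.

```python
def batch_index_error_ratio(previous, current):
    """Ratio of new batch index errors to new completions between two samples."""
    errors = int(current["batch_index_error"]) - int(previous["batch_index_error"])
    completed = int(current["batch_index_complete"]) - int(previous["batch_index_complete"])
    if completed <= 0:
        # No completions in the interval: report 0.0 unless errors still occurred.
        return float("inf") if errors > 0 else 0.0
    return errors / completed


# Hypothetical samples, for illustration only.
earlier = {"batch_index_complete": 1000, "batch_index_error": 10}
later = {"batch_index_complete": 2000, "batch_index_error": 30}
print(batch_index_error_ratio(earlier, later))
```

What counts as an acceptable ratio depends on the workload, so the alerting threshold is left to the operator.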

Labels

cluster_name, job, service, instance, longitude, latitude

`batch_index_huge_buffers`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_batch_index_huge_buffers
Datadog: aerospike.server.node_stats.batch_index_huge_buffers

Description

Number of temporary response buffers created that exceeded 128KB. Huge buffers are created when a retrieved record is larger than 128KB. Huge records do not benefit from batching and can result in excessive memory thrashing on the server. The batch_index_created_buffers and batch_index_destroyed_buffers statistics include the huge buffers created and destroyed.

Introduced

3.6.4

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`batch_index_initiate`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_batch_index_initiate
Datadog: aerospike.server.node_stats.batch_index_initiate

Description

Number of batch index requests received.

Introduced

3.6.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`batch_index_proto_compression_ratio`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_batch_index_proto_compression_ratio
Datadog: aerospike.server.node_stats.batch_index_proto_compression_ratio

Description

Measures the average compressed size to uncompressed size ratio for protocol message data in batch index responses. Thus 1.000 indicates no compression and 0.100 indicates a 1:10 compression ratio (90% reduction in size).
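The relationship between the reported ratio and the size reduction can be written out as a one-line conversion. This is a sketch only, and the function name is hypothetical.

```python
def size_reduction_pct(compression_ratio):
    """Convert a compressed/uncompressed size ratio into a percentage reduction."""
    return round((1.0 - compression_ratio) * 100.0, 3)


print(size_reduction_pct(1.000))  # no compression
print(size_reduction_pct(0.100))  # 1:10 compression
```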

Introduced

4.8.0

Removed

Measurement type

moving average

Data type

decimal

Labels

cluster_name, job, service, instance, longitude, latitude

`batch_index_proto_uncompressed_pct`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_batch_index_proto_uncompressed_pct
Datadog: aerospike.server.node_stats.batch_index_proto_uncompressed_pct

Description

Measures the percentage of batch index responses with uncompressed protocol message data. Thus 0.000 indicates all responses with compressed data, and 100.000 indicates no responses with compressed data. For example, if protocol message data compression is not used, this metric will remain set to 0.000. If protocol message data compression is then turned on and all responses are compressed, this metric will remain set to 0.000. The only way this metric will ever be set to a value different than 0.000 is if compression is used, but some responses are not compressed (which happens when the uncompressed size is so small that the server does not try to compress, or when the compression fails).

Introduced

4.8.0

Removed

Measurement type

gauge

Data type

decimal

Labels

cluster_name, job, service, instance, longitude, latitude

`batch_index_queue`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_batch_index_queue

Description

Number of batch index requests (transactions count) processed and response buffer blocks used on each batch queue. Format: Q1_REQUESTS:Q1_BUFFERS, Q2_REQUESTS:Q2_BUFFERS, ...

The buffer block counter is decremented on batch responses before the transaction count is decremented. Therefore, it is possible for a buffer slot to become available on the queue, and for the transaction count to be incremented for a new batch transaction, before the count is decremented for the previous one. It is also possible that multiple transactions came in for a thread for which none of the response buffers have been created yet. Finally, batch_index_huge_buffers are counted as part of the buffer blocks used on each batch queue.
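A value in the format described above can be split into per-queue pairs with a few lines of parsing. The following is a minimal sketch: the function name and the sample value are hypothetical, and the parser assumes comma-separated REQUESTS:BUFFERS items with optional whitespace.

```python
from typing import List, Tuple


def parse_batch_index_queue(value: str) -> List[Tuple[int, int]]:
    """Return one (requests, buffers) tuple per batch queue, in queue order."""
    queues = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing comma or an empty value
        requests, buffers = item.split(":")
        queues.append((int(requests), int(buffers)))
    return queues


# Hypothetical value for a node with three batch queues.
print(parse_batch_index_queue("2:5, 0:0, 1:3"))
```

As noted above, the two numbers in a pair are not updated atomically, so a momentary mismatch between requests and buffers does not by itself indicate a problem.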

Introduced

3.6.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`batch_index_timeout`

watch

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_batch_index_timeout
Datadog: aerospike.server.node_stats.batch_index_timeout

Description

Number of batch index requests that timed out on the server before being processed. These are caused by a batch subtransaction that has timed out for this batch index transaction. The overall time allowed for a batch index transaction on the server is not bounded, except if a delay is encountered (WOULDBLOCK on send).

For Database 4.1.0 through 6.3.0, the overall batch index transaction max delay time is twice the total timeout set by the client, or 30 seconds if there is no timeout set by the client.

For Database 6.4.0 and later, the overall batch index transaction max delay time is the same as set by the client, or 30 seconds if there is no timeout set by the client.
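The two version-dependent rules above can be summarized in a short sketch. The function name is hypothetical, the server version is assumed to be given as a tuple of integers, and a client timeout of `None` is assumed to mean that the client set no timeout. Versions before 4.1.0 are rejected because this reference does not describe their behavior.

```python
DEFAULT_MAX_DELAY_SECONDS = 30.0


def max_batch_index_delay_seconds(server_version, client_timeout_seconds=None):
    """Maximum overall delay allowed for a batch index transaction, in seconds."""
    if server_version < (4, 1, 0):
        raise ValueError("behavior before Database 4.1.0 is not described here")
    if client_timeout_seconds is None:
        return DEFAULT_MAX_DELAY_SECONDS
    if server_version >= (6, 4, 0):
        return float(client_timeout_seconds)
    # Database 4.1.0 through 6.3.0: twice the client's total timeout.
    return 2.0 * client_timeout_seconds


print(max_batch_index_delay_seconds((5, 7, 0), 1.0))
print(max_batch_index_delay_seconds((6, 4, 0), 1.0))
print(max_batch_index_delay_seconds((7, 0, 0)))
```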

Introduced

3.6.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`batch_index_unused_buffers`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_batch_index_unused_buffers
Datadog: aerospike.server.node_stats.batch_index_unused_buffers

Description

Number of available 128KB response buffers currently in the buffer pool.

Introduced

3.6.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`client_connections`

critical

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_client_connections
Datadog: aerospike.server.node_stats.client_connections

Description

Number of active client connections to this node. Also available in the log on the fds proto ticker line.

Introduced

Removed

Measurement type

gauge

Data type

integer

Monitoring

If client_connections is below an expected low value, then this condition might indicate a problem with the network between clients and server.
If client_connections is greater than an expected high value, then this condition might indicate a problem with clients rapidly opening and closing sockets.
If client_connections is at or near proto_fd_max, then the server is either currently unable to accept new connections or might soon be unable to do so.
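The three conditions above can be sketched as a single check. The function name, the finding labels, and the `near_limit_fraction` parameter (how close to proto_fd_max counts as "near") are hypothetical. The expected low and high values have to come from the operator's own baseline.

```python
def check_client_connections(client_connections, proto_fd_max,
                             expected_low, expected_high,
                             near_limit_fraction=0.9):
    """Return a list of findings for the current client_connections value."""
    findings = []
    if client_connections < expected_low:
        # Possible network problem between clients and server.
        findings.append("below_expected_low")
    if client_connections > expected_high:
        # Possibly clients rapidly opening and closing sockets.
        findings.append("above_expected_high")
    if client_connections >= near_limit_fraction * proto_fd_max:
        # The server cannot, or soon might not, accept new connections.
        findings.append("near_proto_fd_max")
    return findings


# Hypothetical values, for illustration only.
print(check_client_connections(14500, proto_fd_max=15000,
                               expected_low=200, expected_high=5000))
```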

Labels

cluster_name, job, service, instance, longitude, latitude

`client_connections_closed`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_client_connections_closed
Datadog: aerospike.server.node_stats.client_connections_closed

Description

Number of client connections that have been closed. One of client_connections_opened or client_connections_closed should be closely monitored or alerted against. Also available in the log on the fds proto ticker line.

Introduced

5.6.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`client_connections_opened`

critical

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_client_connections_opened
Datadog: aerospike.server.node_stats.client_connections_opened

Description

Number of client connections created to this node since the node was started. One of client_connections_opened or client_connections_closed should be closely monitored or alerted against. Also available in the log on the fds proto ticker line.

Introduced

5.6.0

Removed

Measurement type

counter

Data type

integer

Monitoring

If client_connections_opened changes unexpectedly without clients having been added or removed, or a significant change in workload having occurred, this condition might indicate a slowdown on a node or a connectivity issue on the node.

Labels

cluster_name, job, service, instance, longitude, latitude

`cluster_clock_skew_ms`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_cluster_clock_skew_ms
Datadog: aerospike.server.node_stats.cluster_clock_skew_ms

Description

Current maximum clock skew in milliseconds between nodes in a cluster. Will trigger clock_skew_stop_writes when breaching the cluster_clock_skew_stop_writes_sec threshold. This threshold is normally 20 seconds for strong-consistency namespaces on any Aerospike version, or 40 seconds for AP namespaces where NSUP is enabled (nsup-period is not zero) in Database 4.5.1 or later.

Introduced

4.0.0.4

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`cluster_clock_skew_stop_writes_sec`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_cluster_clock_skew_stop_writes_sec
Datadog: aerospike.server.node_stats.cluster_clock_skew_stop_writes_sec

Description

The threshold at which any namespace that is set to strong-consistency stops accepting writes due to clock skew (cluster_clock_skew_ms).

This value is in seconds, not milliseconds.

Although this value shows as 0 for AP namespaces, starting with Database 4.5.1, these namespaces stop accepting writes if NSUP is enabled (nsup-period is not zero) and the clock skew exceeds 40 seconds.
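Since the skew is reported in milliseconds and the threshold in seconds, a comparison needs a unit conversion. The sketch below combines that with the AP rule described above. It assumes Database 4.5.1 or later, the function and parameter names are hypothetical, and treating "breaching" as strictly greater than the threshold is also an assumption.

```python
AP_NSUP_SKEW_LIMIT_SEC = 40  # AP namespaces with NSUP enabled, per the text above


def skew_stops_writes(cluster_clock_skew_ms, cluster_clock_skew_stop_writes_sec,
                      strong_consistency, nsup_period=0):
    """Return True if the reported clock skew would stop writes for a namespace."""
    if strong_consistency:
        limit_sec = cluster_clock_skew_stop_writes_sec
    elif nsup_period != 0:
        limit_sec = AP_NSUP_SKEW_LIMIT_SEC
    else:
        return False  # AP namespace with NSUP disabled: no skew-based stop
    return cluster_clock_skew_ms > limit_sec * 1000


# Hypothetical readings, for illustration only.
print(skew_stops_writes(25_000, 20, strong_consistency=True))
print(skew_stops_writes(25_000, 0, strong_consistency=False, nsup_period=120))
```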

Introduced

4.0.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`cluster_generation`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_cluster_generation
Datadog: aerospike.server.node_stats.cluster_generation

Description

A 64-bit unsigned integer incremented on a node for every successful cluster partition rebalance or transition to orphan state. This is a node-local value and does not need to be the same across the cluster.

Introduced

4.3.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`cluster_integrity`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_cluster_integrity
Datadog: aerospike.server.node_stats.cluster_integrity

Description

When false, indicates integrity issues within the cluster, meaning that some nodes are either faulty or dead. A node in the succession list is deemed faulty if the node is alive and it reports that it is an orphan or is part of some other cluster. A node is also deemed faulty if it is alive but has a clustering protocol identifier that does not match the rest of the cluster. When true, indicates that the cluster is in a whole and complete state (as far as the nodes that it sees and is able to connect to are concerned). Information about a cluster integrity fault is also logged to the server log file repeatedly.

Introduced

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`cluster_is_member`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_cluster_is_member
Datadog: aerospike.server.node_stats.cluster_is_member

Description

When false, indicates that the node is not joined to a cluster; that is, it is an orphan. When true, indicates that the node is joined to a cluster.

Introduced

3.13.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`cluster_key`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_cluster_key

Description

Randomly generated 64-bit hexadecimal string used to name the last Paxos cluster state agreement.

Introduced

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`cluster_max_compatibility_id`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_cluster_max_compatibility_id
Datadog: aerospike.server.node_stats.cluster_max_compatibility_id

Description

Each node has a compatibility ID that is an integer based on the node’s database version. During upgrades, this value is used to determine software compatibility. cluster_max_compatibility_id indicates the cluster’s maximum software version. See cluster_min_compatibility_id.

Introduced

5.0.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`cluster_min_compatibility_id`

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_cluster_min_compatibility_id
Datadog: aerospike.server.node_stats.cluster_min_compatibility_id

Description

Each node has a compatibility ID that is an integer based on the node’s database version. During upgrades, this value is used to determine software compatibility. cluster_min_compatibility_id indicates the cluster’s minimum software version. See cluster_max_compatibility_id.

Introduced

5.0.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`cluster_principal`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_cluster_principal

Description

This specifies the Node ID of the current cluster principal. Will be ‘0’ on an orphan node.

Introduced

4.3.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`cluster_size`

critical

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_cluster_size
Datadog: aerospike.server.node_stats.cluster_size

Description

Size of the cluster. Can be checked to make sure the size of the cluster is the expected one after adding or removing a node. Check across all nodes in a cluster.

Introduced

Removed

Measurement type

gauge

Data type

integer

Monitoring

If cluster_size does not equal the expected cluster size and the cluster is not undergoing maintenance, your operations group needs to investigate.
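Because each node reports its own view of the cluster, the check is most useful when applied to every node. The following sketch assumes the per-node cluster_size values have already been collected into a dictionary. The function name, the `in_maintenance` flag, and the node names are hypothetical.

```python
def nodes_with_unexpected_cluster_size(sizes_by_node, expected_size,
                                       in_maintenance=False):
    """Return the nodes whose reported cluster_size differs from the expected size."""
    if in_maintenance:
        return []  # size changes are expected while the cluster is under maintenance
    return sorted(
        node for node, size in sizes_by_node.items() if size != expected_size
    )


# Hypothetical readings from a three-node cluster, for illustration only.
readings = {"node-a": 3, "node-b": 3, "node-c": 1}
print(nodes_with_unexpected_cluster_size(readings, expected_size=3))
```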

Labels

cluster_name, job, service, instance, longitude, latitude

`demarshal_error`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_demarshal_error
Datadog: aerospike.server.node_stats.demarshal_error

Description

Number of errors during the demarshal step.

Introduced

3.9.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`deprecated_requests`

watch

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_deprecated_requests

Description

Number of times a deprecated feature has been used.

Introduced

8.1.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, service

`early_tsvc_batch_sub_error`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_early_tsvc_batch_sub_error
Datadog: aerospike.server.node_stats.early_tsvc_batch_sub_error

Description

Number of errors early in the transaction for batch subtransactions. For example, bad/unknown namespace name or security authentication errors.

The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc errors happen before records are accessed for reads or writes. They’re counted separately from tsvc timeouts.

Introduced

3.9.0

Removed

7.2.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`early_tsvc_client_error`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_early_tsvc_client_error
Datadog: aerospike.server.node_stats.early_tsvc_client_error

Description

Number of errors early in the transaction for direct client requests. Those include transactions hitting the proto-fd-max, transactions with a bad/unknown namespace name or security authentication errors. Those also include cases where partitions are unavailable in AP mode, when clients attempt transactions against an orphan node.

The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc errors happen before records are accessed for reads or writes. They’re counted separately from tsvc timeouts.

Introduced

3.9.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`early_tsvc_from_proxy_batch_sub_error`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_early_tsvc_from_proxy_batch_sub_error
Datadog: aerospike.server.node_stats.early_tsvc_from_proxy_batch_sub_error

Description

Number of errors early in the transaction for batch subtransactions proxied from another node. For example, bad or unknown namespace name or security authentication errors.

The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc errors happen before records are accessed for reads or writes. They’re counted separately from tsvc timeouts.

Introduced

4.5.1

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`early_tsvc_from_proxy_error`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_early_tsvc_from_proxy_error
Datadog: aerospike.server.node_stats.early_tsvc_from_proxy_error

Description

Number of errors early in the transaction for commands, other than batch subtransactions, proxied from another node. For example, bad or unknown namespace name or security authentication errors.

The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc errors happen before records are accessed for reads or writes. They’re counted separately from tsvc timeouts.

Introduced

4.5.1

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`early_tsvc_ops_sub_error`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_early_tsvc_ops_sub_error
Datadog: aerospike.server.node_stats.early_tsvc_ops_sub_error

Description

Number of errors early in an internal ops subtransaction (records accessed by a background query operate command). For example, bad or unknown namespace name or security authentication errors.

The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc errors happen before records are accessed for reads or writes. They’re counted separately from tsvc timeouts.

Introduced

4.7.0

Removed

7.2.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`early_tsvc_udf_sub_error`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_early_tsvc_udf_sub_error
Datadog: aerospike.server.node_stats.early_tsvc_udf_sub_error

Description

Number of errors early in the transaction for UDF subtransactions. For example, bad or unknown namespace name or security authentication errors.

The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc errors happen before records are accessed for reads or writes. They’re counted separately from tsvc timeouts.

Introduced

3.9.0

Removed

7.2.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`entries_per_bval`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_entries_per_bval
Datadog: aerospike.server.sindex.entries_per_bval

Description

Ratio of entries to unique bvals (bin values) for a given secondary index on the node. The value is an integer (rounded to the nearest integer) and is calculated using hyperloglog estimates for unique bvals. The stat is generated by a background process. A value of 0 means the stat is not yet generated. The process runs when the secondary index is created and populated, at startup and every hour thereafter. A low value means that the index is highly selective.

Introduced

6.1.0

Removed

Measurement type

gauge

Data type

integer

Monitoring

This stat appears in the response to the sindex-stat info command to retrieve statistics for a specified namespace and index. For example, asinfo -v 'sindex-stat:ns=namespace1;indexname=index21'.
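As a sketch of how the value might be read from such a response, the code below assumes the response is a semicolon-separated list of key=value pairs. The response string shown is invented for illustration and is not captured server output, and the helper names are hypothetical.

```python
def parse_info_pairs(response):
    """Parse 'key=value;key=value' text into a dictionary of strings."""
    stats = {}
    for pair in response.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            stats[key] = value
    return stats


def entries_per_bval(stats):
    """Return the ratio as an int, or None if the stat is not yet generated."""
    value = int(stats.get("entries_per_bval", 0))
    return value if value != 0 else None  # 0 means not yet generated


# Invented response text, for illustration only.
response = "entries=120000;entries_per_bval=4;entries_per_rec=1"
print(entries_per_bval(parse_info_pairs(response)))
```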

Labels

cluster_name, job, service, instance, longitude, latitude

`entries_per_rec`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_entries_per_rec
Datadog: aerospike.server.sindex.entries_per_rec

Description

Ratio of entries to unique records for a given secondary index on the node. This value will always be 1 if it is not a list or map secondary index. The value is an integer (rounded to the nearest integer) and is calculated using hyperloglog estimates for unique recs. The stat is generated by a background process. A value of 0 means the stat is not yet generated. The process runs at startup, every hour thereafter, and when a secondary index is created and populated.

Introduced

6.1.0

Removed

Measurement type

gauge

Data type

integer

Monitoring

This stat appears in the response to the sindex-stat info command to retrieve statistics for a specified namespace and index. For example, asinfo -v 'sindex-stat:ns=namespace1;indexname=index21'.

Labels

cluster_name, job, service, instance, longitude, latitude

`err_storage_defrag_fd_get`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_err_storage_defrag_fd_get

Description

Removed

Introduced

Removed

Yes

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`err_sync_copy_null_node`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_err_sync_copy_null_node

Description

Number of errors during cluster state exchange because of missing general node information.

Introduced

Removed

Yes

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`fabric_bulk_recv_rate`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_fabric_bulk_recv_rate
Datadog: aerospike.server.node_stats.fabric_bulk_recv_rate

Description

Rate of traffic (bytes/sec) received by the fabric bulk channel during the last ticker-interval (every 10 seconds by default).

Introduced

3.11.1.1

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`fabric_bulk_send_rate`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_fabric_bulk_send_rate
Datadog: aerospike.server.node_stats.fabric_bulk_send_rate

Description

Rate of traffic (bytes/sec) sent by the fabric bulk channel during the last ticker-interval (every 10 seconds by default).

Introduced

3.11.1.1

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`fabric_connections`

watch

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_fabric_connections
Datadog: aerospike.server.node_stats.fabric_connections

Description

Number of active fabric connections to this node. Also available in the log on the fds proto ticker line.

Introduced

3.9.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`fabric_connections_closed`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_fabric_connections_closed
Datadog: aerospike.server.node_stats.fabric_connections_closed

Description

Number of fabric connections that have been closed. Also available in the log on the fds proto ticker line.

Introduced

5.6.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`fabric_connections_opened`

critical

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_fabric_connections_opened
Datadog: aerospike.server.node_stats.fabric_connections_opened

Description

Number of fabric connections created to this node since the node was started. Also available in the log on the fds proto ticker line.

Introduced

5.6.0

Removed

Measurement type

counter

Data type

integer

Monitoring

If fabric_connections_opened changes unexpectedly, raise an alert, as this condition would indicate a connectivity problem with a node or a cluster change.
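One way to detect an unexpected change in a counter such as this is to compare consecutive samples. The sketch below is a minimal illustration, not part of any Aerospike tooling; the function names and the `allowed_new` parameter are hypothetical, and treating a decrease as a node restart is an assumption.

```python
def counter_delta(previous: int, current: int) -> int:
    """Return the increase between two samples of a counter.

    A current value lower than the previous one is treated as a node
    restart (assumption: the counter starts again from zero).
    """
    if current < previous:
        return current
    return current - previous


def should_alert(previous: int, current: int, allowed_new: int = 0) -> bool:
    """Alert when more new connections were opened than the caller expects."""
    return counter_delta(previous, current) > allowed_new


print(should_alert(previous=120, current=120))  # no new connections
print(should_alert(previous=120, current=126))  # six new connections
```

Setting `allowed_new` above zero leaves room for planned events such as a rolling restart.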

Labels

cluster_name, job, service, instance, longitude, latitude

`fabric_ctrl_recv_rate`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_fabric_ctrl_recv_rate
Datadog: aerospike.server.node_stats.fabric_ctrl_recv_rate

Description

Rate of traffic (bytes/sec) received by the fabric ctrl channel during the last ticker-interval (every 10 seconds by default).

Introduced

3.11.1.1

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`fabric_ctrl_send_rate`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_fabric_ctrl_send_rate
Datadog: aerospike.server.node_stats.fabric_ctrl_send_rate

Description

Rate of traffic (bytes/sec) sent by the fabric ctrl channel during the last ticker-interval (every 10 seconds by default).

Introduced

3.11.1.1

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`fabric_meta_recv_rate`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_fabric_meta_recv_rate
Datadog: aerospike.server.node_stats.fabric_meta_recv_rate

Description

Rate of traffic (bytes/sec) received by the fabric meta channel during the last ticker-interval (every 10 seconds by default).

Introduced

3.11.1.1

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`fabric_meta_send_rate`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_fabric_meta_send_rate
Datadog: aerospike.server.node_stats.fabric_meta_send_rate

Description

Rate of traffic (bytes/sec) sent by the fabric meta channel during the last ticker-interval (every 10 seconds by default).

Introduced

3.11.1.1

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`fabric_rw_recv_rate`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_fabric_rw_recv_rate
Datadog: aerospike.server.node_stats.fabric_rw_recv_rate

Description

Rate of traffic (bytes/sec) received by the fabric rw channel during the last ticker-interval (every 10 seconds by default).

Introduced

3.11.1.1

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`fabric_rw_send_rate`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_fabric_rw_send_rate
Datadog: aerospike.server.node_stats.fabric_rw_send_rate

Description

Rate of traffic (bytes/sec) sent by the fabric rw channel during the last ticker-interval (every 10 seconds by default).

Introduced

3.11.1.1

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`failed_best_practices`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_failed_best_practices
Datadog: aerospike.server.node_stats.failed_best_practices

Description

True if any of the best practices checked at server startup were violated; otherwise false. Each failed best practice logs a unique warning message, and the list of failed best practices can be queried using the best-practices info command.

Introduced

5.7.0

Removed

Measurement type

gauge

Data type

boolean

Labels

cluster_name, job, service, instance, longitude, latitude

`heap_active_kbytes`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_heap_active_kbytes
Datadog: aerospike.server.node_stats.heap_active_kbytes

Description

The amount of memory in in-use pages, in KiB. An in-use page is a page that has some allocated memory (either partial or full).

Introduced

3.10.1

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`heap_allocated_kbytes`

watch

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_heap_allocated_kbytes
Datadog: aerospike.server.node_stats.heap_allocated_kbytes

Description

The amount of memory, in KiB, allocated by the asd daemon. The heap_allocated_kbytes / heap_active_kbytes ratio (6.0 or later) and heap_allocated_kbytes / heap_mapped_kbytes ratio (prior to 6.0) (also provided under heap_efficiency_pct) provide a picture of the fragmentation of the heap. This is for all memory usage except for the shared memory parts (for the primary index in the Enterprise Edition).

Introduced

3.10.1

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`heap_efficiency_pct`

warn

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_heap_efficiency_pct
Datadog: aerospike.server.node_stats.heap_efficiency_pct

Description

Provides an indication of the jemalloc heap fragmentation. This represents the heap_allocated_kbytes / heap_active_kbytes ratio. A lower number indicates a higher fragmentation rate.

Introduced

3.10.1

Removed

Measurement type

gauge

Data type

integer

Monitoring

If heap_efficiency_pct goes below 60% or 50% (depending on configuration), advise your operations group to investigate.
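The ratio and the threshold check can be sketched as follows. This is an illustration only: the function names are hypothetical, the rounding used by the server is not stated in this reference and is assumed here, and the 60% default simply mirrors the guidance above.

```python
def heap_efficiency_pct(allocated_kbytes: int, active_kbytes: int) -> int:
    """heap_allocated_kbytes / heap_active_kbytes, expressed as a percentage."""
    if active_kbytes <= 0:
        return 0  # avoid dividing by zero when no pages are in use
    return round(100 * allocated_kbytes / active_kbytes)


def needs_investigation(efficiency_pct: int, threshold_pct: int = 60) -> bool:
    """True when efficiency has dropped below the chosen threshold."""
    return efficiency_pct < threshold_pct


pct = heap_efficiency_pct(allocated_kbytes=750, active_kbytes=1000)
print(pct, needs_investigation(pct))
```

A lower percentage means that more of the in-use pages hold unallocated space, that is, higher fragmentation.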

Labels

cluster_name, job, service, instance, longitude, latitude

`heap_mapped_kbytes`

watch

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_heap_mapped_kbytes
Datadog: aerospike.server.node_stats.heap_mapped_kbytes

Description

Amount of memory in mapped pages, in KiB; that is, the amount of memory that jemalloc (JEM) received from the Linux kernel. Should be a multiple of 4, because the typical page size is 4 KiB (4096 bytes).

Introduced

3.10.1

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`heap_site_count`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_heap_site_count
Datadog: aerospike.server.node_stats.heap_site_count

Description

Number of distinct sites in the server code (specific locations in server functions) that have allocated heap memory designated for tracking as governed by the debug-allocations setting from the time when the server was started. The heap_site_count is only nonzero when debug-allocations is set to a value other than none. The heap_site_count value can only increase.

Introduced

3.14.1

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`heartbeat_connections`

watch

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_heartbeat_connections
Datadog: aerospike.server.node_stats.heartbeat_connections

Description

Number of active heartbeat connections to this node. Also available in the log on the fds proto ticker line.

Introduced

3.9.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`heartbeat_connections_closed`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_heartbeat_connections_closed
Datadog: aerospike.server.node_stats.heartbeat_connections_closed

Description

Number of heartbeat connections that have been closed. Also available in the log on the fds proto ticker line.

Introduced

5.6.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`heartbeat_connections_opened`

critical

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_heartbeat_connections_opened
Datadog: aerospike.server.node_stats.heartbeat_connections_opened

Description

Number of heartbeat connections created to this node since the node was started. Also available in the log on the fds proto ticker line.

Introduced

5.6.0

Removed

Measurement type

counter

Data type

integer

Monitoring

If heartbeat_connections_opened changes unexpectedly, raise an alert, as this condition would indicate a connectivity problem with a node or a cluster change.

Labels

cluster_name, job, service, instance, longitude, latitude

`heartbeat_received_foreign`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_heartbeat_received_foreign
Datadog: aerospike.server.node_stats.heartbeat_received_foreign

Description

Total number of heartbeats received from remote nodes.

Introduced

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`heartbeat_received_self`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_heartbeat_received_self
Datadog: aerospike.server.node_stats.heartbeat_received_self

Description

Total number of multicast heartbeats from this node received by this node. Will be 0 for mesh.

Introduced

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`info_complete`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_info_complete
Datadog: aerospike.server.node_stats.info_complete

Description

Number of info requests completed.

Introduced

3.9.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`info_queue`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_info_queue
Datadog: aerospike.server.node_stats.info_queue

Description

Number of info requests pending in info queue.

Introduced

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`info_timeout`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_info_timeout
Datadog: aerospike.server.node_stats.info_timeout

Description

Tracks total timed-out info transactions. Related to info-max-ms.

Introduced

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`long_queries_active`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_long_queries_active
Datadog: aerospike.server.node_stats.long_queries_active

Description

Number of queries currently active (formerly queries_active or scans_active). The long_queries_active stat is shared by both primary index (PI) queries and secondary index (SI) queries. Only long queries are monitored.

Introduced

6.1.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`migrate_allowed`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_migrate_allowed
Datadog: aerospike.server.node_stats.migrate_allowed

Description

Indicates whether migrations are allowed on a node: true when allowed, false when not. When there is a change in a cluster, this statistic’s value changes to false until the rebalance is completed across all namespaces. The rebalance is the step that determines all the partition migrations that need to be scheduled; it is not the migrations themselves but the process that precedes them. A migrate_allowed value of true indicates that all migration-related statistics have been set and can be used programmatically (for example, migrate_partitions_remaining to check whether migrations are ongoing).

Introduced

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`migrate_partitions_remaining`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_migrate_partitions_remaining
Datadog: aerospike.server.node_stats.migrate_partitions_remaining

Description

Number of partitions remaining to migrate (in either direction). When migrate_allowed is true, this stat accurately indicates whether migrations are complete for a single node across all namespaces. For a short period after a reclustering event, this statistic may show 0 although migrations have not yet started; during that time, migrate_allowed returns false.
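The combined check described above can be sketched as a small helper. This is a minimal illustration that assumes the node statistics have already been collected into a dictionary of strings; the function name `migrations_complete` is hypothetical.

```python
def migrations_complete(stats: dict[str, str]) -> bool:
    """Return True only when the node reports that migrations have finished.

    migrate_partitions_remaining alone is not enough: it can read 0 right
    after a reclustering event, before the rebalance has scheduled any
    migrations. migrate_allowed is false during that window.
    """
    allowed = stats.get("migrate_allowed", "false").lower() == "true"
    remaining = int(stats.get("migrate_partitions_remaining", "0"))
    return allowed and remaining == 0


# Made-up sample values for illustration.
print(migrations_complete({"migrate_allowed": "false", "migrate_partitions_remaining": "0"}))
print(migrations_complete({"migrate_allowed": "true", "migrate_partitions_remaining": "0"}))
```

Because both stats are per node, a cluster-wide check would need to apply the same test to every node.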

Introduced

3.8.3

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`objects`

watch

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_objects
Datadog: aerospike.server.sets.objects

Description

Total number of replicated objects on this node. Includes master and replica objects.

Introduced

Removed

Measurement type

gauge

Data type

integer

Monitoring

Trending objects provides operations insight into object fluctuations over time.

Labels

cluster_name, job, service, instance, longitude, latitude

`paxos_principal`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_paxos_principal

Description

Identifier of the node that this node believes to be the Paxos principal.

Introduced

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`process_cpu_pct`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_process_cpu_pct
Datadog: aerospike.server.node_stats.process_cpu_pct

Description

Percentage of CPU usage by the asd process.

Introduced

4.7.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`proxy_in_progress`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_proxy_in_progress
Datadog: aerospike.server.node_stats.proxy_in_progress

Description

Number of proxies in progress. Also called proxy hash. The command’s TTL (client-set timeout or transaction-max-ms) is checked every 5 ms (Database 6.0.0 and later) while the command waits in the proxy hash.

Introduced

3.3.21

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`queries_active`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_queries_active

Description

Number of queries currently active (formerly scans_active). The queries_active stat is shared by both primary index (PI) queries and secondary index (SI) queries. Only long queries are monitored. Removed in Database 6.1.0; use long_queries_active.

Introduced

6.0.0

Removed

6.1.0

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`query_bad_records`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_query_bad_records

Description

Number of false positive entries in secondary index queries.

Introduced

Removed

Yes

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`query_long_running`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_query_long_running

Description

Number of long running queries currently in process.

Introduced

Removed

6.0.0

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`query_short_running`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_query_short_running
Datadog: aerospike.server.node_stats.query_short_running

Description

Number of short running queries currently in process.

Introduced

Removed

6.0.0

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`query_tracked`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_query_tracked

Description

Number of queries tracked by the system, that is, queries that ran longer than query untracked_time (default 1 sec).

Introduced

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`read_touch_error`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_read_touch_error
Datadog: aerospike.server.namespace.read_touch_error

Description

Number of read touch errors which were not timeouts.

Introduced

7.1.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, ns

`read_touch_skip`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_read_touch_skip
Datadog: aerospike.server.namespace.read_touch_skip

Description

Number of touches abandoned upon finding that another write (including an earlier touch) has taken place or is taking place, removing the need to proceed with the touch.

Introduced

7.1.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, ns

`read_touch_success`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_read_touch_success
Datadog: aerospike.server.namespace.read_touch_success

Description

Number of successful read touches.

Introduced

7.1.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, ns

`read_touch_timeout`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_read_touch_timeout
Datadog: aerospike.server.namespace.read_touch_timeout

Description

Number of touches that ended in timeout.

Introduced

7.1.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, ns

`read_touch_tsvc_error`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_read_touch_tsvc_error
Datadog: aerospike.server.namespace.read_touch_tsvc_error

Description

Number of read touch subtransactions that failed with an error in the internal transaction queue. Does not include timeouts.

The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc errors happen before records are accessed for reads or writes. They’re counted separately from tsvc timeouts.

Introduced

7.1.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, ns

`read_touch_tsvc_timeout`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_read_touch_tsvc_timeout
Datadog: aerospike.server.namespace.read_touch_tsvc_timeout

Description

Number of read touches that time out early in the internal transaction queue, while waiting to be picked up by a service thread.

The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc errors happen before records are accessed for reads or writes. They’re counted separately from tsvc timeouts.

Introduced

7.1.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, ns

`reaped_fds`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_reaped_fds
Datadog: aerospike.server.node_stats.reaped_fds

Description

Number of idle client connections closed.

Introduced

Removed

Measurement type

counter

Data type

integer

Monitoring

If reaped_fds grows more rapidly than normal, it may indicate that clients are opening and closing sockets too rapidly, which is a potential application issue.

Labels

cluster_name, job, service, instance, longitude, latitude

`rw_err_dup_write_cluster_key`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_rw_err_dup_write_cluster_key

Description

Removed

Introduced

Removed

Yes

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`rw_err_dup_write_internal`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_rw_err_dup_write_internal

Description

Removed

Introduced

Removed

Yes

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`rw_in_progress`

warn

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_rw_in_progress
Datadog: aerospike.server.node_stats.rw_in_progress

Description

Number of rw transactions in progress. Also called rw hash. This tracks transactions parked on the rw hash while processing on other nodes (all write replicas, read duplicate resolutions). The transaction’s TTL (client-set timeout or transaction-max-ms) is checked every 5 ms in Database 6.0.0 and later while the transaction waits in the rw-hash.

Introduced

3.9.0

Removed

Measurement type

gauge

Data type

integer

Monitoring

Depends on expected workload.

If rw_in_progress is higher than expected, or deviates more than is acceptable from the established baseline over time, alert operations to investigate the cause. This may indicate a slowdown on a particular node or overloading of the fabric.
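A baseline comparison of this kind can be sketched as follows. The helper name `deviates_from_baseline`, the use of a simple mean as the baseline, and the 50% default tolerance are all assumptions made for illustration; the appropriate baseline and tolerance depend on the expected workload.

```python
from statistics import mean


def deviates_from_baseline(history: list[int], current: int,
                           tolerance_pct: float = 50.0) -> bool:
    """Compare the current gauge value with the mean of earlier samples."""
    if not history:
        return False  # no baseline established yet
    baseline = mean(history)
    if baseline == 0:
        return current > 0
    deviation_pct = abs(current - baseline) / baseline * 100
    return deviation_pct > tolerance_pct


# Made-up samples of rw_in_progress for illustration.
recent = [10, 12, 8]
print(deviates_from_baseline(recent, 11))
print(deviates_from_baseline(recent, 30))
```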

Labels

cluster_name, job, service, instance, longitude, latitude

Detail

While a transaction is parked in the rw-hash, other transactions for the same record are queued (those queued transactions are not counted in this metric). Once a transaction completes, queued transactions for the same record are restarted, as tracked in the xxxx-restart benchmark histograms (such as write-restart). At that point, the first transaction to be processed takes the rw-hash slot and the others wait for the next round. Transactions that need to be serialized (such as writes for the same record, a read transaction in strong consistency mode while a write transaction is in progress, or any transaction requiring duplicate resolution) do not proceed until they get their slot in the rw-hash.
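The queuing behavior described above can be pictured with a toy model. This is not the server's implementation: the class `ToyRwHash`, its methods, and the string transaction ids are hypothetical, and the model only mirrors the slot-and-restart behavior in the text.

```python
from collections import defaultdict, deque


class ToyRwHash:
    """Toy model: one parked transaction per record, others wait their turn."""

    def __init__(self):
        self.parked = {}                     # record key -> transaction holding the slot
        self.waiting = defaultdict(deque)    # record key -> queued transactions

    def start(self, key, txn):
        """Take the slot for key, or queue the transaction if the slot is taken."""
        if key in self.parked:
            self.waiting[key].append(txn)
            return False
        self.parked[key] = txn
        return True

    def complete(self, key):
        """Finish the parked transaction and restart the queued ones in order."""
        del self.parked[key]
        restarted = list(self.waiting.pop(key, ()))
        for txn in restarted:
            self.start(key, txn)  # the first takes the slot, the rest re-queue
        return restarted

    @property
    def rw_in_progress(self):
        """Only parked transactions are counted, not queued ones."""
        return len(self.parked)


rw = ToyRwHash()
rw.start("rec1", "w1")
rw.start("rec1", "w2")
rw.start("rec1", "w3")
print(rw.rw_in_progress)      # queued transactions are not counted
print(rw.complete("rec1"))    # transactions restarted after w1 completes
```

In this model, each call to `complete` corresponds to one round of restarts, which is what the restart histograms would record.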

`scans_active`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_scans_active

Description

Number of scans currently active. Removed in Database 6.0.0, use queries_active.

Introduced

3.6.0

Removed

6.0.0

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`sindex_gc_garbage_cleaned`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_sindex_gc_garbage_cleaned
Datadog: aerospike.server.node_stats.sindex_gc_garbage_cleaned

Description

Sum of secondary index garbage entries cleaned by sindex GC. Moved to namespace level as sindex_gc_cleaned in Database 5.7.0.

Introduced

3.3.10

Removed

5.7.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`sindex_gc_garbage_found`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_sindex_gc_garbage_found
Datadog: aerospike.server.node_stats.sindex_gc_garbage_found

Description

Sum of secondary index garbage entries found by sindex GC.

Introduced

3.3.10

Removed

5.7.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`sindex_gc_list_creation_time`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_sindex_gc_list_creation_time
Datadog: aerospike.server.node_stats.sindex_gc_list_creation_time

Description

Sum of time spent in finding secondary index garbage entries by sindex GC (millisecond).

Introduced

3.3.10

Removed

5.7.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`sindex_gc_list_deletion_time`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_sindex_gc_list_deletion_time
Datadog: aerospike.server.node_stats.sindex_gc_list_deletion_time

Description

Sum of time spent in cleaning sindex garbage entries by sindex GC (millisecond).

Introduced

3.3.10

Removed

5.7.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`sindex_gc_objects_validated`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_sindex_gc_objects_validated
Datadog: aerospike.server.node_stats.sindex_gc_objects_validated

Description

Number of secondary index entries processed by sindex GC.

Introduced

3.3.10

Removed

5.7.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`sindex_gc_retries`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_sindex_gc_retries
Datadog: aerospike.server.node_stats.sindex_gc_retries

Description

Number of retries when sindex GC cannot get sprigs lock. Replaced sindex_gc_locktimedout.

Introduced

4.2.0

Removed

5.7.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`sindex_ucgarbage_found`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_sindex_ucgarbage_found
Datadog: aerospike.server.node_stats.sindex_ucgarbage_found

Description

Number of un-cleanable garbage entries in the sindexes encountered through queries.

Introduced

3.3.3

Removed

5.7.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`stat_cluster_key_err_ack_rw_trans_reenqueue`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_stat_cluster_key_err_ack_rw_trans_reenqueue

Description

Number of Read/Write trans re-enqueued because of cluster key mismatch.

Introduced

Removed

Yes

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`stat_cluster_key_partition_transaction_queue_count`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_stat_cluster_key_partition_transaction_queue_count

Description

Removed/unused

Introduced

Removed

Yes

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`stat_cluster_key_prole_retry`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_stat_cluster_key_prole_retry

Description

Number of times a prole write was retried as a result of a cluster key mismatch.

Introduced

Removed

Yes

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`stat_cluster_key_regular_processed`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_stat_cluster_key_regular_processed

Description

Number of successful transactions that passed the cluster key test.

Introduced

Removed

Yes

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`stat_cluster_key_trans_to_proxy_retry`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_stat_cluster_key_trans_to_proxy_retry

Description

Number of times a proxy was redirected.

Introduced

Removed

Yes

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`stat_cluster_key_transaction_reenqueue`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_stat_cluster_key_transaction_reenqueue

Description

Removed/unused

Introduced

Removed

Yes

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`stat_evicted_set_objects`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_stat_evicted_set_objects

Description

Number of objects evicted from a Set due to set limits defined in Aerospike configuration.

Introduced

Removed

Yes

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`stat_single_bin_records`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_stat_single_bin_records

Description

Removed: Number of single bin records.

Introduced

Removed

Yes

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`stat_slow_trans_queue_batch_pop`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_stat_slow_trans_queue_batch_pop

Description

Number of times we moved a batch of trans from slow queue to fast queue.

Introduced

Removed

Yes

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`stat_slow_trans_queue_pop`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_stat_slow_trans_queue_pop

Description

Number of trans that were moved from slow queue to fast queue.

Introduced

Removed

Yes

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`stat_slow_trans_queue_push`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_stat_slow_trans_queue_push

Description

Number of trans that we pushed onto the slow queue.

Introduced

Removed

Yes

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`storage_defrag_wait`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_storage_defrag_wait

Description

Number of times the defrag waited (called sleep).

Introduced

Removed

Yes

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`sub_objects`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_sub_objects
Datadog: aerospike.server.namespace.sub_objects

Description

Number of LDT sub objects. Aggregated over the sub_objects stat at the namespace level.

Introduced

3.9.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`system_free_mem_kbytes`

critical

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_system_free_mem_kbytes
Datadog: aerospike.server.node_stats.system_free_mem_kbytes

Description

Amount of free system memory in kilobytes. Includes buffers and caches, but not shared memory.

Introduced

Removed

Measurement type

gauge

Data type

integer

Monitoring

If system_free_mem_kbytes is abnormally low, it could indicate that the server is approaching the limit of available RAM. Operations should investigate and potentially add nodes or increase per-node RAM.

Labels

cluster_name, job, service, instance, longitude, latitude

`system_free_mem_pct`

critical

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_system_free_mem_pct
Datadog: aerospike.server.node_stats.system_free_mem_pct

Description

Percentage of free system memory.

Introduced

Removed

Measurement type

gauge

Data type

integer

Monitoring

If system_free_mem_pct is abnormally low, it could indicate that the server is approaching the limit of available RAM. Operations should investigate and potentially add nodes or increase per-node RAM.

Labels

cluster_name, job, service, instance, longitude, latitude
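
The monitoring guidance for system_free_mem_pct can be sketched as a simple threshold check. This is a minimal illustration in Python. The 20 percent threshold, the function name, and the node readings are assumptions made up for the example; what counts as abnormally low depends on the deployment.

```python
def free_memory_is_low(system_free_mem_pct: int, threshold_pct: int = 20) -> bool:
    """Return True when free system memory falls below the threshold.

    threshold_pct is a hypothetical default, not an Aerospike recommendation.
    """
    if not 0 <= system_free_mem_pct <= 100:
        raise ValueError("system_free_mem_pct must be between 0 and 100")
    return system_free_mem_pct < threshold_pct


# Made-up readings of system_free_mem_pct, keyed by node name.
readings = {"node-a": 35, "node-b": 12}

# Nodes that operations may want to investigate.
low_nodes = sorted(node for node, pct in readings.items() if free_memory_is_low(pct))
```

The same check applies to system_free_mem_kbytes if the threshold is expressed in kilobytes instead of percent.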

`system_kernel_cpu_pct`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_system_kernel_cpu_pct
Datadog: aerospike.server.node_stats.system_kernel_cpu_pct

Description

Percentage of CPU usage by processes running in kernel mode.

Introduced

4.7.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`system_thp_mem_kbytes`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_system_thp_mem_kbytes
Datadog: aerospike.server.node_stats.system_thp_mem_kbytes

Description

Amount of memory in use by the Transparent Huge Page mechanism, in kilobytes.

Introduced

5.7.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`system_total_cpu_pct`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_system_total_cpu_pct
Datadog: aerospike.server.node_stats.system_total_cpu_pct

Description

Percentage of CPU usage by all running processes. Equal to system_user_cpu_pct + system_kernel_cpu_pct.

Introduced

4.7.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude
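
The relationship stated in the description of system_total_cpu_pct can be written out as a short Python sketch. The function name and the sample values are illustrative assumptions, not output from a real node.

```python
def total_cpu_pct(system_user_cpu_pct: int, system_kernel_cpu_pct: int) -> int:
    """Combine user-mode and kernel-mode CPU usage into the total percentage."""
    return system_user_cpu_pct + system_kernel_cpu_pct


# Made-up sample of the two component gauges from one node.
sample = {"system_user_cpu_pct": 42, "system_kernel_cpu_pct": 13}

expected_total = total_cpu_pct(
    sample["system_user_cpu_pct"], sample["system_kernel_cpu_pct"]
)
```

Comparing expected_total with the reported system_total_cpu_pct is one way to sanity-check a dashboard that displays all three gauges.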

`system_user_cpu_pct`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_system_user_cpu_pct
Datadog: aerospike.server.node_stats.system_user_cpu_pct

Description

Percentage of CPU usage by processes running in user mode.

Introduced

4.7.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`threads_detached`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_threads_detached
Datadog: aerospike.server.node_stats.threads_detached

Description

Number of detached server threads currently running.

Introduced

5.6.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`threads_joinable`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_threads_joinable
Datadog: aerospike.server.node_stats.threads_joinable

Description

Number of joinable server threads currently running.

Introduced

5.6.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`threads_pool_active`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_threads_pool_active
Datadog: aerospike.server.node_stats.threads_pool_active

Description

Number of currently active threads in the server thread pool.

Introduced

5.6.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`threads_pool_total`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_threads_pool_total
Datadog: aerospike.server.node_stats.threads_pool_total

Description

Total number of threads in the server thread pool.

Introduced

5.6.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`time_since_rebalance`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_time_since_rebalance
Datadog: aerospike.server.node_stats.time_since_rebalance

Description

Number of seconds since the last reclustering event, triggered either by the recluster info command or by a cluster disruption (such as a node being added or removed, or a network disruption).

Introduced

4.3.1

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`tree_gc_queue`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_tree_gc_queue
Datadog: aerospike.server.node_stats.tree_gc_queue

Description

Number of trees queued up, ready to be completely removed (partition drops). Corresponds to the tree-gc-q entry in the log ticker.

Introduced

3.10.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`tscan_aborted`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_tscan_aborted

Description

Number of scans that were aborted. Removed as of 3.6.0.

Introduced

Removed

Yes

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`tscan_initiate`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_tscan_initiate

Description

Number of new scan requests initiated. Removed as of 3.6.0.

Introduced

Removed

Yes

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`tscan_pending`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_tscan_pending

Description

Number of scan requests pending. Removed as of 3.6.0.

Introduced

Removed

Yes

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`tscan_succeeded`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_tscan_succeeded

Description

Number of scan requests that have successfully finished. Removed as of 3.6.0.

Introduced

Removed

Yes

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`uptime`

optional

Context

node_stats

Backend-specific Name

Prometheus: aerospike_node_stats_uptime
Datadog: aerospike.server.uptime

Description

Time in seconds since last server restart.

Introduced

Removed

Measurement type

gauge

Data type

integer

Monitoring

If uptime is below 300 and the cluster is not undergoing maintenance, this node restarted unexpectedly within the last 5 minutes. Advise operations to investigate.

Labels

cluster_name, job, service, instance, longitude, latitude
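
The uptime monitoring rule can be expressed as a small check. The following Python sketch assumes the caller already knows whether the cluster is under maintenance; the in_maintenance flag and the function name are hypothetical and not part of any Aerospike API.

```python
def restarted_unexpectedly(
    uptime_seconds: int, in_maintenance: bool, window_seconds: int = 300
) -> bool:
    """Return True when a node restarted within the window outside of maintenance.

    window_seconds defaults to 300 (5 minutes), matching the rule above.
    in_maintenance is supplied by the operator; it is an assumption of this sketch.
    """
    return uptime_seconds < window_seconds and not in_maintenance


# Made-up uptime readings, in seconds, keyed by node name.
uptimes = {"node-a": 86400, "node-b": 120}

nodes_to_investigate = sorted(
    node
    for node, uptime in uptimes.items()
    if restarted_unexpectedly(uptime, in_maintenance=False)
)
```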

Sets

`device_data_bytes`

optional

Context

sets

Backend-specific Name

Prometheus: aerospike_sets_device_data_bytes

Description

Device storage used by this set in bytes, for the data part (does not include the index part). Value will be 0 if data is not stored on device. For size used in memory, see memory_data_bytes.

Introduced

5.2.0

Removed

7.0.0

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns, set

`memory_data_bytes`

optional

Context

sets

Backend-specific Name

Prometheus: aerospike_sets_memory_data_bytes
Datadog: aerospike.server.sets.memory_data_bytes

Description

Memory used by this set in bytes, for the data part (does not include the index part). Value will be 0 if data is not stored in memory. For size used on disk, see device_data_bytes (available in Database 5.2.0 and later), or the set-level object size histogram.

Introduced

3.9.0

Removed

7.0.0

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns, set

`ns`

optional

Context

sets

Backend-specific Name

Prometheus: aerospike_sets_ns

Description

Namespace name this set belongs to.

Introduced

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns, set

`objects`

watch

Context

sets

Backend-specific Name

Prometheus: aerospike_sets_objects
Datadog: aerospike.server.sets.objects

Description

Total number of objects (master and all replicas) in this set on this node. This is updated in real time and is not dependent on the nsup-period or nsup-hist-period configurations.

Introduced

3.9.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns, set

`set`

optional

Context

sets

Backend-specific Name

Prometheus: aerospike_sets_set

Description

Name of this set.

Introduced

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns, set

`tombstones`

watch

Context

sets

Backend-specific Name

Prometheus: aerospike_sets_tombstones
Datadog: aerospike.server.sets.tombstones

Description

Total number of tombstones (master and all replicas) in this set on this node.

Introduced

3.10.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns, set

`truncate_lut`

optional

Context

sets

Backend-specific Name

Prometheus: aerospike_sets_truncate_lut
Datadog: aerospike.server.sets.truncate_lut

Description

The most covering truncate_lut for this set. See truncate or truncate-namespace.

Introduced

3.12.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns, set

`truncating`

optional

Context

sets

Backend-specific Name

Prometheus: aerospike_sets_truncating
Datadog: aerospike.server.sets.truncating

Description

Indicates when the set is in the process of being truncated.

Introduced

6.3.0

Removed

Measurement type

gauge

Data type

boolean

Labels

cluster_name, job, service, instance, longitude, latitude, ns, set

Sindex

`delete_error`

optional

Context

sindex

Backend-specific Name

Prometheus: aerospike_sindex_delete_error

Description

Number of errors while processing a delete transaction for this secondary index.

Introduced

3.9.0

Removed

6.0.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`delete_success`

optional

Context

sindex

Backend-specific Name

Prometheus: aerospike_sindex_delete_success

Description

Number of successful delete transactions processed for this secondary index.

Introduced

3.9.0

Removed

6.0.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`entries`

optional

Context

sindex

Backend-specific Name

Prometheus: aerospike_sindex_entries
Datadog: aerospike.server.sindex.entries

Description

Number of secondary index entries for this secondary index. This is the number of records that have been indexed by this secondary index.

Introduced

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`ibtr_memory_used`

optional

Context

sindex

Backend-specific Name

Prometheus: aerospike_sindex_ibtr_memory_used

Description

Amount of memory, in bytes, that the secondary index is consuming for the keys, as opposed to nbtr_memory_used, which is the amount of memory the secondary index is consuming for the entries. The total is reported by si_accounted_memory.

Introduced

Removed

6.0.0

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`keys`

optional

Context

sindex

Backend-specific Name

Prometheus: aerospike_sindex_keys

Description

Number of secondary keys for this secondary index.

Introduced

Removed

6.0.0

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`load_pct`

optional

Context

sindex

Backend-specific Name

Prometheus: aerospike_sindex_load_pct
Datadog: aerospike.server.sindex.load_pct

Description

Progress of the secondary index creation, as a percentage.

Introduced

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`load_time`

optional

Context

sindex

Backend-specific Name

Prometheus: aerospike_sindex_load_time
Datadog: aerospike.server.sindex.load_time

Description

Time it took for the secondary index to be fully created.

Introduced

6.0.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`loadtime`

optional

Context

sindex

Backend-specific Name

Prometheus: aerospike_sindex_loadtime

Description

Time it took for the secondary index to be fully created.

Introduced

Removed

6.0.0

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`memory_used`

optional

Context

sindex

Backend-specific Name

Prometheus: aerospike_sindex_memory_used

Description

Amount of memory, in bytes, consumed by the secondary index. Renamed to used_bytes in Database 6.3.0. Do not use memory_used in Database 6.3.0 and later.

Introduced

6.0.0

Removed

6.3.0

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`nbtr_memory_used`

optional

Context

sindex

Backend-specific Name

Prometheus: aerospike_sindex_nbtr_memory_used

Description

Amount of memory, in bytes, that the secondary index is consuming for the entries, as opposed to ibtr_memory_used, which is the amount of memory the secondary index is consuming for the keys. The total is reported by si_accounted_memory.

Introduced

Removed

6.0.0

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`query_agg`

optional

Context

sindex

Backend-specific Name

Prometheus: aerospike_sindex_query_agg

Description

Number of query aggregations attempted for this secondary index on this node.

Introduced

Removed

5.7.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`query_agg_avg_rec_count`

optional

Context

sindex

Backend-specific Name

Prometheus: aerospike_sindex_query_agg_avg_rec_count

Description

Average number of records returned by the aggregations underlying queries against this secondary index.

Introduced

Removed

5.7.0

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`query_agg_avg_record_size`

optional

Context

sindex

Backend-specific Name

Prometheus: aerospike_sindex_query_agg_avg_record_size

Description

Average size of the records returned by the aggregations underlying queries against this secondary index.

Introduced

Removed

5.7.0

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`query_avg_rec_count`

optional

Context

sindex

Backend-specific Name

Prometheus: aerospike_sindex_query_avg_rec_count

Description

Average number of records returned by all queries against this secondary index (combines query_agg_avg_rec_count and query_lookup_avg_rec_count).

Introduced

Removed

5.7.0

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`query_avg_record_size`

optional

Context

sindex

Backend-specific Name

Prometheus: aerospike_sindex_query_avg_record_size

Description

Average size of the records returned by all queries against this secondary index (combines query_agg_avg_record_size and query_lookup_avg_record_size).

Introduced

Removed

5.7.0

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`query_basic_abort`

optional

Context

sindex

Backend-specific Name

Prometheus: aerospike_sindex_query_basic_abort

Description

Number of basic queries aborted for this secondary index. Removed in Database 6.0.0; use si_query_long_basic_abort instead.

Introduced

5.7.0

Removed

6.0.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`query_basic_avg_rec_count`

optional

Context

sindex

Backend-specific Name

Prometheus: aerospike_sindex_query_basic_avg_rec_count

Description

Average number of records returned by the lookup queries against this secondary index.

Introduced

5.7.0

Removed

6.0.0

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`query_basic_complete`

optional

Context

sindex

Backend-specific Name

Prometheus: aerospike_sindex_query_basic_complete

Description

Number of basic queries completed for this secondary index. Removed in Database 6.0.0; use si_query_long_basic_complete instead.

Introduced

5.7.0

Removed

6.0.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`query_basic_error`

optional

Context

sindex

Backend-specific Name

Prometheus: aerospike_sindex_query_basic_error

Description

Number of basic queries that returned an error for this secondary index. Removed in Database 6.0.0; use si_query_long_basic_error instead.

Introduced

5.7.0

Removed

6.0.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`query_lookup_avg_rec_count`

optional

Context

sindex

Backend-specific Name

Prometheus: aerospike_sindex_query_lookup_avg_rec_count

Description

Average number of records returned by the lookup queries against this secondary index. Renamed to query_basic_avg_rec_count in Database 5.7.0.

Introduced

Removed

5.7.0

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`query_lookup_avg_record_size`

optional

Context

sindex

Backend-specific Name

Prometheus: aerospike_sindex_query_lookup_avg_record_size

Description

Average size of the records returned by the lookup queries against this secondary index.

Introduced

Removed

5.7.0

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`query_lookups`

optional

Context

sindex

Backend-specific Name

Prometheus: aerospike_sindex_query_lookups

Description

Number of lookup queries ever attempted for this secondary index on this node. Removed in Database 5.7.0. Use query_basic_complete + query_basic_error + query_basic_abort instead.

Introduced

Removed

5.7.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns
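
The replacement described for query_lookups is a sum of three counters. A minimal Python sketch follows, assuming the per-index statistics have already been collected into a dictionary; the function name and the sample values are made up for illustration.

```python
def query_lookups_equivalent(sindex_stats: dict) -> int:
    """Approximate the removed query_lookups counter from its replacements.

    Sums query_basic_complete, query_basic_error, and query_basic_abort,
    as the description of query_lookups suggests.
    """
    return (
        sindex_stats["query_basic_complete"]
        + sindex_stats["query_basic_error"]
        + sindex_stats["query_basic_abort"]
    )


# Made-up counter values for one secondary index on one node.
sample_stats = {
    "query_basic_complete": 950,
    "query_basic_error": 30,
    "query_basic_abort": 20,
}

attempted_lookups = query_lookups_equivalent(sample_stats)
```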

`query_reqs`

optional

Context

sindex

Backend-specific Name

Prometheus: aerospike_sindex_query_reqs

Description

Number of query requests ever attempted for this secondary index on this node (combines query_lookups and query_agg).

Introduced

Removed

5.7.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`si_accounted_memory`

optional

Context

sindex

Backend-specific Name

Prometheus: aerospike_sindex_si_accounted_memory

Description

Amount of memory, in bytes, that the secondary index is consuming. This is the sum of ibtr_memory_used and nbtr_memory_used. Removed in Database 5.7.0.

Introduced

Removed

5.7.0

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`si_query_short_basic_complete`

optional

Context

sindex

Backend-specific Name

Prometheus: aerospike_sindex_si_query_short_basic_complete
Datadog: aerospike.server.namespace.si_query_short_basic_complete

Description

Number of basic short secondary index queries completed for this secondary index.

Introduced

6.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`si_query_short_basic_error`

optional

Context

sindex

Backend-specific Name

Prometheus: aerospike_sindex_si_query_short_basic_error
Datadog: aerospike.server.namespace.si_query_short_basic_error

Description

Number of basic short secondary index queries that returned an error for this secondary index.

Introduced

6.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`si_query_short_basic_timeout`

optional

Context

sindex

Backend-specific Name

Prometheus: aerospike_sindex_si_query_short_basic_timeout
Datadog: aerospike.server.namespace.si_query_short_basic_timeout

Description

Number of basic short secondary index queries that timed out for this secondary index. Short queries are not monitored, so they cannot be aborted, but they might time out.

Introduced

6.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`stat_gc_recs`

optional

Context

sindex

Backend-specific Name

Prometheus: aerospike_sindex_stat_gc_recs
Datadog: aerospike.server.sindex.stat_gc_recs

Description

Number of records that have been garbage collected out of the secondary index memory. See sindex-gc-period and sindex-gc-max-rate configuration parameters for tuning the secondary index garbage collection.

Introduced

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`stat_gc_time`

optional

Context

sindex

Backend-specific Name

Prometheus: aerospike_sindex_stat_gc_time

Description

Amount of time spent processing garbage collection for the secondary index. See sindex-gc-period and sindex-gc-max-rate configuration parameters for tuning the secondary index garbage collection.

Introduced

Removed

5.7.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`used_bytes`

optional

Context

sindex

Backend-specific Name

Prometheus: aerospike_sindex_used_bytes
Datadog: aerospike.server.sindex.used_bytes

Description

Amount of memory, in bytes, consumed by the secondary index.

NOTE: Renamed from memory_used in Database 6.3.0.

Introduced

6.3.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude

`write_error`

optional

Context

sindex

Backend-specific Name

Prometheus: aerospike_sindex_write_error

Description

Number of errors while processing a write transaction for this secondary index.

Introduced

3.9.0

Removed

6.0.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

`write_success`

optional

Context

sindex

Backend-specific Name

Prometheus: aerospike_sindex_write_success

Description

Number of successful write transactions processed for this secondary index.

Introduced

3.9.0

Removed

6.0.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, ns

Users

`conns_in_use`

Context

users

Backend-specific Name

Prometheus: aerospike_users_conns_in_use

Description

Number of client connections for a given user.

Introduced

5.6.0

Removed

Measurement type

gauge

Data type

integer

Monitoring

To see metrics from asadm use the command:

show users statistics

If you are using the Aerospike Prometheus Exporter these metrics are shown in the Users View.

Labels

cluster_name, job, service, instance, longitude, latitude, user

Detail

When security is enabled, per node user metrics are available from the security protocol.

`limitless_read_scan_query`

Context

users

Backend-specific Name

Prometheus: aerospike_users_limitless_read_scan_query

Description

Limitless read query requests per second for a given user.

Introduced

5.6.0

Removed

Measurement type

moving average

Monitoring

To see metrics from asadm use the command:

show users statistics

If you are using the Aerospike Prometheus Exporter these metrics are shown in the Users View.

Labels

cluster_name, job, service, instance, longitude, latitude, user

Detail

When security is enabled and enable-quotas is true, per node user metrics are available from the security protocol. For more information, see Enable access control.

`limitless_write_scan_query`

Context

users

Backend-specific Name

Prometheus: aerospike_users_limitless_write_scan_query

Description

Limitless write query requests per second for a given user.

Introduced

5.6.0

Removed

Measurement type

moving average

Data type

integer

Monitoring

To see metrics from asadm use the command:

show users statistics

If you are using the Aerospike Prometheus Exporter these metrics are shown in the Users View.

Labels

cluster_name, job, service, instance, longitude, latitude, user

Detail

When security is enabled and enable-quotas is true, per node user metrics are available from the security protocol. For more information, see Enable access control.

`read_scan_query_rps`

Context

users

Backend-specific Name

Prometheus: aerospike_users_read_scan_query_rps

Description

Read query requests per second for a given user.

Introduced

5.6.0

Removed

Measurement type

gauge

Data type

integer

Monitoring

To see metrics from asadm use the command:

show users statistics

If you are using the Aerospike Prometheus Exporter these metrics are shown in the Users View.

Labels

cluster_name, job, service, instance, longitude, latitude, user

Detail

When security is enabled and enable-quotas is true, per node user metrics are available from the security protocol. See Enable access control for more information about these metrics.

`read_single_record_tps`

Context

users

Backend-specific Name

Prometheus: aerospike_users_read_single_record_tps

Description

Read transactions per second for a given user.

Introduced

5.6.0

Removed

Measurement type

moving average

Data type

integer

Monitoring

To see metrics from asadm use the command:

show users statistics

If you are using the Aerospike Prometheus Exporter these metrics are shown in the Users View.

Labels

cluster_name, job, service, instance, longitude, latitude, user

Detail

When security is enabled and enable-quotas is true, per node user metrics are available from the security protocol. For more information, see Enable access control.

`write_scan_query_rps`

Context

users

Backend-specific Name

Prometheus: aerospike_users_write_scan_query_rps

Description

Write query requests per second for a given user.

Introduced

5.6.0

Removed

Measurement type

moving average

Data type

integer

Monitoring

To see metrics from asadm use the command:

show users statistics

If you are using the Aerospike Prometheus Exporter these metrics are shown in the Users View.

Labels

cluster_name, job, service, instance, longitude, latitude, user

Detail

When security is enabled and enable-quotas is true, per node user metrics are available from the security protocol. For more information, see Enable access control.

`write_single_record_tps`

Context

users

Backend-specific Name

Prometheus: aerospike_users_write_single_record_tps

Description

Write transactions per second for a given user.

Introduced

5.6.0

Removed

Measurement type

moving average

Data type

integer

Monitoring

To see metrics from asadm use the command:

show users statistics

If you are using the Aerospike Prometheus Exporter these metrics are shown in the Users View.

Labels

cluster_name, job, service, instance, longitude, latitude, user

Detail

When security is enabled and enable-quotas is true, per node user metrics are available from the security protocol. For more information, see Enable access control.

Xdr

`abandoned`

warn

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_abandoned
Datadog: aerospike.server.xdr.abandoned

Description

Number of records abandoned because of permanent failure at the destination. The destination configuration must be changed for these records to be successfully shipped.

Introduced

5.0.0

Removed

Measurement type

counter

Data type

integer

Monitoring

If abandoned is consistently higher than expected, alert operations to investigate.

Labels

cluster_name, job, service, instance, longitude, latitude, dc
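
Because abandoned is a counter, "consistently higher than expected" is easier to judge from its increase per sampling interval than from its absolute value. The Python sketch below is one possible reading of that rule; the function names, the allowed increase, and the sample values are all assumptions made for the example.

```python
from typing import List, Sequence


def counter_increases(samples: Sequence[int]) -> List[int]:
    """Return the increase between each pair of consecutive counter samples."""
    increases = []
    for previous, current in zip(samples, samples[1:]):
        # A counter that goes down indicates a restart, so count from zero again.
        increases.append(current - previous if current >= previous else current)
    return increases


def consistently_high(samples: Sequence[int], max_increase_per_interval: int) -> bool:
    """Return True when every interval exceeds the allowed increase."""
    increases = counter_increases(samples)
    return bool(increases) and all(i > max_increase_per_interval for i in increases)


# Made-up samples of the abandoned counter taken at a fixed interval.
abandoned_samples = [100, 130, 170, 220]

should_alert = consistently_high(abandoned_samples, max_increase_per_interval=10)
```

Requiring every interval to exceed the limit avoids alerting on a single burst; a stricter or looser policy may suit other deployments.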

`active_failed_node_sessions`

optional

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_active_failed_node_sessions

Description

Number of active failed node sessions pending. A failed node session keeps track of nodes in the local cluster that have left the cluster and need other nodes to ship on their behalf until they rejoin.

Introduced

3.9.0

Removed

5.0.0

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`active_link_down_sessions`

optional

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_active_link_down_sessions

Description

Number of active link down sessions pending. A link down session keeps track of destination clusters that are not reachable for a given time window.

Introduced

3.9.0

Removed

5.0.0

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`bytes_shipped`

optional

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_bytes_shipped

Description

Number of bytes shipped for a namespace to a DC by XDR.

Introduced

6.1.0

Removed

Measurement type

counter

Data type

decimal

Monitoring

Use the asinfo command get-stats to report these metrics.

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`compression_ratio`

optional

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_compression_ratio

Description

Running average compression ratio. Example: asinfo -h localhost -l -v "get-stats:context=xdr;dc=aerospike_b;namespace=test"

Introduced

5.0.0

Removed

Measurement type

moving average

Data type

decimal

Labels

cluster_name, job, service, instance, longitude, latitude, dc
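
To read compression_ratio programmatically, the text returned by a get-stats info command can be split into name and value pairs. The Python sketch below assumes the response is a list of name=value items separated by semicolons, or by newlines when asinfo is run with -l. The function name is hypothetical and the sample response is made up, not captured from a real cluster.

```python
def parse_info_response(response: str) -> dict:
    """Split an info response into a dictionary of name -> value strings."""
    pairs = {}
    # Treat newlines like semicolons so both output styles are accepted.
    for item in response.replace("\n", ";").split(";"):
        item = item.strip()
        if not item or "=" not in item:
            continue
        name, value = item.split("=", 1)
        pairs[name] = value
    return pairs


# Made-up response text for illustration only.
sample_response = "lag=0;in_queue=0;compression_ratio=0.250"

stats = parse_info_response(sample_response)
ratio = float(stats["compression_ratio"])
```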

`dc_as_open_conn`

optional

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_dc_as_open_conn

Description

Number of open connections to the Aerospike DC. If the DC accepts pipeline writes, there will be 64 connections per destination node. Replaced dc_open_conn starting with Database 4.4.0.

Introduced

4.4.0

Removed

5.0.0

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`dc_as_size`

optional

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_dc_as_size

Description

The cluster size of the destination Aerospike datacenter (DC). Replaced dc_size starting with Database 4.4.0.

Introduced

4.4.0

Removed

5.0.0

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`dc_http_good_locations`

optional

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_dc_http_good_locations

Description

Number of URLs that are considered healthy and being used by the change notification system. Part of the change notification.

Introduced

4.4.0

Removed

5.0.0

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`dc_http_locations`

optional

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_dc_http_locations

Description

Number of URLs configured for the HTTP destination. Part of the change notification.

Introduced

4.4.0

Removed

5.0.0

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`dc_ship_attempt`

optional

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_dc_ship_attempt

Description

Number of attempts to ship a record, regardless of whether the attempt resulted in success or error. See dc_ship_success for successfully shipped records.

Introduced

3.9.0

Removed

5.0.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`dc_ship_bytes`

optional

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_dc_ship_bytes

Description

Number of bytes shipped for this DC.

Introduced

3.9.0

Removed

5.0.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`dc_ship_delete_success`

optional

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_dc_ship_delete_success

Description

Number of delete transactions that have been successfully shipped. This is the per DC statistic for xdr_ship_delete_success.

Introduced

3.9.0

Removed

5.0.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`dc_ship_destination_error`

optional

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_dc_ship_destination_error

Description

Number of errors from the remote cluster(s) while shipping records for this DC. Errors include out-of-space, key-busy, etc. This is the per DC statistic for xdr_ship_destination_error.

Introduced

3.9.0

Removed

5.0.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`dc_ship_idle_avg`

optional

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_dc_ship_idle_avg

Description

Average sleep time, in milliseconds, for each record being shipped. The value is 0.000 if there is no throttling. Throttling occurs if the configured throughput limit (xdr-max-ship-throughput) has been reached or if the destination cluster slows down unexpectedly. This value is part of the rsas entry in the logs (xdr context).

Introduced

3.9.0

Removed

5.0.0

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`dc_ship_idle_avg_pct`

optional

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_dc_ship_idle_avg_pct

Description

Representation in percent of total time spent for dc_ship_idle_avg. This is part of the rsas entry in the logs (xdr context).

Introduced

3.9.0

Removed

5.0.0

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`dc_ship_inflight_objects`

optional

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_dc_ship_inflight_objects

Description

Number of records that are inflight (which have been shipped but for which a response from the remote DC has not yet been received).

Introduced

3.9.0

Removed

5.0.0

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`dc_ship_latency_avg`

optional

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_dc_ship_latency_avg

Description

Moving average of shipping latency for the specific DC.

Introduced

3.9.0

Removed

5.0.0

Measurement type

moving average

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`dc_ship_source_error`

optional

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_dc_ship_source_error

Description

Number of client layer errors while shipping records for this DC. Errors include timeout, bad network fd, etc. This is the per DC statistic for xdr_ship_source_error.

Introduced

3.9.0

Removed

5.0.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`dc_ship_success`

optional

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_dc_ship_success

Description

Number of records that have been successfully shipped. This is the per DC statistic for xdr_ship_success.

Introduced

3.9.0

Removed

5.0.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`dc_state`

optional

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_dc_state

Description

State of the DC. The possible states are CLUSTER_INACTIVE, CLUSTER_UP, CLUSTER_DOWN, and CLUSTER_WINDOW_SHIP.
- CLUSTER_INACTIVE: the DC has not been seeded (configured) in the XDR stanza and acts as a placeholder for future dynamic seeding.
- CLUSTER_UP: the normal state for a DC that is able to receive records from an XDR client and is not currently being shipped records from a previous window during which it was down (that would be the CLUSTER_WINDOW_SHIP state).
- CLUSTER_DOWN: the source (XDR client) has been unable to connect to the DC for over 30 seconds. This prevents entries in the digest log from being reclaimed. The XDR client periodically tries to reconnect and, upon succeeding, spawns a window shipper to catch up on the digest log entries that were missed. The DC-specific lag (dc_timelag) increases in this state but is not accounted for in the overall XDR time lag (xdr_timelag).
- CLUSTER_WINDOW_SHIP: the DC has been reconnected after being in the CLUSTER_DOWN state. The DC-specific lag (dc_timelag) is accounted for in the overall XDR time lag (xdr_timelag).
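
For a monitoring script, the four states can be summarized as a small lookup. This is a minimal sketch, not part of any Aerospike tooling: the function name `describe_dc_state` and the severity labels are hypothetical, and only the state names come from the description.

```python
# Hypothetical severity labels; only the state names come from the reference text.
DC_STATE_SEVERITY = {
    "CLUSTER_INACTIVE": "info",       # DC not seeded in the XDR stanza
    "CLUSTER_UP": "ok",               # normal shipping
    "CLUSTER_DOWN": "critical",       # unreachable for over 30 seconds
    "CLUSTER_WINDOW_SHIP": "warn",    # reconnected, catching up on missed entries
}


def describe_dc_state(state: str) -> str:
    """Return a severity label for a dc_state value, or 'unknown' if unrecognized."""
    return DC_STATE_SEVERITY.get(state.strip().upper(), "unknown")
```

Treating CLUSTER_WINDOW_SHIP as a warning rather than an error is a design choice of this sketch: the DC is reachable again but still catching up.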

Introduced

3.8.1

Removed

5.0.0

Measurement type

gauge

Data type

string

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`dc_timelag`

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_dc_timelag

Description

Time lag for this specific DC. See xdr_timelag for details of how this is calculated.

Introduced

3.8.1

Removed

Measurement type

gauge

Data type

integer

Monitoring

If dc_timelag is consistently greater than a few seconds, it may indicate network connectivity issues or errors writing at a destination cluster.

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`dlog_free_pct`

optional

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_dlog_free_pct

Description

Percentage of the digest log free and available for use.

Introduced

3.9.0

Removed

5.0.0

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`dlog_logged`

optional

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_dlog_logged

Description

Number of records logged into digest log.

Introduced

3.9.0

Removed

5.0.0

Measurement type

counter

Data type

integer

Monitoring

Trending stat_recs_logged allows operations insight into how many records are being enqueued for shipment over time.

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`dlog_overwritten_error`

optional

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_dlog_overwritten_error

Description

Number of digest log entries that got overwritten.

Introduced

3.9.0

Removed

5.0.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`dlog_processed_link_down`

optional

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_dlog_processed_link_down

Description

Number of link-down events that were processed.

Introduced

3.9.0

Removed

5.0.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`dlog_processed_main`

optional

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_dlog_processed_main

Description

Number of records processed on the local Aerospike server.

Introduced

3.9.0

Removed

5.0.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`dlog_processed_replica`

optional

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_dlog_processed_replica

Description

Number of records processed for a node in the cluster that is not the local node.

Introduced

3.9.0

Removed

5.0.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`dlog_relogged`

optional

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_dlog_relogged

Description

Number of records relogged by this node into the digest log due to temporary issues when attempting to ship. A relogged digest log entry would be caused by one of three potential conditions:
- An issue with the local client when attempting to ship (tracked by xdr_ship_source_error).
- An issue with the network or the destination cluster itself (tracked by xdr_ship_destination_error).
- An issue when reading the record on the local node (tracked by xdr_read_error), but those would actually end up relogged on the node now owning the record (see relogged_outgoing).

Introduced

3.9.0

Removed

5.0.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, dc

Detail

The XDR component typically processes only master record’s digest log entries on a given node (the exception being during failed node processing, when a node on the source cluster has failed). When relogging such master record’s dlog entry, the corresponding prole copy would also be relogged on the respective node holding the replicas. This would increment the relogged_outgoing statistic on the current node and the relogged_incoming on the receiving node. It is therefore expected to see the dlog_relogged and relogged_outgoing statistics matching for clusters that are stable (no migrations).

The relogs happening due to master partition ownership changes (migrations) are also tracked through relogged_incoming and relogged_outgoing.

Permanent errors are not relogged but produce a WARNING log message at the destination cluster. Examples include an invalid namespace, a record that is too big because write-block-size differs between source and destination, and authentication or permission errors.

Some Permanent Errors: AEROSPIKE_ERR_RECORD_TOO_BIG, AEROSPIKE_ERR_REQUEST_INVALID, AEROSPIKE_ERR_ALWAYS_FORBIDDEN.
Some Transient Errors: AEROSPIKE_ERR_SERVER, AEROSPIKE_ERR_CLUSTER_CHANGE, AEROSPIKE_ERR_SERVER_FULL, AEROSPIKE_ERR_CLUSTER, AEROSPIKE_ERR_RECORD_BUSY, AEROSPIKE_ERR_DEVICE_OVERLOAD, AEROSPIKE_ERR_FAIL_FORBIDDEN.

See the C client errors for the exhaustive list.
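
The relog rule for the errors listed above can be expressed as a simple classifier. This is an illustrative sketch only: the helper name `would_relog` is hypothetical, and because the lists above are partial, any unlisted error is reported as undetermined rather than guessed.

```python
# Error names copied from the partial lists above.
PERMANENT_ERRORS = frozenset({
    "AEROSPIKE_ERR_RECORD_TOO_BIG",
    "AEROSPIKE_ERR_REQUEST_INVALID",
    "AEROSPIKE_ERR_ALWAYS_FORBIDDEN",
})
TRANSIENT_ERRORS = frozenset({
    "AEROSPIKE_ERR_SERVER",
    "AEROSPIKE_ERR_CLUSTER_CHANGE",
    "AEROSPIKE_ERR_SERVER_FULL",
    "AEROSPIKE_ERR_CLUSTER",
    "AEROSPIKE_ERR_RECORD_BUSY",
    "AEROSPIKE_ERR_DEVICE_OVERLOAD",
    "AEROSPIKE_ERR_FAIL_FORBIDDEN",
})


def would_relog(error_name: str):
    """True for a listed transient error, False for a listed permanent one, None if unlisted."""
    if error_name in TRANSIENT_ERRORS:
        return True
    if error_name in PERMANENT_ERRORS:
        return False
    return None
```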

`dlog_used_objects`

optional

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_dlog_used_objects

Description

Total number of record slots used in the digest log.

Introduced

3.9.0

Removed

5.0.0

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`filtered_out`

watch

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_filtered_out

Description

Number of local records that are skipped after having been read but before actual shipment. Such records might be skipped because of the configured shipping rules. For example, if the rules exclude all bins of a record, the record is skipped.

This counter does not include records not submitted to the XDR queue, such as a record that is not eligible for shipping because its set is disabled.

Introduced

5.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`global_lastshiptime`

optional

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_global_lastshiptime

Description

Minimum last ship time in milliseconds (epoch) for XDR across the cluster. Specifies up to what point slots in the digest log can be reclaimed, by tracking the oldest last ship time across all nodes in the cluster.

Introduced

3.10.0

Removed

5.0.0

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`hot_keys`

watch

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_hot_keys

Description

Number of times a record write is skipped from processing because that record is already pending processing. This value also includes the number of records skipped for replica partitions.

Introduced

5.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`hotkey_fetch`

optional

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_hotkey_fetch

Description

If there are hot keys in the system (the same record updated very frequently), XDR optimizes by not shipping all the updates. This stat represents the number of record digests that are actually shipped because their cache entries expired and were dirty. Interpret in conjunction with xdr_hotkey_skip. The timeout of the cache entries is controlled by xdr-hotkey-time-ms.

Introduced

3.9.0

Removed

5.0.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`hotkey_skip`

optional

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_hotkey_skip

Description

Replaces noship_recs_dup_intrabatch and noship_recs_genmismatch. If there are hot keys in the system (the same record updated very frequently), XDR optimizes by not shipping all the updates. This stat represents the number of record digests that are skipped due to an already existing entry in the reader thread’s cache (meaning a version of this record was just shipped). Interpret in conjunction with xdr_hotkey_fetch. The timeout of the cache entries is controlled by xdr-hotkey-time-ms.

Introduced

3.9.0

Removed

5.0.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`in_progress`

watch

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_in_progress

Description

Number of records that are pending completion. Records can be in different stages, such as local read, network send, or pending acknowledgment. If a record is being retried (see retry_conn_reset, retry_dest, and retry_no_node), it is not considered complete and repeats the cycle.

Introduced

5.0.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`in_queue`

watch

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_in_queue

Description

Number of records in the in-memory transaction queue still to be processed. These are the records that have been written into the XDR transaction queue but have not yet been picked up for further processing by XDR.

Introduced

5.0.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`lag`

critical

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_lag
Datadog: aerospike.server.xdr.lag

Description

Lag in seconds between the destination and the source datacenters. This indicates how far behind the source is in shipping records or, in other terms, how long records have been waiting at the source before being shipped to that DC.
In more detail:
The lag is the difference between the current time and the last update time of the records being shipped (called ‘last ship time’ or LST). The LST is internally maintained per partition and aggregated at the namespace level (minimum across all partitions). The lag can seem unsettled (step function) while recoveries are in progress (see the recoveries_pending statistic). This is because the recovery for a partition can take a while and the LST is updated only on completion of a recovery pass (as opposed to per record). A recovery pass is considered complete only after the batch of records for a given partition is completely and successfully shipped (no elements left in the retry queue).
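
The aggregation described above (minimum LST across partitions, subtracted from the current time) can be sketched as follows. This is an illustration under stated assumptions, not the server's implementation: the function name `namespace_lag_seconds`, the list-of-numbers input, and the use of epoch seconds for LST values are all hypothetical.

```python
import time


def namespace_lag_seconds(partition_lst, now=None):
    """Lag = current time minus the minimum last ship time (LST) across partitions.

    partition_lst: iterable of per-partition LST values in epoch seconds
    (the unit and the input shape are assumptions of this sketch).
    """
    lsts = list(partition_lst)
    if not lsts:
        return 0
    if now is None:
        now = time.time()
    # A single slow partition (for example, one in recovery) holds the minimum back,
    # which is why the reported lag can move in steps.
    return max(0, int(now - min(lsts)))
```

Because the minimum is taken over all partitions, one partition whose LST is only updated at the end of a recovery pass is enough to keep the namespace-level lag elevated until that pass completes.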

Introduced

5.0.0

Removed

Measurement type

gauge

Data type

integer

Monitoring

If lag is consistently greater than a few seconds, this condition might indicate network connectivity issues or errors writing at a destination cluster.
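
One way to turn "consistently greater than a few seconds" into a check is to require several consecutive samples above a threshold. The sketch below is illustrative only: the function name, the 5-second threshold, and the 6-sample window are hypothetical defaults, not recommended values.

```python
def lag_is_sustained(samples, threshold_s=5, min_consecutive=6):
    """Return True if the most recent `min_consecutive` samples all exceed the threshold.

    samples: lag readings in seconds, oldest first.
    threshold_s and min_consecutive are illustrative defaults.
    """
    if len(samples) < min_consecutive:
        return False
    recent = samples[-min_consecutive:]
    return all(value > threshold_s for value in recent)
```

Requiring consecutive samples avoids alerting on the short step changes that can appear while recoveries are in progress.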

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`lap_us`

warn

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_lap_us
Datadog: aerospike.server.xdr.lap_us

Description

Time in microseconds (μsecs) taken to process records across partitions in one lap (processing cycle). This is diagnostic information. A higher number indicates slowness of source in processing the records.

Available only at the dc level, not namespace level. Example: asinfo -h localhost -l -v "get-stats:context=xdr;dc=aerospike_b"
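
The statistics returned by such a command can be post-processed in a script. The sketch below assumes the response is a semicolon-separated list of name=value pairs; the function name `parse_info_stats` and the sample string are hypothetical, and real output contains many more fields.

```python
def parse_info_stats(response: str) -> dict:
    """Parse 'key=value;key=value' text into a dict, converting integers where possible."""
    stats = {}
    for pair in response.strip().strip(";").split(";"):
        if "=" not in pair:
            continue  # skip empty or malformed fragments
        key, value = pair.split("=", 1)
        try:
            stats[key] = int(value)
        except ValueError:
            stats[key] = value  # keep non-numeric values as strings
    return stats


# Illustrative input only, not captured from a real server.
sample = "lag=0;in_queue=12;lap_us=347;latency_ms=21"
stats = parse_info_stats(sample)
print(stats["lap_us"], stats["latency_ms"])
```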

Introduced

5.0.0

Removed

Measurement type

gauge

Data type

integer

Monitoring

If lap_us is consistently higher than expected, alert operations to investigate.

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`latency_ms`

warn

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_latency_ms
Datadog: aerospike.server.xdr.latency_ms

Description

Average network latency for successfully shipped records. This value does not include timed-out shipment attempts or any other errors. Updated every log ticker interval (10 seconds by default).

Available only at the dc level, not namespace level. Example: asinfo -h localhost -l -v "get-stats:context=xdr;dc=aerospike_b"

Introduced

5.0.0

Removed

Measurement type

gauge

Data type

moving average

Monitoring

Depending on configuration, latency_ms should be within the latency of the link between the DCs.

If latency_ms increases beyond the expectations based on the distance (or known link latency) between clusters, alert operations to investigate.

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`local_recs_migration_retry`

optional

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_local_recs_migration_retry

Description

Number of records missing in a batch call, generally a result of migrations, but can also be caused by expiration and eviction.

Introduced

3.2.7

Removed

6.4.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`nodes`

watch

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_nodes

Description

Number of nodes in the destination DC as seen by XDR. There may be some delay for the remote changes to be reflected in this stat, especially on node departure, as XDR gives some grace period before removing a node.

Introduced

5.3.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`not_found`

watch

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_not_found

Description

Number of local records not found by XDR when attempting to read them. Such records might have been expired, evicted, or deleted.

Introduced

5.0.0

Removed

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`queue_overflow_error`

optional

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_queue_overflow_error

Description

Number of XDR queue overflow errors. Typically happens when there is no physical space available on the storage holding the digest log, or when writes are happening at such a rate that elements are not written fast enough to the digest log. The number of entries this queue can hold is 1 million.

Introduced

3.9.0

Removed

5.0.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`read_active_avg_pct`

optional

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_read_active_avg_pct

Description

This statistic reflects how busy the XDR read threads are by calculating the average time, in percent of total time, that the XDR read threads spend actually processing something versus waiting for a new digest log entry to arrive on their queues from the dlogreader / failed node shippers / window shippers.

Introduced

3.9.0

Removed

5.0.0

Measurement type

moving average

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`read_error`

optional

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_read_error

Description

Number of read requests initiated by XDR that failed. Those are rare, but if present, would typically be caused by reservation failures (node lost master and/or prole ownership of the partition the record belonged to during migrations). This will cause the record’s digest log entry to be relogged to the node now owning the partition (tracked under relogged_outgoing). Other rare cases would be for example when running out of memory or failure to access the storage layer. For the total number of XDR initiated read requests, sum up the xdr_read_success, xdr_read_notfound and xdr_read_error statistics.

Introduced

3.9.0

Removed

5.0.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`read_idle_avg_pct`

optional

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_read_idle_avg_pct

Description

This is a sister statistic to xdr_read_active_avg_pct and represents the average time, in percent of total time, that the XDR read threads wait for a new digest log entry to arrive on their queues from the dlogreader / failed node shippers / window shippers.

Introduced

3.9.0

Removed

5.0.0

Measurement type

moving average

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`read_latency_avg`

optional

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_read_latency_avg

Description

Moving average latency in milliseconds for XDR to read a record.

Introduced

3.9.0

Removed

5.0.0

Measurement type

moving average

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`read_notfound`

optional

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_read_notfound

Description

Number of read requests initiated by XDR for which the record was not found. These do not get relogged. This would typically happen if a record is updated and then deleted, but a lag caused the entry for the record update to be processed after the record has been deleted. For the total number of XDR initiated read requests, sum the xdr_read_success, xdr_read_notfound and xdr_read_error statistics.

Introduced

3.9.0

Removed

5.0.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`read_reqq_used`

optional

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_read_reqq_used

Description

Number of digest log entries currently in the XDR read thread queues. Each XDR read thread has an associated in-memory queue with a capacity of 1,000 log entries. See also the related statistic xdr_read_reqq_used_pct. When the dlogreader / failed node shipper / window shipper cannot write to a queue because the queue is full, it blocks until there is space in the queue again.

Introduced

3.9.0

Removed

5.0.0

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`read_reqq_used_pct`

optional

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_read_reqq_used_pct

Description

Sister statistic to xdr_read_reqq_used to represent how full in percent the XDR read request queues are.

Introduced

3.9.0

Removed

5.0.0

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`read_respq_used`

optional

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_read_respq_used

Description

How many entries are being used in the XDR read response queues. Those queues are used to hand back records after they have been locally fetched. Those queues are similar to the queues referred to in the xdr_read_reqq_used stat except for the fact that they are not bounded. The throttling would happen at the XDR read request queues.

Introduced

3.9.0

Removed

5.0.0

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`read_success`

optional

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_read_success

Description

Number of read requests initiated by XDR that succeeded. For the total number of XDR initiated read requests, sum up the xdr_read_success, xdr_read_notfound and xdr_read_error statistics.
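
The sum described above can be computed directly from the three counters. This is a minimal sketch: the function name `xdr_read_totals` and the percentage breakdown are additions for illustration, not part of the server's output.

```python
def xdr_read_totals(read_success: int, read_notfound: int, read_error: int) -> dict:
    """Sum the three read counters and report each outcome's share of the total."""
    total = read_success + read_notfound + read_error
    if total == 0:
        # No reads yet, so avoid dividing by zero.
        return {"total": 0, "success_pct": 0.0, "notfound_pct": 0.0, "error_pct": 0.0}
    return {
        "total": total,
        "success_pct": 100.0 * read_success / total,
        "notfound_pct": 100.0 * read_notfound / total,
        "error_pct": 100.0 * read_error / total,
    }
```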

Introduced

3.9.0

Removed

5.0.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`read_txnq_used`

optional

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_read_txnq_used

Description

Number of XDR read commands that are in flight in the local transaction queue. XDR limits the number of outstanding XDR read requests to 10,000. The requests are placed in an internal transaction queue. See xdr_read_txnq_used_pct for the percent used in this queue.

Introduced

3.9.0

Removed

5.0.0

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`read_txnq_used_pct`

optional

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_read_txnq_used_pct

Description

Percent used of the XDR read commands that are in flight (out of a maximum allowed of 10,000) in the transaction queue. It is an internal transaction queue. See xdr_read_txnq_used for the number of XDR issued reads that are in flight.
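
The relationship between the two statistics is simple arithmetic against the 10,000 limit. The sketch below is illustrative: the function name is hypothetical, and rounding down to an integer is an assumption based only on the integer data type listed for this metric.

```python
XDR_READ_TXNQ_LIMIT = 10_000  # maximum outstanding XDR reads, per the description above


def read_txnq_used_pct(read_txnq_used: int) -> int:
    """Percent of the transaction queue limit in use (integer, rounded down by assumption)."""
    return (100 * read_txnq_used) // XDR_READ_TXNQ_LIMIT
```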

Introduced

3.9.0

Removed

5.0.0

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`recoveries`

warn

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_recoveries
Datadog: aerospike.server.xdr.recoveries

Description

Number of partitions that are recovered by reducing the primary index of that partition. Recovery is done when the in-memory transaction queue of the partition is full or when necessary records are not present in the in-memory transaction queue.

See also recoveries_pending.

Introduced

5.0.0

Removed

Measurement type

counter

Data type

integer

Monitoring

If recoveries is consistently increasing, alert operations to investigate.
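
Because recoveries is a cumulative counter, "consistently increasing" is easiest to judge from the per-interval increases. The following is a sketch under stated assumptions: both function names are hypothetical, the 3-interval window is an illustrative default, and a drop in the counter is assumed to mean the node restarted and the counter began again from zero.

```python
def counter_increases(samples):
    """Per-interval increases of a cumulative counter; a drop is treated as a restart."""
    deltas = []
    for prev, curr in zip(samples, samples[1:]):
        # After a restart the counter starts again from zero, so count curr itself.
        deltas.append(curr - prev if curr >= prev else curr)
    return deltas


def is_consistently_increasing(samples, min_intervals=3):
    """True if the counter grew in each of the last `min_intervals` intervals."""
    deltas = counter_increases(samples)
    if len(deltas) < min_intervals:
        return False
    return all(d > 0 for d in deltas[-min_intervals:])
```

The same approach applies to the other counters in this section whose monitoring guidance is phrased in terms of sustained growth, such as retry_conn_reset, retry_dest, and retry_no_node.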

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`recoveries_pending`

warn

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_recoveries_pending
Datadog: aerospike.server.xdr.recoveries_pending

Description

Number of recoveries currently pending.

If recoveries_pending is zero, there are no recoveries in progress. Non-zero indicates the number of recoveries in progress.

Introduced

5.0.0

Removed

Measurement type

gauge

Data type

integer

Monitoring

If recoveries_pending is unexpectedly increasing, alert operations to investigate.

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`relogged_incoming`

optional

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_relogged_incoming

Description

Number of records relogged into this node’s digest log by another node. This typically happens during the following situations:

- Migrations at the source cluster: when there are outstanding digest log entries and partition ownership changes by the time they are processed, and the local node no longer owns a master or prole copy of the partition the record belongs to, the node now owning the master copy of the partition receives an incoming relogged digest log entry.
- When a node relogs a record’s digest log entries to itself (dlog_relogged), it also relogs them for the node owning the prole counterpart.

Introduced

3.9.0

Removed

5.0.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, dc

Detail

The sending node will then have its relogged_outgoing statistic incremented.

`relogged_outgoing`

optional

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_relogged_outgoing

Description

Number of records relogged to another node’s digest log. This typically happens during the following situations:
- migrations at the source cluster, when there are outstanding digest log entries for which the local node does not own either master or prole partition for the record anymore (xdr_read_error)
- when a node relogs record’s digest log entries to itself (dlog_relogged), it will also relog those for the node owning the prole counterpart.

Introduced

3.9.0

Removed

5.0.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, dc

Detail

The receiving node will then have its relogged_incoming statistic incremented.

`retry_conn_reset`

warn

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_retry_conn_reset
Datadog: aerospike.server.xdr.retry_conn_reset

Description

Number of records whose shipment is retried due to a reset of the connection to the remote datacenter. A connection can be reset due to timeouts (10s), network problems, or destination node restarts.

This statistic can increase in bursts. Because of the XDR pipeline, there can be many records that are retried when a connection is reset.

Introduced

5.0.0

Removed

Measurement type

counter

Data type

integer

Monitoring

If retry_conn_reset is consistently higher than expected, alert operations to investigate.

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`retry_dest`

warn

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_retry_dest
Datadog: aerospike.server.xdr.retry_dest

Description

Number of records retried due to a temporary error returned by the destination node. The destination node has responded with a specific error code; therefore, such errors are not related to the network. Such errors include key busy and device overload.

Introduced

5.0.0

Removed

Measurement type

counter

Data type

integer

Monitoring

If retry_dest is consistently higher than expected, alert operations to investigate.

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`retry_no_node`

warn

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_retry_no_node
Datadog: aerospike.server.xdr.retry_no_node

Description

Number of records retried because XDR cannot determine which destination node is the master.

This typically happens when XDR does not discover the full cluster of the destination, perhaps due to firewall settings. In such a case, the master for all partitions cannot be known. The other possibility is that the entire namespace is not present on the destination cluster.

Introduced

5.1.0

Removed

Measurement type

counter

Data type

integer

Monitoring

If retry_no_node is consistently higher than expected, alert operations to investigate.

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`ship_bytes`

watch

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_ship_bytes

Description

Estimated number of bytes XDR has shipped to remote clusters.

Introduced

3.9.0

Removed

5.0.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`ship_compression_avg_pct`

optional

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_ship_compression_avg_pct

Description

Used to determine how beneficial compression is (higher is better).

Introduced

3.9.0

Removed

5.0.0

Measurement type

moving average

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`ship_delete_success`

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_ship_delete_success

Description

Number of delete operations that were successfully shipped.

Introduced

3.9.0

Removed

5.0.0

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`ship_destination_error`

optional

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_ship_destination_error

Description

Number of errors from the remote cluster(s) while shipping records. Errors include timeout, out-of-space, key-busy, etc. These are typically relogged, except in the case of a permanent error (tracked under xdr_ship_destination_permanent_error; for example, records too big or some bad namespace configuration), which triggers a WARNING log message at the destination. For the total number of records XDR attempted to ship, sum up xdr_ship_success, xdr_ship_source_error and xdr_ship_destination_error. These counts do not include errors while attempting to read the record locally, only errors after a record to be shipped has been passed to XDR’s underlying C client. For errors reading records locally, see xdr_read_error.

Introduced

3.9.0

Removed

5.0.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`ship_destination_permanent_error`

optional

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_ship_destination_permanent_error

Description

Number of permanent errors from the remote cluster(s) while shipping records. Examples include records that are too big or a bad namespace configuration; such errors trigger a WARNING log message at the destination and the records are not relogged. This counter does not include errors while attempting to read the record locally, only errors after a record to be shipped has been passed to XDR’s underlying C client. For errors reading records locally, see xdr_read_error. For all errors while shipping to a destination, see xdr_ship_destination_error.

Introduced

4.4.0.4

Removed

5.0.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`ship_fullrecord`

optional

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_ship_fullrecord

Description

Number of records that did not take advantage of bin level shipping (see xdr-ship-bins).

Introduced

3.9.0

Removed

5.0.0

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`ship_inflight_objects`

optional

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_ship_inflight_objects

Description

Number of objects that are inflight (which have been shipped but for which a response from the remote DC has not yet been received).

Introduced

3.9.0

Removed

5.0.0

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`ship_latency_avg`

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_ship_latency_avg

Description

Moving average latency in milliseconds to ship a record to remote Aerospike clusters. This is computed by dividing time into 1 second intervals.

Introduced

3.9.0

Removed

5.0.0

Measurement type

gauge

Data type

integer

Monitoring

Depending on configuration, xdr_ship_latency_avg should be within the latency of the link between the DCs.

If xdr_ship_latency_avg increases beyond the expectations based on the distance (or known link latency) between clusters, alert operations to investigate.

Labels

cluster_name, job, service, instance, longitude, latitude, dc

Detail

The average is calculated over each 1-second interval separately and then fed into the exponential moving average, so the result is a moving average of independent 1-second averages. This prevents intervals with a much higher volume of transactions from carrying more weight than intervals with far fewer transactions.
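The two-stage averaging described above can be sketched in Python. This is an illustration only, not the server's implementation: the function name, the `(timestamp, latency)` sample format, and the smoothing factor `alpha` are all assumptions, since this reference does not state the actual smoothing factor.

```python
from collections import defaultdict
from typing import Iterable, Optional, Tuple

ALPHA = 0.2  # hypothetical smoothing factor; the real value is not given in this reference


def ship_latency_ema(
    samples: Iterable[Tuple[float, float]], alpha: float = ALPHA
) -> Optional[float]:
    """Exponential moving average of independent 1-second latency averages.

    samples: (timestamp_in_seconds, latency_in_ms) pairs, in any order.
    Returns None when there are no samples.
    """
    # Stage 1: group latencies by the 1-second interval they fall into.
    buckets = defaultdict(list)
    for timestamp, latency_ms in samples:
        buckets[int(timestamp)].append(latency_ms)

    # Stage 2: feed one average per interval into the moving average, so a
    # busy second carries the same weight as a quiet one.
    ema = None
    for second in sorted(buckets):
        interval_avg = sum(buckets[second]) / len(buckets[second])
        ema = interval_avg if ema is None else alpha * interval_avg + (1 - alpha) * ema
    return ema
```

With a thousand 10 ms samples in one second and a single 100 ms sample in the next, a plain average over all samples would stay close to 10 ms, whereas this scheme weights the two seconds by interval rather than by transaction count.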

`ship_outstanding_objects`

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_ship_outstanding_objects

Description

Number of outstanding records not yet processed. This only applies to the main thread and does not account for digest log entries pending window shipper or failed node processing. It represents the difference between the write pointer position and the read pointer position. It also does not account for entries pending in the queue before being flushed to the digest log; that queue can hold up to 100 entries, or is flushed after 500 ms if it is not full by that time (configurable through xdr-digestlog-iowait-ms).

Introduced

3.9.0

Removed

5.0.0

Measurement type

gauge

Data type

integer

Monitoring

Trending xdr_ship_outstanding_objects gives operations insight into how the XDR record transmit queue size changes over time.

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`ship_source_error`

optional

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_ship_source_error

Description

Number of client-layer errors while shipping records. Errors include connection errors, bad network fd, etc. For the total number of records XDR attempted to ship, sum xdr_ship_success, xdr_ship_source_error and xdr_ship_destination_error. These counters do not include errors while attempting to read the record locally, only errors after a record to be shipped has been passed to XDR’s underlying C client. For errors reading records locally, see xdr_read_error.

Introduced

3.9.0

Removed

5.0.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`ship_success`

optional

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_ship_success

Description

Number of records successfully shipped to remote Aerospike clusters, counted across all configured datacenters: one record successfully shipped to 3 different datacenters increments this counter by 3. Includes xdr_ship_delete_success. For the total number of records XDR attempted to ship, sum xdr_ship_success, xdr_ship_source_error and xdr_ship_destination_error. These counters do not include errors while attempting to read the record locally, only errors after a record to be shipped has been passed to XDR’s underlying C client. For errors reading records locally, see xdr_read_error.

Introduced

3.9.0

Removed

5.0.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, dc
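The description above gives the total number of ship attempts as the sum of three counters. A minimal Python sketch of that arithmetic follows, assuming the statistics have already been collected into a dictionary keyed by the statistic names used in this entry. The `stats` mapping and both helper functions are hypothetical and not part of any Aerospike client.

```python
from typing import Mapping

# Counters that, per the description, together cover every ship attempt.
SHIP_ATTEMPT_COUNTERS = (
    "xdr_ship_success",
    "xdr_ship_source_error",
    "xdr_ship_destination_error",
)


def total_ship_attempts(stats: Mapping[str, int]) -> int:
    """Sum the three ship counters; missing counters are treated as 0."""
    return sum(int(stats.get(name, 0)) for name in SHIP_ATTEMPT_COUNTERS)


def ship_error_ratio(stats: Mapping[str, int]) -> float:
    """Fraction of ship attempts that ended in a source or destination error."""
    attempts = total_ship_attempts(stats)
    if attempts == 0:
        return 0.0
    errors = attempts - int(stats.get("xdr_ship_success", 0))
    return errors / attempts
```

Local read errors (xdr_read_error) are deliberately left out of the sum, matching the description: they occur before a record is handed to the shipping client.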

`stat_pipe_reads_diginfo`

optional

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_stat_pipe_reads_diginfo

Description

Number of digest information entries read from the named pipe.

Introduced

3.2.7

Removed

6.4.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`success`

warn

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_success
Datadog: aerospike.server.xdr.success

Description

Number of records successfully shipped to remote datacenters.

Introduced

5.0.0

Removed

Measurement type

counter

Data type

integer

Monitoring

If success is consistently lower than expected, alert operations to investigate.

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`throughput`

watch

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_throughput

Description

Number of records successfully shipped per second. Updated every log ticker interval (10 seconds by default).

Introduced

5.0.0

Removed

Measurement type

gauge

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, dc
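For comparison, a similar per-second rate can be derived on the monitoring side from two samples of the success counter taken one interval apart. The sketch below only illustrates that arithmetic; it is not how the server computes throughput, and the function name, its parameters, and the counter-reset handling are assumptions.

```python
def records_per_second(
    prev_count: int, curr_count: int, interval_secs: float = 10.0
) -> float:
    """Per-second rate from two samples of a monotonically increasing counter.

    interval_secs defaults to 10, the default log ticker interval.
    """
    if interval_secs <= 0:
        raise ValueError("interval_secs must be positive")
    delta = curr_count - prev_count
    if delta < 0:
        # Assumed handling: the counter restarted from 0 (for example after a
        # node restart), so only the current value is known to be new.
        delta = curr_count
    return delta / interval_secs
```

For example, samples of 1000 and 1500 taken 10 seconds apart correspond to 50 records per second.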

`timelag`

optional

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_timelag

Description

Time in seconds between the moment the latest shipped record was first written at the source and the moment XDR attempted to ship it to the destination cluster. This is equivalent to the time its digestlog entry waited in the digestlog before being processed. Each record written at the source is timestamped as it gets written into the XDR digestlog.

Introduced

3.8.1

Removed

5.0.0

Measurement type

gauge

Data type

integer

Monitoring

[Removed in 5.0] If xdr_timelag is consistently greater than a few seconds, this condition might indicate network connectivity issues or errors writing at a destination cluster.

The knowledge base article "FAQ - What are the causes of XDR throttling" might be helpful.

Labels

cluster_name, job, service, instance, longitude, latitude, dc

Detail

With multiple destination DCs, this represents the maximum time lag across all the remote DCs that are not in the CLUSTER_INACTIVE or CLUSTER_DOWN states (see dc_state). Under normal operations, though, the timelag for each DC that is in the CLUSTER_UP state will be the same, given that XDR ships records in lock-step. The timelag at each DC would be different when a DC is in the CLUSTER_DOWN or in the CLUSTER_WINDOW_SHIP state. This does not represent the time it will take for XDR to ‘catch up’, nor does it necessarily relate to the number of outstanding digests in the digest log still to be processed. For per DC time lag, see dc_timelag.
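The aggregation rule above (the maximum lag over DCs that are neither CLUSTER_INACTIVE nor CLUSTER_DOWN) can be sketched in Python. The input format, a mapping from a DC name to its `(dc_state, dc_timelag)` pair, and the function name are assumptions made for illustration.

```python
from typing import Mapping, Optional, Tuple

# States that are excluded from the overall time lag, per the detail above.
EXCLUDED_STATES = frozenset({"CLUSTER_INACTIVE", "CLUSTER_DOWN"})


def overall_timelag(dcs: Mapping[str, Tuple[str, int]]) -> Optional[int]:
    """Maximum per-DC time lag (seconds) over DCs not in an excluded state.

    dcs maps a DC name to (dc_state, dc_timelag_seconds).
    Returns None when every DC is excluded or no DC is configured.
    """
    lags = [lag for state, lag in dcs.values() if state not in EXCLUDED_STATES]
    return max(lags) if lags else None
```

A DC in the CLUSTER_WINDOW_SHIP state still counts toward the maximum, which is why the overall value can exceed the lag seen at DCs in the CLUSTER_UP state.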

`uncompressed_pct`

optional

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_uncompressed_pct

Description

Running average percentage of records not compressed because they are below the compression threshold (100) or failed to be compressed at all. See also related parameter enable-compression.

Introduced

5.0.0

Removed

Measurement type

moving average

Data type

decimal

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`uninitialized_destination_error`

optional

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_uninitialized_destination_error

Description

Number of records in the digest log not shipped because the destination cluster has not been initialized for a DC that is configured for a namespace. This should not happen. Those errors are not counted as xdr_ship_*_error.

Introduced

3.9.0

Removed

5.0.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, dc

`unknown_namespace_error`

optional

Context

xdr

Backend-specific Name

Prometheus: aerospike_xdr_unknown_namespace_error

Description

Number of records in the digest log not shipped because they belong to a namespace that is unknown on the source cluster. One situation where this would happen is if a namespace is removed (or the order of namespaces is changed in the configuration) while some entries in the digest log have not been processed yet. This should not happen in most cases. These errors are not counted as xdr_ship_*_error.

Introduced

3.9.0

Removed

5.0.0

Measurement type

counter

Data type

integer

Labels

cluster_name, job, service, instance, longitude, latitude, dc