Key metrics to monitor
Aerospike recommends that you monitor your system with the metrics listed on this page. For a complete list of metrics, see the Metric reference.
Operating system and server health
- Monitor system metrics with the Prometheus Node Exporter or an OS-specific tool.
- Use the Aerospike health monitor to detect health outliers.
Finding total namespace memory
The metric memory_used_bytes was removed in Database 7.0.0 to streamline configuration and capacity planning, and to stabilize overhead so that memory usage calculations are more accurate.
Starting with Database 7.0.0, no single metric reports the amount of memory used in the namespace. A combination of metrics provides the same information as memory_used_bytes.
You allocated a specific amount of storage for your namespace when you created it. You also set a limit in the system-memory-pct parameter that tells Aerospike when the memory is full enough to stop writing to the namespace.
Before you reach that limit, you can determine the total memory used in the namespace by adding up the following metrics. Include only the metrics for components that your namespace stores in memory:
- data_used_bytes
- index_used_bytes
- set_index_used_bytes
- sindex_used_bytes
You may also run the info namespace command in the Aerospike Admin (asadm) tool.
See Aerospike Admin - Info namespace for more information.
Admin> info namespace
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Namespace Usage Information (2023-10-13 15:59:46 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Namespace|           Node|Evictions|  Stop|~System Memory~|~Primary Index~~|~~Secondary~~~|~~~~~~~~~~~~~~~~~~~Storage Engine~~~~~~~~~~~~~~~~~~
         |               |         |Writes| Avail%| Evict%| Type|      Used|~~~~Index~~~~~|  Type|      Used| Used%|Evict%|  Used|Avail%|Avail
         |               |         |      |       |       |     |          | Type|    Used|      |          |      |      | Stop%|      |Stop%
bar      |172.17.0.3:3000|  0.000  |False |     74|      0|shmem|625.000 KB|shmem|0.000 B |memory|625.000 KB|0.06 %| 0.0 %|70.0 %|97.0 %|5.0 %
bar      |               |  0.000  |      |       |       |     |625.000 KB|     |0.000 B |      |625.000 KB|0.06 %|      |      |      |
test     |172.17.0.3:3000|  0.000  |False |     74|      0|shmem|625.000 KB|shmem|0.000 B |memory|625.000 KB|0.06 %| 0.0 %|70.0 %|87.0 %|5.0 %
test     |               |  0.000  |      |       |       |     |625.000 KB|     |0.000 B |      |625.000 KB|0.06 %|      |      |      |
Number of rows: 2
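The sum described in this section can be sketched in a few lines of Python. This is an illustration only, not an Aerospike tool. It assumes that you already have the namespace statistics as semicolon-separated `key=value` text. The helper names `parse_stats` and `namespace_memory_used` and the sample values are hypothetical. Only the four metric names come from this page.

```python
# Metric names are taken from this page; everything else is illustrative.
MEMORY_METRICS = (
    "data_used_bytes",
    "index_used_bytes",
    "set_index_used_bytes",
    "sindex_used_bytes",
)


def parse_stats(raw: str) -> dict:
    """Parse 'key=value;key=value' text into a dict (assumed input format)."""
    pairs = (item.split("=", 1) for item in raw.split(";") if "=" in item)
    return {key.strip(): value.strip() for key, value in pairs}


def namespace_memory_used(stats: dict, in_memory=MEMORY_METRICS) -> int:
    """Add up the metrics for the components this namespace keeps in memory."""
    unknown = set(in_memory) - set(MEMORY_METRICS)
    if unknown:
        raise ValueError(f"unexpected metric names: {sorted(unknown)}")
    # A metric that is missing from the statistics counts as zero bytes.
    return sum(int(stats.get(name, 0)) for name in in_memory)


# Hypothetical sample values, not real server output.
sample = (
    "data_used_bytes=640000;index_used_bytes=640000;"
    "set_index_used_bytes=0;sindex_used_bytes=0"
)
stats = parse_stats(sample)

# Everything in memory: add all four metrics.
total = namespace_memory_used(stats)

# Data on a storage device: count only the index-related metrics.
index_only = namespace_memory_used(
    stats, in_memory=("index_used_bytes", "sindex_used_bytes")
)
print(total, index_only)
```

Pass the `in_memory` argument to match your own namespace configuration, so that components stored outside memory are left out of the total.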
Recommended alert metrics
- clock_skew_stop_writes
- data_avail_pct
- dead_partitions
- device_available_pct
- hwm_breached
- memory_free_pct
- pmem_available_pct
- stop_writes
- unavailable_partitions
- client_connections
- client_connections_opened
- cluster_size
- fabric_connections_opened
- heartbeat_connections_opened
- system_free_mem_kbytes
- system_free_mem_pct
- lag
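As a rough illustration of how some of these metrics might feed an alert rule, the following Python sketch checks a small subset of them. It is not an official rule set. It assumes that the metrics have been collected into one flat dictionary of strings, that clock_skew_stop_writes, hwm_breached, and stop_writes report `true` or `false`, and that any non-zero dead_partitions or unavailable_partitions value deserves attention. The function name `find_alerts`, the `min_avail_pct` default of 20, and the sample values are hypothetical, so choose thresholds that suit your own deployment.

```python
# Subset of the metrics listed above; the grouping is an assumption.
BOOLEAN_ALERTS = ("clock_skew_stop_writes", "hwm_breached", "stop_writes")
NONZERO_ALERTS = ("dead_partitions", "unavailable_partitions")


def find_alerts(stats: dict, expected_cluster_size: int, min_avail_pct: int = 20) -> list:
    """Return human-readable alert messages for one snapshot of metrics."""
    alerts = []
    for name in BOOLEAN_ALERTS:
        if str(stats.get(name, "false")).lower() == "true":
            alerts.append(f"{name} is true")
    for name in NONZERO_ALERTS:
        count = int(stats.get(name, 0))
        if count > 0:
            alerts.append(f"{name} = {count}")
    if "cluster_size" in stats:
        size = int(stats["cluster_size"])
        if size < expected_cluster_size:
            alerts.append(f"cluster_size = {size}, expected {expected_cluster_size}")
    if "data_avail_pct" in stats:
        avail = int(stats["data_avail_pct"])
        # The threshold is a placeholder, not a recommended value.
        if avail < min_avail_pct:
            alerts.append(f"data_avail_pct = {avail}, below {min_avail_pct}")
    return alerts


# Hypothetical snapshots, not real server output.
healthy = {
    "clock_skew_stop_writes": "false",
    "hwm_breached": "false",
    "stop_writes": "false",
    "dead_partitions": "0",
    "unavailable_partitions": "0",
    "cluster_size": "3",
    "data_avail_pct": "97",
}
degraded = dict(healthy, stop_writes="true", unavailable_partitions="12", cluster_size="2")

for message in find_alerts(degraded, expected_cluster_size=3):
    print(message)
```

In practice these checks usually live in the alerting layer of your monitoring stack. The sketch only shows the shape of the conditions.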
Health-specific metrics
Other metrics to watch
- client_delete_error
- client_read_error
- client_udf_error
- client_write_error
- index_flash_alloc_pct
- memory_used_bytes
- pi_query_aggr_error
- pi_query_long_basic_error
- pi_query_ops_bg_error
- pi_query_short_basic_error
- pi_query_udf_bg_error
- scan_aggr_error
- scan_basic_error
- scan_ops_bg_error
- scan_udf_bg_error
- storage_engine_device_defrag_q
- storage_engine_file_write_q
- batch_index_error
- heap_efficiency_pct
- rw_in_progress
- abandoned
- lap_us
- latency_ms
- recoveries
- recoveries_pending
- retry_conn_reset
- retry_dest
- retry_no_node
- success
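Many of the error metrics in this list are easier to watch as a rate than as a raw value. The following Python sketch assumes that metrics whose names end in `_error` are cumulative counters that restart from zero when the server restarts, and that you sample them at a known interval. The function name `error_rates` and the sample values are hypothetical.

```python
def error_rates(previous: dict, current: dict, interval_seconds: float) -> dict:
    """Compute per-second rates for '*_error' counters between two samples."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    rates = {}
    for name, value in current.items():
        # Skip gauges such as rw_in_progress and counters with no earlier sample.
        if not name.endswith("_error") or name not in previous:
            continue
        delta = int(value) - int(previous[name])
        if delta < 0:
            # Assumed counter reset: count only what accumulated since the reset.
            delta = int(value)
        rates[name] = delta / interval_seconds
    return rates


# Hypothetical samples taken 60 seconds apart, not real server output.
previous = {"client_read_error": "10", "client_write_error": "4", "rw_in_progress": "7"}
current = {"client_read_error": "40", "client_write_error": "4", "rw_in_progress": "2"}

rates = error_rates(previous, current, interval_seconds=60)
print(rates)
```

A sustained non-zero rate is usually more informative than the absolute counter value, which only grows over the lifetime of the server process.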