Metrics

Aerospike clients provide a metrics system for monitoring client performance and resource usage. It is built on two core concepts:

Dimensions: Levels at which metrics are aggregated. The three primary dimensions are Cluster, Node, and Namespace. You can also configure labels on the client instance which apply to all metrics it generates.
Metrics: These are the quantitative measurements being tracked, such as transaction counts, error rates, and latency buckets.

Clients can be configured to emit two levels of metrics:

Standard metrics: A baseline of essential statistics (e.g., connections, threads) that are always collected with negligible performance impact.
Extended metrics: Detailed diagnostic data, including latency histograms and transaction counts. These incur a performance penalty and must be enabled explicitly.

Enabling extended metrics

Extended metrics can be enabled programmatically within your application or at runtime if your client supports Dynamic Client Configuration. For implementation details, please see your specific client’s metrics documentation or the Dynamic Client Configuration documentation.

Dimensions and labels

Metrics are collected and can be aggregated at the following levels:

Cluster: the highest level, representing the entire Aerospike cluster that the client is connected to.
Node: represents a single server instance within the cluster.
Namespace: represents a data partition within a specific node.

You can also define custom key-value pairs (label[]) on the client instance that are applied to all metrics that client generates (e.g. owner=data-team or app_id=fraud-service). app_id is an auto-generated label that the client creates to correlate client-side operations with server-side metrics. It defaults to the client username, but you can override it with a custom value.
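
The label and app_id rules described above can be modeled in a few lines. This is a minimal Python sketch of the described behavior, not any client's API: resolve_app_id and build_labels are hypothetical helper names, and the username and label values are made up for illustration.

```python
def resolve_app_id(username, app_id=None):
    """Return the custom app_id if one is set, else fall back to the username."""
    return app_id if app_id else username


def build_labels(username, labels=None, app_id=None):
    """Combine custom labels with the resolved app_id into one mapping.

    Hypothetical helper: it models the behavior described in the text and
    is not how any Aerospike client stores labels internally.
    """
    merged = dict(labels or {})
    merged["app_id"] = resolve_app_id(username, app_id)
    return merged


# No override: app_id falls back to the (made-up) username.
default_labels = build_labels("svc_user", {"owner": "data-team"})

# Explicit override: the custom value wins.
custom_labels = build_labels(
    "svc_user", {"owner": "data-team"}, app_id="fraud-service"
)
```

How labels and app_id are actually set differs per client, so treat the function signatures here as placeholders for whatever your client's configuration exposes.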

Available Metrics

Metrics are collected at the Cluster, Node, and Namespace dimensions. Namespace-level metrics are available across all dimensions, whereas other metrics exist only at the cluster dimension or only at the node dimension. This assignment is not configurable. Metrics marked with (Extended) are available only when extended metrics are enabled.

Only cluster-level metrics

These metrics are only collected at the cluster level.

cpu: CPU usage percentage of the client process. (Extended)
mem: Memory usage of the client process. (Extended)
threadsInUse: Number of active threads executing synchronous commands.
recoverQueueSize: Number of connections in the sync connection recover queue.
invalidNodeCount: Count of failed node additions.
commandCount: Total commands issued by the client. (Extended)
retryCount: Total command retries.
delayQueueTimeoutCount: Async commands that timed out in the delay queue.
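
As an illustration of the (Extended) marker, the following Python sketch encodes the cluster-level list above as a lookup table and filters it by whether extended metrics are enabled. CLUSTER_METRICS and available_cluster_metrics are hypothetical names used only for this example; they are not part of any client API.

```python
# Cluster-level metrics from the list above.
# True marks a metric that is tagged (Extended) in this document.
CLUSTER_METRICS = {
    "cpu": True,
    "mem": True,
    "threadsInUse": False,
    "recoverQueueSize": False,
    "invalidNodeCount": False,
    "commandCount": True,
    "retryCount": False,
    "delayQueueTimeoutCount": False,
}


def available_cluster_metrics(extended_enabled):
    """List the cluster-level metrics expected for the given setting."""
    return [
        name
        for name, is_extended in CLUSTER_METRICS.items()
        if extended_enabled or not is_extended
    ]


standard_only = available_cluster_metrics(False)
with_extended = available_cluster_metrics(True)
```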

Event loop metrics

Metrics for each async event loop.

processSize: Number of commands actively being processed.
queueSize: Number of commands waiting in the delay queue.

Only node-level metrics

These metrics are only collected at the node level.

syncConn.inUse/asyncConn.inUse: Active connections from the pool.
syncConn.inPool/asyncConn.inPool: Idle connections in the pool.
syncConn.opened/asyncConn.opened: Total connections opened.
syncConn.closed/asyncConn.closed: Total connections closed.

Namespace metrics (per Cluster, per Node, per Namespace)

These metrics are collected per namespace and are available at the cluster, node, and namespace dimensions.

errors: Command error count. (Extended)
timeouts: Command timeout count. (Extended)
keyBusy: Count of AS_PROTO_RESULT_FAIL_KEY_BUSY errors. (Extended)
bytesIn: Total bytes read. (Extended)
bytesOut: Total bytes written. (Extended)
latency[]: Latency histograms for the following command types: (Extended)
- conn: Connection creation latency.
- write: Single record write commands.
- read: Single record read commands.
- batch: Batch read/write commands.
- query: Scan/Query commands.
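
To make the relationship between the three dimensions concrete, here is a minimal Python sketch of how a consumer of this data might roll per-node, per-namespace counters up to node, namespace, and cluster totals. It is an assumption-laden illustration: this document does not state how the clients compute the higher-level values, the roll_up function and the node names are hypothetical, the numbers are invented, and latency histograms are left out.

```python
from collections import Counter


def roll_up(per_node):
    """Sum namespace counters at the node, namespace, and cluster levels.

    per_node maps node name -> namespace name -> {counter name: value}.
    Returns (cluster_totals, totals_by_node, totals_by_namespace).
    """
    by_node = {}
    by_namespace = {}
    cluster = Counter()
    for node, namespaces in per_node.items():
        node_total = Counter()
        for namespace, counters in namespaces.items():
            node_total.update(counters)  # Counter.update adds values
            by_namespace.setdefault(namespace, Counter()).update(counters)
        by_node[node] = node_total
        cluster.update(node_total)
    return cluster, by_node, by_namespace


# Invented snapshot: two nodes, two namespaces.
snapshot = {
    "node-a": {
        "test": {"errors": 1, "timeouts": 0, "bytesIn": 1111, "bytesOut": 2222},
        "prod": {"errors": 0, "timeouts": 2, "bytesIn": 500, "bytesOut": 700},
    },
    "node-b": {
        "test": {"errors": 3, "timeouts": 1, "bytesIn": 100, "bytesOut": 200},
    },
}

cluster, by_node, by_namespace = roll_up(snapshot)
```

Simple counters such as errors or bytesIn can be summed this way; derived values such as percentiles cannot, which is one reason the sketch ignores the latency[] histograms.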

Metrics log format

When configured, clients can periodically write a timestamped record of metrics snapshots to a log file. The schema explicitly defines the dimensions, labels, and metrics nested within them. The log will contain extended metrics only when they are enabled.

Log schema:

cluster[name,clientType,clientVersion,app_id,label[],cpu,mem,invalidNodeCount,commandCount,retryCount,delayQueueTimeoutCount,eventloop[],node[]]
label[name,value]
eventloop[processSize,queueSize]
node[name,address,port,syncConn,asyncConn,namespace[]]
conn[inUse,inPool,opened,closed]
namespace[name,errors,timeouts,keyBusy,bytesIn,bytesOut,latency[]]
latency(%u,%u)[type[l1,l2,l3...]]

Sample output:

cluster[,C,7.0.4,my_app,[[region,us-west],[zone,usw1-az3]],0,5685248,0,12,0,0,[],[[BB9AFA9A8421C00,10.211.55.4,3000,0,1,2,0,0,0,0,0,[test,1,0,0,1111,2222,[conn[0,0,0,0,0,0,0],write[7,0,0,0,0,0,0],read[3,1,0,0,0,0,0],batch[0,0,0,0,0,0,0],query[0,0,0,0,0,0,0]]]]]]
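
The nested bracket format of the sample above can be read with a small recursive parser. The following Python sketch is one possible approach, not a tool shipped with any client: parse_value, parse_list, and parse_metrics_line are hypothetical names, all leaf values are kept as strings, anything before "cluster[" (such as a timestamp) is skipped, and the field positions are taken from the schema and sample in this document.

```python
SAMPLE = (
    "cluster[,C,7.0.4,my_app,[[region,us-west],[zone,usw1-az3]],0,5685248,0,12,0,0,[],"
    "[[BB9AFA9A8421C00,10.211.55.4,3000,0,1,2,0,0,0,0,0,"
    "[test,1,0,0,1111,2222,[conn[0,0,0,0,0,0,0],write[7,0,0,0,0,0,0],"
    "read[3,1,0,0,0,0,0],batch[0,0,0,0,0,0,0],query[0,0,0,0,0,0,0]]]]]]"
)


def parse_value(text, i):
    """Parse one value at text[i]; return (value, index after the value).

    A bare token becomes a string, "[...]" becomes a list, and
    "name[...]" becomes a (name, list) tuple.
    """
    start = i
    while i < len(text) and text[i] not in ",[]":
        i += 1
    token = text[start:i]
    if i < len(text) and text[i] == "[":
        items, i = parse_list(text, i + 1)
        return ((token, items) if token else items), i
    return token, i


def parse_list(text, i):
    """Parse comma-separated values up to and including the closing ']'."""
    items = []
    if i < len(text) and text[i] == "]":
        return items, i + 1  # "[]" is treated as an empty list
    while True:
        value, i = parse_value(text, i)
        items.append(value)
        if i >= len(text):
            raise ValueError("unterminated list")
        if text[i] == ",":
            i += 1
        elif text[i] == "]":
            return items, i + 1
        else:
            raise ValueError(f"unexpected character at offset {i}")


def parse_metrics_line(line):
    """Return ("cluster", fields) for the cluster record found in line."""
    start = line.find("cluster[")
    if start < 0:
        raise ValueError("no cluster record in line")
    record, _ = parse_value(line, start)
    return record


name, cluster = parse_metrics_line(SAMPLE)
node = cluster[12][0]          # node[] is the 13th cluster field
namespace = node[11]           # follows name, address, port and two conn groups
latency = dict(namespace[6])   # latency rows keyed by command type
```

Positional indexing keeps the sketch short, but it ties the code to the exact field order shown here. If the schema differs in your client version, the indices need to be adjusted to match.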

For details on how to enable metrics collection, configure policies, and access metrics data, please refer to the documentation for your specific Aerospike client.