Metrics
Some Aerospike client libraries have a data structure for cluster statistics, but the Python client works a bit differently. To use extended metrics, you must explicitly instruct the client to track latency and command counts for every node.
To enable:
```python
from aerospike_helpers.metrics import MetricsPolicy

policy = MetricsPolicy(
    report_dir="/var/log/aerospike/metrics",
    interval=600,
)

# client is an aerospike.Client object
client.enable_metrics(policy=policy)
```
To disable:
```python
client.disable_metrics()
```
The `MetricsPolicy` fields are:
- `metrics_listeners`: Listeners that handle metrics notification events. If set to `None`, the default listener implementation is used, which writes the metrics snapshot to a file that can later be read and forwarded to OpenTelemetry by a separate offline application. Otherwise, all listeners set in the class instance are used. The listeners can be overridden to send the metrics snapshot directly to OpenTelemetry (see the sketch after this list).

  The `metrics_listeners` fields are:
  - `enable_listener`: Called when metrics have been enabled for the cluster.
  - `snapshot_listener`: Called when a metrics snapshot has been requested for the given cluster.
  - `node_close_listener`: Called when a node is dropped from the cluster.
  - `disable_listener`: Called when metrics have been disabled for the cluster.
- `report_dir`: Directory path to write metrics log files for listeners that write logs.
- `report_size_limit`: Metrics file size soft limit, in bytes, for listeners that write logs. When `report_size_limit` is reached or exceeded, the current metrics file is closed and a new metrics file is created with a new timestamp. If `report_size_limit` is set to 0, the metrics file size is unbounded, and the file is only closed when `aerospike.Client.disable_metrics()` or `aerospike.Client.close()` is called. Defaults to 0.
- `interval`: Number of cluster tend iterations between metrics notification events. One tend iteration is defined as `tend_interval` in the client configuration, plus the time to tend all nodes. Defaults to 30.
- `latency_columns`: Number of elapsed-time range buckets in latency histograms. Defaults to 7.
- `latency_shift`: Power of 2 multiple between each range bucket in latency histograms, starting at column 3. The bucket units are milliseconds. The first two buckets are `<=1ms` and `>1ms`. Examples:

  ```
  # latency_columns=7, latency_shift=1
  # <=1ms >1ms >2ms >4ms >8ms >16ms >32ms

  # latency_columns=5, latency_shift=3
  # <=1ms >1ms >8ms >64ms >512ms
  ```
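For illustration, the following is a minimal sketch that sets every `MetricsPolicy` field and supplies custom listeners instead of the default file writer. It assumes the listener container class is `MetricsListeners` in `aerospike_helpers.metrics` and that the callbacks take no argument for `enable_listener`, a cluster metrics object for `snapshot_listener` and `disable_listener`, and a node object for `node_close_listener`; check the client API reference for the exact constructor and callback signatures.

```python
# Sketch only: custom listeners that log a line per event instead of writing
# the default metrics file. Class location and callback signatures are
# assumptions; verify against the client API reference.
from aerospike_helpers.metrics import MetricsListeners, MetricsPolicy

def on_enable():
    print("metrics enabled")

def on_snapshot(cluster):
    # 'cluster' carries the snapshot; forward it to your metrics pipeline
    # (for example, an OpenTelemetry exporter) here.
    print("metrics snapshot received")

def on_node_close(node):
    print("node dropped from cluster")

def on_disable(cluster):
    print("metrics disabled")

listeners = MetricsListeners(
    enable_listener=on_enable,
    snapshot_listener=on_snapshot,
    node_close_listener=on_node_close,
    disable_listener=on_disable,
)

policy = MetricsPolicy(
    metrics_listeners=listeners,              # None would use the default file writer
    report_dir="/var/log/aerospike/metrics",  # used by listeners that write log files
    report_size_limit=0,                      # 0 = unbounded file size
    interval=30,                              # tend iterations between snapshots
    latency_columns=7,                        # number of histogram buckets
    latency_shift=1,                          # power-of-2 step between buckets
)

# client is an aerospike.Client object
client.enable_metrics(policy=policy)
```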
The default extended metrics file includes:
- `cluster`: Metrics about the cluster connected to by the client.
  - `name`: Cluster name.
  - `cpu`: Current CPU usage percentage of the client process.
  - `mem`: Current memory usage of the client process.
  - `invalidNodeCount`: Count of add-node failures in the most recent cluster tend iteration.
  - `tranCount`: Count of commands since the client was started.
  - `retryCount`: Count of command retries since the client was started.
  - `delayQueueTimeoutCount`: Count, since the client was started, of async commands that timed out in the delay queue before the command was processed.
  - `eventloop`: Metrics for each async event loop.
    - `processSize`: Approximate number of commands actively being processed on the event loop.
    - `queueSize`: Approximate number of commands stored on this event loop's delay queue that have not been started yet.
- `node`: Metrics for each node.
  - `name`: Node name.
  - `address`: Node IP address.
  - `port`: Node port.
  - `syncConn`: Sync connections.
    - `inUse`: Active connections from connection pools currently executing commands.
    - `inPool`: Initialized connections in connection pools that are not currently active.
    - `opened`: Total number of node connections opened since the node was started.
    - `closed`: Total number of node connections closed since the node was started.
  - `asyncConn`: Async connections. These should always be `0` for the Python client.
    - `inUse`: Active connections from connection pools currently executing commands.
    - `inPool`: Initialized connections in connection pools that are not currently active.
    - `opened`: Total number of node connections opened since the node was started.
    - `closed`: Total number of node connections closed since the node was started.
  - `errors`: Command error count since the node was started. If the error is retryable, multiple errors per command may occur.
  - `timeouts`: Command timeout count since the node was started. If the timeout is retryable (such as `socket_timeout`), multiple timeouts per command may occur.
  - `latency`: Latency buckets for the following types:
    - `conn`: Connection creation latency.
    - `write`: Single-record write commands.
    - `read`: Single-record read commands.
    - `batch`: Batch read/write commands.
    - `query`: Scan/Query commands.
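The bucket boundaries in these latency histograms are determined entirely by `latency_columns` and `latency_shift`, as described above. The helper below is a hypothetical illustration (not part of the client API) that reproduces the bucket labels for a given pair of settings; the `latency(5,3)` notation in the example file header that follows appears to reflect these same two settings.

```python
def latency_bucket_labels(latency_columns: int, latency_shift: int) -> list[str]:
    """Reproduce the latency histogram bucket labels for the given settings.

    The first two buckets are fixed at <=1ms and >1ms; each further bucket
    boundary is the previous one multiplied by 2**latency_shift.
    """
    labels = ["<=1ms", ">1ms"]
    threshold_ms = 1
    for _ in range(latency_columns - 2):
        threshold_ms <<= latency_shift
        labels.append(f">{threshold_ms}ms")
    return labels

print(latency_bucket_labels(7, 1))
# ['<=1ms', '>1ms', '>2ms', '>4ms', '>8ms', '>16ms', '>32ms']
print(latency_bucket_labels(5, 3))
# ['<=1ms', '>1ms', '>8ms', '>64ms', '>512ms']
```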
Extended metrics file format: `<report_dir>/metrics-yyyyMMddHHmmss.log`
Extended metrics file example:
```
2023-08-03 17:56:45.444 header(1) cluster[name,cpu,mem,invalidNodeCount,commandCount,retryCount,delayQueueTimeoutCount,eventloop[],node[]] eventloop[processSize,queueSize] node[name,address,port,syncConn,asyncConn,errors,timeouts,latency[]] conn[inUse,inPool,opened,closed] latency(5,3)[type[l1,l2,l3...]]
2023-08-03 17:57:45.472 cluster[,0,29539536,0,86,0,0,[],[[BB9BF3DDF290C00,172.16.70.243,3000,0,1,2,0,0,0,0,0,0,0,[conn[0,0,0,0,0],write[6,1,0,0,0],read[14,0,0,0,0],batch[6,3,0,0,0],query[0,0,0,0,0]]],[BCDBF3DDF290C00,172.16.70.243,3020,0,1,2,0,0,0,0,0,2,0,[conn[1,0,0,0,0],write[13,1,0,0,0],read[3,0,0,0,0],batch[9,0,0,0,0],query[0,0,0,0,0]]],[BC3BF3DDF290C00,172.16.70.243,3010,0,1,2,0,0,0,0,0,0,0,[conn[1,0,0,0,0],write[7,1,0,0,0],read[27,0,0,0,0],batch[10,0,0,0,0],query[0,0,0,0,0]]]]]
2023-08-03 17:58:45.476 cluster[,0,29539536,0,86,0,0,[],[[BB9BF3DDF290C00,172.16.70.243,3000,0,1,2,0,0,0,0,0,0,0,[conn[0,0,0,0,0],write[6,1,0,0,0],read[14,0,0,0,0],batch[6,3,0,0,0],query[0,0,0,0,0]]],[BCDBF3DDF290C00,172.16.70.243,3020,0,1,2,0,0,0,0,0,2,0,[conn[1,0,0,0,0],write[13,1,0,0,0],read[3,0,0,0,0],batch[9,0,0,0,0],query[0,0,0,0,0]]],[BC3BF3DDF290C00,172.16.70.243,3010,0,1,2,0,0,0,0,0,0,0,[conn[1,0,0,0,0],write[7,1,0,0,0],read[27,0,0,0,0],batch[10,0,0,0,0],query[0,0,0,0,0]]]]]
2023-08-03 17:59:45.483 cluster[,0,29539536,0,86,0,0,[],[[BB9BF3DDF290C00,172.16.70.243,3000,0,1,2,0,0,0,0,0,0,0,[conn[0,0,0,0,0],write[6,1,0,0,0],read[14,0,0,0,0],batch[6,3,0,0,0],query[0,0,0,0,0]]],[BCDBF3DDF290C00,172.16.70.243,3020,0,1,2,0,0,0,0,0,2,0,[conn[1,0,0,0,0],write[13,1,0,0,0],read[3,0,0,0,0],batch[9,0,0,0,0],query[0,0,0,0,0]]],[BC3BF3DDF290C00,172.16.70.243,3010,0,1,2,0,0,0,0,0,0,0,[conn[1,0,0,0,0],write[7,1,0,0,0],read[27,0,0,0,0],batch[10,0,0,0,0],query[0,0,0,0,0]]]]]
...
```
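As a sketch of the kind of separate offline application mentioned under `metrics_listeners`, the following hypothetical snippet extracts the timestamp and the seven scalar cluster-level fields from each data line of a metrics file, using the column order given in the `header(1)` line above. It is deliberately simplistic and does not parse the nested `eventloop[]`, `node[]`, or `latency[]` arrays; the file path in the usage comment is made up.

```python
import re

# Matches the timestamp and the seven scalar cluster fields that precede the
# nested eventloop[] and node[] arrays in each data line.
LINE_RE = re.compile(
    r"^(?P<ts>\S+ \S+) cluster\["
    r"(?P<name>[^,]*),(?P<cpu>\d+),(?P<mem>\d+),(?P<invalidNodeCount>\d+),"
    r"(?P<commandCount>\d+),(?P<retryCount>\d+),(?P<delayQueueTimeoutCount>\d+),"
)

def parse_cluster_fields(path):
    """Yield (timestamp, dict of cluster-level fields) for each data line."""
    with open(path) as f:
        for line in f:
            m = LINE_RE.match(line)
            if not m:
                continue  # skip the header(1) line and anything unrecognized
            groups = m.groupdict()
            ts = groups.pop("ts")
            name = groups.pop("name")
            fields = {k: int(v) for k, v in groups.items()}
            fields["name"] = name
            yield ts, fields

# Example usage (hypothetical path):
# for ts, fields in parse_cluster_fields(
#         "/var/log/aerospike/metrics/metrics-20230803175645.log"):
#     print(ts, fields["commandCount"], fields["retryCount"])
```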