Metrics

Some Aerospike client libraries expose a data structure for cluster statistics, but the Python client works a bit differently. To use extended metrics, you must explicitly enable them so that the client tracks latency and command counts for every node.

To enable:

from aerospike_helpers.metrics import MetricsPolicy

policy = MetricsPolicy(
    report_dir="/var/log/aerospike/metrics",
    interval=600,
)
# client is an aerospike.Client object
client.enable_metrics(policy=policy)

To disable:

client.disable_metrics()
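
Putting the two calls together, the following is a minimal end-to-end sketch; the seed host, namespace, set, and report directory are placeholders for your own deployment:

import aerospike
from aerospike_helpers.metrics import MetricsPolicy

# Placeholder seed host; replace with a node in your cluster.
config = {"hosts": [("127.0.0.1", 3000)]}
client = aerospike.client(config).connect()

# Write metric snapshots to /tmp/metrics; the default interval is 30 tend iterations.
client.enable_metrics(policy=MetricsPolicy(report_dir="/tmp/metrics"))

# Run the workload; latency and command counts are now tracked per node.
key = ("test", "demo", "user1")  # placeholder namespace, set, and key
client.put(key, {"greeting": "hello"})
_, _, bins = client.get(key)

client.disable_metrics()  # closes the current metrics file
client.close()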

The MetricsPolicy fields are:

  • metrics_listeners: Listeners that handle metrics notification events. If set to None, the default listener implementation is used, which writes each metrics snapshot to a file that can later be read and forwarded to OpenTelemetry by a separate offline application. Otherwise, all listeners set in the class instance are used.

    The listeners can be overridden to send the metrics snapshot directly to OpenTelemetry; a sketch of a custom listener set follows the field descriptions below.

    The metrics_listeners fields are:

    • enable_listener: Called when metrics have been enabled for the cluster.
    • snapshot_listener: Called when a metrics snapshot has been requested for the given cluster.
    • node_close_listener: Called when a node is dropped from the cluster.
    • disable_listener: Called when metrics have been disabled for the cluster.
  • report_dir: Directory path to write metrics log files for listeners that write logs.

  • report_size_limit: Metrics file size soft limit, in bytes, for listeners that write logs. When report_size_limit is reached or exceeded, the current metrics file is closed and a new metrics file is created with a new timestamp. If report_size_limit is set to 0, the metrics file size is unbounded and the file is only closed when aerospike.Client.disable_metrics() or aerospike.Client.close() is called.

    Defaults to 0.

  • interval: Number of cluster tend iterations between metrics notification events. One tend iteration is defined as tend_interval in the client configuration, plus the time to tend all nodes.

    Defaults to 30.

  • latency_columns: Number of elapsed time range buckets in latency histograms.

    Defaults to 7.

  • latency_shift: Power of 2 multiple between each range bucket in latency histograms starting at column 3. The bucket units are in milliseconds. The first 2 buckets are <=1ms and >1ms. Examples:

    # latency_columns=7 latency_shift=1
    # <=1ms >1ms >2ms >4ms >8ms >16ms >32ms
    # latency_columns=5 latency_shift=3
    # <=1ms >1ms >8ms >64ms >512ms
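
As mentioned under metrics_listeners, the default file-writing behaviour can be replaced with your own callbacks, for example to push each snapshot straight to OpenTelemetry. The sketch below only logs each event; the exact objects handed to the snapshot, node-close, and disable callbacks (a Cluster or Node instance) are assumptions to verify against your client version's API reference:

from aerospike_helpers.metrics import MetricsListeners, MetricsPolicy

def on_enable():
    print("metrics enabled for the cluster")

def on_snapshot(cluster):
    # "cluster" is the client-side cluster snapshot; forward its node stats
    # and latency buckets to your telemetry backend here.
    print("metrics snapshot received")

def on_node_close(node):
    print("node dropped from the cluster")

def on_disable(cluster):
    print("metrics disabled for the cluster")

listeners = MetricsListeners(
    enable_listener=on_enable,
    snapshot_listener=on_snapshot,
    node_close_listener=on_node_close,
    disable_listener=on_disable,
)

# client is an aerospike.Client object
client.enable_metrics(policy=MetricsPolicy(metrics_listeners=listeners))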

The default extended metrics file includes:

  • cluster: Metrics about the cluster the client is connected to.

    • name: Cluster name.

    • cpu: Current CPU usage percentage of the client process.

    • mem: Current memory usage of the client process.

    • invalidNodeCount: Count of add node failures in the most recent cluster tend iteration.

    • tranCount: Count of commands since the client was started.

    • retryCount: Count of command retries since the client was started.

    • delayQueueTimeoutCount: Count, since the client was started, of async commands that timed out in the delay queue before the command was processed.

    • eventloop: Metrics for each async event loop.

      • processSize: Approximate number of commands actively being processed on the event loop.
      • queueSize: Approximate number of commands stored on this event loop’s delay queue that have not been started yet.
  • node: Metrics for each node.

    • name: Node name.

    • address: Node IP address.

    • port: Node port.

    • syncConn: Sync connections.

      • inUse: Active connections from connection pools currently executing commands.
      • inPool: Initialized connections in connection pools that are not currently active.
      • opened: Total number of node connections opened since the node was started.
      • closed: Total number of node connections closed since the node was started.
    • asyncConn: Async connections. These should always be 0 for the Python client.

      • inUse: Active connections from connection pools currently executing commands.
      • inPool: Initialized connections in connection pools that are not currently active.
      • opened: Total number of node connections opened since the node was started.
      • closed: Total number of node connections closed since the node was started.
    • errors: Command error count since the node was started. If the error is retryable, multiple errors per command may occur.

    • timeouts: Command timeout count since the node was started. If the timeout is retryable (such as socket_timeout), multiple timeouts per command may occur.

    • latency: Latency buckets for the following types:

      • conn: Connection creation latency.
      • write: Single record write commands.
      • read: Single record read commands.
      • batch: Batch read/write commands.
      • query: Scan/Query commands.

Extended metrics file format: <report_dir>/metrics-yyyyMMddHHmmss.log

Extended metrics file example:

2023-08-03 17:56:45.444 header(1) cluster[name,cpu,mem,invalidNodeCount,commandCount,retryCount,delayQueueTimeoutCount,eventloop[],node[]] eventloop[processSize,queueSize] node[name,address,port,syncConn,asyncConn,errors,timeouts,latency[]] conn[inUse,inPool,opened,closed] latency(5,3)[type[l1,l2,l3...]]
2023-08-03 17:57:45.472 cluster[,0,29539536,0,86,0,0,[],[[BB9BF3DDF290C00,172.16.70.243,3000,0,1,2,0,0,0,0,0,0,0,[conn[0,0,0,0,0],write[6,1,0,0,0],read[14,0,0,0,0],batch[6,3,0,0,0],query[0,0,0,0,0]]],[BCDBF3DDF290C00,172.16.70.243,3020,0,1,2,0,0,0,0,0,2,0,[conn[1,0,0,0,0],write[13,1,0,0,0],read[3,0,0,0,0],batch[9,0,0,0,0],query[0,0,0,0,0]]],[BC3BF3DDF290C00,172.16.70.243,3010,0,1,2,0,0,0,0,0,0,0,[conn[1,0,0,0,0],write[7,1,0,0,0],read[27,0,0,0,0],batch[10,0,0,0,0],query[0,0,0,0,0]]]]]
2023-08-03 17:58:45.476 cluster[,0,29539536,0,86,0,0,[],[[BB9BF3DDF290C00,172.16.70.243,3000,0,1,2,0,0,0,0,0,0,0,[conn[0,0,0,0,0],write[6,1,0,0,0],read[14,0,0,0,0],batch[6,3,0,0,0],query[0,0,0,0,0]]],[BCDBF3DDF290C00,172.16.70.243,3020,0,1,2,0,0,0,0,0,2,0,[conn[1,0,0,0,0],write[13,1,0,0,0],read[3,0,0,0,0],batch[9,0,0,0,0],query[0,0,0,0,0]]],[BC3BF3DDF290C00,172.16.70.243,3010,0,1,2,0,0,0,0,0,0,0,[conn[1,0,0,0,0],write[7,1,0,0,0],read[27,0,0,0,0],batch[10,0,0,0,0],query[0,0,0,0,0]]]]]
2023-08-03 17:59:45.483 cluster[,0,29539536,0,86,0,0,[],[[BB9BF3DDF290C00,172.16.70.243,3000,0,1,2,0,0,0,0,0,0,0,[conn[0,0,0,0,0],write[6,1,0,0,0],read[14,0,0,0,0],batch[6,3,0,0,0],query[0,0,0,0,0]]],[BCDBF3DDF290C00,172.16.70.243,3020,0,1,2,0,0,0,0,0,2,0,[conn[1,0,0,0,0],write[13,1,0,0,0],read[3,0,0,0,0],batch[9,0,0,0,0],query[0,0,0,0,0]]],[BC3BF3DDF290C00,172.16.70.243,3010,0,1,2,0,0,0,0,0,0,0,[conn[1,0,0,0,0],write[7,1,0,0,0],read[27,0,0,0,0],batch[10,0,0,0,0],query[0,0,0,0,0]]]]]
...
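
A separate offline forwarder only needs to pick up the newest file in report_dir and parse lines like the ones above. The following is a minimal sketch using only the standard library; the directory is the one configured in MetricsPolicy, and the actual parsing and forwarding step is left as a stub:

import glob
import os

report_dir = "/var/log/aerospike/metrics"  # same directory passed to MetricsPolicy

# File names embed a yyyyMMddHHmmss timestamp, so a lexical sort is chronological.
files = sorted(glob.glob(os.path.join(report_dir, "metrics-*.log")))
if files:
    with open(files[-1]) as f:
        for line in f:
            line = line.strip()
            if "header(1)" in line:
                column_layout = line  # describes the cluster[...] fields that follow
            elif line:
                # Each remaining line is one snapshot; parse the cluster[...] payload
                # and forward it to OpenTelemetry or another backend here.
                pass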