Metrics

The Aerospike Python client implements the standard client-side metrics system. For a comprehensive overview of the available metrics, see the Client Metrics documentation.

This page describes how to configure and access these metrics for the Aerospike Python client.

Some Aerospike client libraries expose a data structure with cluster statistics, but the Python client works differently: to use extended metrics, you must explicitly enable them so that the client tracks latency and command counts for every node.

Configuring metrics collection

The Python client collects a standard set of essential metrics by default. You can enable more detailed “extended” metrics, which include latency histograms and transaction-level details.

You can enable extended metrics programmatically, or using Dynamic Client Configuration.

Using Dynamic Client Configuration (preferred)

The Aerospike Python client 17.1.0 or later is required for Dynamic Client Configuration.

You can enable and configure metrics dynamically if your application is set up for Dynamic Client Configuration, which allows operators to manage metrics collection without code changes or application restarts. See the Dynamic Client Configuration Quickstart for details about how to get started.

You can update the metrics policy dynamically using Dynamic Client Configuration. However, changes to the metrics configuration, such as enabling or disabling extended metrics, changing labels, or adjusting reporting intervals, may affect your ability to compare new metrics data with previously collected metrics. For consistent analysis, track configuration changes and take them into account when interpreting historical trends.

Programmatic configuration

Use the following code to enable metrics programmatically:

from aerospike_helpers.metrics import MetricsPolicy

policy = MetricsPolicy(
    report_dir="/var/log/aerospike/metrics",
    interval=600,
)
# client is an aerospike.Client object
client.enable_metrics(policy=policy)

To disable:

client.disable_metrics()
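
For context, the following is a minimal end-to-end sketch that enables metrics around a normal workload. The seed host, namespace, set, key, and report directory are placeholders; adapt them to your environment.

import aerospike
from aerospike_helpers.metrics import MetricsPolicy

# Placeholder seed host; replace with one of your cluster nodes.
config = {"hosts": [("127.0.0.1", 3000)]}
client = aerospike.client(config).connect()

# Enable metrics, writing snapshots to a custom directory every 600 tend iterations.
policy = MetricsPolicy(
    report_dir="/var/log/aerospike/metrics",  # must be writable by the process
    interval=600,
)
client.enable_metrics(policy=policy)

# Run the normal workload; the client now tracks latency and command counts per node.
key = ("test", "demo", "metrics-example")
client.put(key, {"greeting": "hello"})
_, _, bins = client.get(key)

# Flush and stop metrics collection before shutting down.
client.disable_metrics()
client.close()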

The MetricsPolicy fields are:

  • metrics_listeners: Listeners that handle metrics notification events. If set to None, the default listener implementation is used, which writes each metrics snapshot to a file that can later be read and forwarded to OpenTelemetry by a separate offline application. Otherwise, the client calls all listeners set in the instance.

    You can override the listeners to send metrics snapshots directly to OpenTelemetry; see the sketch after this list.

    The following is a list of metrics_listeners fields:

    • enable_listener: Called when metrics have been enabled for the cluster.
    • snapshot_listener: Called when a metrics snapshot has been requested for the given cluster.
    • node_close_listener: Called when a node is dropped from the cluster.
    • disable_listener: Called when metrics have been disabled for the cluster.
  • report_dir: Directory path to write metrics log files for listeners that write logs.

  • report_size_limit: Metrics file size soft limit, in bytes, for listeners that write logs. When report_size_limit is reached or exceeded, the current metrics file is closed and a new metrics file is created with a new timestamp. If report_size_limit is set to 0, the metrics file size is unbounded and the file is only closed when aerospike.Client.disable_metrics() or aerospike.Client.close() is called.

    Defaults to 0.

  • interval: Number of cluster tend iterations between metrics notification events. One tend iteration is defined as tend_interval in the client configuration, plus the time to tend all nodes.

    Defaults to 30.

  • latency_columns: Number of elapsed time range buckets in latency histograms.

    Defaults to 7.

  • latency_shift: Power of 2 multiple between each range bucket in latency histograms starting at column 3. The bucket units are in milliseconds. The first 2 buckets are <=1ms and >1ms. Examples:

    # latency_columns=7 latency_shift=1
    # <=1ms >1ms >2ms >4ms >8ms >16ms >32ms
    # latency_columns=5 latency_shift=3
    # <=1ms >1ms >8ms >64ms >512ms
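
The default listener writes snapshots to files under report_dir. To push snapshots somewhere else, such as an OpenTelemetry exporter, you can supply your own listeners through metrics_listeners. The following is a minimal sketch; it assumes the MetricsListeners class in aerospike_helpers.metrics accepts the four callbacks by the names listed above, and that the snapshot and disable callbacks receive the cluster object while the node-close callback receives the node object. Verify the exact signatures against the client API reference.

from aerospike_helpers.metrics import MetricsListeners, MetricsPolicy

def on_enable():
    # Called when metrics have been enabled for the cluster.
    print("metrics enabled")

def on_snapshot(cluster):
    # Called every `interval` tend iterations; forward the snapshot to your
    # telemetry pipeline here instead of printing it.
    print("metrics snapshot:", cluster)

def on_node_close(node):
    # Called when a node is dropped from the cluster.
    print("node dropped:", node)

def on_disable(cluster):
    # Called when metrics have been disabled for the cluster.
    print("metrics disabled:", cluster)

listeners = MetricsListeners(
    enable_listener=on_enable,
    snapshot_listener=on_snapshot,
    node_close_listener=on_node_close,
    disable_listener=on_disable,
)

policy = MetricsPolicy(
    metrics_listeners=listeners,
    interval=30,        # snapshot every 30 tend iterations (the default)
    latency_columns=5,  # five buckets: <=1ms >1ms >8ms >64ms >512ms
    latency_shift=3,    # power-of-2 shift of 3 between buckets starting at column 3
)
# client is an aerospike.Client object
client.enable_metrics(policy=policy)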

The default extended metrics file includes:

  • cluster: Metrics about the cluster connected to by the client.

    • name: Cluster name.

    • cpu: Current CPU usage percentage of the client process.

    • mem: Current memory usage of the client process.

    • invalidNodeCount: Count of add node failures in the most recent cluster tend iteration.

    • commandCount: Count of commands since the client was started.

    • retryCount: Count of command retries since the client was started.

    • delayQueueTimeoutCount: Count, since client was started, of async commands that timed out in the delay queue before the command was processed.

    • eventloop: Metrics for each async event loop.

      • processSize: Approximate number of commands actively being processed on the event loop.
      • queueSize: Approximate number of commands stored on this event loop’s delay queue that have not been started yet.
  • node: Metrics for each node.

    • name: Node name.

    • address: Node IP address.

    • port: Node port.

    • syncConn: Sync connections.

      • inUse: Active connections from connection pools currently executing commands.
      • inPool: Initialized connections in connection pools that are not currently active.
      • opened: Total number of node connections opened since node was started.
      • closed: Total number of node connections closed since node was started.
    • asyncConn: Async connections. These should always be 0 for the Python client.

      • inUse: Active connections from connection pools currently executing commands.
      • inPool: Initialized connections in connection pools that are not currently active.
      • opened: Total number of node connections opened since node was started.
      • closed: Total number of node connections closed since node was started.
    • errors: Command error count since node was started. If the error is retryable, multiple errors per command may occur.

    • timeouts: Command timeout count since node was started. If the timeout is retryable (such as socket_timeout), multiple timeouts per command may occur.

    • latency: Latency buckets for the following types:

      • conn: Connection creation latency.
      • write: Single record write commands.
      • read: Single record read commands.
      • batch: Batch read/write commands.
      • query: Scan/Query commands.

Namespace metrics:

  • ns: Namespace.

  • bytes_in: Bytes received from the server.

  • bytes_out: Bytes sent to the server.

  • error_count: Command error count since node was initialized. If the error is retryable, multiple errors per command may occur.

  • timeout_count: Command timeout count since node was initialized. If the timeout is retryable (such as socket_timeout), multiple timeouts per command may occur.

  • key_busy_count: Command key busy error count since node was initialized.

Extended metrics file format: <report_dir>/metrics-yyyyMMddHHmmss.log

Extended metrics file example:

2023-08-03 17:56:45.444 header(1) cluster[name,cpu,mem,invalidNodeCount,commandCount,retryCount,delayQueueTimeoutCount,eventloop[],node[]] eventloop[processSize,queueSize] node[name,address,port,syncConn,asyncConn,errors,timeouts,latency[]] conn[inUse,inPool,opened,closed] latency(5,3)[type[l1,l2,l3...]]
2023-08-03 17:57:45.472 cluster[,0,29539536,0,86,0,0,[],[[BB9BF3DDF290C00,172.16.70.243,3000,0,1,2,0,0,0,0,0,0,0,[conn[0,0,0,0,0],write[6,1,0,0,0],read[14,0,0,0,0],batch[6,3,0,0,0],query[0,0,0,0,0]]],[BCDBF3DDF290C00,172.16.70.243,3020,0,1,2,0,0,0,0,0,2,0,[conn[1,0,0,0,0],write[13,1,0,0,0],read[3,0,0,0,0],batch[9,0,0,0,0],query[0,0,0,0,0]]],[BC3BF3DDF290C00,172.16.70.243,3010,0,1,2,0,0,0,0,0,0,0,[conn[1,0,0,0,0],write[7,1,0,0,0],read[27,0,0,0,0],batch[10,0,0,0,0],query[0,0,0,0,0]]]]]
2023-08-03 17:58:45.476 cluster[,0,29539536,0,86,0,0,[],[[BB9BF3DDF290C00,172.16.70.243,3000,0,1,2,0,0,0,0,0,0,0,[conn[0,0,0,0,0],write[6,1,0,0,0],read[14,0,0,0,0],batch[6,3,0,0,0],query[0,0,0,0,0]]],[BCDBF3DDF290C00,172.16.70.243,3020,0,1,2,0,0,0,0,0,2,0,[conn[1,0,0,0,0],write[13,1,0,0,0],read[3,0,0,0,0],batch[9,0,0,0,0],query[0,0,0,0,0]]],[BC3BF3DDF290C00,172.16.70.243,3010,0,1,2,0,0,0,0,0,0,0,[conn[1,0,0,0,0],write[7,1,0,0,0],read[27,0,0,0,0],batch[10,0,0,0,0],query[0,0,0,0,0]]]]]
2023-08-03 17:59:45.483 cluster[,0,29539536,0,86,0,0,[],[[BB9BF3DDF290C00,172.16.70.243,3000,0,1,2,0,0,0,0,0,0,0,[conn[0,0,0,0,0],write[6,1,0,0,0],read[14,0,0,0,0],batch[6,3,0,0,0],query[0,0,0,0,0]]],[BCDBF3DDF290C00,172.16.70.243,3020,0,1,2,0,0,0,0,0,2,0,[conn[1,0,0,0,0],write[13,1,0,0,0],read[3,0,0,0,0],batch[9,0,0,0,0],query[0,0,0,0,0]]],[BC3BF3DDF290C00,172.16.70.243,3010,0,1,2,0,0,0,0,0,0,0,[conn[1,0,0,0,0],write[7,1,0,0,0],read[27,0,0,0,0],batch[10,0,0,0,0],query[0,0,0,0,0]]]]]
...
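
Because the default listener only writes files, a separate offline application is responsible for reading them and forwarding the data, for example to OpenTelemetry. The following is a minimal sketch of such a reader using only the Python standard library; the directory is a placeholder, the forwarding step is a simple print, and the loop does not handle the file rollover that happens when report_size_limit is reached.

import glob
import os
import time

REPORT_DIR = "/var/log/aerospike/metrics"  # placeholder; match MetricsPolicy.report_dir

def newest_metrics_file(report_dir):
    # Files are named metrics-yyyyMMddHHmmss.log, so lexical order is chronological.
    files = sorted(glob.glob(os.path.join(report_dir, "metrics-*.log")))
    return files[-1] if files else None

def tail_and_forward(report_dir, poll_seconds=60):
    path = newest_metrics_file(report_dir)
    if path is None:
        raise FileNotFoundError(f"no metrics files found in {report_dir}")
    with open(path) as f:
        while True:
            line = f.readline()
            if not line:
                time.sleep(poll_seconds)
                continue
            # Replace this print with a push to OpenTelemetry or another collector.
            print(line.rstrip())

if __name__ == "__main__":
    tail_and_forward(REPORT_DIR)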