Skip to main content
Loading

Histograms from Aerospike logs

Overviewโ€‹

Histograms plot the distribution of latencies for various transaction types that are processed by an Aerospike cluster. You can use them for in-depth analysis of the activity on your system.

Writes to the log fileโ€‹

Periodically, the Aerospike server writes histograms to the log file. The frequency of writes to the log file is controlled by the ticker-interval configuration parameter.

Histograms begin with the text "histogram dump:" followed by:

  • the histogram name
  • the number of measurements represented by the histogram
  • a maximum of four buckets each
    • The form for each bucket is (BUCKET_NUMBER: NUM_OBSERVED). The threshold for a given bucket is 2BUCKET_NUMBER units.

Example histogram dumpโ€‹

The following example shows a read histogram for the namespace test where 36,453,689,581 measurements are reported, 36,414,385,962 of which were less than 1ms (20ms):

Jun 09 2020 21:40:06 GMT: INFO (info): (hist.c:240) histogram dump: {test}-read (36453689581 total) msec
Jun 09 2020 21:40:06 GMT: INFO (info): (hist.c:257) (00: 36414385962) (01: 0031233450) (02: 0007841324) (03: 0000215526)
Jun 09 2020 21:40:06 GMT: INFO (info): (hist.c:257) (04: 0000002198) (05: 0000000693) (06: 0000000590) (07: 0000000716)
Jun 09 2020 21:40:06 GMT: INFO (info): (hist.c:257) (08: 0000001405) (09: 0000002365) (10: 0000004646) (11: 0000000706)

Some histograms provide metrics which are not latency related, such as accessed object sizes on read/write transactions, and number of records returned by secondary index queries.

High level characteristics of histogram outputโ€‹

  • Histograms only appear in the logs if they contain data. For example, a batch-specific histogram on a server which does not do any batch read operations would not be present.

  • Read and write histograms are tracked on per-namespace level basis. If latencies are only observed on a specific namespace, it is easier to track down based on the specific data in the namespace and the application writing to it.

  • To analyze transactions in detail, turn on the relevant configuration-enabled benchmark histograms specific to the transaction type.

  • Turning on a couple of the non-default histograms usually does not impact the performance of a cluster. However, in some cases, turning on more than a few of those configurable histograms could have a noticeable impact on the performance of a cluster.

  • Analyzing a time period that bridges a server restart distorts results of time slices and average/maximum calculations.

  • Histogram data can also be retrieved through the asadm show latencies command.

  • By default, latency histograms are measured in milliseconds, and can be configured to microseconds. See microsecond-histograms for details.

info

For information on how to analyze histogram data, see the Log Latency Tool guide, or the histogram command of the Aerospike Admin (asadm) tool when used under the log analyzer mode.

Notational conventions

  • Required braces, like {ns}, where ns is the name of a namespace, indicate an input variable you supply on the asinfo command line.
  • Brackets, like fabric-[channel]-send-init, for the different fabric channels, indicate an output name of the particular histogram type.

Benchmarksโ€‹

Benchmarks are active as either auto-enabled or configuration-enabled.

Auto-enabled benchmarksโ€‹

Auto-enabled benchmarks are a set of histograms that measure end-to-end performance of Aerospike subsystems and appear only when the histogram has data to display. They give a high-level view of what is happening on a node at a given point in time. These benchmark stats are written to the Aerospike log file at 10 second intervals and can be used to spot things such as latency in a certain transaction type.

{ns}-read

Time taken for read requests from the time they are received at the node to when the response leaves the node.

{ns}-write

Time taken for end-to-end writes, which includes the time taken for replica writes. Does not include deletes.

{ns}-udf

Time taken for UDF transactions.

batch-index

Batch read "parent" transaction. Tracks the time taken from the start of the transaction to getting a response. If batch calls are showing high latencies on the client side, consider changing the maxConcurrentThreads. For example, in the Java client library, set Batch Policy from the default 1 (sequential) to 0 (or a number greater than 0). Configuring the maxConcurrentThreads allows concurrent synchronous batch request threads to be sent to server nodes at any time.

batch-rec-count

The number of records per batch request sent to the cluster node. This histogram shows the distribution of counts, and not latencies. This histogram is introduced in Database 7.1.

{ns}-batch-sub-read

Time taken for batch (sub) read requests from the time they are received at the node to when the response leaves the node. Introduced in Database 6.4.

{ns}-batch-sub-write

Time taken for batch (sub) writes from end-to-end (includes the time taken for replica write). Does not include deletes. Introduced in Database 6.4.

{ns}-batch-sub-udf

Time taken for batch (sub) UDF transactions. Introduced in Database 6.4.

{ns}-query

Time taken for query transactions. Removed as of Database 6.0.

{ns}-query-rec-count

Records returned per query, showing the distribution of counts, not latencies. The first bucket in the histogram is always empty since queries not returning any records are not counted. Removed as of Database 6.0.

{ns}-pi-query

Time taken for primary index short query transactions. Introduced in Database 6.0.

{ns}-pi-query-rec-count

Displays the records returned per primary index short query. This histogram shows the distribution of counts, and not latencies. The first bucket in histogram is always empty as queries not returning any records do not get counted in this histogram. Introduced in Database 6.0.

{ns}-si-query

Time taken for secondary index short query transactions. This histogram is introduced in Database 6.0.

{ns}-si-query-rec-count

Displays the records returned per secondary index short query. This histogram shows the distribution of counts, and not latencies. The first bucket in histogram is always empty as queries not returning any records do not get counted in this histogram. Introduced in Database 6.0.

{ns}-re-repl

Latencies when re-replicating. Only available in strong consistency enabled namespaces.

{ns}-read-touch

Time taken for read operations that also touch the record according to the default-read-touch-ttl-pct configuration parameter (or equivalent client policy). Introduced in Database 7.1.

Configuration-enabled benchmarksโ€‹

Configuration-enabled benchmarks include specific histograms that you can enable or disable dynamically. These benchmarks are more granular than the auto-enabled benchmarks. If you contact Aerospike Support for assistance with a problem, they may request that you enable a few sets that are relevant to your issue. If you keep these benchmarks permanently enabled, they may impact performance.

note

Tools package 6.0.x or later is required to use asadm's manage config commands. Otherwise, use the equivalent asinfo - set-config command.

General transaction histogramโ€‹

To enable a specific benchmark, use the following command:

asadm -e 'enable; manage config service param <configuration> to true'
Type of HistogramDescriptionConfiguration
infoTime taken for an info protocol transaction.enable-hist-info

Specific transaction benchmarksโ€‹

The transaction benchmarks are monitored at the per-namespace level.

Proxy transaction analysisโ€‹

Displays time for a proxy transaction from the sender view.

asadm -e 'enable; manage config namespace <namespaceName> param enable-hist-proxy to true'

proxy

Time taken for proxy transactions.

Fabric transaction analysisโ€‹

Displays the time for a fabric transaction from sender's view.

asadm -e 'enable; manage config service param enable-benchmarks-fabric to true'

fabric-[channel]-send-init

Tracks the time taken for preparing a message for sending over fabric (intra cluster communication). Does not include any network operations. Replace [channel] with one of bulk (migrations), ctrl (paxos and migration control), meta (smd) or rw (read / write transactions).

fabric-[channel]-send-fragment

Tracks the time taken from end of send-init and stops after the network send. Can be hit multiple times if message was too big to fit in the send buffer. Replace [channel] with one of bulk (migrations), ctrl (paxos and migration control), meta (smd) or rw (read / write transactions).

fabric-[channel]-recv-fragment

Tracks the time taken from the beginning of the first byte of a message from network and ends after network recv. Can be hit multiple times if message was big enough to require multiple network recv calls. Replace [channel] with one of bulk (migrations), ctrl (paxos and migration control), meta (smd) or rw (read / write transactions).

fabric-[channel]-recv-cb

Tracks the time taken to process the transaction on the receiving node (rw, bulk, control, meta). Replace [channel] with one of bulk (migrations), ctrl (paxos and migration control), meta (smd) or rw (read / write transactions).

Ops subtransaction analysisโ€‹

Displays slices of internally generated ops subtransactions from a background scan or query.

asadm -e 'enable; manage config namespace <namespaceName> param enable-benchmarks-ops-sub to true'

ops-sub-start

Time taken from the beginning of a transaction to when a node partition is reserved.

ops-sub-restart

Time taken for a partition to be reserved to a partition being re-reserved (possible internal restart/reattempt, e.g. "key busy"). See the write-restart histogram for further details about transactions being reattempted.

ops-sub-dup-res

Time taken from a partition reserved to duplicates resolved (only during ongoing data rebalancing). Timeouts are not included.

ops-sub-master

Time taken from partition reserved or after duplicate resolution to an actual master record applied.

ops-sub-repl-write

Time taken from master record written to replica(s) written.

ops-sub-response

Time taken from master write (if write-commit-level configured to master) or replica write to client response completion.

Read transaction analysisโ€‹

Displays slices of a single-record read transaction.

asadm -e 'enable; manage config namespace <namespaceName> param enable-benchmarks-read to true'

read-start

Time taken from the beginning of a transaction to when a node partition is reserved. High latencies here could indicate bottleneck at the network or CPU (and often irq balancing) as well as in the transaction queue. There are no transaction queues. Thus only the demarshal time slice is relevant.

read-restart

Time taken for a partition to be reserved to a partition being re-reserved (possible internal restart/reattempt, e.g. "key busy" on duplicate resolve). For this slice, in the case of a potential hotkey, the clock runs from when the transaction gets into the rw-hash, until it is reattempted. The process is similar but there is no initial transaction queue or thread phase. After the transaction being processed completes, all transactions waiting on the rw-hash are put into an internal transaction queue in order to be reattempted.

The same transaction can be re-queued multiple times. Each is counted as a separate transaction in this slice. However, the clock resets every time a transaction gets into the rw-hash. A transaction being re-queued multiple times will not have its total accumulated time accounted for, but only the time it waited in the rw-hash (queued behind a transaction in progress) the last time prior to going through.

read-dup-res

Time taken from a partition reserved to duplicates resolved (only during ongoing data rebalancing). Timeouts are not included.

read-local

Time taken from partition reserved or after duplicate resolution to an actual local record read. For data-in-memory true, this is the slice for reading from memory. For data-in-memory false, this is the slice for reading from the disk except when the record is still in the post-write-cache or in cached (read-page-cache enabled). To troubleshoot further, enable storage benchmarks.

read-response

Time taken from local record read to client response completion. High latencies here could indicate a bottleneck on the network or the client, getting slow to consume responses from the server.

read-repl-ping

Time taken for reading all the copies of a record when the linearize read policy is in effect. This only applies for strong consistency enabled namespaces.

Write transaction analysisโ€‹

Displays slices of a single-record write transaction, excluding deletes.

asadm -e 'enable; manage config namespace <namespaceName> param enable-benchmarks-write to true'

write-start

Time taken from the beginning of a transaction to when a node partition is reserved. High latency here could indicate a bottleneck at the network or CPU, and often IRQ balancing.

A network bottleneck could be explained by max-record-size exceeding the MTU allowed by the network. If leveraging batch writes, an MTU that is smaller than the record sizes or batch size can cause subsequent fragments of a transaction to be delayed when picked up by a service thread.

When compression is enabled, higher latencies could also potentially be observed in this slice. There are no transaction queues, so only the demarshal time slice is relevant.

write-restart

Time taken for a partition to be reserved to a partition being re-reserved (possible internal restart/reattempt, e.g. "key busy"). For this slice, in the case of a potential hotkey, the clock runs from when the transaction gets into the rw-hash, until it is reattempted.

There is no initial transaction queue or thread phase; after the transaction being processed completes, all transactions waiting on the rw-hash are put into an internal transaction queue to be reattempted.

The same transaction can be re-queued multiple times. Each is counted as a separate transaction in this slice. However, the clock resets every time a transaction gets into the rw-hash. A transaction being re-queued multiple times will not have its total accumulated time accounted for, but rather only the time it waited in the rw-hash (queued behind a transaction in progress) the last time prior to going through. This also includes restarts from duplicate resolution if any.

write-dup-res

Time taken from a partition reserved to duplicates resolved (only during ongoing data rebalancing). Timeouts are not included.

write-master

Time between when the partition is reserved or after a duplicate resolution finishes, to when an actual master record write to the streaming write buffer (SWB) in-memory takes place. Latencies displayed here typically cover the time it took to update secondary indexes when necessary, and/or the read portion of the record write (if a write is an update, the record is read before a write is performed). Following are details of what is counted in this slice:

  • read before write
  • because of update
  • because of stored key (sendKey policy set to true)
  • compression at rest
  • sindex
  • larger number of bins with bin convergence
  • commit-to-device true (in SC)
  • binop processing
  • filter processing (if the filter requires reading a bin to evaluate)

write-repl-write

Time taken from master record written to replica(s) written.

write-response

Time taken from master write (if write-commit-level configured to master) or replica write to client response completion. High latencies here could indicate a bottleneck on the network or the client, getting slow to consume responses from the server.

UDF transaction analysisโ€‹

Displays slices of a single-record UDF transaction.

asadm -e 'enable; manage config namespace <namespaceName> param enable-benchmarks-udf to true'

udf-start

Time taken from the beginning of a transaction to when a node partition is reserved. This includes the demarshal part (tracked for all transactions under svc-demarshal for Database versions prior to 4.8) as well as the time spent in the transaction queue (tracked for all transactions under svc-queue for Database versions prior to 4.8). As of Database 4.7, there are no transaction queues and thus only the demarshal time slice is relevant.

udf-restart

Time taken for a partition to be reserved to a partition being re-reserved (possible internal restart/reattempt, e.g. "key busy"). See the write-restart histogram for further details about transactions being reattempted.

udf-dup-res

Time taken from a partition reserved to duplicates resolved (only during ongoing data rebalancing). Timeouts are not included.

udf-master

Time taken from partition reserved or after duplicate resolution to an actual master record applied.

udf-repl-write

Time taken from master record written to replica(s) written.

udf-response

Time taken from master write (if write-commit-level configured to master) or replica write to client response completion.

Sub-UDF transaction analysisโ€‹

Displays the slices of internally generated UDF transactions, usually by a background UDF, scan or a query.

asadm -e 'enable; manage config namespace <namespaceName> param enable-benchmarks-udf-sub to true'

udf-sub-start

Time taken from the beginning of a transaction to when a node partition is reserved. There are no transaction queues and thus only the demarshal time slice is relevant.

udf-sub-restart

Time taken for a partition to be reserved to a partition being re-reserved (possible internal restart/reattempt, e.g. "key busy"). See the write-restart histogram for further details about transactions being reattempted.

udf-sub-dup-res

Time taken from a partition reserved to duplicates resolved (only during ongoing data rebalancing). Timeouts are not included.

udf-sub-master

Time taken from partition reserved or after duplicate resolution to an actual master record applied.

udf-sub-response

Time taken from master write (if write-commit-level configured to master) or replica write to client response completion.

Batch transaction analysisโ€‹

Displays the slices for a batch transaction.

asadm -e 'enable; manage config namespace <namespaceName> param enable-benchmarks-batch-sub to true'

batch-sub-udf-master

Time taken from partition reserved or after duplicate resolution to an actual master record applied using UDF.

batch-sub-repl-write

Time taken from master record written to replica(s) written.

batch-sub-write-master

Time taken from partition reserved or after duplicate resolution to an actual master record applied.

batch-sub-prestart

The demarshal time for a batch sub-transaction. For single record transactions the '-start' slice covers the time from the demarshal to the partition reservation. For batch transactions multiple individual transactions are involved so the portion to deserialize off the network socket (demarshal) and internally queue (batch-sub-prestart) is separated from the portion where individual transactions (batch-sub-transactions) wait to be picked up by a service thread until the partition reservation (batch-sub-start). Higher latencies in this slice are typically caused by larger transactions where some TCP segments take longer to get to the server. For example, inadequate MTU or client/network delays can cause the deserialization to wait for TCP segments to arrive.

batch-sub-start

Time taken from a batch sub-transaction being queued until a node partition is reserved. Higher latencies in this slice indicate batch sub-transactions waiting longer before being picked up by a service thread. Depending on the situation, if the system can handle the workload, increasing the number of service-threads or tuning the inline policies on the client side can help.

  • Prior to Database 4.8, the time taken from the beginning of a transaction to when a node partition is reserved. This includes the demarshal part (tracked for all transactions under svc-demarshal) as well as the time spent in the transaction queue (tracked for all transactions under svc-queue). There are no transaction queues and only the demarshal time slice is relevant.

  • Prior to Database 4.4, this histogram only included the final "chunk" when reading a large request out of the socket, and could under-report the latency.

  • Database 4.8.0.1 introduced a problem where this histogram also contains data points that should instead be in thebatch-sub-restart histogram. This is fixed as of Database 4.8.0.27, and also fixed for other server lineages in versions 4.9.0.25,5.0.0.28, 5.1.0.26, 5.2.0.18, 5.3.0.9, and 5.4.0.4.

batch-sub-restart

Time taken from a partition being reserved to a partition being re-reserved (possible internal restart/reattempt). See the write-restart histogram for details about transactions being reattempted.

batch-sub-dup-res

Time taken from a partition reserved to duplicates resolved (only during ongoing data rebalancing). Timeouts are not included.

batch-sub-read-local

Time taken from partition reserved or after duplicate resolution to an actual local record read.

batch-sub-response

Time taken from local record read to batch parent response completion.

batch-sub-repl-ping

Time taken for reading all the copies of a record when the linearize read policy is in effect. This only applies for strong consistency enabled namespaces.

Storage transaction analysisโ€‹

Storage benchmarks are a set of device-targeted histograms that measure performance of the storage subsystem in time slices. They provide insight into performance issues related to storage devices and PMem storage files. Due to their performance impact, they are disabled by default. You can enable them by passing the set-config command to the asinfo utility.

To enable storage-benchmarks:

asadm -e 'enable; manage config namespace NAMESPACE storage-engine param enable-benchmarks-storage to true'

{ns}-[device]-read

Tracks the time taken to read from disk. Not available for PMem storage.

{ns}-[device]-write

Tracks amount of time taken to write a write block to disk. Not available for PMem storage. Since Database 7.1, the write-block-size configuration parameter has been replaced by flush-size, which specifies the size of these writes. Prior to Database 7.1, the size of a write block is the configured write-block-size. When running with commit-to-device set to true with versions prior to Database 5.3, this counter will only account for full blocks written and therefore will only count blocks written through the defragmentation process as client writes would write to disk individually rather than at a block level. But when running with commit-to-device set to true with Database 5.3 and later, client writes will also be included in the histogram.

{ns}-[device]-shadow-write

Tracks amount of time taken to write a write block to a shadow device, when using shadow devices. Not available for PMem storage. In Database 7.1, the write-block-size configuration parameter was replaced by flush-size, which specifies the size of these writes. Prior to Database 7.1, the size of a write block is the configured write-block-size. When running with commit-to-device set to true with versions prior to 5.3, this counter will only account for full blocks written and therefore will only count blocks written through the defragmentation process as client writes would write to disk individually rather than at a block level. But when running with commit-to-device set to true with Database 5.3 and later, client writes will also be included in the histogram.

{ns}-[device]-large-block-read

Tracks the amount of time to read a write block from disk. Not available for PMem storage. In Database 7.1, the write-block-size configuration parameter was replaced by flush-size, which specifies the size of these reads. Prior to Database 7.1, the size of a write-block is the configured write-block-size. As of Database 5.3, tomb-raider device reads are included in this histogram.

{ns}-[device]-fsync

Tracks the time taken for fsync. Available in Database 4.3.0.10 and below. Deprecated in Database 4.3.1.3.

{ns}-device-read-size

Tracks the size of the objects being read.

{ns}-device-write-size

Tracks the size of the objects being written. This includes replica writes from other nodes (on namespaces with replication factor 2 or more). Does not include defragmentation writes.

Secondary index transaction analysisโ€‹

This histogram is removed in Database 6.0.