Release notes for previous Monitoring Stack releases
Monitoring stack 3.0.0
November 9, 2023 | Download
- Aerospike Monitoring Stack version 3.0.0 is a major upgrade with improved dashboards designed to be forward and backward compatible with Aerospike versions.
- Establishes a consistent design pattern with status displayed at the top and detailed time ranges displayed below.
- All dashboards and alerts are forward compatible to 7.x versions and backwards compatible to 5.x versions.
- Removed unused and deprecated dashboards (alerts, exporter, and jobs).
Breaking Changes
- Alert severity is now a string: critical, error, warn, or info. The earlier number-based severity is deprecated.
- New alerts related to 7.0 metrics, connectors, and bug fixes are added with string-type severity only.
- Removed the 3 deprecated dashboards: alerts, exporter, and jobs.
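As a rough illustration of the string severities listed above, the following Python sketch counts alerts per severity and reports them from most to least severe. The `count_by_severity` helper, the tuple shape, and the alert names are hypothetical and are not part of the Monitoring Stack.

```python
from collections import Counter

# Severity strings named in the release notes, most severe first.
SEVERITY_ORDER = ["critical", "error", "warn", "info"]


def count_by_severity(alerts):
    """Return (severity, count) pairs ordered from most to least severe.

    `alerts` is a list of (alert_name, severity) tuples; this shape is an
    assumption made for illustration only.
    """
    counts = Counter(severity for _, severity in alerts)
    return [(sev, counts.get(sev, 0)) for sev in SEVERITY_ORDER]


# Hypothetical alert names, used only for demonstration.
sample_alerts = [
    ("ExampleAlertA", "warn"),
    ("ExampleAlertB", "critical"),
    ("ExampleAlertC", "warn"),
]
print(count_by_severity(sample_alerts))
```

Severities outside the four known strings (for example, an old numeric value) are simply ignored by this sketch.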
New Features
- Add DynaTrace to the OTEL Examples. [OM-116]
- Added support documentation and example otel-collector configurations on integrating Aerospike metrics with DynaTrace.
- Node View - Handle 7.0 metric changes. [OM-127]
- Revamped the dashboard to follow the 7.0 metrics theme: it displays the build version and alerts by severity, and splits data, index, and memory metrics into their respective panels.
- Data, index and memory metrics are shown as minimum, average and maximum to identify anomalies easily across namespaces.
- Namespace View - Handle 7.0 metric changes. [OM-128]
- Revamped the dashboard to follow the 7.0 metrics theme: it displays the build version and alerts by severity, and splits data, index, and memory metrics into their respective panels.
- Data, index and memory metrics are shown as minimum, average and maximum to identify anomalies easily across all nodes.
- Unique Data View - Handle 7.0 metric changes. [OM-129]
- Revamped the dashboard to follow the 7.0 metrics theme: it displays the build version and alerts by severity, and splits data, index, and memory metrics into their respective panels.
- Displays usage across clusters and historical usage by each cluster.
- Data is displayed in three layers: 1. all-clusters, 2. single-cluster, 3. by namespace in each cluster.
- Update Alert Rule to Handle 7.0 metric changes. [OM-130]
- Enhanced alerts to use 7.x metrics and marked previous alerts with the “pre7x” prefix.
- List of alerts added / modified.
- Modified - NamespaceDataCloseToStopWrites, LowDataAvailWarning, LowDataAvailCritical.
- Added - HighDataUseNamespaceWarning, HighDataUseNamespaceCritical.
- Renamed - pre7x_NamespaceSetQuotaWarning, pre7x_NamespaceSetQuotaAlertCritical.
- Set Index - Handle 7.0 metric changes. [OM-133]
- Revamped the dashboard to follow the 7.0 metrics theme: it displays the build version and alerts by severity, and splits data, index, and memory metrics into their respective panels.
- Data metrics are shown as minimum, average and maximum to identify anomalies easily across all nodes.
- All Flash - Handle 7.0 metrics. [OM-134]
- Revamped the dashboard to follow the 7.0 metrics theme: it displays the build version and alerts by severity, and splits data, index, and memory metrics into their respective panels.
- Rolling Restart Dashboard - Handle 7.0 metric changes. [OM-135]
- Revamped the dashboard to follow the 7.0 metrics theme: it displays the build version and alerts by severity, and splits data, index, and memory metrics into their respective panels.
- Data and memory panels now show both top-k and bottom-k values, which highlight over-utilization and under-utilization respectively.
- Cluster view - Handle 7.0 metric changes. [OM-136]
- Revamped the dashboard to follow the 7.0 metrics theme: it displays the build version and alerts by severity, and splits data, index, and memory metrics into their respective panels.
- Handle 7.0 - Multi cluster view. [OM-139]
- Revamped the dashboard to follow the 7.0 metrics theme: it displays the build version and alerts by severity, and splits data, index, and memory metrics into their respective panels.
- Topology diagram now shows dashboards and connectors using different diagrams.
- Remove deprecated job and alerts (old) dashboards. [OM-144]
- Removed the 2 deprecated jobs, exporter and alerts dashboard. The Alerts dashboard was replaced by the new Alerts View dashboard in a previous release.
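The Rolling Restart item above mentions showing both top-k and bottom-k values. A minimal Python sketch of that idea follows; the `top_and_bottom_k` helper, node names, and utilization figures are made up for illustration and do not reflect how the dashboard queries are actually written.

```python
import heapq


def top_and_bottom_k(usage_by_node, k):
    """Return the k highest and k lowest (node, value) pairs.

    `usage_by_node` maps a node name to a utilization percentage; both the
    node names and the values used below are hypothetical.
    """
    items = list(usage_by_node.items())
    top = heapq.nlargest(k, items, key=lambda kv: kv[1])
    bottom = heapq.nsmallest(k, items, key=lambda kv: kv[1])
    return top, bottom


usage = {"node-a": 91.0, "node-b": 42.5, "node-c": 12.0, "node-d": 67.3}
top, bottom = top_and_bottom_k(usage, 2)
print("most utilized:", top)
print("least utilized:", bottom)
```

Looking at both ends of the ranking surfaces nodes that are close to their limits as well as nodes that are carrying unusually little load.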
Bug Fixes
- Standardized alert severity colors and fixed a bug so that the info alert count now displays correctly. [OM-140]
- AMS - Change ordering of memory free pct graph on Rolling Restart dashboard. [OM-114]
- Avoid average function in namespace dashboard. [OM-74]
- Monitoring dashboard “Namespace” does not show namespace level values. [OM-105]
- Improve Dashboard Queries and Linting. [OM-109]
Monitoring stack 2.8.0
September 20, 2023 | Download
- The v2.8.0 Grafana dashboards are not backwards compatible with servers older than 6.0.0.0.
- This release includes 1 major feature - Connector Dashboard, Alerts and topology.
- Aerospike Monitoring Stack version 2.8.0 adds 2 dashboards, alerts, and bug fixes:
- 2 dashboards to monitor connectors and connector JVM metrics.
- Enhanced alerts to cover various aspects of Connector key metric thresholds and JVM health.
New Features
- Create predefined Prometheus alert rules for Connectors. [OM-64]
- This release includes 6 alerts covering mandatory functional and process/health aspects of the Connectors.
- Key alerts covered are connector-status, connector-request-lag, connector-request-errors, jvm heap, jvm cpu and jvm gc.
- Connectors alerts & Dashboards [OM-56]
- Connector view dashboard which helps to monitor 6 connectors.
- Connectors supported are - xdr-proxy, kafka-outbound, pulsar-outbound, esp-outbound, elastic-search and jms-outbound.
- Key metrics covered are - request lag, request error, success, skipped, connections, xdr record byte size, etc…
- Create a dashboard for a Connector(s) [OM-107]
- Connector JVM view dashboard which helps to monitor JVM health of 6 Connectors.
- Connectors supported are - xdr-proxy, kafka-outbound, pulsar-outbound, esp-outbound, elastic-search and jms-outbound.
- Key metrics covered are - uptime, cpu, memory, threads, files, classes and buffers.
- Multi-cluster view dashboard is enhanced to display Aerospike Server topology using the cluster-name and xdr dc configurations.
Bug Fixes
- Avoid duplicate defrag metric values on the namespace dashboard. [OM-122]
- Namespace view dashboard - average objects per sprig stat. [OM-113]
- Add high-water mark breached to the Rolling Restart dashboard. [OM-120]
Monitoring stack 2.7.0
August 28, 2023 | Download
- The v2.7.0 Grafana dashboards are not backwards compatible with servers older than 6.0.0.0.
- This release includes 2 major features - Enhanced Alerts and All Flash use-case dashboard.
- Aerospike Monitoring Stack version 2.7.0 adds new dashboard and bug fixes:
- All Flash dashboard, covering key metrics to monitor when working with flash storage for both the primary index and secondary indexes (sindex).
- Enhanced alerts to cover various aspects of server metrics. This release covers alerts on Namespaces, XDR, Latencies, Best checks, Node-exporter, etc.
- Aerospike Prometheus exporter 1.13.0 or greater must be used to get the Aerospike 6.4 metrics.
New Features
- Add new XDR bytes-shipped metrics to dashboards. [OM-104]
- Display bytes-shipped both as stat and time-series which can help monitoring the replication progress.
- Observability & Management Alerts - Enhance / enrich prometheus alerts from ACMS. [OM-98]
- This release includes 40 alerts covering various metrics of Aerospike Server, some key areas are:
- Namespaces, Latencies, data replication (xdr), set, node-exporter, flash, best checks, etc.
- Use-case Dashboard: all-flash. [OM-93]
- A new use-case dashboard is introduced in this release. It focuses mainly on key metrics and alerts related to flash usage.
- Some key metrics are average objects per sprig, index-pressure, primary index flash, and secondary index flash.
- Use-case Dashboard Organization & Naming. [OM-48]
- Added brief descriptions on each dashboard and updated tags to identify each dashboard easily.
- Observability dashboard unit tests. [OM-111]
- Created a framework to test our dashboards automatically, including panels, expressions / queries, layout, and expression results.
- Add user stat related alerts. [OM-103]
- Added user stat specific alerts covering connections, connection churn etc…
- Add warning for best practice failures. [OM-101]
- Alerts if best practices are not followed while setting up the Aerospike server. The server sends this flag after a series of checks.
- Add warning for node-exporter not being present. [OM-102]
- As a precursor to integrate node-exporter metrics into Aerospike Monitoring stack, this alert is introduced if node-exporter is not configured, raising a warning alert in the Alerts View dashboard.
Monitoring stack 2.6.1
August 3, 2023 | Download
- The v2.6.1 Grafana dashboards are not backwards compatible with servers older than 6.0.0.0.
- Aerospike Monitoring Stack version 2.6.1 adds bug fixes.
- Aerospike Prometheus exporter 1.12.0 or greater must be used to get the Aerospike 6.3 metrics.
- Deprecated
- Existing Alerts dashboard is deprecated and will be removed in future releases.
- Existing Jobs dashboard is deprecated and will be removed in future releases.
Bug Fixes
- Issues in Multi-cluster view dashboard [OM-100]
- Corrected label and unit in XDR panel.
- Corrected links from XDR and Latencies to respective dashboards (instead of cluster-view).
- Added an alert-severity based filter.
- Issues in Alerts view
- Panel colors are corrected according to the severity types.
- Issues in Unique Data view
- Unique data bytes are not shown correctly when custom labels are enabled in configuration.
- Added historical time-series for unique data-bytes data point.
Monitoring stack 2.6.0
July 12, 2023 | Download
- The v2.6.0 Grafana dashboards are not backwards compatible with servers older than 6.0.0.0.
- This release eliminates instances of hard coded values for variables. As a result, the user needs to ensure that the Aerospike Prometheus data source is selected as a default in order for dashboard data to populate correctly.
- Aerospike Monitoring Stack version 2.6.0 adds new dashboards and bug fixes.
- Rolling restarts dashboard, various key metrics which should be monitored during specific use cases.
- Alerts View dashboard, adopting more meaningful alert severity levels.
- Aerospike Prometheus exporter 1.12.0 or greater must be used to get the Aerospike 6.3 metrics.
- Deprecated
- Existing Alerts dashboard is deprecated and will be removed in future releases.
- Existing Jobs dashboard is deprecated and will be removed in future releases.
New Features
- Rolling Restarts dashboard, where data is shown in groups such as stats, errors, and resources. This dashboard curates various key metrics which should be monitored during specific use cases, like node restart, software upgrade, investigation, etc. Resource utilization is displayed for the TopK major consumers at a service and namespace level. [OM-79]
- Added the new Alerts view dashboard. This visualizes alerts by severity, both as counts and as individual alerts. Newly adopted alert levels in decreasing order: critical, error, warn, and info. This dashboard replaces the existing Alerts dashboard. [OM-85]
- All Aerospike dashboards and panel visualizations are modified according to the Grafana 9.x version. [OM-82]
- Improved and reorganized Aerospike Monitoring stack examples: [OM-49]
- Reorganized docker compose file in relevant folder.
- Added examples on how to use AeroLab which can spin up Aerospike clusters per Proof of Concept (POC) needs.
Bug Fixes
- Includes bug fixes related to queries and visualizations: [OM-82]
- All queries now include proper regex pattern to honor single or multiple value template variable selection.
- All Time-Series are adjusted to use range vector.
- All dashboards have standardized template variables in the same order.
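The bug fix above about honoring single or multiple template variable selections relies on PromQL regex matchers (`=~`), which accept an alternation of values. The Python sketch below shows one way such a matcher could be assembled; the `regex_matcher` helper and the `ns` label name are assumptions for illustration, not the dashboards' actual query text.

```python
import re

# Only plain identifiers are accepted so that no regex escaping is needed.
# This restriction is a simplification made for this sketch.
_SAFE_VALUE = re.compile(r"^[A-Za-z0-9_]+$")


def regex_matcher(label, selected_values):
    """Build a PromQL regex label matcher such as ns=~"test|bar".

    Works the same way for one selected value or for many.
    """
    if not selected_values:
        raise ValueError("at least one value is required")
    for value in selected_values:
        if not _SAFE_VALUE.match(value):
            raise ValueError(f"unsupported value: {value!r}")
    pattern = "|".join(selected_values)
    return f'{label}=~"{pattern}"'


# aerospike_namespace_objects is a metric named elsewhere in these notes;
# the "ns" label is assumed here.
matcher = regex_matcher("ns", ["test", "bar"])
query = "aerospike_namespace_objects{" + matcher + "}"
print(query)
```

Because PromQL regex matchers are fully anchored, `test|bar` matches exactly those two values and nothing that merely contains them.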
Monitoring stack 2.5.0
June 19, 2023 | Download
- Aerospike Monitoring Stack version 2.5.0 adds the new Multi Cluster view dashboard, Otel integration examples and bug fixes
New Features
- Added the new Multi cluster view dashboard. This visualizes multiple clusters across regions and data centers with a focus on health. This dashboard consists of 4 panels. [OM-45]
- Geomap panel - displays multiple cluster view.
- Cluster panel - displays key metrics like size, alerts, XDR lag, Read & Write latencies.
- Node panel - uses the Polystat plugin and displays nodes in Green or Red indicating the health.
- Namespace panel - displays namespaces in Green or Red indicating the health.
- Key metrics used in this dashboard: aerospike_node_up, aerospike_namespace_objects, aerospike_node_stats_cluster_size, aerospike_xdr_lag, aerospike_latencies_write_ms_bucket, aerospike_latencies_read_ms_bucket.
- Added new examples on how to integrate Aerospike prometheus exporter with the Otel collector and export metrics to a partner solution. [OM-60]
- Partner integration examples are provided for NewRelic, Datadog and Cloudwatch.
Bug Fixes
- In the Namespace dashboard, the Defrag row hides anomalies as a result of aggregation. [OM-76]
- Removed the Defrag row and its aggregation, and moved the defrag panels to the namespace row so that defrag metrics are displayed for each namespace.
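To see why an aggregated value can hide an anomaly, as the defrag fix above describes, consider this small Python sketch. The `summarize` helper, namespace names, and numbers are hypothetical; it only illustrates reading minimum, average, and maximum side by side.

```python
from statistics import mean


def summarize(values_by_namespace):
    """Return the minimum, average and maximum of per-namespace values."""
    values = list(values_by_namespace.values())
    return {"min": min(values), "avg": mean(values), "max": max(values)}


# Hypothetical per-namespace readings; ns3 is the outlier.
readings = {"ns1": 10, "ns2": 12, "ns3": 95}
summary = summarize(readings)
print(summary)
```

The average alone sits far from the outlier, while the maximum exposes it directly; showing values per namespace avoids the problem altogether.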
Monitoring stack 2.4.0
May 16, 2023 | Download
- Aerospike Monitoring Stack version 2.4.0 adds support for metrics introduced in Aerospike 6.3.
New Features
- Added defrag metrics to the namespace view dashboard. [OM-62]
- Adds aerospike_namespace_storage_engine_defrag_lwm_pct.
- Adds aerospike_namespace_storage_engine_file_defrag_q.
- Adds aerospike_namespace_storage_engine_device_defrag_q.
- Adds aerospike_namespace_storage_engine_file_defrag_reads.
- Adds aerospike_namespace_storage_engine_device_defrag_reads.
- Adds aerospike_namespace_storage_engine_file_defrag_writes.
- Adds aerospike_namespace_storage_engine_device_defrag_writes.
Bug Fixes
- In namespace view dashboard NSUP Cycle is summed, instead of showing max/average. [OM-61]
- Fixed NSUP metrics panel to show maximum and average aerospike_namespace_nsup_cycle_duration.
- Fixed NSUP metrics panel to show maximum and average aerospike_namespace_nsup_cycle_deleted_pct.
- Migration summary doubles up in cluster view dashboard. [OM-22]
- Fixed migration metrics panel to show aerospike_namespace_migrate_rx_partitions_remaining and aerospike_namespace_migrate_tx_partitions_remaining separately.
Monitoring stack 2.3.1
April 19, 2023 | Download
Bug Fixes
- Issues in Set view, Unique data view, Sindex view, Namespace view and Node view: [OM-37]
- Fixed issue in “Set view” dashboard to remove hardcoded datasource.
- Re-exported Set view, Unique data view, Sindex view, Namespace view and Node view dashboards with right configurations so they are suitable to be made available in Grafana Cloud.
Monitoring stack 2.3.0
April 3, 2023 | Download
- Aerospike Monitoring Stack version 2.3.0 adds support for metrics introduced in Aerospike 6.3.
New Features
- Added 6.3 metrics:
- Adds aerospike_sindex_used_bytes secondary index metric.
- Adds aerospike_namespace_nsup_cycle_deleted_pct NSUP metric.
- Adds aerospike_sets_stop_writes_size set level configuration.
- Updated memory used panel in secondary index to consider aerospike_sindex_used_bytes or aerospike_sindex_memory_used, as aerospike_sindex_memory_used is deprecated in Aerospike 6.3.
- Added nsup metrics panel to Namespace view dashboard.
- Added set level quotas panel to Namespace view dashboard.
- Added a new dashboard displaying set level metrics.
- Added a new dashboard displaying unique data usage.
- Added 4 new prometheus alerts:
- NamespaceSupervisorFallingBehind when NSUP is falling behind and/or display the length of time the most recent NSUP cycle lasted.
- NamespaceFreeMemoryCloseToStopWrites when memory on one of your Aerospike nodes is close to the stop writes limit configured for a namespace.
- NamespaceSetQuotaWarning when one of your Aerospike nodes is at 80% of the quota you have configured on a set.
- NamespaceSetQuotaAlert when one of your Aerospike nodes is at 99% of the quota you have configured on a set.
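The two set quota alerts above fire at 80% and 99% of the configured quota. The following Python sketch shows that threshold logic in isolation; the `set_quota_alert` helper is hypothetical, and the inclusive comparison and the treatment of a zero quota are assumptions rather than the actual alert rule expressions.

```python
WARN_RATIO = 0.80      # NamespaceSetQuotaWarning threshold from the notes
CRITICAL_RATIO = 0.99  # NamespaceSetQuotaAlert threshold from the notes


def set_quota_alert(used_bytes, quota_bytes):
    """Return the name of the alert that would apply, or None.

    Assumptions for this sketch: a quota of 0 means "no quota configured",
    and the thresholds are inclusive (>=).
    """
    if quota_bytes <= 0:
        return None
    ratio = used_bytes / quota_bytes
    if ratio >= CRITICAL_RATIO:
        return "NamespaceSetQuotaAlert"
    if ratio >= WARN_RATIO:
        return "NamespaceSetQuotaWarning"
    return None


print(set_quota_alert(85, 100))
print(set_quota_alert(100, 100))
```

Checking the higher threshold first ensures that a set at or above 99% reports only the more severe alert.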
Monitoring stack 2.2.0
August 26, 2022 | Download
- Aerospike Monitoring Stack version 2.2.0 adds support for metrics introduced in Aerospike 6.1.
New Features
- Add server 6.1 metrics. [TOOLS-2087]
- Adds aerospike_xdr_bytes_shipped.
- Adds aerospike_sindex_entries_per_bval.
- Adds aerospike_sindex_entries_per_rec.
- Replace latency panels with heat map and percentiles. [TOOLS-2132]
Monitoring stack 2.1.0
July 19, 2022 | Download
- Aerospike Monitoring Stack version 2.1.0 adds support for the batch-index latency metrics aerospike_latencies_batch_index_us_bucket and aerospike_latencies_batch_index_us_count.
New Features
- Add batch-index latency panels. [TOOLS-2069]
Monitoring stack 2.0.0
June 10, 2022 | Download
- Aerospike Monitoring version 2.0.0 adds support for many new Aerospike 6.0 metrics in the Grafana dashboards, like the following:
- Primary index queries.
- Secondary Index queries.
- Batch sub transactions. (non proxied)
- Add overall reads/writes (client_read/write_success + batch_sub_read/write_success) to cluster, node, and namespace dashboards.
- New job information such as job type.
- si-query and pi-query latencies.
- Add memory_used stats to SIndex dashboard, remove the many SIndex metrics dropped in Aerospike Server version 6.0.
- Remove any mention of scans.
- Other miscellaneous changes. See pull request 33 for more details.
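The overall reads/writes figure mentioned above is described as the sum of client and batch sub-transaction successes. A minimal Python sketch of that arithmetic follows; the `overall_ops` helper and the sample counter values are hypothetical, and the counter names are taken from the shorthand in the notes rather than from exact exporter metric names.

```python
def overall_ops(stats):
    """Combine client and batch sub-transaction success counters.

    `stats` maps a counter name to its value; missing counters count as 0.
    """
    reads = stats.get("client_read_success", 0) + stats.get("batch_sub_read_success", 0)
    writes = stats.get("client_write_success", 0) + stats.get("batch_sub_write_success", 0)
    return {"reads": reads, "writes": writes}


# Made-up counter values, for demonstration only.
sample = {
    "client_read_success": 1200,
    "batch_sub_read_success": 300,
    "client_write_success": 800,
    "batch_sub_write_success": 50,
}
print(overall_ops(sample))
```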
New Features
- Display Aerospike 6 metrics. [TOOLS-2044]
Monitoring stack 1.4.0
March 14, 2022 | Download
New Features
- Add Jobs View and Secondary Index View dashboards [TOOLS-1956]
- Add support for per-job scan and query statistics [TOOLS-1946]
- Add support for secondary index statistics [TOOLS-1947]
Monitoring stack 1.3.2
September 7, 2021 | Download
Improvements
- Add new metrics introduced in Aerospike 5.7. [TOOLS-1785]
Monitoring stack 1.3.1
June 15, 2021 | Download
Improvements
- Adds “Exporters View” dashboard to track status of all Aerospike Prometheus Exporter targets.
Bug Fixes
- Fixes incorrect status of the exporters and Aerospike nodes in the “Node View” dashboard. [TOOLS-1721]
Monitoring stack 1.3.0
June 4, 2021 | Download
New Features
- Add support for user statistics introduced in Aerospike Server version 5.6. [TOOLS-1715]
Improvements
- Add connections opened/closed metrics introduced in Aerospike Server version 5.6. [TOOLS-1716]
- Add new all flash metrics introduced in Aerospike Server version 5.6. [TOOLS-1717]
- Add other new metrics introduced in Aerospike Server version 5.6.
Bug Fixes
- Fixed 90th percentile latency computation in Latency View dashboard to not use rate(). Thanks to @ashangit for the contribution.
Monitoring stack 1.2.1
January 27, 2021 | Download
Improvements
- Added DC nodes metric to XDR dashboard.
Monitoring stack 1.2.0
November 16, 2020 | Download
New Features
- Migrate dashboards to Grafana 7. [TOOLS-1589]
Improvements
- Make datasource configurable through a dashboard variable. Thanks to realmgic (Zohar) for the contribution. [TOOLS-1591]
- Alert when ‘close to’ stop writes, when node is proxying and when XDR lag is above a threshold. [TOOLS-1588]
- Add Prometheus’ docker swarm service discovery config to the example. [TOOLS-1590]
Bug Fixes
- Fix units for “Failure rate” panel in Namespace view. [TOOLS-1592]
Monitoring stack 1.1.1
August 31, 2020 | Download
Improvements
- Use latency time unit in queries to support Aerospike’s microsecond histograms. Add variable for latency time unit to Latency View and Node Overview dashboards.
Bug Fixes
- Refresh variables on time range change.
Monitoring stack 1.1.0
July 27, 2020 | Download
New Features
- Add description info to each dashboard panel.
- Add clock_skew_stop_writes to Namespace View and Cluster View dashboards.
- Add dashboard support for the new latency metrics change in Aerospike Prometheus Exporter v1.1.0.
- Show primary index usage for namespaces using index-type flash or pmem.
Improvements
- Remove aerospike_node_info metric as per Aerospike Prometheus Exporter v1.1.0.
- Increase default Grafana refresh rate to 1m.
Bug Fixes
- Fix primary index usage panel to show values in MiB/GiB.
Monitoring stack 1.0.0
July 27, 2020 | Download
- Initial release.