Aerospike Unique Data Agent (UDA)
This page describes the Aerospike Unique Data Agent (UDA), how to install and configure it, how it calculates unique data usage, and provides API usage examples.
Overview
A running Aerospike cluster contains data from client applications and internal data used by Aerospike processes. Many clusters also use replication to store multiple copies of application data for improved resiliency. This makes it difficult to manually determine how much cluster data is unique.
UDA monitors your Aerospike cluster and tracks its unique data usage. UDA polls the cluster for memory and disk usage statistics that are relevant to your license agreement and stores those statistics for later processing. UDA integrates natively with asadm and has its own REST API that you can use with a custom client.
Install UDA
UDA is included in Aerospike Tools package versions 7.1.1 and later, which are bundled with Aerospike Database 6.0.0.5 and later.
To install and test UDA, follow these steps:
Edit the TOML configuration file located at /etc/aerospike/astools.conf to connect to the Aerospike Database. Add the cluster-specific options as shown in the following example:
#------------------------------------------------------------------------------
# Cluster-specific options
#
# This section has connection / security / tls specific configuration options.
# Optionally many different named connection instances can be specified.
#------------------------------------------------------------------------------
#
[cluster]
host = "localhost:cluster_a:3000" # host = "<host>[:<tls-name>][:<port>][,...]"
user = "<username>"
password = "<password>"
If the cluster has TLS enabled, edit the TLS-based connection section as shown in the following example. Uncomment the lines that apply to your deployment.
#------------------------------------------------------------------------------
# Transport Level Encryption
#------------------------------------------------------------------------------
#
#
# tls-enable = true # true enables tls, if false all other tls
# config are ignored
# tls-protocols = "TLSv1.2"
# tls-cipher-suite = "ALL:!COMPLEMENTOFDEFAULT:!eNULL"
# tls-crl-check = true
# tls-crl-check-all = true
# tls-keyfile = "/etc/aerospike/x509_certificates/MultiServer/key.pem"
# tls-keyfile-password required if tls-keyfile is password protected
# It can be one of following three format
# Environment variable: "env:<VAR>"
# tls-keyfile-password = "env:PEMPWD"
# File: "file:<PATH>"
# tls-keyfile-password = "file:/etc/aerospike/x509_certificates/MultiServer/keypwd.txt"
# String: "<PASSWORD>"
# tls-keyfile-password = ""
# One of the tls-cafile or tls-capath is required. If both are specified, everything is loaded.
#
# tls-cafile = "/etc/aerospike/x509_certificates/Platinum/cacert.pem"
# tls-capath = "/etc/aerospike/x509_certificates/Platinum"
#
# tls-certfile = "/etc/aerospike/x509_certificates/multi_chain.pem"
# tls-cert-blacklist = "/etc/aerospike/x509_certificates/blacklist.txt"
Refer to Configuration and Aerospike Tools Configuration for more details.
Enable the agent.
For systems that support systemd:
systemctl enable uda.service
For other systems, such as Docker:
./usr/bin/uda --config ./etc/aerospike/astools.conf
Verify that the agent started successfully.
journalctl -u uda.service -f
or
systemctl status uda.service
To test the API and see API documentation, run the following command and navigate to the provided URL:
journalctl -u uda.service | grep API
Jun 29 14:09:05 ubuntu unique-data[1116531]: time="2021-06-29T14:09:05-07:00" level=info msg="API documentation can be found at http://192.168.1.1:8080/v1/swagger/index.html"
Configuration
Run uda start --help for a list of parameters that may be required to run UDA within your cluster.
You can configure UDA using Aerospike Tools configuration files.
See Aerospike Tools Configuration for more details.
Rather than storing every parameter value as plain text in the configuration file, you can supply values through separate files, environment variables, or base64-encoded strings.
- To provide values in a file, create a file with a single line containing the value for the configuration parameter of your choice. In the configuration file, in place of the value, use file:<filename>, where <filename> is the name of the file you just created.
- To provide an environment variable, use env:<variable-name>.
- To provide a base64-encoded environment variable, use env-b64:<variable-name>.
- To provide a base64-encoded value, use b64:<base64-encoded-value>.
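The prefix scheme above can be sketched in Python as follows. This is only an illustration of how such prefixed values might be resolved, not UDA's actual implementation; the function name resolve_value and the handling of values without a prefix are assumptions.

```python
import base64
import os
from pathlib import Path


def resolve_value(raw: str) -> str:
    """Resolve a value that may carry a file:, env:, env-b64:, or b64: prefix.

    Hypothetical helper for illustration; UDA's own parsing may differ.
    """
    if raw.startswith("file:"):
        # The file holds the value on a single line; strip the trailing newline.
        return Path(raw[len("file:"):]).read_text().strip()
    if raw.startswith("env-b64:"):
        # The environment variable holds a base64-encoded value.
        encoded = os.environ[raw[len("env-b64:"):]]
        return base64.b64decode(encoded).decode("utf-8")
    if raw.startswith("env:"):
        # The environment variable holds the value as plain text.
        return os.environ[raw[len("env:"):]]
    if raw.startswith("b64:"):
        # The remainder of the string is the base64-encoded value itself.
        return base64.b64decode(raw[len("b64:"):]).decode("utf-8")
    # Assumption: a value without a recognized prefix is used as-is.
    return raw
```

For example, with the environment variable PEMPWD set, resolve_value("env:PEMPWD") returns its contents, which mirrors the tls-keyfile-password = "env:PEMPWD" line in the TLS configuration example.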
Unique data calculation
An Aerospike cluster's unique data is the uncompressed data stored in each namespace, divided by its replication factor, excluding index metadata. UDA calculates unique data across each namespace in the cluster, iterating through the namespaces until it arrives at a grand total for the entire cluster.
UDA uses the following calculation:
Total unique usage bytes per namespace = (Uncompressed bytes in the namespace ÷ Replication factor) - (39 metadata bytes per object × Number of master objects across all nodes)
Aerospike Database versions 6.4 and earlier counted unique data with a more complex method.
Total unique usage bytes per namespace = ((Namespace memory bytes OR Uncompressed namespace device bytes OR Uncompressed namespace PMem bytes) ÷ Replication Factor) - (Metadata bytes per object × Number of master objects across all nodes)
Metadata bytes per object = 35 bytes for Database 5.7 and earlier, 39 bytes for Database 6.0 and later.
Only one of namespace memory bytes, uncompressed namespace device bytes, and uncompressed namespace PMem bytes should be greater than zero.
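As a worked example of the formula, the following Python sketch computes the unique data for a single namespace. The function name and the input figures are hypothetical and chosen only to make the arithmetic easy to follow; the 39-byte and 35-byte metadata sizes come from the text above.

```python
def unique_data_bytes(
    uncompressed_bytes: int,
    replication_factor: int,
    master_objects: int,
    metadata_bytes_per_object: int = 39,
) -> float:
    """Apply the per-namespace formula: (bytes / RF) - (metadata * master objects)."""
    return (
        uncompressed_bytes / replication_factor
        - metadata_bytes_per_object * master_objects
    )


# Hypothetical namespace: 2,000,000 uncompressed bytes, replication factor 2,
# and 10,000 master objects across all nodes.
print(unique_data_bytes(2_000_000, 2, 10_000))      # 1,000,000 - 390,000 = 610000.0
# The same namespace with the 35-byte metadata size used by Database 5.7 and earlier.
print(unique_data_bytes(2_000_000, 2, 10_000, 35))  # 1,000,000 - 350,000 = 650000.0
```

The cluster-wide figure is then the sum of this value over all namespaces.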
Storing Entries
UDA stores each entry in a store file every hour on the hour.
You can provide a custom store file path with --store-file or -f; otherwise, entries are written to the default path /var/log/aerospike/uda.store.
Entries are also printed to stderr using the logger.
Entries
There are two types of data entries, "info" and "error", determined by the "level" field.
Each entry type has the same fields.
An "info" entry has all fields filled in except "errors".
An "error" entry has an "errors" list with a length greater than zero and will not have accurate values for some or all other fields depending on where in the logging process the error occurred.
Each request to retrieve data entries returns zero or more JSON objects of the following form:
{
"level":"info",
"cluster_name":"null",
"cluster_generation":1,
"cluster_stable": true,
"node_count":5,
"hours_since_start":15,
"time":"2021-06-19T15:51:00.020869089-07:00",
"master_objects":58754,
"unique_data_bytes":927463,
"namespaces": {
"<ns1>": {
"master_objects":13507,
"unique_data_bytes":453211
},
. . .
},
"errors":[]
}
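Because "error" entries may carry inaccurate values, a client would typically check the entry type before using the numbers. The following Python sketch shows one way to do that. The helper name usable_unique_bytes is hypothetical, and the namespace name "test" stands in for the <ns1> placeholder.

```python
import json


def usable_unique_bytes(entry: dict):
    """Return unique_data_bytes for an "info" entry, or None for an "error" entry."""
    # An "error" entry has level "error" and a non-empty "errors" list;
    # its other fields may not be accurate, so they are ignored here.
    if entry.get("level") == "error" or entry.get("errors"):
        return None
    return entry["unique_data_bytes"]


raw = """
{
  "level": "info", "cluster_name": "null", "cluster_generation": 1,
  "cluster_stable": true, "node_count": 5, "hours_since_start": 15,
  "time": "2021-06-19T15:51:00.020869089-07:00",
  "master_objects": 58754, "unique_data_bytes": 927463,
  "namespaces": {"test": {"master_objects": 13507, "unique_data_bytes": 453211}},
  "errors": []
}
"""
entry = json.loads(raw)
print(usable_unique_bytes(entry))  # prints 927463
```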
API usage examples
This section includes examples for common operations you can perform with UDA, such as returning specific entries or checking the connection between UDA and your database.
Get all entries
Endpoint
v1/entries{?key=key&val=value}
Get all entries.
Filtering uses the key and val parameters to filter by key-value pairs, such as only returning entries with "level"="error" or "cluster_name"="null".
Response:
{
"entries":
[
{
"cluster_name":"null",
"cluster_generation":1,
"node_count":1,
"hours_since_start":0,
"time":"2021-06-19T15:51:00.020869089-07:00",
"level":"info",
"master_objects":0,
"unique_data_bytes":0,
"errors":[]
},
. . .
]
}
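A minimal Python client for this endpoint might look like the following sketch. It uses only the standard library. The base URL is an assumption taken from the sample log line in the installation section, so substitute the address your own agent reports; the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: host and port as printed in the sample "API documentation" log line.
BASE_URL = "http://192.168.1.1:8080/v1"


def entries_url(base_url: str, key: str = None, val: str = None) -> str:
    """Build the URL for the entries endpoint, with optional key-value filtering."""
    url = f"{base_url}/entries"
    if key is not None and val is not None:
        # urlencode percent-escapes the filter so special characters survive.
        url += "?" + urlencode({"key": key, "val": val})
    return url


def fetch_entries(base_url: str, key: str = None, val: str = None, timeout: float = 10.0) -> list:
    """Request entries from UDA and return the list under the "entries" field."""
    with urlopen(entries_url(base_url, key, val), timeout=timeout) as response:
        return json.load(response)["entries"]
```

For example, fetch_entries(BASE_URL, key="level", val="error") would request only the entries whose "level" is "error".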
Filter entries by index
Endpoint
v1/entries/range/index{?start=start-index&end=end-index&key=key&val=value}
Filters the entries by their index. The first entry has index = 0 and the last entry has index = (number of entries - 1).
Both start and end are optional and default to the first and last entry respectively.
Entries are always sorted by insertion order, so a lower index is earlier in time than a higher index. Additionally, key-value filtering is allowed after index filtering.
Response:
{
"entries":
[
{
"cluster_name":"null",
"cluster_generation":1,
"node_count":1,
"hours_since_start":0,
"time":"2021-06-19T15:51:00.020869089-07:00",
"level":"info",
"master_objects":0,
"unique_data_bytes":0,
"errors":[]
},
. . .
]
}
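The order of operations described above (index range first, then key-value filtering) can be modeled locally in Python. This sketch is not the server's code. It assumes that end is inclusive, which the default of "last entry" suggests but the text does not state, and that filter values are compared as strings.

```python
def filter_by_index(entries, start=None, end=None, key=None, val=None):
    """Local model of index filtering followed by optional key-value filtering."""
    first = 0 if start is None else start
    last = len(entries) - 1 if end is None else end
    # Assumption: the range is inclusive of both start and end.
    selected = entries[first:last + 1]
    if key is not None:
        # Assumption: query values arrive as strings, so compare as strings.
        selected = [e for e in selected if str(e.get(key)) == val]
    return selected


# Hypothetical entries, reduced to the fields needed for the example.
entries = [
    {"level": "info", "hours_since_start": 0},
    {"level": "error", "hours_since_start": 1},
    {"level": "info", "hours_since_start": 2},
    {"level": "info", "hours_since_start": 3},
]
print(filter_by_index(entries, start=1, key="level", val="info"))
```

In this model the index range is applied before the key-value filter, so start=1 with level=info skips the first "info" entry even though it matches the filter.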
Filter entries by date and/or time
Endpoint
v1/entries/range/time{?start=start-time&end=end-time&key=key&val=value}
Values for start-time and end-time should be in RFC 3339 format, a profile of the ISO 8601 extended format, such as 2021-06-19T15:51:00-07:00.
Both start and end are optional and default to the first and last entry respectively.
Entries are always sorted by insertion order, so the response list is in temporal order.
Additionally, key-value filtering is allowed after date and time filtering.
Response:
{
"entries":
[
{
"cluster_name":"null",
"cluster_generation":1,
"node_count":1,
"hours_since_start":0,
"time":"2021-06-19T15:51:00.020869089-07:00",
"level":"info",
"master_objects":0,
"unique_data_bytes":0,
"errors":[]
},
. . .
]
}
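When building a time-range request in code, the timestamps need to be both RFC 3339 formatted and URL-encoded, since an unescaped "+" in a UTC offset would be read as a space in a query string. The following Python sketch shows one way to do this with the standard library. The function name and the base URL in the usage line are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode


def time_range_url(base_url: str, start: datetime = None, end: datetime = None) -> str:
    """Build a URL for the time-range endpoint from timezone-aware datetimes."""
    params = {}
    for name, moment in (("start", start), ("end", end)):
        if moment is None:
            continue  # omitted bounds default to the first or last entry
        if moment.tzinfo is None:
            raise ValueError(f"{name} must be timezone-aware to produce RFC 3339")
        # isoformat() on an aware datetime yields e.g. 2021-06-19T15:00:00-07:00.
        params[name] = moment.isoformat()
    url = f"{base_url}/entries/range/time"
    # urlencode escapes ":" and "+" so the timestamp survives as a query value.
    return url + ("?" + urlencode(params) if params else "")


pacific = timezone(timedelta(hours=-7))
print(time_range_url("http://localhost:8080/v1", start=datetime(2021, 6, 19, 15, 0, tzinfo=pacific)))
```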
Health check
Endpoint
v1/health
A health check endpoint that returns metrics related to the health of the service.
Response:
{
"health": {
"total_missed_since_start": 1234,
"recent_missed_since_start": 0,
"hours_since_start":1484
}
}
Ping
Endpoint
v1/ping
An endpoint to check if a connection can be established with the service.
Response:
string("ping")
Integration with Aerospike Admin
Aerospike Admin (asadm) can connect to UDA when creating a collectinfo archive and when running the summary command.
Connecting to UDA gives asadm the ability to display statistics for min, max, and average unique data usage.
Use the --agent-host and --agent-port flags with the asadm summary and collectinfo commands to specify how to connect to UDA.
You can also display namespace-level license usage with the --agent-raw-store flag, include entries recorded while the cluster was reportedly unstable in the summary aggregation with the --agent-unstable flag, and include the uda.store file in the collectinfo archive.
Admin> summary --agent-host <agent-host> --agent-port <agent-port>
~~~~~~~~~~~~~~~~~~~~~~~~~~Cluster Summary~~~~~~~~~~~~~~~~~~~~~~~~~~
Migrations |False
Server Version |E-6.0.0.1
OS Version |Ubuntu 20.04.3 LTS (5.4.0-121-generic)
Cluster Size |1
Devices Total |0
Devices Per-Node |0
Devices Equal Across Nodes|True
Memory Total |8.000 GB
Memory Used |8.812 MB
Memory Used % |0.11
Memory Avail |7.991 GB
Memory Avail% |99.89
License Usage Latest |0.000 B
License Usage Latest Time |2022-07-13T15:00:00-07:00
License Usage Min |0.000 B
License Usage Max |11.711 MB
License Usage Avg |4.888 MB
Active |0
Total |2
Active Features |SIndex
Number of rows: 20
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Namespace Summary~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Namespace|~~~~Drives~~~~|~~~~~~~Memory~~~~~~~|Replication| Master|~~~~~~~~~~~~~~~~~~~~~~~~~License Usage~~~~~~~~~~~~~~~~~~~~~~~~~
|Total|Per-Node| Total|Used|Avail%| Factors|Objects| Latest| Latest Time| Min| Max| Avg
| | | | %| | | | | | | |
bar | 0| 0|4.000 GB| 0.0| 100.0| 2|0.000 |0.000 B |2022-07-13T15:00:00-07:00|0.000 B |6.104 MB|918.963 KB
test | 0| 0|4.000 GB|0.22| 99.78| 2|0.000 |0.000 B |2022-07-13T15:00:00-07:00|0.000 B |5.607 MB| 3.990 MB
Number of rows: 2
The Latest metric displayed shows the last measured data usage as reported by the agent.
This means the Latest value may be up to an hour old.
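To make the Latest, Min, Max, and Avg columns concrete, the following Python sketch derives the same kinds of statistics from a list of UDA entries. It is an illustration under stated assumptions, not asadm's actual aggregation logic: it assumes "error" entries are skipped and that entries with "cluster_stable" set to false are excluded unless explicitly requested, similar in spirit to the --agent-unstable flag. The function name and the sample values are hypothetical.

```python
def summarize_usage(entries, include_unstable=False):
    """Compute latest, min, max, and average unique_data_bytes from UDA entries."""
    values = [
        e["unique_data_bytes"]
        for e in entries
        # Assumption: only "info" entries carry trustworthy values.
        if e.get("level") == "info"
        # Assumption: unstable-cluster entries are excluded by default.
        and (include_unstable or e.get("cluster_stable", True))
    ]
    if not values:
        return None
    return {
        "latest": values[-1],  # entries are in insertion (temporal) order
        "min": min(values),
        "max": max(values),
        "avg": sum(values) / len(values),
    }


# Hypothetical entries, reduced to the fields the summary needs.
sample = [
    {"level": "info", "cluster_stable": True, "unique_data_bytes": 100},
    {"level": "info", "cluster_stable": False, "unique_data_bytes": 900},
    {"level": "error", "cluster_stable": True, "unique_data_bytes": 0},
    {"level": "info", "cluster_stable": True, "unique_data_bytes": 300},
]
print(summarize_usage(sample))
# {'latest': 300, 'min': 100, 'max': 300, 'avg': 200.0}
```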