Aerospike Unique Data Agent (UDA)
This page describes the Aerospike Unique Data Agent (UDA), how to install and configure it, how it calculates unique data usage, and provides API usage examples.
Overview
A running Aerospike cluster contains data from client applications and internal data used by Aerospike processes. Many clusters also use replication to store multiple copies of application data for improved resiliency. This makes it difficult to manually determine how much cluster data is unique.
UDA monitors your Aerospike cluster and tracks its unique data usage. UDA polls the cluster for memory and disk usage statistics that are relevant to your license agreement and stores those statistics for later processing. UDA integrates natively with asadm and has its own REST API that you can use with a custom client.
Command-line help
> uda --help
An agent for monitoring and querying the unique data usage of your
Aerospike database. Start the agent with the 'start' command. The agent
when started puts entries in a store file. The agent listens on the --agent-port
for requests to query and filter unique data entries. For convenience, there is
an additional 'log' command to query and filter the entries file
created by the agent offline. Additionally, the 'data' command allows you to display your
current unique data usage.

Usage:
  uda [command]

Available Commands:
  data        Gets the current data point. Does not log an entry.
  help        Help about any command
  log         For offline processing of the uda.store file.
  start       Start the agent.

Flags:
      --config string     Config file (default is /etc/aerospike/astools.conf)
  -u, --help              Display help information
      --instance string   For support of the aerospike tools toml schema. Sections with the instance are read.
                          For example, if instance `a` is specified, then sections `cluster_a` and `uda_a` are read.
  -v, --version           version for uda

Use "uda [command] --help" for more information about a command.
Install UDA
UDA is included in Aerospike Tools package version 7.1.1 and later, which is bundled with Aerospike Database 6.0.0.5 and later.
To install and test UDA, follow these steps:
- Edit the TOML configuration file located at /etc/aerospike/astools.conf to connect to the Aerospike Database. Add the cluster-specific options as shown in the following example:
#------------------------------------------------------------------------------
# Cluster-specific options
#
# This section has connection / security / tls specific configuration options.
# Optionally many different named connection instances can be specified.
#------------------------------------------------------------------------------
#
[cluster]
host = "localhost:cluster_a:3000"  # host = "<host>[:<tls-name>][:<port>][,...]"
user = "<username>"
password = "<password>"

If the cluster has TLS enabled, edit the TLS-based connection section as shown in the following example. Uncomment the lines that apply to your deployment.
#------------------------------------------------------------------------------
# Transport Level Encryption
#------------------------------------------------------------------------------
#
#
# tls-enable = true  # true enables tls, if false all other tls
#                    # config are ignored
# tls-protocols = "TLSv1.2"
# tls-cipher-suite = "ALL:!COMPLEMENTOFDEFAULT:!eNULL"
# tls-crl-check = true
# tls-crl-check-all = true
# tls-keyfile = "/etc/aerospike/x509_certificates/MultiServer/key.pem"
# tls-keyfile-password required if tls-keyfile is password protected
# It can be one of following three formats
#   Environment variable: "env:<VAR>"
# tls-keyfile-password = "env:PEMPWD"
#   File: "file:<PATH>"
# tls-keyfile-password = "file:/etc/aerospike/x509_certificates/MultiServer/keypwd.txt"
#   String: "<PASSWORD>"
# tls-keyfile-password = ""
# One of the tls-cafile or tls-capath is required. If both are specified, everything is loaded.
#
# tls-cafile = "/etc/aerospike/x509_certificates/Platinum/cacert.pem"
# tls-capath = "/etc/aerospike/x509_certificates/Platinum"
#
# tls-certfile = "/etc/aerospike/x509_certificates/multi_chain.pem"
# tls-cert-blacklist = "/etc/aerospike/x509_certificates/blacklist.txt"

Refer to Configuration and Aerospike Tools Configuration for more details.
- Enable the agent.
  - For systems that support systemd:
        systemctl enable uda.service
        systemctl start uda.service
    The enable command starts the service at boot. The start command starts it immediately.
  - For other systems, such as Docker:
        ./usr/bin/uda start --config ./etc/aerospike/astools.conf
- Verify that the agent started successfully:
        journalctl -u uda.service -f
  or
        systemctl status uda.service
- To test the API and see API documentation, run the following command and navigate to the provided URL:
        journalctl -u uda.service | grep API
  The output is similar to the following:
        Jun 29 14:09:05 ubuntu unique-data[1116531]: time="2021-06-29T14:09:05-07:00" level=info msg="API documentation can be found at http://192.168.1.1:8080/v1/swagger/index.html"
Configuration
Run uda start --help for a list of parameters that may be required to run UDA within your cluster.
You can configure UDA using Aerospike Tools configuration files.
See Aerospike Tools Configuration for more details.
Instead of storing parameter values as plain text in the configuration file, you can supply them through separate files, environment variables, or base64-encoded values.
- To provide values in a file, create a file with a single line containing the value for the configuration parameter of your choice. In the configuration file, in place of the value, use file:<filename>, where <filename> is the name of the file you just created.
- To provide an environment variable, use env:<variable-name>.
- To provide a base64-encoded environment variable, use env-b64:<variable-name>.
- To provide a base64-encoded value, use b64:<base64-encoded-value>.
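The following Python sketch shows how each prefixed form maps back to a plain value. It is an illustration, not UDA's implementation. The function name resolve_value is hypothetical. Stripping the trailing newline from a file value is an assumption.

```python
import base64
import os


def resolve_value(raw: str) -> str:
    """Hypothetical resolver for file:, env:, env-b64:, and b64: values.

    Anything without a recognized prefix is returned as plain text.
    """
    if raw.startswith("file:"):
        # The file holds the value on a single line; dropping the
        # trailing newline is an assumption of this sketch.
        with open(raw[len("file:"):], encoding="utf-8") as f:
            return f.readline().rstrip("\r\n")
    if raw.startswith("env-b64:"):
        # Environment variable whose contents are base64-encoded.
        encoded = os.environ[raw[len("env-b64:"):]]
        return base64.b64decode(encoded).decode("utf-8")
    if raw.startswith("env:"):
        # Environment variable holding the plain value.
        return os.environ[raw[len("env:"):]]
    if raw.startswith("b64:"):
        # Inline base64-encoded value.
        return base64.b64decode(raw[len("b64:"):]).decode("utf-8")
    return raw
```

For example, a password of secret could be written in the configuration file as password = "b64:c2VjcmV0". The string c2VjcmV0 is the output of base64.b64encode(b"secret").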
Command-line help for the "start" command
uda start --help
Starts the agent, stores entries in the provided --store-file
(default: /var/log/aerospike/uda.store), and listens on --agent-port (default: 8080) for
requests to query entries.

Usage:
  uda start [flags]

Flags:
  -a, --agent-port int   Port number for agent to listen on. (default 8080)
      --auth INTERNAL,EXTERNAL,PKI   The authentication mode used by the server. INTERNAL uses standard username/password. EXTERNAL uses external methods, like LDAP, which are configured on the server. EXTERNAL requires TLS. PKI allows TLS authentication and authorization based on a certificate, without needing to configure the username. (default INTERNAL)
  -h, --host host[:tls-name][:port][,...]   The Aerospike host. (default 127.0.0.1)
  -P, --password "env-b64:<env-var>,b64:<b64-pass>,file:<pass-file>,<clear-pass>"   The Aerospike password to use to connect to the Aerospike cluster.
  -p, --port int   The default Aerospike port. (default 3000)
  -f, --store-file string   Specify custom log file. (default "/var/log/aerospike/uda.store")
      --tls-cafile env-b64:<cert>,b64:<cert>,<cert-file-name>   The CA for the agent.
      --tls-certfile env-b64:<cert>,b64:<cert>,<cert-file-name>   The certificate file of the agent for mutual TLS authentication.
      --tls-enable   Enable TLS authentication. If false, other TLS options are ignored.
      --tls-keyfile env-b64:<cert>,b64:<cert>,<cert-file-name>   The key file of the agent for mutual TLS authentication.
      --tls-keyfile-password "env-b64:<env-var>,b64:<b64-pass>,file:<pass-file>,<clear-pass>"   The password used to decrypt the key-file if encrypted.
      --tls-name string   The server TLS context to use to authenticate the connection.
      --tls-protocols "[[+][-]all] [[+][-]TLSv1] [[+][-]TLSv1.1] [[+][-]TLSv1.2]"   Set the TLS protocol selection criteria. This format is the same as Apache's SSLProtocol documented at https://httpd.apache.org/docs/current/mod/mod_ssl.html#sslprotocol. (default TLSV1.2)
  -U, --user string   The aerospike user to use to connect to the aerospike cluster.

Global Flags:
      --config-file string   Config file (default is /etc/aerospike/astools.conf)
  -u, --help   Display help information
      --instance string   For support of the aerospike tools toml schema. Sections with the instance are read. For example, if instance `a` is specified, then sections `cluster_a` and `uda_a` are read.
Unique data calculation
An Aerospike cluster’s unique data is the uncompressed data stored in each namespace, divided by its replication factor, excluding index metadata. UDA calculates unique data for each namespace in the cluster and sums the per-namespace results into a grand total for the entire cluster.
UDA uses the following calculation:
Total unique usage bytes per namespace = (Uncompressed bytes in the namespace ÷ Replication factor) - (39 metadata bytes per object × Number of master objects across all nodes)
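To make the arithmetic concrete, here is a small Python sketch of the formula above, summed across namespaces. The NamespaceStats type and the sample figures for the test and bar namespaces are hypothetical. They are not values read from a real cluster, and this is not UDA's own code.

```python
from dataclasses import dataclass
from typing import Iterable

# Metadata bytes per master object in the current formula.
METADATA_BYTES_PER_OBJECT = 39


@dataclass
class NamespaceStats:
    """Hypothetical container for the inputs to the formula."""
    name: str
    uncompressed_bytes: int   # uncompressed bytes in the namespace
    replication_factor: int
    master_objects: int       # master objects across all nodes


def namespace_unique_bytes(ns: NamespaceStats) -> float:
    """(uncompressed bytes / replication factor) - 39 * master objects."""
    return (ns.uncompressed_bytes / ns.replication_factor
            - METADATA_BYTES_PER_OBJECT * ns.master_objects)


def cluster_unique_bytes(namespaces: Iterable[NamespaceStats]) -> float:
    """Sum the per-namespace results into a cluster-wide total."""
    return sum(namespace_unique_bytes(ns) for ns in namespaces)


namespaces = [
    NamespaceStats("test", 10_000_000, 2, 50_000),  # 5,000,000 - 1,950,000
    NamespaceStats("bar", 4_000_000, 1, 20_000),    # 4,000,000 - 780,000
]
print(cluster_unique_bytes(namespaces))  # 6270000.0
```

The sketch keeps the division as a float because the documentation does not say how UDA rounds a non-integer result.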
Aerospike Database versions 6.4 and earlier calculated unique data with a more complex method:
Total unique usage bytes per namespace = ((Namespace memory bytes OR Uncompressed namespace device bytes OR Uncompressed namespace PMem bytes) ÷ Replication Factor) - (Metadata bytes per object × Number of master objects across all nodes)
Metadata bytes per object = 35 bytes for Database 5.7 and earlier, 39 bytes for Database 6.0 and later.
Only one of namespace memory bytes, uncompressed namespace device bytes, or uncompressed namespace PMem bytes should be greater than zero.
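The legacy formula can be sketched in Python as follows. The sketch picks whichever storage figure is non-zero and chooses the metadata size from the database version. Function names are hypothetical. Raising an error when more than one figure is non-zero is an assumption based on the "should be greater than zero" note above, not documented UDA behavior.

```python
from typing import Tuple


def legacy_metadata_bytes(db_version: Tuple[int, int]) -> int:
    """35 bytes for Database 5.7 and earlier, 39 bytes for 6.0 and later."""
    return 35 if db_version < (6, 0) else 39


def legacy_namespace_unique_bytes(memory_bytes: int, device_bytes: int,
                                  pmem_bytes: int, replication_factor: int,
                                  master_objects: int,
                                  db_version: Tuple[int, int]) -> float:
    # Only one of the three storage figures is expected to be non-zero.
    nonzero = [b for b in (memory_bytes, device_bytes, pmem_bytes) if b > 0]
    if len(nonzero) > 1:
        raise ValueError("expected at most one non-zero storage figure")
    data_bytes = nonzero[0] if nonzero else 0
    return (data_bytes / replication_factor
            - legacy_metadata_bytes(db_version) * master_objects)


# Device-backed namespace, replication factor 2, 10,000 master objects.
print(legacy_namespace_unique_bytes(0, 8_000_000, 0, 2, 10_000, (5, 7)))  # 3650000.0
print(legacy_namespace_unique_bytes(0, 8_000_000, 0, 2, 10_000, (6, 0)))  # 3610000.0
```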
Storing entries
UDA writes an entry to the store file every hour, on the hour.
You can provide a custom store file path with --store-file or -f. Otherwise, entries are written to the default path /var/log/aerospike/uda.store.
Entries are also printed to stderr using the logger.
Entries
There are two types of data entries, “info” and “error”, determined by the “level” field.
Each entry type has the same fields.
An “info” entry has all fields filled in except “errors”.
An “error” entry has a non-empty “errors” list. Depending on where in the logging process the error occurred, some or all of its other fields may not contain accurate values.
Each request to retrieve entries returns zero or more JSON objects of the following form:
{
  "level": "info",
  "cluster_name": "null",
  "cluster_generation": 1,
  "cluster_stable": true,
  "node_count": 5,
  "hours_since_start": 15,
  "time": "2021-06-19T15:51:00.020869089-07:00",
  "master_objects": 58754,
  "unique_data_bytes": 927463,
  "namespaces": {
    <ns1>: {
      "master_objects": 13507,
      "unique_data_bytes": 453211
    },
    . . .
  },
  "errors": []
}
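A client can separate “info” entries from “error” entries and pull out the per-namespace figures, as in this Python sketch. The sample copies values from the entry above, with the placeholder <ns1> replaced by a hypothetical namespace named test and the elided namespaces omitted. The function summarize_entry is hypothetical. The time field is kept as a string because it carries nanosecond precision, which Python's datetime cannot represent.

```python
import json

# Values taken from the example entry; "test" stands in for <ns1>.
SAMPLE = '''{
  "level": "info", "cluster_name": "null", "cluster_generation": 1,
  "cluster_stable": true, "node_count": 5, "hours_since_start": 15,
  "time": "2021-06-19T15:51:00.020869089-07:00",
  "master_objects": 58754, "unique_data_bytes": 927463,
  "namespaces": {"test": {"master_objects": 13507, "unique_data_bytes": 453211}},
  "errors": []
}'''


def summarize_entry(raw: str) -> dict:
    """Classify one entry and extract its unique data figures."""
    entry = json.loads(raw)
    if entry.get("level") == "error" or entry.get("errors"):
        # Other fields of an error entry may be inaccurate, so skip them.
        return {"ok": False, "errors": entry.get("errors", [])}
    per_namespace = {
        name: ns["unique_data_bytes"]
        for name, ns in entry.get("namespaces", {}).items()
    }
    return {
        "ok": True,
        "time": entry["time"],
        "unique_data_bytes": entry["unique_data_bytes"],
        "per_namespace": per_namespace,
    }


print(summarize_entry(SAMPLE)["per_namespace"])  # {'test': 453211}
```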
API usage examples
This section includes examples for common operations you can perform with UDA, such as returning specific entries, checking the health of the agent, and confirming that you can connect to it.
Get all entries
Endpoint
v1/entries{?key=key&val=value}
Get all entries.
Optionally, use the key and val parameters to filter entries by a key-value pair, for example to return only entries where “level” is “error” or “cluster_name” is “null”.
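Here is a minimal Python client sketch for this endpoint using only the standard library. BASE_URL is a hypothetical agent address. Port 8080 is the default --agent-port, and the /v1 prefix follows the Swagger URL shown during installation. Both function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Hypothetical agent address; replace with your UDA host.
BASE_URL = "http://localhost:8080/v1"


def entries_url(key: Optional[str] = None, val: Optional[str] = None) -> str:
    """Build the v1/entries URL, adding the key/val filter when both are given."""
    url = f"{BASE_URL}/entries"
    if key is not None and val is not None:
        url += "?" + urllib.parse.urlencode({"key": key, "val": val})
    return url


def get_entries(key: Optional[str] = None, val: Optional[str] = None) -> list:
    """Fetch entries from a running agent and return the "entries" list."""
    with urllib.request.urlopen(entries_url(key, val), timeout=10) as resp:
        return json.load(resp)["entries"]
```

With an agent running, get_entries("level", "error") would request only error entries.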
Response:
{
  "entries": [
    {
      "cluster_name": "null",
      "cluster_generation": 1,
      "node_count": 1,
      "hours_since_start": 0,
      "time": "2021-06-19T15:51:00.020869089-07:00",
      "level": "info",
      "master_objects": 0,
      "unique_data_bytes": 0,
      "errors": []
    },
    . . .
  ]
}
Filter entries by index
Endpoint
v1/entries/range/index{?start=start-index&end=end-index&key=key&val=value}
Filters entries by their index. The first entry has index 0, and the last entry has index (number of entries - 1).
Both start and end are optional and default to the first and last entry, respectively.
Entries are always sorted by insertion order, so a lower index is earlier in time than a higher index. You can also apply key-value filtering after index filtering.
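This Python sketch builds index-range URLs and sends only the parameters you set, so omitted bounds fall back to the agent's defaults. BASE_URL and index_range_url are hypothetical names. The sketch does not assume whether end is inclusive.

```python
import urllib.parse
from typing import Optional

# Hypothetical agent address; 8080 is the default --agent-port.
BASE_URL = "http://localhost:8080/v1"


def index_range_url(start: Optional[int] = None, end: Optional[int] = None,
                    key: Optional[str] = None, val: Optional[str] = None) -> str:
    """Build a v1/entries/range/index URL with optional bounds and filter."""
    params = {}
    if start is not None:
        params["start"] = start
    if end is not None:
        params["end"] = end
    # Key-value filtering is applied by the agent after index filtering.
    if key is not None and val is not None:
        params["key"] = key
        params["val"] = val
    url = f"{BASE_URL}/entries/range/index"
    query = urllib.parse.urlencode(params)
    return f"{url}?{query}" if query else url
```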
Response:
{
  "entries": [
    {
      "cluster_name": "null",
      "cluster_generation": 1,
      "node_count": 1,
      "hours_since_start": 0,
      "time": "2021-06-19T15:51:00.020869089-07:00",
      "level": "info",
      "master_objects": 0,
      "unique_data_bytes": 0,
      "errors": []
    },
    . . .
  ]
}
Filter entries by date and/or time
Endpoint
v1/entries/range/time{?start=start-time&end=end-time&key=key&val=value}
Values for start-time and end-time should use the RFC 3339 format (a profile of the ISO 8601 extended format), for example 2021-06-19T15:51:00-07:00.
Both start and end are optional and default to the first and last entry, respectively.
Entries are always sorted by insertion order, so the response list is in temporal order.
You can also apply key-value filtering after date and time filtering.
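In Python, calling isoformat() on a timezone-aware datetime produces an RFC 3339 timestamp. The value must be URL-encoded because it contains ":" and possibly "+". The sketch below rejects naive datetimes, since RFC 3339 requires an offset. BASE_URL and time_range_url are hypothetical names.

```python
import urllib.parse
from datetime import datetime
from typing import Optional

# Hypothetical agent address; 8080 is the default --agent-port.
BASE_URL = "http://localhost:8080/v1"


def time_range_url(start: Optional[datetime] = None,
                   end: Optional[datetime] = None) -> str:
    """Build a v1/entries/range/time URL with RFC 3339 bounds."""
    params = {}
    for name, value in (("start", start), ("end", end)):
        if value is None:
            continue  # let the agent use its default bound
        if value.tzinfo is None:
            raise ValueError(f"{name} must be timezone-aware for RFC 3339")
        # e.g. 2021-06-19T15:00:00-07:00
        params[name] = value.isoformat()
    url = f"{BASE_URL}/entries/range/time"
    query = urllib.parse.urlencode(params)  # encodes ":" and "+"
    return f"{url}?{query}" if query else url
```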
Response:
{
  "entries": [
    {
      "cluster_name": "null",
      "cluster_generation": 1,
      "node_count": 1,
      "hours_since_start": 0,
      "time": "2021-06-19T15:51:00.020869089-07:00",
      "level": "info",
      "master_objects": 0,
      "unique_data_bytes": 0,
      "errors": []
    },
    . . .
  ]
}
Health check
Endpoint
v1/health
A health check endpoint that returns metrics related to the health of the service.
Response:
{
  "health": {
    "total_missed_since_start": 1234,
    "recent_missed_since_start": 0,
    "hours_since_start": 1484
  }
}
Ping
Endpoint
v1/ping
An endpoint to check if a connection can be established with the service.
Response:
string("ping")
Integration with Aerospike Admin
Aerospike Admin (asadm) can connect to UDA when creating a collectinfo archive and when running the summary command.
Connecting to UDA lets asadm display the minimum, maximum, and average unique data usage.
Use the --agent-host and --agent-port flags with the asadm summary and collectinfo commands to specify how to connect to UDA.
You can also use the --agent-raw-store flag to display namespace-level license usage and include the uda.store file in the collectinfo archive. Use the --agent-unstable flag to include entries where the cluster was reportedly unstable in the summary aggregation.
Admin> summary --agent-host <agent-host> --agent-port <agent-port>
~~~~~~~~~~~~~~~~~~~~~~~~~~Cluster Summary~~~~~~~~~~~~~~~~~~~~~~~~~~
Migrations                |False
Server Version            |E-6.0.0.1
OS Version                |Ubuntu 20.04.3 LTS (5.4.0-121-generic)
Cluster Size              |1
Devices Total             |0
Devices Per-Node          |0
Devices Equal Across Nodes|True
Memory Total              |8.000 GB
Memory Used               |8.812 MB
Memory Used %             |0.11
Memory Avail              |7.991 GB
Memory Avail%             |99.89
License Usage Latest      |0.000 B
License Usage Latest Time |2022-07-13T15:00:00-07:00
License Usage Min         |0.000 B
License Usage Max         |11.711 MB
License Usage Avg         |4.888 MB
Active                    |0
Total                     |2
Active Features           |SIndex
Number of rows: 20
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Namespace Summary~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Namespace|~~~~Drives~~~~|~~~~~~~Memory~~~~~~~|Replication| Master|~~~~~~~~~~~~~~~~~~~~~~~~~License Usage~~~~~~~~~~~~~~~~~~~~~~~~~
         |Total|Per-Node|   Total|Used|Avail%|    Factors|Objects|  Latest|              Latest Time|     Min|     Max|       Avg
         |     |        |        |   %|      |           |       |        |                         |        |        |
bar      |    0|       0|4.000 GB| 0.0| 100.0|          2|0.000  |0.000 B |2022-07-13T15:00:00-07:00|0.000 B |6.104 MB|918.963 KB
test     |    0|       0|4.000 GB|0.22| 99.78|          2|0.000  |0.000 B |2022-07-13T15:00:00-07:00|0.000 B |5.607 MB|  3.990 MB
Number of rows: 2
The Latest metric shows the most recent data usage measurement reported by the agent. Because the agent records an entry once per hour, the Latest value may be up to an hour old.
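The Python sketch below approximates the minimum, maximum, and average figures that asadm reports, using entries fetched from the agent. It is not asadm's code. Ignoring “error” entries, and treating cluster_stable as the field that the --agent-unstable flag relates to, are assumptions. The function name usage_stats is hypothetical.

```python
from statistics import mean
from typing import Iterable, Optional


def usage_stats(entries: Iterable[dict],
                include_unstable: bool = False) -> Optional[dict]:
    """Min, max, and average unique_data_bytes over "info" entries.

    Entries with cluster_stable == False are skipped unless
    include_unstable is True (loosely mirroring --agent-unstable).
    """
    values = [
        e["unique_data_bytes"]
        for e in entries
        if e.get("level") == "info"
        and (include_unstable or e.get("cluster_stable", True))
    ]
    if not values:
        return None  # nothing to aggregate
    return {"min": min(values), "max": max(values), "avg": mean(values)}
```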