Aerospike Unique Data Agent (UDA)

This page describes the Aerospike Unique Data Agent (UDA), explains how to install and configure it and how it calculates unique data usage, and provides API usage examples.

Overview

A running Aerospike cluster contains data from client applications and internal data used by Aerospike processes. Many clusters also use replication to store multiple copies of application data for improved resiliency. This makes it difficult to manually determine how much cluster data is unique.

UDA monitors your Aerospike cluster and tracks its unique data usage. UDA polls the cluster for memory and disk usage statistics that are relevant to your license agreement and stores those statistics for later processing. UDA integrates natively with asadm and has its own REST API that you can use with a custom client.

Command-line help
> uda --help
An agent for monitoring and querying the unique data usage of your
Aerospike database. Start the agent with the 'start' command. The agent
when started puts entries in a store file. The agent listens on the --agent-port
for requests to query and filter unique data entries. For convenience, there is
an additional 'log' command to query and filter the entries file
created by the agent offline. Additionally, the 'data' command allows you to display your
current unique data usage.

Usage:
uda [command]

Available Commands:
data Gets the current data point. Does not log an entry.
help Help about any command
log For offline processing of the uda.store file.
start Start the agent.

Flags:
--config string Config file (default is /etc/aerospike/astools.conf)
-u, --help Display help information
--instance string For support of the aerospike tools toml schema. Sections with the
instance are read. For example, if instance `a` is specified, then
sections `cluster_a` and `uda_a` are read.
-v, --version version for uda

Use "uda [command] --help" for more information about a command.

Install UDA

UDA is included in Aerospike Tools package versions 7.1.1 and later, which are bundled with Aerospike Database 6.0.0.5 and later.

To install and test UDA, follow these steps:

  1. Edit the TOML configuration file located at /etc/aerospike/astools.conf to connect to the Aerospike Database.

    Add the cluster-specific options as shown in the following example:

     #------------------------------------------------------------------------------
    # Cluster-specific options
    #
    # This section has connection / security / tls specific configuration options.
    # Optionally many different named connection instances can be specified.
    #------------------------------------------------------------------------------
    #

    [cluster]
    host = "localhost:cluster_a:3000" # host = "<host>[:<tls-name>][:<port>][,...]"
    user = "<username>"
    password = "<password>"

    If the cluster has TLS enabled, edit the TLS-based connection section as shown in the following example. Uncomment the lines that apply to your deployment.

    #------------------------------------------------------------------------------
    # Transport Level Encryption
    #------------------------------------------------------------------------------
    #
    #
    # tls-enable = true # true enables tls, if false all other tls
    # config are ignored
    # tls-protocols = "TLSv1.2"
    # tls-cipher-suite = "ALL:!COMPLEMENTOFDEFAULT:!eNULL"
    # tls-crl-check = true
    # tls-crl-check-all = true

    # tls-keyfile = "/etc/aerospike/x509_certificates/MultiServer/key.pem"

    # tls-keyfile-password required if tls-keyfile is password protected
    # It can be in one of the following three formats
    # Environment variable: "env:<VAR>"
    # tls-keyfile-password = "env:PEMPWD"
    # File: "file:<PATH>"
    # tls-keyfile-password = "file:/etc/aerospike/x509_certificates/MultiServer/keypwd.txt"
    # String: "<PASSWORD>"
    # tls-keyfile-password = ""

    # One of the tls-cafile or tls-capath is required. If both are specified, everything is loaded.
    #
    # tls-cafile = "/etc/aerospike/x509_certificates/Platinum/cacert.pem"
    # tls-capath = "/etc/aerospike/x509_certificates/Platinum"
    #
    # tls-certfile = "/etc/aerospike/x509_certificates/multi_chain.pem"
    # tls-cert-blacklist = "/etc/aerospike/x509_certificates/blacklist.txt"

    Refer to Configuration and Aerospike Tools Configuration for more details.

  2. Enable the agent.

    • For systems that support systemd:

      systemctl enable uda.service

    • For other systems, such as Docker:

      ./usr/bin/uda --config ./etc/aerospike/astools.conf

  3. Verify that the agent started successfully.

    journalctl -u uda.service -f

    or

    systemctl status uda.service

  4. To test the API and see API documentation, run the following command and navigate to the provided URL:

    journalctl -u uda.service | grep API
    Jun 29 14:09:05 ubuntu unique-data[1116531]: time="2021-06-29T14:09:05-07:00" level=info msg="API documentation can be found at http://192.168.1.1:8080/v1/swagger/index.html"

Configuration

Run uda start --help for a list of parameters that may be required to run UDA within your cluster. You can configure UDA using Aerospike Tools configuration files. See Aerospike Tools Configuration for more details.

Rather than saving every parameter value as plain text in the configuration file, you can supply values through separate files, environment variables, or base64-encoded strings.

  • To provide values in a file, create a file with a single line containing the value for the configuration parameter of your choice. In the configuration file, in place of the value, use file:<filename> where <filename> is the name of the file you just created.
  • To provide an environment variable, use env:<variable-name>.
  • To provide a base64-encoded environment variable, use env-b64:<variable-name>.
  • To provide a base64-encoded value, use b64:<base64-encoded-value>.
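
The prefixes above can be illustrated with a small resolver. This is a minimal sketch of the lookup rules as described on this page, not UDA's actual implementation; the function name resolve_value is hypothetical, and the sketch assumes values are UTF-8 text.

```python
import base64
import os


def resolve_value(raw: str) -> str:
    """Resolve a config value that may use a file:, env:, env-b64:, or b64: prefix.

    Hypothetical helper for illustration only; UDA performs this resolution itself.
    """
    if raw.startswith("env-b64:"):
        # Environment variable whose content is base64-encoded.
        encoded = os.environ[raw[len("env-b64:"):]]
        return base64.b64decode(encoded).decode("utf-8")
    if raw.startswith("env:"):
        # Plain environment variable.
        return os.environ[raw[len("env:"):]]
    if raw.startswith("b64:"):
        # Inline base64-encoded value.
        return base64.b64decode(raw[len("b64:"):]).decode("utf-8")
    if raw.startswith("file:"):
        # File containing a single line with the value.
        with open(raw[len("file:"):], encoding="utf-8") as fh:
            return fh.readline().rstrip("\r\n")
    # No recognized prefix: treat the string as the plain-text value.
    return raw
```

For example, with an environment variable PEMPWD set, resolve_value("env:PEMPWD") returns its content, while a string without a recognized prefix is returned unchanged.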

Command-line help for "start" function
> uda start --help
Starts the agent, stores entries in the provided --store-file
(default: /var/log/aerospike/uda.store), and listens on --agent-port (default: 8080) for
requests to query entries.

Usage:
uda start [flags]

Flags:
-a, --agent-port int Port number for agent to listen on. (default 8080)
--auth INTERNAL,EXTERNAL,PKI The authentication mode used by the server.
INTERNAL uses standard username/password.
EXTERNAL uses external methods, like LDAP, which are configured on the server.
EXTERNAL requires TLS.
PKI allows TLS authentication and authorization based on a certificate, without needing to configure the username. (default INTERNAL)
-h, --host host[:tls-name][:port][,...] The Aerospike host. (default 127.0.0.1)
-P, --password "env-b64:<env-var>,b64:<b64-pass>,file:<pass-file>,<clear-pass>" The Aerospike password to use to connect to the Aerospike cluster.
-p, --port int The default Aerospike port. (default 3000)
-f, --store-file string Specify custom log file. (default "/var/log/aerospike/uda.store")
--tls-cafile env-b64:<cert>,b64:<cert>,<cert-file-name> The CA for the agent.
--tls-certfile env-b64:<cert>,b64:<cert>,<cert-file-name> The certificate file of the agent for mutual TLS authentication.
--tls-enable Enable TLS authentication. If false, other TLS options are ignored.
--tls-keyfile env-b64:<cert>,b64:<cert>,<cert-file-name> The key file of the agent for mutual TLS authentication.
--tls-keyfile-password "env-b64:<env-var>,b64:<b64-pass>,file:<pass-file>,<clear-pass>" The password used to decrypt the key-file if encrypted.
--tls-name string The server TLS context to use to authenticate the connection.
--tls-protocols "[[+][-]all] [[+][-]TLSv1] [[+][-]TLSv1.1] [[+][-]TLSv1.2]" Set the TLS protocol selection criteria. This format is the same
as Apache's SSLProtocol documented at
https://httpd.apache.org/docs/current/mod/mod_ssl.html#sslprotocol. (default TLSV1.2)
-U, --user string The aerospike user to use to connect to the aerospike cluster.

Global Flags:
--config-file string Config file (default is /etc/aerospike/astools.conf)
-u, --help Display help information
--instance string For support of the aerospike tools toml schema. Sections with the
instance are read. For example, if instance `a` is specified, then
sections `cluster_a` and `uda_a` are read.

Unique data calculation

An Aerospike cluster's unique data is the uncompressed data stored in each namespace, divided by its replication factor, excluding index metadata. UDA calculates unique data across each namespace in the cluster, iterating through the namespaces until it arrives at a grand total for the entire cluster.

UDA uses the following calculation:

Total unique usage bytes per namespace = (Uncompressed bytes in the namespace ÷ Replication factor) - (39 metadata bytes per object × Number of master objects across all nodes)

Aerospike Database versions 6.4 and earlier counted unique data with a more complex method.

Total unique usage bytes per namespace = ((Namespace memory bytes OR Uncompressed namespace device bytes OR Uncompressed namespace PMem bytes) ÷ Replication Factor) - (Metadata bytes per object × Number of master objects across all nodes)

Metadata bytes per object = 35 bytes for Database 5.7 and earlier, 39 bytes for Database 6.0 and later.
Only one of namespace memory bytes, uncompressed namespace device bytes, and uncompressed namespace PMem bytes should be greater than zero.
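
As a worked illustration of the current formula, the following sketch computes per-namespace and cluster totals. The function names and the input numbers are made up for the example; only the formula and the 39-byte (or 35-byte) metadata constant come from this page.

```python
# 39 bytes for Database 6.0 and later; 35 bytes for Database 5.7 and earlier.
METADATA_BYTES_PER_OBJECT = 39


def unique_bytes_for_namespace(uncompressed_bytes, replication_factor,
                               master_objects,
                               metadata_bytes=METADATA_BYTES_PER_OBJECT):
    """(Uncompressed bytes / replication factor) - (metadata bytes x master objects)."""
    return uncompressed_bytes / replication_factor - metadata_bytes * master_objects


def unique_bytes_for_cluster(namespaces):
    """Sum the per-namespace results to get the cluster-wide total."""
    return sum(unique_bytes_for_namespace(**ns) for ns in namespaces)


# Hypothetical input values, chosen only to make the arithmetic easy to follow.
example_namespaces = [
    {"uncompressed_bytes": 2_000_000, "replication_factor": 2, "master_objects": 10_000},
    {"uncompressed_bytes": 500_000, "replication_factor": 1, "master_objects": 1_000},
]

# First namespace:  2,000,000 / 2 - 39 * 10,000 = 610,000
# Second namespace:   500,000 / 1 - 39 *  1,000 = 461,000
total = unique_bytes_for_cluster(example_namespaces)  # 1,071,000
```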

Storing entries

UDA writes an entry to a store file every hour, on the hour. You can provide a custom path with --store-file or -f; otherwise, entries are written to the default path /var/log/aerospike/uda.store. Entries are also printed to stderr using the logger.

Entries

There are two types of data entries, "info" and "error", determined by the "level" field.
Both entry types have the same fields.
An "info" entry has every field populated except "errors", which is empty. An "error" entry has a non-empty "errors" list and, depending on where in the logging process the error occurred, may not have accurate values for some or all of the other fields. Each request to retrieve data entries returns zero or more JSON objects of the following form:

{
"level":"info",
"cluster_name":"null",
"cluster_generation":1,
"cluster_stable": true,
"node_count":5,
"hours_since_start":15,
"time":"2021-06-19T15:51:00.020869089-07:00",
"master_objects":58754,
"unique_data_bytes":927463,
"namespaces": {
"<ns1>": {
"master_objects":13507,
"unique_data_bytes":453211
},
. . .
},
"errors":[]
}
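
Because an "error" entry may carry inaccurate values, a client will usually want to separate the two entry types before using the numbers. The sketch below shows one way to do that with the fields listed above; the helper names and the sample error message are hypothetical.

```python
import json


def is_error_entry(entry: dict) -> bool:
    """An entry is an error if its level says so or its "errors" list is non-empty."""
    return entry.get("level") == "error" or len(entry.get("errors") or []) > 0


def usable_unique_bytes(entry: dict):
    """Return unique_data_bytes for an info entry, or None for an error entry."""
    if is_error_entry(entry):
        return None
    return entry["unique_data_bytes"]


# Hypothetical entries, trimmed to the fields this sketch reads.
info_entry = json.loads(
    '{"level": "info", "unique_data_bytes": 927463, "errors": []}'
)
error_entry = json.loads(
    '{"level": "error", "unique_data_bytes": 0, "errors": ["hypothetical error message"]}'
)
```

With these inputs, usable_unique_bytes(info_entry) returns 927463 and usable_unique_bytes(error_entry) returns None.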

API usage examples

This section includes examples for common operations you can perform with UDA, such as returning specific entries or checking the connection between UDA and your database.

Get all entries

Endpoint v1/entries{?key=key&val=value}

Returns all entries. The optional key and val parameters filter the result by key-value pair, for example to return only entries with "level"="error" or "cluster_name"="null".

Response:

{
"entries":
[
{
"cluster_name":"null",
"cluster_generation":1,
"node_count":1,
"hours_since_start":0,
"time":"2021-06-19T15:51:00.020869089-07:00",
"level":"info",
"master_objects":0,
"unique_data_bytes":0,
"errors":[]
},
. . .
]
}
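
A custom client can call this endpoint with any HTTP library. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and the base URL (for example http://localhost:8080, the default agent port) is an assumption about where your agent is listening.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def entries_url(base_url, key=None, val=None):
    """Build the v1/entries URL, adding key/val only when both are given."""
    url = base_url.rstrip("/") + "/v1/entries"
    if key is not None and val is not None:
        url += "?" + urlencode({"key": key, "val": val})
    return url


def fetch_entries(base_url, key=None, val=None, timeout=5.0):
    """Request the entries and return the list under the "entries" field."""
    with urlopen(entries_url(base_url, key, val), timeout=timeout) as resp:
        return json.load(resp)["entries"]
```

For example, fetch_entries("http://localhost:8080", key="level", val="error") requests only the error entries from an agent running locally.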

Filter entries by index

Endpoint v1/entries/range/index{?start=start-index&end=end-index&key=key&val=value}

Filters entries by their index. The first entry has index 0 and the last entry has index (number of entries - 1). Both start and end are optional and default to the first and last entry, respectively. Entries are always sorted by insertion order, so an entry with a lower index is earlier in time than an entry with a higher index. Additionally, key-value filtering can be applied after index filtering.

Response:

{
"entries":
[
{
"cluster_name":"null",
"cluster_generation":1,
"node_count":1,
"hours_since_start":0,
"time":"2021-06-19T15:51:00.020869089-07:00",
"level":"info",
"master_objects":0,
"unique_data_bytes":0,
"errors":[]
},
. . .
]
}

Filter entries by date and/or time

Endpoint v1/entries/range/time{?start=start-time&end=end-time&key=key&val=value}

Values for start-time and end-time should be in RFC 3339 format, a profile of the ISO 8601 extended format (for example, 2021-06-19T15:51:00-07:00). Both start and end are optional and default to the first and last entry, respectively. Entries are always sorted by insertion order, so the response list is in temporal order. Additionally, key-value filtering can be applied after date and time filtering.

Response:

{
"entries":
[
{
"cluster_name":"null",
"cluster_generation":1,
"node_count":1,
"hours_since_start":0,
"time":"2021-06-19T15:51:00.020869089-07:00",
"level":"info",
"master_objects":0,
"unique_data_bytes":0,
"errors":[]
},
. . .
]
}
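
Timestamps with a UTC offset contain characters such as ":" and "+" that need to be percent-encoded in a query string. The sketch below builds a time-range URL from timezone-aware Python datetime values, whose isoformat() output is RFC 3339 compatible. The function name and base URL are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode


def time_range_url(base_url, start=None, end=None):
    """Build the v1/entries/range/time URL from optional timezone-aware datetimes."""
    params = {}
    for name, value in (("start", start), ("end", end)):
        if value is None:
            continue  # Omitted bounds default to the first or last entry.
        if value.tzinfo is None:
            raise ValueError(f"{name} must be timezone-aware to form an RFC 3339 timestamp")
        params[name] = value.isoformat()
    url = base_url.rstrip("/") + "/v1/entries/range/time"
    if params:
        url += "?" + urlencode(params)  # Percent-encodes ':' and '+'.
    return url


# Example: all entries from 15:00 on 2021-06-19 (UTC-07:00) onward.
pdt = timezone(timedelta(hours=-7))
example_url = time_range_url("http://localhost:8080",
                             start=datetime(2021, 6, 19, 15, 0, 0, tzinfo=pdt))
```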

Health check

Endpoint v1/health

A health check endpoint that returns metrics related to the health of the service.

Response:

{
"health": {
"total_missed_since_start": 1234,
"recent_missed_since_start": 0,
"hours_since_start":1484
}
}

Ping

Endpoint v1/ping

An endpoint to check if a connection can be established with the service.

Response:

string("ping")

Integration with Aerospike Admin

Aerospike Admin (asadm) can connect to UDA when creating a collectinfo archive and when running the summary command. Connecting to UDA gives asadm the ability to display statistics for min, max, and average unique data usage. Use the --agent-host and --agent-port flags with the asadm summary and collectinfo commands to specify how to connect to UDA.

You can also use the --agent-raw-store flag to display namespace-level license usage, include entries where the cluster was reportedly unstable in the summary aggregation using the --agent-unstable flag, and include the uda.store in the collectinfo archive.

Admin> summary --agent-host <agent-host> --agent-port <agent-port>
~~~~~~~~~~~~~~~~~~~~~~~~~~Cluster Summary~~~~~~~~~~~~~~~~~~~~~~~~~~
Migrations |False
Server Version |E-6.0.0.1
OS Version |Ubuntu 20.04.3 LTS (5.4.0-121-generic)
Cluster Size |1
Devices Total |0
Devices Per-Node |0
Devices Equal Across Nodes|True
Memory Total |8.000 GB
Memory Used |8.812 MB
Memory Used % |0.11
Memory Avail |7.991 GB
Memory Avail% |99.89
License Usage Latest |0.000 B
License Usage Latest Time |2022-07-13T15:00:00-07:00
License Usage Min |0.000 B
License Usage Max |11.711 MB
License Usage Avg |4.888 MB
Active |0
Total |2
Active Features |SIndex
Number of rows: 20

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Namespace Summary~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Namespace|~~~~Drives~~~~|~~~~~~~Memory~~~~~~~|Replication| Master|~~~~~~~~~~~~~~~~~~~~~~~~~License Usage~~~~~~~~~~~~~~~~~~~~~~~~~
|Total|Per-Node| Total|Used|Avail%| Factors|Objects| Latest| Latest Time| Min| Max| Avg
| | | | %| | | | | | | |
bar | 0| 0|4.000 GB| 0.0| 100.0| 2|0.000 |0.000 B |2022-07-13T15:00:00-07:00|0.000 B |6.104 MB|918.963 KB
test | 0| 0|4.000 GB|0.22| 99.78| 2|0.000 |0.000 B |2022-07-13T15:00:00-07:00|0.000 B |5.607 MB| 3.990 MB
Number of rows: 2

The Latest metric displayed shows the last measured data usage as reported by the agent. This means the Latest value may be up to an hour old.
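
To make the Latest, Min, Max, and Avg columns concrete, the sketch below derives the same kind of figures from a list of entries. It is an illustration under stated assumptions, not asadm's actual implementation: it assumes that only "info" entries are aggregated, that entries with "cluster_stable" set to false are skipped unless explicitly included (mirroring the purpose of --agent-unstable), and that the average is a plain arithmetic mean. The function name is hypothetical.

```python
def summarize_usage(entries, include_unstable=False):
    """Return latest/min/max/avg of unique_data_bytes, or None if nothing qualifies."""
    values = [
        e["unique_data_bytes"]
        for e in entries
        if e.get("level") == "info"
        and (include_unstable or e.get("cluster_stable", True))
    ]
    if not values:
        return None
    return {
        "latest": values[-1],  # Entries are sorted by insertion order.
        "min": min(values),
        "max": max(values),
        "avg": sum(values) / len(values),
    }


# Hypothetical entries, trimmed to the fields this sketch reads.
sample_entries = [
    {"level": "info", "cluster_stable": True, "unique_data_bytes": 0},
    {"level": "info", "cluster_stable": True, "unique_data_bytes": 10},
    {"level": "info", "cluster_stable": False, "unique_data_bytes": 20},
    {"level": "error", "cluster_stable": True, "unique_data_bytes": 0},
]
```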