# Aerospike Unique Data Agent (UDA)

This page describes the Aerospike Unique Data Agent (UDA), explains how to install and configure it and how it calculates unique data usage, and provides API usage examples.

::: danger
UDA uses an obsolete calculation and has been removed from Aerospike tools 12.0.0. You can use the [Aerospike Grafana dashboards](https://grafana.com/orgs/aerospike/dashboards) if you have Prometheus installed, or the [Aerospike monitoring stack](https://aerospike.com/docs/database/observe/monitor/components/) to track unique data usage by your clusters.
:::

## Overview

A running Aerospike cluster contains data from client applications and internal data used by Aerospike processes. Many clusters also use replication to store multiple copies of application data for improved resiliency. This makes it difficult to manually determine how much cluster data is unique.

UDA monitors your Aerospike cluster and tracks its unique data usage. UDA polls the cluster for memory and disk usage statistics that are relevant to your license agreement and stores those statistics for later processing. UDA integrates natively with [asadm](#integration-with-aerospike-admin) and has its own [REST API](#api-usage-examples) that you can use with a custom client.

Command-line help

```text
> uda --help

An agent for monitoring and querying the unique data usage of your
Aerospike database. Start the agent with the 'start' command. The agent
when started puts entries in a store file. The agent listens on the --agent-port
for requests to query and filter unique data entries.  For convenience, there is
an additional 'log' command to query and filter the entries file
created by the agent offline. Additionally, the 'data' command allows you to display your
current unique data usage.

Usage:
  uda [command]

Available Commands:
  data        Gets the current data point. Does not log an entry.
  help        Help about any command
  log         For offline processing of the uda.store file.
  start       Start the agent.

Flags:
      --config string     Config file (default is /etc/aerospike/astools.conf)
  -u, --help              Display help information
      --instance string   For support of the aerospike tools toml schema. Sections with the
                          instance are read. For example, if instance `a` is specified, then
                          sections `cluster_a` and `uda_a` are read.
  -v, --version           version for uda

Use "uda [command] --help" for more information about a command.
```

## Install UDA

UDA is available in Aerospike tools packages 7.1.1 to 11.2.2, which are bundled with Aerospike Database 6.0.0.5 and later.

To install and test UDA, follow these steps:

1.  Edit the TOML configuration file located at `/etc/aerospike/astools.conf` to connect to the Aerospike Database.
    
    Add the cluster-specific options as shown in the following example:
    
    ```text
    #------------------------------------------------------------------------------
    # Cluster-specific options
    #
    # This section has connection / security / TLS specific configuration options.
    # Optionally many different named connection instances can be specified.
    #------------------------------------------------------------------------------
    #
    [cluster]
    host = "localhost:cluster_a:3000"                     # host = "<host>[:<tls-name>][:<port>][,...]"
    user = "<username>"
    password = "<password>"
    ```
    
    If the cluster has TLS enabled, edit the TLS-based connection section as shown in the following example. Uncomment the lines that apply to your deployment.
    
    ```text
    #------------------------------------------------------------------------------
    # Transport Level Encryption
    #------------------------------------------------------------------------------
    #
    #
    # tls-enable = true                  # true enables tls, if false all other tls
                                        # config are ignored
    # tls-protocols = "TLSv1.2"
    # tls-cipher-suite = "ALL:!COMPLEMENTOFDEFAULT:!eNULL"
    # tls-crl-check = true
    # tls-crl-check-all = true
    # tls-keyfile = "/etc/aerospike/x509_certificates/MultiServer/key.pem"
    # tls-keyfile-password required if tls-keyfile is password protected
    # It can be one of the following three formats
    # Environment variable: "env:<VAR>"
    # tls-keyfile-password = "env:PEMPWD"
    # File: "file:<PATH>"
    # tls-keyfile-password = "file:/etc/aerospike/x509_certificates/MultiServer/keypwd.txt"
    # String: "<PASSWORD>"
    # tls-keyfile-password = ""
    # One of the tls-cafile or tls-capath is required. If both are specified, everything is loaded.
    #
    # tls-cafile = "/etc/aerospike/x509_certificates/Platinum/cacert.pem"
    # tls-capath = "/etc/aerospike/x509_certificates/Platinum"
    #
    # tls-certfile = "/etc/aerospike/x509_certificates/multi_chain.pem"
    # tls-cert-blacklist = "/etc/aerospike/x509_certificates/blacklist.txt"
    ```
    
    Refer to [Configuration](#configuration) and [Aerospike Tools Configuration](https://aerospike.com/docs/database/tools/conffile) for more details.
    
2.  Enable the agent.
    
    -   For systems that support systemd:
        
        ```bash
        systemctl enable uda.service
        ```
        
    -   For other systems, such as Docker:
        
        ```bash
        ./usr/bin/uda start --config ./etc/aerospike/astools.conf
        ```
        
3.  Verify that the agent started successfully.
    
    ```bash
    journalctl -u uda.service -f
    ```
    
    or
    
    ```bash
    systemctl status uda.service
    ```
    
4.  To test the API and see API documentation, run the following command and navigate to the provided URL:
    
    ```bash
    journalctl -u uda.service | grep API
    
    Jun 29 14:09:05 ubuntu unique-data[1116531]: time="2021-06-29T14:09:05-07:00" level=info msg="API documentation can be found at http://192.168.1.1:8080/v1/swagger/index.html"
    ```
    

## Configuration

Run `uda start --help` for a list of parameters that may be required to run UDA within your cluster. You can configure UDA using Aerospike Tools configuration files. See [Aerospike Tools Configuration](https://aerospike.com/docs/database/tools/conffile) for more details.

Rather than storing every parameter value as plain text in the configuration file, you can supply values through separate files, environment variables, or base64-encoded strings.

-   To provide values in a file, create a file with a single line containing the value for the configuration parameter of your choice. In the configuration file, in place of the value, use `file:<filename>` where `<filename>` is the name of the file you just created.
-   To provide an environment variable, use `env:<variable-name>`.
-   To provide a base64-encoded environment variable, use `env-b64:<variable-name>`.
-   To provide a base64-encoded value, use `b64:<base64-encoded-value>`.

Command-line help for the `start` command

```bash
uda start --help

Starts the agent, stores entries in the provided --store-file
(default: /var/log/aerospike/uda.store), and listens on --agent-port (default: 8080) for
requests to query entries.

Usage:
  uda start [flags]

Flags:
  -a, --agent-port int                                                                          Port number for agent to listen on. (default 8080)
      --auth INTERNAL,EXTERNAL,PKI                                                              The authentication mode used by the server.
                                                                                                INTERNAL uses standard username/password.
                                                                                                EXTERNAL uses external methods, like LDAP, which are configured on the server.
                                                                                                EXTERNAL requires TLS.
                                                                                                PKI allows TLS authentication and authorization based on a certificate, without needing to configure the username. (default INTERNAL)
  -h, --host host[:tls-name][:port][,...]                                                       The Aerospike host. (default 127.0.0.1)
  -P, --password "env-b64:<env-var>,b64:<b64-pass>,file:<pass-file>,<clear-pass>"               The Aerospike password to use to connect to the Aerospike cluster.
  -p, --port int                                                                                The default Aerospike port. (default 3000)
  -f, --store-file string                                                                       Specify custom log file. (default "/var/log/aerospike/uda.store")
      --tls-cafile env-b64:<cert>,b64:<cert>,<cert-file-name>                                   The CA for the agent.
      --tls-certfile env-b64:<cert>,b64:<cert>,<cert-file-name>                                 The certificate file of the agent for mutual TLS authentication.
      --tls-enable                                                                              Enable TLS authentication. If false, other TLS options are ignored.
      --tls-keyfile env-b64:<cert>,b64:<cert>,<cert-file-name>                                  The key file of the agent for mutual TLS authentication.
      --tls-keyfile-password "env-b64:<env-var>,b64:<b64-pass>,file:<pass-file>,<clear-pass>"   The password used to decrypt the key-file if encrypted.
      --tls-name string                                                                         The server TLS context to use to authenticate the connection.
      --tls-protocols "[[+][-]all] [[+][-]TLSv1] [[+][-]TLSv1.1] [[+][-]TLSv1.2]"               Set the TLS protocol selection criteria. This format is the same
                                                                                                as Apache's SSLProtocol documented at
                                                                                                https://httpd.apache.org/docs/current/mod/mod_ssl.html#sslprotocol. (default TLSV1.2)
  -U, --user string                                                                             The aerospike user to use to connect to the aerospike cluster.

Global Flags:
      --config-file string   Config file (default is /etc/aerospike/astools.conf)
  -u, --help                 Display help information
      --instance string      For support of the aerospike tools toml schema. Sections with the
                             instance are read. For example, if instance `a` is specified, then
                             sections `cluster_a` and `uda_a` are read.
```

## Unique data calculation

::: caution
UDA’s unique data calculation is incorrect. See the FAQ [How is unique data counted?](https://aerospike.com/docs/database/reference/faq/#how-is-unique-data-counted) for more information.
:::

An Aerospike cluster’s unique data is the uncompressed data stored in each namespace, divided by its replication factor, excluding index metadata. UDA calculates unique data across each namespace in the cluster, iterating through the namespaces until it arrives at a grand total for the entire cluster.

UDA uses the following calculation:

```text
Total unique usage bytes per namespace = (Uncompressed bytes in the namespace ÷ Replication factor) - (39 metadata bytes per object × Number of master objects across all nodes)
```

Aerospike Database versions 6.4.0 and earlier counted unique data with a more complex method.

```text
Total unique usage bytes per namespace = ((Namespace memory bytes OR Uncompressed namespace device bytes OR Uncompressed namespace PMem bytes) ÷ Replication Factor) - (Metadata bytes per object × Number of master objects across all nodes)

Metadata bytes per object = 35 bytes for Database 5.7.0 and earlier, 39 bytes for Database 6.0.0 and later.

Only one of namespace memory bytes, uncompressed namespace device bytes, and uncompressed namespace PMem bytes should be greater than zero.
```
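
As a worked example of the current formula, the sketch below applies it to two namespaces and sums the results into a cluster total. The namespace names and all figures are made up for illustration, and the helper `unique_data_bytes` is a hypothetical name, not part of UDA. Passing `metadata_bytes_per_object=35` would model the older per-object overhead described above.

```python
def unique_data_bytes(uncompressed_bytes: int, replication_factor: int,
                      master_objects: int,
                      metadata_bytes_per_object: int = 39) -> float:
    """Per-namespace unique data: (bytes / RF) - (metadata bytes x master objects)."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return (uncompressed_bytes / replication_factor
            - metadata_bytes_per_object * master_objects)


# Made-up statistics for two namespaces.
namespaces = {
    "ns1": {"uncompressed_bytes": 200_000_000, "replication_factor": 2,
            "master_objects": 1_000_000},
    "ns2": {"uncompressed_bytes": 90_000_000, "replication_factor": 3,
            "master_objects": 500_000},
}

per_namespace = {name: unique_data_bytes(**stats)
                 for name, stats in namespaces.items()}
cluster_total = sum(per_namespace.values())

print(per_namespace)
print(cluster_total)
```

For `ns1`, the arithmetic is 200,000,000 ÷ 2 = 100,000,000 bytes, minus 39 × 1,000,000 = 39,000,000 metadata bytes, which leaves 61,000,000 unique bytes.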

## Storing entries

UDA writes one entry to a store file every hour, on the hour. To use a custom path for the store file, specify `--store-file` or `-f`. Otherwise, entries are written to the default path `/var/log/aerospike/uda.store`. Entries are also printed to stderr using the logger.

## Entries

There are two types of data entries, `info` and `error`, distinguished by the `level` field. Both types have the same fields. An `info` entry has every field populated except `errors`, which is an empty list. An `error` entry has at least one item in its `errors` list, and some or all of its other fields may be inaccurate, depending on where in the logging process the error occurred. Each request to retrieve entries returns zero or more JSON objects of the following form:

```plaintext
{
    "level":"info",
    "cluster_name":"null",
    "cluster_generation":1,
    "cluster_stable": true,
    "node_count":5,
    "hours_since_start":15,
    "time":"2021-06-19T15:51:00.020869089-07:00",
    "master_objects":58754,
    "unique_data_bytes":927463,
    "namespaces": {
        <ns1>: {
            "master_objects":13507,
            "unique_data_bytes":453211
        },
        . . .
    },
    "errors":[]
}
```
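
Because `error` entries can carry inaccurate values, a client will usually want to separate them from `info` entries before aggregating. The following Python sketch does this for two sample entries built from the fields shown above. The sample values, the error text, and the helper name `usable_entries` are all hypothetical.

```python
import json

# Two sample entries: one "info" entry and one "error" entry (values are made up).
RAW_ENTRIES = """
[
  {"level": "info", "cluster_stable": true,
   "time": "2021-06-19T15:00:00-07:00",
   "master_objects": 58754, "unique_data_bytes": 927463,
   "namespaces": {"ns1": {"master_objects": 13507, "unique_data_bytes": 453211}},
   "errors": []},
  {"level": "error", "cluster_stable": false,
   "time": "2021-06-19T16:00:00-07:00",
   "master_objects": 0, "unique_data_bytes": 0,
   "namespaces": {},
   "errors": ["hypothetical error text"]}
]
"""


def usable_entries(entries):
    """Keep entries with level "info" and an empty errors list."""
    return [entry for entry in entries
            if entry.get("level") == "info" and not entry.get("errors")]


entries = json.loads(RAW_ENTRIES)
good = usable_entries(entries)

for entry in good:
    print(entry["time"], entry["unique_data_bytes"])
    for name, stats in entry["namespaces"].items():
        print("  ", name, stats["unique_data_bytes"])
```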

## API usage examples

This section includes examples for common operations you can perform with UDA, such as returning specific entries or checking the connection between UDA and your database.

### Get all entries

_Endpoint:_ `v1/entries{?key=key&val=value}`

Returns all entries. To filter by a key-value pair, pass the optional `key` and `val` parameters. For example, `key=level&val=error` returns only error entries, and `key=cluster_name&val=null` returns only entries whose cluster name is `null`.

_Response:_

```plaintext
{
    "entries":
        [
            {
                "cluster_name":"null",
                "cluster_generation":1,
                "node_count":1,
                "hours_since_start":0,
                "time":"2021-06-19T15:51:00.020869089-07:00",
                "level":"info",
                "master_objects":0,
                "unique_data_bytes":0,
                "errors":[]
            },
            . . .
        ]
}
```
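
A custom client for this endpoint could look like the following Python sketch, which uses only the standard library. It assumes the agent is reachable at `http://localhost:8080` (the default agent port) and that the API is served under `/v1/`, as in the Swagger URL logged at startup. The function names `entries_url` and `fetch_entries` are hypothetical, and `fetch_entries` is defined but not called here because it needs a running agent.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8080"  # Assumption: local agent on the default port.


def entries_url(base_url, key=None, val=None):
    """Build the v1/entries URL, adding the key/val filter only when both are given."""
    url = f"{base_url}/v1/entries"
    if key is not None and val is not None:
        url += "?" + urlencode({"key": key, "val": val})
    return url


def fetch_entries(base_url, key=None, val=None, timeout=5.0):
    """Request the entries and return the list under the "entries" field."""
    with urlopen(entries_url(base_url, key, val), timeout=timeout) as response:
        return json.load(response)["entries"]


print(entries_url(BASE_URL))
print(entries_url(BASE_URL, "level", "error"))
```

Against a running agent, `fetch_entries(BASE_URL, "level", "error")` would return only the error entries.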

### Filter entries by index

_Endpoint:_ `v1/entries/range/index{?start=start-index&end=end-index&key=key&val=value}`

Filters entries by index. The first entry has index `0` and the last entry has index `(number of entries - 1)`. Both `start` and `end` are optional and default to the first and last entry, respectively. Entries are always sorted by insertion order, so an entry with a lower index was recorded earlier than one with a higher index. Key-value filtering with `key` and `val` is applied after index filtering.

_Response:_

```plaintext
{
    "entries":
        [
            {
                "cluster_name":"null",
                "cluster_generation":1,
                "node_count":1,
                "hours_since_start":0,
                "time":"2021-06-19T15:51:00.020869089-07:00",
                "level":"info",
                "master_objects":0,
                "unique_data_bytes":0,
                "errors":[]
            },
            . . .
        ]
}
```
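
Since `start` and `end` are both optional, a client can omit either one and rely on the defaults. The Python sketch below builds request URLs for the index range endpoint under that reading. The base URL `http://localhost:8080` is an assumption (local agent, default port), and `index_range_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8080"  # Assumption: local agent on the default port.


def index_range_url(base_url, start=None, end=None, key=None, val=None):
    """Build a v1/entries/range/index URL, leaving out any parameter not supplied."""
    params = {}
    if start is not None:
        params["start"] = start
    if end is not None:
        params["end"] = end
    if key is not None and val is not None:
        params["key"] = key
        params["val"] = val
    url = f"{base_url}/v1/entries/range/index"
    return f"{url}?{urlencode(params)}" if params else url


# Entries 24 through 47, then everything from index 100 onward that is an error.
print(index_range_url(BASE_URL, start=24, end=47))
print(index_range_url(BASE_URL, start=100, key="level", val="error"))
```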

### Filter entries by date and/or time

_Endpoint:_ `v1/entries/range/time{?start=start-time&end=end-time&key=key&val=value}`

Values for `start-time` and `end-time` must be RFC 3339 timestamps (the ISO 8601 extended format), such as `2021-06-19T15:51:00-07:00`. Both `start` and `end` are optional and default to the first and last entry, respectively. Entries are always sorted by insertion order, so the response list is in chronological order. Key-value filtering with `key` and `val` is applied after date and time filtering.

_Response:_

```plaintext
{
    "entries":
        [
            {
                "cluster_name":"null",
                "cluster_generation":1,
                "node_count":1,
                "hours_since_start":0,
                "time":"2021-06-19T15:51:00.020869089-07:00",
                "level":"info",
                "master_objects":0,
                "unique_data_bytes":0,
                "errors":[]
            },
            . . .
        ]
}
```
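
Timestamps in a query string need percent-encoding, because characters such as `:` and `+` are reserved in URLs. The following Python sketch formats timezone-aware `datetime` values as RFC 3339 strings and encodes them for the time range endpoint. The base URL `http://localhost:8080` is an assumption, and `time_range_url` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

BASE_URL = "http://localhost:8080"  # Assumption: local agent on the default port.


def rfc3339(moment: datetime) -> str:
    """Format a timezone-aware datetime as an RFC 3339 timestamp."""
    if moment.tzinfo is None:
        raise ValueError("timestamp must include a UTC offset")
    return moment.isoformat()


def time_range_url(base_url, start=None, end=None):
    """Build a v1/entries/range/time URL with percent-encoded timestamps."""
    params = {}
    if start is not None:
        params["start"] = rfc3339(start)
    if end is not None:
        params["end"] = rfc3339(end)
    url = f"{base_url}/v1/entries/range/time"
    return f"{url}?{urlencode(params)}" if params else url


# One day of entries, expressed with a UTC-07:00 offset.
offset = timezone(timedelta(hours=-7))
day_start = datetime(2021, 6, 19, 0, 0, 0, tzinfo=offset)
day_end = datetime(2021, 6, 20, 0, 0, 0, tzinfo=offset)

print(time_range_url(BASE_URL, day_start, day_end))
```

`urlencode` turns each `:` into `%3A` and a `+` offset sign into `%2B`, so the timestamp reaches the server unchanged after decoding.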

### Health check

_Endpoint:_ `v1/health`

A health check endpoint that returns metrics related to the health of the service.

_Response:_

```plaintext
{
    "health": {
        "total_missed_since_start": 1234,
        "recent_missed_since_start": 0,
        "hours_since_start":1484
    }
}
```

### Ping

_Endpoint:_ `v1/ping`

An endpoint to check if a connection can be established with the service.

_Response:_

```plaintext
string("ping")
```

## Integration with Aerospike Admin

Aerospike Admin (`asadm`) can connect to UDA when creating a `collectinfo` archive and when running the `summary` command. Connecting to UDA gives `asadm` the ability to display statistics for min, max, and average unique data usage. Use the `--agent-host` and `--agent-port` flags with the `asadm` `summary` and `collectinfo` commands to specify how to connect to UDA.

You can also use the `--agent-raw-store` flag to display namespace-level license usage, include entries where the cluster was reportedly unstable in the `summary` aggregation using the `--agent-unstable` flag, and include the `uda.store` file in the `collectinfo` archive.

Example: summary command output

```text
Admin> summary --agent-host <agent-host> --agent-port <agent-port>
~~~~~~~~~~~~~~~~~~~~~~Cluster Summary~~~~~~~~~~~~~~~~~~~~~~~
Migrations                |False
Cluster Name              |mydc
Server Version            |E-8.1.0.1
OS Version                |--
Cluster Size              |3
Devices Total             |3
Devices Per-Node          |1
Devices Equal Across Nodes|True
Shmem Index Used          |122.119 MB
Device Total              |12.000 GB
Device Used               |122.123 MB
Device Used%              |0.99 %
Device Avail              |11.760 GB
Device Avail%             |98.0 %
License Usage Latest      |(61.062 MB) ?
Namespaces Active         |1
Namespaces Total          |1
Active Features           |KVS,PIndex Query,Index-on-shmem
Number of rows: 18

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Namespace Summary~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Namespace|~~~~Drives~~~~|~~~~~~~~~Device~~~~~~~~|Replication|Cache| Master|Compression|~License~
         |Total|Per-Node|    Total| Used%|Avail%|    Factors|Read%|Objects|      Ratio|~~Usage~~
         |     |        |         |      |      |           |     |       |           |   Latest
test     |    3|       1|12.000 GB|0.99 %|98.0 %|          2|0.0 %|1.000 M|        1.0|(61.062 MB) ?
Number of rows: 1

The license usage calculation is inaccurate due to compression.
```

::: note
Starting in `asadm` 4.1.0 and tools package 12.1.0, if `asdb-compression` is `true` in the Aerospike feature key file, the `summary` command includes a warning to indicate that the license usage calculation is inaccurate.
:::

The `Latest` metric displayed shows the last measured data usage as reported by the agent. This means the `Latest` value may be up to an hour old.