Data validation (asvalidation)
The asvalidation tool checks the validity of collection data type (CDT) bins in a namespace. It also has an option to fix some of the problems it discovers.
Recommendations and server versions
The Aerospike CDT Validation Tool addresses the following types of CDT issues, which require certain Aerospike Database versions to detect or correct.
Download asvalidation
Download asvalidation, or visit its repo aerospike/aerospike-tools-validation.
Descriptions of possible corruption reasons
Reason | Description | Disposition |
---|---|---|
Has non-storage | The bin contains an infinite or wildcard element, which is not allowed in stored data. | Cannot be fixed by the tool; requires manual intervention. |
Has duplicate keys | A map bin has duplicate key entries. | Cannot be fixed by the tool; requires manual intervention. |
Corrupted | A problem not attributable to any of the other error categories. | Cannot be fixed by the tool; requires manual intervention. |
Invalid Keys | The bin has a map with at least one invalid key. | Cannot be fixed by the tool; requires manual intervention. |
Order | The bin has elements out of order. | Can be fixed by reordering the list with the --cdt-fix-ordered-list-unique option. |
Padding | The bin has garbage bytes after the valid list or map. | Can be fixed by truncating the extra bytes. |
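To make the Order category more concrete, the following is a conceptual sketch only. It uses plain Python integers and the hypothetical helper names is_ordered_unique and reorder_unique; it is not how asvalidation or the server works internally, and it does not model Aerospike's ordering rules across mixed value types.

```python
def is_ordered_unique(values):
    """Return True if values are strictly increasing (sorted, no duplicates)."""
    return all(a < b for a, b in zip(values, values[1:]))


def reorder_unique(values):
    """Return the values sorted, with duplicate elements removed."""
    return sorted(set(values))


# Hypothetical contents of a list that should be ordered but was stored out of order.
stored = [3, 1, 2, 2]

print(is_ordered_unique(stored))  # False
print(reorder_unique(stored))     # [1, 2, 3]
```

A list that already passes the check is returned unchanged by reorder_unique, which is why a fix of this kind is safe to apply repeatedly in this simplified model.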
asvalidation modes
asvalidation can be run in the following modes.
Records that contain no CDTs, or whose CDTs have no detected errors, are ignored. Records with detected errors are backed up unless otherwise specified. By default, no fixes are applied.
- "Validation" mode discovers problems and produces a report.
- "Fix" mode, triggered by the --cdt-fix-ordered-list-unique option, attempts to correct discovered problems where possible.
Run asvalidation in validation mode first, limited by partition or --max-records, to see the kinds of errors it discovers before running it in fix mode.
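The validate-first workflow can be sketched as a small Python helper that assembles the two command lines. This is a minimal sketch: build_asvalidation_command is a hypothetical name, the limit of 1000 records is an arbitrary illustration, and only options listed on this page are used.

```python
import shlex


def build_asvalidation_command(namespace, report_file, max_records=None, fix=False):
    """Return an argv list for asvalidation, using only options described on this page."""
    argv = ["asvalidation", "-n", namespace, "-o", report_file]
    if max_records is not None:
        # --max-records is an approximate limit on the number of records processed.
        argv += ["--max-records", str(max_records)]
    if fix:
        # This option switches the tool from validation mode to fix mode.
        argv.append("--cdt-fix-ordered-list-unique")
    return argv


# Step 1: a limited, validation-only run to see what kinds of errors exist.
dry_run = build_asvalidation_command("myNamespace", "asvalidationOutput.txt", max_records=1000)
# Step 2: after reviewing the report, the same namespace in fix mode.
fix_run = build_asvalidation_command("myNamespace", "asvalidationOutput.txt", fix=True)

print(shlex.join(dry_run))
print(shlex.join(fix_run))
```

Either list could be passed to subprocess.run(argv, check=True) on a host where asvalidation is installed; printing the commands first makes it easy to confirm that the fix option only appears in the second run.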
Basic options for asvalidation
This is a minimal set of asvalidation options.
Option | Description |
---|---|
--cdt-fix-ordered-list-unique | Fix lists whose elements were not stored in order and remove duplicate elements. Without this option, the tool only validates and does not fix. |
--no-cdt-check-map-keys | Do not check CDT map keys. |
-o | Output file name for validation report. |
-d | Output directory. |
--help | Get a comprehensive list of options for the tool. |
Namespace data selection options
Option | Default | Description |
---|---|---|
-n NAMESPACE or --namespace NAMESPACE | - | Namespace to validate. Mandatory. |
-s SETS or --set SETS | All sets | The set(s) to validate. May pass in a comma-separated list of sets to validate. |
-B BIN1,BIN2,... or --bin-list BIN1,BIN2,... | All bins | The bins to validate. |
-M or --max-records N | 0 = all records. | An approximate limit for the number of records to process. Available in Database 4.9 and later. Note: this option is mutually exclusive with --partition-list and --after-digest. |
-X or --partition-list PARTITIONID | 0 = all records. | Scan a specific partition number 1-4096, or list of partition IDs. |
Running asvalidation in validation mode
Example of asvalidation in validation mode
In the following example, myNamespace is checked and its output stored in the file asvalidationOutput.txt. Notice CDT Mode: validate.
> asvalidation -n myNamespace -o asvalidationOutput.txt
...
2024-05-01 22:12:28 GMT [INF] [24662] Found 10 invalid record(s) from 1 node(s), 2620 byte(s) in total (~262 B/rec)
2024-05-01 22:12:28 GMT [INF] [24662] CDT Mode: validate
2024-05-01 22:12:28 GMT [INF] [24662] 100 Lists
2024-05-01 22:12:28 GMT [INF] [24662] 0 Unfixable
2024-05-01 22:12:28 GMT [INF] [24662] 0 Has non-storage
2024-05-01 22:12:28 GMT [INF] [24662] 0 Corrupted
2024-05-01 22:12:28 GMT [INF] [24662] 0 Invalid Keys
2024-05-01 22:12:28 GMT [INF] [24662] 10 Need Fix
2024-05-01 22:12:28 GMT [INF] [24662] 0 Fixed
2024-05-01 22:12:28 GMT [INF] [24662] 0 Fix failed
2024-05-01 22:12:28 GMT [INF] [24662] 10 Order
2024-05-01 22:12:28 GMT [INF] [24662] 0 Padding
2024-05-01 22:12:28 GMT [INF] [24662] 0 Maps
2024-05-01 22:12:28 GMT [INF] [24662] 0 Unfixable
2024-05-01 22:12:28 GMT [INF] [24662] 0 Has duplicate keys
2024-05-01 22:12:28 GMT [INF] [24662] 0 Has non-storage
2024-05-01 22:12:28 GMT [INF] [24662] 0 Corrupted
2024-05-01 22:12:28 GMT [INF] [24662] 0 Invalid Keys
2024-05-01 22:12:28 GMT [INF] [24662] 0 Need Fix
2024-05-01 22:12:28 GMT [INF] [24662] 0 Fixed
2024-05-01 22:12:28 GMT [INF] [24662] 0 Fix failed
2024-05-01 22:12:28 GMT [INF] [24662] 0 Order
2024-05-01 22:12:28 GMT [INF] [24662] 0 Padding
Interpreting the validation mode report
In the example above:
- asvalidation was run in validation mode, because the --cdt-fix-ordered-list-unique option was not specified.
- It found 100 lists: 100 Lists.
- It found no maps: 0 Maps.
- 10 of the lists were corrupted: 10 Need Fix under Lists.
- The reason for corruption is shown as out of order: 10 Order.
- Because it was run in validation mode, no fixes were applied: 0 Fixed.
Counts under a heading do not necessarily add up to the count on the heading line. For example, a single Need Fix record could have both an Order and a Padding error.
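Reading the report by eye works for one run, but the summary can also be turned into a dictionary for scripting. The following is a rough sketch that assumes the log line layout shown in the example above (timestamp, [INF], thread ID, count, label); parse_cdt_summary is a hypothetical helper, not part of the tool.

```python
import re

SECTION_LABELS = {"Lists", "Maps"}
# Matches summary lines such as "... [INF] [24662] 10 Need Fix".
COUNT_LINE = re.compile(r"\[INF\] \[\d+\]\s+(\d+)\s+(.+?)\s*$")


def parse_cdt_summary(log_text):
    """Return {"mode": ..., "Lists": {...}, "Maps": {...}} from asvalidation output."""
    summary = {"mode": None}
    section = None
    for line in log_text.splitlines():
        mode = re.search(r"CDT Mode: (\w+)", line)
        if mode:
            summary["mode"] = mode.group(1)
            continue
        if summary["mode"] is None:
            continue  # ignore progress lines printed before the summary
        match = COUNT_LINE.search(line)
        if not match:
            continue
        count, label = int(match.group(1)), match.group(2)
        if label in SECTION_LABELS:
            section = label
            summary[section] = {"total": count}
        elif section is not None:
            summary[section][label] = count
    return summary


# A shortened copy of the validation mode example above.
sample = """\
2024-05-01 22:12:28 GMT [INF] [24662] Found 10 invalid record(s) from 1 node(s), 2620 byte(s) in total (~262 B/rec)
2024-05-01 22:12:28 GMT [INF] [24662] CDT Mode: validate
2024-05-01 22:12:28 GMT [INF] [24662] 100 Lists
2024-05-01 22:12:28 GMT [INF] [24662] 10 Need Fix
2024-05-01 22:12:28 GMT [INF] [24662] 0 Fixed
2024-05-01 22:12:28 GMT [INF] [24662] 10 Order
2024-05-01 22:12:28 GMT [INF] [24662] 0 Maps
2024-05-01 22:12:28 GMT [INF] [24662] 0 Need Fix
"""

report = parse_cdt_summary(sample)
print(report["mode"], report["Lists"]["Need Fix"], report["Maps"]["total"])
```

Because Unfixable and Need Fix each list the same reason labels underneath them, this flat mapping keeps only the last value seen for a repeated label within a section; a nested structure would be needed to keep them apart.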
Running asvalidation in fix mode
Fixing is triggered by the --cdt-fix-ordered-list-unique option.
Fixes are applied to the server. A fix can fail for several reasons, including but not limited to an unsupported server version.
Example of asvalidation fix mode
In the following example, the test namespace is checked and its output stored in the file temp.txt. Notice CDT Mode: fix.
asvalidation -n test -o temp.txt -r --cdt-fix-ordered-list-unique
validation of 127.0.0.1 (namespace: test, set: [all], bins: [all], after: [none], before: [none]) to temp.txt
2024-05-01 22:08:25 GMT [INF] [10999] [src/main/aerospike/as_cluster.c:132][as_cluster_add_nodes_copy] Add node BB909000027000A 127.0.0.1:3000
2024-05-01 22:08:25 GMT [INF] [10999] Processing 1 node(s)
2024-05-01 22:08:25 GMT [INF] [10999] Node ID Objects Replication
2024-05-01 22:08:25 GMT [INF] [10999] BB909000027000A 130 1
2024-05-01 22:08:25 GMT [INF] [10999] Namespace contains 130 record(s)
2024-05-01 22:08:25 GMT [INF] [10999] Created new output file temp.txt
2024-05-01 22:08:25 GMT [INF] [11018] Starting validation for node BB909000027000A
2024-05-01 22:08:25 GMT [INF] [11018] Completed validation for node BB909000027000A, records: 30, size: 9200 (~306 B/rec)
2024-05-01 22:08:26 GMT [INF] [11017] 23% complete (~8 KiB/s, ~30 rec/s, ~306 B/rec)
2024-05-01 22:08:26 GMT [INF] [11017] ~3s remaining
2024-05-01 22:08:26 GMT [INF] [11017] Found 30 invalid record(s) from 1 node(s), 9200 byte(s) in total (~306 B/rec)
2024-05-01 22:08:26 GMT [INF] [11017] CDT Mode: fix
2024-05-01 22:08:26 GMT [INF] [11017] 110 Lists
2024-05-01 22:08:26 GMT [INF] [11017] 0 Unfixable
2024-05-01 22:08:26 GMT [INF] [11017] 0 Has non-storage
2024-05-01 22:08:26 GMT [INF] [11017] 0 Corrupted
2024-05-01 22:08:26 GMT [INF] [11017] 0 Invalid Keys
2024-05-01 22:08:26 GMT [INF] [11017] 10 Need Fix
2024-05-01 22:08:26 GMT [INF] [11017] 10 Fixed
2024-05-01 22:08:26 GMT [INF] [11017] 0 Fix failed
2024-05-01 22:08:26 GMT [INF] [11017] 10 Order
2024-05-01 22:08:26 GMT [INF] [11017] 0 Padding
2024-05-01 22:08:26 GMT [INF] [11017] 20 Maps
2024-05-01 22:08:26 GMT [INF] [11017] 10 Unfixable
2024-05-01 22:08:26 GMT [INF] [11017] 10 Has duplicate keys
2024-05-01 22:08:26 GMT [INF] [11017] 0 Has non-storage
2024-05-01 22:08:26 GMT [INF] [11017] 0 Corrupted
2024-05-01 22:08:26 GMT [INF] [11017] 0 Invalid Keys
2024-05-01 22:08:26 GMT [INF] [11017] 10 Need Fix
2024-05-01 22:08:26 GMT [INF] [11017] 0 Fixed
2024-05-01 22:08:26 GMT [INF] [11017] 0 Fix failed
2024-05-01 22:08:26 GMT [INF] [11017] 10 Order
2024-05-01 22:08:26 GMT [INF] [11017] 0 Padding
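After a fix mode run, the lines worth checking first are Unfixable and Fix failed, since those records still need attention. The following is a minimal sketch that totals both counts across the Lists and Maps sections, assuming the log layout shown above; unresolved_counts is a hypothetical helper name.

```python
import re


def unresolved_counts(log_text):
    """Sum the 'Unfixable' and 'Fix failed' counts across all sections of a report."""
    totals = {"Unfixable": 0, "Fix failed": 0}
    for label in totals:
        # Match "<thread id>] <count> <label>" at the end of a line.
        pattern = re.compile(r"\]\s+(\d+)\s+" + re.escape(label) + r"\s*$", re.MULTILINE)
        totals[label] = sum(int(n) for n in pattern.findall(log_text))
    return totals


# A shortened copy of the fix mode example above.
fix_sample = """\
2024-05-01 22:08:26 GMT [INF] [11017] CDT Mode: fix
2024-05-01 22:08:26 GMT [INF] [11017] 110 Lists
2024-05-01 22:08:26 GMT [INF] [11017] 0 Unfixable
2024-05-01 22:08:26 GMT [INF] [11017] 10 Fixed
2024-05-01 22:08:26 GMT [INF] [11017] 0 Fix failed
2024-05-01 22:08:26 GMT [INF] [11017] 20 Maps
2024-05-01 22:08:26 GMT [INF] [11017] 10 Unfixable
2024-05-01 22:08:26 GMT [INF] [11017] 0 Fix failed
"""

remaining = unresolved_counts(fix_sample)
print(remaining)
```

In the example run, the 10 ordered-list problems were fixed, while the 10 maps reported under Unfixable (duplicate keys) still require manual intervention.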