Storage Format Upgrade in 6.0 Release
Overview
This page describes how to upgrade to Database 6.0 and the new storage format.
New storage format
In Database 6.0, the Aerospike Database internal storage format was changed to include a four-byte record end mark. This change addresses a potential, but unlikely, data-loss condition where a partial write block persisted, causing a record to be corrupted during an unclean shutdown.
When upgrading from Database 5.x or 4.9, each persisted namespace storage device, with the exception of PMem-backed namespaces, must be erased and its Aerospike header wiped (zeroized). The header is stored in the first 8MiB of the device.
If you are upgrading from 5.x to 6.x and there is already data on the device:
- If the device supports RZAT (Read Zero After Trim), use `blkdiscard -z` or a similar command on the 8MiB header to wipe it, then perform a TRIM of the entire device with the appropriate command for your device and operating system.
- If the device does not support RZAT, you must wipe the entire device.
See this Wikipedia article for more details on the hardware process and a non-exhaustive list of devices that support TRIM.
PMem-backed namespaces do not need to be zeroized because they do not require the use of an end-mark.
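The RZAT decision above can be sketched as a small helper that only builds the command lines and does not run them. This is a minimal sketch under assumptions: the device path is hypothetical, the `-o` and `-l` options are the offset and length options of util-linux `blkdiscard`, and zero-filling the whole device with `blkdiscard -z` is just one possible way to wipe a device that lacks RZAT.

```python
HEADER_BYTES = 8 * 1024 * 1024  # the Aerospike header is the first 8MiB of the device


def wipe_commands(device, supports_rzat):
    """Return the commands (as argv lists) to run, in order, for one raw device."""
    if supports_rzat:
        return [
            # Zero only the 8MiB Aerospike header.
            ["blkdiscard", "-z", "-o", "0", "-l", str(HEADER_BYTES), device],
            # Then TRIM the entire device.
            ["blkdiscard", device],
        ]
    # Without RZAT the whole device has to be wiped (assumption: zero-fill it).
    return [["blkdiscard", "-z", device]]


# Hypothetical device path, for illustration only.
for argv in wipe_commands("/dev/nvme0n1", supports_rzat=True):
    print(" ".join(argv))
```

Review the printed commands against your device and operating system before running anything, since both operations are destructive.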
Versions 6.0 and pre-6.0 are cluster-compatible, so a rolling restart upgrade with mixed versions is supported.
Keep a copy of your original `aerospike.conf` configuration file in case of an unlikely event requiring a downgrade.
The record overhead increased by 4 bytes in Database 6.0, so records that previously met the `write-block-size` or `max-record-size` limit might now exceed it. Exceeding `write-block-size` causes replication and migration issues, while exceeding `max-record-size` only causes master writes that would previously have succeeded to fail. Check your object size histogram to identify any records at the `write-block-size` limit.
See Important pre-upgrade considerations for more information.
Configuration changes
The following configuration items have changed in 6.0:
- `scan-max-done` has moved to `query-max-done`.
- `scan-threads-limit` has moved to `query-threads-limit`.
- `background-scan-max-rps` has moved to `background-query-max-rps`.
- `single-scan-threads` has moved to `single-query-threads`.
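As an illustration of the renames listed above, the following sketch rewrites the old item names in the text of a configuration file. It is not an official migration tool: the helper name `rename_items` and the sample configuration are hypothetical, and only the four name pairs come from this page.

```python
import re

# Old name -> new name, as listed on this page.
RENAMED = {
    "scan-max-done": "query-max-done",
    "scan-threads-limit": "query-threads-limit",
    "background-scan-max-rps": "background-query-max-rps",
    "single-scan-threads": "single-query-threads",
}

# Match an old name only when it is the first token on a line.
_PATTERN = re.compile(
    r"^([ \t]*)(" + "|".join(re.escape(name) for name in RENAMED) + r")(?=\s)",
    flags=re.MULTILINE,
)


def rename_items(conf_text):
    """Return conf_text with the renamed items replaced; comments are left alone."""
    return _PATTERN.sub(lambda m: m.group(1) + RENAMED[m.group(2)], conf_text)


# Hypothetical configuration fragment.
sample = "service {\n\tscan-threads-limit 128\n}\n"
print(rename_items(sample))
```

Compare the result against your saved copy of the original file before deploying it.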
The following configuration items have been removed in 6.0:
query-threads
query-worker-threads
query-microbenchmark
query-batch-size
query-in-transaction-thread
query-long-q-max-size
query-priority
query-priority-sleep-us
query-rec-count-bound
query-req-in-query-thread
query-short-q-max-size
query-threshold
query-untracked-time-ms
batch-without-digests
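Before starting a 6.0 node, it can help to check a configuration file for the removed items listed above. The sketch below is one way to do that, assuming each setting appears as the first token on its own line and that `#` starts a comment; the helper name `find_removed` and the sample text are hypothetical.

```python
# Items removed in 6.0, as listed on this page.
REMOVED = {
    "query-threads", "query-worker-threads", "query-microbenchmark",
    "query-batch-size", "query-in-transaction-thread", "query-long-q-max-size",
    "query-priority", "query-priority-sleep-us", "query-rec-count-bound",
    "query-req-in-query-thread", "query-short-q-max-size", "query-threshold",
    "query-untracked-time-ms", "batch-without-digests",
}


def find_removed(conf_text):
    """Return (line number, item name) for every removed item still configured."""
    hits = []
    for lineno, line in enumerate(conf_text.splitlines(), start=1):
        tokens = line.split("#", 1)[0].split()  # drop comments, then tokenize
        if tokens and tokens[0] in REMOVED:
            hits.append((lineno, tokens[0]))
    return hits


# Hypothetical configuration fragment.
sample = "service {\n\tquery-threads 6\n\tquery-threads-limit 128\n}\n"
print(find_removed(sample))
```

Exact token matching keeps `query-threads-limit`, which is still valid in 6.0, from being reported as the removed `query-threads`.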
Important pre-upgrade considerations
Aerospike recommends that you check for records that approach the configured `write-block-size` before you upgrade. Due to the overhead increase of 4 bytes per record in Database 6.0, some records could exceed the `write-block-size` after the upgrade, causing write transactions to fail and migrations to get stuck.
If you are upgrading from a version prior to Database 5.7, consider upgrading to Database 5.7 before you upgrade to Database 6.0 so that you can take advantage of `max-record-size`.
Introduced in Database 5.7, `max-record-size` sets an additional threshold that rejects writes of records above the configured size. You can use it to account for the extra overhead and keep out records that would otherwise breach the `write-block-size` after the upgrade.
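The arithmetic behind such a threshold is small enough to sketch. The 16-byte margin below is the one this page recommends in the stuck-migration steps for Database 5.7; the function names and the 1MiB `write-block-size` are assumptions for illustration, not values read from a server.

```python
MARGIN_BYTES = 16  # margin recommended on this page for max-record-size


def suggested_max_record_size(write_block_size):
    """Threshold that leaves room for the extra per-record overhead in 6.0."""
    return write_block_size - MARGIN_BYTES


def would_be_rejected(record_size, max_record_size):
    """True if a write of this size is above the configured threshold."""
    return record_size > max_record_size


write_block_size = 1024 * 1024  # assumed 1MiB write-block-size
limit = suggested_max_record_size(write_block_size)
print(limit)  # 1048560
print(would_be_rejected(1048576, limit))  # True
```

For a 1MiB `write-block-size` this gives 1048560 bytes, the same threshold that appears in the sample output of the record size checker on this page.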
Find potential problem records before you upgrade
Here are different ways to check for record sizes that may cause problems:
- Leverage the object-size and object-size-linear histograms.
- Run a query with a filter expression to identify records that have a size equal to `write-block-size`. An example Python script (using the Python client) is provided at aerospike-examples/6.0-record-size-checker. This script handles uncompressed records, and has a configurable compression ratio variance `threshold`, which it uses to identify compressed records that will exceed the write-block size.
# after editing global dry_run = True
$ python3 main.py
2023-08-02 17:44:53,097 [INFO]: Scanning node: BB9C00F800A0142
2023-08-02 17:44:53,098 [INFO]: Node BB9C00F800A0142 does not have compression enabled.
2023-08-02 17:44:53,098 [INFO]: Checking for records of compressed size larger than 1048560 bytes
2023-08-02 17:46:59,366 [INFO]: Namespace: bar, Set: testset, Primary Key: None, Digest: fe0f17700e1b7fcc82401b535f7933667634f8bf
2023-08-02 17:46:59,366 [INFO]: Namespace: bar, Set: testset, Primary Key: None, Digest: fe0f14b652ecbaf4afc46de605d7a4a0b6452f3a
...
2023-08-02 17:46:59,366 [INFO]: Node: BB9C00F800A0142 Returned Record Count: 328633
2023-08-02 17:46:59,366 [INFO]: Node: BB93E00800A0142 Returned Record Count: 330107
2023-08-02 17:46:59,366 [INFO]: Node: BB93F00800A0142 Returned Record Count: 329541
If no records are close in size to the configured `write-block-size`, the upgrade should not be impacted.
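If the checker does report records, its output can be collected for follow-up. The sketch below parses lines in the format of the sample output above and returns the namespace, set, and digest of each flagged record. The helper name `flagged_records` is hypothetical, and the line format is assumed from that sample rather than from the script's source.

```python
import re

# Line format assumed from the sample output shown above.
_RECORD = re.compile(
    r"Namespace: (?P<ns>[^,]+), Set: (?P<set>[^,]*), "
    r"Primary Key: (?P<pk>.*), Digest: (?P<digest>[0-9a-f]{40})\s*$"
)


def flagged_records(log_text):
    """Return (namespace, set, digest) for every flagged-record line."""
    found = []
    for line in log_text.splitlines():
        match = _RECORD.search(line)
        if match:
            found.append((match["ns"], match["set"], match["digest"]))
    return found


sample = (
    "2023-08-02 17:46:59,366 [INFO]: Namespace: bar, Set: testset, "
    "Primary Key: None, Digest: fe0f17700e1b7fcc82401b535f7933667634f8bf\n"
    "2023-08-02 17:46:59,366 [INFO]: Node: BB9C00F800A0142 Returned Record Count: 328633\n"
)
print(flagged_records(sample))
```

An empty result for a full run corresponds to the case where no records are close to the limit.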
Mitigations, prior to upgrading
If you find records that are close to the size limit, consider the following options and their risks:
- You can double the `write-block-size`. However, this doubles the memory used by the configured `post-write-queue`. If you choose to double the `write-block-size`, you can reduce the `post-write-queue` by a factor of two to keep the same memory footprint for the `post-write-queue`.
- If the use case allows it, you can also break up the records that would not fit with the overhead, and delete the original records.
As of 6.0, Aerospike logs the digests of records that prevent migrations for some specific failure types. Failures to migrate due to an excessive record size are logged under the `drv_ssd` context at the DETAIL level. See the Server Log Reference for details on how to change the log level dynamically.
Make sure to revert to the default INFO level after a few seconds to avoid polluting the logs and risking running out of log disk space.
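The memory trade-off in the first option can be checked with simple arithmetic. This sketch assumes that `post-write-queue` is counted in write blocks per device, so its memory use is the block size multiplied by the block count. The figure of 256 blocks is only an illustrative value, and the function name is hypothetical.

```python
MIB = 1024 * 1024


def post_write_queue_bytes(write_block_size, post_write_queue, devices=1):
    """Approximate memory held by the post-write queue (assumed: blocks per device)."""
    return write_block_size * post_write_queue * devices


before = post_write_queue_bytes(1 * MIB, 256)       # original settings
doubled = post_write_queue_bytes(2 * MIB, 256)      # write-block-size doubled
compensated = post_write_queue_bytes(2 * MIB, 128)  # post-write-queue halved as well

print(before // MIB, doubled // MIB, compensated // MIB)  # 256 512 256
```

Halving the queue restores the original footprint, but the queue then holds half as many blocks, so weigh that against your read pattern.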
How to fix a stuck migration
When upgrading from Database 5.7
- Set `max-record-size` to the `write-block-size` minus 16 bytes.
- Identify the record(s) causing the issue by enabling the detail log level:
asadm -e "asinfo -v 'log-set:id=0;drv_ssd=detail'"
The server logs a message similar to the following:
`DETAIL (drv_ssd): (drv_ssd.c:1550) {namespace} write: size 1048577 - rejecting <digest id>`
- Delete the record, or shorten it, using the printed digest.
- Issue the recluster command to force migrations to reprocess the updated or deleted record(s).
When upgrading from a version prior to Database 5.7
- Identify the record(s) causing the issue by enabling the detail log level:
asadm -e "asinfo -v 'log-set:id=0;drv_ssd=detail'"
The server logs a message similar to the following:
`DETAIL (drv_ssd): (drv_ssd.c:1550) {namespace} write: size 1048577 - rejecting <digest id>`
- Delete the record, or shorten it, using the printed digest.
- Issue the recluster command to force migrations to reprocess the updated or deleted record(s).
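To collect the digests from a log file rather than reading them by eye, the rejection message can be matched with a regular expression. The sketch below assumes the message layout shown above and treats the digest as a single whitespace-free token; real log lines may carry a timestamp prefix or a different digest representation. The helper name and the sample line are hypothetical.

```python
import re

# Layout assumed from the example message on this page.
_REJECT = re.compile(
    r"DETAIL \(drv_ssd\): \(drv_ssd\.c:\d+\) \{(?P<ns>[^}]+)\} "
    r"write: size (?P<size>\d+) - rejecting (?P<digest>\S+)"
)


def rejected_records(log_text):
    """Return (namespace, size in bytes, digest) for each rejection message."""
    return [
        (m["ns"], int(m["size"]), m["digest"])
        for m in _REJECT.finditer(log_text)
    ]


# Illustrative line; the namespace and digest are made up.
sample = (
    "DETAIL (drv_ssd): (drv_ssd.c:1550) {test} write: size 1048577 "
    "- rejecting fe0f17700e1b7fcc82401b535f7933667634f8bf\n"
)
print(rejected_records(sample))
```

The returned digests identify the records to delete or shorten before issuing the recluster command.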
For more information, contact Aerospike Support.
Upgrade steps
See the general guidelines for upgrading a cluster for the common steps involved in an Aerospike cluster upgrade.
Namespaces with `replication-factor` 1 require that the node be quiesced and that migrations complete before stopping and upgrading the node. Alternatively, these namespaces may have their data restored from a backup or through XDR or other clients.
Persisted data must be deleted prior to starting a node with Database 6.0. Aerospike recommends that you back up your data, or have a redundant cluster, prior to proceeding with the upgrade.
For each node in the cluster:
- (Optional) Quiesce the node and wait for migrations to complete. This optional step protects against the unlikely event of an irrecoverable node crash while a node has been taken out to be upgraded.
- Stop the Aerospike daemon.
- Delete the stored data for storage-engine device-configured namespaces. Delete the file if using a file. For raw devices, see the process in Initializing SSDs.
- If there is already data on the device, and it supports RZAT (Read Zero After Trim), use `blkdiscard -z` or a similar command on the 8MiB Aerospike header to wipe it, then perform a TRIM of the entire device with the appropriate command for your device and operating system. If the device does not support RZAT, you must wipe the entire device before continuing to ensure no Aerospike data remains.
- Start the Aerospike daemon.
- Wait for migrations to complete after the node joins the cluster to allow all the data to be repopulated from other nodes (assuming replication factor 2 or more). Refer to the knowledge base article, Monitoring Migrations on a Live Cluster.
- Proceed with the next node.
Post upgrade
The `truncate` privilege is now a standalone granular permission, and no longer part of the `write` privilege.
Users representing applications that perform truncates should have the `truncate` privilege granted to one of their roles.