Transactions
This page describes Aerospike's transactions feature.
Overview​
A transaction is a group of commands that executes atomically and is isolated from commands outside the transaction.
Aerospike transactions guarantee one of two outcomes: either all commands succeed together, or the transaction fails and all records roll back to their state prior to the attempted transaction. No command outside the transaction can see the state changes made inside the transaction. This allows you to enforce serializability requirements by putting all requests with such requirements into their own transactions.
Read sequence​
Aerospike transactions consist of an arbitrary sequence of reads and writes of records within the same namespace, submitted by the Aerospike client, bracketed by a begin and end transaction request. The records being read or written do not need to be known beforehand, enabling applications like graph traversals or linked data structures to function seamlessly.
Aerospike transactions guarantee strict serializability, including for applications that mix transactions with standalone read and write commands.
Write locks​
Write locks are held until the transaction ends. Reads are verified for correctness at the transaction's end, ensuring that the version read is still current. This guarantees the absence of cycles in the read/write precedence graph and ensures that the graph's edges follow the strict wall-clock order of transaction commits.
Supported operations and commands​
- All single-record read and write commands
- All batched commands
Unsupported operations and commands
- info commands
- queries
High-Level client API for transactions​
The client library exposes several operations to the application, with detailed API specifications available in the official documentation. The following is a high-level view of those operations.
Key “conceptual” operations​
The actual mechanisms to achieve some of the following conceptual operations depend on the client language being used.
- Begin Transaction (timeout): Starts a transaction with a timeout parameter and returns a transaction ID, which must be included in all subsequent requests within this transaction.
- Read (transaction_id, X): Reads the record with key X as part of the transaction corresponding to transaction_id.
- Write (transaction_id, X): Writes the record with key X as part of the transaction corresponding to transaction_id.
- Abort Transaction (transaction_id): Requests that the transaction be aborted.
- Commit Transaction (transaction_id): Requests that the transaction be committed.
A transaction can have an arbitrary number of read operations and up to 4096 write operations.
Client library as the coordinator for distributed transactions​
The Aerospike client acts as the primary coordinator of the distributed transaction. Because multiple records within a transaction involve multiple cluster nodes, the client plays a central role in ensuring transaction atomicity.
The concurrency control algorithm used by Aerospike combines optimistic concurrency control for read operations and pessimistic concurrency control, or locking, for write operations.
The following sections describe how write locking is implemented at the server and detail the client's role as the coordinator.
Handling writes at the server​
The server must ensure two things for a record written as part of a transaction: locking the record and guaranteeing that it can either roll back or complete the transaction as directed by the coordinator.
The design achieves both using a simple and effective technique. When a record is updated by a transaction, the primary index entry for that record holds two versions: one for the prior committed version and one for the currently updated version. The presence of these two versions acts as a lock on the record. If the transaction successfully commits, the primary index entry will have only one version corresponding to the currently updated version. If the transaction fails, then the primary index will have only one version corresponding to the prior committed version. This dual record concept is integral to the Aerospike design and code, ensuring that replication is aware of dual records. When a record's primary copy has a dual record, the replica also has it. Conversely, when the primary copy goes back to a single version of the record, so does the replica, aligning with strong consistency and replication.
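The dual-version technique described above can be sketched with a toy model. This is a minimal illustration, not Aerospike server code: the `IndexEntry` class, its field names, and its methods are all hypothetical and exist only to show how the presence of a second version doubles as a lock, and how commit and abort each collapse the entry back to a single version.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IndexEntry:
    """Toy stand-in for a primary index entry (hypothetical, not Aerospike code)."""
    committed: Optional[dict]            # prior committed version
    provisional: Optional[dict] = None   # version written inside a transaction
    locked_by: Optional[int] = None      # id of the transaction holding the lock

    def is_locked(self) -> bool:
        # Two versions present means a transaction holds the record.
        return self.locked_by is not None

    def txn_write(self, txn_id: int, bins: dict) -> bool:
        """Write inside a transaction; returns False if another transaction holds the lock."""
        if self.is_locked() and self.locked_by != txn_id:
            return False
        self.provisional = dict(bins)
        self.locked_by = txn_id
        return True

    def roll_forward(self) -> None:
        """Commit: the updated version becomes the only version."""
        if self.is_locked():
            self.committed, self.provisional, self.locked_by = self.provisional, None, None

    def roll_back(self) -> None:
        """Abort: discard the updated version and keep the prior committed one."""
        self.provisional, self.locked_by = None, None


entry = IndexEntry(committed={"balance": 100})
entry.txn_write(txn_id=1, bins={"balance": 70})
blocked = entry.txn_write(txn_id=2, bins={"balance": 0})  # a second transaction is refused
entry.roll_back()
after_abort = entry.committed       # prior committed version survives the abort
entry.txn_write(txn_id=1, bins={"balance": 70})
entry.roll_forward()
after_commit = entry.committed      # updated version is now the only version
```

In this sketch the same transaction may write the record repeatedly, while any other transaction is turned away until the entry returns to a single version.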
Read and write commands behavior in the transaction system​
The client API for read and write commands remains unchanged, as does most of the client-side implementation. The server undergoes a minor change to ensure read and write commands are strictly serializable with transactions.
Reads: All reads succeed. If the primary index has only one entry, it is returned, consistent with read behavior before transactions were introduced. If the primary index has a dual record (indicating a transaction is updating this record), the previously committed version is returned. This ensures the read always succeeds and precedes the ongoing transaction in the serializable schedule.
Writes: Writes succeed immediately if there is only one primary index entry, creating a newer record version. If the primary index has two entries, the write fails immediately, and the client is informed of the failure.
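The read and write rules above can be illustrated with a short sketch. It assumes a record is represented as a plain dictionary with a `committed` and an optional `provisional` version; that representation and the names `plain_read`, `plain_write`, and `WriteBlocked` are hypothetical and are not part of any Aerospike API.

```python
class WriteBlocked(Exception):
    """Raised when a standalone write hits a record held by a transaction."""


def plain_read(entry: dict):
    # Always succeeds: returns the committed version, even while a
    # transaction holds a provisional version of the record.
    return entry["committed"]


def plain_write(entry: dict, bins: dict) -> None:
    # Fails immediately when two versions exist.
    if entry.get("provisional") is not None:
        raise WriteBlocked("record is being updated by a transaction")
    entry["committed"] = dict(bins)


free = {"committed": {"n": 1}, "provisional": None}
held = {"committed": {"n": 1}, "provisional": {"n": 2}}

plain_write(free, {"n": 5})      # single version: the write goes through
seen = plain_read(held)          # dual version: the committed version is returned
try:
    plain_write(held, {"n": 9})  # dual version: the write is rejected
    write_failed = False
except WriteBlocked:
    write_failed = True
```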
Client and server interaction for transaction reads and writes​
Transaction reads: At the client, when a transaction reads an item, a read request is sent to the appropriate node. The server node fails the request if another transaction is currently updating the record (indicated by the primary index having two entries). If the same transaction is updating the item, the current version and version number are returned. If there is only one entry, it is returned along with its version number. Upon a failed read, the client notifies the application, which can choose to retry the read or abort the transaction. If the read succeeds, the client records the version number before passing the record to the application. The client tracks all items read by the transaction and their version numbers.
Transaction writes: When the client writes an item, it first notifies the monitor of the record or records it intends to write. If this is the first such notification, it creates the monitor and establishes the transaction's deadline. The client then sends the write request to the appropriate node. The server node fails the request if another transaction is currently updating the record; in that case, the client informs the application, which can choose to retry the write or abort the transaction. If the same transaction is already updating the item, the record is updated. If there is only one primary index entry, a second entry is created, locking the record. The client tracks all items written by the transaction.
Handling transaction abort or commit from the application​
Abort: When the application requests an abort, the client processes the list of writes for the transaction and instructs the server nodes to undo the writes, reverting each record to its pre-transaction state. Certain metadata will advance for conflict resolution purposes. The client ensures all writes are undone, retrying until successful. Reads do not require any action.
Commit: The client first verifies that all transaction reads are still valid by checking current version numbers with the server nodes. If any read is invalid, the transaction is aborted, and the application is informed. If all reads are valid, the transaction is committed. The client then instructs the server nodes to finalize the writes, making the newer version the only record version.
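The abort and commit flow described above can be sketched as a small in-memory simulation. This is a hedged illustration under simplifying assumptions: `ToyServer` and `ToyTxn` are hypothetical classes, a single dictionary stands in for the whole cluster, and the monitor record and retries are left out.

```python
class ToyServer:
    """Hypothetical in-memory stand-in for the cluster; tracks a version per key."""

    def __init__(self):
        self.records = {}   # key -> (version, value), committed state
        self.pending = {}   # key -> (txn_id, value), provisional writes

    def read(self, key):
        return self.records.get(key, (0, None))

    def plain_write(self, key, value):
        if key in self.pending:
            raise RuntimeError("record locked by a transaction")
        version, _ = self.read(key)
        self.records[key] = (version + 1, value)


class ToyTxn:
    """Hypothetical client-side coordinator for one transaction."""

    def __init__(self, server, txn_id):
        self.server, self.txn_id = server, txn_id
        self.read_versions = {}   # key -> version seen at read time
        self.writes = []          # keys written by this transaction

    def get(self, key):
        version, value = self.server.read(key)
        self.read_versions[key] = version   # remembered for commit-time verification
        return value

    def put(self, key, value):
        holder = self.server.pending.get(key)
        if holder is not None and holder[0] != self.txn_id:
            raise RuntimeError("record locked by another transaction")
        self.server.pending[key] = (self.txn_id, value)
        if key not in self.writes:
            self.writes.append(key)

    def abort(self):
        # Undo every provisional write; reads need no action.
        for key in self.writes:
            self.server.pending.pop(key, None)
        self.writes.clear()

    def commit(self) -> bool:
        # Step 1: verify that every version read is still current.
        for key, version in self.read_versions.items():
            if self.server.read(key)[0] != version:
                self.abort()
                return False
        # Step 2: finalize writes so the newer version is the only version.
        for key in self.writes:
            _, value = self.server.pending.pop(key)
            version, _ = self.server.read(key)
            self.server.records[key] = (version + 1, value)
        self.writes.clear()
        return True


server = ToyServer()
server.plain_write("a", 1)
server.plain_write("b", 10)

t1 = ToyTxn(server, 1)
t1.put("b", t1.get("a") + 10)     # reads a, writes b provisionally
ok = t1.commit()                  # read of "a" is still current, so b is finalized

t2 = ToyTxn(server, 2)
t2.put("b", t2.get("a") + 100)
server.plain_write("a", 2)        # a standalone write invalidates t2's read of "a"
stale = t2.commit()               # verification fails, so t2's write to b is undone
```

The second transaction shows the optimistic half of the scheme: its read took no lock, so the conflict is only detected when versions are compared at commit time.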
Failure handling of clients during transaction​
The client creates a monitor record when the transaction begins, recording the deadline: the time after which the transaction is considered aborted if it has not successfully committed. Before sending a write to a server node, the client logs the write in one of the monitor record's bins. Once the transaction successfully validates all reads, the monitor record is updated to indicate that the transaction is committed by setting a flag, which is stored in another bin of the record. This flag acts as the commit bit; when it is written to disk, the transaction is considered committed. If the client fails during a transaction, the commit bit indicates the state of the transaction. The server periodically scans monitor records and either rolls back or finalizes each transaction based on the commit bit's state and the transaction's deadline.
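The recovery decision described above can be sketched as a small function. The `MonitorRecord` class, its field names, and the `recovery_action` function are hypothetical. The sketch also assumes that a set commit bit leads to finalization regardless of the deadline, and that an uncommitted transaction is left alone until its deadline passes; the text above does not spell out either detail.

```python
from dataclasses import dataclass, field


@dataclass
class MonitorRecord:
    """Hypothetical model of a monitor record; field names are illustrative."""
    deadline: float                                  # after this time an uncommitted txn counts as aborted
    write_keys: list = field(default_factory=list)   # keys logged before each write is sent
    committed: bool = False                          # the commit bit


def recovery_action(monitor: MonitorRecord, now: float) -> str:
    """Decide what a periodic scan would do with one monitor record."""
    if monitor.committed:
        return "roll_forward"   # commit bit set: finalize all logged writes (assumption)
    if now > monitor.deadline:
        return "roll_back"      # deadline passed without commit: undo logged writes
    return "wait"               # transaction may still be in progress (assumption)


in_flight = MonitorRecord(deadline=100.0, write_keys=["a", "b"])
expired = MonitorRecord(deadline=100.0, write_keys=["a"])
done = MonitorRecord(deadline=100.0, write_keys=["a"], committed=True)

actions = [
    recovery_action(in_flight, now=50.0),
    recovery_action(expired, now=150.0),
    recovery_action(done, now=150.0),
]
```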
txn field
The txn field is a property of the base policy class. All of the command-level policy classes inherit the field from the base class. You can then set the txn property of the appropriate policy class instance and pass it in as you would in a normal client operation call.
ScanPolicy and QueryPolicy also inherit from the base policy class; however, the txn field is ignored for those commands. InfoPolicy does not extend the base policy class and therefore does not have a txn property.
Read operations
All read operations included in the transaction must have their policy.txn parameter set.