Policies
Aerospike allows reading and writing with great flexibility. With an Aerospike client policy, you can create read-modify-write patterns of optimistic concurrency, control the time to live, and choose whether to write a record based on whether it already exists.
Operations of this type are very quick, because information such as the generation and time to live is stored in the primary index. No extra work is needed to retrieve the data object.
These policies affect both database operations and client operations. Many policies are used to send the appropriate wire protocol commands to the server. Other policies, such as maxRetries, affect client operation.
These policies exist in each client, with slightly different APIs. After you determine which policies your application needs, see the client-specific documentation for the precise syntax.
Set Default Client Policies
You can create default client policies for each AerospikeClient instance. The following example demonstrates how to set policy defaults in the Java client. For language-specific examples, see the documentation for your client.
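import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Host;
import com.aerospike.client.policy.ClientPolicy;
import com.aerospike.client.policy.CommitLevel;
import com.aerospike.client.policy.ReadModeAP;
import com.aerospike.client.policy.Replica;

// Set client default policies. Timeouts are in milliseconds.
ClientPolicy clientPolicy = new ClientPolicy();
clientPolicy.readPolicyDefault.replica = Replica.MASTER;
clientPolicy.readPolicyDefault.readModeAP = ReadModeAP.ONE;
clientPolicy.readPolicyDefault.socketTimeout = 100;
clientPolicy.readPolicyDefault.totalTimeout = 100;
clientPolicy.writePolicyDefault.commitLevel = CommitLevel.COMMIT_ALL;
clientPolicy.writePolicyDefault.socketTimeout = 500;
clientPolicy.writePolicyDefault.totalTimeout = 500;
// Connect to the cluster.
AerospikeClient client = new AerospikeClient(clientPolicy, new Host("seed1", 3000));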
Set Per-Transaction Client Policies
To set policies on a per-transaction basis, pass the desired policy to the individual API call. For example, to perform writes with the master commit level (COMMIT_MASTER):
// Make a copy of the client's default write policy.
WritePolicy policy = new WritePolicy(client.writePolicyDefault);
// Change commit level.
policy.commitLevel = CommitLevel.COMMIT_MASTER;
// Write record with modified write policy.
client.put(policy, key, bins);
The client policies provided at the client connection level can be overridden at the individual transaction level.
Policy Definitions
The following section describes the Aerospike Java client policies. Other clients use similar constructs.
Replica
Policy.replica specifies which replica the client reads from during a single-record transaction. Write operations always go to the node that owns the master partition of the record.
When the client reads from a strong consistency namespace, the replica policy is ignored unless the SC read mode explicitly relaxes the strong consistency guarantees by selecting ALLOW_REPLICA or ALLOW_UNAVAILABLE.
- SEQUENCE (default): Read from the node that owns this record's master partition first. If a timeout occurs and retries are enabled, try a node that owns the record's replica partition.
- PREFER_RACK: Read first from a node on the same rack as the client that holds either the master or a replica partition for this record. If no node in the specified rack holds a partition for this record, the client switches to SEQUENCE.
- MASTER: Read from the node that owns this record's master partition.
- MASTER_PROLES: Distribute reads in round-robin fashion across the nodes that own the record's master and replica partitions (proles).
- RANDOM: Distribute reads across all nodes in the cluster in round-robin fashion. Recommended only when the namespace replication-factor equals the number of nodes in the cluster.
The load caused by reading a hot key can be reduced by up to a factor equal to the replication factor. Consider using Replica.MASTER_PROLES to distribute reads across the master and replicas (proles).
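To make the node-selection rules concrete, here is a hypothetical, stdlib-only Java model of SEQUENCE, MASTER, and MASTER_PROLES. ReplicaModel and chooseNode are illustrative names, not the Aerospike client API. The model assumes the partition owners are passed as a list with the master first, and it omits PREFER_RACK and RANDOM; the real client's node selection is more involved.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of replica selection; NOT the Aerospike client's implementation.
final class ReplicaModel {
    enum Replica { MASTER, SEQUENCE, MASTER_PROLES }

    // Round-robin cursor for MASTER_PROLES (assumption: one cursor per client).
    private final AtomicInteger cursor = new AtomicInteger();

    /**
     * Picks the node for one read attempt.
     *
     * @param owners  nodes owning the record's partition: index 0 is the master, the rest are proles
     * @param policy  replica policy
     * @param attempt 0 for the initial attempt, 1 for the first retry, and so on
     */
    String chooseNode(List<String> owners, Replica policy, int attempt) {
        switch (policy) {
            case MASTER:
                // Always the master, even on retries.
                return owners.get(0);
            case SEQUENCE:
                // Master first; each retry moves on to the next owner.
                return owners.get(attempt % owners.size());
            case MASTER_PROLES:
                // Spread reads over master and proles; floorMod stays non-negative on overflow.
                return owners.get(Math.floorMod(cursor.getAndIncrement(), owners.size()));
            default:
                throw new IllegalArgumentException("unsupported policy: " + policy);
        }
    }
}
```

With two owners, successive MASTER_PROLES reads alternate between the master and the prole, which is how a hot key's read load gets split across copies.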
AP Read Mode
For read operations against namespaces configured to operate in AP mode (high availability), Policy.readModeAP specifies how many replicas to consult when the cluster is undergoing data rebalancing, in order to determine the most recent copy of the record. This policy is ignored when the cluster is stable.
- ONE (default): Read a single replica. Might return a stale version of the record when the cluster is rebalancing.
- ALL: Read from all the nodes holding the master and replica partitions of this record.
You can dynamically override this client policy with the namespace configuration parameter read-consistency-level-override.
SC Read Mode
Policy.readModeSC determines consistency for read operations against namespaces configured to operate with strong consistency (CP mode).
- SESSION (default): Ensures session consistency. The client sees an increasing sequence of record versions. The replica policy is ignored when this read mode is selected.
- LINEARIZE: Ensures linearizability. All clients see only an increasing sequence of record versions. The replica policy is ignored when this read mode is selected.
- ALLOW_REPLICA: The client may read from the master or any full (non-migrating) replica. Strong consistency guarantees that each read returns either the latest copy or a valid ancestor of the latest copy. This read mode combines with the replica policy.
- ALLOW_UNAVAILABLE: The client may read from the master, from any full (non-migrating) replica, or from unavailable partitions. Strong consistency guarantees are relaxed, and an increasing sequence of record versions is not guaranteed. This read mode combines with the replica policy.
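The interaction between the SC read mode and the replica policy reduces to a small rule, sketched below as a hypothetical Java model. ScReadModeModel and its methods are illustrative names, not client API; with the real client the mode is chosen through Policy.readModeSC.

```java
// Hypothetical model of the SC read-mode rules listed above; NOT client code.
final class ScReadModeModel {
    enum ReadModeSC { SESSION, LINEARIZE, ALLOW_REPLICA, ALLOW_UNAVAILABLE }

    /** True if reads in this mode combine with (honor) Policy.replica. */
    static boolean honorsReplicaPolicy(ReadModeSC mode) {
        return mode == ReadModeSC.ALLOW_REPLICA || mode == ReadModeSC.ALLOW_UNAVAILABLE;
    }

    /** True if the mode may read from unavailable partitions, relaxing SC guarantees. */
    static boolean mayReadUnavailablePartitions(ReadModeSC mode) {
        return mode == ReadModeSC.ALLOW_UNAVAILABLE;
    }
}
```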
Send Key
If enabled, send key (Policy.sendKey) sends the user-defined key in addition to the hash digest on both reads and writes. If the key is sent on a write, the key is stored with the record on the server and returned to the client on primary and secondary index queries.
Socket Timeout
Socket timeout (Policy.socketTimeout) specifies the socket idle timeout in milliseconds when processing a database command.
If socketTimeout is not zero and the socket has been idle for at least socketTimeout, both maxRetries and totalTimeout are checked. If neither maxRetries nor totalTimeout has been exceeded, the transaction is retried.
If both socketTimeout and totalTimeout are non-zero and socketTimeout > totalTimeout, then socketTimeout is set to totalTimeout.
If socketTimeout is zero, there is no socket idle limit.
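The clamping and idle-limit rules above can be written as two small functions. This is a hypothetical Java model of the rules as stated in this section only; SocketTimeoutModel and its methods are illustrative names, and the real client may apply additional adjustments.

```java
// Hypothetical model of the socketTimeout rules stated above; NOT part of the Aerospike client API.
final class SocketTimeoutModel {
    /** Socket idle timeout actually applied, in ms. 0 means "no limit" for either argument. */
    static int effectiveSocketTimeout(int socketTimeout, int totalTimeout) {
        if (socketTimeout > 0 && totalTimeout > 0 && socketTimeout > totalTimeout) {
            return totalTimeout; // the socket timeout is capped by the total timeout
        }
        return socketTimeout;
    }

    /** True if a socket idle for idleMs has hit the socket timeout (never, when the timeout is 0). */
    static boolean socketTimedOut(int socketTimeout, long idleMs) {
        return socketTimeout != 0 && idleMs >= socketTimeout;
    }
}
```

For example, socketTimeout = 500 with totalTimeout = 100 behaves like socketTimeout = 100.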
Total Timeout
Total timeout (Policy.totalTimeout) specifies the total transaction timeout in milliseconds.
The totalTimeout is tracked on the client and sent to the server along with the transaction in the wire protocol. The client will most likely time out first, but the server can also time out the transaction.
If totalTimeout is not zero and is reached before the transaction completes, the transaction aborts with a timeout exception.
Setting Policy.totalTimeout to 0 is equivalent to setting no timeout on the client side, with the result that the server uses its default timeout setting instead.
Max Retries
Max retries (Policy.maxRetries) specifies the maximum number of retries before aborting the current transaction. The initial attempt is not counted as a retry.
If maxRetries is exceeded, the transaction aborts with a timeout exception.
Database writes that are not idempotent, such as add(), should not be retried, because the write may be applied multiple times if earlier attempts timed out on the client but still completed on the server. Use a distinct WritePolicy with maxRetries set to zero for non-idempotent writes.
Default for read: 2 (initial attempt + 2 retries = 3 attempts)
Default for write/query/scan: 0 (no retries)
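The interplay of maxRetries and totalTimeout can be sketched as an attempt loop. This is a hypothetical Java model, not the client's retry code: RetryModel and run are illustrative names, the clock is injected so the logic is testable, and IllegalStateException stands in for the client's timeout exception.

```java
import java.util.function.IntPredicate;
import java.util.function.LongSupplier;

// Hypothetical model of the maxRetries/totalTimeout rules above; NOT the Aerospike client's code.
final class RetryModel {
    /**
     * Runs attempts until one succeeds, maxRetries is exceeded, or totalTimeout expires.
     *
     * @param maxRetries   retries allowed after the initial attempt
     * @param totalTimeout total budget in ms; 0 means no client-side deadline
     * @param clock        supplies the current time in ms
     * @param attempt      returns true if attempt n (0 = initial attempt) succeeded
     * @return the number of attempts made, including the successful one
     */
    static int run(int maxRetries, long totalTimeout, LongSupplier clock, IntPredicate attempt) {
        long deadline = totalTimeout > 0 ? clock.getAsLong() + totalTimeout : Long.MAX_VALUE;
        for (int n = 0; ; n++) {
            if (attempt.test(n)) {
                return n + 1;
            }
            // After attempt n fails, n retries have been used; another is allowed only if n < maxRetries.
            if (n >= maxRetries || clock.getAsLong() >= deadline) {
                throw new IllegalStateException("timeout after " + (n + 1) + " attempt(s)");
            }
        }
    }
}
```

With the read default of maxRetries = 2, a command that keeps failing is attempted three times; a non-idempotent write configured with maxRetries = 0 is attempted exactly once.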
Sleep Between Retries
Sleep between retries (Policy.sleepBetweenRetries) is the number of milliseconds to sleep between retries. Set it to zero to skip sleeping. This field is ignored when maxRetries is zero. It is also ignored in async mode.
The sleep occurs only on connection errors and server timeouts, which suggest that a node is down and the cluster is reforming. The sleep does not occur when the client's socketTimeout expires.
Reads do not have to sleep when a node goes down, because the cluster does not shut out reads during cluster reformation. The default for reads is zero.
The default for writes is also zero, because writes are not retried by default. Writes can wait for the cluster to reform when a node goes down. Immediate write retries on node failure have been shown to consistently result in errors. If maxRetries is greater than zero on a write, set sleepBetweenRetries high enough to allow the cluster to reform (>= 500 ms).
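The conditions under which the sleep applies can be summarized as a single decision function. This is a hypothetical Java model of the rules above; SleepModel, sleepBeforeRetry, and the Failure categories are illustrative, not client API.

```java
// Hypothetical model of when sleepBetweenRetries applies; NOT the Aerospike client's code.
final class SleepModel {
    enum Failure { CONNECTION_ERROR, SERVER_TIMEOUT, CLIENT_SOCKET_TIMEOUT }

    /** Milliseconds to sleep before the next retry (0 = retry immediately). */
    static int sleepBeforeRetry(int maxRetries, int sleepBetweenRetries, boolean async, Failure failure) {
        if (maxRetries == 0 || async) {
            return 0; // field ignored: no retries configured, or async mode
        }
        if (failure == Failure.CLIENT_SOCKET_TIMEOUT) {
            return 0; // expiry of the client's socketTimeout never triggers a sleep
        }
        return sleepBetweenRetries; // connection error or server timeout: the cluster may be reforming
    }
}
```

Following the guidance above, a write policy that retries would pair maxRetries > 0 with sleepBetweenRetries of at least 500.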
Write Mode
WritePolicy.recordExistsAction specifies how to handle writes when the record already exists.
- CREATE_ONLY: Insert the record, and fail if it already exists.
- UPDATE_ONLY: Update the record, and fail if it does not exist. Merges new bin data into the existing record.
- UPDATE (default): Update or insert (upsert) the record. Merges new bin data if the record exists.
- REPLACE: Create or replace the record. Delete existing bins not mentioned in this write operation.
- REPLACE_ONLY: Replace the record, and fail if it does not exist. Delete existing bins not mentioned in this write operation.
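The difference between merging (UPDATE, UPDATE_ONLY) and replacing (REPLACE, REPLACE_ONLY) is easiest to see on an in-memory store. The sketch below is a hypothetical Java model in which a record is a map of bin names to values; WriteModeModel is an illustrative name, and IllegalStateException stands in for the server's error.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of RecordExistsAction; NOT the Aerospike client API.
final class WriteModeModel {
    enum RecordExistsAction { UPDATE, UPDATE_ONLY, REPLACE, REPLACE_ONLY, CREATE_ONLY }

    private final Map<String, Map<String, Object>> store = new HashMap<>();

    /** Applies a write; throws IllegalStateException where the server would return an error. */
    void write(String key, Map<String, Object> bins, RecordExistsAction action) {
        Map<String, Object> existing = store.get(key);
        boolean exists = existing != null;
        switch (action) {
            case CREATE_ONLY:
                if (exists) throw new IllegalStateException("record already exists");
                store.put(key, new HashMap<>(bins));
                break;
            case UPDATE_ONLY:
                if (!exists) throw new IllegalStateException("record not found");
                existing.putAll(bins); // merge new bins into the existing record
                break;
            case UPDATE:
                if (exists) existing.putAll(bins); else store.put(key, new HashMap<>(bins));
                break;
            case REPLACE_ONLY:
                if (!exists) throw new IllegalStateException("record not found");
                store.put(key, new HashMap<>(bins)); // bins not in this write are dropped
                break;
            case REPLACE:
                store.put(key, new HashMap<>(bins)); // create, or drop bins not in this write
                break;
        }
    }

    Map<String, Object> get(String key) {
        return store.get(key);
    }
}
```

Writing bin b with UPDATE to a record holding bin a leaves both bins; writing bin c with REPLACE leaves only c.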
Write Commit Level
WritePolicy.commitLevel specifies whether the node that owns the master partition of the record must wait until it successfully writes to all replicas before it returns success. Write operations include insert, update, upsert, delete, and UDF calls.
- COMMIT_ALL (default): Wait until the node that owns the master partition writes to all replicas.
- COMMIT_MASTER: Return success after writing to the master replica, and replicate to the prole replicas asynchronously.
COMMIT_ALL is required when writing to a strong consistency namespace; otherwise, a write error occurs.
Since Database 5.7, if the client pushes write transactions at a higher rate than the server's replication system can handle, backpressure causes the server to convert write transactions to COMMIT_ALL.
You can dynamically override this client policy with the namespace configuration parameter write-commit-level-override.
Write Generation Policy
The generation policy (WritePolicy.generationPolicy) specifies how to handle record writes based on record generation.
Record generation is an internal counter that uses integer values and that Aerospike increments every time you update a record. ("Generation" in this context does not mean "the act of generating", but "version".) When a record is inserted, the counter starts at 1. Therefore, a record for which the counter is currently at, say, 5, has been updated four times. Client applications cannot directly change the value of the counter. Reading a record does not cause Aerospike to increment its counter.
When Aerospike is in Available and Partition-tolerant (AP) mode, Aerospike resets a record's counter to 1 after it has been updated 64K times. When Aerospike is in strong-consistency mode, it resets a record's counter to 1 after the record has been updated 1K times.
Client applications can use this counter to coordinate a read-modify-write sequence of operations with other client applications.
For example, suppose a client application needs to read data from a record, modify the data, and then write the modified data back into the record. Reading the record requires a lock on it, as does writing to the record. However, during the time the client app modifies data, it holds no lock on the record. Another client app can update the same record before the first client app is able to obtain a write lock and write the modified data.
If the generation policy is set to EXPECT_GEN_EQUAL or EXPECT_GEN_GT:
- During the read operation, the client app also reads the record's generation value.
- After modifying the data, the client app sends that generation value with the write. Once the write lock on the record is obtained, the server compares the sent value with the current value of the counter.
One of the following situations occurs:
- If the generation policy is set to EXPECT_GEN_EQUAL:
  - If the generation value sent by the client equals the generation value on the server, the modified data is written to the record.
  - If the generation value sent by the client does not equal the generation value on the server, the write is not performed. The client app can retry the read-modify-write sequence.
- If the generation policy is set to EXPECT_GEN_GT:
  - If the generation value sent by the client is strictly greater than the generation value on the server, the modified data is written to the record.
  - If the generation value sent by the client is not greater than the generation value on the server, the write is not performed. The client app can retry the read-modify-write sequence.
- If the generation policy is set to NONE:
  - The client app does not use the generation counter when reading data from the record.
  - After modifying the data that it read, it writes the modified data to the record unconditionally.
With the generation policy set to EXPECT_GEN_EQUAL or EXPECT_GEN_GT, a write that fails the generation check returns error code 3, AS_ERR_GENERATION. In this case, the fail_generation and the generic client_write_error statistics increment on the server.
Possible values:
- NONE (default): Client apps do not use the record-generation counter to restrict writes.
- EXPECT_GEN_EQUAL: Client apps update or delete records only where the generation value sent by the client equals the generation value on the server. Otherwise, write operations fail, and client apps can retry them.
- EXPECT_GEN_GT: Client apps update or delete records only where the generation value sent by the client is greater than the generation value on the server. Otherwise, write operations fail, and client apps can retry them. This value is useful when you restore records from a backup and want to write only those records for which the server holds an older version.
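The optimistic read-modify-write pattern described above can be sketched against a single in-memory record. This is a hypothetical Java model, not the Aerospike client API: GenerationModel, write, readModifyWrite, and GenerationError (standing in for AS_ERR_GENERATION) are illustrative names, and the model ignores generation wraparound.

```java
import java.util.function.UnaryOperator;

// Hypothetical single-record model of the generation check; NOT the Aerospike client API.
final class GenerationModel {
    enum GenerationPolicy { NONE, EXPECT_GEN_EQUAL, EXPECT_GEN_GT }

    /** Thrown where the server would return error code 3 (AS_ERR_GENERATION). */
    static final class GenerationError extends RuntimeException {
        GenerationError(String msg) { super(msg); }
    }

    private long value;
    private int generation = 1; // a newly inserted record starts at generation 1

    synchronized long value() { return value; }
    synchronized int generation() { return generation; }

    /** Server-side write: check the generation sent by the client, then increment it. */
    synchronized void write(long newValue, GenerationPolicy policy, int sentGeneration) {
        boolean ok;
        switch (policy) {
            case EXPECT_GEN_EQUAL: ok = sentGeneration == generation; break;
            case EXPECT_GEN_GT:    ok = sentGeneration > generation;  break;
            default:               ok = true;                         break;
        }
        if (!ok) {
            throw new GenerationError("sent generation " + sentGeneration + ", server has " + generation);
        }
        value = newValue;
        generation++; // every successful update increments the counter
    }

    /** Read value and generation, modify, write with EXPECT_GEN_EQUAL; re-read and retry on conflict. */
    static int readModifyWrite(GenerationModel record, UnaryOperator<Long> modify, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            long seen;
            int gen;
            synchronized (record) { seen = record.value; gen = record.generation; }
            try {
                record.write(modify.apply(seen), GenerationPolicy.EXPECT_GEN_EQUAL, gen);
                return attempt;
            } catch (GenerationError e) {
                // Another writer updated the record since our read: loop to re-read and retry.
            }
        }
        throw new GenerationError("gave up after " + maxAttempts + " attempts");
    }
}
```

If another writer updates the record between the read and the write, the first EXPECT_GEN_EQUAL write is rejected and the loop succeeds on the second attempt using the fresh value.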
Expiration (Time To Live)
Record expiration (WritePolicy.expiration), or time to live (TTL), is the number of seconds the record lives before the server removes it. Expiration values:
- -2: Do not change the TTL when the record is updated.
- -1: Never expire.
- 0: Default to the namespace configuration parameter default-ttl on the server.
- > 0: Actual TTL in seconds.
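The expiration conventions map to a small lookup. The following hypothetical Java helper spells them out; ExpirationModel, resultingTtl, and the NEVER_EXPIRE marker are illustrative names, and the -2 case assumes the record already exists.

```java
// Hypothetical helper spelling out the WritePolicy.expiration values listed above; NOT client API.
final class ExpirationModel {
    static final int NEVER_EXPIRE = -1;

    /**
     * @param expiration          value sent with the write
     * @param currentTtl          the record's TTL before the write (used only for -2; assumes the record exists)
     * @param namespaceDefaultTtl the namespace's default-ttl on the server
     * @return the record's TTL in seconds after the write, or NEVER_EXPIRE
     */
    static long resultingTtl(int expiration, long currentTtl, long namespaceDefaultTtl) {
        if (expiration == -2) return currentTtl;          // keep the record's existing TTL
        if (expiration == -1) return NEVER_EXPIRE;        // never expire
        if (expiration == 0)  return namespaceDefaultTtl; // fall back to default-ttl
        if (expiration > 0)   return expiration;          // explicit TTL in seconds
        throw new IllegalArgumentException("unsupported expiration: " + expiration);
    }
}
```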
Aerospike Database 7.1 introduced an LRU eviction behavior. Client versions that implement this functionality can control when reads extend a record's void-time, regardless of namespace configuration, with Policy.readTouchTtlPercent:
- A value of 0 instructs the server to use the default-read-touch-ttl-pct of the namespace or set.
- A value of -1 states that this read operation never modifies the record's TTL.
- A value of 1-100 means that this read also touches the record, extending its TTL, if the record's remaining lifetime is within this percentage of the TTL set by its most recent write.
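The touch decision can be expressed as one comparison. The sketch below is a hypothetical Java model: ReadTouchModel and shouldTouch are illustrative names, it assumes a server-side default of 0 disables touching, and the server's exact arithmetic may differ. For example, with a 10-hour TTL from the last write and a percentage of 80, a read touches the record once 8 hours or less remain.

```java
// Hypothetical model of the read-touch decision described above; NOT the server's implementation.
final class ReadTouchModel {
    /**
     * @param readTouchTtlPercent  client value: 0 = use server default, -1 = never touch, 1..100 = percent
     * @param serverDefaultPercent default-read-touch-ttl-pct of the namespace/set (assumption: 0 disables)
     * @param lastWriteTtl         TTL in seconds set by the record's most recent write
     * @param remainingTtl         seconds left until the record's void-time
     * @return true if this read should touch the record, resetting its TTL
     */
    static boolean shouldTouch(int readTouchTtlPercent, int serverDefaultPercent,
                               long lastWriteTtl, long remainingTtl) {
        int pct = readTouchTtlPercent == 0 ? serverDefaultPercent : readTouchTtlPercent;
        if (pct <= 0 || lastWriteTtl <= 0) {
            return false; // touching disabled, or the record has no finite TTL
        }
        // Touch once the remaining lifetime is at most pct% of the last write's TTL.
        return remainingTtl * 100 <= lastWriteTtl * pct;
    }
}
```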
Durable Delete
When WritePolicy.durableDelete is true, a record delete leaves a tombstone. This prevents deleted records from reappearing after node failures.
When a namespace is configured with strong consistency, regular deletes (expunges) are blocked unless the configuration parameter strong-consistency-allow-expunge is enabled to relax strong consistency.