
Serializable transactions and the price of getting concurrency right

Learn how serializable transactions ensure strict data consistency, how databases implement them, and the performance tradeoffs in distributed systems.

July 18, 2025 | 15 min read
Alexander Patino
Solutions Content Leader

Serializable transactions are database transactions executed with the strongest level of isolation. This means the outcome of concurrently executing transactions is the same as if those transactions had run one after another in some order. So all transactions appear to execute serially, one-by-one, even if in reality they run in parallel. 

As a result, no transaction sees the partial effects of any other, which eliminates serialization anomalies and common transaction isolation issues such as dirty reads, non-repeatable reads, and phantom reads. Weaker levels such as read uncommitted, read committed, and repeatable read are faster, but under the SQL standard definitions, only the serializable isolation level prevents all of these anomalies.

With serializable transactions, developers do not need to reason about complex interleavings of operations. Instead, they assume each transaction runs in isolation on a consistent database state. If each individual transaction preserves correctness, then under serializability, any interleaving of transactions also preserves correctness. 

In practice, this simplifies application logic and helps keep data accurate. For example, with serializable isolation, two concurrent orders cannot both oversell the last item in stock, and a funds transfer cannot lose or double-count money due to timing issues. The database system produces a consistent outcome as if it had processed transactions one at a time.
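
To make this concrete, here is a minimal sketch of a funds transfer run at the serializable level, assuming a PostgreSQL database accessed through the psycopg2 driver; the connection string, table, and account IDs are illustrative only:

```python
import psycopg2
from psycopg2.extensions import ISOLATION_LEVEL_SERIALIZABLE

# Illustrative connection and schema; adjust to your environment.
conn = psycopg2.connect("dbname=bank user=app")
conn.set_session(isolation_level=ISOLATION_LEVEL_SERIALIZABLE)

with conn:  # commits on success, rolls back on any exception
    with conn.cursor() as cur:
        # Under serializable isolation, the transfer behaves as if it ran
        # alone: no other transaction can observe one update without the other.
        cur.execute("UPDATE accounts SET balance = balance - 100 WHERE id = %s", (1,))
        cur.execute("UPDATE accounts SET balance = balance + 100 WHERE id = %s", (2,))
```

If the database detects a conflicting concurrent transaction, the commit fails with a serialization error and the application simply retries, a pattern covered in more detail below.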

How databases stay serializable

Enforcing serializable transactions requires concurrency control. Broadly, database systems use one of three approaches, or a blend of those approaches, for serializability:

Two-phase locking (2PL)

In classic relational databases, strict two-phase locking is a common mechanism for serializable execution. Under 2PL, a transaction acquires one or more shared locks for read operations and exclusive locks for writes. These are held until the current transaction completes. In systems such as Microsoft SQL Server, this means transaction A cannot modify data being read by transaction B until the locks are released. No other transaction has access to locked data in a conflicting way. 

By acquiring locks in a growing phase and releasing them only in a shrinking phase after all operations complete, 2PL forces an equivalent serial order. This prevents anomalies because transactions queue up for access. For instance, if one transaction is transferring funds between two accounts and holds locks on those account records, a second transaction that tries to read or modify those same accounts must wait until the first commits. By blocking overlapping access, 2PL ensures the second transaction sees only a fully committed state.
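
The mechanics can be illustrated with a toy lock manager in Python. This is a deliberately simplified sketch of strict 2PL (exclusive locks only, no shared locks or deadlock detection), not how a production engine implements it:

```python
import threading
from collections import defaultdict

class LockManager:
    """Toy strict 2PL: locks are acquired as needed (growing phase) and
    released only when the transaction finishes (shrinking phase)."""
    def __init__(self):
        self._locks = defaultdict(threading.Lock)  # one exclusive lock per item

    def acquire(self, txn_locks, item):
        lock = self._locks[item]
        lock.acquire()              # blocks while another transaction holds it
        txn_locks.append(lock)

    def release_all(self, txn_locks):
        for lock in reversed(txn_locks):
            lock.release()
        txn_locks.clear()

manager = LockManager()
accounts = {"A": 100, "B": 50}

def transfer(src, dst, amount):
    held = []                        # locks this transaction currently holds
    try:
        manager.acquire(held, src)   # growing phase: lock both accounts
        manager.acquire(held, dst)
        accounts[src] -= amount      # no other transaction can touch these
        accounts[dst] += amount      # records until the locks are released
    finally:
        manager.release_all(held)    # shrinking phase: release only at the end
```

Because two such transactions locking the same accounts in opposite orders can wait on each other forever, real systems pair 2PL with deadlock detection or lock timeouts.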

To prevent phantom reads, where new rows appear in a query result during a transaction, lock-based systems may use range locks or predicate locks on sets of rows, so that inserts or deletes by other transactions are also serialized. The downside is that locking reduces concurrency: long-held locks cause other transactions to wait, and cyclical waits lead to deadlocks that require aborting transactions.

Multi-version concurrency control (MVCC) with validation

Many databases use MVCC to reduce locking. MVCC lets readers and writers proceed in parallel by keeping multiple versions of data. A transaction sees a snapshot of the database as of its start time, so it never reads half-finished changes; writes create new versions without blocking readers. 

Basic snapshot isolation, as implemented in systems such as Oracle and PostgreSQL in “repeatable read” mode, prevents dirty and non-repeatable reads by using snapshots, but it isn’t serializable because it allows write skew or phantoms in certain cases.

For serializable isolation, MVCC-based systems add a validation step at commit time. This is often called serializable snapshot isolation (SSI). With SSI, the database checks whether any concurrent committed transaction wrote data that conflicts with what the current transaction read. If a conflict is detected, meaning the concurrent schedule could not be equivalent to any serial order, the committing transaction is aborted or rolled back to preserve serializability. 

For example, PostgreSQL’s serializable mode uses this technique. Transactions execute on a snapshot, and only upon commit does the system detect dangerous patterns, such as two transactions concurrently updating separate rows based on each other’s initial state. If such a pattern occurs, one transaction will be aborted with a serialization failure, forcing the application to retry. 

This optimistic approach avoids locking during execution, but it shifts the burden to commit time and leads to more transaction restarts under high contention.
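
In practice, this means the application must be prepared to retry. A minimal retry wrapper for PostgreSQL using psycopg2, which raises errors.SerializationFailure (SQLSTATE 40001) when SSI aborts a transaction, might look like this; the helper name and attempt count are just placeholders:

```python
import psycopg2
from psycopg2 import errors

def run_serializable(conn, work, max_attempts=5):
    """Run work(cursor) in a SERIALIZABLE transaction, retrying the whole
    transaction whenever PostgreSQL aborts it with a serialization failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            with conn:                      # rolls back automatically on error
                with conn.cursor() as cur:
                    cur.execute("SET TRANSACTION ISOLATION LEVEL SERIALIZABLE")
                    return work(cur)
        except errors.SerializationFailure:
            if attempt == max_attempts:
                raise                       # surface the failure after too many tries
            # Otherwise loop and re-run work() against a fresh snapshot.
```

The key point is that the entire transaction body, not just the failing statement, is re-executed from scratch.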

Distributed commit and ordering

In a single-node database, enforcing serializability is relatively straightforward using these techniques. In distributed databases with multiple nodes or shards, an additional challenge is coordinating transactions across nodes so the global result is serializable. Typically, this involves a distributed commit protocol, such as two-phase commit, to atomically commit a transaction’s changes on all nodes, and a method to order transactions consistently. Some systems assign global timestamps or use logical clocks to order transactions and have every node commit transactions in that timestamp order. 

The strictest form of this in a distributed setting is called external consistency or strict serializability. It often requires additional coordination or synchronization; for example, Google Spanner uses TrueTime, an API backed by GPS and atomic clocks, to assign timestamps that respect real-time order. The core idea is that whether transactions touch data on one node or many, they are prevented from committing in a way that violates an equivalent sequential order.
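
As a rough illustration of the commit side, here is a bare-bones two-phase commit coordinator sketch. The participant objects and their prepare/commit/abort calls are hypothetical stand-ins for per-node RPCs, and a real protocol also has to log its decision and survive coordinator failure:

```python
def two_phase_commit(participants, txn_id):
    """Bare-bones 2PC coordinator: commit only if every node votes yes."""
    prepared = []
    # Phase 1 (prepare): each node durably stages the transaction's writes
    # and votes on whether it can commit them.
    for node in participants:
        if node.prepare(txn_id):
            prepared.append(node)
        else:
            for p in prepared:          # any "no" vote aborts everywhere
                p.abort(txn_id)
            return False
    # Phase 2 (commit): the decision is now final; tell every node to apply it.
    for node in participants:
        node.commit(txn_id)
    return True
```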

Replacing Cassandra: A digital transformation story

Real-time identity and risk systems cannot afford slow or unpredictable data access. See how the world’s largest digital identity network replaced a 96-node Cassandra cluster with just 28 Aerospike nodes, cutting latency from 120 ms to 30 ms while processing more than 130 million transactions per day for fraud and trust decisions.

Performance tradeoffs of serializable isolation

Serializable transactions come at a cost. By design, serializability constrains concurrency, so a naive implementation hurts throughput and latency. Understanding these tradeoffs is important for enterprises running high-volume systems.

Locking overhead vs. aborts

If a database uses strict two-phase locking (2PL) for serializability, transactions may spend a lot of time waiting. Locks on frequently accessed records create contention; other transactions queue until the locks are released. This reduces throughput under heavy load, as many transactions are effectively executed one at a time. 

Long-running transactions make this worse by holding locks longer, and deadlocks occur when two transactions are each waiting on the other’s locks. Deadlocks must be detected and resolved by aborting one of the transactions, which means wasted work and retry overhead. Factors such as lock wait times and deadlock aborts add latency and limit the transactions-per-second rate a system can sustain under serializable isolation.

On the other hand, if the database uses an optimistic MVCC+SSI approach, it avoids most blocking during transaction execution. This preserves more read/write concurrency. 

However, you pay for it at commit time: When conflicts are detected, one of the conflicting transactions must be rolled back. Under high contention, many transactions might not commit on their first try, leading to frequent transaction aborts and retries. 

Retrying a transaction means additional computation and delay for that operation, and it increases the overall load on the system as those retried transactions do their work again. In essence, the system optimistically allows more parallel work, but some of that work may be thrown away on commit conflicts. This affects latency because a transaction that restarts takes longer to finish and briefly spikes resource usage. There’s also some CPU overhead in checking for conflicts and maintaining version metadata, as well as storage overhead for keeping multiple data versions and cleaning them up later.
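
One standard mitigation is to retry aborted transactions with jittered exponential backoff, so that transactions that collided once do not immediately collide again. A sketch, again assuming psycopg2's error classes (SerializationFailure for SSI aborts, DeadlockDetected for lock cycles):

```python
import random
import time
from psycopg2 import errors

RETRYABLE = (errors.SerializationFailure, errors.DeadlockDetected)

def with_backoff(conn, work, max_attempts=8):
    """Re-run an aborted transaction with jittered exponential backoff."""
    for attempt in range(max_attempts):
        try:
            with conn:
                with conn.cursor() as cur:
                    return work(cur)
        except RETRYABLE:
            if attempt == max_attempts - 1:
                raise
            # Sleep a random slice of a growing window before retrying.
            time.sleep(random.uniform(0, 0.01 * (2 ** attempt)))
```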

Scalability and hot spots

With serializable isolation, certain data items or tables become hot spots that limit scalability. For example, a high-traffic row, such as a product inventory count or a global account balance, will serialize many transactions either by lock waits or by conflict aborts. 

In a distributed database, enforcing a global serial order introduces additional synchronization. Some architectures might require a central coordinator or global locking service to sequence transactions across shards, which becomes a bottleneck. 

Even without a single coordinator, transactions spanning multiple partitions often need an agreement protocol, such as two-phase commit, which adds messaging overhead and commit latency. These factors mean serializability at scale requires careful design; otherwise, throughput might not increase linearly as nodes are added. Systems must mitigate these bottlenecks via techniques such as sharding data to avoid single hot records, batching operations, or using high-performance consensus algorithms.
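
A common way to relieve a single hot record is to split it into several sub-rows and spread updates across them, with reads aggregating the slots. A sketch of that idea; the table, columns, and slot count below are purely illustrative:

```python
import random

N_SLOTS = 16  # number of sub-rows the hot counter is split across

def decrement_stock(cur, item_id, amount=1):
    """Decrement one randomly chosen slot of a sharded counter so that
    concurrent transactions rarely conflict on the same row."""
    slot = random.randrange(N_SLOTS)
    cur.execute(
        "UPDATE item_stock_slots SET qty = qty - %s "
        "WHERE item_id = %s AND slot = %s AND qty >= %s",
        (amount, item_id, slot, amount),
    )
    return cur.rowcount == 1  # False means this slot ran dry; try another

def total_stock(cur, item_id):
    # Reads must sum across all slots to see the true total.
    cur.execute(
        "SELECT COALESCE(SUM(qty), 0) FROM item_stock_slots WHERE item_id = %s",
        (item_id,),
    )
    return cur.fetchone()[0]
```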

Despite these costs, database technology and hardware have narrowed the performance gap. With faster networks, in-memory processing, and refined concurrency algorithms, some platforms offer serializable and even strictly serializable transactions with surprisingly high throughput.

The key for practitioners is to use serializability judiciously, for the important sections of an application, and to be aware of the workload. In low-contention scenarios, the overhead might be negligible. Under heavy contention, one may need to scale up hardware or tolerate higher latency to retain the safety of serializable isolation. Overall, the performance tradeoff is a classic balance: absolute correctness versus maximum concurrency. Each organization must weigh how much consistency it needs for a given use case.

Serializable transactions in distributed systems

Keeping applications serializable is harder in distributed, high-performance data systems. Enterprises increasingly use databases across clusters of machines, multiple data centers, or even globally, for scalability and resilience. In such environments, maintaining one serial order across all transactions means the system must coordinate updates and handle network delays or failures gracefully.

A fundamental principle that guides distributed databases is the CAP theorem. It states that in the presence of a network partition, a system cannot simultaneously have absolute consistency and high availability. Traditionally, relational (ACID) databases favored consistency over availability: they would rather refuse a request or wait than return possibly inconsistent data. 

In contrast, many early NoSQL systems went for high availability and partition tolerance, accepting eventual or limited consistency, also known as the BASE approach, to keep latency low and uptime high. This meant that many NoSQL databases did not offer multi-record serializable transactions at all; they might only guarantee atomicity on single keys or provide weaker “eventual consistency” across replicas. The tradeoff was performance and availability in exchange for less strict correctness guarantees.

However, as enterprise use cases evolved, users needed stronger consistency in distributed systems. Distributed databases have pushed the envelope to provide serializable and strictly serializable transactions without sacrificing performance.

An additional constraint in distributed serializability is preserving real-time ordering across replicas. A system might be serializable yet allow a subtle anomaly: if one transaction commits on node A, a second transaction on node B might not immediately see it due to replication lag. If the second transaction is ordered in the serial schedule before the first, no serializability rule is technically broken; this execution can be viewed as equivalent to some serial order. But from the outside world, it’s puzzling: the second transaction started after the first committed, so intuitively it should have seen the first’s effects. 

This is where strict serializability comes in. Strict serializability, also known as external consistency, means the serial order of transactions respects their real-time order. In other words, if transaction X commits before transaction Y begins, then X will appear before Y in the global serial order. This property means no observer ever sees the system “go back in time”; a later transaction cannot be positioned before an earlier one. 

Strict serializability in a distributed database typically requires synchronizing clocks or coordinating commit order among nodes. For example, Google Spanner uses a globally synchronized clock called TrueTime to assign timestamps in a way that honors real-time order, resulting in external consistency across datacenters. The effect is that Spanner behaves like a single-machine database, where if one transaction finishes before another starts, every node agrees on that ordering. 

Other approaches use consensus protocols such as Paxos or Raft to agree on a commit sequence number for each transaction, so all replicas apply transactions in the same order. These measures introduce some latency, such as waiting for timestamp uncertainty or consensus rounds, but they prevent anomalies such as stale reads of committed data.
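
The commit-wait idea behind TrueTime-style external consistency can be sketched in a few lines. This is a heavy simplification, assuming a known worst-case clock uncertainty EPSILON between any two nodes; Spanner's actual protocol tracks this bound dynamically per timestamp:

```python
import time

EPSILON = 0.007  # assumed worst-case clock error between nodes, in seconds

def commit_with_wait(apply_writes):
    """Simplified commit-wait: assign a commit timestamp, apply the writes,
    then wait out the uncertainty window before acknowledging the client."""
    commit_ts = time.time()
    apply_writes(commit_ts)                 # make the writes durable at commit_ts
    # After this wait, every node's clock has passed commit_ts, so any
    # transaction that starts later is assigned a strictly later timestamp.
    while time.time() <= commit_ts + EPSILON:
        time.sleep(0.001)
    return commit_ts                        # now safe to acknowledge the commit
```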

It’s important to note that providing full ACID transactions in a distributed high-performance system is hard. Only recently have some NoSQL/NewSQL databases begun to offer this capability natively. 

For instance, Aerospike 8.0 introduced strictly serializable multi-record transactions using a new coordination mechanism, so developers no longer have to implement their own transaction logic on top of an eventually-consistent store. 

Similarly, some distributed cloud databases enforce serializable or stricter isolation across shards by default. These systems show that it is possible to get the best of both worlds: the strong correctness guarantees of traditional databases with the scale-out and low-latency characteristics of NoSQL. 

The tradeoff, as the CAP theorem implies, is that in the rare event of a network partition or outage, these systems may choose consistency over availability by temporarily rejecting writes or reads on some nodes to avoid inconsistency. For many enterprise applications, this is an acceptable price to pay for never violating data integrity.

In summary, serializable transactions work in distributed environments through design: either by coordination with locks/timestamps spanning nodes, or with consensus by enforcing one global commit order. 

While doing so used to mean sacrificing performance, advances such as MVCC, better interconnects, and hybrid timekeeping have narrowed the gap. The result is that enterprises now consider strongly consistent, transactionally secure databases even for large-scale, geographically distributed deployments, a development that combines the safety of ACID with the speed and scale that businesses demand.

Customer story: TransUnion TruAudience

TransUnion's TruAudience platform powers real-time identity resolution at the scale modern marketing demands, but Cassandra was missing SLAs with p99 latencies of 3.9 seconds and a 550-server sprawl that drained engineering focus. After moving to Aerospike, they cut p99 latency 100x to 23 milliseconds, shrunk their footprint by 87%, and dropped TCO by 68% over three years. Read the full story.

When are serializable transactions needed?

Not every application needs the overhead of serializable transactions, but certain situations do. In enterprise settings where data correctness is important, serializability becomes required despite its cost.

Financial systems and banking

In finance and banking, even the smallest anomaly has consequences. Consider money transfers or trading transactions: Concurrent operations can’t violate fundamental invariants such as the conservation of money. 

Serializable isolation is often required for financial transactions so that, for example, two concurrent withdrawals cannot together overdraw an account that shouldn’t go negative, and so that the total sum of money across accounts remains consistent at all times. 

Payment systems, banking ledgers, stock exchanges, and similar systems use serializable or stricter transactions to avoid inconsistencies. A classic example: if a bank offers “free checking” when a customer’s total balance exceeds a threshold, no timing quirk should let a withdrawal be processed without the deposit meant to cover it. Serializable, and especially strictly serializable, ordering ensures that if a deposit was confirmed before a withdrawal, every part of the system reflects that order, preventing an undue penalty or an incorrect negative balance.
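
The invariant in that example spans several rows, which is exactly where snapshot-based isolation allows write skew. A sketch of the withdrawal logic, meant to run inside a serializable transaction (for instance via the retry helper sketched earlier); the schema is illustrative:

```python
def withdraw(cur, customer_id, account_id, amount):
    """Withdraw only if the customer's combined balance stays non-negative.
    Under serializable isolation, two concurrent withdrawals that each pass
    this check cannot both commit; one is aborted and retried."""
    cur.execute(
        "SELECT SUM(balance) FROM accounts WHERE customer_id = %s",
        (customer_id,),
    )
    total = cur.fetchone()[0]
    if total - amount < 0:
        raise RuntimeError("insufficient combined funds")
    cur.execute(
        "UPDATE accounts SET balance = balance - %s WHERE id = %s",
        (amount, account_id),
    )
```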

Inventory and order management

In e-commerce and supply chain systems, accurate inventory counts and order fulfillment depend on avoiding concurrency bugs. Suppose an online store has one item left in stock and two customers try to buy it at nearly the same time. With weaker isolation, it’s possible that both transactions see the last item available and both “purchase” it, resulting in an oversell. 

Serializable transactions prevent this by processing such orders in a definitive sequence: One order sees the stock decremented by the other, and knows the item is no longer available. 
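
A sketch of that check-then-decrement, intended to run inside a serializable transaction; table and column names are illustrative:

```python
def purchase_last_item(cur, item_id, order_id):
    """Safe under serializable isolation: if two orders race for the last
    unit, the database aborts one instead of letting both see stock = 1."""
    cur.execute("SELECT stock FROM items WHERE id = %s", (item_id,))
    (stock,) = cur.fetchone()
    if stock < 1:
        raise RuntimeError("sold out")
    cur.execute("UPDATE items SET stock = stock - 1 WHERE id = %s", (item_id,))
    cur.execute(
        "INSERT INTO orders (id, item_id) VALUES (%s, %s)",
        (order_id, item_id),
    )
```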

Similarly, serializability prevents phantom reads in inventory queries. For example, if one transaction generates a report of all pending orders while another transaction adds new orders, serializable isolation means the report transaction sees a stable set of orders, with no missing or double-counted entries. In logistics, manufacturing, or ticket booking systems, this protects against duplicated bookings or inconsistent stock levels that could otherwise occur under high concurrency.

Critical consistency and compliance

Beyond finance and inventory, any application that requires a strictly consistent view of its data may need serializable transactions. This includes examples such as auditing and regulatory compliance, where reports must reflect a state of the data as if no other activity were interleaved. 

For instance, generating an end-of-day balance sheet or regulatory audit report should use serializable isolation to avoid discrepancies due to in-flight updates. Similarly, complex batch processing or analytics that involve multiple steps of reading and writing data benefit from serializability so they aren’t affected by outside changes mid-process. 

In multi-step workflows that update several records, such as updating an account and logging an audit trail together, wrapping those steps in a serializable transaction means other transactions won’t see a half-completed result or introduce conflicts. 
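
For instance, a balance change and its audit record can be written in one serializable transaction so that no reader ever sees one without the other; the table names here are only illustrative:

```python
def update_with_audit(cur, account_id, delta, reason):
    """Apply a balance change and its audit entry atomically and in isolation."""
    cur.execute(
        "UPDATE accounts SET balance = balance + %s WHERE id = %s",
        (delta, account_id),
    )
    cur.execute(
        "INSERT INTO audit_log (account_id, delta, reason) VALUES (%s, %s, %s)",
        (account_id, delta, reason),
    )
```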

In summary, whenever the correctness of an operation depends on viewing the database as a stable, isolated system, even while many users are interacting with it, that operation should use serializable isolation.

Many applications don’t need full serializability for every transaction. For performance reasons, they might use weaker isolation for simple, high-frequency operations and use serializable transactions for the critical sections. 

The key is to identify the parts of the workload where data integrity and business rules must never be violated. In those parts, serializable transactions act as an insurance policy, so no matter how the load or timing varies, the outcome will be as if transactions ran one by one in a sensible order. Enterprise systems that cannot afford surprises or inconsistencies need this.

Five signs you've outgrown Cassandra

Does your organization offer real-time, mission-critical services? Do you require predictable performance, high uptime and availability, and low TCO? If you answered yes to one or both of these questions, it is likely that your Cassandra database solution isn’t cutting it. Check out our white paper and learn how Aerospike can help you.

Aerospike and serializable transactions

For functions such as financial ledgers, inventory locks, or entitlement checks, eventual consistency is data corruption waiting to happen. You need deterministic behavior that holds up even when there’s a problem with the network. 

Aerospike handles this by moving away from the all-or-nothing coordination of traditional relational databases, which usually run into performance issues as you scale. Instead, it anchors transactional logic to partition primaries. This allows for:

  • Linearizable CP mode: Strong consistency for single-record operations without adding latency

  • ACID multi-record transactions: Strictly serializable updates so you never see a "partial" state or a ghost record

  • Localized coordination: Because coordination happens at the partition level, you aren't forcing the entire cluster to pause to validate one transaction

Why this matters in production

In a standard distributed setup, you're often forced to choose between "fast but risky" and "safe but sluggish." Aerospike bridges that gap by scoping correctness checks to the partitions a transaction actually touches. You get all-or-nothing updates and predictable tail latency because the system doesn't need to check in with a central coordinator for every move.

When a failure occurs, the recovery is deterministic. You don't have to untangle intermediate states or reconcile logs to figure out which transactions committed. It’s about building user-facing systems that stay accurate at scale, without the fragility that heavyweight distributed coordination often introduces.

Try Aerospike Cloud

Break through barriers with the lightning-fast, scalable, yet affordable Aerospike distributed NoSQL database. With this fully managed DBaaS, you can go from start to scale in minutes.