Achieving high performance for mixed workloads

Discover how Aerospike tackles mixed read-write workload challenges, hotspots, and strong consistency, delivering high performance and low latency for real-time applications.

Matt Sarrel
November 6, 2024 | 15 min read

Enterprises rely on NoSQL databases, such as Aerospike, MongoDB, Cassandra, and Redis, to support real-time applications with low-latency query responses over large amounts of data. However, many NoSQL databases struggle with mixed read and write workloads because of choices made during their design: the data consistency model, how they work with the underlying hardware, how they handle contention and manage concurrency, and how they replicate and distribute (partition or shard) data across nodes.

It isn’t easy to optimize a distributed database for both reads and writes; being fast at one typically goes hand in hand with being slow at the other. Below I explain some of the key challenges associated with mixed workloads and how we solved them in the Aerospike Database.

What are mixed workloads?

Mixed workloads in databases refer to systems where both read and write operations occur concurrently and at significant volume. In contrast to environments optimized purely for reads (like analytic databases) or those tuned for high write throughput (such as logging systems), mixed workloads must balance the demands of both types of operations. This balance is crucial in real-time applications—such as financial services, social media, or Internet of Things (IoT) platforms—where fresh data must be written and read almost simultaneously, requiring low-latency responses for both actions.

In a mixed workload, the challenge lies in ensuring that write-heavy processes (such as updating or inserting new records) do not bottleneck read operations, which may rely on indexes or caches. Conversely, read-heavy demands should not slow down writes, which need to maintain rapid throughput to keep data current. This complexity is compounded in distributed NoSQL databases, where data is sharded and replicated across multiple nodes. Here, balancing mixed workloads often involves optimizing partitioning, reducing hotspots (nodes with disproportionately high access), and managing contention for resources like memory, disk I/O, and network bandwidth.

Successful mixed workload management allows a database to support high-frequency transactions and real-time queries, a common requirement for mission-critical applications that must handle rapid, large-scale data flows while maintaining consistent performance.

Why are mixed read-write workloads hard?

When it comes to mixed workloads, where frequent writes coexist with real-time read requests, NoSQL database systems can face significant challenges. Let's explore some of the key issues and how they impact database performance.

The partitioning predicament

NoSQL databases often use partitioning and sharding to distribute data across multiple nodes, but they differ significantly in how they partition and shard. Two common approaches are key- or range-based partitioning and hash-based partitioning. Key- and range-based schemes, such as HBase's range partitioning (or Cassandra's optional order-preserving partitioner), assign records to partitions based on key ranges, key values, or specific attributes. When a database has to manage frequent writes while also serving real-time read requests, this kind of partitioning strategy may not align with actual access patterns. That can lead to hotspots (discussed in more detail below), where some partitions are accessed far more frequently than others, causing uneven load distribution across the system and decreasing overall performance. Hash-based partitioning, used by Aerospike (and by MongoDB when hashed shard keys are configured), distributes data evenly across nodes, reducing the risk of data skew and hotspots.
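
To make the contrast concrete, here is a minimal sketch (illustrative only, not any particular database's implementation) showing how sequential keys pile into a single range-based partition while hashing spreads the same keys evenly:

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 8  # kept small so the distribution is easy to read

def range_partition(key: str) -> int:
    # Toy range scheme: bucket by the key's first character.
    # Keys that share a prefix (very common in practice) land in the same bucket.
    return ord(key[0]) % NUM_PARTITIONS

def hash_partition(key: str) -> int:
    # Hash the whole key, then map the digest onto a partition.
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:2], "little") % NUM_PARTITIONS

keys = [f"user:{i:08d}" for i in range(10_000)]  # sequential keys, a common worst case

print("range:", Counter(range_partition(k) for k in keys))  # every key lands in one partition
print("hash: ", Counter(hash_partition(k) for k in keys))   # close to 1,250 keys per partition
```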

Hotspots: The bane of balanced distribution

Hotspots occur when certain partitions experience disproportionately high access rates compared to others because the system’s partitioning strategy does not align optimally with access patterns. 

In a mixed workload scenario, this uneven distribution can lead to:

  • Overloaded nodes handling popular partitions

  • Underutilized resources on nodes with less frequently accessed data

  • Increased latency for operations targeting busy partitions

  • Potential system instability due to resource exhaustion on hotspot nodes

The consistency conundrum

Many NoSQL systems implement eventual consistency models to enhance write performance. While this approach can boost throughput for write-heavy workloads, it introduces complications in mixed scenarios. In highly distributed environments where replication and sharding are used extensively, data takes time to propagate across nodes, which can cause temporary inconsistencies that affect read accuracy: a read that has not yet seen the latest write returns stale, outdated information.

In some asynchronously replicated systems like Cassandra, there is no designated master, so reads can go to any replica, leading to potential inconsistencies unless quorum reads and writes are used, which increases latency. Systems in which each partition has a designated master, such as Aerospike and YugabyteDB, avoid this issue by routing reads and writes for a record through that partition's master. There is always a tradeoff between consistency and latency: enforcing stronger consistency levels can increase latency, which impacts both read and write performance.
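
For context, here is a minimal sketch of the quorum arithmetic behind tunable consistency (Dynamo-style; the parameter names are illustrative): a read is only guaranteed to observe the latest committed write when the read and write replica sets must overlap.

```python
def quorum_overlaps(n_replicas: int, write_acks: int, read_acks: int) -> bool:
    # Classic rule of thumb: if W + R > N, every read set intersects every
    # write set, so at least one replica in the read holds the newest value.
    return write_acks + read_acks > n_replicas

# 3 replicas, write=ONE, read=ONE: lowest latency, but stale reads are possible.
print(quorum_overlaps(3, 1, 1))   # False
# 3 replicas, QUORUM writes and QUORUM reads (2 + 2 > 3): consistent reads,
# but each operation now waits on two replicas, which raises tail latency.
print(quorum_overlaps(3, 2, 2))   # True
```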

Resource rivalry: Reads vs. writes

Read and write operations compete for system resources, each placing different demands on the database. Writes generally require more intensive disk I/O because they can involve appending or updating records on storage devices. Reads, on the other hand, may be cached or indexed, reducing the need for disk access. However, in a mixed workload environment, write-heavy operations can overwhelm the storage subsystem, making reads slower, especially if the database is not optimized for concurrent operations. Traditional NoSQL systems may suffer from I/O bottlenecks due to inefficient disk usage, compaction, and garbage collection processes. See below for more about compaction and its impact on performance. 

Concurrency and contention

Managing concurrent reads and writes typically requires sophisticated locking mechanisms. In many NoSQL databases, high write volumes can lead to locking contention, where multiple write operations or read-write operations vie for access to the same data. If locks are not managed properly, performance degrades severely. Mixed workloads are especially prone to contention issues when there’s no clear optimization for concurrent operations. Some NoSQL databases (in particular Redis) even use a single thread for each partition to remove the need for locks, but this introduces other problems, such as difficulty in scaling performance.
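
As a minimal illustration of fine-grained (per-record) locking, the hypothetical sketch below serializes operations on the same key while letting operations on different keys proceed in parallel; it is not any database's actual locking code:

```python
import threading
from collections import defaultdict

_store: dict[str, int] = defaultdict(int)
_locks: dict[str, threading.Lock] = defaultdict(threading.Lock)
_lock_table_guard = threading.Lock()   # protects creation of per-key locks

def _lock_for(key: str) -> threading.Lock:
    with _lock_table_guard:
        return _locks[key]

def increment(key: str) -> None:
    # Contention occurs only when two operations target the same key,
    # unlike a global lock that serializes every read and write.
    with _lock_for(key):
        _store[key] += 1

threads = [threading.Thread(target=increment, args=(f"k{i % 4}",)) for i in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(dict(_store))   # {'k0': 25, 'k1': 25, 'k2': 25, 'k3': 25}
```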

Replication ramifications

Many NoSQL databases achieve fault tolerance through replication, where data is copied across multiple nodes to ensure availability in case of failure. The way data is placed on nodes (sharding) and replicated between nodes impacts the efficiency of read, write, and mixed read-write workloads. Hash-based sharding ensures an approximately even distribution of both reads and writes across the cluster. For mixed workloads, replication must balance read performance, write consistency, and fault tolerance. 

Most NoSQL databases are not designed for mixed workloads

Whether a database can perform predictably under a mixed read-write workload comes down to the database system and how it was optimized during design. Many NoSQL databases, such as ScyllaDB, Apache Cassandra, and YugabyteDB, rely on Sorted String Tables (SSTables) to persist data to storage. During ingestion, incoming writes are recorded in a commit log for durability and accumulated in in-memory memtables, where they are kept ordered for fast access; memtables are then flushed to disk as ordered, immutable SSTable files. Because SSTables are immutable, subsequent modifications, additions, and deletions are written to new SSTables rather than applied in place.

Such systems can perform this ingestion and write the SSTables very quickly. However, this design makes reads harder. A single lookup can involve multiple SSTables: old data lives in different files from new data, and an updated record may require checking several SSTables just to reassemble that one record. Bloom filters help by skipping SSTables that cannot contain the key, but the read still has to fan out across the candidate SSTables and merge their responses, and the overall inefficiency of this approach shows up as increased read latency.
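
Here is a conceptual sketch of that read path (illustrative only, not any specific database's code): the newest version of a key may live in any of several immutable SSTables, so a lookup probes them newest to oldest, using a Bloom-filter-like check to skip files that cannot contain the key.

```python
class SSTable:
    def __init__(self, records: dict[str, str]):
        self.records = dict(records)    # immutable once written
        self.key_set = set(records)     # stand-in for a real Bloom filter

    def might_contain(self, key: str) -> bool:
        return key in self.key_set      # a real Bloom filter can also false-positive

    def get(self, key: str):
        return self.records.get(key)

# Newest table first; older versions of "user:1" linger in older files.
sstables = [
    SSTable({"user:1": "v3"}),
    SSTable({"user:2": "v1", "user:1": "v2"}),
    SSTable({"user:1": "v1", "user:3": "v1"}),
]

def read(key: str):
    for table in sstables:              # each probe is a potential disk read
        if table.might_contain(key):
            value = table.get(key)
            if value is not None:
                return value            # newest version wins; older copies are obsolete
    return None

print(read("user:1"))   # 'v3' -- but the lookup may touch several files to find it
```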

The more data the system holds, the more SSTables exist. Recall that SSTables are immutable after they are written, so updates and deletions are written to new SSTables; when that happens, the versions of those records stored in older SSTables become obsolete. Writes stay just as fast as when the system was new, but reads get slower as the system grows. Without some way to prune deleted data and merge the updated data scattered across outdated SSTables, the system would end up with too many files, resulting in slow reads and wasted storage space.

Compaction as a necessary evil

Enter compaction, a process that merges data from existing SSTables into an entirely new SSTable file. Compaction deduplicates obsolete records, keeping only the most current version of each key found across the input SSTables. Deleted rows (indicated by a marker called a tombstone) and deleted columns are cleaned up, and the process creates a new index for the compacted SSTable file.
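
Conceptually, compaction looks like the following sketch (illustrative only): merge several immutable SSTables into one, keep only the newest version of each key, and drop keys whose newest version is a tombstone.

```python
TOMBSTONE = object()   # marker left behind when a key is deleted

def compact(sstables_newest_first: list[dict]) -> dict:
    merged: dict = {}
    # Walk oldest-to-newest so later versions overwrite earlier ones.
    for table in reversed(sstables_newest_first):
        merged.update(table)
    # Drop keys whose most recent version is a tombstone.
    return {key: value for key, value in merged.items() if value is not TOMBSTONE}

sstables = [
    {"user:1": "v3", "user:3": TOMBSTONE},   # newest
    {"user:1": "v2", "user:2": "v1"},
    {"user:1": "v1", "user:3": "v1"},        # oldest
]
print(compact(sstables))   # {'user:1': 'v3', 'user:2': 'v1'} -- one file, no stale copies
```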

Compaction improves read performance, but only after the fact. The compaction itself consumes significant CPU and disk bandwidth, and most systems cannot serve low-latency queries while compacting. Once the data has been compacted into a smaller set of SSTables, reads are fast again.

However, this cycle repeats. As more data is ingested, more SSTables accumulate, and eventually enough data has been written that reads become slow again, so the system compacts again. There has to be a better way.

Aerospike excels at mixed read-write workloads

Aerospike is a NoSQL database designed for high-performance, real-time workloads and built to handle mixed read and write operations efficiently. Its architecture addresses the challenges that commonly arise when processing concurrent reads and writes at low latency.

Hybrid Memory Architecture

Aerospike uses a Hybrid Memory Architecture (HMA) that combines in-memory processing with persistent storage, typically SSDs (solid-state drives). Unlike many NoSQL databases that load the entire data set into memory or use a disk-centric storage model, Aerospike stores indexes in memory and data on SSD. This allows it to process reads quickly while efficiently managing write throughput. Aerospike addresses block storage natively, reading from and writing to the SSD directly and bypassing the operating system's file system and page cache, which minimizes I/O bottlenecks. HMA enables Aerospike to handle heavy read and write loads simultaneously.
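
As a concrete illustration, here is a minimal read/write round trip using the Aerospike Python client (the host, namespace, and set names are placeholders for this sketch); the index lookup happens in RAM, and the record itself is read from or written to SSD:

```python
import aerospike

# Placeholder seed node; adjust the host/port, namespace, and set for your cluster.
config = {"hosts": [("127.0.0.1", 3000)]}
client = aerospike.client(config).connect()

key = ("test", "demo", "user:1")                  # (namespace, set, user key)
client.put(key, {"name": "Ada", "visits": 1})     # write: buffered, then flushed to a write block on SSD
(_, meta, record) = client.get(key)               # read: in-memory index lookup plus one SSD read
print(record)                                     # {'name': 'Ada', 'visits': 1}

client.close()
```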

It’s important to understand the difference between a database built for spinning disks that can run on SSDs and a database built for SSDs. Aerospike takes advantage of SSDs' physical characteristics. 

Highly optimized I/O path

Aerospike is designed with an optimized I/O path for SSDs, allowing it to handle write-heavy workloads without causing I/O bottlenecks. Traditional NoSQL databases often suffer from inefficient write amplification and garbage collection processes, but Aerospike minimizes these issues by optimizing how data is written to disk. This enables the system to maintain high write throughput while also serving low-latency reads.

Aerospike divides the SSD into fixed-size write blocks with a maximum size of 8MB. Rather than writing each record to the drive individually, Aerospike holds records in a memory buffer until there is enough data to fill a write block, then flushes the buffer to SSD as a single large write and moves on to the next write block. Writing in large, fixed-size blocks also prevents the fragmentation and performance problems caused by issuing many small writes to an SSD.
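
The sketch below shows the general idea of block-buffered writes (the sizes and names are illustrative, not Aerospike's internals): records accumulate in a RAM buffer and reach the device as one large, block-sized write instead of many small ones.

```python
import io

WRITE_BLOCK_SIZE = 8 * 1024 * 1024   # 8MB, matching the maximum block size described above

class BlockWriter:
    def __init__(self, device):
        self.device = device          # any file-like object opened for binary writes
        self.buffer = bytearray()

    def put(self, record: bytes) -> None:
        self.buffer += record
        if len(self.buffer) >= WRITE_BLOCK_SIZE:
            self.flush()

    def flush(self) -> None:
        if not self.buffer:
            return
        # One large sequential write per block, which SSDs handle efficiently.
        self.device.write(bytes(self.buffer[:WRITE_BLOCK_SIZE]))
        del self.buffer[:WRITE_BLOCK_SIZE]

# Usage with an in-memory stand-in for the raw device:
writer = BlockWriter(io.BytesIO())
for i in range(100_000):
    writer.put(f"record-{i}".encode().ljust(128, b"\0"))   # fixed-size toy records
writer.flush()   # flush whatever remains in the final, partially filled block
```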

Efficient data distribution and sharding

Aerospike’s sharding mechanism ensures that data is evenly distributed across nodes, avoiding the hotspot problem that plagues many NoSQL databases. As nodes are added to an Aerospike cluster, the system distributes data and workload evenly across all of them. This is achieved through Aerospike's data distribution mechanism, which hashes the record's key into a 20-byte (160-bit) RIPEMD-160 digest and uses that digest to assign the record to one of 4,096 partitions. These partitions are then spread across the available nodes in the cluster, so both reads and writes are distributed evenly and no hotspots form during data access. This enables Aerospike to maintain high throughput for mixed workloads, as no single node is overwhelmed by excessive read or write operations.
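
A rough sketch of that mapping follows (the digest inputs and which bits select the partition are assumptions for illustration, not Aerospike's exact format, and 'ripemd160' support in hashlib depends on the local OpenSSL build):

```python
import hashlib

NUM_PARTITIONS = 4096

def partition_id(set_name: str, user_key: str) -> int:
    # Hash the set name and user key into a 20-byte RIPEMD-160 digest,
    # then use part of the digest to pick one of 4,096 partitions.
    digest = hashlib.new("ripemd160", set_name.encode() + user_key.encode()).digest()
    return int.from_bytes(digest[:2], "little") % NUM_PARTITIONS

print(partition_id("demo", "user:1"))   # a stable value in [0, 4095] for this key
print(partition_id("demo", "user:2"))   # a different, effectively random partition
```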

Aerospike keeps track of where data lives with a primary index. The primary index maps each record to the node and drive that hold it, along with the record's size. Regardless of record size, each record consumes 64 bytes in the primary index, including a header that describes the record structure. Aerospike is optimized to read the header and the data at the same time rather than as two separate operations, and SSDs handle this efficiently because they read in 4KB chunks: multiple 64-byte headers and their associated data fit into a single 4KB SSD read, decreasing the number of round trips needed.
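
Conceptually (this sketch is illustrative, not Aerospike's actual data structures), the read path looks like an in-memory map from key digest to an on-flash location, followed by a single, 4KB-aligned SSD read:

```python
from dataclasses import dataclass

SSD_READ_SIZE = 4096   # SSDs serve reads in 4KB chunks

@dataclass
class IndexEntry:       # the real primary-index entry is 64 bytes per record
    device_id: int
    offset: int         # byte offset of the record on its device
    size: int           # record size, so the read length is known up front

primary_index: dict[bytes, IndexEntry] = {}   # digest -> location, held entirely in RAM

def read_record(digest: bytes, devices: list) -> bytes:
    entry = primary_index[digest]             # RAM lookup, no disk I/O
    device = devices[entry.device_id]
    # Round the read up to a 4KB boundary so header and data arrive together.
    length = -(-entry.size // SSD_READ_SIZE) * SSD_READ_SIZE
    device.seek(entry.offset)
    return device.read(length)[: entry.size]
```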

Aerospike tracks every write block and the records saved to it so it can calculate what percentage of each block is still in use. Using this table of per-block utilization, it reads the surviving records out of sparsely used blocks and defragments them, and it does this in the background as it continues to write.

Aggressively and proactively defragmenting write blocks prevents system performance from degrading due to excessive fragmentation, and Aerospike never allows a record to fragment across write blocks. You get the same performance on day one as you do on day 1,000, because Aerospike is fully capable of reading, writing, and defragmenting at the same time.
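
A minimal defragmentation sketch (the threshold and names are illustrative assumptions, not Aerospike's code): blocks whose live-data ratio falls below a watermark have their surviving records rewritten through the normal write path, and the block is then reclaimed.

```python
WRITE_BLOCK_SIZE = 8 * 1024 * 1024
DEFRAG_THRESHOLD = 0.5   # assumed low-water mark for this illustration

block_used_bytes: dict[int, int] = {}        # block id -> bytes still referenced by live records
live_records: dict[int, list[bytes]] = {}    # block id -> the surviving records themselves

def defrag_pass(rewrite) -> None:
    # 'rewrite' is a callable that sends a record back through the normal write path.
    for block_id, used in list(block_used_bytes.items()):
        if used / WRITE_BLOCK_SIZE < DEFRAG_THRESHOLD:
            for record in live_records.pop(block_id, []):
                rewrite(record)
            del block_used_bytes[block_id]   # the block is now free for reuse
```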

Low-latency performance with strong consistency

Aerospike provides strong consistency (SC) guarantees, ensuring that reads always reflect the most recent writes, even in a distributed environment. This is achieved through Aerospike’s SC mode without sacrificing performance, thanks to Aerospike's efficient replication and quorum-based system. SC mode guarantees that all writes to a single record will be applied sequentially, without reordering or skipping. In this way, committed writes are immediately queryable and accurate. 

This consistency model is crucial for mixed workloads. It ensures that real-time read requests do not return stale data, even during periods of heavy write activity. This is particularly important for applications that require immediate consistency, such as financial transactions or fraud detection.

In contrast, other NoSQL databases, such as Cassandra, take a different approach by offering tunable consistency. You can dial down the consistency for writes, prioritizing performance over data accuracy, choosing a consistency level such as ONE, TWO, QUORUM, or ALL based on your use case. With weaker consistency levels like ONE or TWO, data can become temporarily inconsistent across replicas: if a read request is sent to a replica that has not been updated yet, it can return stale data. Higher consistency levels like QUORUM or ALL introduce latency, as they require responses from multiple replicas. Stronger consistency levels also decrease availability; a setting of ALL means that even a single node failure causes the write operation to fail.

Aerospike's approach to strong consistency provides developers with the tools to build robust, distributed applications that maintain data integrity without sacrificing performance. Independent Jepsen testing has validated Aerospike's strong consistency claims.

Horizontal scalability with linear performance

One of Aerospike’s key advantages is its ability to scale horizontally without sacrificing performance. As nodes are added to an Aerospike cluster, the system continues to provide linear scalability for both reads and writes. This is particularly beneficial for mixed workloads, as the system can scale to handle growing data volumes and traffic without experiencing the performance degradation often seen in other NoSQL databases.

Benchmark tests have demonstrated Aerospike's impressive scalability under mixed read-write workloads. On Amazon EC2 instances, an Aerospike cluster using r3.large instances showed linear scaling from 27,000 transactions per second (TPS) on two nodes to 140,000 TPS on eight nodes with an 80/20 read/write workload. This linear scaling holds true for various read/write ratios, from 100% writes to 100% reads.

Aerospike's mixed-read-write workload scalability extends to larger clusters as well. In tests on Google Compute Engine, Aerospike achieved 1 million writes per second with just 50 nodes and 1 million reads per second with only 10 nodes. This scalability is maintained while preserving low latency, with median latencies of 7ms for writes and 1ms for reads.

Advanced concurrency control

Aerospike employs lock-free algorithms and sophisticated concurrency control mechanisms, allowing it to handle high levels of concurrency without suffering from the locking and contention issues that affect many other NoSQL databases. 

At the core of Aerospike's concurrency model is its use of record locks. When an operation is performed on a record, Aerospike locks that specific record, preventing other operations from modifying it until the current operation is complete. This granular approach ensures atomicity and consistency without locking entire tables.

Aerospike's concurrency model is further enhanced by techniques such as CPU pinning and NUMA pinning. These techniques maximize processing efficiency by binding specific processes or threads to specific CPUs or CPU cores, minimizing latency and delivering very high per-node transaction throughput.
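
For a sense of what pinning buys, here is a conceptual, Linux-only sketch (not Aerospike's implementation): restricting a worker to one core keeps its working set hot in that core's cache and avoids cross-core migration.

```python
import os
import threading

def cpu_bound_work() -> int:
    return sum(i * i for i in range(1_000_000))   # placeholder CPU-bound task

def pinned_worker(cpu: int) -> None:
    os.sched_setaffinity(0, {cpu})   # pin the calling thread to a single CPU core
    cpu_bound_work()

threads = [threading.Thread(target=pinned_worker, args=(cpu,))
           for cpu in range(min(4, os.cpu_count() or 1))]
for t in threads:
    t.start()
for t in threads:
    t.join()
```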

Aerospike is built for mixed read-write workloads

Mixed read and write workloads present significant challenges for many NoSQL databases, often due to trade-offs in consistency, I/O bottlenecks, and inefficient data distribution. 

Aerospike’s architecture, with its hybrid memory design, optimized I/O paths, and advanced concurrency control, allows it to excel at handling mixed workloads. By ensuring low-latency reads, high-throughput writes, and strong consistency, Aerospike stands out as a leading solution for businesses that need to manage real-time data processing under demanding conditions.

Download Community Edition

Aerospike Server Community Edition (CE) is a free, open source Aerospike distribution. It is the common core of Aerospike Enterprise Edition (EE) with the same developer API and performance characteristics.