Cassandra vs Aerospike
The table below outlines key technology differences between Aerospike 7.0 and Apache Cassandra 4.1.
Data models
Wide column key-value
Cassandra is a distributed wide column store where data resides in rows, each with its own set of columns, a structure that is tabular in appearance but not quite a table. Within a partition, rows are ordered by clustering keys.
Cassandra has a JSON data type, but document-based operations are not supported, so operations on JSON elements can only be applied on the client side. This adds latency because the client must fetch the entire record over the network before operating on it.
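To illustrate, here is a minimal sketch using the DataStax Java driver 4.x; the app.users table, its profile text column holding JSON, and the id value are hypothetical. Because the server cannot modify individual JSON elements, the client reads the whole document, edits it locally, and writes it back, paying two network round trips.

```java
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.PreparedStatement;
import com.datastax.oss.driver.api.core.cql.Row;

public class ClientSideJsonEdit {
    public static void main(String[] args) {
        try (CqlSession session = CqlSession.builder().build()) {   // connects to localhost by default
            // 1. Fetch the full JSON document over the network.
            Row row = session.execute("SELECT profile FROM app.users WHERE id = 42").one();
            String json = row.getString("profile");

            // 2. Modify the JSON on the client (naive string edit for brevity;
            //    a real application would use a JSON library).
            String updated = json.replace("\"tier\":\"basic\"", "\"tier\":\"gold\"");

            // 3. Write the whole document back, paying a second network round trip.
            PreparedStatement ps = session.prepare("UPDATE app.users SET profile = ? WHERE id = 42");
            session.execute(ps.bind(updated));
        }
    }
}
```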
Cassandra does not support graph use cases, though some vendors have provided graph database support in their commercial versions in the past.
Multi-model (key-value, document, graph)
Aerospike distributes and stores sets of records contained in namespaces (akin to “databases”). Each record has a key and named fields (“bins”). A bin can contain different types of data from the simple (e.g., integer, string) to the complex (e.g., nested data in a list of maps of sets).
This provides considerable schema flexibility.
Aerospike supports fast processing of Collection Data Types (CDTs), which can contain any number of scalar elements as well as nested elements such as lists and maps. Nested data can be treated exactly like a document.
Aerospike’s structure enables users to model, store, and manage key-value data, JSON documents, and graph data with high performance at scale.
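As a sketch of that flexibility, the snippet below uses the Aerospike Java client (assuming a local server; the test namespace, users set, bin names, and key are hypothetical) to store a record whose profile bin holds a nested map containing a list:

```java
import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Bin;
import com.aerospike.client.Key;
import com.aerospike.client.Record;
import com.aerospike.client.Value;

import java.util.List;
import java.util.Map;

public class NestedBinExample {
    public static void main(String[] args) {
        AerospikeClient client = new AerospikeClient("127.0.0.1", 3000);

        // Key = (namespace, set, user key)
        Key key = new Key("test", "users", "user-42");

        // A simple scalar bin and a complex bin holding nested data (a map containing a list).
        Map<String, Object> profile = Map.of(
                "name", "Ada",
                "scores", List.of(97, 88, 91));
        Bin nameBin = new Bin("name", "Ada");
        Bin profileBin = new Bin("profile", Value.get(profile));

        client.put(null, key, nameBin, profileBin);

        Record record = client.get(null, key);
        System.out.println(record);   // prints both bins, including the nested structure

        client.close();
    }
}
```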
Implications
Aerospike's efficient support for multiple data models enables firms to use a single data platform for a wide range of applications and business needs.
Scalability options
Horizontal scaling is the only option, with data movement and performance impact reduced through multiple techniques.
Cassandra proudly references users with tens of thousands of nodes, exemplifying its horizontal scalability. Adding nodes is Cassandra’s means of scaling.
Cassandra uses consistent hashing to distribute data across nodes.
It employs Virtual Nodes (vnodes) to split the hash ring into chunks, reducing data movement when nodes are added or removed.
Cassandra also moves data incrementally in the background, minimizing the impact on performance.
Vertical and horizontal scaling. Automatic data movement and automatic rebalancing when adding nodes
Aerospike handles massive customer growth without having to add many nodes, thanks to its SSD-friendly Hybrid Memory Architecture and flexible configuration options.
Data distribution
Aerospike distributes data across cluster nodes automatically. When a node joins or leaves the cluster, the data is automatically redistributed.
Aerospike automatically shards data into 4,096 logical partitions evenly distributed across cluster nodes. When cluster nodes are added, partitions from other cluster nodes are automatically migrated to the new node, resulting in very little data movement.
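The principle is that a record's key alone determines its partition, so ownership of whole partitions (not individual records) moves between nodes. Aerospike derives the partition ID from a digest of the key; the sketch below substitutes SHA-256 purely to illustrate the mapping and is not Aerospike's actual hashing scheme.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class PartitionSketch {
    // Map a user key to one of Aerospike's 4,096 logical partitions using 12 bits of a hash digest.
    static int partitionOf(String userKey) throws Exception {
        byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest(userKey.getBytes(StandardCharsets.UTF_8));
        return ((digest[0] & 0xFF) << 4) | ((digest[1] & 0xFF) >>> 4);   // value in 0..4095
    }

    public static void main(String[] args) throws Exception {
        // The same key always lands in the same partition, regardless of cluster size.
        System.out.println(partitionOf("user-42"));
        System.out.println(partitionOf("user-43"));
    }
}
```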
Vertical scaling
Aerospike exploits SSDs, multi-core CPUs, and other hardware and networking technologies to scale vertically, making efficient use of these resources. You can scale by adding SSDs. (There is no theoretical upper limit for the amount of resources that can be added to a single node.)
Horizontal scaling
Data distribution is uniform and random, so scaling an Aerospike cluster in or out results in minimal data movement. Aerospike also follows a peer-to-peer architecture in which no node has a special role, so load is distributed equally across all cluster nodes.
Implications
For a new deployment, the Aerospike cluster will have fewer nodes and thus lower TCO, easier maintainability, and higher reliability. Additionally, when expanding existing deployments, Aerospike’s horizontal scaling is automatic and without downtime.
Consistency (CAP Theorem approach)
High Availability (AP) mode only.
Cassandra is designed for availability and partition tolerance (AP). It has not passed the Jepsen test for strong consistency.
Its tunable consistency requires programmers to understand nine consistency level settings and select what is appropriate for their operational needs, taking availability, latency, throughput, and data correctness into account.
Cassandra’s quorum-based consistency algorithm requires 2N+1 copies to handle N failures. Cassandra automatically detects and responds to many network and node failures to ensure high availability of data without requiring operator intervention.
Both High Availability (AP) mode and Strong Consistency (CP) mode
Aerospike provides distinct high availability (AP) and strong consistency (CP) modes to support varying customer use cases.
Independent Jepsen testing in 2018 validated Aerospike’s claim of strong consistency. Strong consistency mode prevents stale reads, dirty reads, and data loss.
With strong consistency, each write can be configured for linearizability (provides a single linear view among all clients) or session consistency (an individual process sees the sequential set of updates).
Each read can be configured for linearizability, session consistency, allow replica reads (read from master or any replica of data), and allow unavailable responses (read from the master, any replica, or an unavailable partition).
Aerospike’s roster-based consistency algorithm requires only N+1 copies to handle N failures. Aerospike automatically detects and responds to many network and node failures to ensure high availability of data without requiring operator intervention.
High Availability (AP)/partition tolerant mode emphasizes data availability over consistency in failure scenarios.
Modes and consistency levels can be defined at the namespace level (database level).
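For illustration, the sketch below uses the Aerospike Java client to pick a per-read consistency mode against a strong-consistency namespace (the namespace, set, and key names are hypothetical):

```java
import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Key;
import com.aerospike.client.Record;
import com.aerospike.client.policy.Policy;
import com.aerospike.client.policy.ReadModeSC;

public class StrongConsistencyRead {
    public static void main(String[] args) {
        AerospikeClient client = new AerospikeClient("127.0.0.1", 3000);

        // Per-read choice of consistency mode in a strong-consistency (CP) namespace:
        // LINEARIZE, SESSION, ALLOW_REPLICA, or ALLOW_UNAVAILABLE.
        Policy readPolicy = new Policy();
        readPolicy.readModeSC = ReadModeSC.LINEARIZE;

        Key key = new Key("sc_ns", "accounts", "acct-1");
        Record record = client.get(readPolicy, key);
        System.out.println(record);

        client.close();
    }
}
```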
Implications
Having a data platform that can easily enforce strict consistency guarantees while maintaining strong runtime performance enables firms to use one platform to satisfy a wider range of business needs.
The Aerospike roster approach to consistency requires about half as many servers as Cassandra to handle N failures. For example, tolerating two node failures requires five copies under Cassandra’s quorum approach (2N+1) but only three under Aerospike’s roster approach (N+1).
Fault tolerance
Three replicas for High Availability. Automated failover, but periodic repairs are required.
Cassandra users typically configure the system to automatically maintain three copies (RF3) of data for high availability.
Cassandra’s read_repair feature can be tuned so that data inconsistencies are repaired automatically as data is read, but it is selective and has performance impacts.
The repair function can also be executed proactively and comprehensively, but is resource intensive.
Two replicas for High Availability. Automated failovers.
Aerospike users typically maintain replication factor two (RF2) (one primary, one replica copy) for high availability.
Aerospike automatically detects and responds to many network and node failures (“self-healing”) to ensure high availability of data and prevent data loss or performance degradation, without requiring operator intervention.
Implications
Achieving high availability with fewer replicas reduces operational costs, hardware costs, and energy consumption. Automated recovery from common failures and self-healing features promote 24x7 operations, help firms achieve target SLAs, and reduce operational complexity.
Multi-site support
Synchronous replication (single cluster can span multiple sites)
Asynchronous replication (across multiple clusters)
Both synchronous and asynchronous data replication are supported.
Synchronous: multi-data center replication within a single cluster.
Asynchronous: Replication between clusters is eventually consistent, but this can become delayed depending on the write consistency level.
Automated data replication across multiple clusters; a single cluster can span multiple sites
Supports multi-site deployments for varied business purposes: Continuous operations, fast localized data access, disaster recovery, global transaction processing, edge-to-core computing, and more.
Asynchronous active-active replication (via Cross Datacenter Replication, XDR): replication completes in sub-millisecond to single-digit-millisecond times. All or part of the data in two or more independent data centers is replicated asynchronously, and replication can be one-way or two-way.
Clients can read and write from the data center closest to them.
The expected lag between data centers is on the order of a few milliseconds.
XDR also supports selective replication (i.e., data filtering) and performance optimizations to minimize transfer of frequently updated data.
Synchronous active-active replication (via multi-site clustering): A single cluster is formed across multiple data centers. Achieved in part via rack awareness, pegging primary and replica partitions to distinct data centers. Automatically enforces strong data consistency.
Clients can read data from the node closest to them, with expected latency under a millisecond. Write requests, however, may need to be committed in a different data center, which can increase latency to a few hundred milliseconds.
Implications
Global enterprises require flexible strategies for operating across geographies. This includes support for continuous operations, fast localized data access, disaster recovery, global transaction processing, and more.
Storage format
LSM tree
Cassandra employs a log-structured merge (LSM) tree for storage. LSM trees are designed to efficiently manage writes, but tend to slow reads.
To mitigate potential read performance challenges, operators can employ any of the following:
Bloom filters for probabilistic checks before conducting full seeks (see the sketch after this list)
Caching frequently accessed data in the memtable and/or row cache
Partition keys to limit reads to a single partition
Sparse indexes (but with a storage cost)
Compaction, which merges smaller SSTables into larger ones, reducing the number of files that must be searched.
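The sketch below is a generic Bloom filter, not Cassandra's implementation: a compact bit array answers "might this SSTable contain the key?" so that reads can skip files that definitely lack it. False positives are possible; false negatives are not.

```java
import java.util.BitSet;

public class BloomSketch {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    BloomSketch(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // Derive several hash values from one key; adequate for illustration only.
    private int hash(String key, int seed) {
        return Math.floorMod(key.hashCode() * 31 + seed * 0x9E3779B9, size);
    }

    void add(String key) {
        for (int i = 0; i < hashCount; i++) bits.set(hash(key, i));
    }

    boolean mightContain(String key) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(hash(key, i))) return false;   // definitely absent: skip the disk seek
        }
        return true;                                     // possibly present: do the full lookup
    }

    public static void main(String[] args) {
        BloomSketch filter = new BloomSketch(1 << 16, 3);
        filter.add("user-42");
        System.out.println(filter.mightContain("user-42"));   // true
        System.out.println(filter.mightContain("user-99"));   // almost certainly false
    }
}
```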
Raw block format optimized for SSDs
Aerospike employs a specialized log-structured file system that is optimized for the use of flash drives (SSDs) as primary data storage.
Instead of appending writes to large log files and deferring compaction to a CPU and disk I/O-intensive operation, Aerospike uses raw-device block writes and lightweight defragmentation.
Users can choose from hybrid memory (flash and RAM), all RAM, or all flash configurations.
Implications
Aerospike’s approach delivers greater predictability and reliability without the more complex configuration needed to improve read performance in LSM-tree databases.
Delivering RAM-like performance with SSDs means Aerospike clusters have fewer nodes. Clusters with fewer nodes have lower TCO, easier maintainability, and higher reliability.
Underlying language
Written in Java
Cassandra is written in Java, a higher-level language. JVM garbage collection (GC) can affect latency. This is often the cause of the large spikes you’ll see in P99 and other tail latency stats with Cassandra.
Written in C
Aerospike is written in C with deep underpinnings in networking, storage, and database expertise. It thus avoids Java runtime inefficiencies, and as a lower-level language, C lets Aerospike optimize for the hardware and manage memory directly, avoiding the overhead of garbage collection.
Implications
Aerospike clusters have far fewer nodes than the equivalent Cassandra cluster. They also require less tuning.
Indexing
Production-ready primary indexes, limited workaround options for secondary indexes
Primary: Primary indexes are standard.
Secondary: Significant data access latencies can occur if secondary index queries do not also use the partitioning key (primary key) as a filtering predicate. This is because secondary indexes are partition-based and index only the rows of a partition.
Third-party alternatives have emerged to address the limitations of native Cassandra secondary indexes. These include storage-attached indexes (SAI) and SSTable-attached secondary indexes (SASI). SAI has limits, like covering only equality-based lookups. SASI, developed by Apple, does not index types like lists and maps.
Production-ready primary, secondary indexes
Aerospike uses a proprietary partitioned data structure for its indexes, employing fine-grained individual locks to reduce memory contention across partitions. These structures ensure that frequently accessed data has locality and falls within a single cache line, reducing cache misses and data stalls. For example, the index entry in Aerospike is exactly 64 bytes, the same size as an x86-64 cache line.
By default, secondary indexes are kept in DRAM for fast access and are co-located with the primary index but can also be stored on SSD to save on memory.
Each secondary index entry references only primary and replicated records local to the node. When a query involving a secondary index executes, records are read in parallel from all nodes; results are aggregated on each node and then returned to the Aerospike client layer for return to the application.
Firms can also opt to store all user and index data (both primary and secondary) on SSDs to improve cost efficiency while still maintaining single-digit millisecond response times.
However, Aerospike can only use a single index in each query.
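A minimal sketch with the Aerospike Java client (the index, bin, namespace, and set names are hypothetical) showing a secondary index created on a numeric bin and then used as the single index driving a range query:

```java
import com.aerospike.client.AerospikeClient;
import com.aerospike.client.query.Filter;
import com.aerospike.client.query.IndexType;
import com.aerospike.client.query.RecordSet;
import com.aerospike.client.query.Statement;
import com.aerospike.client.task.IndexTask;

public class SecondaryIndexQuery {
    public static void main(String[] args) {
        AerospikeClient client = new AerospikeClient("127.0.0.1", 3000);

        // Create a secondary index on the numeric "age" bin (one-time setup).
        IndexTask task = client.createIndex(null, "test", "users", "idx_age", "age", IndexType.NUMERIC);
        task.waitTillComplete();

        // Query all records with 21 <= age <= 30; a single index drives the query.
        Statement stmt = new Statement();
        stmt.setNamespace("test");
        stmt.setSetName("users");
        stmt.setFilter(Filter.range("age", 21, 30));

        RecordSet rs = client.query(null, stmt);
        try {
            while (rs.next()) {
                System.out.println(rs.getKey() + " -> " + rs.getRecord());
            }
        } finally {
            rs.close();
        }
        client.close();
    }
}
```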
Implications
Both Aerospike and Cassandra have strong primary index support. However, while Cassandra's approach to secondary indexing has been challenging for years, Aerospike's technology has proven its effectiveness in production. This is particularly important for analytical applications, as secondary indexes play a crucial role in speeding up data access when filtering on non-primary key values.
Interoperability (Ecosystem)
Wide range of ready-made connectors available from third parties
Various open source and third-party offerings provide access to Cassandra. Performance, capabilities and technical support vary. Community offers suggestions for roll-your-own integration with many popular technologies.
Wide range of ready-made connectors available from Aerospike
Performance-optimized connectors for Aerospike are available for many popular open source and third-party offerings, including Kafka, Spark, Presto-Trino, JMS, Pulsar, Event Stream Processing (ESP), and Elasticsearch. These connectors, in turn, provide broader access to Aerospike from popular enterprise tools for business analytics, AI, event processing, and more.
Implications
Making critical business data quickly available to those who need it often requires integration with existing third-party tools and technologies. While connection points are readily available for both Aerospike and Cassandra, Aerospike offers turnkey connectors to many popular technologies to promote fast integration and high-performance data access.
Caching and persistence options
Persistent store only (no in-memory only configuration).
Cassandra cannot be configured as an in-memory-only platform, the configuration that delivers the highest performance.
To increase write performance, Cassandra temporarily stores writes in memory before flushing to disk for persistence.
However, frequently accessed data can be cached, and memory allocation can be tuned.
Easily configured as a high-speed cache (in-memory only) or as a persistent store
Flexible configuration options enable Aerospike to act as (1) a high-speed cache to an existing relational or non-relational data store, promoting real-time data access and offloading work from the back end, or (2) an ultra-fast real-time data management platform with persistence.
Aerospike can store all data and indexes in DRAM, all data and indexes on SSDs (flash), or a combination of the two (data on SSDs and indexes in DRAM). Aerospike 7.1 adds support for NVMe-compatible, low-cost cloud block storage and common enterprise network-attached storage (NAS).
Implications
Aerospike’s flexible deployment options enable firms to standardize on its platform for a wide range of applications, reducing the overall complexity of their data management infrastructures and avoiding the need to cross-train staff on multiple technologies. Many firms initially deploy Aerospike as a cache to promote real-time access to other systems of record or systems of engagement, and later leverage Aerospike’s built-in persistence features to support additional applications.
Multi-tenancy
Some multi-tenancy, though it can impact performance
Cassandra operators can employ a Tenant ID in the Partition Key to ensure data from different tenants is stored and queried separately. However, this could lead to data skew/imbalance across nodes if tenant data varies greatly.
Cassandra operators can also implement separate keyspaces for clear separation and easy management. However, this adds operational overhead and can create resource allocation issues.
Various Aerospike server features enable effective multi-tenancy implementations
Aerospike’s key features for multi-tenancy are separate namespaces (databases), role-based access control in conjunction with sets (akin to RDBMS tables), operational rate quotas, and user-specified storage limits to cap data set size.
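As a sketch of the role-based piece (Aerospike Java client against a security-enabled Enterprise cluster; the admin credentials, namespace, set, role, and user names are all hypothetical), a tenant's application account can be confined to its own set:

```java
import com.aerospike.client.AerospikeClient;
import com.aerospike.client.admin.Privilege;
import com.aerospike.client.admin.PrivilegeCode;
import com.aerospike.client.policy.ClientPolicy;

import java.util.Arrays;

public class TenantRoleSketch {
    public static void main(String[] args) {
        // Connect as an administrator (security must be enabled on the cluster).
        ClientPolicy cp = new ClientPolicy();
        cp.user = "admin";
        cp.password = "admin-password";
        AerospikeClient client = new AerospikeClient(cp, "127.0.0.1", 3000);

        // A read-write privilege scoped to the "tenant_a" set in the "prod" namespace.
        Privilege privilege = new Privilege();
        privilege.code = PrivilegeCode.READ_WRITE;
        privilege.namespace = "prod";
        privilege.setName = "tenant_a";

        // Create a role carrying that privilege, then a user for tenant A's application.
        client.createRole(null, "tenant_a_rw", Arrays.asList(privilege));
        client.createUser(null, "tenant_a_app", "s3cret-pw", Arrays.asList("tenant_a_rw"));

        client.close();
    }
}
```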
Implications
Aerospike offers more features for implementing multi-tenancy, with finer-grained control that lessens any unwanted impacts.
Hardware optimization
Designed for commodity (low cost) servers
Cassandra is designed to run on low-cost commodity servers and promotes adding more nodes to address growing workloads and data volumes. Exploiting specific hardware or networking technologies is at odds with this design approach.
Designed to exploit modern hardware and networking technologies
Aerospike is designed and implemented explicitly to exploit advances in modern hardware to maximize runtime performance and cost efficiency.
Aerospike is massively multi-threaded to get the most from today’s multi-core processors.
NVMe, Flash, and SSDs are treated as raw block devices to reduce I/O for the lowest latency, avoiding overhead from standard storage drivers and file systems. Aerospike data structures are partitioned with fine-grained locks to avoid memory contention for more efficient use of multi-core CPUs. Application Device Queues (ADQs) are used with certain networking devices to reduce context switching and keep data in local processor caches.
Implications
Aerospike clusters can manage more aggressive workloads and higher data volumes with fewer nodes than the equivalent Cassandra cluster, reducing operational complexity and TCO.
Change Data Capture
Data replication architecture makes CDC complex.
Table granularity with user implementation for log consumption
Each database change generates duplicate CDC records; for example, with three regions and RF=3, nine copies are generated. Complex workarounds and algorithms must be introduced to process and resolve these duplicates.
Integrated via change notifications with granular data options and automated batch shipments.
Change Data Capture (CDC) is an Aerospike cluster feature.
Granular options for capturing (and replicating) changed data, ranging from full namespaces (databases) to subsets of select records.
Aerospike logs minimal information about each change (not the full record), batching changed data and shipping only the latest version of a record.
Thus, multiple local writes for one record generate only one remote write – an important feature for “hot” data.
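The sketch below is illustrative only and is not Aerospike's XDR implementation; it simply shows the coalescing behavior described above, where repeated local writes to a hot record result in a single remote write per shipment:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ShipLatestSketch {
    // Record key -> latest value seen since the last shipment.
    private final Map<String, String> pending = new LinkedHashMap<>();

    void onLocalWrite(String key, String value) {
        pending.put(key, value);   // later writes simply overwrite earlier ones
    }

    void shipBatch() {
        for (Map.Entry<String, String> entry : pending.entrySet()) {
            // One remote write per key, carrying only the latest version.
            System.out.println("ship " + entry.getKey() + " -> " + entry.getValue());
        }
        pending.clear();
    }

    public static void main(String[] args) {
        ShipLatestSketch shipper = new ShipLatestSketch();
        shipper.onLocalWrite("user-42", "v1");
        shipper.onLocalWrite("user-42", "v2");
        shipper.onLocalWrite("user-42", "v3");
        shipper.shipBatch();   // ships only "v3"
    }
}
```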
Aerospike also provides Change Data notifications to external systems, like Kafka or other databases.
Implications
Aerospike provides more granular options for determining which data changes are captured. This can reduce the cost and improve the latency of moving data between systems. Because Aerospike summarizes multiple local writes into the latest version of a record, it may be inappropriate for CDC use cases where every intermediate update must be captured. Cassandra’s architecture makes CDC use cases unwieldy.