MongoDB vs Aerospike
The table below outlines key technology differences between Aerospike Database 7.1 and MongoDB Enterprise Server 7.0.1.
Data model
Document store
MongoDB uses a flexible, document-oriented data model to embed related data within a single document structure. Documents are grouped in databases and collections.
While not a graph database or key-value store, MongoDB offers graph traversal capabilities, lets users model key-value concepts in documents, and supports vector search.
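For illustration, here is a minimal sketch of this document model using the official PyMongo driver; the connection string, database, collection, and field names are assumptions for this example.

```python
from pymongo import MongoClient

# Assumed local deployment; adjust the connection string for your cluster.
client = MongoClient("mongodb://localhost:27017")
orders = client["shop"]["orders"]  # database "shop", collection "orders"

# Related data is embedded in a single document rather than normalized
# across tables.
orders.insert_one({
    "order_id": 1001,
    "customer": {"name": "Ada", "tier": "gold"},
    "items": [
        {"sku": "A-1", "qty": 2},
        {"sku": "B-7", "qty": 1},
    ],
})

# Query on an embedded field using dot notation.
doc = orders.find_one({"customer.tier": "gold"})
print(doc["items"])
```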
Multi-model (key-value, document, graph) data platform
Aerospike distributes and stores sets of records contained in namespaces (akin to “databases”). Each record has a key and named fields (“bins”). A bin can contain different types of data from the simple (e.g., integer, string) to the complex (e.g., nested data in a list of maps of sets).
This provides considerable schema flexibility.
Aerospike supports fast processing of Collection Data Types (CDTs), which contain any number of scalar data type elements and nesting elements such as lists and maps. Nested data types can be treated exactly like a document.
Aerospike’s structure enables users to model, store, and manage key-value data, JSON documents, and graph data with high performance at scale.
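As an illustration of this model, the following sketch uses the Aerospike Python client to store and read a record whose bins mix scalar and nested collection data; the host address, namespace, set, and bin names are assumptions.

```python
import aerospike

# Assumed local node; namespace "test" and set "users" are illustrative.
config = {"hosts": [("127.0.0.1", 3000)]}
client = aerospike.client(config).connect()

key = ("test", "users", "user42")  # (namespace, set, user key)

# Each bin can hold scalar or nested collection data (lists, maps).
client.put(key, {
    "name": "Ada",
    "visits": 17,
    "events": [  # a list of maps, i.e., document-like nested data
        {"type": "click", "ts": 1700000000},
        {"type": "view", "ts": 1700000060},
    ],
})

(_, meta, bins) = client.get(key)
print(bins["events"][0]["type"])  # -> "click"
client.close()
```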
Implications
Aerospike’s ability to efficiently support multiple data models allows firms to use a single data platform to support a wide range of applications and business needs. (Note: Aerospike Vector Search is available in preview.)
Storage model
B-tree based
To deliver good performance, MongoDB caches data in RAM, and the working set must also fit in RAM. This increases running costs and reduces the database's performance predictability.
Since v3.2, MongoDB has used a B-tree-based storage engine (WiredTiger, or WT) by default, which it recommends for operational databases. WT supports SSDs but treats them as ordinary storage devices and thus cannot extract their full performance.
Custom, high-performance format with storage engine choice
Designed as an operational distributed database, Aerospike employs a specialized log-structured file system that optimizes the use of flash drives (SSDs) as primary data storage without heavy dependence on RAM for performance.
Aerospike uses SSDs as raw devices, employing a proprietary log-structured file system rather than relying on the file system, block, and page cache layers for I/O. This provides distinct performance and reliability advantages. Aerospike uses raw-device block writes and lightweight defragmentation.
Firms can choose from hybrid memory (indexes in DRAM, data on flash), all DRAM (i.e., in-memory), and all-flash configurations; as of Aerospike 7.1, there is also support for NVMe-compatible, low-cost cloud block storage and common enterprise network-attached storage (NAS).
Implications
Aerospike’s approach provides fast, predictable performance at scale, which is one of the primary reasons some MongoDB users have migrated to Aerospike. Providing sub-millisecond response times with SSDs means Aerospike clusters need fewer nodes, which lowers total cost of ownership (TCO), improves maintainability, and promotes greater reliability. Aerospike further lowers TCO with options for networked, NVMe-compatible storage.
Caching and persistence options
Primarily a persistent document store
Although MongoDB can be configured for in-memory storage only, most firms deploy it as a persistent document store, keeping indexes and the working data set in memory for performance and persisting user data to disk for durability.
MongoDB was not designed to serve as a caching layer for a back-end store.
Easily configured as a high-speed cache (in-memory only) or as a persistent store
Flexible configuration options enable Aerospike to act as either:
(1) a high-speed cache in front of an existing relational or non-relational data store, promoting real-time data access and offloading work from the back end, or
(2) an ultra-fast real-time data management platform with persistence.
Aerospike can store all data and indexes in DRAM, all data and indexes on SSDs (flash), or a combination of the two (data on SSDs and indexes in DRAM). Aerospike 7.1 adds support for NVMe-compatible, low-cost cloud block storage and common enterprise network-attached storage (NAS).
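As a sketch of the cache-style configuration, the Aerospike Python client lets each write carry a time-to-live, after which the server expires the record automatically; the host, namespace, set, and key names below are assumptions.

```python
import aerospike

config = {"hosts": [("127.0.0.1", 3000)]}  # assumed local node
client = aerospike.client(config).connect()

key = ("cache", "sessions", "sess-9f3a")  # illustrative namespace/set/key

# When used as a cache, each record can carry a time-to-live (in seconds);
# the server expires the record automatically once the TTL elapses.
client.put(key, {"user_id": 42, "token": "abc123"}, meta={"ttl": 300})

# In a persistent-store deployment the same API applies; durability is a
# matter of namespace storage configuration, not application code.
client.close()
```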
Implications
Aerospike’s flexible deployment options enable firms to standardize on its platform for a wide range of applications, reducing the overall complexity of their data management infrastructures and avoiding the need to cross-train staff on multiple technologies. Many firms initially deploy Aerospike as a cache to promote real-time access to other systems of record or systems of engagement and later leverage Aerospike’s built-in persistence features to support additional applications. By contrast, MongoDB is widely deployed as a persistent document store.
Client access
Sharded clusters use a proxy (mongos) to route client requests
In sharded deployments, MongoDB clients must connect through a router process called mongos, which is responsible for accessing the cluster nodes to perform each query. This adds a network hop that is undesirable in high-performance use cases.
Smart Client knows where every data element is, minimizing network “hops”
Aerospike clients (Smart Clients) are aware of the data distribution across the cluster; therefore, they can send their requests directly to the node responsible for storing the data. This reduces the network hops required, improving the database's performance.
Aerospike’s Smart Client layer maintains a dynamic partition map that identifies the primary node for each partition. This enables the client layer to route read or write requests directly to the correct nodes without any additional network hops.
Since Aerospike writes synchronously to all copies of the data, there is no delay for a quorum read across the cluster to get a consistent version of the data.
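This routing is transparent to application code. In the sketch below (Aerospike Python client, assumed addresses and names), the client is seeded with any cluster nodes, builds the partition map itself, and sends each request straight to the owning node.

```python
import aerospike

# Seed with any cluster node (assumed addresses); the client discovers the
# rest of the cluster and maintains the partition map itself.
config = {"hosts": [("10.0.0.1", 3000), ("10.0.0.2", 3000)]}
client = aerospike.client(config).connect()

# The client hashes the key to a partition and sends the request directly
# to the node that owns it -- no intermediate router process.
(_, meta, bins) = client.get(("test", "users", "user42"))
client.close()
```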
Implications
Aerospike’s direct client-to-node approach minimizes network traffic, often enabling a client to access target data with a single network “hop.” This reduces latencies and promotes fast, predictable performance. By contrast, MongoDB’s proxy-based routing (both on-prem and in the cloud) frequently incurs added network overhead and introduces greater operational complexity.
Scalability options
Vertical and horizontal scaling; advanced planning strongly advised
Data distribution
Data is distributed across shards of the cluster. The number of shards is defined by the operator, which makes sizing a difficult operational task. Some sharding strategies are less automatic than others.
Users must instruct MongoDB to shard data across multiple nodes by selecting one of three sharding strategies (range, hash, or zone) and configuring various components. Insufficient planning can lead to performance and scaling issues later, including poor data distribution, overused/underused shards, added overhead due to chunk migration, and troubleshooting challenges. A sharded cluster needs a minimum of eight nodes, and each added shard requires at least three added nodes to fulfill replica requirements.
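For a sense of the manual steps involved, here is a minimal PyMongo sketch run against a mongos router; the router address, database, collection, and shard key are assumptions.

```python
from pymongo import MongoClient

# Connect to a mongos router (assumed address) in a sharded cluster.
client = MongoClient("mongodb://mongos.example.internal:27017")

# Sharding must be enabled per database, and a shard key chosen per
# collection -- here, hashed sharding on user_id (an illustrative choice).
client.admin.command("enableSharding", "shop")
client.admin.command(
    "shardCollection", "shop.orders", key={"user_id": "hashed"}
)
```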
Vertical scaling
MongoDB server scaling is a CPU-heavy process. Vertical scaling is supported and can be “cheaper and easier” than sharding, but increasing resources beyond a certain point does not improve performance because of the context switching the process requires. Currently, Atlas (MongoDB's cloud offering) can scale vertically to handle databases of up to 4 TB; for larger databases, horizontal scaling (sharding) must be employed.
Horizontal scaling
Data movement during scaling is not necessarily minimal: in most cases, a significant portion of the data must be moved, which takes a long time.
MongoDB has a primary/secondary architecture within each replica set. In addition, mongos processes act as routers in front of the cluster, and config servers store the metadata about the cluster. Scaling a cluster may require changes to all of these components.
Vertical and horizontal scaling. Automatic data movement and automatic rebalancing when adding nodes
Aerospike handles massive customer growth without adding many nodes, thanks to its SSD-friendly Hybrid Memory Architecture and flexible configuration options.
Data distribution
Aerospike distributes data across cluster nodes automatically. When a node joins or leaves the cluster, the data is automatically redistributed.
Aerospike automatically shards data into 4,096 logical partitions evenly distributed across cluster nodes. When cluster nodes are added, partitions from other cluster nodes are automatically migrated to the new node, resulting in very little data movement.
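To illustrate, each record's partition is derived from a RIPEMD-160 digest of its set name and key, with the low 12 bits selecting one of the 4,096 partitions. The sketch below mirrors the digest logic found in Aerospike's open source clients (note that hashlib's ripemd160 support depends on the local OpenSSL build).

```python
import hashlib

def partition_id(set_name: str, user_key: str) -> int:
    """Compute the Aerospike partition (0..4095) for a string key.

    Mirrors the client-side digest logic: RIPEMD-160 over the set name,
    a one-byte key-type marker, and the key bytes; the low 12 bits of
    the digest select the partition.
    """
    h = hashlib.new("ripemd160")  # requires OpenSSL with RIPEMD enabled
    h.update(set_name.encode())
    h.update(b"\x03" + user_key.encode())  # 0x03 marks a string key
    digest = h.digest()
    return (digest[0] | (digest[1] << 8)) & 0x0FFF

print(partition_id("users", "user42"))
```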
Vertical scaling
Aerospike exploits SSDs, multi-core CPUs, and other hardware and networking technologies to scale vertically, making efficient use of these resources. You can scale by adding SSDs. (There is no theoretical upper limit for the amount of resources that can be added to a single node.)
Horizontal scaling
Data distribution is random; therefore, scaling an Aerospike cluster in or out results in less data movement. Aerospike also follows a peer-to-peer architecture in which no node has a special role, so load is distributed equally across all cluster nodes.
Implications
Aerospike's better data distribution and more efficient horizontal and vertical scaling make it easier to operate and provide better performance at a lower cost.
Availability
High availability achieved with replication factor 3
MongoDB’s recommended number of replicas is 3. It is possible to have RF=2 with an arbiter, but that adds complexity to the architecture. Higher replication factors are possible, but because the architecture is quorum-based, the number of nodes in a replica set must be odd (5, 7, ...).
Because MongoDB uses a primary/secondary architecture, failures cause intermittent outages while a new primary is elected, which typically takes up to 12 seconds.
Node failure doesn't cause long-term data loss, but operator intervention is still required at critical points, as a cluster with a failed node is in a critical state.
High availability achieved with replication factor 2
Aerospike automatically detects and responds to many network and node failures to ensure high availability of data without requiring operator intervention.
Aerospike's recommended number of replicas for achieving high availability is 2. Higher replication factors are possible (3, 4, 5, ...).
Because Aerospike uses a peer-to-peer architecture, failures don't cause intermittent outages. If a node fails, the cluster can still immediately respond to all of the requests related to the failed node.
Aerospike auto-heals itself after a failure, reducing the need for operator intervention during critical events.
Implications
Achieving high availability with fewer replicas reduces operational costs, hardware costs, and energy consumption. With Aerospike, automated recovery from common failures promotes 24x7 operations, helps firms achieve target SLAs, and reduces operational complexity.
Consistency
Causal consistency supported
MongoDB supports causal consistency, strong consistency, and multi-record transactions (MRT).
Both readers and writers of records must use the correct configuration for their requests to be consistent.
To achieve consistency, MongoDB requires a majority (more than half) of the nodes of a replica set to respond (e.g., 2 of 3 nodes when RF=3).
MongoDB uses a custom protocol based on the replica set operations log (oplog) to achieve consistency. Although this protocol is based on Raft, it has many differences and improvements.
MongoDB’s support for causal consistency uses a quorum-based approach (Raft consensus) in which heartbeats are exchanged per Raft group. Growth in data volumes leads to more Raft groups, more heartbeat checks, and more CPU consumption.
Causal consistency guarantees that causally related operations appear in the same order on all processes. This is considered a form of “available under partition” support. In addition, applications can instruct MongoDB to enforce linearizability when reading and writing operations on the primary copy.
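As a sketch of that per-request configuration in PyMongo (deployment, database, and field names assumed), a writer can request majority acknowledgement while a reader requests a linearizable read:

```python
from pymongo import MongoClient
from pymongo.read_concern import ReadConcern
from pymongo.write_concern import WriteConcern

client = MongoClient("mongodb://localhost:27017")  # assumed deployment

# Consistency is opted into per operation: the writer requests majority
# acknowledgement, the reader requests a linearizable read.
coll = client["shop"].get_collection(
    "orders",
    write_concern=WriteConcern(w="majority"),
    read_concern=ReadConcern("linearizable"),
)
coll.insert_one({"order_id": 1001, "status": "paid"})
doc = coll.find_one({"order_id": 1001})
```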
Strong consistency supported
Aerospike supports strong consistency.
Aerospike applies its consistency level at the namespace level, so no user can mistakenly write an inconsistent record into an otherwise consistent set.
Aerospike only needs a response from a single node to answer a read request consistently.
Aerospike uses a proprietary (roster-based) protocol that exchanges heartbeats between nodes, allowing CPU consumption to remain constant as data density per node increases.
With strong consistency, each write can be configured for linearizability (provides a single linear view among all clients) or session consistency (an individual process sees the sequential set of updates).
Each read can be configured for linearizability; session consistency; allow replica (read from the primary or any replica); or allow unavailable (read from the primary, any replica, or an unavailable partition).
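A minimal sketch of these per-read modes with the Aerospike Python client, assuming a strong-consistency namespace named test (host and key names are also assumptions):

```python
import aerospike

config = {"hosts": [("127.0.0.1", 3000)]}  # assumed node
client = aerospike.client(config).connect()
key = ("test", "users", "user42")

# In a strong-consistency namespace, each read chooses its mode:
linearize = {"read_mode_sc": aerospike.POLICY_READ_MODE_SC_LINEARIZE}
session = {"read_mode_sc": aerospike.POLICY_READ_MODE_SC_SESSION}

(_, _, bins) = client.get(key, linearize)  # single linear view
(_, _, bins) = client.get(key, session)    # session consistency
client.close()
```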
Implications
While both Aerospike and MongoDB passed independent Jepsen consistency tests, Aerospike requires fewer nodes and incurs less CPU overhead at scale to ensure high levels of data consistency.
Multi-site support
Multi-data center replication via sharded distribution or replica set distribution
MongoDB supports multi-data center replication in two ways:
Sharded distribution: The shard key determines the shard, and thus the data center, in which each record is stored. The user has control over where the data is stored, but the data is local to that data center only. Local read and write requests are fast, but requests from the other data centers are very slow.
Replica set distribution: The nodes of a single replica set can be deployed across multiple data centers. Reads can happen locally, but writes are fast in only one data center; the others are slow. In case of a failure, the leader selection operation causes unpredictable performance issues. Expanding a cluster that uses either of these two methods is difficult.
MongoDB asynchronously replicates data between the primary copy and replicas, subject to certain restrictions (e.g., replica sets can have up to 50 members but no more than seven voting members). Replica set tags can be configured and referenced in client applications to promote performance and cost-efficiency objectives.
Offers Atlas Edge Server and Atlas Device Sync for edge-to-core computing.
Automated data replication across multiple clusters; A single cluster can span multiple sites
Supports multi-site deployments for varied business purposes: Continuous operations, fast localized data access, disaster recovery, global transaction processing, edge-to-core computing, and more.
Asynchronous active-active replication (via Cross Datacenter Replication, XDR) is achieved in sub-millisecond to single-digit-millisecond times. All or part of the data in two or more independent data centers is replicated asynchronously, and the replication can be one-way or two-way.
Clients can read and write from the data center closest to them.
The expected lag between data centers is on the order of a few milliseconds.
XDR also supports selective replication (i.e., data filtering) and performance optimizations to minimize transfer of frequently updated data.
Synchronous active-active replication (via multi-site clustering): A single cluster is formed across multiple data centers. This is achieved in part via rack awareness, which pegs primary and replica partitions to distinct data centers, and strong data consistency is enforced automatically.
Clients can read data from the node closest to them, with expected latency of less than a millisecond. However, write requests may need to be written in a different data center, which may increase latency to a few hundred milliseconds.
Implications
Global enterprises require flexible strategies for operating across data centers. Aerospike supports both synchronous and asynchronous replication of data across multiple data centers in a variety of configurations. Firms can simultaneously configure Aerospike clusters across sites, data centers, availability zones, regions, and even cloud providers. This enables applications to customize deployments according to their resilience and availability needs.
Indexing
Production-ready primary, secondary indexes
A MongoDB query can use multiple indexes.
Based on B-tree structures, MongoDB indexes can be defined for primary and secondary key values, including compound indexes that collect and sort data on two or more fields. With few exceptions, all indexes must be kept in RAM (along with the working data set).
Indexes are stored both in memory and on disk; for best performance, it is recommended to have enough memory to keep all of the indexes in memory.
Production-ready primary, secondary indexes
Aerospike uses a proprietary partitioned data structure for its indexes, employing fine-grained individual locks to reduce memory contention across partitions. These structures ensure that frequently accessed data has locality and falls within a single cache line, reducing cache misses and data stalls. For example, an index entry in Aerospike is exactly 64 bytes, the same size as an x86-64 cache line.
By default, secondary indexes are kept in DRAM for fast access and are co-located with the primary index but can also be stored on SSD to save on memory.
Each secondary index entry references only primary and replicated records local to the node. When a query involving a secondary index executes, records are read in parallel from all nodes; results are aggregated on each node and then returned to the Aerospike client layer for return to the application.
Firms can also opt to store all user and index data (both primary and secondary) on SSDs to improve cost efficiency while still maintaining single-digit millisecond response times.
However, Aerospike can only use a single index in each query.
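A brief sketch with the Aerospike Python client: create a secondary index and run a query against it (host, namespace, set, bin, and index names are assumptions):

```python
import aerospike
from aerospike import predicates as p

config = {"hosts": [("127.0.0.1", 3000)]}  # assumed node
client = aerospike.client(config).connect()

# Create a secondary index on the integer bin "age" (names illustrative).
client.index_integer_create("test", "users", "age", "idx_users_age")

# The query fans out to all nodes in parallel; each node scans only its
# local index entries and streams matching records back.
query = client.query("test", "users")
query.where(p.equals("age", 34))  # note: a query uses a single index
for (key, meta, bins) in query.results():
    print(bins)
client.close()
```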
Implications
Aerospike and MongoDB both support varied indexing options to speed data access. Aerospike allows firms to store index data on SSDs, if desired, to reduce RAM dependency and improve cost efficiency while still maintaining high performance. MongoDB, however, allows for more complex indexing overall.
Query language and capabilities
Proprietary language (MQL); optional SQL connector
MongoDB Query Language (MQL) provides CRUD (create, read, update, and delete) operations tailored for document processing. Complex operators, aggregations, and specialized operators (e.g., geospatial) are supported. Query federation supported via Atlas Data Federation (cloud offering).
MongoDB Connector for Business Intelligence supports SQL queries, translating queries and data between a MongoDB instance and third-party SQL tools. Wide range of SQL operators supported.
SQL-like capabilities and SQL connectors with broad data retrieval features
Aerospike Quick Look (AQL) offers SQL-like querying capabilities combined with advanced features like secondary indexing, expressions, and user-defined functions (UDFs). It integrates with popular SQL-based tools while also providing a native API for developers to leverage Aerospike's full performance potential.
Aerospike recommends using its native API for optimal performance and full control over data access.
The Aerospike API allows implementation of specific SQL operations such as SELECT, UPDATE, CREATE, and DELETE, with fine-grained control over performance.
Aerospike's data modeling approach aims to minimize the need for SQL constructs like JOINs, maximizing performance and scalability.
SQL access available via Aerospike-built connectors, optimized for high performance with Spark and Presto/Trino. Application developers can use simple SQL with JDBC and the community-contributed JDBC Connector.
The Aerospike Spark connector can employ up to 32K Spark partitions to read data from Aerospike namespaces. Predicate filtering and scan-by-partition also promote strong performance.
Joins, unions, intersections, aggregations, and other sophisticated SQL operations are fully supported. Query federation is supported. The use of secondary indexes is supported.
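As a sketch of the Spark integration, a PySpark job can read an Aerospike set into a DataFrame; the option names follow the Aerospike Spark connector's documented configuration, while the hosts, namespace, set, and bin names are assumptions (the connector jar must be on the Spark classpath).

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("aerospike-read").getOrCreate()

# Read an Aerospike set into a DataFrame via the Aerospike Spark connector.
df = (
    spark.read.format("aerospike")
    .option("aerospike.seedhost", "10.0.0.1:3000")
    .option("aerospike.namespace", "test")
    .option("aerospike.set", "users")
    .load()
)

# Predicate filters can be pushed down toward the database for performance.
df.filter(df.age > 30).groupBy("city").count().show()
```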
Implications
Although MongoDB’s primary query interface is proprietary, both MongoDB and Aerospike support SQL via connectors. This promotes integration with popular third-party analytic tools and leverages developers’ SQL programming skills.
Interoperability
A wide range of ready-made connectors is available from MongoDB
Connectors for Spark, Kafka, SQL, and other technologies are available. At least some are optimized to use MongoDB’s aggregation pipeline and secondary indexes.
Wide range of ready-made connectors available from Aerospike
Performance-optimized connectors for Aerospike are available for many popular open source and third-party offerings, including Kafka, Spark, Presto/Trino, JMS, Pulsar, Event Stream Processing (ESP), and Elasticsearch. These connectors, in turn, provide broader access to Aerospike from popular enterprise tools for business analytics, AI, event processing, and more.
Implications
Making critical business data quickly available to those who need it often requires integration with existing third-party tools and technologies. Both vendors offer connectors to popular technologies; performance characteristics and features vary.