DynamoDB vs. Aerospike
The table below outlines key technology differences between Aerospike Enterprise Edition 7.2 and AWS DynamoDB 2019.11.x.
Data models
Multi-model (key-value, document)
DynamoDB distributes and stores tables of items (rows). Each item has a primary key (a partition key plus an optional sort key) and one or more attributes (named fields).
Each attribute can contain different types of data, from scalar (integer, string) to complex (lists and maps with up to 32 levels of nesting).
Each item has a size limit of 400 KB. For items that would exceed 400 KB, AWS recommends modeling data with vertical partitioning as the “solution to the large item quandary.”
With DynamoDB, users can model, store, and manage key-value data and JSON documents with high performance at scale. AWS recommends Neptune as a graph database, although some users have developed modeling techniques and technologies to use DynamoDB as a backend for graph data storage. AWS recommends OpenSearch Service for vector storage and search.
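To illustrate the item model, here is a minimal boto3 sketch (table and attribute names are hypothetical; it assumes an existing "users" table keyed on "user_id") that stores an item with nested list and map attributes:

```python
import boto3

# Assumes an existing "users" table with "user_id" as its partition key (hypothetical names).
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("users")

# Attributes can mix scalars with nested maps and lists (up to 32 levels of nesting).
table.put_item(
    Item={
        "user_id": "u123",
        "name": "Alice",
        "orders": [
            {"order_id": "o1", "total": 42},
            {"order_id": "o2", "total": 17},
        ],
        "address": {"city": "Madrid", "zip": "28001"},
    }
)

# Fetch the item back by its primary key.
resp = table.get_item(Key={"user_id": "u123"})
print(resp.get("Item"))
```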
Multi-model (key-value, document, graph, vector)
Aerospike distributes and stores sets of records contained in namespaces (akin to databases). Each record has a primary key and named fields (bins). A bin can contain different types of data from the simple (integer, string) to the complex (nested data in a list of maps of sets for a collection data type).
Each record has a size limit of 8 MB for data stored on devices (e.g., SSDs) and 134 MB for memory-only deployments.
This provides considerable schema flexibility.
Aerospike supports fast processing of collection data types (CDTs), which contain any number of scalar data type elements and nesting elements such as lists and maps. Nested data types can be treated exactly as a document.
Aerospike’s structure enables users to model, store, and manage key-value data, JSON documents, graph, and vector data with high performance at scale.
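For comparison, a minimal sketch using the Aerospike Python client (host address, namespace, set, and bin names are hypothetical) that stores the same kind of nested document in a record's bins:

```python
import aerospike

# Connect to the cluster via one or more seed hosts (hypothetical address).
config = {"hosts": [("127.0.0.1", 3000)]}
client = aerospike.client(config).connect()

# A record key is (namespace, set, user key); names here are hypothetical.
key = ("test", "users", "u123")

# Bins can hold scalars or nested collection data types (lists and maps).
client.put(key, {
    "name": "Alice",
    "orders": [
        {"order_id": "o1", "total": 42},
        {"order_id": "o2", "total": 17},
    ],
    "address": {"city": "Madrid", "zip": "28001"},
})

# Read the record back by primary key.
(key, meta, bins) = client.get(key)
print(bins)

client.close()
```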
Implications
Both offerings support multiple data models, enabling firms to support a wide range of applications that benefit from key-value and document structures. Aerospike also offers high-performance support for graph and vector data, whereas AWS adds complexity to the overall architecture by offering graph and vector capabilities only through separate services distinct from DynamoDB. Aerospike supports larger record (item) sizes, enabling simpler designs. Aerospike also provides more flexible and robust operations on complex data types, such as maps and lists, with the ability to apply filtering, sorting, and range queries directly on these structures.
Deployment options
AWS cloud only
DynamoDB is a fully automated database-as-a-service (DBaaS) for AWS only.
On-prem, multi-cloud, hybrid
Aerospike can be deployed
On-premises
As a cloud-managed service on AWS, Azure, or GCP
In hybrid configurations
With the Aerospike Kubernetes Operator (AKO), which supports Amazon Elastic Kubernetes Service (EKS), Microsoft Azure Kubernetes Service (AKS), Google Kubernetes Engine (GKE), and local deployments.
Implications
Aerospike supports more deployment options than DynamoDB, offering firms greater flexibility to tailor deployments for different applications and implement fail-over strategies spanning multiple cloud platforms or cloud and on-premises configurations.
Storage model
B-tree
DynamoDB is an AWS database service that distributes and stores data partitions in one or more cloud regions.
Data is stored in a B-tree structure, and SSDs are used to persist these data structures.
Custom, high-performance format with storage engine choice
Designed as an operational distributed database, Aerospike employs a specialized log-structured file system optimized for using flash drives (SSDs) as primary data storage without heavy dependence on RAM for performance.
Aerospike uses SSDs as raw devices, employing a proprietary log-structured file system rather than relying on file system, block, and page cache layers for I/O. This provides distinct performance and reliability advantages. Aerospike uses raw-device block writes and lightweight defragmentation.
Firms can choose from hybrid memory (indexes in DRAM, data on flash), all-DRAM (i.e., in-memory), and all-flash configurations; as of Aerospike 7.1, NVMe-compatible, low-cost cloud block storage and common enterprise network-attached storage (NAS) are also supported.
Implications
Aerospike enables users to tailor memory and storage options to fit their application and budget needs. This provides cost efficiency and predictability for high-performance applications at scale. Its internal storage format is optimized to exploit fast SSDs. By contrast, B-tree insertions and deletions can trigger splits or merges, leading to more disk I/O, especially in write-intensive workloads.
Client access
Request router determines storage node to handle the request
DynamoDB maintains multiple request router instances that perform user authentication and authorization and consult a partition metadata component to identify all partition replicas. Depending on the nature of the request (read or write) and the desired data consistency level, the router forwards the request to the replica serving as the partition leader or to a follower.
Smart Client knows where every data element is, minimizing network “hops”
Aerospike clients (Smart Clients) are aware of the data distribution across the cluster; therefore, they can send their requests directly to the node responsible for storing the data. This reduces the network hops required, improving the database's performance.
Aerospike’s Smart Client layer maintains a dynamic partition map that identifies the primary node for each partition. This enables the client layer to route read or write requests directly to the correct nodes without any additional network hops.
Since Aerospike writes synchronously to all copies of the data, there is no delay for a quorum read across the cluster to get a consistent version of the data.
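A small sketch of the Smart Client idea: after connecting through a single seed host, the client layer has already discovered the full cluster and partition map, so each read goes straight to the owning node. Host address and namespace/set names are hypothetical, and the get_nodes() helper shown may vary by client version:

```python
import aerospike

# One seed host is enough; the client discovers the rest of the cluster from it.
client = aerospike.client({"hosts": [("10.0.0.1", 3000)]}).connect()

# The client now knows every node in the cluster (and which partitions each owns).
print(client.get_nodes())

# This read is routed directly to the node that owns the record's partition --
# a single network hop, with no intermediate request router.
_, _, bins = client.get(("test", "users", "u123"))
```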
Implications
Aerospike’s approach minimizes network traffic, enabling a client to access target data with a single network hop. Expected latency with DynamoDB is higher than with Aerospike; the Smart Client both reduces Aerospike’s overall latency and keeps it stable.
Scalability options
Automatic horizontal scaling, data distribution
Designed for distributed computing on AWS, DynamoDB automatically hashes data into partitions to promote even data distribution throughout the cluster. Items are distributed by hashing the partition key (the primary key may be the partition key alone or a composite of the partition key and a sort key). DynamoDB implements a consistent hashing technique to minimize data movement when partitions are added or removed. No user involvement is required.
Horizontal scaling is achieved with a separate AWS Application Auto Scaling service, which allows users to set policies on DynamoDB tables and global secondary indexes to automatically adjust provisioning based on consumption. No user involvement is needed after policies are set.
Vertical scaling is not within user control.
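A rough sketch of attaching the Application Auto Scaling policy described above (table name and capacity limits are hypothetical; the calls belong to the separate Application Auto Scaling API, not DynamoDB itself):

```python
import boto3

autoscaling = boto3.client("application-autoscaling")

# Register the table's read capacity as a scalable target (hypothetical table and limits).
autoscaling.register_scalable_target(
    ServiceNamespace="dynamodb",
    ResourceId="table/users",
    ScalableDimension="dynamodb:table:ReadCapacityUnits",
    MinCapacity=5,
    MaxCapacity=500,
)

# Target-tracking policy: keep consumed read capacity near 70% of provisioned capacity.
autoscaling.put_scaling_policy(
    PolicyName="users-read-scaling",
    ServiceNamespace="dynamodb",
    ResourceId="table/users",
    ScalableDimension="dynamodb:table:ReadCapacityUnits",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 70.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "DynamoDBReadCapacityUtilization"
        },
    },
)
```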
Vertical and horizontal scaling. Automatic data movement and automatic rebalancing when adding nodes
Aerospike’s SSD-friendly Hybrid Memory Architecture and flexible configuration options enable it to handle massive customer growth without having to add many nodes.
Data distribution
Aerospike distributes data across cluster nodes automatically. When a node joins or leaves the cluster, the data is automatically redistributed.
Aerospike automatically shards data into 4,096 logical partitions evenly distributed across cluster nodes. When cluster nodes are added, partitions from other cluster nodes are automatically migrated to the new node, resulting in very little data movement.
Vertical scaling
Aerospike exploits SSDs, multi-core CPUs, and other hardware and networking technologies to scale vertically, making efficient use of these resources. You can scale by adding SSDs. (There is no theoretical upper limit for the amount of resources that can be added to a single node.)
Horizontal scaling
Because data is distributed evenly (and effectively at random) across partitions, scaling an Aerospike cluster in or out results in minimal data movement. Aerospike also follows a peer-to-peer architecture, meaning no node has a special role; load is distributed equally across all cluster nodes.
Implications
Cost-efficient use of hardware resources and self-managing features often enable Aerospike to deliver comparable or better performance on clusters with fewer nodes. This lowers total cost of ownership (TCO) and promotes ease of maintenance, including changes in cluster size.
Pricing
Based on ops (reads/writes) and storage
DynamoDB charges for reading, writing, and storing data. Optional services (such as backup/restore, data import/export, and others) incur additional fees.
The platform can be configured in on-demand capacity mode or provisioned capacity mode. With on-demand, charges are based on read and write request units (RRUs, WRUs) consumed by the user workload; AWS automatically ramps capacity up or down in response, and users do not specify throughput. With provisioned capacity, users specify the number of read and write capacity units (RCUs, WCUs) per second they expect to need and are charged accordingly. Auto scaling can be used in provisioned mode to address cost and performance concerns.
With either mode, a single application read or write command may consume multiple read or write units, depending on item size, data consistency level, transactional aspects, and other factors.
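A back-of-the-envelope sketch of that sizing, using the published unit sizes (one read unit covers a strongly consistent read of up to 4 KB, an eventually consistent read costs half a unit, and one write unit covers a write of up to 1 KB; transactional operations, which cost double, are omitted):

```python
import math

def read_units(item_kb: float, strongly_consistent: bool = True) -> float:
    """Read units consumed by a single read of an item of the given size."""
    units = math.ceil(item_kb / 4)          # one unit per 4 KB, rounded up
    return units if strongly_consistent else units / 2

def write_units(item_kb: float) -> int:
    """Write units consumed by a single write of an item of the given size."""
    return math.ceil(item_kb / 1)           # one unit per 1 KB, rounded up

# Example: a 7 KB item costs 2 strongly consistent read units,
# 1 eventually consistent read unit, but 7 write units.
print(read_units(7), read_units(7, strongly_consistent=False), write_units(7))
# -> 2 1.0 7
```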
Based on unreplicated (unique) data volume under management
Aerospike charges primarily by unique production data volume, plus fees for additional features. There is no charge for operations and no charge based on the number of servers or cores.
Additional features typically carry a fixed percentage uplift on your unique production data volume. Features are priced individually and are not bundled unless they are explicitly stated to be part of a service.
Low TCO at scale is one of the most common reasons that customers choose Aerospike.
Implications
DynamoDB offers two distinct pricing models to appeal to users with predictable or unpredictable throughput requirements. Careful understanding of which operations incur additional “request units” and which optional services are needed to support business requirements is essential for cost planning. Aerospike’s operational costs are typically easier to predict. Furthermore, Aerospike does not charge additional fees for backup / restore, data transfers to other regions, and caching configurations.
Availability
High availability achieved with replication factor 3
Each partition in DynamoDB is replicated across three servers (one leader, one synchronous follower, and one asynchronous follower). DynamoDB automatically detects and responds to many network and node failures to ensure high data availability without user intervention.
High availability achieved with replication factor 2
Aerospike automatically detects and responds to many network and node failures to ensure high availability of data without requiring operator intervention.
Aerospike's recommended number of replicas for achieving high availability is 2. Higher replication factors (3, 4, 5, ...) are possible.
Because Aerospike uses a peer-to-peer architecture, failures don't cause intermittent outages. If a node fails, the cluster can still immediately respond to all of the requests related to the failed node.
Aerospike auto-heals itself after a failure, reducing the need for operator intervention during critical events.
Implications
Aerospike achieves durability and high availability with fewer replicas, reducing operational costs and energy consumption.
Consistency (CAP Theorem approach)
Eventual consistency and strong consistency modes
Supports eventual consistency by default for reads to maximize throughput. Strongly consistent reads are also supported on tables and local secondary indexes but consume added read capacity units, which results in higher operational costs.
Writes are strongly consistent.
No public record of Jepsen testing available.
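Consistency is chosen per read; a minimal boto3 sketch (table and key names are hypothetical):

```python
import boto3

table = boto3.resource("dynamodb").Table("users")

# Default read: eventually consistent (cheaper, may return slightly stale data).
eventual = table.get_item(Key={"user_id": "u123"})

# Strongly consistent read: returns the latest committed value,
# but consumes twice the read capacity.
strong = table.get_item(Key={"user_id": "u123"}, ConsistentRead=True)
```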
Both High Availability (AP) mode and Strong Consistency (CP) mode
Aerospike provides distinct high availability (AP) and strong consistency (CP) modes to support varying customer use cases.
The independent Jepsen testing in 2018 validated Aerospike’s claim of strong consistency. Strong consistency mode prevents stale reads, dirty reads, and data loss.
With strong consistency, each write can be configured for linearizability (provides a single linear view among all clients) or session consistency (an individual process sees the sequential set of updates).
Each read can be configured for linearizability or session consistency. Reads can also be allowed from replicas (read from the primary or any replica of the data) or from unavailable partitions (read from the primary, any replica, or an unavailable partition).
Aerospike’s roster-based consistency algorithm requires only N+1 copies to handle N failures. Aerospike automatically detects and responds to many network and node failures to ensure high data availability without requiring operator intervention.
High Availability (AP)/partition tolerant mode emphasizes data availability over consistency in failure scenarios.
Modes and consistency levels can be defined at the namespace level (database level).
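In the Python client, these choices appear as per-read policies. The sketch below is illustrative only: host, namespace, and set names are hypothetical, and the policy constants assume a reasonably recent client version:

```python
import aerospike

client = aerospike.client({"hosts": [("10.0.0.1", 3000)]}).connect()
key = ("test", "users", "u123")

# Linearizable read: a single linear view among all clients (strong consistency namespaces).
_, _, bins = client.get(key, policy={"read_mode_sc": aerospike.POLICY_READ_MODE_SC_LINEARIZE})

# Session consistency: an individual process sees its own updates in order (the default).
_, _, bins = client.get(key, policy={"read_mode_sc": aerospike.POLICY_READ_MODE_SC_SESSION})

# Relaxed modes trade strictness for availability: allow reads from any replica,
# or even from partitions currently marked unavailable.
_, _, bins = client.get(key, policy={"read_mode_sc": aerospike.POLICY_READ_MODE_SC_ALLOW_REPLICA})
_, _, bins = client.get(key, policy={"read_mode_sc": aerospike.POLICY_READ_MODE_SC_ALLOW_UNAVAILABLE})
```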
Implications
Both platforms support high levels of consistency, including strong immediate consistency. Only Aerospike has undergone and passed independent Jepsen testing.
Multi-site support
Global table can span multiple regions
DynamoDB supports multi-site deployments for varied business purposes, including continuous operations, fast localized data access, and more. Global tables enable replicas to be stored in different regions, with read/write access occurring in any region. Changes are automatically replicated across regions asynchronously, typically within a second.
Reads spanning multiple regions of global tables operate under eventual consistency. Write conflicts across regions are resolved with a last-writer-wins strategy. Transactions are not supported across regions in global tables. Note: As of Q4’24, global tables will soon have strong consistency as well.
Users who want to maintain copies of tables (or subsets of tables) across different DynamoDB database instances can implement their own solutions using other AWS services (e.g., Streams and Lambda) if desired.
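With the current (2019.11.21) global tables version, adding a replica region to an existing table is a table update; a hedged boto3 sketch (table name and regions are hypothetical):

```python
import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

# Add a replica of the "users" table in eu-west-1; DynamoDB then replicates
# changes between the two regions asynchronously.
dynamodb.update_table(
    TableName="users",
    ReplicaUpdates=[{"Create": {"RegionName": "eu-west-1"}}],
)
```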
Automated data replication across multiple clusters; a single cluster can span multiple sites
Aerospike supports multi-site deployments for varied business purposes, including continuous operations, fast localized data access, disaster recovery, global transaction processing, edge-to-core computing, and more.
Asynchronous active-active replication (via Cross Datacenter Replication, XDR) typically completes within sub-millisecond to single-digit-millisecond times. All or part of the data in two or more independent data centers is replicated asynchronously; the replication can be one-way or two-way.
Clients can read and write from the data center closest to them; the expected lag between data centers is on the order of a few milliseconds.
XDR also supports selective replication (i.e., data filtering) and performance optimizations to minimize the transfer of frequently updated data.
Synchronous active-active replication (via multi-site clustering): a single cluster is formed across multiple data centers, achieved in part via rack awareness that pegs primary and replica partitions to distinct data centers. Strong data consistency is enforced automatically.
Clients can read data from the node closest to them with sub-millisecond latency. Writes, however, may need to be committed in a different data center, which can increase write latency to a few hundred milliseconds.
Implications
Global enterprises require flexible strategies for operating across data centers. Aerospike supports both synchronous and asynchronous data replication across multiple data centers in a variety of configurations. Firms can configure Aerospike clusters across sites, data centers, availability zones, regions, and even cloud providers simultaneously. This enables applications to customize deployments according to their resilience and availability needs.
Indexing
Primary and secondary key access
In addition to access by partition key (and an optional sort key), DynamoDB supports secondary indexes. These are stored as DynamoDB tables (and, therefore, B-tree structures).
A local secondary index (LSI) shares the same partition key as the user table it indexes but can have a different sort key and can copy a subset of the table’s attributes. Users may define up to 5 LSIs per table.
A global secondary index (GSI) can have a different partition key and sort key from the user table and copy a subset of the table’s attributes. GSIs support only eventually consistent reads because the index and corresponding user table partitions may not be co-located. Users may define up to 20 GSIs per table by default; increases can be requested via AWS support.
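Querying a GSI is explicit, by index name; a minimal boto3 sketch (table, index, and attribute names are hypothetical):

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("orders")

# Query a global secondary index keyed on "customer_id" (eventually consistent reads only).
resp = table.query(
    IndexName="customer_id-index",
    KeyConditionExpression=Key("customer_id").eq("c42"),
)
print(resp["Items"])
```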
Production-ready primary, secondary indexes
Aerospike uses a proprietary partitioned data structure for its indexes, employing fine-grained individual locks to reduce memory contention across partitions. These structures ensure that frequently accessed data has locality and falls within a single cache line, reducing cache misses and data stalls. For example, an index entry in Aerospike is exactly 64 bytes, the same size as an x86-64 cache line.
By default, secondary indexes are kept in DRAM for fast access and are co-located with the primary index, but they can also be stored on SSD to save on memory.
Each secondary index entry references only primary and replicated records local to the node. When a query involving a secondary index executes, records are read in parallel from all nodes; results are aggregated on each node and then returned to the Aerospike client layer for return to the application.
Firms can also opt to store all user and index data (both primary and secondary) on SSDs to improve cost efficiency while still maintaining single-digit millisecond response times.
However, Aerospike can use only a single secondary index in each query.
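A minimal sketch of creating a secondary index and querying through it with the Python client (host, namespace, set, bin, and index names are hypothetical):

```python
import aerospike
from aerospike import predicates as p

client = aerospike.client({"hosts": [("10.0.0.1", 3000)]}).connect()

# Create a secondary index on the integer bin "age" in namespace "test", set "users".
client.index_integer_create("test", "users", "age", "users_age_idx")

# Query the set through that index; the predicate runs in parallel on every node.
query = client.query("test", "users")
query.select("name", "age")
query.where(p.between("age", 20, 30))
for _, _, bins in query.results():
    print(bins)
```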
Implications
Aerospike and DynamoDB both support primary key access and secondary indexes to speed data retrieval. However, Aerospike gives firms more control over index storage (in RAM or on SSDs) to satisfy price/performance objectives. Furthermore, Aerospike’s secondary indexes support strong data consistency for both localized and multi-region scenarios but do not currently support sorting operations or using multiple secondary indexes in a single query.
Query language and capabilities
Native API; SQL-compatible access via PartiQL
DynamoDB supports PartiQL, a SQL-92 compatible language maintained by Amazon and offered as open source with Apache 2.0 licensing. Limitations apply to PartiQL operations supported by DynamoDB. For example, reads (SELECTs) support only certain aggregate and conditional functions. Writes (UPDATE, INSERT, DELETE) are also supported with certain limitations (e.g., updates and deletes can only apply to 1 item at a time). Use of secondary indexes is supported through explicit inclusion in query syntax (“... FROM table.index”).
Seemingly similar queries can result in significant cost differences (read capacity unit consumption), leading some users to recommend exporting DynamoDB data to a system that supports a full SQL dialect efficiently.
DynamoDB also offers a native (NoSQL) query API.
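A minimal sketch of a PartiQL read through the low-level boto3 client (table and attribute names are hypothetical; parameters use DynamoDB's typed value format):

```python
import boto3

dynamodb = boto3.client("dynamodb")

# PartiQL SELECT against a table (use "table"."index" in FROM to target a secondary index).
resp = dynamodb.execute_statement(
    Statement='SELECT user_id, name FROM "users" WHERE user_id = ?',
    Parameters=[{"S": "u123"}],
)
print(resp["Items"])
```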
SQL-like capabilities and SQL connectors with broad data retrieval features
Aerospike Quick Look (AQL) offers SQL-like querying capabilities combined with advanced features like secondary indexing, expressions, and user-defined functions (UDFs). It integrates with popular SQL-based tools while providing a native API for developers to leverage Aerospike's full performance potential.
Aerospike recommends using its native API for optimal performance and full control over data access.
The Aerospike API allows implementing the equivalents of specific SQL operations, such as SELECT, UPDATE, CREATE, and DELETE, with fine-grained performance control.
Aerospike's data modeling approach aims to minimize the need for SQL constructs like JOINs to maximize performance and scalability benefits.
SQL access is available via Aerospike-built connectors, optimized for high performance with Spark and Presto/Trino. Application developers can use simple SQL with JDBC and the community-contributed JDBC Connector.
The Aerospike Spark connector can read data from Aerospike namespaces using up to 32K Spark partitions. Predicate filtering and scan-by-partition also promote strong performance.
JOINs, unions, intersections, aggregations, and other sophisticated SQL operations are fully supported through these connectors, as are query federation and the use of secondary indexes.
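A hedged PySpark sketch of reading an Aerospike set through the Spark connector described above; the option keys are assumptions that may vary by connector version, and host, namespace, and set names are hypothetical:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("aerospike-read").getOrCreate()

# Read a set as a DataFrame; option names are assumed and may differ by connector version.
df = (
    spark.read.format("aerospike")
    .option("aerospike.seedhost", "10.0.0.1:3000")
    .option("aerospike.namespace", "test")
    .option("aerospike.set", "users")
    .load()
)

# Predicate filtering and scan-by-partition let this filter run efficiently on the cluster.
df.filter(df.age > 21).show()
```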
Implications
Both platforms support native query APIs and SQL (or SQL-compatible) access. Aerospike’s SQL support is limited to reads but features a broader range of data retrieval functions, various performance optimizations, and query federation. DynamoDB supports SQL-compatible read/write access with a subset of capabilities available through open-source PartiQL. Price/performance considerations of PartiQL should be carefully considered, as seemingly similar queries can result in considerably more resource usage.
Interoperability (Ecosystem)
Turnkey integration with many AWS services. Third-party offerings available.
AWS offers various separate services for caching, auto-scaling, security, and other integrations with DynamoDB. Third-party offerings for integrating with Kafka, Spark, and other technologies are also available.
Wide range of ready-made connectors available from Aerospike
Performance-optimized connectors for Aerospike are available for many popular open-source and third-party offerings, including Kafka, Spark, Presto-Trino, JMS, Pulsar, Event Stream Processing (ESP), and Elasticsearch. These connectors, in turn, provide broader access to Aerospike from popular enterprise tools for business analytics, AI, event processing, and more.
Implications
Making critical business data quickly available to those who need it often requires integration with existing third-party tools and technologies. DynamoDB offers built-in integration with other popular AWS services; added fees for other services may apply. Aerospike offers performance-optimized connectors to many popular open-source and third-party offerings, including those not deployed on AWS. Some third-party offerings and open-source projects also offer connectors to DynamoDB; performance characteristics and features vary.
Caching and persistence options
Persistent store; separate service available for caching
DynamoDB is designed as a persistent, distributed cloud-based store. A separate service (DynamoDB Accelerator or DAX) is recommended for caching and is designed to be used with a DynamoDB backend to improve performance for target scenarios. DAX does not support other back-end stores.
Easily configured as a high-speed cache (in-memory only) or as a persistent store
Flexible configuration options enable Aerospike to act as:
(1) a high-speed cache to an existing relational or non-relational data store to promote real-time data access and offload work from the back end
or
(2) an ultra-fast real-time data management platform with persistence.
Aerospike can store all data and indexes in DRAM, all data and indexes on SSDs (flash), or a combination of the two (data on SSDs and indexes in DRAM). Aerospike 7.1 adds support for NVMe-compatible, low-cost cloud block storage and common enterprise network-attached storage (NAS).
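When used as a cache, records can simply be written with a time-to-live so they expire automatically; a minimal Python sketch (host, namespace, set, and bin names are hypothetical, with a 300-second TTL):

```python
import aerospike

client = aerospike.client({"hosts": [("10.0.0.1", 3000)]}).connect()
key = ("cache", "sessions", "sess-42")

# Write a cached value that expires automatically after 300 seconds.
client.put(key, {"user_id": "u123", "token": "abc"}, meta={"ttl": 300})

# A later read either returns the cached value or raises RecordNotFound after expiry.
try:
    _, _, bins = client.get(key)
    print(bins)
except aerospike.exception.RecordNotFound:
    print("cache miss")
```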
Implications
Aerospike’s flexible deployment options enable firms to standardize on its platform for a wide range of applications, reducing the overall complexity of their data management infrastructures. Many firms initially deploy Aerospike as a cache to promote real-time access to other systems of record or systems of engagement and later leverage Aerospike’s built-in persistence features to support additional applications. By contrast, DynamoDB is widely deployed as a persistent store; if caching is needed, a specialized service (DAX) is available, but it can only be used to front DynamoDB.