What is data modeling?
Data modeling is the exercise of mapping application objects onto the model and mechanisms provided by the database for persistence, performance, consistency, and ease of access. Data modeling with the Aerospike database allows applications to run efficiently and scale as needed.
The Aerospike database is purpose-built for applications that require predictable sub-millisecond access to billions or trillions of objects and need to store terabytes to petabytes of data, while keeping the cluster size, and therefore the operational cost, small. Combining large data size with small cluster size means that each node must provide a high capacity of high-speed data storage.
Aerospike pioneered database technology that uses SSDs effectively to provide high-capacity, high-speed persistent storage per node. Among its key innovations are that Aerospike:
Accesses SSDs like directly addressable memory, which results in superior performance,
Supports a hybrid memory architecture for index and data in DRAM, PMEM, or SSD,
Implements proprietary algorithms for consistent, resilient, and scalable storage across cluster nodes, and
Provides a Smart Client for single-hop access to data that adapts to changes in the cluster.
Therefore, choosing the Aerospike database as the data store is a significant step toward enabling your application for speed at scale. By choosing the Aerospike database today, it is possible for a company of any size to leverage large amounts of data to solve real-time business problems and continue to scale in the future while keeping the operational costs low.
Data design should take into account many capabilities that Aerospike provides toward speed-at-scale such as data compression, Collection Data Types (CDTs), secondary indexes, multi-op requests, batch requests, server-side operations, cluster organization, and more. We discuss them later in this post.
NoSQL Data Modeling Principles
Aerospike is a NoSQL database and does not impose the rigid schema that relational databases require. To enable web-scale applications, Aerospike has a distributed architecture and allows applications to choose availability or consistency during a network partition, per the CAP theorem.
Typically, NoSQL data modeling starts with identifying the patterns of access in the application, that is, how the application reads and updates the data. The goal is to organize data for the required performance, efficiency, and consistency. In some NoSQL databases, the design of keys, which serve as handles for access, is an important consideration because records can be collocated through a common property value in their keys. More on this later.
Many of the key data modeling principles that are prevalent in NoSQL databases also apply in Aerospike, including the use of:
Denormalization: Allowing duplication of data, by storing it in multiple places, to simplify and optimize access.
Aggregates: Storing nested entities together in embedded form to simplify and optimize access.
Application joins: Performing joins in the application in rare cases when they are required, for example, to follow the stored references in many-to-many relationships.
Single record transactions: Storing data that must be updated atomically together in one record.
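The four principles above can be illustrated with a minimal sketch. It deliberately avoids any Aerospike client API: a plain Python dictionary stands in for the record store, and the `put`, `get`, `orders_for_customer`, and `add_item` helpers, the set names, and the bin names are all hypothetical, chosen only for illustration.

```python
from copy import deepcopy

# Hypothetical in-memory stand-in for a record store: key -> record (dict of bins).
# This is NOT the Aerospike client API; it only illustrates how data is shaped.
store = {}

def put(key, record):
    """Write a whole record in one step."""
    store[key] = deepcopy(record)

def get(key):
    """Read a whole record in one step."""
    return deepcopy(store[key])

# Denormalization: the customer's name is duplicated into each order,
# so displaying an order does not require reading the customer record.
# Aggregate: line items are embedded in the order record as a nested list.
put(("customers", "c1"), {"name": "Ada", "order_ids": ["o1", "o2"]})
put(("orders", "o1"), {
    "customer_id": "c1",
    "customer_name": "Ada",
    "items": [{"sku": "A-1", "qty": 2}, {"sku": "B-7", "qty": 1}],
    "item_count": 3,
})
put(("orders", "o2"), {
    "customer_id": "c1",
    "customer_name": "Ada",
    "items": [{"sku": "Z-9", "qty": 5}],
    "item_count": 5,
})

def orders_for_customer(customer_id):
    """Application join: follow stored references with additional reads."""
    customer = get(("customers", customer_id))
    return [get(("orders", order_id)) for order_id in customer["order_ids"]]

def add_item(order_id, sku, qty):
    """Single-record update: the item list and its derived count live in the
    same record, so one write changes both together."""
    record = get(("orders", order_id))
    record["items"].append({"sku": sku, "qty": qty})
    record["item_count"] = sum(item["qty"] for item in record["items"])
    put(("orders", order_id), record)

add_item("o1", "C-3", 4)
joined = orders_for_customer("c1")
```

In this sketch, reading one order returns everything needed to display it, while the application join is used only when all of a customer's orders are required. In a real database the read-modify-write in `add_item` would need to be performed as one atomic operation on the record; the sketch only shows why keeping the fields in one record makes that possible.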