We are excited to be a part of AWS re:Invent 2024. Visit us at booth #1844 in Las Vegas.More info
Blog

Data Modeling for Speed At Scale (Part 2)

neel-phadnis-20ed9332d626250e23b8226a5208e50c
Neel Phadnis
Director - Developer Ecosystem
June 29, 2022|20 min read

This post focuses on the use of Collection Data Types (CDTs) for data modeling in Aerospike with a large number of objects. This is Part 2 in the two part series on Data Modeling. You can find the first post here.

Context

Data Modeling is the exercise of mapping application objects onto the model and mechanisms provided by the database for persistence, performance, consistency, and ease of access.

Aerospike Database is purpose built for applications that require predictable sub-millisecond access to billions and trillions of objects and need to store many terabytes and petabytes of data, while keeping the cluster size - and therefore the operational costs - small. The goals of large data size and small cluster size mean the capacity of high-speed data storage on each node must be high.

Aerospike pioneered the database technology to effectively use SSDs to provide high-capacity high-speed persistent storage per node. Among its key innovations are:

  • Access SSDs like direct addressable memory which results in superior performance,

  • Support a hybrid memory architecture for index and data in DRAM, PMEM, or SSD,

  • Implement proprietary algorithms for consistent, resilient, and scalable storage across cluster nodes, and

  • Provide Smart Client for a single-hop access to data while adapting to the changes in the cluster.

Therefore, choosing Aerospike Database as the data store is a significant step toward enabling your application for speed at scale. By choosing the Aerospike Database, a company of any size can leverage large amounts of data to solve real-time business problems and continue to scale in the future while keeping the operational costs low.

Part 1 described many capabilities that Aerospike provides toward speed-at-scale such as indexes, data compression, server-side operations, namespace and cluster configuration, multi-op requests, batch requests, and more.

This post focuses on Collection Data Types (CDTs), specifically List and Map data types, and discusses how applications can optimize speed as well as storage density by leveraging them.

. . . .

Collection Data Types (CDTs)

List and Map are the Collection Data Types (CDTs) in Aerospike. A List is a tuple or array of values, and a Map is a dictionary of key-value pairs. The element value can be of any supported type, including List and Map, and CDTs can be nested at an arbitrary level.

CDTs are essential to model:

  • aggregation of related objects in one record, allowing transactional semantics across multi-object updates,

  • containers for collection of objects to effectively store large number of objects, and

  • complex objects such as JSON documents.

We will briefly describe key CDT concepts like nesting and ordering before diving into specific modeling patterns and techniques.

Nested Elements and Context Path

A nested element in a CDT can be accessed directly by using its context-path. A context-path describes the path from the root or the top level of the CDT to a nested element, where each node in context-path uniquely identifies an element at that level by key, index, value, or rank (value order). The context-path points to only one element which can be of any data type, and an operation on the element identified by a context-path must be a valid operation on the element’s data type.

Consider a nested object represented as a Map (level 0): it has a List at level 1, and a Map at level 2.

Object = {  “id1”: [ {“a”: 1, “b”: 2}, {“c”: 3, “d”: 4} ], 
            “id2”: [ {“e”: 5, “f”: 6}, {“g”: 7, “h”: 8}] }

A context path to the nested element “c”, can look like: [By-Key(“id1”), By-Index(1), By-Key(“c”)].

Note, in the Aerospike API, the top CDT level (level 0) is implied by the bin and only lower level elements require a non-null context-path.

As a performance and convenience feature, CDTs allow creation of missing interim levels as part of the create operation of a nested element, as checking all path nodes prior to creation of a nested element in the application can be inconvenient and slow. For example, adding an element to a Map in a List, none of which exist, will first create the List, add the Map to the List, and then add the element to the Map.

Global Ordering of Values

Applications commonly use an Ordered List for use cases such as the Leaderboard. In Aerospike, List elements can be any type, and therefore Aerospike defines a deterministic ordering within and across supported types.

A CDT is frequently used as a container for objects that are represented as either List or Map. To support retrieval by rank (value-order) as well as by specific value-range, Aerospike defines how List values and Map values are compared. For example, two lists are compared by comparing their respective elements in stored order until the first inequality results or the end of either or both lists is reached. As well, there is a defined order across the data types. For example, any integer value is ranked lower than any string value. You can view the ordering rules here.

This deterministic comparison of values provides the basis for content or value based selection of elements such as By-Value and By-Value-Range.

. . . .

Modeling Objects with CDTs

CDTs make it possible to aggregate related objects in a single record, and therefore transactional updates across them.

Application objects can be stored in multiple ways in Aerospike: As a record, as a List, or as a Map.

Storing Objects As a Record

Object fields are stored in record bins. Flat objects can be stored in simple bins, that is, without use of CDTs, but objects with array and map fields require CDT bins.

For example:

Object:  id = 4, name = “Adam Smith”, start-date = 1/1/2015, department = finance, salary = 100000
Record bins: id:4, name: “Adam Smith”, start-date: 1/1/2015, department: finance, salary: 100000

Modeling objects as records have the following advantages:

  • Mapping of object fields to record bins is simple to understand.

  • Secondary indexes can only be defined at a bin level for field-value based access.

  • Certain data types like HyperLogLog (HLL) and BLOB can only be stored as a bin.

  • XDR sync granularity can be specified at a field (bin) level, allowing greater control and efficiency.

Storing Objects As a List

As a List, an object is stored as a tuple of its field values, where each field is placed at a specific position in the List.

For example:

Object: id = 4, name = “Adam Smith”, start-date = 1/1/2015, department = finance, salary = 100000
List bin: [4, “Adam Smith”, “20150101”, “finance”, 100000]

Lists offer these unique advantages:

  • The tuple form eliminates redundant storage of keys as in Map. Objects stored as tuples must use Unordered (or more precisely, insertion- or application-ordered) List, which preserves the tuple order irrespective of the field values. Application must manage the object schema, that is, how the tuple order matches the object fields, and also schema evolution. The Object Mapper library can be used to manage these aspects transparently.

  • Lists also allow convenient value-based selection of objects. The value of a List object is based on the initial field in the tuple. For example, objects represented as [type, size, color] can be retrieved by matching type values.

    • A wildcard based value match can be conveniently specified. For example, get-by-value(collection, [“type x”, *]) will match all List tuples in the collection with “type x” as their first element.

    • A value range based selection can be conveniently specified using the value delimiters NIL and INF, NIL denoting the absolute lowest and INF denoting the absolute highest value. For example, get-by-value-range(collection, [“value1”, NIL],[“value2”,INF]) selects all List objects with the first element between “value1” and “value2” (both values inclusive).

Storing Objects As a Map

As a Map, an object is stored as field specific key-value pairs.

For example:

Object: id = 4, name = “Adam Smith”, start-date = 1/1/2015, department = finance, salary = 100000
Map bin: {id: 4, “name”: “Adam Smith”, “start-date”: “20150101, “department”: “finance”, “salary”: 100000}

Advantages of using a Map are:

  • Maps provide a natural way for storing JSON documents.

  • The object schema is self defining, thus removing the burden on the application of managing it.

Map values are not convenient to compare for value based access as List values. Wildcard cannot be used to denote a range of Map values. For example, {a:*} or {a:1, *} cannot be used for value comparison. Exact value or value-range requires specifying all keys in the map, which is not convenient for large maps. For these reasons, objects should be modeled as List tuples if value based access is required.

. . . .

Speed-At-Scale with CDTs as Containers

CDTs unlock many advantages toward speed-at-scale when they are used as a container for a collection of objects.

Performance

Related objects stored in a CDT container can be written or retrieved together in a single operation, which can provide a significant throughput improvement.

Aerospike CDTs support the normal list and map operations, but more significant are the many common patterns requiring more complex processing. They are performed fully on the server side to eliminate retrieval of data to the client side, and therefore provide superior performance.

Additional performance aspects include:

  • Multi-element updates are akin to batch operations on a CDT. Applications can get the right semantics using the constructs provided for individual element failure handling.

  • Many selection criteria are available:

    • Access based on value and “rank” (value order) in addition to the normal index or key based access.

    • Single, multi, and range selectors by index, key, value, and rank.

    • Vicinity or relative selection, for example, using “relative rank” with respect to a value.

    • Negate the selection criterion with a convenient INVERT flag.

  • Multiple return types allow just the required data to be requested. For example, COUNT for the number of elements selected, and NONE when no values need to be returned such as when adding elements, among others.

  • CDT operations can be used in Expressions, which is another mechanism for efficient server side execution and minimizing the data transfer cost. Expressions allow server side record selection (filter expression), reading (read expression) or writing (write expression) a computed value.

You can review List performance analysis here and Map performance analysis here.

Scale

Each record in Aerospike incurs a 64 byte storage overhead in the primary index, which is typically stored in DRAM. For a large number of objects, the DRAM size and cost can be significant if each object is stored in a record. CDTs when used as containers can store multiple objects in a single record, providing greater density and scale.

Further, List provides a compact tuple form for storing objects which eliminates the bin overhead.

Ordering

Objects stored as records cannot be retrieved from the server in any specific order. Sorting must be performed on the client side.

On the other hand, a CDT supports identifying elements by different orders: index or key as well as value and rank order. For instance, Ordered List maintains a value-based sort order allowing easy modeling of common patterns like Leaderboard, time-ordered events, and ordered groups. Unordered List maintains insertion-order and can be used to store, for example, a Queue and objects in tuple form as described earlier. Note, the object should be stored as a tuple with the first field that is significant for value ordering. For example, type, id, and timestamp.

CDTs maintain internal indexes for fast access by key, index, or value.

Enforcing Unique Values

Ordered Lists provide a way to check and enforce uniqueness of values at the time of insertion.

Bin Limits

In Aerospike, bin names can be up to 14 characters long. Distinct bin names in a namespace are limited to 32K. Storing objects as Maps or Lists has no limit on the length or the number of distinct object field names.

Modeling Object Collections

When an object is stored as a record, storing a collection of objects is a matter of mapping it on one or more sets and one or more namespaces. Part 1 talks about some of the considerations in organizing records in sets and namespaces.

Multiple related objects can be stored in a single CDT container for access and storage efficiencies as discussed above. If the number of objects in the collection exceeds the record size limit, the collection must be split by some criteria across multiple records. The considerations in organizing records in sets and namespaces from Part 1 apply in this case. CDT containers work especially well for objects that will be stored or retrieved together.

When using the CDTs for collections, key design considerations include:

  • Direct access: When direct access to the object in a collection is needed, it can be achieved by proper object id design.

  • Query access: Multiple objects can be accessed by some criteria with a scan using filter expression, and/or a query using a secondary index.

How objects in the collection are stored - as a tuple in a List or as key-value pairs in a Map - affects direct access and query. Tuple or List object representation offers better support for both as described earlier.

Object ID Design for Direct Access

Objects stored in CDTs may need to be independently accessible. In this case, the object identifier must be designed for direct access.

To enable direct access using the object id, object ids should contain record id (key), say as a prefix. With this, the record key extracted from the object id can be used to navigate first to the record, and then to the specific object within the CDT. All objects stored in a record will contain the common record key. For example, if a CDT holds all store objects in a region, the record key can be the region id, which is also embedded in all store ids of the region.

Note, it is not necessary to aggregate objects in a record by a real world relationship. A subset of the hashed object id bits may be used as a record id. See this blog post for the details of this scheme.

Direct Access with List and Map Object Representation

A List object can be accessed within the container (List or Map) using the object id by value if the id is stored as the first field. A Map object is not easy to access in a List container because a Map does not allow wildcard based value comparison, for example, access by id like get-by-value(list, {“id”:”id1”, *}) is not possible.

Consider a List of Lists:

[   ["id1", 10, 11,], ["id2", 20, ], 
    ["id3", 30, ], ["id4", 10, 101]  ]

Direct access using id: get-by-value(outerList, ["id1", *])

Or a Map of Lists

{   "id1": ["id1", 10, 11,…], "id2": ["id2", 20, …], 
    "id3": ["id3", 30, …], "id4": ["id4", 10, 101…]  }`

Here it is not possible to directly access the object with id "id1" in the List container.

Or a Map of Maps:

{id1”: {“id”:"id1", “a”: 10, “b”: 11,}, “id2”: {id: id2’, “a”: 20,},   
    “id3”: {id: "id3", “a”: 30,}, “id4”: {id: "id4", “a': 10, “b”: 101…}  }

Direct access using id: get-by-key(outerMap, "id1")

Organizing Collections in Records

Due to the limit on the size of a record, a large object collection may need to be split across multiple CDTs, each stored in a separate record.

  • Single record collection: The record key represents the collection id.

  • Multi-record collection: Record keys are generated by appending the collection id with record-specific group ids. For example, the ticker for a stock can be organized in multiple records that have keys stock symbol (collection id) + date (group id).

Organizing Collections in Sets and Namespaces

All record organization concepts apply to records holding collections in CDTs as described in Part 1. For instance, a multi-record collection can be further organized if necessary in one or more dedicated sets, over one or more namespaces.

Querying Collections

A query or predicate based access to multiple objects is provided with these mechanisms:

  • In a single CDT: Internal indexes are maintained within a CDT that allow fast access to elements by specific keys (Map only), indexes, values, and ranks.

  • Across multiple records: A secondary index can be defined on List values, Map keys, and Map values, to allow queries across List or Map bins in multiple records. The secondary index efficiently identifies all matching records, and with appropriate CDT operation, specific CDT elements from each record can be retrieved. (Note indexing at any CDT level is planned for Aerospike Database 6.1. Prior to 6.1, CDT elements below the top level of the CDT can be replicated for indexing in a separate record bin or at the top level of the CDT.)

A filter expression can be specified in an operation so that the operation is applied only to matching records. A filter expression can be defined using the CDTs in the record. For example, a filter expression can select a record if a CDT bin has a nested List larger than a certain size or a certain value exists in a nested Map element.

As discussed earlier, it is easy to define value based predicates on List tuples using the wildcard, NIL, and INF values. Therefore, to be able to use filter expressions with value based predicates, use tuple representation. Note, the value based predicates cannot use arbitrary fields in the object tuple, only the first field.

Managing Temporary Objects

The record-level metadata such as the expiration time and update time are not applicable to individual objects within the CDTs, if they can be updated or expired independently. In such cases, these mechanisms must be implemented on a per-object basis. It is possible to piggyback such housekeeping expiration of objects on other regular application operations using a multi-op request and CDT delete operations based on the rank or value range (“delete all elements with expiration-time field less than current time”).

Index and rank based selection also allows one to keep the size of a CDT capped to a specified maximum.

Integration with Performance Features

CDTs are integrated into and benefit from the other Aerospike performance features discussed in Part 1 including:

Sets and set indexes

You can organize a collection efficiently in one or more sets, and define set indexes for scan performance.

Multi-op requests

You can perform multiple operations on a record in a single request. For instance, in the same request, you can add items to a list, sort it, get its new size, and top N items.

Batch requests

You can conveniently issue one request for operations on multiple records hosting a collection.

. . . .

Transactions and Consistency

Data stored in a CDT can be updated in a single transaction because all single record operations support transactional semantics in Aerospike. Different parts within the CDT can be updated or retrieved atomically and efficiently using a single multi-op request.

Maximum Record Size

A namespace is configured for a maximum record size, with the upper limit of 8MB, and represents the unit of transfer in a device IO operation. Record data cannot exceed the configured maximum record size, and is an important consideration for large object as well as multi-object record design. The application design may consider workarounds such as a multi-record map.

. . . .

Examples

The following examples illustrate the use of CDTs.

Events Data for Real-Time Applications

Event objects can be stored for access by event-id as well as timestamps by storing them in a Map container with each event object stored as a tuple in a List with timestamp as the first field.

{ event-id: [timestamp, other event attributes], … }

Event-id based access is a simple key access in the Map container. You can retrieve the event that is at or closest to a timestamp with:

get-by-relative-rank(map, value=timestamp, relative-rank=-1, count=2)

The above operation returns two elements in the events container Map with values starting just prior to the timestamp indicated by relative-rank of -1, and the one after that, which could be the exact timestamp or the one immediately after that. This scheme takes care of the timestamp not present because it is beyond the range of existing timestamps as well as there is no event at that exact timestamp.

The size of the container can be efficiently managed with remove-by-index-range (trim to specific size) or remove-by-value-range (remove old events) operations.

Rank Ordered Lists

Lists that need to be retrieved by rank such as players with highest game scores, blogs with highest views, and videos with most likes can be conveniently modeled with an Ordered List container with each object stored as a tuple in an Unordered (application ordered) List.

[ [score, other attributes], … ]

To obtain N objects with the highest scores:

get-by-rank-range(list, start-rank=-N)

In the above operation, the start-rank indicates the element with the Nth highest score. All elements after that are returned. The operation effectively returns the top N ranked elements.

Efficient Container Computations

Expressions allow operations involving one or more large containers in a record to be performed efficiently on the server side. For example, the top N elements common to two Lists A and B in a record can be computed on the server side with:

get-by-rank-range( list=remove-by-value-list( list=A, value-list=get-list(B), op-flag=INVERT ), start-rank=-N )

The first server computation is A-(A- B) which yields common elements in A and B. Note, op-flag=INVERT has the effect of inverting the selection, so in this case instead of A-B, it would return the elements in A that are not in A-B, or A-(A-B). The second computation returns the top N elements from the common list.

. . . .

Object Mapping Libraries

The following libraries are available for applications to use:

. . . .

Summary

Aerospike Database is built for speed at scale, and provides a path to companies of any size to leverage large data for real-time decisions without incurring huge operational cost, and to scale in the future. The blog post describes the Collection Data Types (CDTs), how they can be used to model objects for speed at scale, and their capabilities like ordering and server-side execution to improve performance and ease of implementation.