
Modern database management in distributed systems

What are the hallmarks of a distributed database system operating at optimal performance? Aerospike’s CTO and Founder provides his breakdown.

December 13, 2023 | 6 min read

Alexander Patino

Solutions Content Leader

When managing a distributed database, it’s crucial to start by outlining the mission-critical essentials. Who doesn’t want every transaction, every read, and every write to happen in real time? For many, the real-time factor is what counts most. So what should you concentrate on when high throughput and low latency are non-negotiable? In his webinar, “Four essentials of a modern, distributed database,” Srini Srinivasan, Aerospike’s CTO and Founder, outlines his core database management principles.

Before taking a look at these four principles, let’s define a few concepts.

What is database management?

Database management is the organization, storage, and retrieval of data. It involves designing, implementing, and supporting data stores to get the most value from the data they hold. One type of database management system (DBMS) is the distributed database management system (DDBMS), which places data on cluster nodes in different locations, all connected over one shared network.

However, operating a distributed database comes with its own set of challenges. With data spread across multiple nodes in different geographical locations, complexity naturally increases, and balancing consistency, availability, and partition tolerance (the three properties behind the CAP theorem) becomes a prime objective.

Database management in distributed systems

Here’s how Srini breaks down the four essentials for a fully operational, real-time DDBMS.

Scaling out

In the simplest terms, scaling out a database, also called horizontal scaling, means adding nodes to a cluster. “In Aerospike, we have tackled this problem by building out a shared-nothing database cluster,” says Srini. “Such a cluster essentially consists of a number of nodes, and every node is identical in terms of its capabilities: the amounts of storage, CPU, and networking capacity available in the particular node. It is also identical in terms of the code that runs in it, so there is no single master. Essentially, it’s a distributed system with every node being identical and able to perform various tasks required to run such a distributed database. Additionally, because it has no single point of failure, the system can run with no hotspots.”
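Because every node runs the same code, each node can compute the same mapping of data to nodes without consulting a master. The Python sketch below illustrates that idea with rendezvous (highest-random-weight) hashing over 4,096 partitions, the partition count Aerospike documents per namespace. The hashing scheme, the SHA-256 choice, and the names partition_map and succession_list are assumptions for illustration, not Aerospike's actual partitioning algorithm.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 4096  # partitions per namespace, as Aerospike documents


def _score(node_id: str, partition_id: int) -> int:
    """Stable pseudo-random weight for a (node, partition) pair; SHA-256 is an arbitrary choice."""
    digest = hashlib.sha256(f"{node_id}:{partition_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def succession_list(nodes: list[str], partition_id: int) -> list[str]:
    """Rank all nodes for one partition: the first acts as master, the next ones as replicas."""
    return sorted(nodes, key=lambda node: _score(node, partition_id), reverse=True)


def partition_map(nodes: list[str], replication_factor: int = 2,
                  num_partitions: int = NUM_PARTITIONS) -> dict[int, list[str]]:
    """Hypothetical helper: any node given the same node list computes the same map."""
    if replication_factor > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    return {pid: succession_list(nodes, pid)[:replication_factor]
            for pid in range(num_partitions)}


if __name__ == "__main__":
    nodes = ["node-a", "node-b", "node-c"]
    pmap = partition_map(nodes)
    # Each node should master roughly a third of the 4,096 partitions.
    print(Counter(owners[0] for owners in pmap.values()))

    # Adding a node moves only the partitions the new node now ranks first.
    grown = partition_map(nodes + ["node-d"])
    moved = [pid for pid in pmap if pmap[pid][0] != grown[pid][0]]
    print(f"{len(moved)} of {NUM_PARTITIONS} masters moved, all to node-d")
```

Rendezvous hashing is used here because adding a node moves only the partitions that the new node now wins, which mirrors the even, masterless distribution Srini describes; the real system's rebalancing rules may differ.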

In the session, Srini delves further into dynamic cluster management and the Aerospike Smart Client, which provides automatic load balancing by distributing data and traffic across all of the nodes in a cluster.
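One rough way to picture Smart Client routing: the client hashes each key to a partition, looks up which node currently owns that partition, and sends the request straight to that node, so no coordinator sits in the request path. The Python sketch below assumes a 4,096-entry partition map held by the client. Aerospike documents a RIPEMD-160 key digest, but this sketch substitutes SHA-256 because RIPEMD-160 is not always available in Python's hashlib. The class HypotheticalSmartClient and its methods are illustrative only, not the Aerospike client API.

```python
import hashlib

NUM_PARTITIONS = 4096  # 2**12 partitions, so 12 bits of the digest select one


def key_digest(set_name: str, user_key: str) -> bytes:
    # Stand-in hash; Aerospike's own digest is RIPEMD-160 based.
    return hashlib.sha256(f"{set_name}:{user_key}".encode()).digest()


def partition_id(digest: bytes) -> int:
    # Keep 12 bits of the digest to pick one of 4,096 partitions (bit selection is an assumption).
    return int.from_bytes(digest[:2], "little") & (NUM_PARTITIONS - 1)


class HypotheticalSmartClient:
    """Routes each request directly to the node that owns the key's partition."""

    def __init__(self, partition_owners: dict[int, str]):
        # partition id -> owning node; a real client refreshes this map as the cluster changes.
        self.partition_owners = partition_owners

    def node_for(self, set_name: str, user_key: str) -> str:
        return self.partition_owners[partition_id(key_digest(set_name, user_key))]


if __name__ == "__main__":
    # Hypothetical three-node map for illustration.
    owners = {pid: f"node-{pid % 3}" for pid in range(NUM_PARTITIONS)}
    client = HypotheticalSmartClient(owners)
    print(client.node_for("users", "alice"))
```

Because keys hash uniformly across partitions and partitions spread evenly across nodes, traffic balances across the cluster without a separate load balancer in front of it.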

Scaling up

“Scaling up is the ability of the system to take advantage of every aspect of processing, storage, and networking capacity available within a node,” says Srini in his presentation. He posits that the key to scaling up, or vertical scaling, is leveraging solid-state drives (SSDs). SSDs are storage devices that use flash memory to hold persistent data, and they have become essential for real-time data access, with sub-millisecond latency for both reads and writes.

“Just storing all of the data in SSDs increases the amount of data that is available for real-time access per node by 10x, or even more,” he says. He goes on to explain how this strategy avoids relying on a page cache, which considerably increases the amount of data each node can serve.

Aerospike scales up better than virtually all other real-time distributed systems for two reasons. The first is our Hybrid Memory Architecture, which offers the flexibility to keep both data and indexes in memory, or to keep indexes in memory while storing data on SSDs with DRAM-like performance. The second is our efficient, multi-threaded use of CPUs, which provides more headroom for scaling up.
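To make the index-in-memory, data-on-SSD split concrete, here is a toy Python model under stated assumptions: a dictionary stands in for the in-DRAM primary index, an append-only file stands in for the SSD, and each read costs one index lookup in memory plus one device read. HybridStore is a hypothetical name, and the append-only layout is an illustrative assumption, not a description of Aerospike's on-flash format.

```python
import os
import tempfile
from typing import Optional


class HybridStore:
    """Toy model: primary index in memory, record data on a storage device (here, a file)."""

    def __init__(self, path: str):
        self.index: dict[str, tuple[int, int]] = {}  # key -> (offset, length), kept in DRAM
        self._f = open(path, "a+b")                    # stand-in for the SSD

    def put(self, key: str, value: bytes) -> None:
        self._f.seek(0, os.SEEK_END)
        offset = self._f.tell()
        self._f.write(value)                    # appended; older versions stay in place
        self._f.flush()
        self.index[key] = (offset, len(value))  # point the index at the newest copy

    def get(self, key: str) -> Optional[bytes]:
        location = self.index.get(key)
        if location is None:
            return None                         # a miss is answered from memory alone
        offset, length = location
        self._f.seek(offset)
        return self._f.read(length)             # exactly one device read per hit

    def close(self) -> None:
        self._f.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = HybridStore(os.path.join(tmp, "data.bin"))
        store.put("user:1", b"alice")
        store.put("user:1", b"alice-v2")
        print(store.get("user:1"))  # b'alice-v2'
        store.close()
```

Keeping only the small index in DRAM is what lets per-node capacity grow with SSD size rather than with memory size.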

Strong consistency and high availability

In distributed database systems, strong consistency is the property that all nodes reflect the most up-to-date data. In short, every read, from any node, should return the latest write.

Srini presents strong consistency as a tentpole feature for optimized distributed transactions. He highlights how Aerospike uses a roster-based strong consistency scheme to avoid compromising on availability, even within the limits of the CAP theorem, which holds that a distributed data system can guarantee only two of three properties: consistency, availability, and partition tolerance. Our documentation defines a roster of nodes as “a list of nodes that are part of the cluster in a steady state. When all the roster nodes are present and all the partitions are current, the cluster is in its steady state and provides optimal performance. In the case of strong consistency, these partitions are referred to as roster-master and roster-replica.”
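The steady-state definition quoted from the documentation can be expressed as a simple check. In this Python sketch, the PartitionState fields and its current flag are hypothetical simplifications; the functions encode only the two conditions named in the quote (every roster node present, every partition current) and report which partitions have lost their roster-master. They do not model Aerospike's rules for which partitions remain available when the cluster is not in its steady state.

```python
from dataclasses import dataclass


@dataclass
class PartitionState:
    roster_master: str   # node that masters this partition in the steady state
    roster_replica: str  # node that holds the replica in the steady state
    current: bool        # hypothetical flag: this partition's copies are up to date


def in_steady_state(roster: set[str], live_nodes: set[str],
                    partitions: dict[int, PartitionState]) -> bool:
    """All roster nodes are present and all partitions are current."""
    return roster <= live_nodes and all(p.current for p in partitions.values())


def partitions_missing_roster_master(live_nodes: set[str],
                                     partitions: dict[int, PartitionState]) -> list[int]:
    """Partitions whose roster-master is not in the live cluster."""
    return sorted(pid for pid, p in partitions.items() if p.roster_master not in live_nodes)


if __name__ == "__main__":
    roster = {"A", "B", "C"}
    partitions = {0: PartitionState("A", "B", True), 1: PartitionState("B", "C", True)}
    print(in_steady_state(roster, {"A", "B", "C"}, partitions))      # True
    print(in_steady_state(roster, {"B", "C"}, partitions))           # False: node A is missing
    print(partitions_missing_roster_master({"B", "C"}, partitions))  # [0]
```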

Srini drills further into these concepts, including how Aerospike implements synchronous writes and how it supports rack awareness for high availability. He also clearly explains how Aerospike’s roster-based approach to strong consistency reduces costs compared with Raft-based systems. For a deeper dive into consistency, see our white paper, “Exploring data consistency in Aerospike Enterprise Edition.”
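As a simplified illustration of synchronous writes, the Python sketch below acknowledges a write only after every replica and the master have applied it, and serves reads from the master. Applying to the master last is an assumption of this toy, chosen so that master reads never show an unacknowledged value. The Node class and synchronous_write function are hypothetical; the sketch omits how a real database reconciles replicas left holding a partially applied write, and it is not Aerospike's replication protocol.

```python
class Node:
    """One copy of a partition on one node (hypothetical, in-memory)."""

    def __init__(self, name: str, up: bool = True):
        self.name = name
        self.up = up
        self.data: dict[str, str] = {}

    def apply(self, key: str, value: str) -> None:
        if not self.up:
            raise ConnectionError(f"{self.name} is unreachable")
        self.data[key] = value


def synchronous_write(master: Node, replicas: list[Node], key: str, value: str) -> bool:
    """Return True (acknowledge) only after all replicas and the master have applied the write."""
    for replica in replicas:
        try:
            replica.apply(key, value)
        except ConnectionError:
            # Not acknowledged: the client must treat the outcome as failed or in doubt.
            # Reconciling replicas that already applied the value is omitted here.
            return False
    master.apply(key, value)  # applied last; raises if the master itself is unreachable
    return True


def read(master: Node, key: str):
    """Reads go to the master, which holds only acknowledged writes in this sketch."""
    return master.data.get(key)


if __name__ == "__main__":
    master, replica = Node("A"), Node("B")
    print(synchronous_write(master, [replica], "balance", "100"))  # True
    replica.up = False
    print(synchronous_write(master, [replica], "balance", "90"))   # False
    print(read(master, "balance"))                                 # 100
```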

Geo-distribution for active-active systems

Finally, Srini expands on real-time active-active solutions for multi-site clusters. Applications that run global transactions need a geographically distributed database that can execute strongly consistent transactions at scale, and synchronous active-active replication helps achieve this with no data loss. “If an entire rack goes away,” Srini explains, using an example with three globally distributed racks, “or it’s in split-brain mode or down for maintenance, you can have two racks that can continue to run with both availability and strong consistency. There are no conflicts in the system.” In other words, a rack can go down and rejoin without any operator intervention.
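Srini’s three-rack example can be reasoned about with majority arithmetic: if a side of a split keeps serving only when it sees a strict majority of the racks, then two of three racks can continue, the isolated rack stops accepting writes, and two disjoint sides can never both be majorities, so they cannot accept conflicting writes. The Python sketch below encodes that simplified rule; the rack names and functions are hypothetical, and Aerospike’s actual partition-availability rules under strong consistency are more detailed.

```python
from itertools import combinations

RACKS = frozenset({"rack-us", "rack-eu", "rack-ap"})  # hypothetical rack names


def can_serve(visible_racks: set[str], all_racks: frozenset[str] = RACKS) -> bool:
    """Simplified rule: a side of a split keeps serving only if it sees a strict majority of racks."""
    return 2 * len(visible_racks & all_racks) > len(all_racks)


def two_sides_can_both_serve(all_racks: frozenset[str] = RACKS) -> bool:
    """Check every way of splitting the racks into two sides."""
    racks = sorted(all_racks)
    for size in range(len(racks) + 1):
        for side in combinations(racks, size):
            left = set(side)
            right = set(all_racks) - left
            if can_serve(left, all_racks) and can_serve(right, all_racks):
                return True
    return False


if __name__ == "__main__":
    print(can_serve({"rack-us", "rack-eu"}))  # True: two of three racks
    print(can_serve({"rack-ap"}))             # False: the isolated rack stops accepting writes
    print(two_sides_can_both_serve())         # False: no split lets both sides write
```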

To learn more about Aerospike Cross Datacenter Replication (XDR), which provides dynamic control over asynchronous data replication across geographically separated clusters, read our solution brief on replication for global data hubs.
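In contrast to the synchronous active-active replication described above, asynchronous replication acknowledges a write locally and ships it to remote clusters afterward. The toy Python model below shows that trade-off, including replication lag; the class and method names are hypothetical, and it does not reflect how XDR actually batches or ships records.

```python
from collections import deque


class HypotheticalAsyncReplicator:
    """Toy model: writes are acknowledged locally and shipped to a remote cluster later."""

    def __init__(self):
        self.local: dict[str, str] = {}
        self.remote: dict[str, str] = {}
        self.pending: deque = deque()  # (key, value) changes not yet shipped

    def write(self, key: str, value: str) -> bool:
        self.local[key] = value
        self.pending.append((key, value))
        return True  # acknowledged without waiting for the remote cluster

    def ship(self, max_batch: int = 100) -> int:
        """Ship up to max_batch queued changes to the remote cluster; return how many were shipped."""
        shipped = 0
        while self.pending and shipped < max_batch:
            key, value = self.pending.popleft()
            self.remote[key] = value
            shipped += 1
        return shipped

    def lag(self) -> int:
        """Number of acknowledged writes the remote cluster has not received yet."""
        return len(self.pending)


if __name__ == "__main__":
    repl = HypotheticalAsyncReplicator()
    for i in range(3):
        repl.write(f"k{i}", f"v{i}")
    print(repl.lag())  # 3: acknowledged locally, not yet remote
    repl.ship(max_batch=2)
    print(repl.lag())  # 1
```

If the local site fails while changes are still queued, those writes have not yet reached the remote cluster; that is the price asynchronous replication pays for keeping cross-site latency out of every write.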

Optimize distributed database management workflows with Aerospike

Striking the right balance between consistency, availability, and partition tolerance in a distributed system requires careful database management and observability. A distributed database that lacks the horizontal and vertical scaling your workload requires, along with strong consistency and geo-distribution for active-active distributed transactions, can hurt your applications: critical service level agreements (SLAs) may go unmet, degrading the performance and overall reliability of your system.

Ready to set up and monitor multi-tenant, multi-cluster deployments? Explore Aerospike’s observability and management stack for simplifying configurations, diagnosing stop-writes, isolating failure states in complex deployments, and more. Read our blog, “Manage what matters with the new Aerospike Observability Stack,” for more in-depth details.