Aerospike 4.5: Persistent Memory and Compression
Aerospike is pleased to announce the availability of our 4.5 release. Aerospike Enterprise Edition 4.5 is the first commercially available open database to support the new Intel® Optane™ DC persistent memory. Based on Intel® 3D XPoint™, it is a new class of storage technology architected specifically for data-intensive applications that require extremely low latency, high durability, and strong data consistency.
Aerospike’s open database architecture means that it natively supports a wide variety of enterprise application environments, with clients for Java, .NET, Node.js, Python, Go, and other commonly used programming languages.
We expect Intel Optane DC persistent memory to be used with Aerospike’s Strong Consistency mode, which received a positive Jepsen report in March 2018 and is now deployed in production with multiple demanding customers.
Intel Optane DC persistent memory is currently available in beta from multiple cloud service providers and hardware vendors, giving companies the flexibility to run the Aerospike database in the cloud, on premises, or both. If you wish to download 4.5 and begin testing it on Intel’s platform, contact your preferred OEM or cloud service vendor to find out how to join their beta program.
In contrast to in-memory datastores, which simply map Intel Optane DC persistent memory onto their existing in-memory data structures, Aerospike has drawn on its heritage as a database optimized for NAND to provide novel support for Intel Optane DC persistent memory. Our tiered architecture, using this technology’s AppDirect mode, supports multiple deployment configurations, including using persistent memory as the primary index layer, where its high performance and parallelism work best. Because the indexes live in a persistent layer, Aerospike can be fully restarted without rebuilding the primary index.
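To make the restart-without-rebuild idea concrete, here is a minimal sketch of an index that lives in a memory-mapped file rather than in process heap memory. Everything in it is an assumption for illustration: the fixed-size hash table, the `open_index`/`put`/`get` helpers, the 16-byte slot layout, and the `primary.idx` file name are hypothetical and do not reflect Aerospike’s actual index format. On real hardware the file would sit on a DAX-mounted persistent-memory filesystem, and durability would come from CPU cache flushes rather than the `mmap.flush()` used here as a stand-in.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical layout: a fixed-size open-addressing hash table of
# (key, value) slots, where key 0 marks an empty slot. An ordinary file
# stands in for a file on a DAX-mounted persistent-memory filesystem.
SLOTS = 1024
SLOT = struct.Struct("<QQ")  # 8-byte key, 8-byte value (e.g. a storage offset)


def open_index(path):
    """Map the index file, creating a zero-filled (all-empty) one if needed."""
    size = SLOTS * SLOT.size
    fd = os.open(path, os.O_RDWR | os.O_CREAT)
    try:
        if os.fstat(fd).st_size < size:
            os.ftruncate(fd, size)
        return mmap.mmap(fd, size)  # the mmap object keeps its own copy of fd
    finally:
        os.close(fd)


def put(mm, key, value):
    """Insert or update key using linear probing, writing in place."""
    if key == 0:
        raise ValueError("key 0 is reserved for empty slots")
    i = key % SLOTS
    for _ in range(SLOTS):
        k, _ = SLOT.unpack_from(mm, i * SLOT.size)
        if k in (0, key):
            SLOT.pack_into(mm, i * SLOT.size, key, value)
            return
        i = (i + 1) % SLOTS
    raise RuntimeError("index full")


def get(mm, key):
    """Probe from the key's home slot; return None if the key is absent."""
    if key == 0:
        return None
    i = key % SLOTS
    for _ in range(SLOTS):
        k, v = SLOT.unpack_from(mm, i * SLOT.size)
        if k == key:
            return v
        if k == 0:
            return None
        i = (i + 1) % SLOTS
    return None


path = os.path.join(tempfile.mkdtemp(), "primary.idx")  # hypothetical file name
mm = open_index(path)
for key in range(1, 101):
    put(mm, key, key * 10)
mm.flush()  # push mapped writes to the backing file (stand-in for pmem flushes)
mm.close()  # simulate shutting the process down

mm = open_index(path)  # "restart": remap the file; no rebuild pass over the data
print(get(mm, 42))  # 420
```

The point of the design is that lookups after the restart read the same bytes that were written before it; nothing scans the data to reconstruct the index.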
While we have discussed Intel 3D XPoint previously, a bit of background may be warranted. Intel Optane DC persistent memory uses Intel 3D XPoint technology, which stores data through a novel mechanism rather than the circuits used in DRAM and NAND. Intel has stated only that the technology uses “bulk material properties”; it appears to be a form of resistive or phase-change memory. Its underlying hardware is byte-addressable for both reads and writes, and it has the persistence and power consumption of NAND. Because it is byte-addressable, no garbage collection is required at the device level.
Intel Optane DC persistent memory stands in contrast to Intel’s Optane SSD DC P4800X Series NVMe drive, which is also based on Intel 3D XPoint technology. Aerospike has extensively tested the Intel Optane SSD DC P4800X Series, and it is the drive of choice when write loads are high. In 2018, Intel shipped 1.5TB U.2 and AIC units, providing density similar to low-latency NAND implementations. However, the P4800X Series is limited by the speed and interface of NVMe, whereas Intel Optane DC persistent memory connects directly to an Intel CPU through memory protocols (DDR4 pin compatible), unlocking the true potential of 3D XPoint technology.
Intel Optane DC persistent memory performance. In our testing with early hardware, we have observed minimal performance difference between using Intel Optane DC persistent memory and DRAM for indexes. As we expect Intel Optane DC persistent memory to be cheaper and higher density than DRAM, it may well be used extensively, since, depending on the use case, it involves only limited downtime compared with traditional high-DRAM database servers.
In our early testing with engineering sample hardware, we have found that typical read, write, and mixed read-write workloads running at multiple millions of transactions per second per server degrade by only a few percent; the worst case is a pure insert workload, which can degrade by 10%. This exceptional performance (using a hybrid installation with NAND storage and Intel Optane DC persistent memory indexes) shows the power of Aerospike’s unique optimizations.
Compression. With Aerospike 4.5, in-database compression is supported. Increasingly, applications are written to use Aerospike’s high-level List and Map data structures. These larger, more complex objects (similar to those in a document store) result in larger rows, which can benefit greatly from row-based compression. Many of our deployments serialize and compress objects at the application level, which prevents use of Aerospike’s native List and Map operations. Switching to native List and Map improves performance, but then compression is needed at the database layer.
Compression is very simple to use. With this release, three compression mechanisms can be enabled (Snappy, LZ4, and Zstandard). Compression is applied in real time, controlled by a per-namespace setting that determines how rows are written. If the configured compression mechanism produces savings, the compressed data is written to storage. When data is read from storage, a byte stored with the data indicates which compression mechanism was used, so that the matching decompression can be applied.
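The write and read paths described above can be sketched as follows, assuming a one-byte tag in front of each stored row. This is not Aerospike’s code: the tag values, the `encode`/`decode` helpers, and the setting names are hypothetical, and because Snappy, LZ4, and Zstandard need third-party Python packages, the standard library’s zlib, bz2, and lzma codecs stand in for them.

```python
import bz2
import lzma
import zlib

# Hypothetical tag values and stand-in codecs: Aerospike 4.5 offers Snappy,
# LZ4 and Zstandard, and its real on-storage format is not shown here.
CODECS = {
    0: None,  # stored uncompressed
    1: (zlib.compress, zlib.decompress),
    2: (bz2.compress, bz2.decompress),
    3: (lzma.compress, lzma.decompress),
}
# Hypothetical per-namespace setting values mapped to tag bytes.
SETTINGS = {"none": 0, "zlib": 1, "bz2": 2, "lzma": 3}


def encode(record: bytes, setting: str) -> bytes:
    """Compress with the configured codec only if that saves space; prefix a tag byte."""
    tag = SETTINGS[setting]
    if tag:
        compress, _ = CODECS[tag]
        packed = compress(record)
        if len(packed) < len(record):
            return bytes([tag]) + packed
    return bytes([0]) + record  # no savings (or compression off): store as-is


def decode(stored: bytes) -> bytes:
    """Read the tag byte and apply the matching decompressor, whatever the current setting."""
    tag, payload = stored[0], stored[1:]
    if tag == 0:
        return payload
    _, decompress = CODECS[tag]
    return decompress(payload)


record = b'{"tags": ["a", "b", "c"], "score": 7}' * 40
for setting in ("none", "zlib", "lzma"):
    stored = encode(record, setting)
    assert decode(stored) == record
    print(setting, "tag", stored[0], len(stored), "bytes")
print(encode(b"x", "zlib")[0])  # 0: compressing one byte does not save space
```

Because `decode` relies only on the stored tag and never on the current setting, rows written under an older setting stay readable after the namespace setting changes.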
Compression is highly input dependent, so we strongly suggest you test with your own data. We benchmarked a sample that included customer data and found the classic tradeoff between compression ratio and throughput.
Compression Method | Insert (kTPS) | Read (kTPS) | Compression Ratio (compressed / original size) |
None | 558 (baseline) | 1,617 (baseline) | n/a |
LZ4 | 535 (-4%) | 1,527 (-6%) | 0.659 |
Snappy | 487 (-13%) | 1,501 (-7%) | 0.507 |
Zstd (level 1) | 496 (-11%) | 1,510 (-7%) | 0.379 |
Zstd (level 9) | 139 (-75%) | 1,476 (-9%) | 0.361 |
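As a quick check on how the table’s figures relate, the snippet below recomputes each throughput change against the uncompressed baseline and translates each ratio into stored size. It reads the compression ratio as compressed size divided by original size, an interpretation consistent with the higher Zstd level producing the smaller value; the 1 TB figure is purely illustrative.

```python
# Figures copied from the table above: (insert kTPS, read kTPS, ratio).
BASELINE = {"insert": 558, "read": 1617}
RESULTS = {
    "LZ4": (535, 1527, 0.659),
    "Snappy": (487, 1501, 0.507),
    "Zstd (level 1)": (496, 1510, 0.379),
    "Zstd (level 9)": (139, 1476, 0.361),
}


def pct_change(value, baseline):
    """Percentage change relative to the baseline, rounded to a whole percent."""
    return round(100 * (value - baseline) / baseline)


for name, (ins, rd, ratio) in RESULTS.items():
    print(f"{name:15} insert {pct_change(ins, BASELINE['insert']):+d}%  "
          f"read {pct_change(rd, BASELINE['read']):+d}%  "
          f"1 TB of rows -> {ratio:.3f} TB stored ({100 * (1 - ratio):.0f}% saved)")
```

The computed values match the table: moving from Zstd level 1 to level 9 saves under two more percentage points of storage while cutting insert throughput by roughly two thirds.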
Because the compression setting can be switched dynamically, we recommend trialing each compressor against your production data. This will let you determine both the CPU impact and the storage savings, since different input data may yield very different compression ratios.
To compress all data on a server, not just data as it is being written, you will need to perform a maintenance cycle: remove each server from the cluster, clear its storage, and bring it back into the cluster empty. As data migrates into the server, it is written with the configured compression mechanism. Once data has been written compressed, those data volumes cannot be read by prior versions of Aerospike, so be sure to use Aerospike’s backup tools appropriately if you need to downgrade.
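For the recommended trial runs, a small offline harness can give a first feel for the ratio-versus-CPU tradeoff on a sample of your rows before you change a production namespace. This is a hedged sketch, not an Aerospike tool: the `trial` helper and the sample rows are hypothetical, and the standard library’s zlib, bz2, and lzma again stand in for Snappy, LZ4, and Zstandard, so the absolute numbers will not match Aerospike’s.

```python
import bz2
import lzma
import time
import zlib

# Stand-in compressors from the standard library (hypothetical choices).
COMPRESSORS = {
    "zlib-1": lambda b: zlib.compress(b, 1),
    "zlib-9": lambda b: zlib.compress(b, 9),
    "bz2": bz2.compress,
    "lzma-1": lambda b: lzma.compress(b, preset=1),
}


def trial(records):
    """Return {name: (ratio, records_per_second)} for a list of byte strings."""
    original = sum(len(r) for r in records)
    results = {}
    for name, compress in COMPRESSORS.items():
        start = time.perf_counter()
        # Mirror the write path: keep the compressed form only when it is
        # smaller (the one-byte tag is ignored here).
        stored = sum(min(len(compress(r)), len(r)) for r in records)
        elapsed = time.perf_counter() - start
        results[name] = (stored / original, len(records) / elapsed)
    return results


# Hypothetical sample rows; replace with rows exported from your own data.
sample = [f'{{"user": {i}, "tags": ["a", "b", "c"], "score": {i % 7}}}'.encode() * 20
          for i in range(200)]
for name, (ratio, rate) in trial(sample).items():
    print(f"{name:7} ratio {ratio:.3f}  {rate:,.0f} records/s")
```

Measuring per-record compression, rather than compressing the whole sample at once, matters because each row is compressed independently on write, and small rows compress far less well than one large buffer.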
Aerospike has had an excellent 2018 engineering year. To read about the features added this year, please see our blog posts on 4.0, 4.1, 4.2, 4.3, and 4.4.
4.5 Release Notes