Certifying Flash Devices (SSDs)
The Aerospike database is optimized to run on Flash and SSD devices, and is capable - through Hybrid Memory index use, and direct device access - of providing high throughput and low latency on Flash.
Over the years, we have found that the exact characteristics of different Flash devices are of crucial interest to Aerospike customers. The most important device characteristic is not captured by any specification sheet: read latency under sustained write load.
Of course, Aerospike can be run in a variety of other configurations, including pure in-memory mode or memory backed with devices for persistence.
For this purpose, we wrote the Aerospike Certification Tool (ACT), and started gathering and publishing results. The source code is available on GitHub, allowing device manufacturers to include ACT in their internal engineering. Both the justification and source code remain unchanged.
This page presents the results we have observed and collected. We update it with new drives and archive information for drives that are no longer available. These results give you a specific answer to the question of "whether this drive will work in my environment", although your environment might differ from the data points we have collected.
A profile has three fundamental characteristics:
- Read / write ratio
- Object size
- Latency requirements of read operations (detailed below)
The test is run by increasing throughput and checking whether the latency requirement is met. As long as the device stays within the latency requirement, throughput is increased again, until the latency SLA is no longer met.
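The ramp described above can be sketched as a simple loop. This is an illustration only, not ACT itself: `run_at` and `passes` are hypothetical callables standing in for a full test run at a fixed rate and for the profile's latency check, and the step size is an assumption.

```python
from typing import Callable, Optional, TypeVar

M = TypeVar("M")  # whatever a single test run returns (hypothetical)


def highest_passing_rate(
    run_at: Callable[[int], M],
    passes: Callable[[M], bool],
    start_tps: int,
    step_tps: int,
    max_tps: int,
) -> Optional[int]:
    """Raise the rate step by step; return the last rate that met the SLA."""
    best = None
    tps = start_tps
    while tps <= max_tps:
        if not passes(run_at(tps)):
            break  # latency requirement no longer met: stop ramping
        best = tps
        tps += step_tps
    return best


# Toy stand-in for a device: reads slow down sharply above 300,000 tps.
def fake_run(tps: int) -> float:
    return 4.0 if tps <= 300_000 else 9.0  # percent of reads slower than 1 ms


result = highest_passing_rate(fake_run, lambda pct: pct < 5.0, 3_000, 3_000, 600_000)
print(result)  # 300000
```

In practice each call to `run_at` would be a long-running test, so the loop is driven by hand rather than by a script.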
Be careful using these results if your intended workload varies from a published workload. We make available the tools to allow you to test your intended workload, and testing may be required in your individual case.
Finally, we have also found the results themselves are only a starting point for making a purchase decision.
Two fundamental factors must be considered: the price of the device, and the wear rating (DWPD).
A device that wears fast may not be a bargain in a high-write environment.
A slow but very inexpensive device may outperform a faster but more expensive device for the same budget, because you can purchase more drives. As individuals have different available discounts, you may need to explore pricing of different drives before making a final decision.
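One way to compare drives on price is to work out how many of each are needed to reach a target throughput. The sketch below does only that arithmetic. The prices are made-up placeholders, not quotes. The function assumes throughput adds up linearly across drives and ignores capacity, endurance, and server slot limits, all of which matter in a real sizing exercise.

```python
import math


def drives_for_target(target_tps: int, drive_tps: int) -> int:
    """Smallest number of drives whose combined rate reaches the target."""
    return math.ceil(target_tps / drive_tps)


def total_cost(target_tps: int, drive_tps: int, unit_price: float) -> float:
    """Cost of enough drives to reach the target, at a flat unit price."""
    return drives_for_target(target_tps, drive_tps) * unit_price


target = 1_200_000  # hypothetical cluster-wide requirement, in tps

# Hypothetical drives: (sustained tps, placeholder unit price)
fast_drive = (630_000, 1_000.0)
slow_drive = (300_000, 400.0)

fast_cost = total_cost(target, *fast_drive)  # needs 2 drives
slow_cost = total_cost(target, *slow_drive)  # needs 4 drives
print(fast_cost, slow_cost)
```

With these placeholder prices the slower drive is the cheaper way to reach the target. A different discount can reverse the result.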
An important calculation must be made: speed at a given capacity. It should not be assumed that a larger drive is faster. In general, more Flash capacity should result in a faster drive, but internal controller and datapath bottlenecks will present themselves. For many manufacturers, 3.2 TB drives are the "sweet spot" of performance, but we are starting to see drives of 7.0 TB and larger offer the best performance-to-price ratio. We have performance measurements for some drives in different capacities.
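Speed at a given capacity can be expressed as transactions per second per terabyte. The sketch below computes it for the three Smart IOPS DataEngine rows in the PCIe/NVMe table on this page. Those rows come from different ACT versions and sources, so treat the comparison as illustrative only.

```python
# (capacity in TB, tested speed in tps), copied from the PCIe/NVMe table
dataengine = {
    "3.2 TB": (3.2, 630_000),
    "6.4 TB": (6.4, 825_000),
    "12.8 TB": (12.8, 1_080_000),
}


def tps_per_tb(capacity_tb: float, tps: int) -> int:
    """Throughput density, rounded to the nearest whole tps per TB."""
    return round(tps / capacity_tb)


density = {name: tps_per_tb(cap, tps) for name, (cap, tps) in dataengine.items()}
for name, value in density.items():
    print(f"{name}: {value:,} tps/TB")
```

In this sample the largest drive is the fastest in absolute terms but delivers the fewest transactions per terabyte.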
While we make our raw data and findings available, Aerospike's solutions architects are available to help you size your system. With our years of experience helping deploy clusters both large and small, we'll help you through the process of choosing your next set of hardware.
Guide to these numbers
Aerospike accepts manufacturer-supplied numbers. In those cases, we receive log files from the manufacturer and do our best to validate them. Where values come from a manufacturer, we detail the configuration information they supplied. Manufacturers often also supply drives, which we are in the process of testing. In the appendix, we detail what the manufacturer has told us about the test environment.
Aerospike updates ACT periodically to track changes in Aerospike Database storage I/O algorithms. The core case, properly configured, produces very similar results regardless of version. However, the version used is included in test results for completeness. The current version of ACT is 6.4, released in November 2023.
Values for endurance are taken from manufacturer's spec sheets and have not been independently verified.
Operational Database Devices
Aerospike is often used as an operational database for transactions and user data.
This ACT workload has the following characteristics:
- 67% reads to 33% writes ratio
- 1.5KB object size
- Max latency
- < 5.0% exceed 1 ms
- < 1.0% exceed 8 ms
- < 0.1% exceed 64 ms
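The latency criteria above amount to three strict upper bounds. A minimal sketch of the check follows. The function and dictionary names are hypothetical, and the two sample rows are copied from tables on this page.

```python
# Threshold in ms -> percentage of transactions that must stay below it,
# taken from the operational-database criteria listed above.
OPERATIONAL_LIMITS = {1: 5.0, 8: 1.0, 64: 0.1}


def passes_criteria(exceed_pct, limits=OPERATIONAL_LIMITS):
    """exceed_pct maps a threshold (ms) to the percent of transactions over it."""
    return all(exceed_pct[ms] < limit for ms, limit in limits.items())


# Sample rows copied from the result tables on this page.
intel_p4610 = {1: 4.56, 8: 0.85, 64: 0.00}    # Intel P4610 1.6 TB
micron_p400e = {1: 12.30, 8: 4.86, 64: 3.89}  # Micron P400e (200 GB)

print(passes_criteria(intel_p4610))   # True
print(passes_criteria(micron_p400e))  # False
```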
The tables below show the results from the Aerospike Certification Tool (ACT) on some popular flash devices that we have tested and that our customers are using in production with the Aerospike database. The test results are split into 7 different categories:
- PCIe/NVMe-based 3D XPoint
- PCIe/NVMe-based flash
- SATA/SAS-based flash
- M.2-based flash
- Networked storage
- Cloud-based flash
- Historical Flash Devices
To produce these results, each device is run at a constant rate for 24 hours. This removes the effect of any caching and represents the long-term (rather than burst) performance of the device. The steady-state results appear below: the latency histogram after a number of hours, once the results appear to be stable.
Each column represents a time threshold in milliseconds. The numbers beneath are the percentage of transactions that exceed that threshold. For instance, in the Historical Flash Devices table, the Intel DC s3700 had 1.60% of transactions in excess of 1 millisecond.
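To make the meaning of each column concrete, the sketch below derives the same kind of percentages from a list of raw latencies. It is illustrative only: the sample data is invented, and ACT produces its own histograms rather than using this function.

```python
def percent_exceeding(latencies_ms, thresholds_ms=(1, 8, 64)):
    """Return {threshold: percent of samples strictly above that threshold}."""
    total = len(latencies_ms)
    if total == 0:
        raise ValueError("no latency samples")
    return {
        t: 100.0 * sum(1 for x in latencies_ms if x > t) / total
        for t in thresholds_ms
    }


# Invented sample: 95 fast reads, 4 reads at 2 ms, 1 read at 10 ms.
samples = [0.2] * 95 + [2.0] * 4 + [10.0]
columns = percent_exceeding(samples)
print(columns)  # {1: 5.0, 8: 1.0, 64: 0.0}
```

Each transaction is counted in every column whose threshold it exceeds, so the percentages never increase from left to right.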
These criteria may be relaxed by users depending on the need. In some cases, a manufacturer may have sacrificed a little performance to achieve greater consistency.
Your results may vary from these published numbers due to differences in the server, differences in the RAID controller, or even variations between devices of the same model.
PCIe/NVMe-based 3D XPoint
3D XPoint is a next-generation memory technology that is positioned between DRAM and flash. Practical devices have been brought to market by Intel under the Optane brand. These devices should be considered when low latency under sustained write load is required, as average latency is substantially lower under write load.
While transactions per second are high, the extraordinarily low latency stands out and can easily be compared with the NAND Flash drives below.
These devices were tested at the specified speed with a 67% read/33% write ratio of 1.5 KB objects over 24 hours.
3D XPoint NVMe Device | PCIe | Trans/sec | >256µs | >512µs | >1ms | >8ms | >64ms | Endurance | ACT | Source |
---|---|---|---|---|---|---|---|---|---|---|
Intel P5800X 1.6 TB | 4.0 | 2,010,000 | 11.11% | 0.29% | 0.00% | 0.00% | 0.00% | 100 DWPD | 6.1 | Intel |
Micron X100 750 GB | 3.0 | 1,400,000 | | | 0.00% | 0.00% | 0.00% | 100 DWPD | 6.1 | Micron |
Intel SSD DC P4800X 375G | 3.0 | 435,000 | | | 0.10% | 0.01% | 0.00% | 30 DWPD | 3.1 | Aerospike |
Intel SSD DC P4800X 750G | 3.0 | 435,000 | | | 0.06% | 0.00% | 0.00% | 30 DWPD | 4 | Aerospike |
PCIe/NVMe-Based Flash
These devices were tested at the specified speed with a 67% read/33% write ratio of 1.5 KB objects over 24 hours.
Flash Device | Speed (tps) | >1ms | >8ms | >64ms | Endurance | Notes* | ACT | Source |
---|---|---|---|---|---|---|---|---|
Smart IOPS Data Engine 12.8TB | 1,080,000 | 3.94% | 0.00% | 0.00% | 3 DWPD | | 6.1 | Aerospike |
Smart IOPS DataEngine 6.4 TB | 825,000 | 0.30% | 0.00% | 0.00% | 3 DWPD | | 3 | Smart IOPS |
ScaleFlux CSD 3000 7.68 TB | 801,000 | 3.29% | 0.03% | 0.00% | 2 DWPD | CP=40 | 6.2 | ScaleFlux |
Micron 9400 Max 12.8 TB | 750,000 | 4.30% | 0.00% | 0.000% | 3 DWPD | | 6.2 | Micron |
Smart IOPS DataEngine 3.2 TB | 630,000 | 1.50% | 0.01% | 0.00% | 3 DWPD | | 6.2 | Smart IOPS |
WD Ultrastar DC SN840 3.2 TB | 570,000 | 2.29% | 0.00% | 0.00% | 3 DWPD | | 6.1 | Western Digital |
ScaleFlux CSD 2000 3.2 TB | 531,000 | 4.82% | 0.07% | 0.00% | 5 DWPD | CP=50 | 5.3 | Aerospike |
Micron 9300 Max 6.4 TB | 525,000 | 4.45% | 0.06% | 0.005% | 3 DWPD | | 5.2 | Micron |
WD Ultrastar DC SN655 15.36 TB | 516,000 | 4.96% | 0.00% | 0.00% | 1 DWPD | | 6.4 | Western Digital |
Intel P5510 7.68 TB | 480,000 | 4.65% | 0.02% | 0.00% | 1 DWPD | OP=10 | 6.2 | Aerospike |
Kioxia CM6-v 3.2 TB | 480,000 | 4.94% | 0.00% | 0.00% | 3 DWPD | | 6.1 | Aerospike |
WD SN650 15.36 TB | 465,000 | 4.48% | 0.00% | 0.00% | 1 DWPD | | 6.2 | Western Digital |
Kioxia CM6-v 1.6 TB | 420,000 | 5.00% | 0.00% | 0.00% | 3 DWPD | | 6.1 | Aerospike |
Micron 9300 Max 3.2 TB | 354,000 | 4.86% | 0.19% | 0.000% | 3 DWPD | | 5.2 | Aerospike |
Huawei ES3600P V5 3.2 TB | 384,000 | 2.72% | 0.02% | 0.00% | 3 DWPD | | 4 | Aerospike |
WD Ultrastar DC SN200 3.2 TB | 336,000 | 3.14% | 0.00% | 0.00% | 3 DWPD | | 5.2 | Aerospike |
ScaleFlux CSS1000 6.4 TB | 324,000 | 4.12% | 0.00% | 0.00% | 5 DWPD | | 3 | Aerospike |
Micron 9200 Max 1.6 TB | 316,500 | 4.64% | 0.13% | 0.00% | 3 DWPD | | 3 | Aerospike |
Intel P4610 1.6 TB | 300,000 | 4.56% | 0.85% | 0.00% | 3 DWPD | | 4 | Aerospike |
Intel P4610 6.4 TB | 300,000 | 4.98% | 0.14% | 0.00% | 3 DWPD | | 4 | Aerospike |
ScaleFlux CSS1000 3.2 TB | 300,000 | 4.59% | 0.00% | 0.00% | 5 DWPD | | 4 | ScaleFlux |
Micron 9200 Pro 3.84 TB | 283,500 | 4.54% | 0.13% | 0.00% | 1 DWPD | | 3 | Micron |
WD Ultrastar DC SN640 3.2 TB | 240,000 | 4.77% | 0.00% | 0.00% | 2 DWPD | | 6.1 | WD |
Huawei ES3600P V5 1.6 TB | 180,000 | 4.47% | 0.25% | 0.00% | 3 DWPD | | 4 | Aerospike |
Toshiba CM5 3.2 TB | 150,000 | 3.70% | 0.00% | 0.00% | 3 DWPD | | 3 | Toshiba |
Toshiba PX04PMB320 3.2 TB | 135,500 | 4.73% | 0.00% | 0.00% | 10 DWPD | | 3 | Toshiba |
Intel P4510 1 TB | 120,000 | 4.31% | 0.00% | 0.00% | 1 DWPD | | 4 | Aerospike |
Samsung PM983 1.92 TB | 108,000 | 0.09% | 0.00% | 0.00% | 1.3 DWPD | | 3.1 | Aerospike |
* OP=n means the SSD was over-provisioned by n%.
** CP=n means the ACT compress-pct parameter was set to n%, which causes record data to be compressible (by default record data is incompressible). Some SSDs can compress/decompress data on-the-fly as it is being written/read.
SATA/SAS-Based Flash
The flash device sizes listed below are after any over-provisioning that was done to the drive (by setting Host Protected Area using hdparm).
These devices were tested at the specified speed with a 67% read/33% write ratio of 1.5 KB objects over 24 hours. Any flash devices marked in red are not recommended.
Flash Device | Speed (tps) | >1ms | >8ms | >64ms | Endurance | ACT | Source |
---|---|---|---|---|---|---|---|
Micron 5300 MAX 3.8 TB | 30,000 | 4.93% | 0.02% | 0.0% | 3.5 DWPD | 5.2 | Micron |
Micron 5300 PRO 7.6 TB | 33,000 | 4.55% | 0.06% | 0.0% | 1.5 DWPD | 5.2 | Micron |
M.2-Based Flash
M.2 is a standard for SSDs that was introduced in 2013. M.2's small form factor allows for a much higher density of storage at a more moderate price.
These devices were tested at the specified speed with a 67% read/33% write ratio of 1.5 KB objects over 24 hours.
Flash Device | Speed (tps) | >1ms | >8ms | >64ms |
---|---|---|---|---|
LiteOn EP1-KB480 (OP to 400 GB) | 9,000 | 4.05% | 0.01% | 0.00% |
Networked Storage
Storage technology is continually evolving to provide innovative solutions for the demands of databases like Aerospike. Networked (or disaggregated) storage uses network protocols instead of dedicated I/O channels to transport data between the host and storage device. Modern protocols such as NVMe and fast networks like 100 Gbit Ethernet can provide throughput and latency comparable to locally attached devices.
This section is dedicated to providing ACT results for networked storage solutions. Depending on the use case and SLA requirements these solutions may provide the right performance, efficiency, and cost for your deployment.
Storage Solution | Speed (tps) | >1ms | >8ms | >64ms | Device | Protocol | ACT | Source |
---|---|---|---|---|---|---|---|---|
Western Digital | 1,158,000 | 4.49% | 0.00% | 0.00% | OpenFlex F-Series | NVMe-oF | 5.3 | Western Digital |
Drivescale Composer | 300,000 | 3.85% | 0.01% | 0.00% | HGST Ultrastar SN200 3.2 TB | iSCSI | 4 | Drivescale |
Drivescale Composer | 300,000 | 2.60% | 0.00% | 0.00% | HGST Ultrastar SN200 3.2 TB | RoCEv2 | 4 | Drivescale |
Toshiba Kumoscale | 150,000 | 3.97% | 0.01% | 0.00% | Toshiba CM5 3.2 TB | RoCEv2 | 3.1 | Toshiba |
Cloud-Based Flash
Cloud instances with attached NVMe flash devices have become common. This section includes a representative sample of ACT results performed by Aerospike on the infrastructure of major cloud providers. Although the methodology is the same, the results need to be interpreted differently. Cloud instances are subject to additional sources of variability not found in a lab setting with fixed hardware. These include "noisy neighbors" (unrelated instances running on the same physical server) and network congestion (again involving unrelated traffic).
Another source of variability across ACT runs is that the hardware may be slightly different across instances in the same class, and NVMe devices in particular may be newer or older, have different firmware revisions, or possibly be different models.
At a minimum, ACT results should be interpreted more conservatively: results from a given run may be better or worse than the long-term average, and that average may not be stationary. To the extent that time and budget allow, averaging multiple runs (possibly at different times of day or across regions) will produce a better estimate.
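A small summary over several runs makes this variability visible. The sketch below uses invented throughput figures for three runs of the same instance type. Planning around the lowest run is one conservative choice and is an assumption of this example, not a recommendation taken from the tests.

```python
from statistics import mean, stdev


def summarize_runs(tps_per_run):
    """Mean, sample standard deviation, and range of passing rates across runs."""
    spread = stdev(tps_per_run) if len(tps_per_run) > 1 else 0.0
    return {
        "mean": mean(tps_per_run),
        "stdev": spread,
        "min": min(tps_per_run),
        "max": max(tps_per_run),
    }


# Invented results: highest passing rate (tps) from three separate runs.
runs = [150_000, 141_000, 132_000]
summary = summarize_runs(runs)
print(summary)

planning_tps = summary["min"]  # conservative figure for sizing (assumption)
```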
The results below include tests performed on both single and multiple flash devices. The Storage column shows the number of devices used by ACT, which may be less than the number available on that instance. Multiple flash devices attached to an instance are useful for measuring the linearity of throughput. The recorded times are for the best throughput.
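Linearity can be checked by comparing the multi-device rate with the single-device rate multiplied by the device count. The sketch below uses the c5d.24xlarge rows from the AWS table on this page. The function name is hypothetical, and a value of 1.0 means perfectly linear scaling.

```python
def scaling_efficiency(single_tps: int, multi_tps: int, devices: int) -> float:
    """Ratio of the measured multi-device rate to ideal linear scaling."""
    return multi_tps / (single_tps * devices)


# c5d.24xlarge rows from the AWS table: device count -> tested speed (tps)
c5d_24xlarge = {1: 141_000, 2: 282_000, 4: 564_000}

single = c5d_24xlarge[1]
efficiency = {
    n: scaling_efficiency(single, tps, n) for n, tps in c5d_24xlarge.items()
}
print(efficiency)  # {1: 1.0, 2: 1.0, 4: 1.0}
```

A value well below 1.0 would suggest a shared bottleneck, such as the controller, the bus, or the CPU, rather than the devices themselves.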
Except as noted, these devices were tested at the specified speed with a 67% read/33% write ratio of 1.5 KB objects over 24 hours.
Amazon Web Services (AWS) Instances
Instance Name | Sole Tenant | Speed (tps) | >1ms | >8ms | >64ms | ACT | Storage |
---|---|---|---|---|---|---|---|
c5d.24xlarge | N | 564,000 | 4.25% | 0.00% | 0.00% | V6.1 | 4 x 900 |
r5d.24xlarge | N | 552,000 | 3.23% | 0.00% | 0.00% | V6.1 | 4 x 900 |
m5d.24xlarge | N | 432,000 | 1.67% | 0.00% | 0.00% | V6.1 | 4 x 900 |
r5d.4xlarge | N | 354,000 | 0.38% | 0.21% | 0.00% | V6.1 | 2 x 300 |
i3en.2xlarge | N | 300,000 | 4.93% | 0.06% | 0.00% | V6.1 | 2 x 2500 |
c5d.24xlarge | N | 282,000 | 1.59% | 0.00% | 0.00% | V6.1 | 2 x 900 |
r5d.24xlarge | N | 276,000 | 1.38% | 0.00% | 0.00% | V6.1 | 2 x 900 |
m5d.24xlarge | N | 216,000 | 5.33% | 0.00% | 0.00% | V6.1 | 2 x 900 |
c5d.4xlarge | N | 207,000 | 8.36% | 0.07% | 0.00% | V6.1 | 1 x 400 |
r5d.4xlarge | N | 177,000 | 7.25% | 0.00% | 0.00% | V6.1 | 1 x 300 |
m5ad.4xlarge | N | 174,000 | 4.63% | 0.00% | 0.00% | V6.0 | 1 x 300 |
r5ad.4xlarge | N | 171,000 | 0.81% | 0.00% | 0.00% | V6.0 | 1 x 300 |
i3en.12xlarge | N | 162,000 | 3.46% | 0.00% | 0.00% | V6.1 | 1 x 7500 |
i3en.2xlarge | N | 150,000 | 2.21% | 0.03% | 0.00% | V6.1 | 1 x 2500 |
c5d.24xlarge | N | 141,000 | 4.78% | 0.00% | 0.00% | V6.1 | 1 x 900 |
r5d.24xlarge | N | 138,000 | 4.95% | 0.00% | 0.00% | V6.1 | 1 x 900 |
r5ad.24xlarge* | N | 132,000 | 4.80% | 0.00% | 0.00% | V6.0 | 1 x 900 |
m5ad.24xlarge | N | 129,000 | 4.62% | 0.00% | 0.00% | V6.0 | 1 x 900 |
c5d.18xlarge | N | 114,000 | 2.26% | 0.00% | 0.00% | V6.0 | 1 x 900 |
m5d.24xlarge | N | 108,000 | 12.47% | 0.00% | 0.00% | V6.1 | 1 x 900 |
c5d.9xlarge | N | 90,000 | 1.30% | 0.00% | 0.00% | V6.0 | 1 x 900 |
r5d.12xlarge | N | 90,000 | 0.98% | 0.00% | 0.00% | | 1 x 900 |
m5d.4xlarge | N | 57,000 | 0.22% | 0.00% | 0.00% | V6.1 | 1 x 300 |
i3.8xlarge | N | 33,000 | 4.55% | 0.00% | 0.00% | V6.1 | 1 x 1900 |
i3.2xlarge | N | 12,000 | 1.24% | 0.00% | 0.00% | V6.0 | 1 x 1900 |
i3.metal | N | 12,000 | 1.06% | 0.00% | 0.00% | V6.0 | 1 x 1900 |
* Test duration was 12 hours.
Cloud-based flash devices show great variability. We recommend that you test each new instance. The best numbers tested for a single flash device are provided above.
Google Cloud Platform (GCP) Instances**
Instance Name | Sole Tenant | Speed (tps) | >1ms | >8ms | >64ms | ACT | Storage |
---|---|---|---|---|---|---|---|
n2-standard-80 | Y | 6,408,000 | 0.62% | 0.03% | 0.00% | V6.1 | 24 x 375 |
n2-standard-80 | Y | 4,272,000 | 1.02% | 0.03% | 0.00% | V6.1 | 16 x 375 |
n2-standard-80 | Y | 2,136,000 | 3.27% | 0.03% | 0.00% | V6.1 | 8 x 375 |
n2-standard-80 | Y | 1,308,000 | 4.96% | 0.45% | 0.05% | V6.1 | 4 x 375 |
n2-standard-80 | N | 1,068,000 | 0.17% | 0.00% | 0.00% | V6.1 | 4 x 375 |
n2-standard-80 | N | 801,000 | 0.10% | 0.00% | 0.00% | V6.1 | 3 x 375 |
n2-standard-16 | N | 768,000 | 1.87% | 0.02% | 0.00% | V6.1 | 4 x 375 |
n2-standard-80 | N | 534,000 | 2.57% | 0.00% | 0.00% | V6.1 | 2 x 375 |
n2-standard-16 | N | 384,000 | 3.11% | 0.01% | 0.00% | V6.1 | 2 x 375 |
n2-standard-80 | N | 267,000 | 4.62% | 0.01% | 0.00% | V6.1 | 1 x 375 |
n2-standard-64 | N | 267,000 | 4.78% | 0.01% | 0.00% | V6.1 | 1 x 375 |
n2-standard-32 | N | 192,000 | 1.48% | 0.03% | 0.00% | V6.1 | 1 x 375 |
n2-standard-16 | N | 192,000 | 7.26% | 0.00% | 0.00% | V6.1 | 1 x 375 |
** Google local SSDs are independent of instance type. These results are 'gen2' drives which are available primarily on Intel Cascade Lake or later, with linear performance gain through 4 devices per instance.
Microsoft Azure Instances
Instance Name | Sole Tenant | Speed (tps) | >1ms | >8ms | >64ms | ACT | Storage |
---|---|---|---|---|---|---|---|
L16s v2 | N | 15,000 | 3.37% | 0.10% | 0.00% | V5.0 | 1 x 160 |
Historical Flash Devices
The devices listed below are older (deemed "historical"). Therefore, we don't recommend they be used.
SATA/SAS-Based Flash: The flash device sizes listed in the table below are after any overprovisioning that was done to the drive (by setting Host Protected Area using hdparm). These devices were tested at the specified speed with a 67% read/33% write ratio of 1.5 KB objects over 24 hours.
Flash Device | Speed (tps) | >1ms | >8ms | >64ms |
---|---|---|---|---|
Intel DC s3700 + OP (318GB) | 18,000 | 1.60% | 0.00% | 0.00% |
Samsung 843T + OP (370GB) | 9,000 | 2.31% | 0.00% | 0.00% |
Micron M500DC 480 GB + OP (300GB) | 9,000 | 2.95% | 0.20% | 0.02% |
Seagate 600 Pro + OP (240GB) | 9,000 | 5.52% | 0.00% | 0.00% |
Intel DC s3500 + OP (240GB) | 9,000 | 8.12% | 0.00% | 0.00% |
Micron P400e (200 GB) | 9,000 | 12.30% | 4.86% | 3.89% |
PCIe/NVMe-Based Flash: The performance numbers below are the highest levels each drive passed. None of these drives were over-provisioned. These devices were tested at the specified speed with a 67% read/33% write ratio of 1.5 KB objects over 24 hours.
Flash Device | Speed (tps) | >1ms | >8ms | >64ms |
---|---|---|---|---|
Micron P320h 700GB | 450,000 | 3.32% | 0.04% | 0.00% |
Intel P3700 400 GB * | 210,000 | 2.20% | 0.18% | 0.00% |
Samsung SM1715 3.2 TB | 192,000 | 3.68% | 0.00% | 0.00% |
Micron P420m 1400 GB | 96,000 | 3.21% | 0.00% | 0.00% |
Intel P3608 | 84,000 | 4.37% | 0.00% | 0.00% |
Huawei es3000 2400 GB | 72,000 | 1.18% | 0.00% | 0.00% |
* These tests were run on a Dell R720 with dual E5-2690v2 @ 3GHz using RedHat 6.5 kernel 2.6.0.
Cloud-Based Flash: The table below shows the performance of devices on cloud instances that might no longer be available for provisioning, or that now have better alternatives.
Cloud Provider | Instance Name | Speed (tps) | >1ms | >8ms | >64ms |
---|---|---|---|---|---|
Azure | L8s* | 30,000 | 1.26% | 0.05% | 0.00% |
Azure | GS2 | 30,000 | 1.30% | 0.21% | 0.13% |
Amazon | r3.4xlarge | 18,000 | 3.74% | 0.01% | 0.00% |
Amazon | m3.xlarge (HVM) | 18,000 | 2.49% | 0.00% | 0.00% |
Amazon | r3.2xlarge | 15,000 | 4.71% | 0.10% | 0.02% |
Amazon | r3.xlarge | 15,000 | 4.90% | 0.45% | 0.35% |
Amazon | r3.large | 12,000 | 4.12% | 0.24% | 0.00% |
Amazon | c3.2xlarge | 9,000 | 2.10% | 0.10% | 0.01% |
Amazon | m3.xlarge | 9,000 | 4.14% | 0.23% | 0.01% |
Rackspace | Performance 1 | 9,000 | 1.14% | 0.00% | 0.00% |
* Azure Ls performance is for the single drive on the instance. The Azure Lsv2 instances have multiple drives equivalent to 5x rating each, so in aggregate they perform much better than the first generation Ls instances.