As a vendor with customers and a community who trust us and believe in us, we feel it is our duty to test the quality and performance of the software we deliver. We test speed and scale together because we believe speed without scale is a dead end for successful applications, and scale without speed subjects users to unchecked growth in server acquisition costs, operating costs and IT ops risk, and, worst of all, to potential tech stack redesigns.
Benchmarks must be transparent. They should clearly describe the hardware and configuration used for the systems being tested. They should honestly try to tune a competitor’s system to give it a fair chance. They should be reproducible by a third party. They need to cover realistic use cases and avoid narrow tests that artificially constrain the workload or inflate results with unlikely scenarios. Honest benchmarks must separate science from marketing. Far too often, vendors “game” benchmarks in ways that deliberately or accidentally mislead development teams.
To this end, Aerospike will conduct periodic performance benchmarks of our technology and will publicly release both the results and the code needed to reproduce them. Any technologist should be able to replicate our tests, critique our methodology, and discuss the results.
Aerospike will publish not only our own benchmarks but also those conducted by independent third parties. We will attempt to duplicate benchmarks published by other database vendors so development teams can make intelligent decisions about speed and scale.
A good benchmark should include (a minimal workload sketch follows this list):
- A large dataset (> 10 TB)
- Large object counts (> 1 billion)
- Realistic object sizes
- Object variation (a distribution of sizes)
- Latency and variation under load
- 48-hour tests
- Multi-node clusters
- Node failure / consistency results
- Scale out by adding nodes
- Replication and persistence
- Mixed read and write workloads
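To make these criteria concrete, here is a minimal sketch of how such a workload might be specified before a run. It is an illustration only, not Aerospike's benchmark tooling; the parameter names, the default values and the lognormal size distribution are assumptions chosen to satisfy the checklist above.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class WorkloadSpec:
    """Target parameters for a run that meets the checklist above (illustrative)."""
    record_count: int = 1_500_000_000   # > 1 billion objects
    median_object_bytes: int = 8_000    # realistic object size; aggregate data well over 10 TB
    size_sigma: float = 0.5             # lognormal spread -> a distribution of object sizes
    read_fraction: float = 0.8          # mixed read/write load (80/20 here)
    duration_hours: int = 48            # long enough to expose drift, defragmentation, compaction
    replication_factor: int = 2         # replicated, persisted data
    node_count: int = 5                 # multi-node cluster

    def object_size(self, rng: random.Random) -> int:
        """Draw one object size from a lognormal distribution around the median."""
        return max(64, int(rng.lognormvariate(math.log(self.median_object_bytes),
                                               self.size_sigma)))

    def next_op(self, rng: random.Random) -> str:
        """Pick the next operation so reads and writes stay interleaved."""
        return "read" if rng.random() < self.read_fraction else "write"
```

A client driver would draw object sizes and operations from a spec like this for the full 48-hour window, rather than front-loading the writes and then measuring reads alone.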
A fair, transparent and representative benchmark should measure performance first, then scale-out, then failure modes. It must include full, reproducible code and configuration that, when run, produce the published benchmark numbers.
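A sketch of that ordering follows. The `measure` helper and the `cluster` handle with `add_node()` and `kill_node()` are hypothetical stand-ins for whatever harness and orchestration a real run uses; the point is the phase sequence and the habit of publishing the exact configuration alongside the numbers.

```python
import json
from dataclasses import asdict


def run_benchmark(spec, cluster, measure):
    """Illustrative phase ordering: performance first, then scale-out, then failure modes."""
    results = {"spec": asdict(spec), "phases": {}}

    # Phase 1: steady-state performance under the mixed read/write load.
    results["phases"]["performance"] = measure(spec, cluster, hours=spec.duration_hours)

    # Phase 2: scale out by adding a node and confirm throughput grows with the cluster.
    cluster.add_node()
    results["phases"]["scale_out"] = measure(spec, cluster, hours=1)

    # Phase 3: failure modes -- stop a node and record latency and consistency
    # behaviour while the cluster rebalances.
    cluster.kill_node()
    results["phases"]["node_failure"] = measure(spec, cluster, hours=1)

    # Publish the configuration with the numbers so a third party can replay the run.
    with open("benchmark-results.json", "w") as out:
        json.dump(results, out, indent=2)

    return results
```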
A fair and transparent benchmark should also avoid designs that skew or obscure performance results. Techniques that should be avoided entirely, or explicitly called out when used, include (a sanity-check sketch follows this list):
- Short duration tests
- Small, predictable datasets held entirely in DRAM or, worse, fully in the L2-L3 cache
- Non-replicated datasets
- Lack of mixed read / write loads
- Single node tests
- Narrow, unique-feature benchmarks
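One way to keep these anti-patterns out of published numbers is to refuse to report a run that exhibits them. The check below is a sketch written against the hypothetical `WorkloadSpec` fields introduced earlier; the DRAM comparison and thresholds are assumptions, not a complete audit.

```python
def unfair_benchmark_reasons(spec, node_dram_bytes: int) -> list[str]:
    """Return the reasons a run should not be published as a fair benchmark."""
    reasons = []
    approx_dataset_bytes = spec.record_count * spec.median_object_bytes
    if spec.duration_hours < 48:
        reasons.append("short-duration test")
    if approx_dataset_bytes <= spec.node_count * node_dram_bytes:
        reasons.append("dataset small enough to live entirely in DRAM")
    if spec.replication_factor < 2:
        reasons.append("non-replicated dataset")
    if not 0.0 < spec.read_fraction < 1.0:
        reasons.append("no mixed read/write load")
    if spec.node_count < 2:
        reasons.append("single-node test")
    return reasons
```

An empty list does not prove a benchmark is fair, but any entry in it is a reason to either fix the run or call the limitation out explicitly in the published results.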
We look forward to continuing this dialogue. Read our benchmark section to view our latest results.