Aerospike recently published a manifesto on benchmarking in the NoSQL space. Database benchmarks are notoriously slanted, yet NoSQL databases arguably must deliver both speed and scale, and while no benchmark can be perfect, anyone reading one should be aware of its shortcomings.
Lynn Langit, independent Cloud Architect, Google Cloud Developer Expert, AWS Community Hero, and Microsoft MVP, recently published “Speed with Ease – NoSQL on the Google Cloud Platform”, a discussion of Aerospike’s performance on the Google Cloud Platform. Her post is part of a longer thread of posts about NoSQL performance on Google Compute Engine, including this post from Ivan Filho, Performance Engineering Lead at Google.
This article will review the “Speed with Ease – NoSQL on the Google Cloud Platform” blog post, using the criteria for a “good database benchmark” set forth in Aerospike’s manifesto. Of note is that the “Speed with Ease” blog post focuses on enabling anyone to run their own benchmarks, and doesn’t claim to be a benchmark itself.
The Manifesto’s Criteria
According to Aerospike’s database manifesto, a good benchmark should include the following:
- A large dataset (1-10 TB)
- Object count from 100 million to 1 billion
- Object size from 8 bytes to 10K
- Object variation – a distribution of sizes
- Latency and variation under load
- Long-term tests, such as 48-hour runs
- Node failure / consistency results
- Replication and persistence
- Mixed reads and writes
- Scale out by adding nodes
How Did “Speed with Ease” Do?
Dataset size
According to Ms. Langit’s post, speed was measured with 50 GB of data across 20 nodes. The data is held in RAM only, and the test can easily be changed to include more data, but the full capacity of the cluster (350 GB) isn’t used. While hers is a common approach to reducing the time required to run the benchmark, and 50 GB is much larger than the 1 GB we have seen with other systems, it is still a far cry from even 1 TB. The headline numbers could have been improved by loading more data.
Object count, size, and variation
The test uses 1 million keys with a fixed object size of 50 bytes. 50 bytes is a reasonable size that stresses transaction-processing speed without simply measuring network throughput. Ms. Langit makes it fairly clear how readers can run the benchmark themselves with their own object sizes; still, publishing some numbers at different object sizes would be an improvement.
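As a rough illustration of the manifesto’s “object variation” criterion, the sketch below writes records whose payload sizes are drawn from a distribution. It uses the Aerospike Python client purely for illustration rather than the benchmark tool from the post, and the host address, namespace, sizes, and weights are all assumptions, not values taken from Ms. Langit’s setup.

```python
# Sketch: load records whose sizes follow a distribution (8 B to 10 KB).
# Host, namespace/set names, sizes, and weights are illustrative assumptions.
import os
import random
import aerospike

SIZES = [8, 128, 1024, 10 * 1024]   # candidate payload sizes in bytes
WEIGHTS = [0.4, 0.3, 0.2, 0.1]      # skew the mix toward small objects

client = aerospike.client({'hosts': [('127.0.0.1', 3000)]}).connect()

for i in range(100_000):
    size = random.choices(SIZES, weights=WEIGHTS, k=1)[0]
    # Key tuple is (namespace, set, primary key); the payload is stored as a blob.
    client.put(('test', 'bench', i), {'payload': bytearray(os.urandom(size))})

client.close()
```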
Latency testing
There is no statement about the latency of requests while running the benchmark framework.
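To show the kind of measurement that would fill this gap, here is a minimal sketch that samples per-request read latency and prints simple percentiles. It assumes the Aerospike Python client, a local cluster address, and keys loaded by an earlier run; none of these details come from the blog post.

```python
# Sketch: sample read latency for previously loaded keys and report percentiles.
# Host, namespace/set names, and key range are illustrative assumptions.
import time
import aerospike

client = aerospike.client({'hosts': [('127.0.0.1', 3000)]}).connect()

samples_ms = []
for i in range(10_000):
    start = time.perf_counter()
    client.get(('test', 'bench', i))          # key written by an earlier load
    samples_ms.append((time.perf_counter() - start) * 1000.0)

samples_ms.sort()
for pct in (50, 95, 99):
    idx = min(len(samples_ms) - 1, int(len(samples_ms) * pct / 100))
    print(f"p{pct}: {samples_ms[idx]:.2f} ms")

client.close()
```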
Multi-day testing
Ms. Langit’s benchmark run appears to be momentary: after the data is written, read speed is observed for only a few moments.
Node failure and consistency results
There is no testing of node faults or data correctness.
Replication and persistence
The test uses a replication factor of 2, Aerospike’s common deployment pattern for high availability. However, the configuration does not appear to cover persistence. While that is acceptable for a RAM cache, a stated configuration that includes persistence would be an improvement.
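For readers who want a persistent setup, a namespace stanza along the following lines keeps replication factor 2 and adds a file-backed storage engine. The file path and sizes are placeholders rather than values from the post, and the exact directives may vary by Aerospike version.

```
namespace test {
    replication-factor 2
    memory-size 16G
    storage-engine device {
        file /opt/aerospike/data/test.dat
        filesize 64G
        data-in-memory true    # serve from RAM, persist writes to the file
    }
}
```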
Mixed reads and writes
The test runs a 100% read workload and a 100% write workload as two separate runs. This is a sound methodology, since it allows readers to interpolate the results, and the method of changing the benchmark run to cover a particular read/write mix is well explained.
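For readers who would rather measure a specific mix directly than interpolate, a small driver like the one below issues reads and writes in a chosen ratio. It is a sketch using the Aerospike Python client, not the benchmark tool from the post; the 80/20 split, host, and key range (assumed to be loaded already) are illustrative assumptions.

```python
# Sketch: drive a mixed workload at a chosen read/write ratio.
# The 80/20 split, host, namespace/set names, and key range are assumptions.
import os
import random
import aerospike

READ_FRACTION = 0.8   # 80% reads, 20% writes
KEY_COUNT = 100_000   # keys assumed to exist from a prior load

client = aerospike.client({'hosts': [('127.0.0.1', 3000)]}).connect()

for _ in range(50_000):
    key = ('test', 'bench', random.randrange(KEY_COUNT))
    if random.random() < READ_FRACTION:
        client.get(key)
    else:
        client.put(key, {'payload': bytearray(os.urandom(50))})

client.close()
```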
Scale out by adding nodes
While the test does use a reasonable number of nodes (20), there are no tests that add or remove nodes. With a 20-node test recipe, trying different cluster sizes, from 2 to 20 to 200 nodes, could easily be achieved.
Final Comments
This article has attempted to review the tooling for running a database benchmark. In her “Speed with Ease – NoSQL on the Google Cloud Platform” blog post, Ms. Langit, rather than focusing on numbers, results, or a broad range of cases, aims to get readers up and running with their own benchmark quickly, within minutes. She includes clear instructions for different object sizes and read/write ratios, and running the test for longer periods is also easy to achieve. All in all, that’s pretty remarkable.
Yet in a few key areas, such as persistence and data model variation, the blog post falls short. It would benefit from instructions on how to set up the tests with persistence and with different data models. Moreover, while the tool used does support adding and removing nodes and visualizing recovery time, it does not expose those capabilities through scripting. Lastly, Aerospike’s own benchmark tools could be improved with regard to data validation.
If you’d like to discuss benchmarking or have any questions or comments, feel free to post on our community forum.