HPE: AI at Hyperscale – How to go faster with a smaller footprint
-Theresa Melvin, Chief Architect of AI-driven Big Data Solutions, HPE
My name is Theresa Melvin. I work for HPE. I’m the AI-driven Big Data Solutions Architect, and I run the open-source solutions R&D lab out of Fort Collins, Colorado.
In order to meet the requirements, I have to be able to persist millions of events per second; that’s millions of I/Os per second. The only solution that can get close with today’s hardware and software is Aerospike running on Intel persistent memory (pmem) DIMMs, and that is about 280,000 sustained read and write operations per second – which is about 2,000 percent more than anything else out there.
The designs that I put together have to be able to write as fast as they read. A lot of times I have a 1-to-10 write-to-read ratio: for every terabyte that is inserted, I have to read out 10 terabytes. So, that requires a very special type of NoSQL database, and unfortunately every single database that I have tested over 20 months failed in that regard, with the exception of Aerospike.
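To make the 1-to-10 write-to-read ratio concrete, here is a minimal arithmetic sketch in Python. The function names and the one-million-events-per-second figure are illustrative assumptions, not numbers or code from the HPE lab.

```python
def required_read_rate(write_rate: float, reads_per_write: float = 10.0) -> float:
    """Read throughput needed to keep up with a given write throughput."""
    return write_rate * reads_per_write


def total_io_rate(write_rate: float, reads_per_write: float = 10.0) -> float:
    """Combined read + write throughput the database has to sustain."""
    return write_rate + required_read_rate(write_rate, reads_per_write)


# Volume view: 1 TB inserted means 10 TB read back out.
print(required_read_rate(1.0))            # 10.0 (TB)

# Rate view, using a hypothetical ingest of 1,000,000 events per second.
print(required_read_rate(1_000_000))      # 10000000.0 reads/s
print(total_io_rate(1_000_000))           # 11000000.0 operations/s overall
```

The point of the sketch is that the read side dominates: with this ratio, the database serves ten reads for every write it absorbs, so read throughput cannot lag behind write throughput.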
With Aerospike’s Spark connector, I have the ability to read the data out of Aerospike just as fast as it was inserted. This allows me to do near-instantaneous machine learning on the data as it lands.
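As a rough illustration of reading an Aerospike set into Spark, the sketch below builds the reader options and hands them to a Spark session. It is a sketch under assumptions: the option keys (`aerospike.seedhost`, `aerospike.namespace`, `aerospike.set`) and the `"aerospike"` format name are written as I understand the connector's configuration and should be checked against the connector version in use; the host, namespace, and set names are hypothetical, and the connector JAR is assumed to already be on the Spark classpath.

```python
def aerospike_read_options(seed_host: str, namespace: str, set_name: str) -> dict:
    """Build reader options for the Aerospike Spark connector.

    The option keys are assumptions based on the connector's documented
    configuration; verify them against the version you deploy.
    """
    return {
        "aerospike.seedhost": seed_host,
        "aerospike.namespace": namespace,
        "aerospike.set": set_name,
    }


def load_aerospike_set(spark, seed_host: str, namespace: str, set_name: str):
    """Load one Aerospike set as a Spark DataFrame.

    `spark` is an existing SparkSession created with the Aerospike
    connector available (assumed, not shown here).
    """
    options = aerospike_read_options(seed_host, namespace, set_name)
    return spark.read.format("aerospike").options(**options).load()


# Hypothetical values, for illustration only.
print(aerospike_read_options("aerospike-node-1:3000", "telemetry", "events"))
```

With a live cluster, `load_aerospike_set(spark, ...)` would return a DataFrame that can be passed straight to a machine learning pipeline as the data lands.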
For the development that I do, I have to have one single stack that is deployed everywhere: from the developers’ laptops, to the end device, which for automotive is the car, to the edge – any edge appliances that do additional parsing and coalescing of the data – to the core datacenters, and then the cloud, which is typically user portals, so that’s where the users are going to connect. So, in order to meet all five of those individual layers, I have to be able to containerize the solution, and it has to fit seamlessly into a CI/CD pipeline. That significantly narrows the scope of the products that I can work with. They have to be able to deploy in every single one of those situations, and they have to work the exact same way with no additional augmentation.
So, because Aerospike runs as an official Docker container, I can put it directly into a Kubernetes dataflow pipeline, and with the click of a button I can move that single stack from core – which is running on all the enterprise hardware – all the way out to the edge, into the car, and into the cloud, without having to change anything at all.
The combination of Aerospike and Intel persistent memory is incredibly fast, and it is equally fast when I have to take those Aerospike database nodes down for maintenance, because it allows warm database restarts. That means there’s really no downtime for these giant database nodes anymore.
When it comes to building real-time persistence solutions that are capable of performing 10 reads for every single write, there is nothing else out there other than Aerospike that can currently meet that objective.