Reducing time to insight by leveraging the massive parallelism of Spark and Aerospike
A real-time transactional-analytical system needs to combine transactional and streaming data in a single high-performance database that can keep pace with inbound data streams. It also needs to work with a variety of analytics frameworks, including machine learning and artificial intelligence. Aerospike Connect for Spark addresses these requirements by combining streaming data with historical data for enhanced real-time decisioning and insights.
Why Aerospike Connect for Spark
Aerospike Connect for Spark enables the creation of data-intensive applications such as AI/ML, ETL, and more with familiar, easy-to-use Spark tools.
Drastically reduce time to insight by combining massively parallel computation in Spark with the massively parallel reads from Aerospike.
Save time developing analytics and AI/ML applications that use data in Aerospike by working in a Spark-supported language of your choice with the rich ecosystem of libraries already available for Spark.
Gain Closed-loop Business Insights
Gain closed-loop business insights by operating on transactional data and streaming insights back into the database using Connect for Spark.
Lower TCO by enabling analysis of massive datasets with a smaller storage cluster footprint.
Key features of Aerospike Connect for Spark
Aerospike Connect for Spark supports streaming APIs that leverage Spark Structured Streaming to provide very low latency for both reads and writes, enabling AI/ML use cases that use Aerospike as a system of engagement in their Spark streaming pipeline. Coupled with the Aerospike Database's scan-by-partition capability, predicate filtering, and the mapping of Aerospike partitions to Spark partitions, the connector allows massive parallelization.
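As a minimal sketch of such a partition-parallel read, the Python helpers below assemble the connector's read options and apply a filter that can be pushed down to the database. The option key names (`aerospike.seedhost`, `aerospike.namespace`, `aerospike.set`, `aerospike.partition.factor`) follow the connector's documented configuration and should be verified against your connector release; the host, set, and `score` bin are placeholders.

```python
def parallel_scan_options(seed_host, namespace, set_name, partition_factor=8):
    """Build read options for a partition-parallel scan through the connector.

    Assumption: 'aerospike.partition.factor' is an exponent, so a value of 8
    yields 2**8 = 256 Spark partitions mapped onto Aerospike's partitions.
    """
    return {
        "aerospike.seedhost": seed_host,            # any node in the cluster
        "aerospike.namespace": namespace,           # namespace to scan
        "aerospike.set": set_name,                  # set within the namespace
        "aerospike.partition.factor": str(partition_factor),
    }

def filtered_scan(spark, opts, min_score):
    # `spark` is an active SparkSession. The scan-by-partition read runs in
    # parallel across executors, and a simple predicate like the one below
    # can be pushed down to Aerospike as a filter expression instead of
    # being evaluated in Spark.
    df = spark.read.format("aerospike").options(**opts).load()
    return df.where(df.score >= min_score)  # 'score' is a hypothetical bin
```

With a running cluster you would call `filtered_scan(spark, parallel_scan_options("10.0.0.1", "test", "events"), 50)` and proceed with ordinary DataFrame operations.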
Supports both DataFrames and Datasets
Loads Aerospike data into Spark DataFrames and Datasets to enable further complex processing in Spark, such as ETL and AI/ML, using SparkML and other open-source libraries and frameworks that support PySpark.
Leverages Spark Structured Streaming
Leverages Spark Structured Streaming to support streaming reads (change notifications) from, and writes to, Aerospike.
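A streaming write can be sketched as below, assuming a streaming DataFrame already read from a source such as Kafka. The option keys, including `aerospike.updateByKey` naming the DataFrame column used as the record's primary key, mirror the connector's documented names but should be treated as assumptions to check for your version; the checkpoint path and `event_id` column are placeholders.

```python
def streaming_write_options(seed_host, namespace, set_name, key_column):
    """Options for a Structured Streaming sink that writes into Aerospike."""
    return {
        "aerospike.seedhost": seed_host,
        "aerospike.namespace": namespace,
        "aerospike.set": set_name,
        "aerospike.updateByKey": key_column,  # column used as the record key
    }

def stream_to_aerospike(events_df, opts, checkpoint_dir):
    # `events_df` is a streaming DataFrame; Structured Streaming writes each
    # micro-batch into Aerospike records as it arrives, and the checkpoint
    # directory lets the query recover after a failure.
    return (events_df.writeStream
            .format("aerospike")
            .options(**opts)
            .option("checkpointLocation", checkpoint_dir)
            .start())
```

A typical call would be `stream_to_aerospike(kafka_df, streaming_write_options("10.0.0.1", "test", "events", "event_id"), "/tmp/checkpoints/events")`.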
Supports mainstream languages
Supports multiple languages (Python, Java, Scala, etc.)
Supports massive parallelism
Supports massive parallelism by allowing you to use up to 32,768 Spark partitions to read data from an Aerospike namespace. Each namespace can store up to 32 billion records across 4,096 partitions.
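The partition counts above line up if the connector's partition factor is read as an exponent, which is the assumption in this small helper: 2**15 gives the 32,768-partition maximum.

```python
def spark_partition_count(partition_factor):
    """Number of Spark partitions the connector would use for a scan.

    Assumption: the 'aerospike.partition.factor' option is an exponent, so
    the connector splits the keyspace into 2**partition_factor Spark
    partitions, capped at the documented maximum of 32,768 (2**15).
    """
    if not 0 <= partition_factor <= 15:
        raise ValueError("partition factor must be between 0 and 15")
    return 2 ** partition_factor
```

For example, a factor of 8 yields 256 Spark partitions, and a factor of 15 yields the 32,768 maximum.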
Enables SQL access to Aerospike
Leverages Spark SQL (ANSI SQL 2003 standard) to allow SQL access to data stored in Aerospike.
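As an illustration of the SQL path, the sketch below registers an Aerospike-backed DataFrame as a temporary view and queries it with Spark SQL. It assumes an active SparkSession and a DataFrame already loaded through the connector; the view name and query are placeholders.

```python
def register_and_query(spark, df, view_name, query):
    """Expose an Aerospike-backed DataFrame to Spark SQL and run a query.

    `spark` is an active SparkSession and `df` a DataFrame loaded through
    the connector (any objects with the same two methods also work).
    """
    # Register the DataFrame under a name Spark SQL can resolve...
    df.createOrReplaceTempView(view_name)
    # ...then query it with standard SQL; Spark's optimizer can still push
    # simple filters in the query down to the underlying Aerospike scan.
    return spark.sql(query)
```

For example: `register_and_query(spark, df, "users", "SELECT name, age FROM users WHERE age > 21")`.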
Learn more about Connect for Spark
Conduct advanced and predictive data analytics in real time across massive amounts of multi-modal data from disparate sources.