AEROSPIKE CONNECTOR FOR SPARK

Spark connector for data-intensive applications

The Aerospike connector for Spark enables the creation of data-intensive applications such as AI/ML and ETL with advanced Apache Spark 3.0 tools.

Reducing time to insight by leveraging massive parallelism of Spark and Aerospike

A real-time transactional-analytical system needs to combine transactional and streaming data in a single high-performance data platform that can keep pace with inbound data streams. It also needs to work with various analytics frameworks, including those for machine learning and artificial intelligence. Aerospike Connect for Spark addresses these requirements by combining streaming data with historical data so organizations can act in real time.


Benefits

Why Aerospike Connect for Spark

Aerospike Connect for Spark enables the creation of data-intensive applications such as AI/ML, ETL, and more with familiar, easy-to-use Spark tools.

Real-time Analytics
Drastically reduce time to insight by combining massively parallel computation in Spark with the massively parallel reads from Aerospike.
Rapid Development
Save time developing analytics and AI/ML applications that use data in Aerospike by working in the Spark-supported language of your choice and drawing on the rich ecosystem of libraries already available for Spark.
Gain Closed-loop Business Insights
Gain closed-loop business insights by operating on transactional data and streaming insights combined in the database using Connect for Spark.
Reduce Server Footprint
Reduce server footprint by up to 80 percent while enabling analysis of massive datasets.

FEATURES

Key features of Aerospike Connect for Spark

Aerospike Connect for Spark supports streaming APIs that leverage Spark Structured Streaming to provide very low latency for both reads and writes. This enables AI/ML use cases that use Aerospike as a system of engagement in a Spark streaming pipeline. Combined with the Aerospike Database scan-by-partition capability, predicate filtering, and the mapping of Aerospike partitions to Spark partitions, the connector allows massive parallelization.

Supports both DataFrames and Datasets

Loads Aerospike data into both Spark DataFrames and Datasets to enable further complex processing in Spark, such as ETL and AI/ML using SparkML and other open-source libraries and frameworks that support PySpark.
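As a rough illustration of loading Aerospike data into a DataFrame, the Python sketch below builds a reader option map and passes it to Spark's standard DataFrameReader. The helper names are hypothetical, and the "aerospike" format name and the aerospike.seedhost, aerospike.namespace, and aerospike.set option keys are assumptions that should be checked against the connector documentation for your version.

```python
from typing import Any, Dict


def aerospike_read_options(seed_host: str, namespace: str, set_name: str) -> Dict[str, str]:
    """Build the option map handed to the Spark reader.

    The option keys are assumed, not confirmed by this page; verify them
    against the connector's configuration reference.
    """
    return {
        "aerospike.seedhost": seed_host,    # assumed key: host:port of a seed node
        "aerospike.namespace": namespace,   # assumed key: Aerospike namespace to read
        "aerospike.set": set_name,          # assumed key: set within the namespace
    }


def load_aerospike_dataframe(spark: Any, options: Dict[str, str]) -> Any:
    """Load an Aerospike set through Spark's generic data source API.

    `spark` is expected to be a pyspark.sql.SparkSession with the connector
    JAR on its classpath; "aerospike" is the assumed data source name.
    """
    return spark.read.format("aerospike").options(**options).load()
```

With a running SparkSession named spark, a call such as load_aerospike_dataframe(spark, aerospike_read_options("localhost:3000", "test", "users")) would be expected to return a DataFrame that can then feed ETL or ML steps; the host, namespace, and set shown here are placeholders.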

Leverages Spark Structured Streaming

Leverages Spark Structured Streaming to support streaming reads (change notifications) from Aerospike and streaming writes to Aerospike.

Supports mainstream languages

Supports multiple languages (Python, Java, Scala, etc.)

Supports massive parallelism

Supports massive parallelism by allowing you to use up to 32,768 Spark partitions to read data from an Aerospike namespace. Each namespace can store up to 32 billion records across 4,096 Aerospike partitions.
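To make these figures concrete, the short Python sketch below works through the arithmetic they imply. It assumes that "32 billion" means 32 × 10^9 records and that records are spread evenly across partitions; the function and constant names are illustrative only.

```python
AEROSPIKE_PARTITIONS = 4_096            # Aerospike partitions per namespace (from the text)
MAX_SPARK_PARTITIONS = 32_768           # maximum Spark partitions for a read (from the text)
MAX_RECORDS = 32_000_000_000            # assumption: "32 billion" read as 32 * 10**9


def spark_partitions_per_aerospike_partition(spark_partitions: int = MAX_SPARK_PARTITIONS) -> int:
    """Return how many Spark partitions cover one Aerospike partition."""
    if spark_partitions % AEROSPIKE_PARTITIONS != 0:
        raise ValueError("spark_partitions must be a multiple of 4,096 for an even split")
    return spark_partitions // AEROSPIKE_PARTITIONS


def max_records_per_partition(total_records: int, partitions: int) -> int:
    """Upper bound on records per partition under an even distribution (ceiling division)."""
    return -(-total_records // partitions)


print(spark_partitions_per_aerospike_partition())                     # 8
print(max_records_per_partition(MAX_RECORDS, AEROSPIKE_PARTITIONS))   # 7812500
print(max_records_per_partition(MAX_RECORDS, MAX_SPARK_PARTITIONS))   # 976563
```

Under these assumptions, a full namespace read at maximum parallelism gives each Spark task roughly one million records, with eight Spark partitions sharing each Aerospike partition.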