Spark connector for data-intensive applications

The Aerospike connector for Spark enables the creation of data-intensive applications such as AI/ML and ETL with advanced Apache Spark 3.0 tools.

Powerful analysis combining data from the edge and system of record

Reducing time to insight by leveraging the massive parallelism of Spark and Aerospike


A real-time transactional-analytical system needs to combine transactional and streaming data in a single high-performance data platform that can operate as fast as the inbound data streams in. It also needs to work with various analytics frameworks including machine learning and artificial intelligence. Aerospike Connect for Spark addresses these requirements by combining streaming data with historical data so organizations can act in real time.

Aerospike Connect for Spark diagram


Why Aerospike Connect for Spark

Aerospike Connect for Spark enables the creation of data-intensive applications such as AI/ML, ETL, and more with familiar, easy-to-use Spark tools.

  • Real-time Analytics

    Drastically reduce time to insight by combining massively parallel computation in Spark with the massively parallel reads from Aerospike.
  • Rapid Development

    Save time developing analytics and AI/ML applications that use data in Aerospike by using a Spark supported language of your choice and the rich ecosystem of libraries that are already available with Spark.
  • Gain Closed-loop Business Insights

    Operate on transactional data and streaming insights combined in the database using Connect for Spark.
  • Reduce server footprint

    Reduce server footprint by up to 80 percent while enabling analysis of massive datasets.

Key features of Aerospike Connect for Spark

Aerospike Connect for Spark supports streaming APIs that leverage Spark Structured Streaming to provide very low latency for both reads and writes. This enables AI/ML use cases that use Aerospike as a system of engagement in their Spark streaming pipeline. Aerospike Connect for Spark, coupled with the Aerospike Database scan-by-partition capability, predicate filtering, and the mapping of Aerospike partitions to Spark partitions, allows massive parallelization.

Supports both DataFrames and Datasets

Loads Aerospike data into Spark DataFrames and Datasets to enable further complex processing in Spark, such as ETL and AI/ML using SparkML and other open-source libraries and frameworks that support PySpark.
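As an illustrative sketch only: loading an Aerospike set into a DataFrame follows the standard Spark data source pattern. The seed host, namespace, and set names below are placeholders, and the `aerospike.*` option keys follow the connector's documented configuration style.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("aerospike-read-sketch")
  .getOrCreate()

// Read an Aerospike set into a DataFrame.
// Host, namespace, and set are placeholders for your cluster's values.
val df = spark.read
  .format("aerospike")
  .option("aerospike.seedhost", "203.0.113.10:3000") // placeholder seed node
  .option("aerospike.namespace", "test")
  .option("aerospike.set", "users")
  .load()

df.printSchema()
```

The connector can infer a schema by sampling records; an explicit schema can also be supplied via `.schema(...)` when bin types are known in advance.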

Leverages Spark Structured Streaming

Leverages Spark Structured Streaming to support streaming reads (change notifications) from, and writes to, Aerospike.
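A hedged sketch of the write side, assuming an existing SparkSession `spark`. The built-in `rate` source stands in for a real stream, the host and set names are placeholders, and `aerospike.updateByKey` (the bin used as the record key) follows the connector's documented option style:

```scala
// Stream rows into Aerospike with Spark Structured Streaming.
// The "rate" source is a stand-in; replace with Kafka, files, etc.
val events = spark.readStream
  .format("rate")
  .option("rowsPerSecond", "100")
  .load()

val query = events.writeStream
  .format("aerospike")
  .option("aerospike.seedhost", "203.0.113.10:3000")  // placeholder seed node
  .option("aerospike.namespace", "test")
  .option("aerospike.set", "events")
  .option("aerospike.updateByKey", "value")           // bin used as the record key
  .option("checkpointLocation", "/tmp/aerospike-ckpt") // placeholder path
  .start()

query.awaitTermination()
```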

Supports mainstream languages

Supports multiple languages (Python, Java, Scala, etc.)

Supports massive parallelism

Supports massive parallelism by allowing you to use up to 32,768 Spark partitions to read data from an Aerospike namespace. Each namespace can store up to 32 billion records across 4,096 partitions.
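As a sketch of how this is tuned (assuming the connector's documented `aerospike.partition.factor` option, where a factor of f yields 2^f Spark partitions, so f = 15 gives the 32,768 maximum, i.e. eight Spark partitions per Aerospike partition; an existing SparkSession `spark` and placeholder connection values are assumed):

```scala
// Increase read parallelism: partition factor 15 => 2^15 = 32,768
// Spark partitions over the namespace's 4,096 Aerospike partitions.
val wide = spark.read
  .format("aerospike")
  .option("aerospike.seedhost", "203.0.113.10:3000") // placeholder seed node
  .option("aerospike.namespace", "test")
  .option("aerospike.set", "users")
  .option("aerospike.partition.factor", "15")
  .load()

println(wide.rdd.getNumPartitions)
```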

Enables SQL access to Aerospike

Leverages Spark SQL (ANSI SQL 2003 standard) to allow SQL access to Aerospike.
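This uses Spark's standard temporary-view mechanism; a minimal sketch, assuming a DataFrame `df` already loaded from Aerospike as above and placeholder bin names:

```scala
// Register the Aerospike-backed DataFrame as a SQL view.
df.createOrReplaceTempView("users")

// Query it with Spark SQL; "name" and "age" are placeholder bin names.
val adults = spark.sql("SELECT name, age FROM users WHERE age >= 18")
adults.show()
```

Predicates such as the `WHERE` clause above can be pushed down to the database by the connector's predicate filtering, reducing the data shipped to Spark.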

Learn more about Connect for Spark

Conduct advanced and predictive data analytics in real time across massive amounts of multi-modal data from disparate sources.

Read solution brief
Use Cases

Typical use cases for Aerospike Connect for Spark

Aerospike Connect for Spark makes it easy for enterprises to address AI/ML use cases requiring real-time actions across billions of transactions.

Learn what Spark and Aerospike can do together


Powering a real-time online profile store for a global ad tech

Data integration challenges, infrastructure complexity, and Spark jobs that took 12 hours to complete were all remedied.

Read case study

Learn more about the Aerospike Connect product line