Install Aerospike Connect for Spark

To create a Spark application that can access Aerospike Database, you have to download the appropriate connector JAR file and add it to your application’s environment.

Aerospike Connect for Spark prerequisites

Before installing Aerospike Connect for Spark (Spark connector), verify that you meet the following prerequisites:

  • Your Spark cluster must be at version 3.0.x, 3.1.x, 3.2.x, 3.3.x, 3.4.x, or 3.5.x.
  • The Java 8 SDK must be installed on the system where you plan to run the Spark connector. If you want to test with different versions of the Java 8 SDK, consider using sdkman to help you manage those versions.
  • Your Aerospike Database Enterprise Edition cluster must be at version 5.0 or later if you plan to use the Spark connector version 2.0 or later.
  • The connector does not bundle Spark- or Hadoop-related binaries in its JAR, so Spark and Hadoop must be installed on your production system.

Spark connector artifacts

You can download Spark connector artifacts from the Aerospike downloads page or from JFrog, under the Scala 2.12 and Scala 2.13 group paths.

Aerospike publishes multiple versions of the Spark connector for different combinations of connector version, Spark version, Scala version, and packaging type. The available packaging types are:

  • allshaded indicates that all internal libraries, including the Aerospike Java client, are bundled in the JAR.
  • clientunshaded indicates that all libraries except the Aerospike Java client are bundled in the JAR.
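
For example, if you choose a clientunshaded artifact, your build must supply the Aerospike Java client itself. A minimal sbt sketch (the version string is a placeholder; check the release notes for the client version your connector release expects):

```scala
// Hedged build.sbt fragment: the clientunshaded JAR omits the Aerospike Java
// client, so declare it explicitly. "<client-version>" is a placeholder.
libraryDependencies += "com.aerospike" % "aerospike-client" % "<client-version>"
```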

The following artifacts are available in JFrog:

| Connector version | Spark versions | Scala versions | Packaging types |
| --- | --- | --- | --- |
| 5.0.2 | 3.4, 3.5 | 2.12, 2.13 | allshaded, clientunshaded |
| 5.0.1 | 3.4, 3.5 | 2.12, 2.13 | allshaded, clientunshaded |
| 5.0.0 | 3.4, 3.5 | 2.12, 2.13 | allshaded, clientunshaded |
| 4.5.1 | 3.0, 3.1, 3.2, 3.3, 3.4, 3.5 | 2.12, 2.13 | allshaded, clientunshaded |
| 4.5.0 | 3.0, 3.1, 3.2, 3.3, 3.4, 3.5 | 2.12, 2.13 | allshaded, clientunshaded |
| 4.4.0 | 3.0, 3.1, 3.2, 3.3, 3.4, 3.5 | 2.12, 2.13 | allshaded, clientunshaded |
| 4.3.1 | 3.0, 3.1, 3.2, 3.3, 3.4 | 2.12, 2.13 | allshaded, clientunshaded |
| 4.3.0 | 3.0, 3.1, 3.2, 3.3, 3.4 | 2.12, 2.13 | allshaded, clientunshaded |
| 4.2.0 | 3.0, 3.1, 3.2, 3.3, 3.4 | 2.12, 2.13 | allshaded, clientunshaded |
| 4.1.0 | 3.0, 3.1, 3.2, 3.3, 3.4 | 2.12, 2.13 | allshaded, clientunshaded |
| 4.0.0 | 3.0, 3.1, 3.2, 3.3, 3.4 | 2.12, 2.13 | allshaded, clientunshaded |
| 3.5.5 | 3.0, 3.1, 3.2 | 2.12 | allshaded, clientunshaded |

Artifact filenames indicate the connector version, supported Spark version, Scala version, and packaging type. For example:

  • 4.5.1-spark3.5-scala2.13-allshaded.jar
  • 4.5.1-spark3.4-scala2.12-clientunshaded.jar
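
As a quick illustration of this naming convention, a small helper can split an artifact filename into its parts. The helper and regular expression below are illustrative, not part of the connector:

```python
import re

# Matches names like "4.5.1-spark3.5-scala2.13-allshaded.jar", with an
# optional "aerospike-spark-" prefix as seen on some downloaded files.
ARTIFACT_RE = re.compile(
    r"^(?:aerospike-spark-)?"
    r"(?P<connector>\d+\.\d+\.\d+)"
    r"-spark(?P<spark>\d+\.\d+)"
    r"-scala(?P<scala>\d+\.\d+)"
    r"-(?P<packaging>allshaded|clientunshaded)\.jar$"
)

def parse_artifact(name: str) -> dict:
    """Return the connector, Spark, Scala, and packaging parts of a filename."""
    m = ARTIFACT_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized artifact name: {name}")
    return m.groupdict()

print(parse_artifact("4.5.1-spark3.5-scala2.13-allshaded.jar"))
# {'connector': '4.5.1', 'spark': '3.5', 'scala': '2.13', 'packaging': 'allshaded'}
```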

See the Aerospike Connect for Spark release notes for more information about each release.

Spark connector installation

Install using JFrog Artifactory

  • In build.sbt, add the JFrog resolver:

    resolvers += "Artifactory Realm" at "https://aerospike.jfrog.io/artifactory/spark-connector"

  • Add the dependency. sbt's %% operator selects the correct Scala artifact (_2.12 or _2.13):

    libraryDependencies += "com.aerospike" %% "aerospike-spark" % "<artifact-version>"

    Replace <artifact-version> with the exact folder name in JFrog for the Spark version and packaging combination you want.

Install manually

  • Download the .jar package from the Aerospike Downloads site, choosing the artifact that matches the Apache Spark version you are using. Connector versions 4.0.0 and later support only Apache Spark 3.x.

Add the .jar package to your application’s environment

You can do this in either of these ways:

  • If you plan to create a batch or streaming job, write a Scala, Java, or Python application by following the interactive code in the Jupyter notebooks, and specify the downloaded JAR as a dependency. When your Spark application is ready, submit it to the Spark cluster with either spark-submit or spark-shell. See Submitting Applications in the Spark documentation for details.

Example using spark-submit

spark-submit --jars path-to-aerospike-spark-connector-jar --class application-entrypoint application.jar

  • If you plan to create a Jupyter notebook that uses the Spark connector, add the JAR path to the environment variables.

    Example using Python

    import os
    os.environ["PYSPARK_SUBMIT_ARGS"] = '--jars aerospike-spark-5.0.2-spark3.5-scala2.12-allshaded.jar pyspark-shell'

    Example using Scala

    launcher.jars = ["aerospike-spark-5.0.2-spark3.5-scala2.12-allshaded.jar"]

See our notebooks for other examples.
