Install the Aerospike Kafka Sink Connector

The Aerospike Kafka sink (inbound) connector streams data from Apache Kafka to Aerospike.

There are two modes in Kafka Connect for running workers: standalone mode and distributed mode. You should identify which mode works best for your environment before getting started.

Standalone mode

All work is performed in a single process. This configuration is simpler to set up and get started with, but can use only a subset of the connector's features. For example, in this mode, fault tolerance is not available to the connector.

Consider using this mode for testing on a local machine.

Distributed mode

Kafka Connect workers run on multiple nodes. These form a Kafka Connect cluster, and Kafka Connect distributes running connectors across the cluster. You can add more nodes or remove nodes as your needs evolve.

This mode is more fault-tolerant. If a node unexpectedly leaves the cluster, Kafka Connect automatically distributes the work of that node to other nodes in the cluster. And, because Kafka Connect stores connector configurations, status, and offset information inside the Kafka Connect cluster where it is safely replicated, losing the node where a Connect worker runs does not result in any lost data.

Distributed mode is recommended for production environments because of scalability, high availability, and management benefits. Refer to the Kafka Connect documentation for more information.

Prerequisites

Ensure that the systems on which you plan to install instances of the Aerospike Kafka sink connector are running version 0.9.0.0 or later of Apache Kafka and Kafka Connect.

Install Java 8 or later, if it is not already installed.

RHEL or CentOS

sudo yum install java-1.8.0-openjdk

Debian or Ubuntu

sudo apt-get install openjdk-8-jre

If your Kafka cluster is configured to use Simple Authentication and Security Layer (SASL) for authentication, ensure that you configure the workers in your Kafka Connect cluster to use SASL for authenticating to your Kafka cluster. See the Kafka Connect documentation for details.

Procedure

Place Aerospike Connect for Kafka in a path accessible to Kafka Connect. You must copy the connector to every machine where Kafka Connect runs. Whether Kafka Connect runs alongside your Kafka cluster or in a separate Kafka Connect cluster is a deployment decision based on your requirements.

Download the Aerospike Kafka sink (inbound) .zip file, and extract the contents of the .zip file into a new directory.

In the properties file that you are using to configure the Kafka Connect workers, add this line to point to the directory containing the connector:

plugin.path=<connector-directory>

<connector-directory>: The directory that you created in the previous step.

For information about this properties file and setting other properties for Kafka Connect, see "Worker Configuration Properties" in the Kafka Connect documentation.
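For example, if you extracted the connector into /opt/aerospike-kafka-inbound (a hypothetical path; use the directory you actually created), the worker properties file would contain:

```properties
# Hypothetical install path; substitute the directory you created above
plugin.path=/opt/aerospike-kafka-inbound
```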

Enable the connector to run on each Kafka Connect node. Ensure that the feature mesg-kafka-connector is enabled in the Aerospike feature-key file features.conf.

caution

For versions before 2.2.0, place a copy of this file in a directory on each Kafka Connect node and configure the feature-key-file configuration option for the connector.

For versions 2.2.0 and later, ensure that a feature-key file with mesg-kafka-connector enabled is loaded on the Aerospike server. The connector reads the feature key directly from the Aerospike server.

Configure the connector on each node by creating and editing the file <connector-directory>/etc/aerospike-kafka-inbound.yml. Refer to the configuration documentation for creating this file.
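As an illustration only, a minimal aerospike-kafka-inbound.yml might map one Kafka topic to an Aerospike namespace and set. The keys below are assumptions based on the general shape of the inbound configuration; verify every field name and value against the configuration documentation before use:

```yaml
# Illustrative sketch only -- verify all keys against the
# inbound connector configuration documentation.
aerospike:
  seeds:
    - localhost:        # Aerospike seed node (assumed)
        port: 3000
topics:
  users:                # Kafka topic name (assumed)
    mapping:
      namespace:
        mode: static
        value: test     # target Aerospike namespace
      set:
        mode: static
        value: users    # target Aerospike set
```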

Standalone mode

Create a properties file to enable Kafka Connect to run the Aerospike Kafka sink connector.

Create a file named aerospike-sink.properties and add to it the following content:

name=aerospike-sink
connector.class=com.aerospike.connect.kafka.inbound.AerospikeSinkConnector
tasks.max=<value>
topics=<value>
config-file=<connector-directory>/etc/aerospike-kafka-inbound.yml
  • tasks.max: The maximum number of tasks that can be created for this connector. A task runs as a process in Kafka Connect.
  • topics: A list of comma-separated names of the topics for the connector to subscribe to.
  • <connector-directory>: The directory into which you extracted the content of the Aerospike Kafka Inbound .zip file.
  • config-file or config: Pass the path to a JSON or YAML configuration file with config-file or the configuration in JSON format directly with config.
note

Use only one of config-file or config. The two parameters are mutually exclusive.
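For instance, with hypothetical values (four tasks, two topics named users and events, and the connector extracted to /opt/aerospike-kafka-inbound), the file might read:

```properties
name=aerospike-sink
connector.class=com.aerospike.connect.kafka.inbound.AerospikeSinkConnector
tasks.max=4
topics=users,events
config-file=/opt/aerospike-kafka-inbound/etc/aerospike-kafka-inbound.yml
```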

Run this command:

<kafka-dir>/bin/connect-standalone.sh <path-to-your-Kafka-Connect-config-file> <path-to-aerospike-sink.properties>
  • <kafka-dir>: The directory where the Kafka package is located.
  • <path-to-your-Kafka-Connect-config-file>: The path to the file (including the filename and extension) that you are using to configure the worker in Kafka Connect.
  • <path-to-aerospike-sink.properties>: The path to the file (including the filename and extension) that you created in the previous step.
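With hypothetical paths (Kafka installed under /opt/kafka and the properties file in the current directory), the invocation might look like:

```shell
# Example paths only; substitute your own Kafka and properties locations
/opt/kafka/bin/connect-standalone.sh /opt/kafka/config/connect-standalone.properties ./aerospike-sink.properties
```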

Distributed mode

In distributed mode the connector is deployed and managed using REST API requests.

To run the connector if your Kafka Connect cluster is in distributed mode, follow these steps on each Kafka Connect node:

Edit the configuration file for the Kafka Connect worker on that node according to your needs. Refer to the Kafka Connect documentation for more information on the worker configuration properties file.

tip

If you run multiple distributed workers on one host machine for development and testing, the rest.port configuration property must be unique for each worker. This is the port the REST interface listens on for HTTP requests.

Run this command to launch the worker in distributed mode:

<kafka-dir>/bin/connect-distributed.sh <path-to-your-Kafka-Connect-config-file>
  • <path-to-your-Kafka-Connect-config-file>: The path to the file (including the filename and extension) that you are using to configure the workers in Kafka Connect.

Follow these steps to create the connector.

Set the aerosink variable.

aerosink='{"name":"aerospike-sink",
"config":{"connector.class":"com.aerospike.connect.kafka.inbound.AerospikeSinkConnector",
"config-file":"<connector-directory>/etc/aerospike-kafka-inbound.yml",
"tasks.max":"<value>",
"topics":"<value>"}}'
  • <connector-directory>: The directory into which you extracted the content of the Aerospike Kafka Inbound .zip file.
  • tasks.max: The maximum number of tasks that can be created for this connector. A task runs as a process in Kafka Connect.
  • topics: A list of comma-separated names of the topics for the connector to subscribe to.
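As an example with hypothetical values (a connector directory of /opt/aerospike-kafka-inbound, four tasks, and two topics), you can set the variable and sanity-check that the payload is valid JSON before sending it:

```shell
# Hypothetical values for illustration; substitute your own
# connector directory, task count, and topic list.
aerosink='{"name":"aerospike-sink",
"config":{"connector.class":"com.aerospike.connect.kafka.inbound.AerospikeSinkConnector",
"config-file":"/opt/aerospike-kafka-inbound/etc/aerospike-kafka-inbound.yml",
"tasks.max":"4",
"topics":"users,events"}}'

# Sanity-check the payload before POSTing it to the REST API
echo "${aerosink}" | python3 -m json.tool > /dev/null && echo "valid JSON"
```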

Set the kafkaEndpoint variable.

kafkaEndpoint="<URI>"

kafkaEndpoint: This is the REST endpoint for the Kafka Connect service. You can make requests to any cluster member; the REST API automatically forwards requests, if required.

Make a REST request to create the connector.

curl -X POST --header "Content-Type:application/json" \
  --data "${aerosink}" "${kafkaEndpoint}/connectors"
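Putting it together with a hypothetical local endpoint, you can create the connector and then confirm it is running via the Kafka Connect status endpoint:

```shell
# Hypothetical endpoint; adjust the host and port for your cluster
kafkaEndpoint="http://localhost:8083"

curl -X POST --header "Content-Type:application/json" \
  --data "${aerosink}" "${kafkaEndpoint}/connectors"

# Confirm the connector was created and its tasks are running
curl "${kafkaEndpoint}/connectors/aerospike-sink/status"
```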