
Bulk data loading for standalone processing

Overview

This page describes how to load graph data into an Aerospike Database with the Aerospike Graph bulk data loader and the Gremlin call step. This method is for standalone processing of small data sets.

All data processing takes place on your Aerospike Graph Service (AGS) instance, so this option is appropriate for smaller data sets or for testing. For larger data sets, we recommend using the distributed mode.

Bulk loading with the Gremlin call step

Requirements

  • A running AGS instance using the standard Docker image. See Installation for help getting an AGS instance up and running.

  • A running Aerospike Database, version 7.0 or later.

  • Data files for edges and vertices in CSV format, in the required directory structure.
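Before starting a job, it can help to confirm that the source directories exist and actually contain CSV files. The following Python sketch is a hypothetical pre-flight check, not part of AGS. It assumes only that the vertices and edges directories (the names used in the load example on this page) each hold one or more files with a .csv extension. It does not validate the full required directory structure or the contents of the CSV files.

```python
from pathlib import Path


def find_csv_files(root: str) -> dict[str, list[str]]:
    """Hypothetical pre-flight check for a local bulk-load source directory.

    Assumes the layout <root>/vertices and <root>/edges, each containing
    one or more .csv files (searched recursively). Raises ValueError if
    either directory is missing or holds no CSV files.
    """
    found: dict[str, list[str]] = {}
    for kind in ("vertices", "edges"):
        directory = Path(root) / kind
        if not directory.is_dir():
            raise ValueError(f"missing directory: {directory}")
        # rglob also walks subdirectories, in case files are grouped
        # into subfolders beneath vertices/ or edges/.
        csv_files = sorted(str(p) for p in directory.rglob("*.csv") if p.is_file())
        if not csv_files:
            raise ValueError(f"no CSV files under: {directory}")
        found[kind] = csv_files
    return found
```

For example, find_csv_files("/etc/data") checks the host directory that is bind-mounted into the container in the local example.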

The bulk load command

Use the following base Gremlin command to initiate a standalone bulk loading job:

g.call("aerospike.graphloader.admin.bulk-load.load")
Response:

==>Bulk load started successfully. Use the g.call("aerospike.graphloader.admin.bulk-load.status") command to get the status of the job.

The full usage of this command varies by storage backend (local files, Amazon S3, or Google Cloud Storage). The example below uses local files.

When using local source files, make sure your AGS container can access them through a Docker bind mount. For example, if your files are in /etc/data on the host, mount that directory into the container:

docker run -p 8182:8182 -v /etc/data:/opt/aerospike-graph/data container.aerospike.com/aerospike/aerospike-graph-service

Then run the following in the Gremlin console:

g.with("evaluationTimeout", 20000)
.call("aerospike.graphloader.admin.bulk-load.load")
.with("aerospike.graphloader.vertices", "/opt/aerospike-graph/data/vertices")
.with("aerospike.graphloader.edges", "/opt/aerospike-graph/data/edges")
.next()
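The paths passed to aerospike.graphloader.vertices and aerospike.graphloader.edges are the paths as seen inside the AGS container, not on the host. With the bind mount above, /etc/data on the host appears as /opt/aerospike-graph/data in the container. The Python helper below is a hypothetical sketch of that translation; the function name is illustrative, and its defaults are simply the two paths from the docker run example.

```python
from pathlib import PurePosixPath


def to_container_path(
    host_path: str,
    host_mount: str = "/etc/data",
    container_mount: str = "/opt/aerospike-graph/data",
) -> str:
    """Translate a host path into the path AGS sees through a bind mount.

    Mirrors `-v <host_mount>:<container_mount>` from `docker run`.
    Raises ValueError if host_path is not under host_mount, because the
    container cannot see files outside the mounted directory.
    """
    host = PurePosixPath(host_path)
    try:
        relative = host.relative_to(host_mount)
    except ValueError:
        raise ValueError(f"{host_path} is not under the bind mount {host_mount}") from None
    return str(PurePosixPath(container_mount) / relative)
```

For example, to_container_path("/etc/data/vertices") returns "/opt/aerospike-graph/data/vertices", the value used in the Gremlin command above.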

The evaluationTimeout parameter

The default AGS command timeout is 10 seconds (10000 milliseconds). Depending on system load or configuration, the bulk loader may take longer than that to initialize, causing the command to time out.

If your graph data is stored in remote cloud buckets or your cluster takes longer to initialize, you can increase the timeout with the evaluationTimeout parameter.

For example:

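// For remote storage access (S3/GCS), longer initialization is expected
g.with("evaluationTimeout", 60000)
// ...followed by .call("aerospike.graphloader.admin.bulk-load.load") and its .with(...) options, as in the local example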

If commands are failing during initialization, try increasing this value.
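If you submit the load command from a script rather than typing it into the Gremlin console, you can choose the timeout per data source. The Python sketch below builds the same Gremlin command text as the local example. Two details are assumptions, not documented behavior: that remote sources are identified by s3:// or gs:// URI prefixes, and that 20000 ms (from the local example) and 60000 ms (from the remote-storage example) are suitable defaults. Tune both for your environment.

```python
import json

# Assumption: remote sources are given as s3:// or gs:// URIs.
REMOTE_PREFIXES = ("s3://", "gs://")
LOCAL_TIMEOUT_MS = 20000   # value used in the local example
REMOTE_TIMEOUT_MS = 60000  # value used in the remote-storage example


def pick_timeout_ms(*paths: str) -> int:
    """Use the longer timeout if any source path points at remote storage."""
    if any(p.startswith(REMOTE_PREFIXES) for p in paths):
        return REMOTE_TIMEOUT_MS
    return LOCAL_TIMEOUT_MS


def bulk_load_command(vertices: str, edges: str) -> str:
    """Build the Gremlin text for a standalone bulk load job."""
    timeout = pick_timeout_ms(vertices, edges)
    # json.dumps produces double-quoted, escaped strings, which are valid
    # Groovy string literals for ordinary paths (paths without "$").
    return (
        f'g.with("evaluationTimeout", {timeout})'
        '.call("aerospike.graphloader.admin.bulk-load.load")'
        f'.with("aerospike.graphloader.vertices", {json.dumps(vertices)})'
        f'.with("aerospike.graphloader.edges", {json.dumps(edges)})'
        ".next()"
    )
```

Assuming your AGS endpoint accepts script submissions the way the Gremlin console's remote mode does, the resulting string could be sent with the gremlinpython driver, for example client.Client("ws://localhost:8182/gremlin", "g").submit(command).all().result(); the URL here is an assumption based on the port in the docker run example.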

Status monitoring

Use the command aerospike.graphloader.admin.bulk-load.status to check the progress of a standalone bulk data loading job. In the Gremlin console:

g.call("aerospike.graphloader.admin.bulk-load.status").next()

This call returns a structured response describing the job’s current status. The available fields are:

| Key | Type | Availability | Description |
|---|---|---|---|
| step | String | Always | Current bulk load step. See stages and steps for a complete list of bulk loading steps. |
| complete | Boolean | Always | If true, the current bulk loading job is complete. If false, the job is ongoing. |
| status | String | Always | Current job status. One of: success, in progress, error. |
| message | String | Only when complete is true and status is error | Message from the exception that caused the failure. |
| stacktrace | String | Only when complete is true and status is error | Stack trace from the exception that caused the failure. |
| elements-written | Long | Only when the stage is Vertex writing or Edge writing | Number of vertex or edge elements written, depending on the current writing stage. |
| complete-partitions-percentage | Integer | Only when the stage is Vertex writing or Edge writing | Percentage of partitions completed in the current writing stage. |
| duplicate-vertex-ids | Long | When complete is true. May be absent if status is error and the error that caused the job to fail makes this information inaccessible. | See Error handling for details. |
| bad-entries | Long | When complete is true. May be absent if status is error and the error that caused the job to fail makes this information inaccessible. | See Error handling for details. |
| bad-edges | Long | When complete is true. May be absent if status is error and the error that caused the job to fail makes this information inaccessible. | See Error handling for details. |
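Because the load command returns as soon as the job starts, a script typically polls the status call until complete is true. The Python sketch below shows one way to do that using only the fields in the table above. fetch_status is a hypothetical callable you supply (for example, one that submits the status command and returns the response map as a dict); the helper names and the 10-second polling interval are illustrative assumptions.

```python
import time
from typing import Any, Callable

Status = dict[str, Any]


def wait_for_bulk_load(
    fetch_status: Callable[[], Status],
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Status:
    """Poll until the job reports complete == True, then return the final status.

    fetch_status is a hypothetical caller-supplied function that runs
    g.call("aerospike.graphloader.admin.bulk-load.status").next()
    and returns the result as a dict.
    """
    while True:
        status = fetch_status()
        if status.get("complete"):
            return status
        # Progress fields are only present during the writing stages.
        pct = status.get("complete-partitions-percentage")
        if pct is not None:
            print(f"{status.get('step')}: {pct}% of partitions complete")
        sleep(interval_s)


def summarize(status: Status) -> str:
    """Turn a final status into a one-line report."""
    if status.get("status") == "error":
        # message is present when complete is true and status is error.
        return f"failed: {status.get('message', '<no message>')}"
    # Read the error-handling counters defensively in case one is missing.
    counts = ", ".join(
        f"{key}={status.get(key, 'n/a')}"
        for key in ("duplicate-vertex-ids", "bad-entries", "bad-edges")
    )
    return f"{status.get('status')}: {counts}"
```

Injecting sleep keeps the loop easy to test without real delays; in normal use the default time.sleep applies.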