# Incremental data loading

In addition to creating new graphs with the [bulk loader](https://aerospike.com/docs/graph/3.1.0/develop/data-loading/overview), you can incrementally load data into an existing graph. Incremental data loading supports the following operations:

-   **Add new vertices**  
    Introduce new vertices into an existing graph.
    
-   **Add disconnected data**  
    Load new vertices and edges that are not connected to any existing elements in the graph.
    
-   **Add connected data**  
    Load new vertices and edges that establish connections with existing vertices in the graph.
    
    -   The [CSV files](https://aerospike.com/docs/graph/3.1.0/develop/data-loading/csv-format) must include references to the vertex IDs of existing elements to establish the connections correctly.
-   **Add edges to new or existing vertices**  
    Introduce new edges that connect any combination of new and existing vertices.
    
    -   Each edge definition in the CSV file must include its source and target vertex IDs, including the IDs of vertices already present in the graph.
-   **Update properties of existing vertices**  
    Modify the properties of vertices that already exist in the graph.
    
    -   You can add new properties to a vertex, such as a `status` or `last_seen` property not previously present, by including the new property in the CSV file. Existing properties are preserved.
        
    -   You can overwrite an existing property by specifying the same property key with a new value in the CSV file. The value in the incremental load replaces the current one.
        

::: note
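Vertex IDs in the incremental files must exactly match those in the existing graph for updates to apply correctly. Because vertices are merged on the `~id` field, a row whose ID does not match an existing vertex creates a new vertex instead of updating one.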
:::
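
As a rough illustration of the "update properties" and "add connected data" cases above, the following sketch writes two small incremental files. The directory layout, file names, IDs, labels, and the typed headers (`status:String`, `~from`, `~to`, `~label`) are illustrative assumptions; confirm the exact header and layout rules on the CSV format page before relying on them.

```bash
#!/usr/bin/env bash
# Sketch only: directory layout, file names, IDs, labels, and typed headers
# below are illustrative assumptions, not values taken from a real graph.
data_dir="$(mktemp -d)"
mkdir -p "$data_dir/vertices/person" "$data_dir/edges/knows"

# p1 is assumed to exist already: its row adds or overwrites `status`.
# p9 is assumed to be new: its row creates the vertex.
cat > "$data_dir/vertices/person/incremental.csv" <<'EOF'
~id,~label,status:String
p1,person,active
p9,person,pending
EOF

# Connects the new vertex p9 to the existing vertex p1 by ID.
cat > "$data_dir/edges/knows/incremental.csv" <<'EOF'
~from,~to,~label
p9,p1,knows
EOF

echo "vertices: $data_dir/vertices"
echo "edges:    $data_dir/edges"
```

The two printed directories are what you would supply as the vertex and edge paths of an incremental load.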

### Vertex insertion

To perform vertex insertions on an incremental dataset, the bulk loader uses the [`mergeV`](https://tinkerpop.apache.org/docs/current/reference/#mergevertex-step) TinkerPop step.

-   The bulk loader merges vertices based on the `~id` field.
    
-   If the same `~id` value appears in multiple rows of the incremental dataset, the final vertex contains the combined data of all those rows. If those rows assign different values to the same property, the final property value is non-deterministic and may be any of the assigned values.
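
Because conflicting duplicate rows produce a non-deterministic result, it can help to scan a vertex file for them before loading. The following is a minimal sketch of such a check, not part of the bulk loader. The function name `find_conflicting_ids` and the sample data are hypothetical, and the script assumes that `~id` is the first column, that an empty cell means "property not specified", and that values contain no embedded or quoted commas.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not a bulk loader feature): print
# "<id>,<column header>" for every ~id whose rows assign different
# non-empty values to the same column.
# Assumptions: ~id is column 1; empty cell = not specified; no quoted commas.
find_conflicting_ids() {
  awk -F',' '
    NR == 1 { for (i = 1; i <= NF; i++) header[i] = $i; next }
    {
      for (i = 2; i <= NF; i++) {
        if ($i == "") continue
        key = $1 SUBSEP i
        # Append "" to force string comparison rather than numeric.
        if ((key in seen) && (seen[key] "" != $i "")) print $1 "," header[i]
        seen[key] = $i
      }
    }
  ' "$1" | sort -u
}

# Sample file: p1 is split across two rows without conflict,
# p2 assigns two different values to status.
sample_dir="$(mktemp -d)"
cat > "$sample_dir/people.csv" <<'EOF'
~id,~label,status:String,last_seen:String
p1,person,active,
p1,person,,2024-01-01
p2,person,active,
p2,person,inactive,
EOF

find_conflicting_ids "$sample_dir/people.csv"
```

For the sample file, the check reports only `p2,status:String`: the two `p1` rows set different properties and merge cleanly, while the two `p2` rows disagree on `status`.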
    

### Edge insertion

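Edge insertions on an incremental dataset behave the same as in a fresh data load. No merging occurs: every entry in the edge dataset creates an edge between the specified vertices, so a row that repeats an existing edge creates an additional edge rather than updating the existing one.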

## Usage


To load data incrementally in [standalone mode](https://aerospike.com/docs/graph/3.1.0/develop/data-loading/standalone), set the `incremental_load` option to `true` on the `call` step, as demonstrated in the following example:

```groovy
g.with("evaluationTimeout", 20000)
 .call("aerospike.graphloader.admin.bulk-load.load")
 .with("aerospike.graphloader.vertices", "<path_to_vertices>")
 .with("aerospike.graphloader.edges", "<path_to_edges>")
 .with("incremental_load", true)
 .next()
```

To load data incrementally in [distributed mode](https://aerospike.com/docs/graph/3.1.0/develop/data-loading/distributed), add the `-incremental_load` flag to the `submit spark` command for your cloud service.

::: note
Be sure to customize the following example command with the correct values for your cloud environment.
:::

```bash
gcloud dataproc jobs submit spark \
    --class=com.aerospike.firefly.bulkloader.SparkBulkLoader \
    --jars="gs://<bucket-name>/aerospike-graph-bulk-loader-x.y.z.jar" \
    --cluster="testcluster" \
    --region="us-central1" \
    -- -c "gs://<bucket-name>/bulk-loader.properties" -incremental_load
```

## Spark job flags

The following flags are all optional.

| Flag | Description |
| --- | --- |
| `-incremental_load` | Add new data to an existing graph. |
| `-validate_input_data` | Validate the format and data of all vertex and edge CSV files before writing to the Aerospike database. |
| `-verify_output_data` | Verify a percentage of loaded elements, specified by `aerospike.graphloader.sampling-percentage`, by reading them back with a traversal query after loading. |
| `-resume` | [Resume](https://aerospike.com/docs/graph/3.1.0/develop/data-loading/restart) a previously failed job. |
| `-clear_existing_data` | Delete all existing data before beginning the new job. |
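
Since every flag is optional, a job may pass more than one. The sketch below assembles a flag list for the Spark job, assuming the flags can be combined (the table suggests this but does not state it). The helper name `build_loader_flags` is hypothetical, and the guard against pairing `-incremental_load` with `-clear_existing_data` is a local precaution added for this example, not documented bulk loader behavior.

```bash
#!/usr/bin/env bash
# Hypothetical helper: validate flag names against the table above and
# print them as one space-separated string.
build_loader_flags() {
  local flags=() flag
  for flag in "$@"; do
    case "$flag" in
      -incremental_load|-validate_input_data|-verify_output_data|-resume|-clear_existing_data)
        flags+=("$flag") ;;
      *)
        echo "unknown flag: $flag" >&2
        return 1 ;;
    esac
  done
  # Local precaution (an assumption of this sketch, not loader behavior):
  # clearing the graph would delete the data an incremental load builds on.
  if [[ " ${flags[*]} " == *" -incremental_load "* &&
        " ${flags[*]} " == *" -clear_existing_data "* ]]; then
    echo "refusing to combine -incremental_load with -clear_existing_data" >&2
    return 1
  fi
  printf '%s\n' "${flags[*]}"
}

build_loader_flags -incremental_load -validate_input_data
```

The printed string can then be appended after the `-c "gs://<bucket-name>/bulk-loader.properties"` argument of the `gcloud dataproc jobs submit spark` command.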