---
title: "CSV format for source data files"
description: "CSV format requirements for bulk loading vertex and edge data into Aerospike Graph Service (AGS)."
---

# CSV format for source data files


## Overview

Source files for bulk data loading into Aerospike Graph Service (AGS) use the comma-separated values (CSV) format described here.

::: note
The AGS bulk loader requires source data files to be stored in a defined directory structure as described in the [Directory Structure](#directory-structure) section of this page.
:::

## Header rows

Each CSV file has a comma-separated header row. Header rows must contain no spaces between delimited columns.

### Vertex data file headers

| Header | Required? | Description |
| --- | --- | --- |
| `~id` | Yes | Unique ID for the vertex. `~id` values may be of data type `String` or `Long`. Number values must be positive whole numbers. |
| `~label` | No | Label for the vertex. If `~label` is not specified, the bulk loader adds a default value of `vertex` for the `~label` field. |

### Edge data file headers

| Header | Required? | Description |
| --- | --- | --- |
| `~from` | Yes | Vertex ID of the _from_ vertex. |
| `~to` | Yes | Vertex ID of the _to_ vertex. |
| `~label` | No | Label for the edge. Each edge can have only one label. If `~label` is not specified, the bulk loader adds a default value of `edge` for the `~label` field. |

::: note
AGS does not support user-provided `~id` values for edges, so the `~id` column is optional for edge CSV files. If your CSV file contains an `~id` column, the values are ignored.
:::

### Property column headers

Specify a header for each data column with the format `propertyName:type:cardinality`.

-   `propertyName` is the name for the property.
-   `type` is an optional specifier for the data type and defaults to `String` if not provided.
-   `cardinality` is an optional specifier to indicate that the property contains multiple values.

If no `type` or `cardinality` is specified, the value is treated as a single `String`.

Examples of valid headers:

| Header | Value format |
| --- | --- |
| `propertyName` | Single `String` value |
| `propertyName:string:list` | Multiple `String` values |
| `propertyName:int` | Single `Int` value |
| `propertyName:int:list` | Multiple `Int` values |

Header considerations:

-   If a property name includes colons, you must specify both a type and cardinality. For example: `yyyy:mm:dd:string:single`.
    
-   If you specify a cardinality, you must also specify a data type.
    
-   Data type and cardinality header elements are case-insensitive.
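
The header rules above can be sketched as a small parser. This is a minimal illustration and not the bulk loader's actual implementation: the function name `parse_property_header`, the `PropertyHeader` tuple, and the choice to read the header from the right are all assumptions made for this example.

```python
from typing import NamedTuple

# Type names from the "Allowable data types" table, lowercased because
# type and cardinality elements are case-insensitive.
KNOWN_TYPES = {"bool", "boolean", "int", "integer", "long", "double", "string", "date"}
# Cardinality names that appear on this page.
KNOWN_CARDINALITIES = {"single", "list"}


class PropertyHeader(NamedTuple):
    """Hypothetical container for the three header elements."""
    name: str
    data_type: str
    cardinality: str


def parse_property_header(header: str) -> PropertyHeader:
    """Split a header into name, type, and cardinality, reading from the right."""
    parts = header.split(":")
    # name:type:cardinality, where the name itself may contain colons.
    if (len(parts) >= 3
            and parts[-1].lower() in KNOWN_CARDINALITIES
            and parts[-2].lower() in KNOWN_TYPES):
        return PropertyHeader(":".join(parts[:-2]), parts[-2].lower(), parts[-1].lower())
    # name:type, with no cardinality given.
    if len(parts) == 2 and parts[1].lower() in KNOWN_TYPES:
        return PropertyHeader(parts[0], parts[1].lower(), "single")
    # Bare name: treated as a single String value.
    if len(parts) == 1:
        return PropertyHeader(header, "string", "single")
    raise ValueError(f"unrecognized property header: {header!r}")


print(parse_property_header("Scores:Int:list"))
print(parse_property_header("yyyy:mm:dd:string:single"))
```

In this sketch, a header such as `name:list` is rejected because it gives a cardinality without a data type.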
    

### Allowable data types

The following data types are allowed:

| Data type | Allowable values |
| --- | --- |
| `Bool` or `Boolean` | `true`, `false` |
| `Int` or `Integer` | \-2^31 to 2^31-1 |
| `Long` | \-2^63 to 2^63-1 |
| `Double` | 64-bit IEEE 754 floating point |
| `String` | Any string value. Quotation marks are optional. |
| `Date` | Values must be in ISO-8601 format (for example, `YYYY-MM-DD`, `YYYY-MM-DDTHH:MM:SS`, `YYYY-MM-DDTHH:MM:SSZ`) |
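
To check values before loading, the table above can be approximated in Python. This is a hedged sketch: `convert_value` is a hypothetical helper, it assumes Boolean values are exactly the lowercase literals `true` and `false`, and Python's `int`, `float`, and `datetime.fromisoformat` may accept or reject some inputs differently from the bulk loader.

```python
from datetime import datetime

INT_MIN, INT_MAX = -2**31, 2**31 - 1
LONG_MIN, LONG_MAX = -2**63, 2**63 - 1


def convert_value(raw: str, data_type: str):
    """Convert one CSV field to a Python value, or raise ValueError."""
    kind = data_type.lower()
    if kind in ("bool", "boolean"):
        # Assumption: only the lowercase literals are valid.
        if raw not in ("true", "false"):
            raise ValueError(f"not a boolean: {raw!r}")
        return raw == "true"
    if kind in ("int", "integer", "long"):
        number = int(raw)
        low, high = (LONG_MIN, LONG_MAX) if kind == "long" else (INT_MIN, INT_MAX)
        if not low <= number <= high:
            raise ValueError(f"{raw} is out of range for {data_type}")
        return number
    if kind == "double":
        return float(raw)
    if kind == "date":
        # Older Python versions reject a trailing "Z", so rewrite it as an offset.
        text = raw[:-1] + "+00:00" if raw.endswith("Z") else raw
        return datetime.fromisoformat(text)
    if kind == "string":
        return raw
    raise ValueError(f"unknown data type: {data_type!r}")


print(convert_value("2147483647", "Int"))
print(convert_value("2024-01-15T10:30:00Z", "Date"))
```

A value of `2147483648` fails the `Int` check in this sketch but passes when the column is declared as `Long`.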

### Property column values

If a column header indicates a single value, the bulk loader takes the field value in each vertex or edge row as-is. If the header indicates multiple values, the values in the field are delimited with the `;` character.

If you specify multiple values for a property:

-   Ensure the column header has its cardinality set to `list`. Example: `propertyName:int:list`.
-   Separate the values with the `;` character in the CSV file.
-   Multiple string values are allowed, but the semicolon cannot be escaped in a multi-value field, so an individual string value cannot contain a semicolon.

#### Vertex property multi-values

If you specify multiple values underneath a vertex property header in your CSV file, the resulting vertex has values for the property key that adhere to the [TinkerPop multi-properties standard](https://aerospike.com/docs/graph/3.1.0/develop/query/multi-properties/). For example, imagine a graph of baseball players with a vertex named `Shohei Ohtani`. It could have a property named `hasPlayedFor` with the CSV value `Angels;Dodgers`. The resulting vertex has the following properties:

-   Property 1 `hasPlayedFor: Angels`
-   Property 2 `hasPlayedFor: Dodgers`

#### Edge property multi-values

If you specify multiple values underneath an edge property header whose cardinality is `list`, the resulting edge has a single property whose value is a list containing those values. For example, imagine a graph of baseball players in which an edge has a property named `teams` with the CSV value `Yankees;Giants;Mariners`. The resulting edge has a property with key `teams` and the value `[Yankees, Giants, Mariners]`.
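
The difference between the two cases can be modeled in a few lines of Python. This is only an illustration of the resulting shapes, not of how AGS stores data: the function names are hypothetical, and treating a blank field as "no values" is an assumption.

```python
def split_multi_value(raw: str) -> list[str]:
    """Split a list-cardinality field on ';'. No escape character is available."""
    # Assumption: a blank field yields no values.
    return raw.split(";") if raw else []


def vertex_properties(key: str, raw: str) -> list[tuple[str, str]]:
    """Vertex: one (key, value) property per item, as with multi-properties."""
    return [(key, item) for item in split_multi_value(raw)]


def edge_property(key: str, raw: str) -> tuple[str, list[str]]:
    """Edge: a single property whose value is the whole list."""
    return (key, split_multi_value(raw))


print(vertex_properties("hasPlayedFor", "Angels;Dodgers"))
print(edge_property("teams", "Yankees;Giants;Mariners"))
```

The vertex call yields two separate `hasPlayedFor` entries, while the edge call yields one `teams` entry holding a three-item list.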

## Data row elements

Rows must contain no spaces between delimited elements.

| Element | Description |
| --- | --- |
| Delimiter | Fields are separated by commas. Records are separated by a newline (`\n`) or by a carriage return followed by a newline (`\r\n`). |
| Blank fields | Non-required columns may be left blank. Blank fields still require comma separators. |
| Vertex IDs | The `~id` value must be unique for all vertices in every vertex file. |
| Edge IDs | User-provided `~id` values for edges are not supported, so the `~id` column is optional for edge CSV files. If your CSV file contains an `~id` column, the values are ignored. |
| Labels | Labels are case sensitive. |
| String values | Surrounding string values with quotation marks is optional. Commas, newline, and carriage return characters are automatically escaped if they are included in a string surrounded by double quotation marks. |
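
Because `~id` values must be unique across every vertex file, it can help to scan the files before loading. The following sketch is one way to do that with Python's standard `csv` module. The function name `find_duplicate_vertex_ids` is hypothetical, and the code assumes UTF-8 files with a `.csv` extension under a single root directory.

```python
import csv
from collections import defaultdict
from pathlib import Path


def find_duplicate_vertex_ids(vertex_root: Path) -> dict[str, list[str]]:
    """Map each repeated ~id to the files in which it appears."""
    seen = defaultdict(list)
    for csv_path in sorted(vertex_root.rglob("*.csv")):
        # newline="" lets the csv module handle quoted line breaks itself.
        with csv_path.open(newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                seen[row["~id"]].append(str(csv_path))
    # Keep only IDs that were seen more than once.
    return {vid: files for vid, files in seen.items() if len(files) > 1}
```

An empty result means no `~id` value was seen more than once under `vertex_root`.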

## CSV format specification

The CSV file format follows the RFC 4180 CSV specification, including the following requirements.

-   Both Unix and Windows-style line endings are supported (`\n` or `\r\n`).
    
-   Any field may be surrounded with double quotation marks (`"`).
    
-   Fields containing line breaks, double quotation marks, or commas must be quoted. If they are not, the load fails immediately.
    
-   Blank fields are allowed. A blank field is considered an empty value.
    
-   For list type columns, semicolons are used as list item delimiters.
    

For more information, see [Common Format and MIME Type for CSV Files](https://tools.ietf.org/html/rfc4180) on the Internet Engineering Task Force (IETF) website.
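
When generating source files programmatically, a CSV library can apply the quoting rules for you. The following sketch uses Python's standard `csv` module with its default dialect. The sample rows are made up for illustration, and joining list items with `;` is done by hand because that delimiter is specific to this format.

```python
import csv
import io

buffer = io.StringIO()
# The default dialect quotes a field only when it contains a comma, a double
# quote, or a line-break character, and doubles any embedded double quotes.
# Rows end with "\r\n" by default.
writer = csv.writer(buffer)
writer.writerow(["~id", "Name:String", "Scores:Int:list"])
# List items are joined with ';' before the row is written.
writer.writerow(["v1", "Warner, Bob", ";".join(str(s) for s in [32, 67, 21])])
# The last field is left blank; the comma separator is still emitted.
writer.writerow(["v2", 'Gloria "Glo" Mendes', ""])

print(buffer.getvalue())
```

The name in the first data row is written as `"Warner, Bob"` because it contains a comma, while the semicolon-delimited scores need no quoting.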

## Data file examples

The following vertex and edge data files illustrate an example graph of university student records.

### Vertex file

The `CourseNum` field in the following example has no specified data type in the header row, so it defaults to type `String` and the data values are all stored as strings.

Data file:

```txt
~id,Name:String,Scores:Int:list,Topic:String,Passed:Boolean,CourseNum
v1,"Bob Warner",32;67;21,"Physics",false,201
v2,"Gloria Mendes",41;85;92,"Music",true,"Three Hundred"
v3,"Susan Wolff",77;42;51,"Biology",false,330
v4,"James Halford",67;62;89,"Physics",true,101
v5,"Frieda Wolinsky",57;71;94,"Biology",true,"Two Forty"
v6,"Amy Cheng",28;59;73,"Music",false,101
v7,"Zack Hulot",59;77;93,"History",true,220
v8,"Rafael Kubelik",67;35;28,"History",false,"First Year Seminar"
v9,"Leah Starke",66;82;79,"Biology",true,330
v10,"Amber Florian",68;71;96,"Music",true,102
```

Tabular view of data:

| ~id | Name | Scores | Topic | Passed | CourseNum |
| --- | --- | --- | --- | --- | --- |
| v1 | "Bob Warner" | 32, 67, 21 | "Physics" | false | "201" |
| v2 | "Gloria Mendes" | 41, 85, 92 | "Music" | true | "Three Hundred" |
| v3 | "Susan Wolff" | 77, 42, 51 | "Biology" | false | "330" |
| v4 | "James Halford" | 67, 62, 89 | "Physics" | true | "101" |
| v5 | "Frieda Wolinsky" | 57, 71, 94 | "Biology" | true | "Two Forty" |
| v6 | "Amy Cheng" | 28, 59, 73 | "Music" | false | "101" |
| v7 | "Zack Hulot" | 59, 77, 93 | "History" | true | "220" |
| v8 | "Rafael Kubelik" | 67, 35, 28 | "History" | false | "First Year Seminar" |
| v9 | "Leah Starke" | 66, 82, 79 | "Biology" | true | "330" |
| v10 | "Amber Florian" | 68, 71, 96 | "Music" | true | "102" |

### Edge file

Data file:

```txt
~from,~to,~label,weight:Double
v1,v6,connected,0.7
v2,v9,connected,0.7
v3,v2,connected,0.7
v4,v8,connected,0.7
v5,v3,connected,0.7
v6,v4,connected,0.7
v7,v9,connected,0.7
v8,v1,connected,0.7
v9,v10,connected,0.7
v10,v3,connected,0.7
```

Tabular view of data:

| ~id | ~from | ~to | ~label | weight |
| --- | --- | --- | --- | --- |
| (Auto-generated by AGS) | v1 | v6 | "connected" | 0.7 |
| (Auto-generated by AGS) | v2 | v9 | "connected" | 0.7 |
| (Auto-generated by AGS) | v3 | v2 | "connected" | 0.7 |
| (Auto-generated by AGS) | v4 | v8 | "connected" | 0.7 |
| (Auto-generated by AGS) | v5 | v3 | "connected" | 0.7 |
| (Auto-generated by AGS) | v6 | v4 | "connected" | 0.7 |
| (Auto-generated by AGS) | v7 | v9 | "connected" | 0.7 |
| (Auto-generated by AGS) | v8 | v1 | "connected" | 0.7 |
| (Auto-generated by AGS) | v9 | v10 | "connected" | 0.7 |
| (Auto-generated by AGS) | v10 | v3 | "connected" | 0.7 |

## Directory structure

Source data files must be stored in directories specified by the `aerospike.graphloader.vertices` and `aerospike.graphloader.edges` [configuration options](https://aerospike.com/docs/graph/reference/config).

-   The directory specified in `aerospike.graphloader.vertices` must contain one or more subdirectories of vertex CSV files.
    
-   The directory specified in `aerospike.graphloader.edges` must contain one or more subdirectories of edge CSV files.
    
-   The CSV files in any one subdirectory must all contain the same row format, with the same header rows.
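
The same-header requirement can be checked before a load with a short script. This is a sketch under stated assumptions: `inconsistent_subdirectories` is a hypothetical helper, it treats headers as equal only when the columns match exactly and in the same order, and it reads local UTF-8 files with a `.csv` extension.

```python
import csv
from pathlib import Path


def inconsistent_subdirectories(root: Path) -> dict[str, set[tuple[str, ...]]]:
    """Return subdirectories of root whose CSV files have differing header rows."""
    problems = {}
    for subdir in sorted(p for p in root.iterdir() if p.is_dir()):
        headers = set()
        for csv_path in sorted(subdir.glob("*.csv")):
            with csv_path.open(newline="", encoding="utf-8") as handle:
                # Only the first record (the header row) is read.
                first_row = next(csv.reader(handle), None)
            if first_row is not None:
                headers.add(tuple(first_row))
        # More than one distinct header means the files disagree.
        if len(headers) > 1:
            problems[subdir.name] = headers
    return problems
```

Calling the function once on the vertex directory and once on the edge directory returns an empty dictionary when every subdirectory is consistent.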
    

### Directory structure examples

#### Local files example

The following example illustrates the directory structure for local files.

-   Data directory: `/opt/aerospike/graph/data`
    
-   Vertex directory: `/opt/aerospike/graph/data/vertices`
    
    -   The `aerospike.graphloader.vertices` configuration option must be set to `/opt/aerospike/graph/data/vertices`.
-   Vertex CSV subdirectory 1: `/opt/aerospike/graph/data/vertices/vert_dir1`
    
    -   All CSV files in the `vert_dir1` subdirectory must have the same row format.
-   Vertex CSV subdirectory 2: `/opt/aerospike/graph/data/vertices/vert_dir2`
    
    -   All CSV files in the `vert_dir2` subdirectory must have the same row format.
-   Edge directory: `/opt/aerospike/graph/data/edges`
    
    -   The `aerospike.graphloader.edges` configuration option must be set to `/opt/aerospike/graph/data/edges`.
-   Edge CSV subdirectory 1: `/opt/aerospike/graph/data/edges/edge_dir1`
    
    -   All CSV files in the `edge_dir1` subdirectory must have the same row format.

Visual representation of the directory structure:

```txt
/opt/aerospike/graph/data
|
---- /opt/aerospike/graph/data/vertices/
|
-------- /opt/aerospike/graph/data/vertices/vert_dir1
|
------------ /opt/aerospike/graph/data/vertices/vert_dir1/vert_file1.csv
------------ /opt/aerospike/graph/data/vertices/vert_dir1/vert_file2.csv
|
-------- /opt/aerospike/graph/data/vertices/vert_dir2
|
------------ /opt/aerospike/graph/data/vertices/vert_dir2/vert_file3.csv
------------ /opt/aerospike/graph/data/vertices/vert_dir2/vert_file4.csv
|
---- /opt/aerospike/graph/data/edges/
|
-------- /opt/aerospike/graph/data/edges/edge_dir1
|
------------ /opt/aerospike/graph/data/edges/edge_dir1/edge_file1.csv
------------ /opt/aerospike/graph/data/edges/edge_dir1/edge_file2.csv
```

#### Cloud storage files example

The following example illustrates the directory structure for a set of CSV files stored in Amazon S3 storage.

-   S3 bucket: `my-bucket`
    
-   Vertex directory: `/my-bucket/vertices`
    
    -   The `aerospike.graphloader.vertices` configuration option must be set to `s3://my-bucket/vertices`.
-   Vertex CSV subdirectory 1: `/my-bucket/vertices/vert_dir1`
    
    -   All CSV files in the `vert_dir1` subdirectory must have the same row format.
-   Vertex CSV subdirectory 2: `/my-bucket/vertices/vert_dir2`
    
    -   All CSV files in the `vert_dir2` subdirectory must have the same row format.
-   Edge directory: `/my-bucket/edges`
    
    -   The `aerospike.graphloader.edges` configuration option must be set to `s3://my-bucket/edges`.
-   Edge CSV subdirectory 1: `/my-bucket/edges/edge_dir1`
    
    -   All CSV files in the `edge_dir1` subdirectory must have the same row format.

Visual representation of the directory structure:

```txt
/my-bucket
|
---- /my-bucket/vertices/
|
-------- /my-bucket/vertices/vert_dir1/
|
------------ /my-bucket/vertices/vert_dir1/vert_file1.csv
------------ /my-bucket/vertices/vert_dir1/vert_file2.csv
|
-------- /my-bucket/vertices/vert_dir2/
|
------------ /my-bucket/vertices/vert_dir2/vert_file3.csv
------------ /my-bucket/vertices/vert_dir2/vert_file4.csv
|
---- /my-bucket/edges/
|
-------- /my-bucket/edges/edge_dir1/
|
------------ /my-bucket/edges/edge_dir1/edge_file1.csv
------------ /my-bucket/edges/edge_dir1/edge_file2.csv
```