---
title: "CSV format for source data files"
description: "Learn the required CSV format, headers, and directory structures for Aerospike Graph Service bulk data loading."
---

# CSV format for source data files

## Overview

Source files for bulk data loading into Aerospike Graph Service (AGS) use the comma-separated values (CSV) format described here.

::: note
The AGS bulk loader requires source data files to be stored in a defined directory structure as described in the [Directory Structure](#directory-structure) section of this page.
:::

## Header rows

Each CSV file must begin with a comma-separated header row. Header rows must not contain spaces between the delimited column names.

### Vertex data file headers

| Header | Required? | Description |
| --- | --- | --- |
| `~id` | Yes | Unique ID for the vertex. `~id` values may be of data type `String`, `Int`, or `Long`. Number values must be positive whole numbers. |
| `~label` | No | Label for the vertex. If `~label` is not specified, the bulk loader adds a default value of `vertex` for the `~label` field. |

### Edge data file headers

| Header | Required? | Description |
| --- | --- | --- |
| `~from` | Yes | Vertex ID of the _from_ vertex. |
| `~to` | Yes | Vertex ID of the _to_ vertex. |
| `~label` | No | Label for the edge. Each edge can have only one label. If `~label` is not specified, the bulk loader adds a default value of `edge` for the `~label` field. |

::: note
AGS does not support user-provided `~id` values for edges, so the `~id` column is optional for edge CSV files. If your CSV file contains an `~id` column, the values are ignored.
:::

## Property column headers

Specify a header for each data column with the format `propertyname:type`. `propertyname` specifies a name for the data column. `type` specifies a data type for the column. Columns with no data type specified default to type `String`.
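
As an illustration only, the following Python sketch splits a property column header into its name, data type, and list flag, and applies the `String` default when no type is given. The function `parse_property_header` and the `ALLOWED_TYPES` set are hypothetical and not part of AGS. The sketch assumes type names must match the spelling in the data type table exactly, with `[]` appended for list columns.

```python
# Hypothetical helper, not part of AGS: split a property column header such as
# "Scores:Int[]" into (property name, data type, is_list).
ALLOWED_TYPES = {"Bool", "Boolean", "Int", "Long", "Double", "String"}


def parse_property_header(column: str) -> tuple[str, str, bool]:
    """Return (name, type, is_list) for one property column header."""
    # Header rows must not contain spaces around the comma delimiters.
    if column != column.strip():
        raise ValueError(f"header column has surrounding spaces: {column!r}")
    name, sep, type_spec = column.partition(":")
    if not sep:
        type_spec = "String"  # no type given: the column defaults to String
    is_list = type_spec.endswith("[]")
    base_type = type_spec[:-2] if is_list else type_spec
    if not name or base_type not in ALLOWED_TYPES:
        raise ValueError(f"invalid property column header: {column!r}")
    return name, base_type, is_list


for col in "Name:String,Scores:Int[],Passed:Boolean,CourseNum".split(","):
    print(parse_property_header(col))
```

For the last column, `CourseNum`, the sketch returns `('CourseNum', 'String', False)` because no type follows the name.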

### Allowable data types

The following data types are allowed:

| Data type | Allowable values |
| --- | --- |
| `Bool` or `Boolean` | `true`, `false` |
| `Int` | -2^31 to 2^31-1 (-2,147,483,648 to 2,147,483,647) |
| `Long` | -2^63 to 2^63-1 (-9,223,372,036,854,775,808 to 9,223,372,036,854,775,807) |
| `Double` | 64-bit IEEE 754 floating point |
| `String` | Any string value. Quotation marks are optional. |
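
A minimal sketch of how a single field could be checked against this table. It assumes `Bool` and `Boolean` accept only lowercase `true` and `false`, and that integers are written as plain decimal digits with an optional leading minus sign. The helper name `convert_scalar` is hypothetical; it is not the bulk loader's own parser.

```python
import re

# Inclusive bounds taken from the table above.
INT_RANGE = (-(2**31), 2**31 - 1)
LONG_RANGE = (-(2**63), 2**63 - 1)


def convert_scalar(raw: str, type_name: str):
    """Convert one CSV field to a Python value for the given data type."""
    if type_name in ("Bool", "Boolean"):
        if raw not in ("true", "false"):
            raise ValueError(f"not a boolean: {raw!r}")
        return raw == "true"
    if type_name in ("Int", "Long"):
        # Assumption: plain decimal digits with an optional leading minus sign.
        if not re.fullmatch(r"-?[0-9]+", raw):
            raise ValueError(f"not an integer: {raw!r}")
        low, high = INT_RANGE if type_name == "Int" else LONG_RANGE
        value = int(raw)
        if not low <= value <= high:
            raise ValueError(f"{raw} is out of range for {type_name}")
        return value
    if type_name == "Double":
        return float(raw)  # Python's float is a 64-bit IEEE 754 double
    if type_name == "String":
        return raw
    raise ValueError(f"unknown data type: {type_name!r}")


print(convert_scalar("2147483647", "Int"))   # 2147483647
print(convert_scalar("2147483648", "Long"))  # 2147483648
```

By contrast, `convert_scalar("2147483648", "Int")` raises `ValueError`, because the value is one greater than 2^31-1.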

### List values

Any property may contain a list of values.

-   All values in the list must be the same data type.
-   Mixed type lists are not supported.
-   To specify a list of values, add `[]` to the data type for the column. Example: `qty:Int[]`.
-   List values are separated by semicolons.
-   Lists of strings are allowed, but the semicolon character cannot be escaped in a list of strings.
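
The list rules above can be sketched as follows. `parse_list_field` is a hypothetical helper, and range checks are omitted. Treating a blank list field as an empty list is an assumption of this sketch, not documented loader behavior.

```python
def _to_bool(text: str) -> bool:
    # Bool/Boolean items accept only "true" or "false".
    if text not in ("true", "false"):
        raise ValueError(f"not a boolean: {text!r}")
    return text == "true"


# Item converters for each allowed base type (range checks omitted here).
CONVERTERS = {
    "Int": int,
    "Long": int,
    "Double": float,
    "String": str,
    "Bool": _to_bool,
    "Boolean": _to_bool,
}


def parse_list_field(raw: str, item_type: str) -> list:
    """Split a field from a column declared as, e.g., Int[] into typed items."""
    if item_type not in CONVERTERS:
        raise ValueError(f"unknown data type: {item_type}")
    if raw == "":
        return []  # assumption of this sketch: blank field -> empty list
    # Every semicolon is a separator; there is no escape sequence for one.
    return [CONVERTERS[item_type](item) for item in raw.split(";")]


print(parse_list_field("32;67;21", "Int"))      # [32, 67, 21]
print(parse_list_field("red;green", "String"))  # ['red', 'green']
```

Because there is no escape for `;`, a `String[]` item can never contain a semicolon.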

## Data row elements

Rows must contain no spaces between delimited elements.

| Element | Description |
| --- | --- |
| Delimiter | Fields are separated by commas. Records are separated by a newline (`\n`) or by a carriage return followed by a newline (`\r\n`). |
| Blank fields | Non-required columns may be left blank. Blank fields still require comma separators. |
| Vertex IDs | Each `~id` value must be unique across all vertices in all vertex files. |
| Edge IDs | User-provided `~id` values for edges are not supported, so the `~id` column is optional for edge CSV files. If your CSV file contains an `~id` column, the values are ignored. |
| Labels | Labels are case sensitive. |
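| String values | Surrounding string values with double quotation marks is optional, unless the value contains a comma, newline, carriage return, or double quotation mark, in which case it must be quoted. Inside a double-quoted string, commas, newlines, and carriage returns are treated as part of the value rather than as delimiters. |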

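A pre-load check for the vertex ID rule might look like the following sketch. It is hypothetical and not an AGS tool. It compares `~id` values as raw text, so it assumes the same ID is always written the same way in every file. File contents are passed in as strings to keep the example self-contained.

```python
import csv
import io


def find_duplicate_ids(files: dict[str, str]) -> dict[str, list[str]]:
    """Map each ~id that appears more than once to the files it appears in."""
    seen: dict[str, list[str]] = {}
    for file_name, text in files.items():
        for row in csv.DictReader(io.StringIO(text)):
            seen.setdefault(row["~id"], []).append(file_name)
    return {vid: names for vid, names in seen.items() if len(names) > 1}


# Two sample vertex files (made up); v1 appears in both.
files = {
    "vert_file1.csv": "~id,Name:String\nv1,Bob Warner\nv2,Gloria Mendes\n",
    "vert_file2.csv": "~id,Name:String\nv3,Susan Wolff\nv1,Amy Cheng\n",
}
print(find_duplicate_ids(files))  # {'v1': ['vert_file1.csv', 'vert_file2.csv']}
```

The check spans all files passed in, which matches the rule that IDs must be unique across every vertex file, not only within one file.
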
## CSV format specification

The CSV file format follows the RFC 4180 CSV specification, including the following requirements.

-   Both Unix and Windows-style line endings are supported (`\n` or `\r\n`).
-   Any field may be enclosed in double quotation marks (`"`).
-   Fields that contain a line break, a double quotation mark, or a comma must be enclosed in double quotation marks. If they are not, the load process fails immediately.
-   Blank fields are allowed. A blank field is treated as an empty value.
-   In list-type columns, semicolons separate the list items.

For more information, see [Common Format and MIME Type for CSV Files](https://tools.ietf.org/html/rfc4180) on the Internet Engineering Task Force (IETF) website.
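
If you generate source files with Python, the standard `csv` module can apply these quoting rules for you. With the default `QUOTE_MINIMAL` setting, the writer quotes only fields that contain the delimiter, a double quotation mark, or a carriage return or newline (the characters of its default `\r\n` line terminator). It escapes an embedded double quotation mark by doubling it, as RFC 4180 specifies; this sketch assumes the loader accepts that RFC 4180 escaping. The sample rows are made up for illustration.

```python
import csv
import io

# Sample rows (made up). Note the comma and the embedded double quotation
# marks in the Name values.
rows = [
    ["~id", "Name:String", "Scores:Int[]", "CourseNum"],
    ["v1", "Warner, Bob", "32;67;21", "201"],
    ["v2", 'Gloria "Glo" Mendes', "41;85;92", "Three Hundred"],
]

buffer = io.StringIO()
# QUOTE_MINIMAL (the default) quotes only fields that need it. The default
# line terminator is "\r\n", one of the two line endings the loader accepts.
writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL)
writer.writerows(rows)
print(buffer.getvalue())
# ~id,Name:String,Scores:Int[],CourseNum
# v1,"Warner, Bob",32;67;21,201
# v2,"Gloria ""Glo"" Mendes",41;85;92,Three Hundred
```

Semicolons are not special to the CSV layer, so list fields such as `32;67;21` are written unquoted.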

## Data file examples

The following vertex and edge data files describe a sample graph of university student records.

### Vertex file

The `CourseNum` field in the following example has no specified data type in the header row, so it defaults to type `String` and the data values are all stored as strings.

Data file:

```txt
~id,Name:String,Scores:Int[],Topic:String,Passed:Boolean,CourseNum
v1,"Bob Warner",32;67;21,"Physics",false,201
v2,"Gloria Mendes",41;85;92,"Music",true,"Three Hundred"
v3,"Susan Wolff",77;42;51,"Biology",false,330
v4,"James Halford",67;62;89,"Physics",true,101
v5,"Frieda Wolinsky",57;71;94,"Biology",true,"Two Forty"
v6,"Amy Cheng",28;59;73,"Music",false,101
v7,"Zack Hulot",59;77;93,"History",true,220
v8,"Rafael Kubelik",67;35;28,"History",false,"First Year Seminar"
v9,"Leah Starke",66;82;79,"Biology",true,330
v10,"Amber Florian",68;71;96,"Music",true,102
```

Tabular view of data:

| ~id | Name | Scores | Topic | Passed | CourseNum |
| --- | --- | --- | --- | --- | --- |
| v1 | "Bob Warner" | 32, 67, 21 | "Physics" | false | "201" |
| v2 | "Gloria Mendes" | 41, 85, 92 | "Music" | true | "Three Hundred" |
| v3 | "Susan Wolff" | 77, 42, 51 | "Biology" | false | "330" |
| v4 | "James Halford" | 67, 62, 89 | "Physics" | true | "101" |
| v5 | "Frieda Wolinsky" | 57, 71, 94 | "Biology" | true | "Two Forty" |
| v6 | "Amy Cheng" | 28, 59, 73 | "Music" | false | "101" |
| v7 | "Zack Hulot" | 59, 77, 93 | "History" | true | "220" |
| v8 | "Rafael Kubelik" | 67, 35, 28 | "History" | false | "First Year Seminar" |
| v9 | "Leah Starke" | 66, 82, 79 | "Biology" | true | "330" |
| v10 | "Amber Florian" | 68, 71, 96 | "Music" | true | "102" |

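To show how the header types shape the values in the tabular view, this Python sketch parses the first two rows of the vertex file with the standard `csv` module. The `typed_value` and `read_vertices` helpers are hypothetical. They skip blank lines but do not handle blank fields or range checks, and they are not how AGS itself loads data.

```python
import csv
import io

# First two rows of the vertex example, embedded so the sketch runs as-is.
VERTEX_CSV = """~id,Name:String,Scores:Int[],Topic:String,Passed:Boolean,CourseNum
v1,"Bob Warner",32;67;21,"Physics",false,201
v2,"Gloria Mendes",41;85;92,"Music",true,"Three Hundred"
"""


def typed_value(raw: str, type_spec: str):
    """Map a raw field to a Python value based on the header's type."""
    if type_spec.endswith("[]"):
        return [typed_value(item, type_spec[:-2]) for item in raw.split(";")]
    if type_spec in ("Int", "Long"):
        return int(raw)
    if type_spec == "Double":
        return float(raw)
    if type_spec in ("Bool", "Boolean"):
        return raw == "true"
    return raw  # String, including columns with no declared type


def read_vertices(text: str) -> list[dict]:
    reader = csv.reader(io.StringIO(text))  # removes the optional quotes
    header = next(reader)
    vertices = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        record = {}
        for column, raw in zip(header, row):
            name, _, type_spec = column.partition(":")
            record[name] = typed_value(raw, type_spec or "String")
        vertices.append(record)
    return vertices


for vertex in read_vertices(VERTEX_CSV):
    print(vertex)
```

For `v1`, the sketch prints `{'~id': 'v1', 'Name': 'Bob Warner', 'Scores': [32, 67, 21], 'Topic': 'Physics', 'Passed': False, 'CourseNum': '201'}`. `CourseNum` stays the string `'201'` because its header declares no type.
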
### Edge file

Data file:

```txt
~from,~to,~label,weight:Double
v1,v6,connected,0.7
v2,v9,connected,0.7
v3,v2,connected,0.7
v4,v8,connected,0.7
v5,v3,connected,0.7
v6,v4,connected,0.7
v7,v9,connected,0.7
v8,v1,connected,0.7
v9,v10,connected,0.7
v10,v3,connected,0.7
```

Tabular view of data:

| ~id | ~from | ~to | ~label | weight |
| --- | --- | --- | --- | --- |
| (Auto-generated by AGS) | v1 | v6 | "connected" | 0.7 |
| (Auto-generated by AGS) | v2 | v9 | "connected" | 0.7 |
| (Auto-generated by AGS) | v3 | v2 | "connected" | 0.7 |
| (Auto-generated by AGS) | v4 | v8 | "connected" | 0.7 |
| (Auto-generated by AGS) | v5 | v3 | "connected" | 0.7 |
| (Auto-generated by AGS) | v6 | v4 | "connected" | 0.7 |
| (Auto-generated by AGS) | v7 | v9 | "connected" | 0.7 |
| (Auto-generated by AGS) | v8 | v1 | "connected" | 0.7 |
| (Auto-generated by AGS) | v9 | v10 | "connected" | 0.7 |
| (Auto-generated by AGS) | v10 | v3 | "connected" | 0.7 |

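The edge file refers to vertices only through their `~from` and `~to` IDs. This page does not describe what the loader does when an edge names a vertex ID that no vertex file defines, so a pre-load check such as the following hypothetical sketch can catch those references early. The helper name `dangling_edges` and the extra `v9,v11` row are made up for illustration.

```python
import csv
import io


def dangling_edges(vertex_ids: set[str], edge_csv: str) -> list[tuple[str, str]]:
    """Return (~from, ~to) pairs that name a vertex ID not in vertex_ids."""
    problems = []
    for row in csv.DictReader(io.StringIO(edge_csv)):
        if row["~from"] not in vertex_ids or row["~to"] not in vertex_ids:
            problems.append((row["~from"], row["~to"]))
    return problems


vertex_ids = {f"v{n}" for n in range(1, 11)}  # v1..v10 from the vertex example
edges = "~from,~to,~label,weight:Double\nv1,v6,connected,0.7\nv9,v11,connected,0.7\n"
print(dangling_edges(vertex_ids, edges))  # [('v9', 'v11')]
```
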
## Directory structure

Source data files must be stored in directories specified by the `aerospike.graphloader.vertices` and `aerospike.graphloader.edges` [configuration options](https://aerospike.com/docs/graph/reference/config).

-   The directory specified in `aerospike.graphloader.vertices` must contain one or more subdirectories of vertex CSV files.
-   The directory specified in `aerospike.graphloader.edges` must contain one or more subdirectories of edge CSV files.
-   All CSV files in a given subdirectory must use the same row format, with identical header rows.
    

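The following hypothetical sketch checks the last rule for a local directory tree: within each subdirectory, every CSV file's header row must match. It uses only the Python standard library and does not cover cloud storage such as S3. The function name and sample files are illustrative.

```python
import csv
import tempfile
from pathlib import Path


def find_header_mismatches(root: Path) -> dict[str, list[str]]:
    """For each subdirectory of root, list CSV files whose header row differs
    from the first CSV file (in sorted name order) in that subdirectory."""
    mismatches: dict[str, list[str]] = {}
    for subdir in sorted(p for p in root.iterdir() if p.is_dir()):
        expected = None
        for csv_path in sorted(subdir.glob("*.csv")):
            with csv_path.open(newline="") as f:
                header = next(csv.reader(f), [])  # [] for an empty file
            if expected is None:
                expected = header
            elif header != expected:
                mismatches.setdefault(subdir.name, []).append(csv_path.name)
    return mismatches


# Build a throwaway tree in which vert_file2.csv omits the :String type.
with tempfile.TemporaryDirectory() as tmp:
    vert_dir1 = Path(tmp, "vertices", "vert_dir1")
    vert_dir1.mkdir(parents=True)
    (vert_dir1 / "vert_file1.csv").write_text("~id,Name:String\nv1,Bob Warner\n")
    (vert_dir1 / "vert_file2.csv").write_text("~id,Name\nv2,Amy Cheng\n")
    print(find_header_mismatches(Path(tmp, "vertices")))
    # {'vert_dir1': ['vert_file2.csv']}
```

You would point the function at the same directory you set in `aerospike.graphloader.vertices` or `aerospike.graphloader.edges`.
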
### Directory structure examples

#### Local files example

The following example illustrates the directory structure for local files.

-   Data directory: `/opt/aerospike/graph/data`
-   Vertex directory: `/opt/aerospike/graph/data/vertices`
    -   The `aerospike.graphloader.vertices` configuration option must be set to `/opt/aerospike/graph/data/vertices`.
-   Vertex CSV subdirectory 1: `/opt/aerospike/graph/data/vertices/vert_dir1`
    -   All CSV files in the `vert_dir1` subdirectory must have the same row format.
-   Vertex CSV subdirectory 2: `/opt/aerospike/graph/data/vertices/vert_dir2`
    -   All CSV files in the `vert_dir2` subdirectory must have the same row format.
-   Edge directory: `/opt/aerospike/graph/data/edges`
    -   The `aerospike.graphloader.edges` configuration option must be set to `/opt/aerospike/graph/data/edges`.
-   Edge CSV subdirectory 1: `/opt/aerospike/graph/data/edges/edge_dir1`
    -   All CSV files in the `edge_dir1` subdirectory must have the same row format.

Visual representation of the directory structure:

```txt
/opt/aerospike/graph/data
├── vertices/
│   ├── vert_dir1/
│   │   ├── vert_file1.csv
│   │   └── vert_file2.csv
│   └── vert_dir2/
│       ├── vert_file3.csv
│       └── vert_file4.csv
└── edges/
    └── edge_dir1/
        ├── edge_file1.csv
        └── edge_file2.csv
```

#### Cloud storage files example

The following example illustrates the directory structure for a set of CSV files stored in Amazon S3.

-   S3 bucket: `myBucket`
-   Vertex directory: `/myBucket/vertices`
    -   The `aerospike.graphloader.vertices` configuration option must be set to `s3://myBucket/vertices`.
-   Vertex CSV subdirectory 1: `/myBucket/vertices/vert_dir1`
    -   All CSV files in the `vert_dir1` subdirectory must have the same row format.
-   Vertex CSV subdirectory 2: `/myBucket/vertices/vert_dir2`
    -   All CSV files in the `vert_dir2` subdirectory must have the same row format.
-   Edge directory: `/myBucket/edges`
    -   The `aerospike.graphloader.edges` configuration option must be set to `s3://myBucket/edges`.
-   Edge CSV subdirectory 1: `/myBucket/edges/edge_dir1`
    -   All CSV files in the `edge_dir1` subdirectory must have the same row format.

Visual representation of the directory structure:

```txt
/myBucket
├── vertices/
│   ├── vert_dir1/
│   │   ├── vert_file1.csv
│   │   └── vert_file2.csv
│   └── vert_dir2/
│       ├── vert_file3.csv
│       └── vert_file4.csv
└── edges/
    └── edge_dir1/
        ├── edge_file1.csv
        └── edge_file2.csv
```