Aerospike Loader Examples
This page contains usage examples for the asloader tool.
Prerequisites
- A running Aerospike instance.
- The asloader tool. If you've installed Aerospike Tools version 6.1.2 or earlier, asloader is included. Otherwise, you can download an asloader jar file from its GitHub repository.
- A Java runtime such as OpenJDK.
Usage
asloader requires a data file and a configuration file. To try any of the following examples, set up asloader, copy the data and configuration files into local .dsv and .json files respectively, and follow the instructions in Usage.
Example: Data with a header line
You can load delimited data files either with or without a header line of column names. To choose the set for each record individually, include the set name as one of the columns in the data file.
The following sample data file includes a header line with column names.
The fourth column, set_name, contains a string specifying the set to which the record is added.
user_location, user_id, last_visited, set_name, age, user_name, user_name_blob, user_rating
IND, userid1, 04/1/2014, facebook, 20, X20, 583230, 8.1
USA, userid2, 03/18/2014, twitter, 27, X2, 5832, 6.4
UK, userid3, 01/9/2014, twitter, 21, X3, 5833, 4.3
UK, userid4, 01/2/2014, facebook, 16, X9, 5839, 5.9
IND, userid5, 08/20/2014, twitter, 37, X10, 583130, 9.3
In the following sample configuration file, the dsv_config key specifies a comma delimiter for the data file, eight columns of data, and a header line containing field names.
{
"version" : "2.0",
"dsv_config": { "delimiter": "," , "n_columns_datafile": 8, "header_exist": true},
"mappings": [
{
"key": {"column_name":"user_id", "type": "string"},
"set": { "column_name":"set_name", "type": "string"},
"bin_list": [
{"name": "age",
"value": { "column_name": "age", "type" : "integer"}
},
{"name": "location",
"value": { "column_name": "user_location", "type" : "string"}
},
{"name": "name",
"value": { "column_name": "user_name", "type" : "string"}
},
{"name": "name_blob",
"value": { "column_name": "user_name_blob", "type" : "blob", "dst_type" : "blob", "encoding":"hex"}
},
{"name": "recent_visit",
"value": { "column_name": "last_visited", "type" : "timestamp", "encoding":"MM/dd/yy", "dst_type": "integer"}
},
{"name": "rating",
"value": { "column_name": "user_rating", "type" : "float"}
}
]
}
]
}
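Before running a load, it can help to confirm that the header line and the configuration agree. The following Python sketch is not part of asloader; check_header is a hypothetical helper, the configuration is trimmed to two bins for brevity, and stripping whitespace around column names is an assumption based on the sample data, where each comma is followed by a space.

```python
# Hypothetical pre-flight check, not part of asloader.
def check_header(header_line, config):
    """Return a list of mismatches between a header line and a config."""
    dsv = config["dsv_config"]
    # Assumption: whitespace around column names is not significant.
    columns = [c.strip() for c in header_line.split(dsv["delimiter"])]
    problems = []
    if len(columns) != dsv["n_columns_datafile"]:
        problems.append(
            f"expected {dsv['n_columns_datafile']} columns, found {len(columns)}"
        )
    for mapping in config["mappings"]:
        refs = [mapping["key"], mapping["set"]]
        refs += [b["value"] for b in mapping["bin_list"]]
        for ref in refs:
            # Static values are plain strings and reference no column.
            if isinstance(ref, dict) and "column_name" in ref:
                if ref["column_name"] not in columns:
                    problems.append(f"unknown column: {ref['column_name']}")
    return problems


header = ("user_location, user_id, last_visited, set_name, "
          "age, user_name, user_name_blob, user_rating")

# Trimmed copy of the sample configuration (two bins only).
config = {
    "version": "2.0",
    "dsv_config": {"delimiter": ",", "n_columns_datafile": 8, "header_exist": True},
    "mappings": [{
        "key": {"column_name": "user_id", "type": "string"},
        "set": {"column_name": "set_name", "type": "string"},
        "bin_list": [
            {"name": "age", "value": {"column_name": "age", "type": "integer"}},
            {"name": "rating", "value": {"column_name": "user_rating", "type": "float"}},
        ],
    }],
}

print(check_header(header, config))  # prints []
```

An empty list means the column count matches n_columns_datafile and every column_name in the mappings appears in the header.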
Config file details
- delimiter specifies the separator character for the data file.
- n_columns_datafile specifies the number of columns in the data file.
- header_exist specifies whether a header line is present in the data file.
- key specifies the field in each line of data which should be used as the record's key.
- set specifies the set to which new records should be added.
- bin_list contains an array of bin mappings. Each bin mapping has two entries: 1) name, the Aerospike bin name, and 2) value, the bin content mapping. If a column has no mapping in the configuration file, that column is skipped while loading.
  - Either column_name or column_position can be used to specify the column.
  - Native data types integer and string are stored as-is.
  - Data types other than native types include the additional fields dst_type and encoding.
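To make the hex encoding of the blob column concrete, the following Python sketch decodes the user_name_blob values from the sample data. It only illustrates what hex-encoded text represents; it is not how asloader itself is implemented, and decode_hex_blob is a hypothetical name.

```python
# Hex text taken from the user_name_blob column of the sample data.
hex_values = ["583230", "5832", "5833", "5839", "583130"]


def decode_hex_blob(text):
    """Turn a hex-encoded string into the raw bytes it represents."""
    return bytes.fromhex(text.strip())


blobs = [decode_hex_blob(v) for v in hex_values]
print(blobs)  # prints [b'X20', b'X2', b'X3', b'X9', b'X10']
```

In this sample, the decoded bytes are the ASCII form of the user_name column in the same row.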
Example: JSON data
The scores column in this example is type "json".
JSON data maps to corresponding Aerospike CDTs.
The "scores" bin is written to Aerospike as a map with elements "high_score" and the nested list "others".
The delimiter used in this file is *, because the , character is reserved for the JSON data.
The following sample data file includes a header line with column names. The fourth column, set_name, contains a string specifying the set to which the record is added.
user_location* user_id* last_visited* set_name* age* user_name* user_name_blob* user_rating* scores
IND* userid1* 04/1/2014* facebook* 20* X20* 583230* 8.1* {"high_score": 26, "others": [12, 8, 20]}
USA* userid2* 03/18/2014* twitter* 27* X2* 5832* 6.4* {"high_score": 18, "others": [11, 8, 9]}
UK* userid3* 01/9/2014* twitter* 21* X3* 5833* 4.3* {"high_score": 30, "others": [10, 18, 21]}
UK* userid4* 01/2/2014* facebook* 16* X9* 5839* 5.9* {"high_score": 27, "others": [9, 8, 13]}
IND* userid5* 08/20/2014* twitter* 37* X10* 583130* 9.3* {"high_score": 14, "others": [7, 4, 12]}
The following sample configuration file specifies a * delimiter for the data file, as well as nine columns of data and a header line containing field names.
{
"version" : "2.0",
"dsv_config": { "delimiter": "*" , "n_columns_datafile": 9, "header_exist": true},
"mappings": [
{
"key": {"column_name":"user_id", "type": "string"},
"set": { "column_name":"set_name", "type": "string"},
"bin_list": [
{"name": "age",
"value": { "column_name": "age", "type" : "integer"}
},
{"name": "location",
"value": { "column_name": "user_location", "type" : "string"}
},
{"name": "name",
"value": { "column_name": "user_name", "type" : "string"}
},
{"name": "name_blob",
"value": { "column_name": "user_name_blob", "type" : "blob", "dst_type" : "blob", "encoding":"hex"}
},
{"name": "recent_visit",
"value": { "column_name": "last_visited", "type" : "timestamp", "encoding":"MM/dd/yy", "dst_type": "integer"}
},
{"name": "rating",
"value": { "column_name": "user_rating", "type" : "float"}
},
{
"name": "scores",
"value": { "column_name": "scores", "type" : "json"}
}
]
}
]
}
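The choice of delimiter can be checked with a few lines of Python. This sketch is independent of asloader: it splits the first sample data line on * and parses the last field with the standard json module, which yields a dictionary holding a nested list, comparable to the map and list described above.

```python
import json

line = ('IND* userid1* 04/1/2014* facebook* 20* X20* 583230* 8.1* '
        '{"high_score": 26, "others": [12, 8, 20]}')

# Splitting on "*" yields the nine columns declared in n_columns_datafile.
fields = [f.strip() for f in line.split("*")]
print(len(fields))  # prints 9

# The ninth column is valid JSON: a map with a nested list.
scores = json.loads(fields[8])
print(scores["high_score"])  # prints 26
print(scores["others"])      # prints [12, 8, 20]

# Splitting on "," instead would cut through the JSON value.
print(len(line.split(",")))  # prints 4
```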
Example: Data without a header line
The following example uses a data file with no header information in the first line.
If your data file does not have a header line, use column_position for bin mapping.
Data file content
IND, userid1, 04/1/2014, 20, X20, 583230, 8.1
USA, userid2, 03/18/2014, 27, X2, 5832, 6.4
UK, userid3, 01/9/2014, 21, X3, 5833, 4.3
UK, userid4, 01/2/2014, 16, X9, 5839, 5.9
IND, userid5, 08/20/2014, 37, X10, 583130, 9.3
Config file content
{
"version" : "2.0",
"dsv_config": { "delimiter": "," , "n_columns_datafile": 7, "header_exist": false},
"mappings": [
{
"key": {"column_position": 2, "type": "string"},
"set": "my_set",
"bin_list": [
{"name": "age",
"value": { "column_position": 4, "type" : "integer"}
},
{"name": "location",
"value": { "column_position": 1, "type" : "string"}
},
{"name": "name",
"value": { "column_position": 5, "type" : "string"}
},
{"name": "name_blob",
"value": { "column_position": 6, "type" : "blob", "dst_type" : "blob" , "encoding":"hex"}
},
{"name": "recent_visit",
"value": { "column_position": 3, "type" : "timestamp", "encoding":"MM/dd/yy", "dst_type": "integer"}
},
{"name": "rating",
"value": { "column_position": 7, "type" : "float"}
}
]
}
]
}
Config file details
- There is no header information in the data file, so header_exist is false.
- Each bin mapping is specified with column_position.
- The set parameter is static. All new records are written to the set my_set.
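The following Python sketch shows how the column_position values in this configuration line up with the first sample data line. It is an illustration only, not asloader code. It assumes positions count from 1, which is inferred from the sample: position 2 is the user ID used as the key, and position 1 is the location.

```python
line = "IND, userid1, 04/1/2014, 20, X20, 583230, 8.1"

# Bin name -> column_position, copied from the sample configuration.
positions = {
    "age": 4,
    "location": 1,
    "name": 5,
    "name_blob": 6,
    "recent_visit": 3,
    "rating": 7,
}
key_position = 2

fields = [f.strip() for f in line.split(",")]


def column(position):
    # Assumption: positions are 1-based, so subtract 1 for a Python index.
    return fields[position - 1]


key = column(key_position)
raw_bins = {name: column(pos) for name, pos in positions.items()}

print(key)                   # prints userid1
print(raw_bins["location"])  # prints IND
print(raw_bins["age"])       # prints 20
```

The values here are still raw strings; type conversion (integer, float, blob, timestamp) would follow the type given in each bin mapping.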
Example: Add a static value
To add a static value to every new record created from a data file, add a new bin to the bin mapping in the configuration file.
In the following example configuration file, the value "my_value" is added to a bin called static_value in every new record created.
{
"version" : "2.0",
"dsv_config": { "delimiter": "," , "n_columns_datafile": 7, "header_exist": false},
"mappings": [
{
"key": {"column_position": 2, "type": "string"},
"set": "my_set",
"bin_list": [
{"name": "age",
"value": { "column_position": 4, "type" : "integer"}
},
{"name": "location",
"value": { "column_position": 1, "type" : "string"}
},
{"name": "name",
"value": { "column_position": 5, "type" : "string"}
},
{"name": "name_blob",
"value": { "column_position": 6, "type" : "blob", "dst_type" : "blob" , "encoding":"hex"}
},
{"name": "recent_visit",
"value": { "column_position": 3, "type" : "timestamp", "encoding":"MM/dd/yy", "dst_type": "integer"}
},
{"name": "rating",
"value": { "column_position": 7, "type" : "float"}
},
{"name": "static_value",
"value": "my_value"
}
]
}
]
}
Try the following data file with the above config file:
IND, userid1, 04/1/2014, 20, X20, 583230, 8.1
USA, userid2, 03/18/2014, 27, X2, 5832, 6.4
UK, userid3, 01/9/2014, 21, X3, 5833, 4.3
UK, userid4, 01/2/2014, 16, X9, 5839, 5.9
IND, userid5, 08/20/2014, 37, X10, 583130, 9.3
Example: Add a timestamp
To add a timestamp with the current system time to every new record created from a data file, add a new bin to the bin mapping in the configuration file.
In the following example configuration file, a new bin called write_time is added to every new record and holds the current system time as Unix time.
{
"version" : "2.0",
"dsv_config": { "delimiter": "," , "n_columns_datafile": 7, "header_exist": false},
"mappings": [
{
"key": {"column_position": 2, "type": "string"},
"set": "my_set",
"bin_list": [
{"name": "age",
"value": { "column_position": 4, "type" : "integer"}
},
{"name": "location",
"value": { "column_position": 1, "type" : "string"}
},
{"name": "name",
"value": { "column_position": 5, "type" : "string"}
},
{"name": "name_blob",
"value": { "column_position": 6, "type" : "blob", "dst_type" : "blob" , "encoding":"hex"}
},
{"name": "recent_visit",
"value": { "column_position": 3, "type" : "timestamp", "encoding": "MM/dd/yy", "dst_type": "integer"}
},
{"name": "rating",
"value": { "column_position": 7, "type" : "float"}
},
{
"name": "write_time",
"value": { "column_name": "system_time", "type" : "timestamp", "encoding": "MM/dd/yy HH:mm:ss", "dst_type": "integer"}
}
]
}
]
}
Try the following data file with the above config file:
IND, userid1, 04/1/2014, 20, X20, 583230, 8.1
USA, userid2, 03/18/2014, 27, X2, 5832, 6.4
UK, userid3, 01/9/2014, 21, X3, 5833, 4.3
UK, userid4, 01/2/2014, 16, X9, 5839, 5.9
IND, userid5, 08/20/2014, 37, X10, 583130, 9.3
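As a rough illustration of storing a timestamp as an integer, the following Python sketch converts a date from the sample data and the current system time into Unix time. Several points are assumptions rather than documented asloader behavior: the result is expressed in whole seconds, dates are interpreted as UTC, and to_unix_seconds is a hypothetical helper. Python's strptime uses its own format codes, so %m/%d/%Y is chosen here to match the four-digit years in the sample data.

```python
import time
from datetime import datetime, timezone


def to_unix_seconds(text, fmt="%m/%d/%Y"):
    """Parse a date string and return whole seconds since the Unix epoch.

    Assumption: the date is interpreted as midnight UTC.
    """
    parsed = datetime.strptime(text.strip(), fmt)
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


# Value from the last_visited column of the first sample line.
recent_visit = to_unix_seconds("04/1/2014")
print(recent_visit)  # prints 1396310400

# A write_time-style value: the current system time as Unix seconds.
write_time = int(time.time())
```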