Aerospike Loader Examples

This page contains usage examples for the asloader tool.

Prerequisites

A running Aerospike instance.
The asloader tool. If you’ve installed Aerospike Tools version 6.1.2 or earlier, asloader is included. Otherwise, you can download an asloader jar file from its GitHub repository.
A Java runtime such as OpenJDK.

Usage

asloader requires a data file and a configuration file. To try any of the following examples, set up asloader, save the example data and configuration content to local .dsv and .json files respectively, and follow the instructions in Usage.

Example: Data with a header line

You can load delimited data files with or without a header line of column names. To assign each line of data to a set individually, include the set name as one of the columns in the data file.

The following sample data file includes a header line with column names. The fourth column, set_name, contains a string specifying the set to which the record is added.

user_location, user_id, last_visited, set_name, age, user_name, user_name_blob, user_rating
IND, userid1, 04/1/2014, facebook, 20, X20, 583230, 8.1
USA, userid2, 03/18/2014, twitter, 27, X2, 5832, 6.4
UK, userid3, 01/9/2014, twitter, 21, X3, 5833, 4.3
UK, userid4, 01/2/2014, facebook, 16, X9, 5839, 5.9
IND, userid5, 08/20/2014, twitter, 37, X10, 583130, 9.3

In the following sample configuration file, the dsv_config key specifies a comma delimiter for the data file, eight columns of data, and a header line containing field names.

{
  "version" : "2.0",
  "dsv_config": { "delimiter": "," , "n_columns_datafile": 8, "header_exist": true},

  "mappings": [
      {
          "key": {"column_name":"user_id", "type": "string"},

          "set": { "column_name":"set_name", "type": "string"},

          "bin_list": [
            {"name": "age",
             "value": { "column_name": "age", "type" : "integer"}
            },
            {"name": "location",
             "value": { "column_name": "user_location", "type" : "string"}
            },
            {"name": "name",
             "value": { "column_name": "user_name", "type" : "string"}
            },
            {"name": "name_blob",
             "value": { "column_name": "user_name_blob", "type" : "blob", "dst_type" : "blob", "encoding":"hex"}
            },
            {"name": "recent_visit",
             "value": { "column_name": "last_visited", "type" : "timestamp", "encoding":"MM/dd/yy", "dst_type": "integer"}
            },
            {"name": "rating",
             "value": { "column_name": "user_rating", "type" : "float"}
            }
          ]
      }
  ]
}

Config file details

delimiter specifies the separator character for the data file.
n_columns_datafile specifies the number of columns in the data file.
header_exist specifies whether a header line is present in the data file.
key specifies the field in each line of data which should be used as the record’s key.
set specifies the set to which new records should be added.
bin_list contains an array of bin mappings. Each bin mapping has two entries: name, which is the Aerospike bin name, and value, which is the bin content mapping. If a column has no mapping in the configuration file, that column is skipped while loading.
- Either column_name or column_position can be used to specify the column.
- Native data types integer and string are stored as-is.
- Data types other than native types, such as blob and timestamp in this example, take the additional fields dst_type and encoding.
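
The mapping rules above can be illustrated with a minimal Python sketch. This is not asloader's implementation: the helper names convert and parse_line are hypothetical, trimming whitespace around each field is an assumption, and only a subset of the bins and types from the sample configuration is handled (timestamp is left out). The sketch resolves each column_name against the header line, skips columns that have no mapping, and decodes the hex-encoded blob column into bytes.

```python
# Illustrative sketch only; not asloader's actual implementation.

HEADER = "user_location, user_id, last_visited, set_name, age, user_name, user_name_blob, user_rating"
LINE = "IND, userid1, 04/1/2014, facebook, 20, X20, 583230, 8.1"

# A subset of the bin mappings from the sample configuration file.
MAPPING = {
    "key": {"column_name": "user_id", "type": "string"},
    "set": {"column_name": "set_name", "type": "string"},
    "bin_list": [
        {"name": "age", "value": {"column_name": "age", "type": "integer"}},
        {"name": "name_blob",
         "value": {"column_name": "user_name_blob", "type": "blob",
                   "dst_type": "blob", "encoding": "hex"}},
        {"name": "rating", "value": {"column_name": "user_rating", "type": "float"}},
    ],
}


def convert(raw, spec):
    """Convert a raw field string according to the 'type' in its mapping."""
    kind = spec["type"]
    if kind == "integer":
        return int(raw)
    if kind == "float":
        return float(raw)
    if kind == "blob" and spec.get("encoding") == "hex":
        return bytes.fromhex(raw)
    return raw  # strings are kept as-is


def parse_line(header, line, mapping, delimiter=","):
    """Return (key, set_name, bins) for one line of delimited data."""
    # Assumption: whitespace around each field is trimmed.
    names = [h.strip() for h in header.split(delimiter)]
    fields = [f.strip() for f in line.split(delimiter)]
    row = dict(zip(names, fields))
    key = convert(row[mapping["key"]["column_name"]], mapping["key"])
    set_name = convert(row[mapping["set"]["column_name"]], mapping["set"])
    # Only mapped columns become bins; unmapped columns are skipped.
    bins = {
        b["name"]: convert(row[b["value"]["column_name"]], b["value"])
        for b in mapping["bin_list"]
    }
    return key, set_name, bins


key, set_name, bins = parse_line(HEADER, LINE, MAPPING)
print(key, set_name, bins)
```

In the sample data, the hex string 583230 decodes to the bytes of X20, which matches the user_name column on the same line.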

Example: JSON data

The scores column in this example is type "json". JSON data maps to corresponding Aerospike CDTs. The "scores" bin is written to Aerospike as a map with elements "high_score" and the nested list "others".

The following sample data file includes a header line with column names and uses * as the delimiter. The fourth column, set_name, contains a string specifying the set to which the record is added.

user_location* user_id* last_visited* set_name* age* user_name* user_name_blob* user_rating* scores
IND* userid1* 04/1/2014* facebook* 20* X20* 583230* 8.1* {"high_score": 26, "others": [12, 8, 20]}
USA* userid2* 03/18/2014* twitter* 27* X2* 5832* 6.4* {"high_score": 18, "others": [11, 8, 9]}
UK* userid3* 01/9/2014* twitter* 21* X3* 5833* 4.3* {"high_score": 30, "others": [10, 18, 21]}
UK* userid4* 01/2/2014* facebook* 16* X9* 5839* 5.9* {"high_score": 27, "others": [9, 8, 13]}
IND* userid5* 08/20/2014* twitter* 37* X10* 583130* 9.3* {"high_score": 14, "others": [7, 4, 12]}

The following sample configuration file specifies a * delimiter for the data file, as well as nine columns of data and a header line containing field names.

{
  "version" : "2.0",
  "dsv_config": { "delimiter": "*" , "n_columns_datafile": 9, "header_exist": true},

  "mappings": [
      {
          "key": {"column_name":"user_id", "type": "string"},

          "set": { "column_name":"set_name", "type": "string"},

          "bin_list": [
            {"name": "age",
             "value": { "column_name": "age", "type" : "integer"}
            },
            {"name": "location",
             "value": { "column_name": "user_location", "type" : "string"}
            },
            {"name": "name",
             "value": { "column_name": "user_name", "type" : "string"}
            },
            {"name": "name_blob",
             "value": { "column_name": "user_name_blob", "type" : "blob", "dst_type" : "blob", "encoding":"hex"}
            },
            {"name": "recent_visit",
             "value": { "column_name": "last_visited", "type" : "timestamp", "encoding":"MM/dd/yy", "dst_type": "integer"}
            },
            {"name": "rating",
             "value": { "column_name": "user_rating", "type" : "float"}
            },
            {
              "name": "scores",
              "value": {  "column_name": "scores", "type" : "json"}
            }
          ]
      }
  ]
}
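
As a rough illustration of how a "json" column maps to a map with a nested list, the following Python sketch splits one line of the sample data on the * delimiter and parses the scores field. It only shows the shape of the data, not how asloader writes to Aerospike. Trimming whitespace around fields is an assumption, and Python's dict and list stand in for the Aerospike map and list types.

```python
# Illustrative sketch only; Python dict/list stand in for Aerospike map/list.
import json

DELIMITER = "*"
HEADER = ("user_location* user_id* last_visited* set_name* age* "
          "user_name* user_name_blob* user_rating* scores")
LINE = ('IND* userid1* 04/1/2014* facebook* 20* X20* 583230* 8.1* '
        '{"high_score": 26, "others": [12, 8, 20]}')

# Assumption: whitespace around each field is trimmed.
names = [n.strip() for n in HEADER.split(DELIMITER)]
fields = [f.strip() for f in LINE.split(DELIMITER)]
row = dict(zip(names, fields))

# The JSON object becomes a map; its "others" array becomes a nested list.
scores = json.loads(row["scores"])
print(type(scores).__name__, scores["high_score"], scores["others"])
```

Because the delimiter is * rather than a comma, the commas inside the JSON value do not split the scores field into extra columns.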

Example: Data without a header line

The following example uses a data file with no header information in the first line. If your data file does not have a header line, use column_position for bin mapping.

Data file content

IND, userid1, 04/1/2014, 20, X20, 583230, 8.1
USA, userid2, 03/18/2014, 27, X2, 5832, 6.4
UK, userid3, 01/9/2014, 21, X3, 5833, 4.3
UK, userid4, 01/2/2014, 16, X9, 5839, 5.9
IND, userid5, 08/20/2014, 37, X10, 583130, 9.3

Config file content

{
  "version" : "2.0",
  "dsv_config": { "delimiter": "," , "n_columns_datafile": 7, "header_exist": false},

  "mappings": [
        {
          "key": {"column_position": 2, "type": "string"},

          "set": "my_set",

          "bin_list": [
            {"name": "age",
             "value": { "column_position": 4, "type" : "integer"}
            },
            {"name": "location",
             "value": { "column_position": 1, "type" : "string"}
            },
            {"name": "name",
             "value": { "column_position": 5, "type" : "string"}
            },
            {"name": "name_blob",
             "value": { "column_position": 6, "type" : "blob", "dst_type" : "blob" , "encoding":"hex"}
            },
            {"name": "recent_visit",
             "value": { "column_position": 3, "type" : "timestamp", "encoding":"MM/dd/yy", "dst_type": "integer"}
            },
            {"name": "rating",
             "value": { "column_position": 7, "type" : "float"}
            }
          ]
        }
    ]
}

Config file details

There is no header information in the data file, so header_exist is false.
Each bin mapping is specified with column_position.
The set parameter is static. All new records are written to the set my_set.
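
A minimal Python sketch of position-based mapping follows. It assumes column_position counts from 1, which is inferred from the example (position 2 holds the user IDs and position 4 the ages). The helper field_at is hypothetical, and trimming whitespace around fields is an assumption.

```python
# Illustrative sketch only; not asloader's actual implementation.

LINE = "IND, userid1, 04/1/2014, 20, X20, 583230, 8.1"
N_COLUMNS = 7  # mirrors n_columns_datafile in the configuration file


def field_at(fields, column_position):
    """Return the field at a 1-based column position (assumed convention)."""
    if not 1 <= column_position <= len(fields):
        raise IndexError(f"column_position {column_position} is out of range")
    return fields[column_position - 1]


fields = [f.strip() for f in LINE.split(",")]
if len(fields) != N_COLUMNS:
    raise ValueError(f"expected {N_COLUMNS} columns, found {len(fields)}")

record = {
    "key": field_at(fields, 2),
    "set": "my_set",  # static set name, the same for every record
    "bins": {
        "location": field_at(fields, 1),
        "age": int(field_at(fields, 4)),
        "name": field_at(fields, 5),
        "rating": float(field_at(fields, 7)),
    },
}
print(record)
```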

Example: Add a static value

To add a static value to every new record created from a data file, add a new bin to the bin mapping in the configuration file. In the following example configuration file, the value "my_value" is added to a bin called static_value in every new record created.

{
  "version" : "2.0",
  "dsv_config": { "delimiter": "," , "n_columns_datafile": 7, "header_exist": false},

  "mappings": [
        {
          "key": {"column_position": 2, "type": "string"},

          "set": "my_set",

          "bin_list": [
            {"name": "age",
             "value": { "column_position": 4, "type" : "integer"}
            },
            {"name": "location",
             "value": { "column_position": 1, "type" : "string"}
            },
            {"name": "name",
             "value": { "column_position": 5, "type" : "string"}
            },
            {"name": "name_blob",
             "value": { "column_position": 6, "type" : "blob", "dst_type" : "blob" , "encoding":"hex"}
            },
            {"name": "recent_visit",
             "value": { "column_position": 3, "type" : "timestamp", "encoding":"MM/dd/yy", "dst_type": "integer"}
            },
            {"name": "rating",
             "value": { "column_position": 7, "type" : "float"}
            },
            {"name": "static_value",
             "value": "my_value"
            }
          ]
        }
    ]
}

Try the following data file with the above config file:

IND, userid1, 04/1/2014, 20, X20, 583230, 8.1
USA, userid2, 03/18/2014, 27, X2, 5832, 6.4
UK, userid3, 01/9/2014, 21, X3, 5833, 4.3
UK, userid4, 01/2/2014, 16, X9, 5839, 5.9
IND, userid5, 08/20/2014, 37, X10, 583130, 9.3
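
The difference between a column mapping and a static value can be sketched in Python as follows. The helper resolve_value is hypothetical and handles only the two bins shown; it treats a plain string in "value" as a static value and an object as a column mapping, with column_position assumed to count from 1.

```python
# Illustrative sketch only; not asloader's actual implementation.

BIN_LIST = [
    {"name": "age", "value": {"column_position": 4, "type": "integer"}},
    {"name": "static_value", "value": "my_value"},
]

LINES = [
    "IND, userid1, 04/1/2014, 20, X20, 583230, 8.1",
    "USA, userid2, 03/18/2014, 27, X2, 5832, 6.4",
]


def resolve_value(value, fields):
    """Return a static string as-is, or look up and convert a column."""
    if isinstance(value, str):
        return value  # static value, identical for every record
    raw = fields[value["column_position"] - 1]  # assumed 1-based position
    return int(raw) if value["type"] == "integer" else raw


records = []
for line in LINES:
    fields = [f.strip() for f in line.split(",")]
    records.append({b["name"]: resolve_value(b["value"], fields) for b in BIN_LIST})

print(records)
```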

Example: Add a timestamp

To add a timestamp with the current system time to every new record created from a data file, add a new bin to the bin mapping in the configuration file. In the following example configuration file, a new bin called write_time is added to every new record and holds the current system time as Unix time. Its value mapping uses the column name system_time rather than a column from the data file.

{
  "version" : "2.0",
  "dsv_config": { "delimiter": "," , "n_columns_datafile": 7, "header_exist": false},

  "mappings": [
        {
          "key": {"column_position": 2, "type": "string"},

          "set": "my_set",

          "bin_list": [
            {"name": "age",
             "value": { "column_position": 4, "type" : "integer"}
            },
            {"name": "location",
             "value": { "column_position": 1, "type" : "string"}
            },
            {"name": "name",
             "value": { "column_position": 5, "type" : "string"}
            },
            {"name": "name_blob",
             "value": { "column_position": 6, "type" : "blob", "dst_type" : "blob" , "encoding":"hex"}
            },
            {"name": "recent_visit",
             "value": { "column_position": 3, "type" : "timestamp", "encoding": "MM/dd/yy", "dst_type": "integer"}
            },
            {"name": "rating",
             "value": { "column_position": 7, "type" : "float"}
            },
            {
             "name": "write_time",
             "value": { "column_name": "system_time", "type" : "timestamp", "encoding": "MM/dd/yy HH:mm:ss", "dst_type": "integer"}
            }
          ]
        }
    ]
}

Try the following data file with the above config file:

IND, userid1, 04/1/2014, 20, X20, 583230, 8.1
USA, userid2, 03/18/2014, 27, X2, 5832, 6.4
UK, userid3, 01/9/2014, 21, X3, 5833, 4.3
UK, userid4, 01/2/2014, 16, X9, 5839, 5.9
IND, userid5, 08/20/2014, 37, X10, 583130, 9.3
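
The two timestamp bins can be approximated in Python as shown below. This is a sketch under stated assumptions, not a description of asloader's behavior: it assumes the integer is seconds since the Unix epoch and that dates are interpreted in UTC, neither of which is stated on this page. Python's "%m/%d/%Y" format stands in for the "MM/dd/yy" encoding in the configuration file, and both helper names are hypothetical.

```python
# Illustrative sketch only; units (seconds) and time zone (UTC) are assumptions.
import time
from datetime import datetime, timezone


def date_to_epoch_seconds(text, fmt="%m/%d/%Y"):
    """Parse a date string and return whole seconds since the Unix epoch (UTC)."""
    parsed = datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


def current_epoch_seconds():
    """Return the current system time as whole seconds since the Unix epoch."""
    return int(time.time())


recent_visit = date_to_epoch_seconds("04/1/2014")  # like the recent_visit bin
write_time = current_epoch_seconds()               # like the write_time bin
print(recent_visit, write_time)
```

Under the UTC assumption, the date 04/1/2014 converts to 1396310400, while write_time changes on every run.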