Aerospike Loader (asloader)
Aerospike Loader (asloader) migrates data from another database to Aerospike. You provide .DSV data files and an Aerospike schema file, or config file, in JSON format. asloader parses the .DSV files and loads the data into the Aerospike cluster according to your schema.
Prerequisites
- Java 1.8 or later
- Maven 3.0 or later
Installation
asloader is available:
- As a jar file from https://github.com/aerospike/aerospike-loader/releases.
- As source code on GitHub. To install, use the following command-line instructions:
git clone https://github.com/aerospike/aerospike-loader.git
cd aerospike-loader
./build
For releases prior to Aerospike Tools 6.2, asloader is bundled as part of the Aerospike Tools package.
Dependencies
The following dependencies are downloaded automatically:
- Aerospike Java client 6.1.6 or later
- Apache Commons CLI 1.2
- Log4j 2.17.1
- JUnit 4.4
- json-simple 1.1.1
Loader thread architecture
The loader uses reader and writer threads.
- Reader threads read data files. The number of reader threads equals the number of CPUs or the number of files in the directory, whichever is smaller.
- Writer threads write to the cluster. The number of writer threads is equal to the number of CPUs multiplied by a scale factor of 5.
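The sizing rules above can be sketched as follows. This is a minimal illustration of the arithmetic only, not the loader's actual code; the function names and the `WRITER_SCALE_FACTOR` constant are hypothetical names chosen for this example.

```python
import os

# Scale factor for writer threads, as stated in the text above.
WRITER_SCALE_FACTOR = 5


def reader_thread_count(n_cpus, n_files):
    """Readers: the smaller of the CPU count and the number of data files."""
    return min(n_cpus, n_files)


def writer_thread_count(n_cpus):
    """Writers: the CPU count multiplied by the scale factor."""
    return n_cpus * WRITER_SCALE_FACTOR


# Example: size the pools for the current machine and a directory of 3 files.
cpus = os.cpu_count() or 1  # os.cpu_count() may return None
print("readers:", reader_thread_count(cpus, 3))
print("writers:", writer_thread_count(cpus))
```

On an 8-CPU host loading 3 files, these rules give 3 reader threads and 40 writer threads.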
Usage
If you downloaded the jar file from the releases page, use:
java -cp aerospike-load-*-jar-with-dependencies.jar com.aerospike.load.AerospikeLoad <options> <data file name(s)/directory>
If you downloaded the source, use the run_loader script in the root directory of the source folder. Pass the options and data files to the script as arguments. See Command line options for more details.
run_loader <options> <data file name(s)/directory>
The final argument is either a space-delimited list of data file names or the name of a directory containing data files.
Data Files
Sample data file
user_location##user_id##last_visited##community
India##1##08/16/2011##facebook
India##2##08/17/2011##Twitter
USA##3##08/16/2011##Twitter
This example contains a header row and uses the delimiter ##.
The config file describes the data file structure for asloader to interpret.
See Examples for sample data and config files with various attributes.
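To make the structure of the sample file concrete, here is a minimal sketch that reads it with the ## delimiter and a header row. It illustrates the file layout only: it is not the loader's parser, it does not read a config file, and `parse_dsv` and the generated `column_N` names are hypothetical names for this example.

```python
SAMPLE = """\
user_location##user_id##last_visited##community
India##1##08/16/2011##facebook
India##2##08/17/2011##Twitter
USA##3##08/16/2011##Twitter
"""


def parse_dsv(text, delimiter="##", has_header=True):
    """Split each non-empty line on the delimiter and pair values with column names."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    if has_header:
        header = lines[0].split(delimiter)
        rows = lines[1:]
    else:
        # Without a header row, fall back to positional names (an assumption
        # made for this sketch only).
        width = len(lines[0].split(delimiter))
        header = ["column_%d" % (i + 1) for i in range(width)]
        rows = lines
    return [dict(zip(header, row.split(delimiter))) for row in rows]


records = parse_dsv(SAMPLE)
for record in records:
    print(record)
```

Every value is read as a string here; in the loader, the config file determines how each column is typed and mapped.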
Supported Data Types
Data Type | Description | Example |
---|---|---|
Integer | | 123456 |
Float | | 0.345 |
String | | "Aerospike" |
Blob | Binary fields that are hex encoded are stored as blobs. | "abc" hex encoded as 616263 |
Timestamp | Timestamp data stored as a string or integer. | "1-1-1970" or -19800 (seconds referenced to UTC) |
JSON | Any standard JSON file. Lists and maps are interpreted as JSON. | List: ["a", "b", ["c", "d"]], Map: {"a": "b", "c": {"d": "e"}} |
GeoJSON | Aerospike supports the GeoJSON datatype natively. It can be stored in its standard format. | {"type": "Point", "coordinates": [123.4, -456.7]} |
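The Blob and Timestamp examples in the table can be checked with a short sketch. It uses Python's standard library rather than the loader's own conversion code, and the helper names are hypothetical. The +05:30 offset is an assumption inferred from the arithmetic (5.5 hours is 19800 seconds); the table does not state which time zone its example uses.

```python
from datetime import datetime, timedelta, timezone


def blob_to_hex(data):
    """Hex-encode raw bytes, the form in which a blob appears in a data file."""
    return data.hex()


def hex_to_blob(hex_text):
    """Decode a hex-encoded field back into raw bytes."""
    return bytes.fromhex(hex_text)


def to_epoch_seconds(text, fmt, utc_offset):
    """Interpret a local date string at a fixed UTC offset; return seconds since the UTC epoch."""
    local = datetime.strptime(text, fmt).replace(tzinfo=timezone(utc_offset))
    return int(local.timestamp())


print(blob_to_hex(b"abc"))
# Assumed offset of +05:30; note that Python's strptime codes differ from
# Java SimpleDateFormat patterns.
print(to_epoch_seconds("1-1-1970", "%d-%m-%Y", timedelta(hours=5, minutes=30)))
```

Under that assumed offset, midnight on 1 January 1970 local time falls 19800 seconds before the UTC epoch, which matches the integer form shown in the table.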
Data files that contain JSON data should not use JSON-specific characters such as '}', ']', ',', or ':' as delimiters. Data inside double quotes (" ") is not interpreted as containing possible delimiters. Because DSV is supported, you can use any other delimiter.
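The quoting rule can be illustrated with a small splitter that ignores delimiters inside double quotes. This is a sketch of the idea under simplifying assumptions (no escaped quotes, quotes kept in the output), not the loader's actual parsing logic; `split_outside_quotes` is a hypothetical name.

```python
def split_outside_quotes(line, delimiter):
    """Split a line on the delimiter, skipping delimiters inside double quotes."""
    fields = []
    current = []
    in_quotes = False
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == '"':
            # Toggle quoted state; the quote character stays in the field.
            in_quotes = not in_quotes
            current.append(ch)
            i += 1
        elif not in_quotes and line.startswith(delimiter, i):
            # A delimiter outside quotes ends the current field.
            fields.append("".join(current))
            current = []
            i += len(delimiter)
        else:
            current.append(ch)
            i += 1
    fields.append("".join(current))
    return fields


print(split_outside_quotes('India,"[1, 2]",3', ","))
print(split_outside_quotes('1##"a##b"##x', "##"))
```

The first call shows why a comma delimiter only works for JSON data when the JSON value is quoted; choosing a delimiter such as ## that never appears in the data avoids the problem entirely.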
Timestamp data should be formatted consistently and always appear in double quotes. For best practices in timestamp formatting, see Oracle SimpleDateFormat.
Sample config and data file
The example directory in the GitHub repository contains a sample config file alldatatype.json and data file alldatatype.dsv.
Run the following command to load the configuration and data:
run_loader -h localhost -c example/alldatatype.json example/alldatatype.dsv
For information about additional data file structures, see Examples.
Command line options
Options | Description | Default |
---|---|---|
-h <hosts> | List of seed hosts where Aerospike servers are running. | 127.0.0.1 |
-p <port> | Port to use with the host specified in the -h option. | 3000 |
-U <user> | Username. | |
-P <password> | Password. | |
-n <namespace> | Namespace to load data into. | test |
-c <config> | JSON-formatted configuration file specifying parsing attributes and schema mapping. | |
-g <max-throughput> | Maximum target transactions-per-second for the loader. | 0 (no throttling) |
-T <transaction-timeout> | Timeout for a transaction during a write operation, in milliseconds. | 0 (no timeout) |
-e <expiration-time> | Expiration time of records in seconds. Other valid values: -1 for records that never expire; 0 to use the server default. | -1 |
-tz <timezone> | Time zone of data backup source. Used when loading data of timestamp datatype. For example, if data backup location timezone is X, and that data is destined for a server in Y timezone, then specify X's timezone. Valid values are standard three-letter codes such as PST, EST, etc. | local timezone |
-ec <abort-error-count> | Error threshold to determine when the loader should stop loading data. 0 ignores the threshold. | 0 |
-wa <write-action> | Possible values: 1) UPDATE - Create or update records. Merge incoming bin values with existing values. 2) UPDATE_ONLY - Update existing records. Fail if record does not exist. Merge incoming bin values with existing values. 3) REPLACE - Create or replace existing records. 4) REPLACE_ONLY - Replace existing records. Fail if record does not exist. 5) CREATE_ONLY - Create new records. Fail if record already exists. | UPDATE |
-tls <tls-enable> | Use TLS/SSL sockets. | False |
-tlsLoginOnly | Use TLS/SSL sockets on node login only. | False |
-tp <tls-protocols> | Allow TLS protocols. Values: TLSv1,TLSv1.1,TLSv1.2 separated by comma. | TLSv1.2 |
-auth | Authentication mode, which can be set to INTERNAL, EXTERNAL, EXTERNAL_INSECURE, or PKI. These options correspond to the Aerospike Java client authentication modes. | |
-tlsCiphers <tls-cipher-suite> | Allow TLS cipher suites. Values: cipher names defined by JVM separated by comma. | null (default cipher list provided by JVM) |
-tr <tls-revoke> | Revoke certificates identified by their serial number. Values: serial numbers separated by comma. | null (Do not revoke certificates) |
-um <unorderedMaps> | If this flag is present, write all maps as unordered maps. | |
-uk <send-user-key> | Send user defined key in addition to hash digest to store on the server. (default: userKey is not sent to reduce meta-data overhead). | |
-v | Verbose mode. If this option is specified, verbose mode is enabled and additional information is displayed on the console. | DISABLED |
-u | Display command usage. | |
-V | Print the asloader version. | |
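The five -wa write actions differ in how they treat existing records and existing bins. The sketch below models those semantics against an in-memory dictionary, following the descriptions in the table. It is an illustration only: it does not call the Aerospike client, and `apply_write` and `WriteError` are hypothetical names.

```python
class WriteError(Exception):
    """Raised when a write action's precondition on record existence fails."""


def apply_write(store, key, bins, action="UPDATE"):
    """Apply one write to an in-memory store of {key: {bin_name: value}}."""
    exists = key in store
    if action in ("UPDATE_ONLY", "REPLACE_ONLY") and not exists:
        raise WriteError("record does not exist: %r" % (key,))
    if action == "CREATE_ONLY" and exists:
        raise WriteError("record already exists: %r" % (key,))

    if action in ("UPDATE", "UPDATE_ONLY"):
        # Merge incoming bins with any existing bins.
        merged = dict(store.get(key, {}))
        merged.update(bins)
        store[key] = merged
    elif action in ("REPLACE", "REPLACE_ONLY", "CREATE_ONLY"):
        # Write only the incoming bins; existing bins are discarded.
        store[key] = dict(bins)
    else:
        raise ValueError("unknown write action: %r" % (action,))


store = {}
apply_write(store, "user1", {"community": "facebook"})            # creates
apply_write(store, "user1", {"user_location": "India"})           # merges
apply_write(store, "user1", {"community": "Twitter"}, "REPLACE")  # replaces
print(store)
```

The practical difference to watch for is between UPDATE and REPLACE: reloading a file with fewer columns under UPDATE leaves the old bins in place, while REPLACE removes them.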
Example
run_loader -h nodex -p 3000 -n test -T 3000 -e 2592000 -ec 100 -tz PST -wa update -c ~/pathto/config.json datafiles/
Server IP: nodex (-h)
Port: 3000 (-p)
Namespace: test (-n)
Write Operation Timeout (in milliseconds): 3000 (-T)
Write Error Threshold: 100 (-ec)
Record Expiration: 2592000 (-e)
Timezone: PST (-tz)
Write Action: update (-wa)
Data Mapping: ~/pathto/config.json (-c)
Data Files: datafiles/