Configure Bulk Request for Aerospike Connect for Elasticsearch
The bulk-request-config section of the aerospike-elasticsearch-outbound.yaml file specifies the configuration parameters for BulkRequest. The Elasticsearch outbound connector always sends data to Elasticsearch in a BulkRequest. The connector uses the following parameters with their default values unless you override them or write your own custom batch-formatter/formatter.
Option | Required | Default | Description |
---|---|---|---|
pipeline | No | Elasticsearch's default | String that sets Elasticsearch's pipeline field, naming the pipeline to use for pre-processing incoming documents. |
refresh | No | Elasticsearch's default | Whether to refresh the affected shards to make this operation visible to search. This parameter sets Elasticsearch's refresh field. See the Refresh configuration options section for all possible options. |
require-alias | No | Elasticsearch's default | Boolean that sets Elasticsearch's require_alias field for all incoming documents. |
shard-routing | No | Elasticsearch's default | Specific shard routing value. See the Shard Routing configuration options section for all possible options and examples. |
timeout | No | Elasticsearch's default | String that sets Elasticsearch's timeout field. See the Timeout configuration options section for all possible options and examples. |
wait-for-active-shards | No | Elasticsearch's default | Number of shard copies that must be active before proceeding with the bulk operation. See the Wait for active shards configuration options section. |
aerospike-write-operation-mapping | No | Operation type index | Mapping of an Aerospike XDR write operation to an Elasticsearch operation. See the XDR write operation mapping section for more details. |
if-primary-term | No | Elasticsearch's default | Integer. The operation is performed only if the document has this primary term. |
if-seq-no | No | Elasticsearch's default | Integer. The operation is performed only if the document has this sequence number. |
version | No | Elasticsearch's default | Long value defining the explicit version number for concurrency control. The specified version must match the current version of the document for the request to succeed. |
version-type | No | Elasticsearch's default | Specific version type. See the Version type configuration options section for all possible options. |
ignore-aerospike-delete | No | false | Boolean flag. When true, the connector does not process Aerospike delete records, so the corresponding documents are not deleted from Elasticsearch. |
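As a rough sketch of how the simpler scalar options in the table might be set together, the fragment below configures pipeline, require-alias, and ignore-aerospike-delete. The pipeline name my-ingest-pipeline is a hypothetical placeholder, not a value from this document; substitute the name of a pipeline that exists in your Elasticsearch cluster.

```yaml
# Sketch only: my-ingest-pipeline is a hypothetical placeholder name.
bulk-request-config:
  # Pre-process incoming documents with this Elasticsearch pipeline.
  pipeline: my-ingest-pipeline
  # Sets Elasticsearch's require_alias field for all incoming documents.
  require-alias: true
  # Do not process Aerospike delete records, so documents are not
  # deleted from Elasticsearch.
  ignore-aerospike-delete: true
```

Any option left out of the fragment keeps the default listed in the table.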
Refresh configuration options
The optional refresh field accepts one of three options:
Option | Description |
---|---|
true | Refresh the relevant primary and replica shards (not the whole index) immediately after the operation occurs, so that the updated document appears in search results immediately. This should ONLY be done after careful thought and verification that it does not lead to poor performance, both from an indexing and a search standpoint. |
false | Take no refresh related actions. The changes made by this request will be made visible at some point after the request returns. |
wait_for | Wait for the changes made by the request to be made visible by a refresh before replying. This doesn’t force an immediate refresh; rather, it waits for a refresh to happen. |
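A minimal sketch of setting refresh is shown below. It assumes that refresh can be overridden per namespace in the same way as the other bulk-request-config options shown elsewhere on this page; the players namespace is only an illustration.

```yaml
# Sketch only: the namespace-level override is an assumption.
bulk-request-config:
  # Reply only after a refresh has made the changes visible to search.
  refresh: wait_for
namespaces:
  players:
    bulk-request-config:
      # Take no refresh-related action for records from this namespace.
      refresh: false
```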
Shard Routing configuration options
This field has the following two properties:
Property | Required | Default | Description |
---|---|---|---|
source | Yes | | Source for generating Elasticsearch's shard-routing value. |
failure-strategy | No | USE_DIGEST | What to do if Aerospike can’t generate the shard-routing value using the specified source. Not all sources accept this parameter; sources that do not accept it can still be used to generate the shard-routing value. |
Source values
The source field accepts the following values:
Value | Description | Can configure failure-strategy |
---|---|---|
none | The destination system will auto-generate the value or use null. | No |
system-default | Use the connector's default source. For this connector, the default source is none. | No |
namespace | Use the record’s namespace as Elasticsearch's shard-routing value. | No |
set | Use the record’s set name as Elasticsearch's shard-routing value. | Yes |
digest | Use the record’s digest as Elasticsearch's shard-routing value. | No |
user-key | Use the record’s user-key as Elasticsearch's shard-routing value. | Yes |
bin-value | Use the record’s bin value of the given bin name as Elasticsearch's shard-routing value. | Yes |
static | Use a static value as Elasticsearch's shard-routing value. | No |
Failure Strategy values
Not all sources accept failure-strategy; the Source values table shows which ones do. Sources that do not accept it can still be used to generate the shard-routing value.
The failure-strategy field accepts the following values:
Value | Description | Details |
---|---|---|
USE_DIGEST | Use the record’s digest as Elasticsearch's shard-routing value. | This value is always available. |
FAIL | Send a temporary error to XDR so that it retries the record. | |
IGNORE | Send a permanent error to XDR so that it does not retry the record. | In this situation, the record is never shipped to the destination system. |
Examples
You can use any of the failure-strategy values wherever applicable. The following examples show usage of all failure-strategy values at different places.
Source: none
...
bulk-request-config:
  shard-routing:
    source: none
...
Source: system-default
...
bulk-request-config:
  shard-routing:
    source: system-default
...
Source: namespace
...
bulk-request-config:
  shard-routing:
    source: namespace
...
Source: set
...
bulk-request-config:
  shard-routing:
    source: set
    failure-strategy: USE_DIGEST
...
Source: digest
...
bulk-request-config:
  shard-routing:
    source: digest
...
Source: user-key
...
bulk-request-config:
  shard-routing:
    source: user-key
    failure-strategy: FAIL
...
Source: bin-value
...
bulk-request-config:
  shard-routing:
    source: bin-value
    bin-name: color
    failure-strategy: IGNORE
...
Source: static
...
bulk-request-config:
  shard-routing:
    source: static
    value: dummy_doc_id
...
Timeout configuration options
The optional timeout field sets the operation timeout and supports the following duration units:
Unit | Example |
---|---|
m - Minutes | 1m |
s - Seconds | 10s |
ms - Milliseconds | 200ms |
micros - Microseconds | 10000micros |
nanos - Nanoseconds | 1000000nanos |
Example
bulk-request-config:
  ...
  timeout: 200ms
  ...
namespaces:
  players:
    bulk-request-config:
      ...
      timeout: 3s
      ...
Wait for active shards configuration options
The optional wait-for-active-shards field requires two sub-properties:
Name | Value |
---|---|
kind | Defines how the value property should be interpreted. Valid values are option (one of Elasticsearch's predefined options) and count (an active shard count). |
value | The actual value, as allowed by the kind property. It must be all or index-setting when kind is option, and a number when kind is count. |
Example
bulk-request-config:
  ...
  # All shards must be active before the operation proceeds.
  wait-for-active-shards:
    kind: option
    value: all
  ...
namespaces:
  players:
    bulk-request-config:
      ...
      # Follow the index's own wait-for-active-shards setting.
      wait-for-active-shards:
        kind: option
        value: index-setting
      ...
    sets:
      gujarati:
        bulk-request-config:
          ...
          # 2 shards must be active before the operation proceeds.
          wait-for-active-shards:
            kind: count
            value: 2
          ...
Version type configuration options
The optional version-type field accepts one of four options:
Option | Description |
---|---|
internal | The default versioning scheme provided by Elasticsearch that starts at 1 and increments with each update and delete. |
external | The user defines their own version number. The value provided must be a numeric, long value greater than or equal to 0, and less than around 9.2e+18. |
external_gte | The given version must be equal to or higher than the version of the stored document. If there is no existing document, the operation also succeeds. |
force | This option is deprecated by Elasticsearch because it can cause primary and replica shards to diverge. |
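As a small illustration of how version and version-type fit together, the sketch below pairs an explicit version number with external_gte. The version number 4 is an arbitrary illustrative value.

```yaml
# Sketch only: the version number is an arbitrary illustrative value.
bulk-request-config:
  # Explicit version number used for concurrency control.
  version: 4
  # Succeed when the given version is equal to or higher than the
  # version of the stored document, or when no document exists yet.
  version-type: external_gte
```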
XDR write operation mapping
This configuration property has a required sub-property, operation-type, which defines how to map an XDR write operation to an Elasticsearch operation. See the Create and index optional parameters section for the optional configuration parameters supported by create and index.
operation-type accepts one of three options:
Elasticsearch Operation | Description |
---|---|
create | Map Aerospike's write operation to Elasticsearch's create operation. |
index | Map Aerospike's write operation to Elasticsearch's index operation. |
update | Map Aerospike's write operation to Elasticsearch's update operation. See the Update optional parameters section for supported optional configuration parameters. |
Create and index optional parameters
Parameter | Description |
---|---|
dynamic-templates | Custom mappings that are applied to dynamically added fields based on a matching condition. See dynamic templates for more details. |
Example
bulk-request-config:
  ...
  aerospike-write-operation-mapping:
    operation-type: index
    # Set dynamic-templates for operation-type index.
    dynamic-templates: [
      {
        strings_as_ip: "{\"match_mapping_type\":\"string\",\"match\":\"ip*\",\"runtime\":{\"type\":\"ip\"}}"
      }
    ]
  ...
namespaces:
  players:
    bulk-request-config:
      ...
      aerospike-write-operation-mapping:
        operation-type: create
        # Set dynamic-templates for operation-type create.
        dynamic-templates: [
          {
            integers: "{\"match_mapping_type\":\"long\",\"mapping\":{\"type\":\"integer\"}}"
          },
          {
            strings: "{\"match_mapping_type\":\"string\",\"mapping\":{\"type\":\"text\",\"fields\":{\"raw\":{\"type\":\"keyword\",\"ignore_above\":256}}}}"
          }
        ]
      ...
Update optional parameters
Parameter | Description |
---|---|
retry-on-conflict | An integer value specifying how many times an update should be retried in the case of a version conflict. |
doc-as-upsert | Boolean. When true, the contents of doc are used as the upsert value, instead of sending a partial doc plus a separate upsert doc. |
Example
bulk-request-config:
  ...
  aerospike-write-operation-mapping:
    operation-type: update
    # Set retry-on-conflict and doc-as-upsert for operation-type update.
    retry-on-conflict: 3
    doc-as-upsert: true
  ...
Example
...
bulk-request-config:
  pipeline: testPipeline
  refresh: wait_for
  require-alias: true
  shard-routing:
    source: digest
  timeout:
    kind: time
    value: 2m
  wait-for-active-shards:
    kind: option
    value: index-setting
  aerospike-write-operation-mapping:
    operation-type: create
    dynamic-templates: [
      {
        strings_as_ip: "{\"match_mapping_type\":\"string\",\"match\":\"ip*\",\"runtime\":{\"type\":\"ip\"}}"
      }
    ]
  if-primary-term: 2
  if-seq-no: 3
  version: 4
  version-type: external_gte
  ignore-aerospike-delete: true
...