Configure Bulk Request for Aerospike Connect for Elasticsearch

The bulk-request-config section of aerospike-elasticsearch-outbound.yaml specifies the configuration parameters for BulkRequest. The Elasticsearch outbound connector always sends data to Elasticsearch in a BulkRequest. The connector uses the following parameters with their default values unless you override them or write your own custom batch-formatter or formatter.

| Option | Required | Default | Description |
| --- | --- | --- | --- |
| pipeline | No | Elasticsearch's default | String value that sets Elasticsearch's pipeline field, which names the pipeline used to pre-process incoming documents. |
| refresh | No | Elasticsearch's default | Whether to refresh the affected shards to make this operation visible to search. Sets Elasticsearch's refresh field. See the Refresh configuration options section for all possible values. |
| require-alias | No | Elasticsearch's default | Boolean value that sets Elasticsearch's require_alias field for all incoming documents. |
| shard-routing | No | Elasticsearch's default | Shard routing value. See the Shard Routing configuration options section for all options and examples. |
| timeout | No | Elasticsearch's default | String value that sets Elasticsearch's timeout field. See the Timeout configuration options section for supported units and examples. |
| wait-for-active-shards | No | Elasticsearch's default | Number of shard copies that must be active before proceeding with the bulk operation. See the Wait for active shards configuration options section. |
| aerospike-write-operation-mapping | No | Operation type index | Mapping of an Aerospike XDR write operation to an Elasticsearch operation. See the XDR write operation mapping section for details. |
| if-primary-term | No | Elasticsearch's default | Integer value. The operation is performed only if the document has this primary term. |
| if-seq-no | No | Elasticsearch's default | Integer value. The operation is performed only if the document has this sequence number. |
| version | No | Elasticsearch's default | Long value defining the explicit version number for concurrency control. The specified version must match the current version of the document for the request to succeed. |
| version-type | No | Elasticsearch's default | Version type. See the Version type configuration options section. |
| ignore-aerospike-delete | No | false | Boolean flag. When true, the connector does not process Aerospike delete records, so the corresponding documents are not deleted from Elasticsearch. |
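
As a minimal sketch of overriding a few of the scalar options in the table above, the fragment below sets a pipeline, requires an alias, and keeps documents in Elasticsearch when the Aerospike record is deleted. The pipeline name my-ingest-pipeline is a hypothetical placeholder, and the chosen values are illustrative rather than recommended settings.

```yaml
bulk-request-config:
  # Hypothetical pipeline name, used only for illustration.
  pipeline: my-ingest-pipeline
  # Sets Elasticsearch's require_alias field for all incoming documents.
  require-alias: true
  # Do not process Aerospike delete records, so documents stay in Elasticsearch.
  ignore-aerospike-delete: true
```

Options that are left out keep the defaults listed in the table.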

Refresh configuration options

The optional refresh field accepts one of three values:

| Option | Description |
| --- | --- |
| true | Refresh the relevant primary and replica shards (not the whole index) immediately after the operation occurs, so that the updated document appears in search results immediately. Use this option only after careful thought and verification that it does not degrade indexing or search performance. |
| false | Take no refresh-related actions. The changes made by this request become visible at some point after the request returns. |
| wait_for | Wait for the changes made by the request to be made visible by a refresh before replying. This does not force an immediate refresh; it waits for a refresh to happen. |
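
The fragment below is a small sketch of setting refresh. It assumes refresh can be overridden per namespace in the same way the other examples on this page override bulk-request-config under namespaces; the namespace name players is borrowed from those examples.

```yaml
bulk-request-config:
  # Reply only after a refresh has made the changes visible to search.
  refresh: wait_for

namespaces:
  players:
    bulk-request-config:
      # Assumed per-namespace override: take no refresh-related action.
      refresh: false
```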

Shard Routing configuration options

This field has the following two properties:

| Property | Required | Default | Description |
| --- | --- | --- | --- |
| source | Yes | | Source for generating Elasticsearch's shard-routing value. |
| failure-strategy | No | USE_DIGEST | What to do if the connector cannot generate the shard-routing value from the specified source. Not all sources support this parameter; sources that do not support it can still be used to generate the shard-routing value. |

Source values

The source field accepts the following values:

| Value | Description | Can configure failure-strategy |
| --- | --- | --- |
| none | The destination system auto-generates the value or uses null. | No |
| system-default | Use the connector's default source. For this connector, the default source is none. | No |
| namespace | Use the record's namespace as Elasticsearch's shard-routing value. | No |
| set | Use the record's set name as Elasticsearch's shard-routing value. | Yes |
| digest | Use the record's digest as Elasticsearch's shard-routing value. | No |
| user-key | Use the record's user key as Elasticsearch's shard-routing value. | Yes |
| bin-value | Use the value of the named bin in the record as Elasticsearch's shard-routing value. | Yes |
| static | Use a static value as Elasticsearch's shard-routing value. | No |

Failure Strategy values

Not all sources support the failure-strategy parameter. Sources that do not support it can still be used to generate the shard-routing value.

The failure-strategy field accepts the following values:

| Value | Description | Details |
| --- | --- | --- |
| USE_DIGEST | Use the record's digest as Elasticsearch's shard-routing value. | This value is always available. |
| FAIL | Send a temporary error to XDR so that it retries the record. | |
| IGNORE | Send a permanent error to XDR so that it does not retry the record. | In this situation, the record is never shipped to the destination system. |

Examples

You can use any of the failure-strategy values wherever applicable. The following examples show usage of all failure-strategy values at different places.

Source: none

...
bulk-request-config:
  shard-routing:
    source: none
...

Source: system-default

...
bulk-request-config:
  shard-routing:
    source: system-default
...

Source: namespace

...
bulk-request-config:
  shard-routing:
    source: namespace
...

Source: set

...
bulk-request-config:
  shard-routing:
    source: set
    failure-strategy: USE_DIGEST
...

Source: digest

...
bulk-request-config:
  shard-routing:
    source: digest
...

Source: user-key

...
bulk-request-config:
  shard-routing:
    source: user-key
    failure-strategy: FAIL
...

Source: bin-value

...
bulk-request-config:
  shard-routing:
    source: bin-value
    bin-name: color
    failure-strategy: IGNORE
...

Source: static

...
bulk-request-config:
  shard-routing:
    source: static
    value: dummy_doc_id
...

Timeout configuration options

The optional timeout field supports the following duration units:

| Unit | Meaning | Example |
| --- | --- | --- |
| m | Minutes | 1m |
| s | Seconds | 10s |
| ms | Milliseconds | 200ms |
| micros | Microseconds | 10000micros |
| nanos | Nanoseconds | 1000000nanos |

Example

bulk-request-config:
  ...
  timeout: 200ms
  ...

namespaces:
  players:
    bulk-request-config:
      ...
      timeout: 3s
      ...

Wait for active shards configuration options

The optional wait-for-active-shards field requires two sub-properties:

| Name | Value |
| --- | --- |
| kind | Defines how the value property should be interpreted. Valid values are option (one of the options predefined by Elasticsearch) and count (an active shard count). |
| value | The actual value, as allowed by the kind property. It must be all or index-setting when kind is option, and a number when kind is count. |

Example

bulk-request-config:
  ...
  # All shards must be active before proceeding with the operation.
  wait-for-active-shards:
    kind: option
    value: all
  ...

namespaces:
  players:
    bulk-request-config:
      ...
      # Follow the index's wait-for-active-shards setting.
      wait-for-active-shards:
        kind: option
        value: index-setting
      ...
    sets:
      gujarati:
        bulk-request-config:
          ...
          # 2 shards must be active before proceeding with the operation.
          wait-for-active-shards:
            kind: count
            value: 2
          ...

Version type configuration options

The optional version-type field accepts one of four values:

| Option | Description |
| --- | --- |
| internal | The default versioning scheme provided by Elasticsearch, which starts at 1 and increments with each update and delete. |
| external | The user defines their own version number. The value provided must be a numeric, long value greater than or equal to 0, and less than around 9.2e+18. |
| external_gte | The given version must be equal to or higher than the version of the stored document. If there is no existing document, the operation also succeeds. |
| force | This option is deprecated by Elasticsearch because it can cause primary and replica shards to diverge. |
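
As an illustrative sketch, the fragment below pairs version-type with an explicit version. The version number 4 is a placeholder, and the sketch assumes the connector sends the configured version unchanged with every document.

```yaml
bulk-request-config:
  # Placeholder version number, sent with each document (assumption).
  version: 4
  # Accept the write when 4 is equal to or higher than the stored version.
  version-type: external_gte
```

Under Elasticsearch's external versioning, a write succeeds only when the supplied version is higher than the stored one, while external_gte also accepts an equal version. With a fixed configured version, that difference decides whether a repeated write to the same document is accepted.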

XDR write operation mapping

This configuration property has a required sub-property, operation-type, which defines how to map an XDR write operation to an Elasticsearch operation. There are three options:

| Elasticsearch Operation | Description |
| --- | --- |
| create | Map Aerospike's write operation to Elasticsearch's create operation. See the Create and index optional parameters section for supported optional configuration parameters. |
| index | Map Aerospike's write operation to Elasticsearch's index operation. See the Create and index optional parameters section for supported optional configuration parameters. |
| update | Map Aerospike's write operation to Elasticsearch's update operation. See the Update optional parameters section for supported optional configuration parameters. |

Create and index optional parameters

| Parameter | Description |
| --- | --- |
| dynamic-templates | Defines custom mappings that can be applied to dynamically added fields based on a matching condition. See dynamic templates for more details. |

Example

bulk-request-config:
  ...
  aerospike-write-operation-mapping:
    operation-type: index
    # Set dynamic-templates for operation-type index.
    dynamic-templates: [
      {
        strings_as_ip: "{\"match_mapping_type\":\"string\",\"match\":\"ip*\",\"runtime\":{\"type\":\"ip\"}}"
      }
    ]
  ...

namespaces:
  players:
    bulk-request-config:
      ...
      aerospike-write-operation-mapping:
        operation-type: create
        # Set dynamic-templates for operation-type create.
        dynamic-templates: [
          {
            integers: "{\"match_mapping_type\":\"long\",\"mapping\":{\"type\":\"integer\"}}"
          },
          {
            strings: "{\"match_mapping_type\":\"string\",\"mapping\":{\"type\":\"text\",\"fields\":{\"raw\":{\"type\":\"keyword\",\"ignore_above\":256}}}}"
          }
        ]
      ...

Update optional parameters

| Parameter | Description |
| --- | --- |
| retry-on-conflict | Integer value specifying how many times an update should be retried in the case of a version conflict. |
| doc-as-upsert | Boolean value that sets Elasticsearch's doc_as_upsert field. When true, the contents of doc are used as the upsert value, instead of sending a partial doc plus an upsert doc. |

Example

bulk-request-config:
  ...
  aerospike-write-operation-mapping:
    operation-type: update
    # Set retry-on-conflict and doc-as-upsert for operation-type update.
    retry-on-conflict: 3
    doc-as-upsert: true
  ...

Example

...
bulk-request-config:
  pipeline: testPipeline
  refresh: wait_for
  require-alias: true
  shard-routing:
    source: digest
  timeout:
    kind: time
    value: 2m
  wait-for-active-shards:
    kind: option
    value: index-setting
  aerospike-write-operation-mapping:
    operation-type: create
    dynamic-templates: [
      {
        strings_as_ip: "{\"match_mapping_type\":\"string\",\"match\":\"ip*\",\"runtime\":{\"type\":\"ip\"}}"
      }
    ]
  if-primary-term: 2
  if-seq-no: 3
  version: 4
  version-type: external_gte
  ignore-aerospike-delete: true
...