
Managing UDFs

Overview

Aerospike provides the asadm command line tool as well as APIs (in C, Java, C# .NET, Node.js, Python, and Go) for managing User-Defined Functions (UDFs) in a cluster.

UDF management is centered around modules. A module is a file containing one or more UDFs. A module and all of its external (non-Aerospike) dependencies must be uploaded and registered with the Aerospike cluster before any of its UDFs can be invoked. A deployment may have one or many modules.

To execute a UDF, you specify the module name, the name of the function within the module, and the arguments that will be passed to the function.

Lifecycle of a UDF

When a module is loaded into the server, it is immediately compiled into bytecode and made available to subsequent client invocations. All client requests made after an update use the most recently registered version of the module.

If a client is in the middle of invoking a UDF when its module is removed or updated, it will be able to complete the operation without interruption. Once the function completes, any subsequent invocation will either use the updated function, or fail if the module was removed.
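The lifecycle rules can be illustrated with a toy, in-memory Python model. This is a sketch under stated assumptions, not Aerospike's implementation. The class `ModuleRegistry` and the function `invoke` are hypothetical names. Resolving the function once, when a call starts, stands in for whatever the server actually does to let in-flight invocations finish. As in a real UDF call, the model addresses a function by module name, function name and arguments.

```python
from typing import Any, Callable, Dict


class ModuleRegistry:
    """Hypothetical toy model of UDF module lifecycle (not Aerospike's code)."""

    def __init__(self) -> None:
        # module name -> {function name -> callable}
        self._modules: Dict[str, Dict[str, Callable[..., Any]]] = {}

    def register(self, module: str, functions: Dict[str, Callable[..., Any]]) -> None:
        # Registering again replaces the whole module, but only for later lookups.
        self._modules[module] = dict(functions)

    def remove(self, module: str) -> None:
        self._modules.pop(module, None)

    def resolve(self, module: str, function: str) -> Callable[..., Any]:
        # Looked up once when an invocation starts; the caller keeps this
        # version even if the module is updated or removed afterwards.
        try:
            return self._modules[module][function]
        except KeyError:
            raise LookupError(f"UDF {module}.{function} is not registered") from None


def invoke(registry: ModuleRegistry, module: str, function: str, *args: Any) -> Any:
    # A call is addressed by module name, function name, and arguments.
    return registry.resolve(module, function)(*args)


registry = ModuleRegistry()
registry.register("example", {"add": lambda a, b: a + b})
in_flight = registry.resolve("example", "add")  # a call already under way
registry.register("example", {"add": lambda a, b: a + b + 100})  # module updated
print(in_flight(1, 2))  # prints 3: the in-flight call keeps the old version
print(invoke(registry, "example", "add", 1, 2))  # prints 103: new calls see the update
registry.remove("example")  # later invocations now raise LookupError
```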

A UDF module can be registered with a cluster using either the asadm tool or a client API.

When a UDF module is registered, it is replicated to every node in the cluster and then registered by each node. The module becomes available only after every node has registered it. Registration happens asynchronously within the cluster, so there may be a delay between the registration request and the availability of the UDF. You can check the status of a registration using one of the management tools or client APIs.
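Because registration is asynchronous, an administrative script may need to wait until every node reports the module before relying on it. The Python sketch below polls until that happens or a timeout expires. The `fetch_node_modules` callable is a hypothetical stand-in for however you list modules per node, for example by wrapping a management tool or client API call. It is assumed to return a mapping from node ID to the set of module filenames that node has registered.

```python
import time
from typing import Callable, Dict, Set


def wait_until_registered(
    fetch_node_modules: Callable[[], Dict[str, Set[str]]],
    module: str,
    timeout: float = 10.0,
    interval: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until every node lists `module`; return False if `timeout` expires."""
    deadline = clock() + timeout
    while True:
        per_node = fetch_node_modules()  # hypothetical: node ID -> registered modules
        if per_node and all(module in modules for modules in per_node.values()):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Passing `sleep` and `clock` as parameters keeps the helper testable without a cluster. In production the defaults give real wall-clock polling.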

UDF registration should be treated as an administrative operation and controlled through normal change-control procedures. Applications should not perform it continuously at run time.

Management options

Aerospike provides several options for managing UDFs. You can use the following tool or client APIs:

  • asadm: a command-line utility for executing commands against an Aerospike cluster. Tools package 5.1.1 or later is required.
  • Language-specific APIs: functions that let you programmatically manage UDFs in a cluster. These are currently available in the C, Java, C# .NET, Node.js, Python, and Go clients.

Module dependencies

For details on Lua modules, see Lua UDF – Developing Lua Modules.

  • A module and its non-Aerospike dependencies must be uploaded to the cluster. Any module can load other modules with Lua's require() function, so every module it requires must also be registered.
  • Modules should be registered and maintained as part of administrative operations, using a command-line tool such as asadm or aql.
  • Avoid managing UDFs from application code at high frequency. These operations are relatively heavyweight and can degrade system performance.
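Dependencies are declared with require(), so a quick pre-registration check is to scan a module's source for the modules it requires. You can then compare that list against what is already registered. The Python sketch below does this with a regular expression. It is a heuristic, not a Lua parser: it does not skip comments or strings, and it cannot see dynamic require() calls. It also assumes, as a hypothetical convention, that `require("name")` corresponds to a registered file named `name.lua`.

```python
import re
from typing import Iterable, List

# Matches require("x"), require('x'), require "x" and require 'x'.
_REQUIRE = re.compile(r"""\brequire\s*\(?\s*["']([\w.]+)["']""")


def lua_dependencies(source: str) -> List[str]:
    """Return module names passed to require(), in first-seen order."""
    names: List[str] = []
    for name in _REQUIRE.findall(source):
        if name not in names:
            names.append(name)
    return names


def missing_dependencies(source: str, registered_files: Iterable[str]) -> List[str]:
    """List required modules with no matching <name>.lua among registered files."""
    registered = set(registered_files)
    return [name for name in lua_dependencies(source) if f"{name}.lua" not in registered]


src = 'local h = require("helpers")\nlocal u = require "util"\n'
print(missing_dependencies(src, ["helpers.lua"]))  # ['util']
```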

UDF latency

The overall UDF latency histogram is written to the server log file (/var/log/aerospike/aerospike.log) every 10 seconds by default.

Sep 28 2018 14:58:56 GMT: INFO (info): (hist.c:240) histogram dump: {test}-udf (267911 total) msec
Sep 28 2018 14:58:56 GMT: INFO (info): (hist.c:257) (00: 0000238581) (01: 0000024013) (02: 0000003574) (03: 0000000963)
Sep 28 2018 14:58:56 GMT: INFO (info): (hist.c:257) (04: 0000000572) (05: 0000000174) (06: 0000000033) (07: 0000000001)
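To track this histogram over time, you can extract it from the log. The Python sketch below parses a histogram dump like the sample above into its name, total, unit and per-bucket counts. It also reports each bucket's share of the total. It relies only on the line layout shown in the sample and deliberately does not interpret which latency range each bucket index covers.

```python
import re
from typing import Dict, Iterable, Tuple

_HEADER = re.compile(r"histogram dump: (\S+) \((\d+) total\) (\w+)")
_BUCKET = re.compile(r"\((\d+): (\d+)\)")


def parse_histogram(lines: Iterable[str]) -> Tuple[str, int, str, Dict[int, int]]:
    """Return (name, total, unit, {bucket index: count}) from one histogram dump."""
    name, total, unit = "", 0, ""
    buckets: Dict[int, int] = {}
    for line in lines:
        header = _HEADER.search(line)
        if header:
            name, total, unit = header.group(1), int(header.group(2)), header.group(3)
            continue
        for index, count in _BUCKET.findall(line):
            buckets[int(index)] = int(count)
    return name, total, unit, buckets


def bucket_shares(total: int, buckets: Dict[int, int]) -> Dict[int, float]:
    """Fraction of all recorded operations that fell into each bucket."""
    if not total:
        return {}
    return {index: count / total for index, count in sorted(buckets.items())}


sample = [
    "Sep 28 2018 14:58:56 GMT: INFO (info): (hist.c:240) histogram dump: {test}-udf (267911 total) msec",
    "Sep 28 2018 14:58:56 GMT: INFO (info): (hist.c:257) (00: 0000238581) (01: 0000024013) (02: 0000003574) (03: 0000000963)",
    "Sep 28 2018 14:58:56 GMT: INFO (info): (hist.c:257) (04: 0000000572) (05: 0000000174) (06: 0000000033) (07: 0000000001)",
]
name, total, unit, buckets = parse_histogram(sample)
# The bucket counts in a complete dump add up to the reported total.
assert sum(buckets.values()) == total
```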

If the system shows very high UDF latency, check the following:

  • Make sure Lua caching is enabled. It is enabled by default; see Lua cache config.
  • UDFs generally hold the record lock for a relatively long time. Check for hot keys, that is, a small set of keys on which many UDFs are executed.
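One way to look for hot keys is to tally, on the application side, which keys UDFs are applied to. The Python sketch below assumes you already collect those keys, for example from your own application logs. That data source is hypothetical, not a server feature. The sketch flags keys that receive at least a given share of UDF calls. The `hot_keys` function and its `min_share` threshold are illustrative, not Aerospike settings.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Tuple


def hot_keys(
    udf_call_keys: Iterable[Hashable], min_share: float = 0.05, top: int = 10
) -> List[Tuple[Hashable, int]]:
    """Return up to `top` keys that received at least `min_share` of all UDF calls."""
    counts = Counter(udf_call_keys)
    total = sum(counts.values())
    if total == 0:
        return []
    # most_common() yields keys in descending order of call count.
    return [(key, n) for key, n in counts.most_common(top) if n / total >= min_share]


calls = ["user:42"] * 80 + ["user:%d" % i for i in range(20)]
print(hot_keys(calls))  # [('user:42', 80)]
```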

UDF Statistics

You can use asadm to get statistics for a given namespace. For example, for the namespace test:

Admin> show stat namespace for test

UDF-related stats:

All of these stats are also reported at regular intervals by the ticker in the server log file. The comma-separated values within parentheses for a given UDF stat group are listed in the same order as the descriptions above.

Example log entries:

Nov 09 2018 00:07:11 GMT: INFO (info): (ticker.c:587) {test} client: tsvc (0,0) proxy (0,0,0) read (126,0,1,3,1) write (2886,0,23,2) delete (197,0,1,19,3) udf (35,0,1,4) lang (26,7,0,3)
Nov 09 2018 00:07:11 GMT: INFO (info): (ticker.c:637) {test} batch-sub: tsvc (0,0) proxy (0,0,0) read (768,0,0,41,1)
Nov 09 2018 00:07:11 GMT: INFO (info): (ticker.c:638) early-fail: demarshal 0 tsvc-client 1 tsvc-from-proxy 0 tsvc-batch-sub 0 tsvc-from-proxy-batch-sub 0 tsvc-udf-sub 0 tsvc-ops-sub 0
Nov 09 2018 00:07:11 GMT: INFO (info): (ticker.c:639) {test} from-proxy: tsvc (0,0) read (105,0,1,7) write (2812,0,22,1) delete (188,0,1,16,2) udf (35,0,1,3) lang (26,7,0,3)
Nov 09 2018 00:07:11 GMT: INFO (info): (ticker.c:664) {test} scan: basic (29,0,0) aggr (0,0,0) udf-bg (7,0,0) ops-bg (10,0,0)
Nov 09 2018 00:07:11 GMT: INFO (info): (ticker.c:688) {test} query: basic (20,1) aggr (6,0) udf-bg (1,0) ops-bg (2,0)
Nov 09 2018 00:07:11 GMT: INFO (info): (ticker.c:689) {test} retransmits: migration 0 all-read 0 all-write (0,1) all-delete (0,0) all-udf (0,0) all-batch-sub 0 udf-sub (0,0) ops-sub (0,0)
Nov 09 2018 00:07:11 GMT: INFO (info): (ticker.c:715) {test} udf-sub: tsvc (0,0) udf (2651,0,0,1) lang (52,2498,101,0)
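To track these values programmatically, you can split each ticker line into its namespace, section and parenthesized stat groups. The Python sketch below does this for lines shaped like the examples above. It returns each group as a plain tuple of integers and does not name the individual values. Map them yourself using the stat descriptions for your server version.

```python
import re
from typing import Dict, Optional, Tuple

_SECTION = re.compile(r"\{(?P<ns>[^}]+)\} (?P<section>[\w-]+):")
_GROUP = re.compile(r"(?<![\w-])([a-z][\w-]*) \(([\d,]+)\)")


def parse_ticker_line(
    line: str,
) -> Tuple[Optional[str], Optional[str], Dict[str, Tuple[int, ...]]]:
    """Return (namespace, section, {group name: values}) for one ticker line."""
    section = _SECTION.search(line)
    namespace = section.group("ns") if section else None
    name = section.group("section") if section else None
    # Each "name (a,b,...)" pair becomes name -> tuple of ints.
    groups = {
        group: tuple(int(v) for v in values.split(","))
        for group, values in _GROUP.findall(line)
    }
    return namespace, name, groups


print(parse_ticker_line("{test} client: udf (35,0,1,4) lang (26,7,0,3)"))
# ('test', 'client', {'udf': (35, 0, 1, 4), 'lang': (26, 7, 0, 3)})
```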

List Registered UDF modules

Using asadm

Admin> show udfs
~~~~~~~~UDF Modules (2021-01-22 23:12:29 UTC)~~~~~~~~~
Filename| Hash|Type
abc.lua |dceaf7f1acddf1d6e12a1752d499d80cfadfc24b|LUA
bar.lua |591d2536acb21a329040beabfd9bfaf110d35c18|LUA
foo.lua |f6eaf2b22d8b29b3597ef1ad9113d0907425ecd0|LUA
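If a deployment script needs to confirm which modules are registered, it can parse the table printed by show udfs. The Python sketch below assumes the pipe-separated layout in the example output: a title line, a Filename|Hash|Type header, then one row per module. The output format is not guaranteed to stay the same across tools versions, so treat this as a sketch rather than a stable interface.

```python
from typing import Dict, Iterable, Tuple


def parse_show_udfs(lines: Iterable[str]) -> Dict[str, Tuple[str, str]]:
    """Map filename -> (hash, type) from `show udfs` table rows."""
    modules: Dict[str, Tuple[str, str]] = {}
    for line in lines:
        cells = [cell.strip() for cell in line.split("|")]
        # Data rows have exactly three cells; skip the header row and other lines.
        if len(cells) != 3 or cells[0] == "Filename":
            continue
        filename, digest, kind = cells
        modules[filename] = (digest, kind)
    return modules
```

For example, subtracting the returned keys from the set of filenames you expect shows which modules still need to be registered.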

Operational notes

UDF modules are stored in the following directory by default:

/opt/aerospike/usr/udf/lua

You can override this using the server configuration, in the mod-lua block:

mod-lua {
    user-path /opt/aerospike/usr/udf/lua
}

You must ensure that this directory is in sync across the cluster. A number of tools can help with this, including configuration management tools.
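One lightweight way to check that user-path is in sync is to build a manifest of file digests on each node and compare them. The Python sketch below computes SHA-1 digests of the .lua files in a directory. It then reports filenames that are missing from one manifest or differ between two manifests. How you collect each node's manifest is up to you, for example through your configuration management tool. These digests are not claimed to match the hashes that asadm reports.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def udf_manifest(user_path: str) -> Dict[str, str]:
    """Map each .lua file in `user_path` to the SHA-1 hex digest of its contents."""
    return {
        path.name: hashlib.sha1(path.read_bytes()).hexdigest()
        for path in sorted(Path(user_path).glob("*.lua"))
    }


def manifest_diff(a: Dict[str, str], b: Dict[str, str]) -> List[str]:
    """Filenames present on only one side, or whose contents differ."""
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```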

Caching behavior

At startup, a node creates a 10-deep cache of Lua execution states for each registered UDF module. When a UDF runs, it uses a cached state if one is available, otherwise a state is created for it. When the UDF finishes, its state is returned to the cache if the cache contains fewer than 128 entries, otherwise the state is destroyed.
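The caching rules can be expressed as a small simulation. The Python sketch below is a toy model of one module's cache. It uses the two numbers from the description: 10 states created at startup, and states kept only while fewer than 128 are cached. The class and its counters are hypothetical and do not mirror the server's internals.

```python
from typing import Dict, List


class LuaStateCache:
    """Hypothetical toy model of one module's Lua state cache."""

    INITIAL_STATES = 10  # states created at startup
    MAX_CACHED = 128  # a released state is kept only while fewer than this are cached

    def __init__(self, module: str) -> None:
        self.module = module
        self.created = 0
        self.destroyed = 0
        self._idle: List[Dict[str, object]] = [
            self._new_state() for _ in range(self.INITIAL_STATES)
        ]

    def _new_state(self) -> Dict[str, object]:
        self.created += 1
        return {"module": self.module, "id": self.created}

    def acquire(self) -> Dict[str, object]:
        # Reuse a cached state when one is available; otherwise create one.
        return self._idle.pop() if self._idle else self._new_state()

    def release(self, state: Dict[str, object]) -> None:
        # Return the state to the cache only if it holds fewer than 128 entries.
        if len(self._idle) < self.MAX_CACHED:
            self._idle.append(state)
        else:
            self.destroyed += 1

    @property
    def idle(self) -> int:
        return len(self._idle)


cache = LuaStateCache("foo")
states = [cache.acquire() for _ in range(200)]  # 200 overlapping invocations
for state in states:
    cache.release(state)
print(cache.idle, cache.created, cache.destroyed)  # 128 200 72
```

In this model, a burst of concurrency larger than the cache causes extra states to be created. Once the cache holds 128 entries, the surplus states are destroyed as the invocations finish.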