Managing UDFs
Overview
Aerospike provides the asadm command line tool as well as APIs (in C, Java, C# .NET, Node.js, Python, and Go) for managing User-Defined Functions (UDFs) in a cluster.
UDF management centers on modules. A module is a file containing one or more UDFs. A module and all its external (non-Aerospike) dependencies must be uploaded to and registered with the Aerospike cluster before any of its UDFs can be invoked. A deployment may have one or many modules.
To execute a UDF, you specify the module name, the name of the function within the module, and the arguments that will be passed to the function.
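As a minimal sketch of the three pieces a UDF call needs, the helper below forwards a module name, a function name, and an argument list to a client object. It assumes the Aerospike Python client, whose `apply(key, module, function, args)` method runs a record UDF. The key, the module name `example_module`, and the function name `add_tag` in the usage comment are hypothetical.

```python
def invoke_record_udf(client, key, module, function, args=None):
    """Run one record UDF and return its result.

    client   -- a connected client exposing apply(key, module, function, args),
                as the Aerospike Python client does (assumption of this sketch)
    key      -- (namespace, set, user_key) tuple identifying the record
    module   -- registered module name, without the .lua extension
    function -- name of the function inside that module
    args     -- arguments passed to the function after the record
    """
    return client.apply(key, module, function, list(args or []))


# Hypothetical usage, assuming 'example_module' is already registered:
# result = invoke_record_udf(client, ("test", "demo", "user1"),
#                            "example_module", "add_tag", ["vip"])
```

Passing the client in as a parameter keeps the helper independent of how the connection is created and configured.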
Lifecycle of a UDF
When a module is loaded into the server, it is immediately compiled into bytecode and made available to subsequent invocations from clients. All client requests that arrive after the update use only the most recently updated module.
If a client is in the middle of invoking a UDF when its module is removed or updated, it will be able to complete the operation without interruption. Once the function completes, any subsequent invocation will either use the updated function, or fail if the module was removed.
A UDF module can be registered into a cluster using the asadm tool. A module can also be registered by using a client API.
When a UDF module is registered, it is actually replicated to each node in the cluster, then registered by each node. The UDF module will be available when every node registers it. However, since registration is an asynchronous process that occurs within the cluster, there may be a delay between the registration action and the availability of the UDF. You can check the status of the UDF registration using one of the provided tools.
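Because registration is asynchronous, a deployment script may want to wait until the module is visible before continuing. The following is a rough sketch, assuming the Aerospike Python client, whose `udf_list()` method returns a list of dictionaries that each include a `name` entry. The helper name `wait_for_module` and its timeout parameters are illustrative, not part of any Aerospike API.

```python
import time


def wait_for_module(client, filename, timeout_s=10.0, interval_s=0.5):
    """Poll until `filename` appears in the client's UDF listing.

    Assumes client.udf_list() returns a list of dicts that each carry a
    'name' entry. Returns True if the module was seen before the deadline,
    False otherwise.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        names = {entry.get("name") for entry in client.udf_list()}
        if filename in names:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

A positive result only reflects the node or nodes that answered the listing request, so it is a hint rather than proof that every node has finished registering the module.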
UDF registration should be treated as an administrator operation, and be controlled using normal change control procedures. It should not be continuously performed by applications at run-time.
Management options
Aerospike provides several options for managing UDFs. You can use the following tools or client APIs:
- asadm - A command-line utility for executing commands against an Aerospike cluster. Tools package 5.1.1 or greater is required.
- Language-specific APIs - A number of functions that allow you to programmatically manage UDFs in a cluster. Client APIs are available for C, Java, C# .NET, Node.js, Python, and Go.
Module dependencies
For details on Lua Modules, see Lua UDF - Developing Lua Modules.
- A module and its non-Aerospike dependencies must be uploaded to the cluster. Dependencies can be loaded from any module, as Lua makes use of the require() function to indicate module dependencies.
- Modules should be registered and maintained as part of administrative operations using a command line tool like aql.
- Avoid managing UDFs from application code at high frequency. The operations are relatively heavyweight and can impact system performance.
UDF latency
The overall UDF histogram is printed in the log file (/var/log/aerospike/aerospike.log) every 10 seconds by default.
Sep 28 2018 14:58:56 GMT: INFO (info): (hist.c:240) histogram dump: {test}-udf (267911 total) msec
Sep 28 2018 14:58:56 GMT: INFO (info): (hist.c:257) (00: 0000238581) (01: 0000024013) (02: 0000003574) (03: 0000000963)
Sep 28 2018 14:58:56 GMT: INFO (info): (hist.c:257) (04: 0000000572) (05: 0000000174) (06: 0000000033) (07: 0000000001)
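To make a dump like the one above easier to read, the sketch below parses the bucket counts and computes the cumulative share of operations at or below each bucket. The function names are illustrative, and the code deliberately does not interpret what latency range each bucket index covers.

```python
import re

BUCKET_RE = re.compile(r"\((\d{2}): (\d+)\)")
TOTAL_RE = re.compile(r"\((\d+) total\)")

SAMPLE = [
    "Sep 28 2018 14:58:56 GMT: INFO (info): (hist.c:240) histogram dump: {test}-udf (267911 total) msec",
    "Sep 28 2018 14:58:56 GMT: INFO (info): (hist.c:257) (00: 0000238581) (01: 0000024013) (02: 0000003574) (03: 0000000963)",
    "Sep 28 2018 14:58:56 GMT: INFO (info): (hist.c:257) (04: 0000000572) (05: 0000000174) (06: 0000000033) (07: 0000000001)",
]


def parse_histogram(lines):
    """Return (reported_total, {bucket_index: count}) from a histogram dump."""
    total = None
    buckets = {}
    for line in lines:
        match = TOTAL_RE.search(line)
        if match:
            total = int(match.group(1))
        for idx, count in BUCKET_RE.findall(line):
            buckets[int(idx)] = int(count)
    return total, buckets


def cumulative_share(buckets):
    """Map each bucket index to the fraction of operations at or below it."""
    grand = sum(buckets.values())
    running = 0
    shares = {}
    for idx in sorted(buckets):
        running += buckets[idx]
        shares[idx] = running / grand
    return shares


total, buckets = parse_histogram(SAMPLE)
shares = cumulative_share(buckets)
for idx in sorted(shares):
    print(f"bucket {idx:02d}: {buckets[idx]:>7} ops, cumulative {shares[idx]:.4%}")
```

For the sample shown, the bucket counts add up to the reported total of 267911, and roughly 89% of the operations fall in bucket 00.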
If the system shows very high UDF latency, check the following:
- Make sure Lua caching is enabled (which is the default behavior -- see Lua cache config)
- UDFs generally hold the record lock for a relatively long duration. Check to see if there are hot-keys (a small set of keys that have a lot of UDFs executed on them).
UDF Statistics
You can use asadm to get statistics for a given namespace. For example, for the namespace test:
Admin> show stat namespace for test
UDF related stats:
Number of successfully completed record UDF transactions.
Number of unsuccessful record UDF transactions (other than timeouts). This includes transactions that fail before executing the UDF. See the server log file for more information about the error. The error is also returned to the client.
Number of client UDF transactions that did not happen because the record was filtered out using a predicate expression. Database 4.7 and later.
Number of timed out record UDF transactions. The timeout error is returned to the client.
Number of successfully completed record UDF read transactions.
Number of successfully completed record UDF write transactions.
Number of successfully completed record UDF delete transactions.
Number of unsuccessful record UDF transactions that failed during UDF execution.
Number of internal UDF sub-transactions (these are for UDF background scans and queries) unsuccessful in the transaction layer (not including timeouts).
Number of internal UDF sub-transactions that timed out in the transaction layer.
Number of successfully completed internal UDF sub-transactions.
Number of internal UDF sub-transactions that failed during processing (other than timeouts).
Number of internal UDF sub-transactions that did not happen because the record was filtered out using a predicate expression. Database 4.7 and later.
Number of internal UDF sub-transactions that timed out during processing.
Number of successfully completed internal UDF read sub-transactions.
Number of successfully completed internal UDF write sub-transactions.
Number of successfully completed internal UDF delete sub-transactions.
Number of unsuccessful internal UDF sub-transactions that failed during UDF execution.
Number of retransmits that occurred during client initiated UDF transactions that were being duplicate resolved. This includes retransmits originating on the client as well as proxying nodes.
Number of retransmits that occurred during client initiated UDF transactions that were being replica written. This includes retransmits originating on the client as well as proxying nodes.
Number of retransmits that occurred during client initiated UDF transactions that were being duplicate resolved. Replaced with retransmit_all_udf_dup_res as of Database 4.5.1.5.
retransmit_client_udf_repl_write
Number of retransmits that occurred during client initiated UDF transactions that were being replica written. Replaced with retransmit_all_udf_repl_write as of Database 4.5.1.5.
Number of duplicate resolve retransmissions for UDF sub-transactions.
Number of replica write retransmissions for UDF sub-transactions.
Number of successfully completed scan aggregations.
Number of unsuccessful (non-aborted) scan aggregations.
Number of scan aggregations aborted by the user.
Number of successfully completed UDF background scans.
Number of unsuccessful (non-aborted) UDF background scans.
Number of UDF background scans aborted by the user.
Number of successfully completed UDF background queries. Renamed to query_udf_bg_complete in Database 5.7.
Number of unsuccessful UDF background queries (including aborts). Removed from Database 5.7 onwards. Use query_udf_bg_error + query_udf_bg_abort instead.
Number of successfully completed UDF background queries.
Number of UDF background queries that returned error.
Number of UDF background queries that were aborted.
All of the stats are also available at regular intervals using the ticker in the server log file. The comma-separated values within parentheses for a given UDF stat group are listed in the same order as the descriptions above.
Example log entries:
Nov 09 2018 00:07:11 GMT: INFO (info): (ticker.c:587) {test} client: tsvc (0,0) proxy (0,0,0) read (126,0,1,3,1) write (2886,0,23,2) delete (197,0,1,19,3) udf (35,0,1,4) lang (26,7,0,3)
Nov 09 2018 00:07:11 GMT: INFO (info): (ticker.c:637) {test} batch-sub: tsvc (0,0) proxy (0,0,0) read (768,0,0,41,1)
Nov 09 2018 00:07:11 GMT: INFO (info): (ticker.c:638) early-fail: demarshal 0 tsvc-client 1 tsvc-from-proxy 0 tsvc-batch-sub 0 tsvc-from-proxy-batch-sub 0 tsvc-udf-sub 0 tsvc-ops-sub 0
Nov 09 2018 00:07:11 GMT: INFO (info): (ticker.c:639) {test} from-proxy: tsvc (0,0) read (105,0,1,7) write (2812,0,22,1) delete (188,0,1,16,2) udf (35,0,1,3) lang (26,7,0,3)
Nov 09 2018 00:07:11 GMT: INFO (info): (ticker.c:664) {test} scan: basic (29,0,0) aggr (0,0,0) udf-bg (7,0,0) ops-bg (10,0,0)
Nov 09 2018 00:07:11 GMT: INFO (info): (ticker.c:688) {test} query: basic (20,1) aggr (6,0) udf-bg (1,0) ops-bg (2,0)
Nov 09 2018 00:07:11 GMT: INFO (info): (ticker.c:689) {test} retransmits: migration 0 all-read 0 all-write (0,1) all-delete (0,0) all-udf (0,0) all-batch-sub 0 udf-sub (0,0) ops-sub (0,0)
Nov 09 2018 00:07:11 GMT: INFO (info): (ticker.c:715) {test} udf-sub: tsvc (0,0) udf (2651,0,0,1) lang (52,2498,101,0)
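For monitoring scripts, the parenthesized groups in ticker lines like those above can be pulled out with a small parser. This is a sketch under the assumption that each group has the form `name (n,n,...)` as in the sample. The function name is illustrative, and the code does not assign a meaning to each position within a group.

```python
import re

GROUP_RE = re.compile(r"([a-z][a-z-]*) \(([\d,]+)\)")
CONTEXT_RE = re.compile(r"\{([^}]+)\} ([a-z-]+):")

CLIENT_LINE = (
    "Nov 09 2018 00:07:11 GMT: INFO (info): (ticker.c:587) {test} client: "
    "tsvc (0,0) proxy (0,0,0) read (126,0,1,3,1) write (2886,0,23,2) "
    "delete (197,0,1,19,3) udf (35,0,1,4) lang (26,7,0,3)"
)
UDF_SUB_LINE = (
    "Nov 09 2018 00:07:11 GMT: INFO (info): (ticker.c:715) {test} udf-sub: "
    "tsvc (0,0) udf (2651,0,0,1) lang (52,2498,101,0)"
)


def parse_ticker_line(line):
    """Return (namespace, context, {group: tuple_of_counts}) for one ticker line.

    namespace and context are None when the line has no '{ns} context:' prefix.
    """
    ctx = CONTEXT_RE.search(line)
    namespace, context = (ctx.group(1), ctx.group(2)) if ctx else (None, None)
    groups = {
        name: tuple(int(value) for value in values.split(","))
        for name, values in GROUP_RE.findall(line)
    }
    return namespace, context, groups


for sample in (CLIENT_LINE, UDF_SUB_LINE):
    ns, context, groups = parse_ticker_line(sample)
    print(ns, context, "udf:", groups.get("udf"), "lang:", groups.get("lang"))
```

Comparing the `udf` and `lang` tuples between successive ticker lines gives the change in each counter over the ticker interval.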
List registered UDF modules
Using asadm
Admin> show udfs
~~~~~~~~UDF Modules (2021-01-22 23:12:29 UTC)~~~~~~~~~
Filename| Hash|Type
abc.lua |dceaf7f1acddf1d6e12a1752d499d80cfadfc24b|LUA
bar.lua |591d2536acb21a329040beabfd9bfaf110d35c18|LUA
foo.lua |f6eaf2b22d8b29b3597ef1ad9113d0907425ecd0|LUA
Operational notes
UDF Modules are stored in the following directory path by default:
/opt/aerospike/usr/udf/lua
You can override this using the server configuration, in the mod-lua block:
mod-lua {
user-path /opt/aerospike/usr/udf/lua
}
You must verify that this directory is in sync across the cluster. There are a number of tools you can use to manage this, including configuration management tools.
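One possible way to check that the directory matches across nodes is to build a manifest of file digests on each node and compare the manifests. The sketch below is an illustration only: the function names are hypothetical, SHA-1 is an arbitrary choice that is not claimed to match the Hash column shown by asadm, and collecting manifests from remote nodes is left to whatever tooling you already use.

```python
import hashlib
from pathlib import Path


def udf_manifest(directory):
    """Map each .lua filename in `directory` to the SHA-1 hex digest of its bytes."""
    manifest = {}
    for path in sorted(Path(directory).glob("*.lua")):
        manifest[path.name] = hashlib.sha1(path.read_bytes()).hexdigest()
    return manifest


def manifest_diff(reference, other):
    """Describe how `other` deviates from `reference`."""
    missing = sorted(set(reference) - set(other))   # in reference only
    extra = sorted(set(other) - set(reference))     # in other only
    changed = sorted(
        name for name in set(reference) & set(other)
        if reference[name] != other[name]
    )
    return {"missing": missing, "extra": extra, "changed": changed}


# Hypothetical usage on one node, using the default path from this page:
# print(udf_manifest("/opt/aerospike/usr/udf/lua"))
```

A diff with three empty lists means the two directories hold the same set of Lua files with identical contents.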
Caching behavior
At startup, a node creates a 10-deep cache of Lua execution states for each registered UDF module. When a UDF runs, it uses a cached state if one is available; otherwise, a new state is created for it. When the UDF finishes, its state is returned to the cache if the cache contains fewer than 128 entries; otherwise, the state is destroyed.
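The cache rules above can be illustrated with a toy model. This is not server code. It only counts states, using the 10 initial entries and the 128-entry limit stated above, and the class and function names are made up for the example.

```python
class LuaStateCacheModel:
    """Toy model of the per-module Lua state cache (counts only)."""

    def __init__(self, initial=10, max_entries=128):
        self.max_entries = max_entries
        self.cached = initial     # states currently sitting in the cache
        self.created = initial    # states ever created, including the initial ones
        self.destroyed = 0        # states dropped because the cache was full

    def acquire(self):
        """Take a cached state if one exists, otherwise create a new one."""
        if self.cached > 0:
            self.cached -= 1
        else:
            self.created += 1

    def release(self):
        """Return a state to the cache, or destroy it if the cache is full."""
        if self.cached < self.max_entries:
            self.cached += 1
        else:
            self.destroyed += 1


def simulate_burst(model, concurrent):
    """Start `concurrent` overlapping UDF calls, then let them all finish."""
    for _ in range(concurrent):
        model.acquire()
    for _ in range(concurrent):
        model.release()
    return model


burst = simulate_burst(LuaStateCacheModel(), 200)
print(burst.cached, burst.created, burst.destroyed)
```

In this model, a burst of 200 overlapping calls uses the 10 cached states, creates 190 more, and leaves 128 states cached with 72 destroyed once all calls finish. A burst of 10 or fewer calls creates no new states.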