
User-defined functions (UDFs)

This page describes user-defined functions (UDFs), code provided by a user or developer that runs inside the Aerospike database server.

Overview

You can write code to extend the functionality and performance of the Aerospike Database engine. Aerospike supports only Lua for writing UDFs.

Why Lua?

Lua is a powerful, fast, lightweight, embeddable scripting language. Lua combines simple procedural syntax with powerful data description constructs based on associative arrays and extensible semantics. To reduce latency and increase performance, Aerospike creates a queue of Lua contexts, with no more than one context per thread per registered UDF.

For more information, see the Aerospike UDF guide.

Record UDFs

Record UDFs execute on a single database record. They can create, update, or delete a record.

The Record UDF API provides:

  • Access to the record object, including its bins and metadata (generation, TTL)
  • Easy manipulation of collection data types (CDTs), such as Map and List
  • Easy binary access to blob data types

Record UDFs execute in the main transaction flow. Every Record UDF invocation is part of a flow that accesses a row in the database, creates the Lua Record object, and invokes the Lua function so that the UDF can operate on the record. If the record does not exist, the UDF can create it and then manipulate it.
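
The flow described above can be sketched as a small Python model. This is an illustration only, not Aerospike code: real Record UDFs are written in Lua and run inside the server, and the `Record` class, the `apply_record_udf` helper, the in-memory `store`, and the `increment` function are all hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Tuple

Key = Tuple[str, str, str]  # (namespace, set, user key); illustrative only


@dataclass
class Record:
    """Stand-in for the Lua Record object: bins plus metadata."""
    bins: Dict[str, Any] = field(default_factory=dict)
    generation: int = 0   # 0 means the record does not exist yet
    ttl: int = 0
    deleted: bool = False


def apply_record_udf(store: Dict[Key, Record], key: Key,
                     udf: Callable[..., Any], *args: Any) -> Any:
    """Model the flow: look up the row, build the record object, call the UDF."""
    existing = store.get(key)
    before = dict(existing.bins) if existing else {}
    # The UDF works on a copy, or on an empty record if the row is missing.
    rec = Record(dict(before), existing.generation if existing else 0,
                 existing.ttl if existing else 0)
    result = udf(rec, *args)
    if rec.deleted:
        store.pop(key, None)      # the UDF asked for a delete
    elif rec.bins != before:
        rec.generation += 1       # assumption: every write bumps the generation
        store[key] = rec          # create or update
    return result


def increment(rec: Record, bin_name: str, delta: int) -> int:
    """Hypothetical UDF: create the bin if it is missing, otherwise add to it."""
    rec.bins[bin_name] = rec.bins.get(bin_name, 0) + delta
    return rec.bins[bin_name]


store: Dict[Key, Record] = {}
key = ("test", "demo", "user1")
first = apply_record_udf(store, key, increment, "visits", 1)   # creates the record
second = apply_record_udf(store, key, increment, "visits", 2)  # updates it
```

The same function handles both the missing-record and existing-record cases, which mirrors how a single Record UDF can create or update depending on what it finds.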

For more information, see the Aerospike UDF guide section on Record UDFs.

Stream UDFs

Stream UDFs perform read-only operations on a collection of records. The Stream UDF system supports a MapReduce style of programming, which suits common, highly parallel jobs such as word count: each row is accessed, a list of words and their counts is emitted, and the top results are calculated in a reduce phase. For simple aggregations, where counters are incremented in a context instead of objects being continually created and destroyed, Aerospike provides an optimized implementation.

An important difference between the Aerospike Stream UDF system and typical MapReduce frameworks is that the Aerospike Stream UDF is very low latency and has high reliability in a shared-nothing architecture.

Instead of relying on coordinated job control, the requesting client sends Stream UDF queries to all cluster nodes, and each server manages and prioritizes them. Results return to the requesting client, which performs final operations, such as reduce or final aggregation, before returning the results to the application.

Although this can result in high memory use on the client, the server cluster is not disrupted by a poorly formed query. If lightweight app servers need to make requests that demand a large final reduce stage, you can add an intermediate application server that they call through a REST or SOAP API. That server acts as a coordinating agent and can have a memory profile that fits the application load.
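
The split between per-node work and the final client-side reduce can be sketched with the word count example. This is a plain Python model under stated assumptions, not the Stream UDF API (Stream UDFs are written in Lua): the `node_aggregate` and `client_reduce` functions, the `text` bin, and the three-node `cluster` list are hypothetical.

```python
from collections import Counter
from functools import reduce
from typing import Dict, Iterable, List, Tuple


def node_aggregate(records: Iterable[Dict[str, str]], bin_name: str) -> Counter:
    """Runs on one node: read each record and fold its words into one counter."""
    counts: Counter = Counter()
    for rec in records:
        # Counters are incremented in place; no per-record result objects.
        counts.update(rec.get(bin_name, "").split())
    return counts


def client_reduce(partials: List[Counter], top_n: int) -> List[Tuple[str, int]]:
    """Runs on the client: merge the per-node partials, then keep the top results."""
    merged = reduce(lambda a, b: a + b, partials, Counter())
    return merged.most_common(top_n)


# Hypothetical three-node cluster, each node holding a slice of the records.
cluster = [
    [{"text": "fast fast key value"}],
    [{"text": "key value store"}, {"text": "fast store"}],
    [{"text": "value"}],
]
partials = [node_aggregate(node, "text") for node in cluster]  # one result per node
top = client_reduce(partials, 2)                               # final stage on the client
```

Each node returns only one partial counter, so the client's memory use depends on the size of the partial results, not on the number of records scanned.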

For more information, see the Aerospike UDF guide section on Stream UDFs.

Managing UDF code in a cluster

UDFs are managed by the system metadata (SMD) component of the cluster. When a Lua module is registered, it is sent to only one node. That node forwards the request to the current cluster principal, which compares the incoming version with previous versions. If the incoming version is newer, the principal persists the file to local storage in a user directory and sends it to the remaining cluster nodes.

During registration, the file is interpreted. This allows immediate detection of simple parsing errors. Those errors are returned by the registration API.

See the Aerospike UDF guide section on managing UDFs.

Invoking UDFs

UDFs are organized into modules, where a module is a file containing Lua code that was registered with the database. To invoke a UDF from the client, the developer does the following:

  1. Specify the UDF module name.
  2. Specify the function name within the module to execute.
  3. Specify arguments to the UDF (optional).
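
The three steps above amount to a lookup by module name, then by function name, followed by a call with optional arguments. A minimal Python sketch of that dispatch follows. The `MODULES` registry and the `register` and `invoke` helpers are hypothetical and only model the idea; they are not part of any Aerospike client.

```python
from typing import Any, Callable, Dict

# Hypothetical stand-in for the set of modules registered with the database.
MODULES: Dict[str, Dict[str, Callable[..., Any]]] = {}


def register(module: str, functions: Dict[str, Callable[..., Any]]) -> None:
    """Store a module's functions under its module name."""
    MODULES[module] = dict(functions)


def invoke(module: str, function: str, *args: Any) -> Any:
    """Resolve the module name, then the function name, then call with any args."""
    if module not in MODULES:
        raise LookupError(f"module not registered: {module}")
    functions = MODULES[module]
    if function not in functions:
        raise LookupError(f"no function {function!r} in module {module!r}")
    return functions[function](*args)


register("counters", {"add": lambda a, b: a + b, "zero": lambda: 0})
total = invoke("counters", "add", 2, 3)   # module, function, and arguments
nothing = invoke("counters", "zero")      # arguments are optional
```

The Aerospike Python client exposes a comparable `apply(key, module, function, args)` call for Record UDFs; check the client reference for the exact signature and policies.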

Multicore

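Many other interpreted languages can run only one execution thread per process. For example, CPython relies on global state in its code base, which greatly limits the ability to run multiple Python contexts per process. A Lua state, by contrast, keeps its data in the state object itself, so multiple independent Lua contexts can run on separate threads within one process.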

Gateway to C

Lua code can call C functions directly. Doing so adds measurable overhead, but the mechanism is simple. Aerospike provides examples of compiling and registering a shared C object, then calling one of the C functions directly from a Lua UDF module.

For more information, see Using C functions in UDF modules.

Protection and sandboxing

Aerospike UDFs operate in-process. This maximizes performance, but poorly written UDFs can cause performance problems. Aerospike provides safeguards against some of these errors. For example, it guards against infinite loops by limiting the amount of time spent in a UDF.

Stored procedures vs. UDFs

Stored procedures are commonly used in database systems. Like UDFs, they are user application code that is stored and run inside the database, typically a relational database management system (RDBMS). Stored procedures can read or write one or more records, which makes them general-purpose mechanisms.

UDFs are more limited. They operate either on a single record as Record UDFs, or a selected stream of records as Stream UDFs. Record UDFs behave like a traditional UDF. Stream UDFs manage multiple records, somewhat like stored procedures.
