Configuration Properties for Aerospike Connect for Trino

Use these configuration properties in the aerospike.properties file to specify how the Trino connector should interact with your Aerospike database.

note

Help with tuning Trino for performance improvements is beyond the scope of this documentation.

If you plan to use Docker, you must mount the aerospike.properties file with the --volume or -v option of the docker run command to use these configuration properties. Provide the path to the folder that contains the file. The default path is trino-aerospike.docker/docker/etc/catalog.

In general, when you use Docker you cannot both set environment variables and set these configuration properties. The only exception is when you use TRINO_DISCOVERY_URI and TRINO_NODE_TYPE. You can set those environment variables using the -e option and also mount the aerospike.properties file using the -v option in the same docker run command.

The following sets of configuration properties are available:

Basic configuration properties

aerospike.cache-ttl-ms

Description: Number of milliseconds to keep the inferred schema cached. If you specify the schema, this property has no effect.

Default value: 1800000


aerospike.clientpolicy.clusterName

Description: Name of the Aerospike cluster, if a name has been configured for it.

Default value: null


aerospike.clientpolicy.maxSocketIdle

Description: Maximum socket idle time, in seconds. Socket connection pools discard sockets that have been idle longer than the maximum.

Default value: 55


aerospike.clientpolicy.timeout

Description: How long the connector should wait (in milliseconds) for a response from your Aerospike database when the connector tries to make its initial connection to it.

Default value: 1000


aerospike.clientpolicy.failIfNotConnected

Description: Set to true for the connector to throw an exception if it is unable to connect to any seed nodes for an Aerospike database.

Default value: true


aerospike.clientpolicy.sharedThreadPool

Description: Whether the thread pool is shared with other client instances or classes. If the thread pool is not shared (default), it shuts down when the client instance closes.

Default value: false


aerospike.clientpolicy.useServicesAlternate

Description: Use "services-alternate" instead of "services" in info request during cluster tending.

Default value: false


aerospike.default-set-name

Description: Table name for the default set. Use this property when your namespace has a null set or no sets. If you have multiple namespaces with no sets in your cluster, you can query them like this:

select * from <namespace_1>.<value>
select * from <namespace_2>.<value>

Where <value> is the value assigned to DEFAULT_SET_NAME.

Default value: __default


aerospike.hostlist

Description: Comma-separated list of seed nodes in the Aerospike cluster.

Default value for non-virtual server environments: null

For standalone deployments in Dockerized environments: On macOS, use docker.for.mac.host.internal:3000. On Linux operating systems, use localhost:3000.


aerospike.insert-require-key

Description: Require the primary key (PK) on INSERT queries. Although we recommend that you provide a primary key, you can choose not to by setting this property to false, in which case a UUID is generated for the PK. You can view PKs by setting aerospike.record-key-hidden to false for future queries.

Default value: true


aerospike.record-key-name

Description: Column name for the record's primary key. You can use this name in the queries for projection and/or predicates.

Default value: __key


aerospike.record-key-hidden

Description: If set to false, the primary key column is made available in the result set.

Default value: true


aerospike.record-digest-name

Description: Column name for the record's digest. You can use this name in queries for projection and/or predicates. The digest value must be specified as a string.

Default value: __digest


aerospike.record-digest-hidden

Description: If set to false, the record's digest column will be available in the result set.

Default value: true


aerospike.strict-schemas

Description: Use a strict schema. See "Strict schemas".

Default value: false


aerospike.table-desc-dir

Description: Path of the directory containing table description files. Do not use this configuration parameter for Dockerized environments; instead, use the --volume or -v option in the docker run command, as explained in "Deploying in Standalone Mode in Docker" and "Deploying in Distributed Mode in Docker".

Default value: <trino_directory>/etc/catalog


aerospike.case-insensitive-identifiers

Description: Use this property when you have a namespace or set name with mixed case types in the Aerospike database. This property will help resolve the tables/schemas case-insensitivity issue inherent in Trino that converts all names to lowercase. If turned on, you will be able to use supported SQL statements correctly against table names with mixed case types, e.g. “deepLearning”.

note
  • It does not support two sets with the same name but differing in case types within the same namespace, e.g. sets named “deepLearning” and “deeplearning”.
  • It only works with Trino version 360 or greater.
  • Using it may have some performance implications, hence use it only when you have set names with mixed case types in the Aerospike database.
  • The output of SHOW TABLES and DESCRIBE statements shows names in lower case, regardless of whether mixed-case naming is used in the Aerospike database. Even so, you can use SELECT and other statements with either mixed-case or lower-case names. For example, if deepLearning is the name of a set and Score is the name of a bin in the Aerospike database, SELECT Score FROM deepLearning; works, even though the set name and the column name appear in lower case in the output of SHOW TABLES and DESCRIBE, respectively.

Possible values: true, false.

Default value: false
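
The restriction on two sets whose names differ only in case can be illustrated with a minimal sketch. This is a simplified model of case-insensitive name resolution, not the connector's actual implementation. The function names build_case_insensitive_index and resolve are hypothetical and exist only for this illustration.

```python
def build_case_insensitive_index(set_names):
    """Map lower-case names to the original mixed-case set names.

    Raises ValueError if two names differ only by case, because a
    lower-case lookup could not tell them apart.
    """
    index = {}
    for name in set_names:
        key = name.lower()
        if key in index and index[key] != name:
            raise ValueError(f"ambiguous names: {index[key]!r} and {name!r}")
        index[key] = name
    return index


def resolve(index, requested_name):
    """Return the stored set name for a name typed in any case, or None."""
    return index.get(requested_name.lower())


# Hypothetical set names used only for this illustration.
index = build_case_insensitive_index(["deepLearning", "users"])
print(resolve(index, "deeplearning"))  # deepLearning
print(resolve(index, "DeepLearning"))  # deepLearning
```

In this model, a namespace that contains both deepLearning and deeplearning cannot be indexed, which mirrors the first limitation in the note above.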


aerospike.client-cache-size

Description: Aerospike Java client cache pool size.

Default value: 4


Strict schemas

Because Trino is a SQL engine, it assumes that the underlying data store (Aerospike, in this case) follows a strict schema for all the records within a table. However, as a NoSQL database, Aerospike is schema-less.

Therefore, a single bin (mapped to a Trino column) within a set (mapped to a Trino table) could technically hold values of multiple Aerospike supported types.

The Trino connector reconciles this incompatibility with the help of the aerospike.strict-schemas configuration property:

  1. aerospike.strict-schemas = false (default)

    • If none of the column types in the user-specified schema match the bin types of a record in Aerospike, a record with NULLs is returned in the result set.
    • If the mismatch affects only some of the columns in the user-specified schema, NULL is returned for those columns in the result set. There is no way to differentiate between a NULL due to a missing value in the original data set and a NULL due to a mismatch, so you must treat all NULLs as missing values. The connector automatically filters out of the result set any bins that are not part of the schema.
  2. aerospike.strict-schemas = true

    • If a mismatch between the user-specified schema and the schema of a record in Aerospike is detected at the bin/column level, your query will error out.
    • Use the strict configuration (aerospike.strict-schemas = true) when you have modeled your data in Aerospike to adhere to a strict schema, that is, when each record within the set has the same structure.
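
The two modes can be sketched in a few lines of Python. This is a simplified model of the behavior described above, not the connector's code. The function project_record, the use of Python types to stand in for column types, and the sample record are all assumptions made for illustration.

```python
def project_record(record, schema, strict=False):
    """Project one record (bin name -> value) onto a declared schema.

    schema maps column names to the expected Python type. Bins that are
    not in the schema are dropped. On a type mismatch, non-strict mode
    yields None (NULL) and strict mode raises an error.
    """
    row = {}
    for column, expected_type in schema.items():
        value = record.get(column)
        if value is None or isinstance(value, expected_type):
            row[column] = value
        elif strict:
            raise TypeError(
                f"column {column!r}: expected {expected_type.__name__}, "
                f"got {type(value).__name__}"
            )
        else:
            row[column] = None
    return row


# Hypothetical schema and record used only for this illustration.
schema = {"name": str, "score": int}
record = {"name": "a", "score": "high", "extra": 1.5}

print(project_record(record, schema))  # {'name': 'a', 'score': None}
```

Calling project_record(record, schema, strict=True) on the same record raises TypeError, which corresponds to the query erroring out under aerospike.strict-schemas = true.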

aerospike.clientpolicy.user and aerospike.clientpolicy.password

Description: Authenticates all Trino users to your Aerospike database with this single set of credentials. If you set aerospike.clientpolicy.authMode to INTERNAL, ensure that the user and the password, and the associated roles, are set up in the Aerospike database. See Configuring access control for more information.

To override the username and password that are set in this file, you can authenticate users in Trino sessions by running these commands in the Trino CLI:

SET SESSION <catalog_name>.client_policy_user = '<username>'
SET SESSION <catalog_name>.client_policy_password = '<password>'

where <catalog_name> matches the catalog name of the Aerospike database being authenticated to.

note

When you use the SET SESSION command, the names of these two configuration properties use underscores, not periods, to separate the words that compose them. Also, these are the only properties that you can set with the SET SESSION command.

Default value: null


aerospike.clientpolicy.authMode

Description: Authentication mode to use when values are set for aerospike.clientpolicy.user and aerospike.clientpolicy.password.

Possible values:

  • INTERNAL - Use internal authentication only. The hashed password is stored on the server, and the clear password is not sent. This is the default.
  • EXTERNAL - Use external authentication (such as LDAP). The specific external authentication is configured on the server. If TLS is defined, the clear password is sent on node login using TLS. If TLS is not defined, an exception is thrown.
  • EXTERNAL_INSECURE - Use external authentication (such as LDAP). The specific external authentication is configured on the server. The clear password is sent on node login whether or not TLS is defined. Use this mode only for testing purposes because it does not provide secure authentication.
  • PKI - Authentication and authorization based on a certificate. No username or password needs to be configured. Requires TLS and a client certificate. Requires Aerospike Database 5.7 or later.

Default value: INTERNAL


aerospike.clientpolicy.tls.enabled

Description: Enable secure TLS connection.

Default value: false


aerospike.clientpolicy.tls.storeType

Description: Type of the keystore.

Default value: jks


aerospike.clientpolicy.tls.keystorePath

Description: Keystore file path.

Default value: null


aerospike.clientpolicy.tls.keystorePassword

Description: Keystore password.

Default value: null


aerospike.clientpolicy.tls.keyPassword

Description: Key password.

Default value: null


aerospike.clientpolicy.tls.truststorePath

Description: Truststore file path.

Default value: null


aerospike.clientpolicy.tls.truststorePassword

Description: Truststore password.

Default value: null


aerospike.clientpolicy.tls.forLoginOnly

Description: Use TLS connection only for login authentication.

Default value: false


aerospike.clientpolicy.tls.allowedCiphers

Description: Comma-separated list of allowable TLS ciphers to use for the secure connection.

Default value: Default ciphers defined by JVM


aerospike.clientpolicy.tls.allowedProtocols

Description: Comma-separated list of allowable TLS protocols to use for the secure connection.

Default value: TLSv1.2


aerospike.clientpolicy.connPoolsPerNode

Description: Number of synchronous connection pools used for each node.

If each of your nodes has eight or fewer CPU cores, you can leave this value at the default. However, if each node has more CPU cores, use a higher value to create multiple connection pools per node. Doing so helps to avoid contention among CPU cores for pooled connections.

Default value: 1


aerospike.clientpolicy.maxConnsPerNode

Description: Maximum number of synchronous connections allowed per server node. Increasing this value can help prevent the connector from reaching the maximum number of connections if you run many queries that use parallel scans.

Default value: 300


aerospike.enable-statistics

Description: Generate statistics for Cost-Based Optimization (CBO). Currently, the Trino connector only supports the row count. Ensure that you turn on CBO in Trino.

Default value: false


aerospike.scanpolicy.recordsPerSecond

Description: Limit returned records per second (RPS) rate for each server. A value of 0 specifies that there is no limit. Setting this value higher than 0 throttles the rate at which records are returned.

Default value: 0


aerospike.scanpolicy.maxConcurrentNodes

Description: Maximum number of concurrent requests to server nodes at any point in time. A value of 0 issues requests to all server nodes in parallel.

Default value: 0


aerospike.split-number

Description: Number of Trino splits. Update this property to align with the available resources (CPU threads) in your cluster. The Aerospike connector supports up to Integer.MAX_VALUE splits (that is, 2^31-1 Trino splits) for parallel partition scans by Trino workers.

A split is the unit of parallelism in Trino. Hence, the connector can support up to about 2 billion Trino worker threads (configurable by setting task.max-worker-threads in Trino).

Setting this value too high may cause a drop in performance due to context switching. Aerospike recommends that you set the value of aerospike.split-number to the result of multiplying the number of cores by the number of threads per core.

Default value: 16

Use a value of 4 for Dockerized environments.
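
The recommendation above is a simple product, sketched here for clarity. The function name recommended_split_number and the sample hardware figures are assumptions for illustration. Whether to count the cores of one worker or of the whole cluster depends on your deployment and is not decided by this sketch.

```python
INTEGER_MAX_VALUE = 2**31 - 1  # upper limit on splits stated above


def recommended_split_number(cores, threads_per_core):
    """Return cores * threads_per_core, the suggested aerospike.split-number."""
    if cores < 1 or threads_per_core < 1:
        raise ValueError("cores and threads_per_core must be positive")
    return min(cores * threads_per_core, INTEGER_MAX_VALUE)


# Hypothetical hardware: 8 cores with 2 hardware threads per core.
value = recommended_split_number(8, 2)
print(f"aerospike.split-number={value}")  # aerospike.split-number=16
```

With these sample figures the result happens to equal the default value of 16.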


aerospike.policy.connectTimeout

Description: Socket connect timeout in milliseconds.

Default value: 0


aerospike.policy.socketTimeout

Description: Aerospike socket idle timeout in milliseconds when processing a database command.

Default value: 300000


aerospike.event-group-size

Description: Aerospike Java client Netty event loop group size.

Default value: <number of available CPU cores>


aerospike.eventpolicy.maxCommandsInProcess

Description: Maximum number of async commands that can be processed in each event loop at any point in time.

Default value: 0 (execute all async commands immediately)


aerospike.eventpolicy.maxCommandsInQueue

Description: Maximum number of async commands that can be stored in each event loop's delay queue for later execution.

Default value: 0 (no delay queue limit)


aerospike.eventpolicy.commandsPerEventLoop

Description: Expected number of concurrent asynchronous commands in each event loop that are active at any point in time.

Default value: 256


The Aerospike connector supports query audit logging by using the Trino event listener.

It currently logs timestamp, initiating user (name used in the Trino session or the user's OS name if session name was unspecified), schema name and table name (format is schemaname.tablename), query ID, query status (success/failure), SQL statement, and the number of records that were read or written.

You can enable it by creating a configuration file etc/event-listener.properties with the following properties.

event-listener.name

Description: Set this name to aerospike-audit-log.


audit-log.path

Description: Path of the security audit log file. If you plan to use the default path, make sure that the permissions to create the file in that location exist. Otherwise, the feature will not work.

Default value: etc/log/security.log


audit-log.max-size

Description: Maximum size, in bytes, of a single security audit log file (for example, 128 MB = 134217728).

Default value: 134217728


audit-log.max-history

Description: Maximum number of security audit log files that can be created.

Default value: 24
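
As a rough sizing aid, the two limits can be multiplied to estimate the most disk space the audit logs could occupy. This sketch assumes that both limits are enforced exactly and that the files are not compressed. Neither assumption is confirmed by this documentation, and the function name is hypothetical.

```python
def audit_log_upper_bound_bytes(max_size, max_history):
    """Estimate the most disk space the audit log files could use."""
    return max_size * max_history


DEFAULT_MAX_SIZE = 128 * 1024 * 1024  # 134217728 bytes, the default audit-log.max-size
DEFAULT_MAX_HISTORY = 24              # the default audit-log.max-history

total = audit_log_upper_bound_bytes(DEFAULT_MAX_SIZE, DEFAULT_MAX_HISTORY)
print(total)               # 3221225472
print(total / 1024**3)     # 3.0 (GiB)
```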

note
  • The log file contains one log entry per line, and the values are separated by a tab character. The timestamp uses the ISO 8601 format.
  • The log files are stored in the Trino coordinator.
  • schemaname.tablename is not displayed if the query fails.

info

These properties are available in Aerospike Connect for Trino versions 1.1.0 or later.

When certain error conditions occur as the connector interacts with your Aerospike cluster, the connector can "back off" from the server. "Backing off" means not only to retry the actions that led to the error, but to retry them at exponentially increasing intervals of time.

The duration of the interval before the first retry is specified by the configuration property aerospike.retry-initial-millis. If the database cannot service the request because it is busy, the connector continues attempting the same action after exponentially longer intervals. To compute the length of each successive interval, the connector multiplies the duration of the current interval by the value of the configuration property aerospike.retry-multiplier.

For example, if the initial wait time is 1s (1000 milliseconds) and the multiplier is 2, the retries are attempted at 1s, 2s, 4s, 8s, 16s, 32s, and so on. The connector continues retrying the action until the database can service the request or until the connector reaches the maximum number of retries allowed, which you can specify with aerospike.retry-max-attempts.
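
The arithmetic in this example can be reproduced with a short sketch. The function name backoff_intervals_ms is hypothetical. Its three parameters stand in for aerospike.retry-initial-millis, aerospike.retry-multiplier, and aerospike.retry-max-attempts.

```python
def backoff_intervals_ms(initial_millis, multiplier, max_attempts):
    """Return the wait before each retry, in milliseconds."""
    intervals = []
    wait = initial_millis
    for _ in range(max_attempts):
        intervals.append(wait)
        wait *= multiplier  # each interval is the previous one times the multiplier
    return intervals


print(backoff_intervals_ms(1000, 2, 6))  # [1000, 2000, 4000, 8000, 16000, 32000]
print(backoff_intervals_ms(1000, 2, 3))  # [1000, 2000, 4000]
```

The second call uses the default values of the three properties, which gives a total wait of 7000 milliseconds across all retries.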

The most important of the error conditions that prompt the connector to back off from the server is the violation of a rate quota. The other error conditions are internal error conditions.

Configuration properties

These three configuration properties determine the "back off" behavior of the connector when the connector encounters one of the error conditions:

aerospike.retry-initial-millis

Description: Time to wait (in milliseconds) before retrying for the first time the action that led to the error condition. If the error condition persists after the initial retry, subsequent retries are attempted at intervals that become exponentially longer.

Default value: 1000


aerospike.retry-max-attempts

Description: The maximum number of times to retry an action that led to an error condition. A value of 0 prevents the connector from retrying.

Default value: 3


aerospike.retry-multiplier

Description: The integer by which to multiply the duration of the current wait interval to determine the duration of the next wait interval.

Default value: 2


aerospike.case-insensitive-identifiers

Description: Enables case-insensitive name resolution. Has a minor performance penalty when enabled.

Default value: false


aerospike.sindex-table-name

Description: Name of the table that stores the list of available secondary indexes for a schema.

Default value: __sindex


Back-offs for violations of one or both rate quotas

info

You can use rate quotas with Aerospike Connect for Trino version 1.1.0 or later only when your Aerospike Database Enterprise Edition cluster is at version 5.6 or later.

In your Aerospike database, an administrator can set rate quotas for roles and then assign users to those roles. One rate quota limits the number of reads in terms of records per second, and the other rate quota limits the number of writes, also in terms of records per second. All record accesses are counted towards the quotas: updates, replaces, UDFs, background UDFs, reads, batch reads and scans. Your Aerospike database consequently limits the user to a number of transactions per second. This number consists of the sum of the two rate quotas.

For example, you might set for the role analysts the rate quota of 40,000 records per second for reads, and the rate quota of 40,000 records per second for writes. Then, you might assign the user analyst_1 the role analysts. When a query issued against your Aerospike database by analyst_1 results in a rate of transactions per second that includes a breach of either of these rate quotas, the connector waits before attempting to re-run the stage of the query that violated a rate quota.

The connector might have to retry a query stage more than once because the database might be busy and not have the resources to service the request. If the query stage still does not run after the maximum number of retries, the connector fails the query. In this situation, the user can retry the query at a later time.
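
The retry behavior described in this section can be modeled as a loop. This is a minimal sketch under stated assumptions, not the connector's implementation. The QuotaExceeded exception, the run_with_backoff function, and the flaky_stage callable are hypothetical stand-ins. The default arguments mirror the defaults of aerospike.retry-initial-millis, aerospike.retry-multiplier, and aerospike.retry-max-attempts.

```python
import time


class QuotaExceeded(Exception):
    """Hypothetical stand-in for a rate-quota violation reported by the database."""


def run_with_backoff(action, initial_millis=1000, multiplier=2,
                     max_attempts=3, sleep=time.sleep):
    """Run action, retrying after exponentially longer waits on QuotaExceeded."""
    wait_ms = initial_millis
    retries = 0
    while True:
        try:
            return action()
        except QuotaExceeded:
            if retries >= max_attempts:
                raise  # retries exhausted: the query fails
            sleep(wait_ms / 1000)  # time.sleep takes seconds
            wait_ms *= multiplier
            retries += 1


# Simulated query stage that violates the quota twice, then succeeds.
calls = {"n": 0}


def flaky_stage():
    calls["n"] += 1
    if calls["n"] <= 2:
        raise QuotaExceeded("rate quota violated")
    return "ok"


waits = []  # record the waits instead of actually sleeping
result = run_with_backoff(flaky_stage, sleep=waits.append)
print(result, waits)  # ok [1.0, 2.0]
```

Passing max_attempts=0 makes the first violation fail immediately, which matches the documented meaning of a value of 0 for aerospike.retry-max-attempts.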