Configuration Properties for Aerospike Connect for Trino
Use these configuration properties in the aerospike.properties file to specify how the Trino connector should interact with your Aerospike database.
Help with tuning Trino for performance improvements is beyond the scope of this documentation.
If you plan to use Docker, you must pass the --volume or -v option to the docker run command to use these configuration properties. Use this option to mount the aerospike.properties file by providing the path to the folder that contains this file. The default path is trino-aerospike.docker/docker/etc/catalog.
In general, when you use Docker you cannot both set environment variables and set these configuration properties. The only exceptions are TRINO_DISCOVERY_URI and TRINO_NODE_TYPE. You can set those environment variables using the -e option and also mount the aerospike.properties file using the -v option in the same docker run command.
The following sets of configuration properties are available:
Basic configuration properties​
aerospike.cache-ttl-ms
Description: Number of milliseconds to keep the inferred schema cached. If you specify the schema, this property has no effect.
Default value: 1800000
aerospike.clientpolicy.clusterName
Description: Name of the Aerospike cluster, if a name has been configured for it.
Default value: null
aerospike.clientpolicy.maxSocketIdle
Description: Maximum socket idle time in seconds. Socket connection pools discard sockets that have been idle longer than this maximum.
Default value: 55
aerospike.clientpolicy.timeout
Description: How long the connector should wait (in milliseconds) for a response from your Aerospike database when the connector tries to make its initial connection to it.
Default value: 1000
aerospike.clientpolicy.failIfNotConnected
Description: Set to true for the connector to throw an exception if it is unable to connect to any seed nodes for an Aerospike database.
Default value: true
aerospike.clientpolicy.sharedThreadPool
Description: Whether threadPool is shared with other client instances or classes. If threadPool is not shared (default), threadPool shuts down when the client instance closes.
Default value: false
aerospike.clientpolicy.useServicesAlternate
Description: Use "services-alternate" instead of "services" in info requests during cluster tending.
Default value: false
aerospike.default-set-name
Description: Table name for the default set. Use this property when your namespace has a null set or no sets. If you have multiple namespaces with no sets in your cluster, you can query them like this:
select * from <namespace_1>.<value>
select * from <namespace_2>.<value>
where <value> is the value assigned to aerospike.default-set-name.
Default value: __default
aerospike.hostlist
Description: Comma-separated list of seed nodes in the Aerospike cluster.
Default value for non-virtual server environments: null
For standalone deployments in Dockerized environments: On macOS, use docker.for.mac.host.internal:3000. On Linux operating systems, use localhost:3000.
aerospike.insert-require-key
Description: Require the primary key (PK) on INSERT queries. Although we recommend that you provide a primary key, you can choose not to by setting this property to false, in which case a UUID is generated for the PK. You can view PKs by setting aerospike.record-key-hidden to false for future queries.
Default value: true
aerospike.record-key-name
Description: Column name for the record's primary key. You can use this name in the queries for projection and/or predicates.
Default value: __key
aerospike.record-key-hidden
Description: If set to false, the primary key column is made available in the result set.
Default value: true
aerospike.record-digest-name
Description: Column name for the record's digest. You can use this name in queries for projection and/or predicates; the digest must be specified as a string.
Default value: __digest
aerospike.record-digest-hidden
Description: If set to false, the record's digest column will be available in the result set.
Default value: true
aerospike.strict-schemas
Description: Use a strict schema. See "Strict schemas".
Default value: false
aerospike.table-desc-dir
Description: Path of the directory containing table description files. Do not use this configuration parameter for Dockerized environments; instead, use the --volume or -v option in the docker run command, as explained in "Deploying in Standalone Mode in Docker" and "Deploying in Distributed Mode in Docker".
Default value: <trino_directory>/etc/catalog
aerospike.case-insensitive-identifiers
Description: Use this property when you have a namespace or set name with mixed case types in the Aerospike database. This property will help resolve the tables/schemas case-insensitivity issue inherent in Trino that converts all names to lowercase. If turned on, you will be able to use supported SQL statements correctly against table names with mixed case types, e.g. “deepLearning”.
- It does not support two sets with the same name but differing in case types within the same namespace, e.g. sets named “deepLearning” and “deeplearning”.
- It only works with Trino version 360 or greater.
- Using it may have some performance implications, hence use it only when you have set names with mixed case types in the Aerospike database.
- Although the output of SHOW TABLES and DESCRIBE statements shows lowercase names regardless of whether mixed-case naming is used in the Aerospike database, you can use SELECT and other statements with either mixed-case or lowercase names. For example, if deepLearning and Score are the names of the set and the bin in the Aerospike database, the following query works:
SELECT Score FROM deepLearning;
This is so even though the set and column names appear in lowercase in the output of SHOW TABLES and DESCRIBE statements, respectively.
Possible values: true, false.
Default value: false
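Because this property cannot distinguish two sets whose names differ only in case, it can help to check a namespace for such collisions before you turn it on. The following is a minimal Python sketch, not part of the connector; the function name find_case_collisions is hypothetical, and the list of set names is assumed to have been collected separately from your Aerospike database.

```python
from collections import defaultdict


def find_case_collisions(set_names):
    """Group set names that become identical once lowercased.

    Returns a dict mapping each lowercased name to the sorted list of
    distinct original names, only for names that actually collide.
    """
    groups = defaultdict(list)
    for name in set_names:
        groups[name.lower()].append(name)
    # Keep only lowercased keys that map to more than one distinct name.
    return {
        lowered: sorted(set(originals))
        for lowered, originals in groups.items()
        if len(set(originals)) > 1
    }


# Set names taken from the example in the text, plus one unrelated name.
collisions = find_case_collisions(["deepLearning", "deeplearning", "scores"])
print(collisions)  # {'deeplearning': ['deepLearning', 'deeplearning']}
```

An empty result means that no two set names in the list differ only in case.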
aerospike.client-cache-size
Description: Aerospike Java client cache pool size.
Default value: 4
Strict schemas​
Because Trino is a SQL engine, it assumes that the underlying data store (Aerospike, in this case) follows a strict schema for all the records within a table. However, as a NoSQL database, Aerospike is schema-less.
Therefore, a single bin (mapped to a Trino column) within a set (mapped to a Trino table) could technically hold values of multiple Aerospike supported types.
The Trino connector reconciles this incompatibility with the help of the aerospike.strict-schemas configuration property:
aerospike.strict-schemas=false (default)
- If none of the column types in the user-specified schema match the bin types of a record in Aerospike, a record with NULLs is returned in the result set.
- If the mismatch is limited to a subset of the columns in the user-specified schema, NULL is returned for those columns in the result set. There is no way to differentiate between a NULL due to a missing value in the original data set and a NULL due to a mismatch. Therefore, you have to treat all NULLs as missing values. The connector automatically filters out of the result set any columns that are not part of the schema.
aerospike.strict-schemas=true
- If a mismatch between the user-specified schema and the schema of a record in Aerospike is detected at the bin/column level, your query will error out.
- The strict configuration (aerospike.strict-schemas=true) could be used when you have modeled your data in Aerospike to adhere to a strict schema, that is, when each record within the set has the same structure.
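The two modes can be illustrated with a small Python model. This is only a sketch of the behavior described above, not the connector's implementation: the names reconcile and SchemaMismatchError are hypothetical, Python types stand in for Aerospike bin types, and treating a missing bin as NULL in both modes is an assumption.

```python
class SchemaMismatchError(Exception):
    """Raised by this sketch when strict mode sees a bin/column type mismatch."""


def reconcile(record, schema, strict=False):
    """Project one record (bin name -> value) onto a schema (column -> type).

    Bins that are not in the schema are dropped. In non-strict mode a type
    mismatch yields None (NULL); in strict mode it raises an error.
    """
    row = {}
    for column, expected_type in schema.items():
        value = record.get(column)  # a missing bin is modeled as None (NULL)
        if value is not None and not isinstance(value, expected_type):
            if strict:
                raise SchemaMismatchError(
                    f"column {column!r}: expected {expected_type.__name__}, "
                    f"got {type(value).__name__}"
                )
            value = None  # non-strict: the mismatch becomes NULL
        row[column] = value
    return row


schema = {"score": int, "label": str}
record = {"score": "high", "label": "cat", "extra": 1.5}

print(reconcile(record, schema))  # {'score': None, 'label': 'cat'}
try:
    reconcile(record, schema, strict=True)
except SchemaMismatchError as err:
    print("query fails:", err)
```

In the non-strict result, the NULL in score cannot be told apart from a NULL caused by a missing bin, which is the ambiguity noted above.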
Properties related to security​
aerospike.clientpolicy.user and aerospike.clientpolicy.password
Description: Authenticates all Trino users to your Aerospike database with this single set of credentials. If you set aerospike.clientpolicy.authMode to INTERNAL, ensure that the user and the password, and the associated roles, are set up in the Aerospike database. See Configuring access control for more information.
To override the username and password that are set in this file, you can authenticate users in Trino sessions by running these commands in the Trino CLI:
SET SESSION <catalog_name>.client_policy_user = '<username>'
SET SESSION <catalog_name>.client_policy_password = '<password>'
where <catalog_name> matches the catalog name of the Aerospike database being authenticated to.
When you use the SET SESSION command, the names of these two configuration properties use underscores, not periods, to separate the words that compose them. Also, these are the only properties that you can set with the SET SESSION command.
Default value: null
aerospike.clientpolicy.authMode
Description: Authentication mode to use when values are set for aerospike.clientpolicy.user and aerospike.clientpolicy.password.
Possible values:
- INTERNAL: Use internal authentication only. The hashed password is stored on the server. Do not send clear password. This is the default.
- EXTERNAL: Use external authentication (such as LDAP). Specific external authentication is configured on the server. If TLS is defined, send clear password on node login using TLS. Throw an exception if TLS is not defined.
- EXTERNAL_INSECURE: Use external authentication (such as LDAP). Specific external authentication is configured on the server. Send clear password on node login whether or not TLS is defined. This mode should only be used for testing purposes because it does not provide secure authentication.
- PKI: Authentication and authorization based on a certificate. No username or password needs to be configured. Requires TLS and a client certificate. Requires Aerospike Database 5.7 or later.
Default value: INTERNAL
aerospike.clientpolicy.tls.enabled
Description: Enable secure TLS connection.
Default value: false
aerospike.clientpolicy.tls.storeType
Description: Type of the keystore.
Default value: jks
aerospike.clientpolicy.tls.keystorePath
Description: Keystore file path.
Default value: null
aerospike.clientpolicy.tls.keystorePassword
Description: Keystore password.
Default value: null
aerospike.clientpolicy.tls.keyPassword
Description: Key password.
Default value: null
aerospike.clientpolicy.tls.truststorePath
Description: Truststore file path.
Default value: null
aerospike.clientpolicy.tls.truststorePassword
Description: Truststore password.
Default value: null
aerospike.clientpolicy.tls.forLoginOnly
Description: Use TLS connection only for login authentication.
Default value: false
aerospike.clientpolicy.tls.allowedCiphers
Description: Comma-separated list of allowable TLS ciphers to use for the secure connection.
Default value: Default ciphers defined by JVM
aerospike.clientpolicy.tls.allowedProtocols
Description: Comma-separated list of allowable TLS protocols to use for the secure connection.
Default value: TLSv1.2
Properties related to performance​
aerospike.clientpolicy.connPoolsPerNode
Description: Number of synchronous connection pools used for each node.
If each of your nodes has eight or fewer CPU cores, you can leave this value at the default. However, if each node has more CPU cores, use a higher value to create multiple connection pools per node. Doing so helps to avoid contention among CPU cores for pooled connections.
Default value: 1
aerospike.clientpolicy.maxConnsPerNode
Description: Maximum number of synchronous connections allowed per server node. Increasing this value can help prevent the connector from reaching the maximum number of connections if you run many queries that use parallel scans.
Default value: 300
aerospike.enable-statistics
Description: Generate statistics for Cost-Based Optimization (CBO). Currently, the Trino connector only supports the row count. Ensure that you turn on CBO in Trino.
Default value: false
aerospike.scanpolicy.recordsPerSecond
Description: Limit returned records per second (RPS) rate for each server. A value of 0 specifies that there is no limit. Setting this value higher than 0 throttles the rate at which records are returned.
Default value: 0
aerospike.scanpolicy.maxConcurrentNodes
Description: Maximum number of concurrent requests to server nodes at any point in time. A value of 0 issues requests to all server nodes in parallel.
Default value: 0
aerospike.split-number
Description: Number of Trino splits. Update this property to align with the available resources (CPU threads) in your cluster. The Aerospike connector supports up to Integer.MAX_VALUE splits (that is, 2^31-1 Trino splits) for parallel partition scans by Trino workers.
A split is the unit of parallelism in Trino. Hence, the connector can support up to ~2B Trino worker threads (configurable by setting task.max-worker-threads in Trino).
Setting this value too high may cause a drop in performance due to context switching. Aerospike recommends that you set the value of aerospike.split-number to the result of multiplying the number of cores by the number of threads per core.
Default value: 16
Use a value of 4 for Dockerized environments
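As a worked example of the recommendation above, the following Python sketch multiplies the number of cores by the number of threads per core and caps the result at Integer.MAX_VALUE. The function name recommended_split_number is hypothetical, and the core counts are inputs that you supply for your own hardware; this text does not say whether they should be counted per worker or across the cluster.

```python
MAX_SPLITS = 2**31 - 1  # Integer.MAX_VALUE, the upper bound stated above


def recommended_split_number(cores, threads_per_core):
    """Return cores * threads_per_core, capped at the maximum split count."""
    if cores < 1 or threads_per_core < 1:
        raise ValueError("cores and threads_per_core must be positive")
    return min(cores * threads_per_core, MAX_SPLITS)


# 8 cores with 2 threads per core gives 16, which equals the default value.
print(recommended_split_number(8, 2))
```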
aerospike.policy.connectTimeout
Description: Socket connect timeout in milliseconds.
Default value: 0
aerospike.policy.socketTimeout
Description: Aerospike socket idle timeout in milliseconds when processing a database command.
Default value: 300000
aerospike.event-group-size
Description: Aerospike Java client Netty event loop group size.
Default value: <number of available CPU cores>
aerospike.eventpolicy.maxCommandsInProcess
Description: Maximum number of async commands that can be processed in each event loop at any point in time.
Default value: 0 (execute all async commands immediately)
aerospike.eventpolicy.maxCommandsInQueue
Description: Maximum number of async commands that can be stored in each event loop's delay queue for later execution.
Default value: 0 (no delay queue limit)
aerospike.eventpolicy.commandsPerEventLoop
Description: Expected number of concurrent asynchronous commands in each event loop that are active at any point in time.
Default value: 256
Properties related to audit trail​
The Aerospike connector supports query audit logging by leveraging the Trino event listener.
It currently logs timestamp, initiating user (name used in the Trino session or the user's OS name if session name was unspecified), schema name and table name (format is schemaname.tablename), query ID, query status (success/failure), SQL statement, and the number of records that were read or written.
You can enable it by creating a configuration file etc/event-listener.properties with the following properties.
event-listener.name
Description: Set this name to aerospike-audit-log.
audit-log.path
Description: Path of the security audit log file. If you plan to use the default path, make sure that the permissions needed to create the file in that location exist; otherwise, the feature will not work.
Default value: etc/log/security.log
audit-log.max-size
Description: Maximum size, in bytes, of a single security audit log file (for example, 128 MB = 134217728).
Default value: 134217728
audit-log.max-history
Description: Maximum number of security audit log files that can be created.
Default value: 24
- The log file contains one log entry per line, and the values are separated by a tab character. The timestamp uses the ISO 8601 format.
- The log files are stored on the Trino coordinator.
- schemaname.tablename is not displayed if the query fails.
Properties related to backing off due to error conditions​
These properties are available in Aerospike Connect for Trino versions 1.1.0 or later.
When certain error conditions occur as the connector interacts with your Aerospike cluster, the connector can "back off" from the server. "Backing off" means not only to retry the actions that led to the error, but to retry them at exponentially increasing intervals of time.
The duration of the interval before the first retry is specified by the configuration property aerospike.retry-initial-millis. If the database cannot service the request because it is busy, the connector continues attempting the same action after exponentially longer intervals. To compute the length of each successive interval, the connector multiplies the duration of the current interval by the value of the configuration property aerospike.retry-multiplier.
For example, if the initial wait time is 1s (1000 milliseconds) and the multiplier is 2, the retries are attempted at 1s, 2s, 4s, 8s, 16s, 32s, and so on. The connector continues retrying the action until the database can service the request or until the connector reaches the maximum number of retries allowed, which you can specify with aerospike.retry-max-attempts.
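The arithmetic behind this example can be sketched in a few lines of Python. This is an illustration of the described schedule, not the connector's code: the function name backoff_schedule_ms is hypothetical, its parameters mirror aerospike.retry-initial-millis, aerospike.retry-multiplier, and aerospike.retry-max-attempts, and the values it returns are interpreted as the wait before each retry.

```python
def backoff_schedule_ms(initial_ms=1000, multiplier=2, max_attempts=3):
    """Return the wait (in milliseconds) before each retry, in order."""
    waits = []
    wait = initial_ms
    for _ in range(max_attempts):
        waits.append(wait)
        wait *= multiplier  # each interval is the previous one times the multiplier
    return waits


# With the default values of the three properties:
print(backoff_schedule_ms())  # [1000, 2000, 4000]

# Six retries reproduce the 1s, 2s, 4s, 8s, 16s, 32s sequence from the text:
print(backoff_schedule_ms(1000, 2, 6))  # [1000, 2000, 4000, 8000, 16000, 32000]
```

With the defaults, a request that keeps failing would therefore wait a total of 7 seconds across its three retries before the connector gives up.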
The most important of the error conditions that prompt the connector to back off from the server is the violation of a rate quota. The other error conditions are internal error conditions.
Configuration properties​
The three configuration properties aerospike.retry-initial-millis, aerospike.retry-max-attempts, and aerospike.retry-multiplier determine the "back off" behavior of the connector when the connector encounters one of the error conditions:
aerospike.retry-initial-millis
Description: Time to wait (in milliseconds) before retrying for the first time the action that led to the error condition. If the error condition persists after the initial retry, subsequent retries are attempted at intervals that become exponentially longer.
Default value: 1000
aerospike.retry-max-attempts
Description: The maximum number of times to retry an action that led to an error condition. A value of 0 prevents the connector from retrying.
Default value: 3
aerospike.retry-multiplier
Description: The integer by which to multiply the duration of the current wait interval to determine the duration of the next wait interval.
Default value: 2
aerospike.case-insensitive-identifiers
Description: Enables case-insensitive name resolution. Has minor performance penalty when enabled.
Default value: false
aerospike.sindex-table-name
Description: Name of the table that stores the list of available secondary indexes for a schema.
Default value: __sindex
Back-offs for violations of one or both rate quotas​
You can use rate quotas with Aerospike Connect for Trino version 1.1.0 or later only when your Aerospike Database Enterprise Edition cluster is at version 5.6 or later.
In your Aerospike database, an administrator can set rate quotas for roles and then assign users to those roles. One rate quota limits the number of reads in terms of records per second, and the other rate quota limits the number of writes, also in terms of records per second. All record accesses are counted towards the quotas: updates, replaces, UDFs, background UDFs, reads, batch reads and scans. Your Aerospike database consequently limits the user to a number of transactions per second. This number consists of the sum of the two rate quotas.
For example, for the role analysts you might set a rate quota of 40,000 records per second for reads and a rate quota of 40,000 records per second for writes. Then, you might assign the user analyst_1 the role analysts. When a query issued against your Aerospike database by analyst_1 results in a rate of transactions per second that breaches either of these rate quotas, the connector waits before attempting to re-run the stage of the query that violated a rate quota.
The connector might have to retry a query stage more than once because the database might be busy and not have the resources to service the request. If the query stage still does not run after the maximum number of retries, the connector fails the query. In this situation, the user can retry the query at a later time.
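The retry behavior for a query stage that violates a rate quota can be modeled as a loop with exponential back-off. The following Python sketch is an illustration under stated assumptions, not the connector's implementation: QuotaExceeded and run_with_backoff are hypothetical names, the parameters mirror the three retry properties, and the sleep function is injectable so that the example runs without actually waiting.

```python
import time


class QuotaExceeded(Exception):
    """Hypothetical stand-in for a rate-quota violation reported by the database."""


def run_with_backoff(action, initial_ms=1000, multiplier=2, max_attempts=3,
                     sleep=time.sleep):
    """Run action(), retrying on QuotaExceeded with exponentially longer waits."""
    wait_ms = initial_ms
    retries = 0
    while True:
        try:
            return action()
        except QuotaExceeded:
            if retries >= max_attempts:
                raise  # retries exhausted: the query fails
            sleep(wait_ms / 1000)  # back off before re-running the stage
            wait_ms *= multiplier
            retries += 1


# A query stage that violates the quota twice and then succeeds.
calls = {"n": 0}


def flaky_stage():
    calls["n"] += 1
    if calls["n"] <= 2:
        raise QuotaExceeded("rate quota violated")
    return "stage complete"


waits = []  # record the waits instead of sleeping
result = run_with_backoff(flaky_stage, sleep=waits.append)
print(result, waits)  # stage complete [1.0, 2.0]
```

If the stage kept failing, the loop would re-raise the error after the third retry, which corresponds to the connector failing the query.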