Recommendations for Google Compute Engine
Operating System
We recommend using the latest Ubuntu OS as it includes the most recent optimizations and bug fixes on the Google Compute Engine platform.
Cloud Considerations
Lack of Multicast Support: Cloud providers such as Amazon and Google Compute Engine do not support multicast networking. For these providers, we offer mesh heartbeats, which use point-to-point TCP connections for heartbeat messages.
Network Variability: Network latency on cloud platforms is often inconsistent over time, which can affect heartbeat packet delivery times. For these providers, we recommend setting the heartbeat interval to 150 and the heartbeat timeout to 20 (see the configuration sketch after this list).
Instance Pauses: At times, your cloud instance could be paused by the cloud provider for short durations. For example, Google Compute Engine (GCE) employs live migration, which can pause your instance briefly for maintenance or software updates. These short pauses might cause the other instances in the cluster to consider this instance "dead". We recommend upgrading to server version 3.13 or later to help your cluster recover quickly after any network disruption or cluster change. Refer to paxos-recovery-policy. This policy was introduced with Aerospike server version 3.7.0.1, but until version 3.8.1 it required explicit configuration to auto-reset-master.
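In aerospike.conf, these settings live in the heartbeat stanza; a minimal sketch, with the rest of the stanza elided:
heartbeat {
    ...
    interval 150   # milliseconds between heartbeat packets
    timeout 20     # missed heartbeats after which a node is considered dead
}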
Network Setup
By default, machines within a Google Compute Engine project can communicate freely with each other; the default-allow-internal firewall rule allows this. Aerospike uses TCP port 3000 for communication with clients and ports 3002-3003 for intra-cluster communication. These ports do not need to be open to the rest of the internet. If the database clients are within the same project, there should be no need for a separate firewall rule, as they will be able to connect over port 3000.
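Should clients need to connect from outside the project's network, you can open port 3000 explicitly. A sketch using the gcloud CLI; the rule name and source range are hypothetical and should be restricted to your clients' address range:
# Allow external Aerospike clients to reach port 3000 from a specific range.
gcloud compute firewall-rules create allow-aerospike-clients \
    --allow=tcp:3000 \
    --source-ranges=203.0.113.0/24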
You will also need a port for SSH access to your instances (by default, TCP port 22). This is already open thanks to the default-allow-ssh firewall rule.
All instances in Google Compute Engine are assigned an internal IP address. These internal IP addresses should be used in the mesh heartbeat configuration.
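For example, a mesh heartbeat stanza using internal IP addresses might look like the following, with the recommended timings from above; the addresses are hypothetical placeholders for your nodes' internal IPs:
heartbeat {
    mode mesh                               # mesh heartbeats; GCE does not support multicast
    port 3002                               # intra-cluster heartbeat port
    mesh-seed-address-port 10.128.0.2 3002  # internal IPs of other cluster nodes
    mesh-seed-address-port 10.128.0.3 3002
    interval 150
    timeout 20
}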
Persistence
Persistent Disk Storage
Google Compute Engine provides storage in the form of Persistent Disks, which are network-attached to virtual machine instances. There are two types of Persistent Disks: Standard (HDD) Persistent Disks and Solid-State Drive (SSD) Persistent Disks. The performance of Persistent Disks is closely tied to the size of the disk volumes and the virtual machine instance types.
The Google Compute Engine documentation states:
- IOPS performance limits grow linearly with the size of the persistent disk volume.
- Throughput limits also grow linearly, up to the maximum bandwidth for the virtual machine that the persistent disk is attached to.
- Larger virtual machines have higher bandwidth limits than smaller virtual machines.
We recommend using SSD Persistent Disks instead of Standard (HDD) Persistent Disks for storage engines requiring persistence. Read more about configuring storage engines.
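For example, creating and attaching an SSD persistent disk with the gcloud CLI might look like the following; the disk name, size, instance name, and zone are hypothetical:
# Create a 500 GB SSD persistent disk; IOPS and throughput scale with its size.
gcloud compute disks create aerospike-data --size=500GB --type=pd-ssd --zone=us-central1-a

# Attach it to an instance; it will appear under /dev/disk/by-id/google-aerospike-data.
gcloud compute instances attach-disk aerospike-node-1 \
    --disk=aerospike-data --device-name=aerospike-data --zone=us-central1-a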
Local SSD Storage
In some zones, Google Compute Engine offers a local SSD storage option. This provides very high performance, with high input/output operations per second (IOPS) and low latency compared to the standard and SSD persistent disk options. However, local SSDs are created and destroyed along with the virtual machine instance. Even so, the local SSD storage option can be used judiciously with an Aerospike cluster so that data is always replicated across multiple local SSDs attached to multiple virtual machines in the cluster.
Here is an example configuration snippet:
storage-engine device {
    device /dev/disk/by-id/google-local-disk-0
    ...
}
It does not matter which interface type is chosen, SCSI or NVMe. Ubuntu-based OS deployments are already NVMe-optimized. If deployed through the Marketplace, the base image is scsi-mq optimized as well.
Shadow Device Configuration
Aerospike recommends the shadow device configuration for cloud-based VM deployments such as GCE.
To take advantage of local disks while retaining the persistence guarantee of persistent disks, Aerospike provides the shadow device configuration for the storage engine.
Write throughput is still limited by the instance and storage volume limits, so this strategy gives good results when the percentage of writes is low.
An example config would be as follows:
storage-engine device {
    device /dev/disk/by-id/google-local-ssd-0 /dev/disk/by-id/google-persistent-disk-1
    ...
}
or using the resolved symlinks:
storage-engine device {
    device /dev/nvme0n1 /dev/sdb
    ...
}
The shadow persistent disk device must be at least as large as the primary direct-attached local SSD. Otherwise, when you try to start your Aerospike cluster, it logs the critical error shadow device <device> is smaller than main device
and fails to start. Here is an example of the error message:
May 05 2021 21:31:36 GMT: CRITICAL (drv_ssd): (drv_ssd.c:3290) shadow device /dev/sdc is smaller than main device - 2147483648 < 5368709120
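You can verify the device sizes in bytes before starting the server, for example with lsblk, using the device paths from the snippet above:
lsblk -b -o NAME,SIZE /dev/nvme0n1 /dev/sdb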
For data-in-memory use cases with persistence, it may also be preferable to use a local SSD device alongside a Persistent Disk volume. The benefit here is saving on the IOPS cost incurred by reads during the defragmentation process: reads are performed against the local SSD device, and re-written/defragmented blocks are mirrored directly to the Persistent Disk volume.
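A minimal sketch of such a namespace, assuming the device paths from the earlier example and a server version that supports data-in-memory; the namespace name and memory size are hypothetical:
namespace test {
    replication-factor 2
    memory-size 16G                    # hypothetical sizing; the namespace is held in memory
    storage-engine device {
        device /dev/nvme0n1 /dev/sdb   # local SSD with persistent disk shadow
        data-in-memory true            # client reads served from memory; defrag reads hit the local SSD
    }
}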