Operational guidelines
Follow these guidelines to maintain high availability and optimal performance for your Aerospike Cloud deployment.
- Monitor memory, CPU, replica lag, and storage usage continuously using metrics emitted from the Aerospike Prometheus endpoint.
- Set alerts for significant changes in usage patterns and for when storage, memory, or CPU usage approaches resource limits.
- Scale clusters proactively to ensure sufficient capacity for your workload and prevent service degradation.
- Allow time for data migrations to complete when scaling up a cluster.
- Enable circuit breaker functionality in your Aerospike clients and configure appropriate retry and timeout settings.
- For disaster recovery, configure standby clusters using cross-datacenter replication (XDR) and regularly test failover to measure recovery time for your workload.
- Perform backups during periods of low write activity to minimize impact, and regularly test restores to verify data recovery readiness.
- Maintain secure network connectivity configurations (VPC peering, private endpoints, firewall rules) following Aerospike security recommendations.
- Configure TLS correctly in clients so that communication with the database is encrypted.
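
The alerting guideline above can be illustrated with a small threshold check. This is a minimal sketch, not an Aerospike API: the resource names, the `Threshold` class, the `evaluate` function, and the threshold values are all hypothetical, and it assumes you have already collected usage as fractions of capacity (for example, from your Prometheus queries).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    warn: float      # fraction of capacity that triggers a warning
    critical: float  # fraction of capacity that triggers a critical alert


# Hypothetical resource names and limits (assumptions); tune for your workload.
THRESHOLDS = {
    "storage": Threshold(warn=0.60, critical=0.75),
    "memory": Threshold(warn=0.70, critical=0.85),
    "cpu": Threshold(warn=0.70, critical=0.90),
}


def evaluate(usage: dict[str, float]) -> dict[str, str]:
    """Map each known resource to 'ok', 'warn', or 'critical'."""
    levels = {}
    for resource, fraction in usage.items():
        limit = THRESHOLDS.get(resource)
        if limit is None:
            continue  # ignore resources without a configured threshold
        if fraction >= limit.critical:
            levels[resource] = "critical"
        elif fraction >= limit.warn:
            levels[resource] = "warn"
        else:
            levels[resource] = "ok"
    return levels


if __name__ == "__main__":
    # Illustrative sample values, not real measurements.
    sample = {"storage": 0.62, "memory": 0.50, "cpu": 0.93}
    for resource, level in evaluate(sample).items():
        print(f"{resource}: {level}")
```

Using separate warning and critical levels leaves time to scale the cluster, and for migrations to finish, before a resource limit is actually reached.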