Troubleshooting Aerospike on Kubernetes
This page describes common issues with Aerospike Kubernetes Operator (AKO) deployments and how to resolve them.
Pods stuck in pending state
After an Aerospike cluster has been created or updated, pods might be stuck with “Pending” status, like so:

    kubectl get pods -n aerospike
    NAME              READY   STATUS    RESTARTS   AGE
    aerocluster-0-0   1/1     Pending   0          48s
    aerocluster-0-1   1/1     Pending   0          48s

Use the following steps to find the reason for scheduling failure and confirm that the fix worked.

- Run the kubectl describe command for one of the pending pods. The Events section shows the reason for the pod not being scheduled.

  Command:

      kubectl -n aerospike describe pod aerocluster-0-0

  Example output:

      QoS Class:       Burstable
      Node-Selectors:  <none>
      Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                       node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
      Events:
        Type     Reason            Age                    From               Message
        ----     ------            ----                   ----               -------
        Warning  FailedScheduling  9m27s (x3 over 9m31s)  default-scheduler  0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.
        Warning  FailedScheduling  20s (x9 over 9m23s)    default-scheduler  0/1 nodes are available: 1 node(s) didn't match Pod's node affinity.

- Fix the issue shown in the pod events. Possible reasons include:
  - Storage class incorrect or not created
  - Node(s) didn’t match the pod’s node affinity: invalid zone, region, rack label, or other parameter for the rack configured for this pod
  - Insufficient CPU or memory resources available to schedule more pods

- If the fix changes Kubernetes manifests, apply the updated file so the changes take effect.

      kubectl apply -f CLUSTER_CR.yaml

- Validate that the pod is no longer pending.

      kubectl -n aerospike get pods

  The affected pod should move from Pending to Running.
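When several pods are pending, the describe step can be repeated for each one. The following is a minimal sketch, not part of AKO: the helper names pending_pods and scheduling_failures are hypothetical, and the filters assume the default column layout of kubectl get pods (STATUS in the third column).

```shell
# Reads `kubectl get pods --no-headers` output on stdin and prints the
# name of every pod whose STATUS column (third field) is Pending.
pending_pods() {
  awk '$3 == "Pending" { print $1 }'
}

# Reads `kubectl describe pod` output on stdin and keeps only the
# FailedScheduling event lines. `|| true` avoids a non-zero exit status
# when no such line exists.
scheduling_failures() {
  grep 'FailedScheduling' || true
}

# Example usage against a live cluster (the aerospike namespace is taken
# from the examples on this page):
#   kubectl -n aerospike get pods --no-headers | pending_pods |
#     while read -r pod; do
#       echo "== $pod"
#       kubectl -n aerospike describe pod "$pod" | scheduling_failures
#     done
```

The functions only filter text, so they can be tried on saved command output before running them against a cluster.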
Pods keep crashing
After an Aerospike cluster has been created or updated, pods might be stuck with “Error” or “CrashLoopBackOff” status, like so:

    kubectl get pods -n aerospike
    NAME              READY   STATUS             RESTARTS   AGE
    aerocluster-0-0   1/1     Error              0          48s
    aerocluster-0-1   1/1     CrashLoopBackOff   2          48s

Check the following logs to see if pod initialization failed or the Aerospike instance stopped.

- Check the init container logs.

      kubectl -n aerospike logs aerocluster-0-0 -c aerospike-init

- Check the Aerospike logs.

      kubectl -n aerospike logs aerocluster-0-0 -c aerospike-server

- Fix the issue shown in the logs. Possible reasons include:
  - Missing or incorrect feature-key file: fix by deleting the Aerospike secret and recreating it with the correct feature-key file.
  - Bad Aerospike configuration: AKO tries to validate the configuration before applying it to the cluster. However, it’s still possible to misconfigure the Aerospike instance. The offending parameter is logged in the Aerospike logs. Fix the configuration and apply it again to the cluster. See Aerospike configuration change for details.

- If the fix changes the AerospikeCluster CR, apply the updated file.

      kubectl apply -f CLUSTER_CR.yaml

- Validate that the pod is running and the restart count stops increasing.

      kubectl -n aerospike get pods
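The log checks above can be generated for every crashing pod at once. This is a sketch under assumptions: crash_log_commands is a hypothetical helper, and the --previous flag on the server container is an addition not mentioned on this page. It asks kubectl for the logs of the previous container instance, which can help when the current one has just restarted.

```shell
# Reads `kubectl get pods --no-headers` output on stdin and, for each pod
# whose STATUS (third field) is Error or CrashLoopBackOff, prints the
# kubectl commands that fetch its init and server container logs.
# The commands are only printed, not executed.
crash_log_commands() {
  awk -v ns="$1" '$3 == "Error" || $3 == "CrashLoopBackOff" {
    printf "kubectl -n %s logs %s -c aerospike-init\n", ns, $1
    printf "kubectl -n %s logs %s -c aerospike-server --previous\n", ns, $1
  }'
}

# Example usage:
#   kubectl -n aerospike get pods --no-headers | crash_log_commands aerospike
# Append `| sh` to run the printed commands after reviewing them.
```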
Error connecting to the cluster from outside Kubernetes
If the cluster runs fine as verified by the pod status and asadm, ensure that the firewall allows inbound traffic to the Kubernetes cluster for the Aerospike ports.
- Confirm that the Aerospike pods are running.

      kubectl -n aerospike get pods

- Check the firewall rules for the Aerospike ports. See Port access for details.

- Validate the external connection with asadm. Replace the host, port, username, and password with values for your deployment.

      asadm -h HOST:PORT -U USERNAME -P PASSWORD --services-alternate

  See Connecting with asadm for more connection examples.
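Before involving asadm, a plain TCP probe from the client machine can show whether a firewall is dropping traffic. The sketch below is an assumption-laden illustration: port_open is a hypothetical helper, and it relies on bash's /dev/tcp pseudo-device and the coreutils timeout command being available on the client. A successful connection only shows that the port is reachable, not that Aerospike is answering on it.

```shell
# Returns 0 if a TCP connection to host $1, port $2 succeeds within
# 3 seconds, and non-zero otherwise (refused, filtered, or timed out).
port_open() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Example usage (HOST and PORT are placeholders for your deployment):
#   if port_open HOST PORT; then
#     echo "port reachable"
#   else
#     echo "port blocked or closed"
#   fi
```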
Metadata missing after pod restart
If secondary index definitions, security metadata, or other cluster metadata are missing after a pod restart while namespace data still appears present, the Aerospike work directory might be using ephemeral storage. The work directory contains System MetaData (SMD), which needs persistent file system storage when namespace data is persistent.
Use the following steps to check and fix the AerospikeCluster custom resource (CR).
- Check the CR for the following issues:
  - spec.storage.volumes or spec.rackConfig.racks[].storage.volumes does not mount a persistent volume at the Aerospike work directory path, such as /opt/aerospike.
  - spec.validationPolicy.skipWorkDirValidate is set to true, which skips the AKO check for persistent work directory storage.

- Update your cluster CR file so the work directory uses a persistent volume in the global or rack-level storage configuration. Adding or changing persistent volumes in spec.storage or spec.rackConfig.racks[].storage requires storage scaling because AKO does not add, remove, or change persistent storage volumes dynamically.

- Apply the updated CR file.

      kubectl apply -f CLUSTER_CR.yaml

- Validate that the live CR has work directory storage configured and that skipWorkDirValidate is not set to true.

      kubectl -n aerospike get aerospikecluster AEROSPIKE_CLUSTER_NAME -o yaml

  Confirm that the global or rack-level storage.volumes entry mounts a persistent volume at the work directory path, such as /opt/aerospike.
For more information, see Work directory and System MetaData (SMD) and Validation policy.
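The two CR checks described above can be approximated with a text scan of the live CR. This is a rough sketch, not an AKO tool: check_workdir_cr is a hypothetical helper, it assumes the volume mount path appears in the YAML as a path: key, and it matches text rather than parsing YAML, so it cannot tell whether the matched volume is actually persistent. Treat a clean result as a hint and still read the CR.

```shell
# Reads an AerospikeCluster CR in YAML form on stdin. Prints a warning and
# returns 1 if skipWorkDirValidate is true or if no `path:` entry equals
# the work directory (first argument, default /opt/aerospike).
check_workdir_cr() {
  workdir="${1:-/opt/aerospike}"
  cr=$(cat)
  rc=0
  if printf '%s\n' "$cr" | grep -Eq 'skipWorkDirValidate:[[:space:]]*true'; then
    echo "WARN: skipWorkDirValidate is set to true"
    rc=1
  fi
  # Match the work directory exactly, not subpaths such as /opt/aerospike/data.
  if ! printf '%s\n' "$cr" | grep -Eq "path:[[:space:]]*\"?${workdir}/?\"?[[:space:]]*$"; then
    echo "WARN: no volume path equal to ${workdir} found"
    rc=1
  fi
  return "$rc"
}

# Example usage:
#   kubectl -n aerospike get aerospikecluster AEROSPIKE_CLUSTER_NAME -o yaml |
#     check_workdir_cr /opt/aerospike
```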
Events
Kubernetes events are generated on errors, on resource state changes, and at times for informational messages. Common errors include pod scheduling failures, storage unavailability, missing secrets, and missing storage classes. When the Aerospike cluster is not deployed as expected, events can help you debug the problem and find the cause.
To help troubleshoot issues, AKO generates events indicating state changes, errors and informational messages for the cluster it is working on.
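Because each event carries a type, it can help to look at Warning events first when a deployment misbehaves. A minimal sketch follows, assuming the default kubectl get events column layout (TYPE in the second column). The helper name warning_events is hypothetical.

```shell
# Reads `kubectl get events --no-headers` output on stdin and keeps only
# the rows whose TYPE column (second field) is Warning.
warning_events() {
  awk '$2 == "Warning"'
}

# Example usage:
#   kubectl -n aerospike get events --no-headers | warning_events
# kubectl can also filter on the server side with a field selector:
#   kubectl -n aerospike get events --field-selector type=Warning
```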
Run kubectl get event --namespace [namespace] --field-selector involvedObject.name=[cluster name] to see the events generated by AKO for an Aerospike cluster.
For example, this command displays the events generated for cluster aerocluster in the aerospike namespace:
    kubectl -n aerospike --field-selector involvedObject.name=aerocluster get events
    LAST SEEN   TYPE     REASON               OBJECT                         MESSAGE
    90s         Normal   WaitMigration        aerospikecluster/aerocluster   [rack-0] Waiting for migrations to complete
    92s         Normal   RackRollingRestart   aerospikecluster/aerocluster   [rack-0] Started Rolling restart
    2m8s        Normal   PodWaitSafeDelete    aerospikecluster/aerocluster   [rack-0] Waiting to safely restart Pod aerocluster-0-1
    92s         Normal   PodRestarted         aerospikecluster/aerocluster   [rack-0] Restarted Pod aerocluster-0-1
    92s         Normal   PodWaitSafeDelete    aerospikecluster/aerocluster   [rack-0] Waiting to safely restart Pod aerocluster-0-0
    61s         Normal   PodRestarted         aerospikecluster/aerocluster   [rack-0] Restarted Pod aerocluster-0-0

To see all the events for the cluster’s namespace, run the following command with the namespace. In this example, the namespace is aerospike.

    kubectl -n NAMESPACE_NAME get events
    LAST SEEN   TYPE     REASON               OBJECT                         MESSAGE
    15m         Normal   Killing              pod/aerocluster-0-0            Stopping container aerospike-server
    15m         Normal   Scheduled            pod/aerocluster-0-0            Successfully assigned aerospike/aerocluster-0-0 to ip-10-0-146-203.ap-south-1.compute.internal
    15m         Normal   AddedInterface       pod/aerocluster-0-0            Add eth0 [10.131.1.30/23] from openshift-sdn
    15m         Normal   Pulled               pod/aerocluster-0-0            Container image "docker.io/aerospike/aerospike-kubernetes-init:2.5.1" already present on machine
    15m         Normal   Created              pod/aerocluster-0-0            Created container aerospike-init
    15m         Normal   Started              pod/aerocluster-0-0            Started container aerospike-init
    15m         Normal   Pulled               pod/aerocluster-0-0            Container image "docker.io/aerospike/aerospike-server-enterprise:8.1.2.0" already present on machine
    15m         Normal   Created              pod/aerocluster-0-0            Created container aerospike-server
    15m         Normal   Started              pod/aerocluster-0-0            Started container aerospike-server
    16m         Normal   Killing              pod/aerocluster-0-1            Stopping container aerospike-server
    16m         Normal   Scheduled            pod/aerocluster-0-1            Successfully assigned aerospike/aerocluster-0-1 to ip-10-0-146-203.ap-south-1.compute.internal
    16m         Normal   AddedInterface       pod/aerocluster-0-1            Add eth0 [10.131.1.28/23] from openshift-sdn
    16m         Normal   Pulled               pod/aerocluster-0-1            Container image "docker.io/aerospike/aerospike-kubernetes-init:2.5.1" already present on machine
    16m         Normal   Created              pod/aerocluster-0-1            Created container aerospike-init
    16m         Normal   Started              pod/aerocluster-0-1            Started container aerospike-init
    16m         Normal   Pulled               pod/aerocluster-0-1            Container image "docker.io/aerospike/aerospike-server-enterprise:8.1.2.0" already present on machine
    16m         Normal   Created              pod/aerocluster-0-1            Created container aerospike-server
    16m         Normal   Started              pod/aerocluster-0-1            Started container aerospike-server
    15m         Normal   SuccessfulCreate     statefulset/aerocluster-0      create Pod aerocluster-0-0 in StatefulSet aerocluster-0 successful
    16m         Normal   SuccessfulCreate     statefulset/aerocluster-0      create Pod aerocluster-0-1 in StatefulSet aerocluster-0 successful
    16m         Normal   WaitMigration        aerospikecluster/aerocluster   [rack-0] Waiting for migrations to complete
    16m         Normal   RackRollingRestart   aerospikecluster/aerocluster   [rack-0] Started Rolling restart
    16m         Normal   PodWaitSafeDelete    aerospikecluster/aerocluster   [rack-0] Waiting to safely restart Pod aerocluster-0-1
    16m         Normal   PodRestarted         aerospikecluster/aerocluster   [rack-0] Restarted Pod aerocluster-0-1
    16m         Normal   PodWaitSafeDelete    aerospikecluster/aerocluster   [rack-0] Waiting to safely restart Pod aerocluster-0-0
    15m         Normal   PodRestarted         aerospikecluster/aerocluster   [rack-0] Restarted Pod aerocluster-0-0

Operator logs
AKOCTL collectinfo logs
akoctl is a Krew plugin for the Kubernetes Operator that uses the command collectinfo to collect logs from a cluster or namespace.
See AKOCTL for more information.
List the AKO pods:
    kubectl -n operators get pod | grep aerospike-operator-controller-manager
    aerospike-operator-controller-manager-677f99497c-qrtcl   2/2   Running   0   9h
    aerospike-operator-controller-manager-677f99497c-z9t6v   2/2   Running   0   9h

Using a pod name from the output above, get the log for an AKO pod with the following command:

    kubectl -n operators logs aerospike-operator-controller-manager-677f99497c-qrtcl -c manager

Add the -f flag to follow the logs continuously.

    kubectl -n operators logs -f aerospike-operator-controller-manager-677f99497c-qrtcl -c manager

The series of steps AKO follows to apply user changes is logged along with errors and warnings.
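Operator logs can be long, so a first pass that keeps only lines mentioning errors or warnings may save time. The sketch below makes no assumption about the exact AKO log format beyond the words "error" or "warn" appearing in the relevant lines. The helper name problem_lines is hypothetical.

```shell
# Reads log text on stdin and keeps lines that mention "error" or "warn",
# ignoring case. `|| true` avoids a non-zero exit status when nothing matches.
problem_lines() {
  grep -Ei 'error|warn' || true
}

# Example usage (POD_NAME is a placeholder for a name from the pod list above;
# --since limits the output to recent entries):
#   kubectl -n operators logs POD_NAME -c manager --since=1h | problem_lines
```

Matching lines are easier to interpret alongside their neighbors, so once a suspicious entry is found it is worth reading the surrounding unfiltered log.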