Troubleshooting Aerospike on Kubernetes
Pods stuck in pending state
After an Aerospike cluster has been created or updated, its pods may get stuck in the "Pending" status, for example:
Output:
NAME              READY   STATUS    RESTARTS   AGE
aerocluster-0-0   0/1     Pending   0          48s
aerocluster-0-1   0/1     Pending   0          48s
Describe the pod to find out why it could not be scheduled:
kubectl -n aerospike describe pod aerocluster-0-0
The Events section at the end of the output shows why the pod was not scheduled. For example:
Output:
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age                    From               Message
  ----     ------            ----                   ----               -------
  Warning  FailedScheduling  9m27s (x3 over 9m31s)  default-scheduler  0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.
  Warning  FailedScheduling  20s (x9 over 9m23s)    default-scheduler  0/1 nodes are available: 1 node(s) didn't match Pod's node affinity.
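When several pods are Pending, describing them one at a time gets tedious. The following is a minimal sketch, assuming a bash-compatible shell and the aerospike namespace from the examples above; pending_reasons is a hypothetical helper name, not part of the operator. It lists every pod in the Pending phase and prints only its FailedScheduling events, the same reason shown in the describe output:

```shell
# Hypothetical helper: print the scheduler's FailedScheduling events for every
# Pending pod in a namespace (defaults to "aerospike", as in the examples above).
pending_reasons() {
  local ns="${1:-aerospike}"
  # "-o name" prints each pod as "pod/<name>".
  for pod in $(kubectl -n "${ns}" get pods --field-selector=status.phase=Pending -o name); do
    echo "== ${pod} =="
    # Strip the "pod/" prefix so the bare name can be used in an event field selector.
    kubectl -n "${ns}" get events \
      --field-selector "involvedObject.name=${pod#pod/},reason=FailedScheduling"
  done
}
```

Usage: pending_reasons aerospike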
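Possible reasons include:
- Incorrect or missing storage class (typically reported as "pod has unbound immediate PersistentVolumeClaims"): the storage class is wrong or has not been created. See persistent storage configuration for details.
- "1 node(s) didn't match Pod's node affinity": the zone, region, rackLabel, or a similar setting of the rack configured for this pod is invalid, so no node matches it.
- Insufficient resources: the nodes do not have enough CPU or memory available to schedule more pods.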
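Each of these causes can be checked with standard kubectl commands. The sketch below bundles them into one hypothetical helper. It assumes your nodes carry the well-known topology.kubernetes.io/zone and topology.kubernetes.io/region labels; if your racks select nodes by other labels, change the -L columns to match your rack configuration.

```shell
# Hypothetical helper: collect the usual evidence for Pending Aerospike pods.
# ASSUMPTION: nodes are labeled with the well-known zone/region topology labels.
diagnose_pending() {
  local ns="${1:-aerospike}"
  echo "== Storage classes (is the class named in the cluster spec listed?) =="
  kubectl get storageclass
  echo "== PersistentVolumeClaims in ${ns} (STATUS Pending means unbound) =="
  kubectl -n "${ns}" get pvc
  echo "== Node zone and region labels (compare with the rack configuration) =="
  kubectl get nodes -L topology.kubernetes.io/zone -L topology.kubernetes.io/region
  echo "== CPU and memory already requested on each node =="
  # "|| true" keeps the helper's exit status at 0 if no section is found.
  kubectl describe nodes | grep -A 8 "Allocated resources" || true
}
```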
Pods keep crashing
After an Aerospike cluster has been created or updated, its pods may get stuck in the "Error" or "CrashLoopBackOff" status, for example:
Output:
NAME              READY   STATUS             RESTARTS   AGE
aerocluster-0-0   0/1     Error              0          48s
aerocluster-0-1   0/1     CrashLoopBackOff   2          48s
Check the following logs to find out whether pod initialization failed or the Aerospike Server stopped.
Init logs:
kubectl -n aerospike logs aerocluster-0-0 -c aerospike-init
Server logs:
kubectl -n aerospike logs aerocluster-0-0 -c aerospike-server
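For a pod in CrashLoopBackOff, the server container may have just restarted, so its current log can be nearly empty. kubectl's --previous flag shows the log of the last terminated instance instead. A minimal sketch with a hypothetical helper name, using the example pod and namespace as defaults:

```shell
# Hypothetical helper: show why the containers of an Aerospike pod last exited,
# then print the log of the previous (crashed) aerospike-server instance.
crash_logs() {
  local pod="${1:-aerocluster-0-0}" ns="${2:-aerospike}"
  # Termination reason of the last run of each regular container
  # (for example Error, or OOMKilled when a memory limit was exceeded).
  kubectl -n "${ns}" get pod "${pod}" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{": "}{.lastState.terminated.reason}{"\n"}{end}'
  # Log of the previous container instance; only available after a restart.
  kubectl -n "${ns}" logs "${pod}" -c aerospike-server --previous
}
```

--previous only works once the container has restarted at least once; for a pod in Error with zero restarts, use the plain logs command above.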
Possible reasons include:
- Missing or incorrect feature key file: delete the Aerospike secret and recreate it with the correct feature key file. See Aerospike secrets for details.
- Bad Aerospike configuration: the operator attempts to validate the configuration before applying it to the cluster, but it is still possible to misconfigure the Aerospike server. The server logs name the offending parameter; correct it and apply the configuration to the cluster again. See Aerospike configuration change for details.
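The feature key fix can be scripted. This is a sketch under assumptions: the secret name aerospike-secret, the features.conf file name, and the directory layout are illustrative and must match what your AerospikeCluster spec actually references (see Aerospike secrets). Because the whole secret is replaced, the directory must contain every file the secret is expected to hold, not only the feature key.

```shell
# Hypothetical helper: replace the secret that carries the Aerospike feature key.
# ASSUMPTIONS: secret name "aerospike-secret"; the directory holds features.conf
# plus every other file the cluster spec expects to find in this secret.
recreate_secret() {
  local dir="$1" ns="${2:-aerospike}" name="${3:-aerospike-secret}"
  # Refuse to run if the feature key file is missing from the directory.
  if [ ! -f "${dir}/features.conf" ]; then
    echo "no features.conf found in ${dir}" >&2
    return 1
  fi
  # --ignore-not-found lets this also work when the secret does not exist yet.
  kubectl -n "${ns}" delete secret "${name}" --ignore-not-found
  # Each file in the directory becomes one key of the new secret.
  kubectl -n "${ns}" create secret generic "${name}" --from-file="${dir}"
}
```

Usage: recreate_secret ./secrets aerospike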
Error connecting to the cluster from outside Kubernetes
If the cluster is running correctly, as verified by the pod status and asadm (see connecting with asadm), make sure the firewall allows inbound traffic to the Kubernetes cluster on the Aerospike ports. See Port access for details.
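A quick way to test the firewall from a machine outside Kubernetes is a plain TCP connect. The sketch below assumes an OpenBSD-style nc (netcat) that supports -z and -w; the host argument is a placeholder for the address your client uses, and 3000 is Aerospike's default service port. Aerospike clients open connections to every cluster node they discover, not only the seed address, so repeat the check for each node's externally advertised address.

```shell
# Hypothetical helper: check that a TCP port is reachable from this machine.
# ASSUMPTION: nc supports -z (connect only, send nothing) and -w (timeout in seconds).
check_port() {
  local host="$1" port="${2:-3000}"
  if nc -z -w 3 "${host}" "${port}"; then
    echo "${host}:${port} reachable"
  else
    echo "${host}:${port} NOT reachable (check firewall and security group rules)" >&2
    return 1
  fi
}
```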
Events
Kubernetes generates events for errors, for resource state changes, and sometimes for informational messages.
Common errors include pod scheduling failures, unavailable storage, missing secrets, and missing storage classes.
When the Aerospike cluster is not deployed as expected, events can help you find the cause.
To help troubleshoot issues, the operator also generates events that report state changes, errors, and informational messages for the cluster it manages.
Run kubectl get events --namespace [namespace] --field-selector involvedObject.name=[cluster name] to see the events generated by the operator for an Aerospike cluster. For example, this command displays the events generated for cluster aerocluster in the aerospike namespace:
kubectl -n aerospike get events --field-selector involvedObject.name=aerocluster
Output:
LAST SEEN   TYPE     REASON               OBJECT                          MESSAGE
90s         Normal   WaitMigration        aerospikecluster/aerocluster   [rack-0] Waiting for migrations to complete
92s         Normal   RackRollingRestart   aerospikecluster/aerocluster   [rack-0] Started Rolling restart
2m8s        Normal   PodWaitSafeDelete    aerospikecluster/aerocluster   [rack-0] Waiting to safely restart Pod aerocluster-0-1
92s         Normal   PodRestarted         aerospikecluster/aerocluster   [rack-0] Restarted Pod aerocluster-0-1
92s         Normal   PodWaitSafeDelete    aerospikecluster/aerocluster   [rack-0] Waiting to safely restart Pod aerocluster-0-0
61s         Normal   PodRestarted         aerospikecluster/aerocluster   [rack-0] Restarted Pod aerocluster-0-0
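The listing above is not in chronological order (2m8s appears after 92s). A small sketch, with a hypothetical helper name and the example cluster and namespace as defaults, sorts the cluster's events oldest first using kubectl's --sort-by flag:

```shell
# Hypothetical helper: events for one AerospikeCluster object, oldest first.
cluster_events() {
  local cluster="${1:-aerocluster}" ns="${2:-aerospike}"
  kubectl -n "${ns}" get events \
    --field-selector "involvedObject.name=${cluster}" \
    --sort-by=.lastTimestamp
}
```

To follow events live, for example during a rolling restart, run the unsorted command with --watch added instead.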
To see all the events in the cluster's namespace (aerospike in this case), run the following command:
kubectl -n aerospike get events
Output:
LAST SEEN   TYPE     REASON               OBJECT                          MESSAGE
15m         Normal   Killing              pod/aerocluster-0-0            Stopping container aerospike-server
15m         Normal   Scheduled            pod/aerocluster-0-0            Successfully assigned aerospike/aerocluster-0-0 to ip-10-0-146-203.ap-south-1.compute.internal
15m         Normal   AddedInterface       pod/aerocluster-0-0            Add eth0 [10.131.1.30/23] from openshift-sdn
15m         Normal   Pulled               pod/aerocluster-0-0            Container image "registry.connect.redhat.com/aerospike/aerospike-kubernetes-init:0.0.17" already present on machine
15m         Normal   Created              pod/aerocluster-0-0            Created container aerospike-init
15m         Normal   Started              pod/aerocluster-0-0            Started container aerospike-init
15m         Normal   Pulled               pod/aerocluster-0-0            Container image "registry.connect.redhat.com/aerospike/aerospike-server-enterprise-ubi8:6.1.0.1" already present on machine
15m         Normal   Created              pod/aerocluster-0-0            Created container aerospike-server
15m         Normal   Started              pod/aerocluster-0-0            Started container aerospike-server
16m         Normal   Killing              pod/aerocluster-0-1            Stopping container aerospike-server
16m         Normal   Scheduled            pod/aerocluster-0-1            Successfully assigned aerospike/aerocluster-0-1 to ip-10-0-146-203.ap-south-1.compute.internal
16m         Normal   AddedInterface       pod/aerocluster-0-1            Add eth0 [10.131.1.28/23] from openshift-sdn
16m         Normal   Pulled               pod/aerocluster-0-1            Container image "registry.connect.redhat.com/aerospike/aerospike-kubernetes-init:0.0.17" already present on machine
16m         Normal   Created              pod/aerocluster-0-1            Created container aerospike-init
16m         Normal   Started              pod/aerocluster-0-1            Started container aerospike-init
16m         Normal   Pulled               pod/aerocluster-0-1            Container image "registry.connect.redhat.com/aerospike/aerospike-server-enterprise-ubi8:6.1.0.1" already present on machine
16m         Normal   Created              pod/aerocluster-0-1            Created container aerospike-server
16m         Normal   Started              pod/aerocluster-0-1            Started container aerospike-server
15m         Normal   SuccessfulCreate     statefulset/aerocluster-0      create Pod aerocluster-0-0 in StatefulSet aerocluster-0 successful
16m         Normal   SuccessfulCreate     statefulset/aerocluster-0      create Pod aerocluster-0-1 in StatefulSet aerocluster-0 successful
16m         Normal   WaitMigration        aerospikecluster/aerocluster   [rack-0] Waiting for migrations to complete
16m         Normal   RackRollingRestart   aerospikecluster/aerocluster   [rack-0] Started Rolling restart
16m         Normal   PodWaitSafeDelete    aerospikecluster/aerocluster   [rack-0] Waiting to safely restart Pod aerocluster-0-1
16m         Normal   PodRestarted         aerospikecluster/aerocluster   [rack-0] Restarted Pod aerocluster-0-1
16m         Normal   PodWaitSafeDelete    aerospikecluster/aerocluster   [rack-0] Waiting to safely restart Pod aerocluster-0-0
15m         Normal   PodRestarted         aerospikecluster/aerocluster   [rack-0] Restarted Pod aerocluster-0-0
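In a healthy rollout like the one above, every event has type Normal. When something goes wrong, the relevant lines are usually the Warning events (for example FailedScheduling or FailedMount), which a field selector can isolate. A sketch with a hypothetical helper name:

```shell
# Hypothetical helper: only Warning events in a namespace, oldest first.
warning_events() {
  local ns="${1:-aerospike}"
  kubectl -n "${ns}" get events --field-selector type=Warning --sort-by=.lastTimestamp
}
```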
Operator logs
OpenShift
List the operator pods:
kubectl -n openshift-operators get pod | grep aerospike-operator-controller-manager
Output:
aerospike-operator-controller-manager-677f99497c-qrtcl 2/2 Running 0 9h
aerospike-operator-controller-manager-677f99497c-z9t6v 2/2 Running 0 9h
Get the logs of an operator pod, using one of the pod names above:
kubectl -n openshift-operators logs aerospike-operator-controller-manager-677f99497c-qrtcl -c manager
Add the -f flag to follow the logs continuously.
kubectl -n openshift-operators logs -f aerospike-operator-controller-manager-677f99497c-qrtcl -c manager
The operator logs each step it takes to apply user changes, along with any errors and warnings.
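With two operator replicas it is not always obvious which pod logged a given change, so it can help to dump the recent manager logs of every replica at once. A sketch, assuming the pod name prefix shown above; the helper name is hypothetical and the one-hour window (--since=1h) is an arbitrary choice:

```shell
# Hypothetical helper: recent "manager" container logs from every operator pod.
operator_logs() {
  local ns="${1:-openshift-operators}"
  # "-o name" yields "pod/<name>", which kubectl logs accepts directly.
  for pod in $(kubectl -n "${ns}" get pods -o name | grep aerospike-operator-controller-manager); do
    echo "== ${pod} =="
    kubectl -n "${ns}" logs "${pod}" -c manager --since=1h
  done
}
```

Pass operators instead of openshift-operators on other Kubernetes distributions.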
Other Kubernetes distributions
List the operator pods:
kubectl -n operators get pod | grep aerospike-operator-controller-manager
Output:
aerospike-operator-controller-manager-677f99497c-qrtcl 2/2 Running 0 9h
aerospike-operator-controller-manager-677f99497c-z9t6v 2/2 Running 0 9h
Get the logs of an operator pod, using one of the pod names above:
kubectl -n operators logs aerospike-operator-controller-manager-677f99497c-qrtcl -c manager
Add the -f flag to follow the logs continuously.
kubectl -n operators logs -f aerospike-operator-controller-manager-677f99497c-qrtcl -c manager
The operator logs each step it takes to apply user changes, along with any errors and warnings.
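To narrow a long log down to problems, filter for error and warning lines. This sketch assumes the operator pods belong to a Deployment named aerospike-operator-controller-manager (inferred from the pod names above, not confirmed here); kubectl logs deployment/<name> reads from just one of its pods, which may not be the replica that handled your change. The case-insensitive match is deliberately loose because the exact log format depends on the operator version. The helper name is hypothetical.

```shell
# Hypothetical helper: error and warning lines from the operator's manager container.
# ASSUMPTION: the Deployment is named aerospike-operator-controller-manager.
operator_errors() {
  local ns="${1:-operators}"
  kubectl -n "${ns}" logs deployment/aerospike-operator-controller-manager -c manager --since=1h \
    | grep -iE 'error|warn' || true
}
```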