Troubleshooting
- Pods stuck in pending state
- Pods keep crashing
- Error connecting to the cluster from outside Kubernetes
- Troubleshooting
Pods stuck in pending state
If the pods remain stuck in the "Pending" status after an Aerospike cluster has been created or updated, like so,
NAME READY STATUS RESTARTS AGE
aerocluster-0-0 1/1 Pending 0 48s
aerocluster-0-1 1/1 Pending 0 48s
describe the pod to find the reason for the scheduling failure:
kubectl -n aerospike describe pod aerocluster-0-0
Under the Events section you will find the reason the pod was not scheduled. For example:
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 9m27s (x3 over 9m31s) default-scheduler 0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.
Warning FailedScheduling 20s (x9 over 9m23s) default-scheduler 0/1 nodes are available: 1 node(s) didn't match Pod's node affinity.
Possible reasons are:
- The storage class is incorrect or has not been created. See persistent storage configuration for details.
- 1 node(s) didn't match Pod's node affinity - An invalid zone, region, racklabel, or other parameter is configured for the rack this pod belongs to.
- Insufficient resources - Not enough CPU or memory is available to schedule more pods.
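The causes listed above can each be checked with standard kubectl commands. The following is a minimal sketch, not an official procedure: the function name pending_checks is hypothetical, and the namespace defaults to aerospike only because that is the namespace used in the commands on this page.

```shell
# Sketch: collect the information needed to check each cause listed above.
# Usage: pending_checks [NAMESPACE]   (NAMESPACE defaults to "aerospike")
pending_checks() {
    ns="${1:-aerospike}"

    # Cause 1: storage class missing or wrong, so the PVCs stay unbound.
    echo "== Storage classes =="
    kubectl get storageclass
    echo "== PersistentVolumeClaims in namespace $ns =="
    kubectl -n "$ns" get pvc

    # Cause 2: node labels do not match the rack's zone/region settings.
    echo "== Node labels =="
    kubectl get nodes --show-labels

    # Cause 3: not enough CPU or memory left on the nodes.
    echo "== Allocated resources per node =="
    kubectl describe nodes | grep -A 8 "Allocated resources"
}
```

Run pending_checks with no arguments to inspect the aerospike namespace, then compare the storage class and node labels it prints with the values in your cluster configuration.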
Pods keep crashing
After an Aerospike cluster has been created or updated, the pods may be stuck with "Error" or "CrashLoopBackOff" status, like so:
NAME READY STATUS RESTARTS AGE
aerocluster-0-0 1/1 Error 0 48s
aerocluster-0-1 1/1 CrashLoopBackOff 2 48s
Check the following logs to see if pod initialization failed or the Aerospike Server stopped.
Init logs
kubectl -n aerospike logs aerocluster-0-0 -c aerospike-init
Server logs
kubectl -n aerospike logs aerocluster-0-0 -c aerospike-server
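When a container is in CrashLoopBackOff, the commands above show the logs of the current attempt, which may be nearly empty. A sketch of one way to look at the instance that actually crashed follows. The function name crash_info is hypothetical; the --previous flag and the jsonpath output format are standard kubectl features. The reason lookup covers regular containers only, not init containers.

```shell
# Sketch: show why a container last terminated and what it logged before dying.
# Usage: crash_info POD CONTAINER [NAMESPACE]   (NAMESPACE defaults to "aerospike")
crash_info() {
    pod="$1"
    container="$2"
    ns="${3:-aerospike}"

    # Last termination reason per container (for example Error or OOMKilled).
    kubectl -n "$ns" get pod "$pod" \
        -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.lastState.terminated.reason}{"\n"}{end}'

    # Logs from the previous, terminated instance of the container.
    kubectl -n "$ns" logs "$pod" -c "$container" --previous
}
```

For example, crash_info aerocluster-0-1 aerospike-server prints the termination reason followed by the server logs from the crashed instance.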
Possible reasons are:
- Missing or incorrect feature key file - Fix by deleting the Aerospike secret and recreating it with the correct feature key file. See Aerospike secrets for details.
- Bad Aerospike configuration - The operator tries to validate the configuration before applying it to the cluster. However, it is still possible to misconfigure the Aerospike server. The offending parameter is logged in the server logs and should be fixed and applied again to the cluster. See Aerospike configuration change for details.
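The delete-and-recreate fix for the feature key file might look like the sketch below. The secret name and the directory holding the feature key file are assumptions that you must supply; this page does not state them, so take them from your own deployment or the Aerospike secrets documentation. The function name recreate_secret is hypothetical.

```shell
# Sketch: delete a secret and recreate it from a directory of files.
# Usage: recreate_secret SECRET_NAME SECRET_DIR [NAMESPACE]
# SECRET_NAME and SECRET_DIR are placeholders; NAMESPACE defaults to "aerospike".
recreate_secret() {
    name="$1"
    dir="$2"
    ns="${3:-aerospike}"

    # Remove the old secret; do not fail if it was never created.
    kubectl -n "$ns" delete secret "$name" --ignore-not-found

    # Create the secret again; every file in SECRET_DIR becomes a key.
    kubectl -n "$ns" create secret generic "$name" --from-file="$dir"
}
```

Check the pod status again afterwards to confirm that the server starts with the new feature key file.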
Error connecting to the cluster from outside Kubernetes
If the cluster runs fine, as verified by the pod status and asadm (see connecting with asadm), ensure that the firewall allows inbound traffic to the Kubernetes cluster on the Aerospike ports. See Port access for details.
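A quick way to see whether a firewall is blocking traffic is to attempt a plain TCP connection from the client machine. The following is a sketch only: the function name can_connect is hypothetical, it assumes bash and the timeout utility from GNU coreutils are installed, and the host and port must be the externally reachable address and Aerospike port of your own cluster.

```shell
# Sketch: return 0 if a TCP connection to HOST:PORT succeeds within 5 seconds.
# Usage: can_connect HOST PORT
# Relies on bash's /dev/tcp pseudo-device and on timeout from GNU coreutils.
can_connect() {
    host="$1"
    port="$2"
    timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}
```

For example, run can_connect NODE_ADDRESS PORT && echo reachable || echo blocked, replacing NODE_ADDRESS and PORT with your values. A result of blocked when the pods are healthy suggests a firewall or routing problem rather than a problem with Aerospike itself.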
Troubleshooting
Events
Events provide high-level information about what is happening in the cluster, including incidents such as scheduling failures or unavailable storage.
kubectl -n aerospike get events
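In a busy namespace the event list can be long and is not ordered by time. A minimal sketch of narrowing it down to warnings, oldest first, follows. The function name warning_events is hypothetical; --field-selector and --sort-by are standard kubectl flags.

```shell
# Sketch: list only Warning events, sorted by the time they were last seen.
# Usage: warning_events [NAMESPACE]   (NAMESPACE defaults to "aerospike")
warning_events() {
    ns="${1:-aerospike}"
    kubectl -n "$ns" get events \
        --field-selector type=Warning \
        --sort-by=.lastTimestamp
}
```

The most recent warnings appear at the bottom of the output.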
Operator logs
List the pods and select the operator pod whose READY column shows 1/1. This is the active operator instance, even if you run multiple replicas for high availability.
kubectl -n aerospike get pod
NAME READY STATUS RESTARTS AGE
aerocluster-0-0 2/2 Running 0 133m
aerocluster-0-1 2/2 Running 0 133m
aerocluster-0-2 2/2 Running 0 12m
aerospike-kubernetes-operator-5f44549fc9-hwk8k 0/1 Running 0 15m
aerospike-kubernetes-operator-5f44549fc9-n7vj2 1/1 Running 0 15m
Get the logs:
kubectl -n aerospike logs aerospike-kubernetes-operator-5f44549fc9-n7vj2
Add the -f flag to follow the logs continuously.
kubectl -n aerospike logs -f aerospike-kubernetes-operator-5f44549fc9-n7vj2
The series of steps the operator follows to apply user changes is logged along with debug information, errors, and warnings.
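Picking the active operator pod and scanning its logs for problems can be scripted. The sketch below assumes the operator pod names start with aerospike-kubernetes-operator-, as in the sample output above; adjust the pattern if your deployment is named differently. The function names active_operator_pod and operator_problems are hypothetical.

```shell
# Sketch: print the name of the first operator pod whose READY column is 1/1.
# Usage: active_operator_pod [NAMESPACE]   (NAMESPACE defaults to "aerospike")
active_operator_pod() {
    ns="${1:-aerospike}"
    kubectl -n "$ns" get pod --no-headers |
        awk '$1 ~ /^aerospike-kubernetes-operator-/ && $2 == "1/1" { print $1; exit }'
}

# Sketch: print only the error and warning lines from the active operator's log.
# Usage: operator_problems [NAMESPACE]
operator_problems() {
    ns="${1:-aerospike}"
    pod="$(active_operator_pod "$ns")"
    if [ -z "$pod" ]; then
        echo "no ready operator pod found in namespace $ns" >&2
        return 1
    fi
    # Case-insensitive match, so both "ERROR" and "error" are caught.
    kubectl -n "$ns" logs "$pod" | grep -iE 'error|warn'
}
```

Run operator_problems to see the filtered output. The filter is a plain text match, so it can also hit lines that merely mention the word "error"; read the full log when the context matters.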