Troubleshooting Aerospike on Kubernetes
Pods stuck in pending state
After an Aerospike cluster has been created or updated, the pods may get stuck in the “Pending” status, like so:
Output:
NAME              READY   STATUS    RESTARTS   AGE
aerocluster-0-0   1/1     Pending   0          48s
aerocluster-0-1   1/1     Pending   0          48s
Describe the pod to find the reason for scheduling failure:
kubectl -n aerospike describe pod aerocluster-0-0
Under the events section you will find the reason for the pod not being scheduled. For example:
Output
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age                    From               Message
  ----     ------            ----                   ----               -------
  Warning  FailedScheduling  9m27s (x3 over 9m31s)  default-scheduler  0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.
  Warning  FailedScheduling  20s (x9 over 9m23s)    default-scheduler  0/1 nodes are available: 1 node(s) didn't match Pod's node affinity.
Possible reasons include the following; the checks after this list can help confirm each one:
- Storage class incorrect or not created. See persistent storage configuration for details.
- 1 node(s) didn’t match Pod’s node affinity - Invalid zone, region, rack label, or other parameter for the rack configured for this pod.
- Insufficient resources - not enough CPU or memory available to schedule more pods.
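To confirm which of these applies, a few standard kubectl checks are usually enough. The commands below assume the aerospike namespace used in the examples above; adjust the namespace and resource names to match your deployment.

# Verify that the storage class referenced by the cluster exists
kubectl get storageclass
# Check whether the pod's PersistentVolumeClaims are bound or still pending
kubectl -n aerospike get pvc
# Inspect node labels used for zone, region, and rack affinity
kubectl get nodes --show-labels
# Review allocatable CPU and memory on the nodes
kubectl describe nodes | grep -A 8 "Allocated resources"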
Pods keep crashing
After an Aerospike cluster has been created or updated, the pods may get stuck in the “Error” or “CrashLoopBackOff” status, like so:
Output:
NAME              READY   STATUS             RESTARTS   AGE
aerocluster-0-0   1/1     Error              0          48s
aerocluster-0-1   1/1     CrashLoopBackOff   2          48s
Check the following logs to see if pod initialization failed or the Aerospike Server stopped.
Init logs:
kubectl -n aerospike logs aerocluster-0-0 -c aerospike-init
Server logs:
kubectl -n aerospike logs aerocluster-0-0 -c aerospike-server
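If a container has already restarted, the current logs may not include the original failure. The --previous flag of kubectl logs prints the logs from the previous instance of the container, for example:

kubectl -n aerospike logs aerocluster-0-0 -c aerospike-server --previous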
Possible reasons include:
- Missing or incorrect feature key file - Fix by deleting the Aerospike secret and recreating it with the correct feature key file, as shown in the example after this list. See Aerospike secrets for details.
- Bad Aerospike configuration - AKO tries to validate the configuration before applying it to the cluster. However, it’s still possible to misconfigure the Aerospike server. The offending parameter is logged in the server logs and should be fixed and applied again to the cluster. See Aerospike configuration change for details.
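As a sketch of the feature key fix above, the secret can be deleted and recreated from a corrected file. The secret name aerospike-secret and the file path below are illustrative placeholders; use the secret name and feature key file referenced in your cluster's custom resource.

# Remove the secret containing the missing or invalid feature key file
kubectl -n aerospike delete secret aerospike-secret
# Recreate it from the correct feature key file (the path is a placeholder)
kubectl -n aerospike create secret generic aerospike-secret --from-file=features.conf=/path/to/features.conf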
Error connecting to the cluster from outside Kubernetes
If the cluster is running correctly, as verified by the pod status and asadm (see connecting with asadm), ensure that the firewall allows inbound traffic to the Kubernetes cluster on the Aerospike ports. See Port access for details.
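As a quick check, list the services that expose the cluster and test connectivity from outside Kubernetes. The address below is a placeholder, and port 3000 is the default Aerospike service port; substitute the ports configured for your cluster.

# Show the services and ports exposing the Aerospike cluster
kubectl -n aerospike get svc
# From outside the Kubernetes cluster, verify the service port is reachable
asadm -h <external-ip-or-hostname> -p 3000 -e info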
Events
Kubernetes events are generated on errors, on resource state changes, and at times as informational messages. Common errors include pod scheduling failures, storage unavailability, missing secrets, and missing storage classes. When the Aerospike cluster is not deployed as expected, events can help you debug and find the cause.
To help troubleshoot issues, AKO generates events indicating state changes, errors, and informational messages for the cluster it is working on.
Run kubectl get events --namespace [namespace] --field-selector involvedObject.name=[cluster name] to see the events generated by AKO for an Aerospike cluster. For example, this command displays the events generated for the cluster aerocluster in the aerospike namespace:
kubectl -n aerospike --field-selector involvedObject.name=aerocluster get events
Output:
LAST SEEN   TYPE     REASON               OBJECT                          MESSAGE
90s         Normal   WaitMigration        aerospikecluster/aerocluster    [rack-0] Waiting for migrations to complete
92s         Normal   RackRollingRestart   aerospikecluster/aerocluster    [rack-0] Started Rolling restart
2m8s        Normal   PodWaitSafeDelete    aerospikecluster/aerocluster    [rack-0] Waiting to safely restart Pod aerocluster-0-1
92s         Normal   PodRestarted         aerospikecluster/aerocluster    [rack-0] Restarted Pod aerocluster-0-1
92s         Normal   PodWaitSafeDelete    aerospikecluster/aerocluster    [rack-0] Waiting to safely restart Pod aerocluster-0-0
61s         Normal   PodRestarted         aerospikecluster/aerocluster    [rack-0] Restarted Pod aerocluster-0-0
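To focus on problems only, field selectors can be combined to show just the Warning events for the cluster:

kubectl -n aerospike get events --field-selector involvedObject.name=aerocluster,type=Warning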
To see all the events in the cluster’s namespace (aerospike in this case), run the following command:
kubectl -n aerospike get events
Output:
LAST SEEN   TYPE     REASON               OBJECT                          MESSAGE
15m         Normal   Killing              pod/aerocluster-0-0             Stopping container aerospike-server
15m         Normal   Scheduled            pod/aerocluster-0-0             Successfully assigned aerospike/aerocluster-0-0 to ip-10-0-146-203.ap-south-1.compute.internal
15m         Normal   AddedInterface       pod/aerocluster-0-0             Add eth0 [10.131.1.30/23] from openshift-sdn
15m         Normal   Pulled               pod/aerocluster-0-0             Container image "docker.io/aerospike/aerospike-kubernetes-init:2.2.4" already present on machine
15m         Normal   Created              pod/aerocluster-0-0             Created container aerospike-init
15m         Normal   Started              pod/aerocluster-0-0             Started container aerospike-init
15m         Normal   Pulled               pod/aerocluster-0-0             Container image "docker.io/aerospike/aerospike-server-enterprise:8.0.0.2" already present on machine
15m         Normal   Created              pod/aerocluster-0-0             Created container aerospike-server
15m         Normal   Started              pod/aerocluster-0-0             Started container aerospike-server
16m         Normal   Killing              pod/aerocluster-0-1             Stopping container aerospike-server
16m         Normal   Scheduled            pod/aerocluster-0-1             Successfully assigned aerospike/aerocluster-0-1 to ip-10-0-146-203.ap-south-1.compute.internal
16m         Normal   AddedInterface       pod/aerocluster-0-1             Add eth0 [10.131.1.28/23] from openshift-sdn
16m         Normal   Pulled               pod/aerocluster-0-1             Container image "docker.io/aerospike/aerospike-kubernetes-init:2.2.4" already present on machine
16m         Normal   Created              pod/aerocluster-0-1             Created container aerospike-init
16m         Normal   Started              pod/aerocluster-0-1             Started container aerospike-init
16m         Normal   Pulled               pod/aerocluster-0-1             Container image "docker.io/aerospike/aerospike-server-enterprise:8.0.0.2" already present on machine
16m         Normal   Created              pod/aerocluster-0-1             Created container aerospike-server
16m         Normal   Started              pod/aerocluster-0-1             Started container aerospike-server
15m         Normal   SuccessfulCreate     statefulset/aerocluster-0       create Pod aerocluster-0-0 in StatefulSet aerocluster-0 successful
16m         Normal   SuccessfulCreate     statefulset/aerocluster-0       create Pod aerocluster-0-1 in StatefulSet aerocluster-0 successful
16m         Normal   WaitMigration        aerospikecluster/aerocluster    [rack-0] Waiting for migrations to complete
16m         Normal   RackRollingRestart   aerospikecluster/aerocluster    [rack-0] Started Rolling restart
16m         Normal   PodWaitSafeDelete    aerospikecluster/aerocluster    [rack-0] Waiting to safely restart Pod aerocluster-0-1
16m         Normal   PodRestarted         aerospikecluster/aerocluster    [rack-0] Restarted Pod aerocluster-0-1
16m         Normal   PodWaitSafeDelete    aerospikecluster/aerocluster    [rack-0] Waiting to safely restart Pod aerocluster-0-0
15m         Normal   PodRestarted         aerospikecluster/aerocluster    [rack-0] Restarted Pod aerocluster-0-0
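Event listings are not necessarily ordered by time. Sorting by timestamp, or watching the event stream live while an operation such as a rolling restart is in progress, makes the sequence easier to follow:

kubectl -n aerospike get events --sort-by=.lastTimestamp
kubectl -n aerospike get events --watch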
Operator logs
AKOCTL collectinfo logs
akoctl is a Krew plugin for the Kubernetes Operator that uses the collectinfo command to collect logs from a cluster or namespace. See AKOCTL for more information.
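For illustration, a collectinfo run typically looks like the following once the plugin is installed through Krew. The namespace list and output path flags shown here are assumptions based on common usage and may differ between akoctl versions; verify the exact syntax in the AKOCTL documentation.

# Collect logs and cluster state from the listed namespaces (flags are assumptions; check the AKOCTL docs)
kubectl akoctl collectinfo --namespaces aerospike,operators --path ./collectinfo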
List the AKO pods:
kubectl -n operators get pod | grep aerospike-operator-controller-manager
Output:
aerospike-operator-controller-manager-677f99497c-qrtcl   2/2   Running   0   9h
aerospike-operator-controller-manager-677f99497c-z9t6v   2/2   Running   0   9h
Get the logs for an AKO pod, using one of the pod names listed above, with the following command:
kubectl -n operators logs aerospike-operator-controller-manager-677f99497c-qrtcl -c manager
Add the -f flag to follow the logs continuously:
kubectl -n operators logs -f aerospike-operator-controller-manager-677f99497c-qrtcl -c manager
The series of steps AKO follows to apply user changes is logged, along with errors and warnings.
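When scanning a long operator log for the failing step, filtering for error and warning entries is often enough. This sketch assumes the default log format of the manager container:

kubectl -n operators logs aerospike-operator-controller-manager-677f99497c-qrtcl -c manager | grep -iE "error|warn"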