Version: Operator 3.3.0

Troubleshooting Aerospike on Kubernetes

Pods stuck in pending state

If the pods are stuck in the "Pending" status after an Aerospike cluster has been created or updated, as in this example:

Output:

NAME              READY   STATUS    RESTARTS   AGE
aerocluster-0-0   0/1     Pending   0          48s
aerocluster-0-1   0/1     Pending   0          48s

Describe the pod to find the reason for scheduling failure:

kubectl -n aerospike describe pod aerocluster-0-0

The Events section at the end of the output shows why the pod was not scheduled. For example:

Output:

QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age                    From               Message
  ----     ------            ----                   ----               -------
  Warning  FailedScheduling  9m27s (x3 over 9m31s)  default-scheduler  0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.
  Warning  FailedScheduling  20s (x9 over 9m23s)    default-scheduler  0/1 nodes are available: 1 node(s) didn't match Pod's node affinity.

Possible reasons include:

  • Storage class incorrect or not created ("pod has unbound immediate PersistentVolumeClaims"). See persistent storage configuration for details.
  • Node affinity mismatch ("1 node(s) didn't match Pod's node affinity"): the zone, region, rack label, or other node selection settings configured for the pod's rack do not match any available node.
  • Insufficient resources: the nodes do not have enough free CPU or memory to schedule more pods.
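The three causes above can be checked in one pass. The following is a minimal sketch, assuming kubectl is configured for the cluster and the Aerospike cluster runs in the aerospike namespace. The function name diagnose_pending is hypothetical, and the zone and region columns assume the nodes carry the standard topology.kubernetes.io labels.

```shell
#!/bin/sh
# Hypothetical helper: collect the information needed to explain Pending pods.
# Usage: diagnose_pending [namespace]   (defaults to "aerospike")
diagnose_pending() {
  ns="${1:-aerospike}"

  echo "== Pending pods =="
  kubectl -n "$ns" get pods --field-selector=status.phase=Pending

  echo "== FailedScheduling events =="
  kubectl -n "$ns" get events --field-selector=reason=FailedScheduling

  # Unbound PersistentVolumeClaims usually point to a missing or wrong storage class.
  echo "== PersistentVolumeClaims and storage classes =="
  kubectl -n "$ns" get pvc
  kubectl get storageclass

  # Node affinity: compare these labels with the zone/region configured for the rack.
  echo "== Node zone and region labels =="
  kubectl get nodes -L topology.kubernetes.io/zone -L topology.kubernetes.io/region

  # Insufficient resources: CPU and memory already requested on each node.
  echo "== Node resource allocation =="
  kubectl describe nodes | grep -A 8 "Allocated resources"
}
```

Run it as diagnose_pending aerospike and compare the PVC status, node labels, and allocated requests with the rack and storage settings of the cluster.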

Pods keep crashing

If the pods are stuck in the "Error" or "CrashLoopBackOff" status after an Aerospike cluster has been created or updated, as in this example:

Output:

NAME              READY   STATUS             RESTARTS   AGE
aerocluster-0-0   0/1     Error              0          48s
aerocluster-0-1   0/1     CrashLoopBackOff   2          48s

Check the following logs to find out whether pod initialization failed or the Aerospike server stopped.

Init logs:

kubectl -n aerospike logs aerocluster-0-0 -c aerospike-init

Server logs:

kubectl -n aerospike logs aerocluster-0-0 -c aerospike-server

Possible reasons include:

  • Missing or incorrect feature key file: delete the Aerospike secret and recreate it with the correct feature key file. See Aerospike secrets for details.
  • Bad Aerospike configuration: the operator validates the configuration before applying it to the cluster, but it is still possible to misconfigure the Aerospike server. The server logs name the offending parameter; correct it and apply the configuration to the cluster again. See Aerospike configuration change for details.
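For a pod in CrashLoopBackOff, the server container may already have been restarted by the time you read its logs, so the output of the previous run (kubectl logs --previous) is often where the failure is recorded. The sketch below wraps that together with recreating the feature key secret. The function names, the secret name aerospike-secret, and the ./secrets directory used in the example are hypothetical; use the secret name referenced in your AerospikeCluster spec.

```shell
#!/bin/sh
# Hypothetical helpers for pods stuck in Error or CrashLoopBackOff.

# Print the init logs and the server logs of the last crashed run.
# Usage: crash_logs <pod> [namespace]
crash_logs() {
  pod="$1"
  ns="${2:-aerospike}"
  kubectl -n "$ns" logs "$pod" -c aerospike-init
  # --previous returns logs from the previous (terminated) instance of the
  # container, where the reason for a restart is usually recorded.
  kubectl -n "$ns" logs "$pod" -c aerospike-server --previous
}

# Delete and recreate a secret from every file in a directory
# (for example, a directory containing the feature key file).
# Usage: recreate_secret <namespace> <secret-name> <directory>
recreate_secret() {
  ns="$1"
  name="$2"
  dir="$3"
  kubectl -n "$ns" delete secret "$name" --ignore-not-found
  kubectl -n "$ns" create secret generic "$name" --from-file="$dir"
}
```

For example, crash_logs aerocluster-0-1 followed by recreate_secret aerospike aerospike-secret ./secrets. The --previous flag only works once the container has restarted at least once (RESTARTS greater than 0); for a pod in Error with no restarts, the plain logs command shown earlier is enough. Passing a directory to --from-file stores each file in it as a separate key, so any other files the secret must hold are kept as long as they are in that directory.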

Error connecting to the cluster from outside Kubernetes

If the cluster is healthy, as verified by the pod status and asadm (see connecting with asadm), make sure the firewall allows inbound traffic to the Kubernetes cluster on the Aerospike ports. See Port access for details.
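Before changing firewall rules, it helps to know which addresses and ports clients outside Kubernetes will actually use. This is a minimal sketch, assuming the aerospike namespace; the function name is hypothetical, and the services listed depend on how network access is configured for the cluster.

```shell
#!/bin/sh
# Hypothetical helper: show the services and node addresses an external
# client would connect to, for comparison with firewall rules.
# Usage: external_endpoints [namespace]
external_endpoints() {
  ns="${1:-aerospike}"
  # Service types, cluster/external IPs and (node) ports in the namespace.
  kubectl -n "$ns" get services -o wide
  # Node INTERNAL-IP and EXTERNAL-IP columns.
  kubectl get nodes -o wide
}
```

From a machine outside Kubernetes, a check such as nc -vz <node-external-ip> <port> then shows whether the firewall lets traffic through. Port 3000 is the default Aerospike service port, but the ports in your configuration may differ.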

Events

Kubernetes generates events for errors, resource state changes, and sometimes for informational messages. Common errors include pod scheduling failures, unavailable storage, missing secrets, and missing storage classes. When the Aerospike cluster is not deployed as expected, events can help you find the cause.

To help troubleshoot issues, the Operator generates events indicating state changes, errors, and informational messages for the cluster it manages.

Run kubectl get events --namespace [namespace] --field-selector involvedObject.name=[cluster name] to see the events generated by the operator for an Aerospike cluster. For example, this command displays the events generated for the cluster aerocluster in the aerospike namespace:

kubectl -n aerospike get events --field-selector involvedObject.name=aerocluster

Output:

LAST SEEN   TYPE     REASON               OBJECT                          MESSAGE
90s         Normal   WaitMigration        aerospikecluster/aerocluster   [rack-0] Waiting for migrations to complete
92s         Normal   RackRollingRestart   aerospikecluster/aerocluster   [rack-0] Started Rolling restart
2m8s        Normal   PodWaitSafeDelete    aerospikecluster/aerocluster   [rack-0] Waiting to safely restart Pod aerocluster-0-1
92s         Normal   PodRestarted         aerospikecluster/aerocluster   [rack-0] Restarted Pod aerocluster-0-1
92s         Normal   PodWaitSafeDelete    aerospikecluster/aerocluster   [rack-0] Waiting to safely restart Pod aerocluster-0-0
61s         Normal   PodRestarted         aerospikecluster/aerocluster   [rack-0] Restarted Pod aerocluster-0-0
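During a rolling restart or configuration change it is often enough to look only at Warning events for the cluster, or to watch events as they arrive. The helpers below are a sketch built on standard kubectl options (--field-selector with the type field, --sort-by, --watch); the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers around "kubectl get events" for a single Aerospike cluster.

# Print only Warning events for the cluster, oldest first.
# Usage: cluster_warnings <cluster-name> [namespace]
cluster_warnings() {
  name="$1"
  ns="${2:-aerospike}"
  kubectl -n "$ns" get events \
    --field-selector "involvedObject.name=$name,type=Warning" \
    --sort-by=.lastTimestamp
}

# Stream events for the cluster as they are generated (stop with Ctrl+C).
# Usage: watch_cluster_events <cluster-name> [namespace]
watch_cluster_events() {
  name="$1"
  ns="${2:-aerospike}"
  kubectl -n "$ns" get events --field-selector "involvedObject.name=$name" --watch
}
```

For example, cluster_warnings aerocluster prints only the warnings for the aerocluster cluster, and watch_cluster_events aerocluster keeps streaming until it is interrupted.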

To see all the events for the cluster's namespace (aerospike in this case), run the following command:

kubectl -n aerospike get events

Output:

LAST SEEN   TYPE     REASON               OBJECT                          MESSAGE
15m         Normal   Killing              pod/aerocluster-0-0            Stopping container aerospike-server
15m         Normal   Scheduled            pod/aerocluster-0-0            Successfully assigned aerospike/aerocluster-0-0 to ip-10-0-146-203.ap-south-1.compute.internal
15m         Normal   AddedInterface       pod/aerocluster-0-0            Add eth0 [10.131.1.30/23] from openshift-sdn
15m         Normal   Pulled               pod/aerocluster-0-0            Container image "docker.io/aerospike/aerospike-kubernetes-init:2.0.0" already present on machine
15m         Normal   Created              pod/aerocluster-0-0            Created container aerospike-init
15m         Normal   Started              pod/aerocluster-0-0            Started container aerospike-init
15m         Normal   Pulled               pod/aerocluster-0-0            Container image "docker.io/aerospike/aerospike-server-enterprise:7.1.0.0" already present on machine
15m         Normal   Created              pod/aerocluster-0-0            Created container aerospike-server
15m         Normal   Started              pod/aerocluster-0-0            Started container aerospike-server
16m         Normal   Killing              pod/aerocluster-0-1            Stopping container aerospike-server
16m         Normal   Scheduled            pod/aerocluster-0-1            Successfully assigned aerospike/aerocluster-0-1 to ip-10-0-146-203.ap-south-1.compute.internal
16m         Normal   AddedInterface       pod/aerocluster-0-1            Add eth0 [10.131.1.28/23] from openshift-sdn
16m         Normal   Pulled               pod/aerocluster-0-1            Container image "docker.io/aerospike/aerospike-kubernetes-init:2.2.0" already present on machine
16m         Normal   Created              pod/aerocluster-0-1            Created container aerospike-init
16m         Normal   Started              pod/aerocluster-0-1            Started container aerospike-init
16m         Normal   Pulled               pod/aerocluster-0-1            Container image "docker.io/aerospike/aerospike-server-enterprise:7.1.0.0" already present on machine
16m         Normal   Created              pod/aerocluster-0-1            Created container aerospike-server
16m         Normal   Started              pod/aerocluster-0-1            Started container aerospike-server
15m         Normal   SuccessfulCreate     statefulset/aerocluster-0      create Pod aerocluster-0-0 in StatefulSet aerocluster-0 successful
16m         Normal   SuccessfulCreate     statefulset/aerocluster-0      create Pod aerocluster-0-1 in StatefulSet aerocluster-0 successful
16m         Normal   WaitMigration        aerospikecluster/aerocluster   [rack-0] Waiting for migrations to complete
16m         Normal   RackRollingRestart   aerospikecluster/aerocluster   [rack-0] Started Rolling restart
16m         Normal   PodWaitSafeDelete    aerospikecluster/aerocluster   [rack-0] Waiting to safely restart Pod aerocluster-0-1
16m         Normal   PodRestarted         aerospikecluster/aerocluster   [rack-0] Restarted Pod aerocluster-0-1
16m         Normal   PodWaitSafeDelete    aerospikecluster/aerocluster   [rack-0] Waiting to safely restart Pod aerocluster-0-0
15m         Normal   PodRestarted         aerospikecluster/aerocluster   [rack-0] Restarted Pod aerocluster-0-0
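kubectl get events does not sort its output by time, which is why the ages in the listing above are interleaved, and a long namespace listing is hard to scan. The sketch below sorts the namespace events by their last timestamp and counts events per reason. The function names are hypothetical, and the counting step assumes the default column layout shown above, where REASON is the third column.

```shell
#!/bin/sh
# Hypothetical helpers for namespace-wide events.

# All events in the namespace, sorted by when they last occurred.
# Usage: namespace_events [namespace]
namespace_events() {
  ns="${1:-aerospike}"
  kubectl -n "$ns" get events --sort-by=.lastTimestamp
}

# Count events per REASON (third column of the default output), most frequent first.
# Usage: event_reason_counts [namespace]
event_reason_counts() {
  ns="${1:-aerospike}"
  kubectl -n "$ns" get events --no-headers \
    | awk '{ print $3 }' \
    | sort | uniq -c | sort -rn
}
```

A reason that dominates the counts, such as FailedScheduling or BackOff, is usually the first thing to investigate.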

Operator logs

AKOCTL collectinfo logs

akoctl is a Krew plugin for the Aerospike Kubernetes Operator. Its collectinfo command collects logs from a cluster or namespace.

See AKOCTL for more information.

OpenShift

List the operator pods:

kubectl -n openshift-operators get pod | grep aerospike-operator-controller-manager

Output:

aerospike-operator-controller-manager-677f99497c-qrtcl   2/2     Running   0          9h
aerospike-operator-controller-manager-677f99497c-z9t6v   2/2     Running   0          9h

To get the log of an operator pod, pass one of the pod names above to the following command:

kubectl -n openshift-operators logs aerospike-operator-controller-manager-677f99497c-qrtcl -c manager

Add the -f flag to follow the logs continuously.

kubectl -n openshift-operators logs -f aerospike-operator-controller-manager-677f99497c-qrtcl -c manager

The operator logs each step it takes to apply user changes, along with any errors and warnings.
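Copying pod names by hand is error-prone once the operator pods are replaced. The loop below is a sketch that prints recent manager logs from every operator pod; the function name and the default one-hour window are assumptions. With two replicas, if the operator uses leader election, as controller-runtime based operators commonly do, likely only one pod carries the reconcile log, and printing both avoids guessing which one.

```shell
#!/bin/sh
# Hypothetical helper: print recent "manager" container logs from every
# Aerospike operator pod, without copying pod names by hand.
# Usage: operator_logs [namespace] [since]
#   e.g. operator_logs openshift-operators 30m
operator_logs() {
  ns="${1:-openshift-operators}"
  since="${2:-1h}"
  # "-o name" prints entries as pod/<name>, which kubectl logs accepts directly.
  for pod in $(kubectl -n "$ns" get pods -o name | grep aerospike-operator-controller-manager); do
    echo "== $pod =="
    kubectl -n "$ns" logs "$pod" -c manager --since="$since"
  done
}
```

The same function works for other distributions by passing the operator namespace, for example operator_logs operators 30m.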

Other Kubernetes distributions

List the operator pods:

kubectl -n operators get pod | grep aerospike-operator-controller-manager

Output:

aerospike-operator-controller-manager-677f99497c-qrtcl   2/2     Running   0          9h
aerospike-operator-controller-manager-677f99497c-z9t6v   2/2     Running   0          9h

To get the log of an operator pod, pass one of the pod names above to the following command:

kubectl -n operators logs aerospike-operator-controller-manager-677f99497c-qrtcl -c manager

Add the -f flag to follow the logs continuously.

kubectl -n operators logs -f aerospike-operator-controller-manager-677f99497c-qrtcl -c manager

The operator logs each step it takes to apply user changes, along with any errors and warnings.
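To follow only problems as they happen, the log stream can be filtered. This is a sketch, assuming the deployment is named aerospike-operator-controller-manager (inferred from the pod names above) and that log levels appear as text such as ERROR or WARN; the function name is hypothetical. When given deployment/<name>, kubectl logs picks one of the deployment's pods, so if the output stays quiet, follow the other pod by name.

```shell
#!/bin/sh
# Hypothetical helper: follow the operator's manager log and keep only lines
# that mention errors or warnings.
# ASSUMPTION: the deployment is named aerospike-operator-controller-manager.
# Usage: operator_errors [namespace]
operator_errors() {
  ns="${1:-operators}"
  # With deployment/<name>, kubectl logs picks one of the deployment's pods.
  kubectl -n "$ns" logs -f deployment/aerospike-operator-controller-manager -c manager \
    | grep -i -E "error|warn"
}
```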