Troubleshooting Aerospike on Kubernetes

This page describes common issues with Aerospike Kubernetes Operator (AKO) deployments and how to resolve them.

Pods stuck in pending state

After an Aerospike cluster is created or updated, pods might remain stuck in the “Pending” status, as in the following example:

Command
Terminal window
kubectl get pods -n aerospike
Example output
NAME READY STATUS RESTARTS AGE
aerocluster-0-0 0/1 Pending 0 48s
aerocluster-0-1 0/1 Pending 0 48s

Use the following steps to find the reason for scheduling failure and confirm that the fix worked.

  1. Run the kubectl describe command for one of the pending pods. The Events section shows the reason for the pod not being scheduled.

    Command
    Terminal window
    kubectl -n aerospike describe pod aerocluster-0-0
    Example output
    QoS Class: Burstable
    Node-Selectors: <none>
    Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
    node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
    Events:
    Type Reason Age From Message
    ---- ------ ---- ---- -------
    Warning FailedScheduling 9m27s (x3 over 9m31s) default-scheduler 0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.
    Warning FailedScheduling 20s (x9 over 9m23s) default-scheduler 0/1 nodes are available: 1 node(s) didn't match Pod's node affinity.
  2. Fix the issue shown in the pod events. Possible reasons include:

    • Storage class incorrect or not created
    • Node(s) didn’t match the pod’s node affinity - invalid zone, region, rack label, or other parameter for the rack configured for this pod
    • Insufficient CPU or memory resources available to schedule more pods
  3. If the fix changes Kubernetes manifests, apply the updated file so the changes take effect.

    Terminal window
    kubectl apply -f CLUSTER_CR.yaml
  4. Validate that the pod is no longer pending.

    Terminal window
    kubectl -n aerospike get pods

    The affected pod should move from Pending to Running.
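
When several pods are pending, the describe step can be repeated for each one. The following is a minimal sketch rather than an AKO feature: `pending_pod_events` is a hypothetical helper name, and the default namespace `aerospike` is taken from the examples on this page. It lists pods whose phase is Pending and prints the FailedScheduling events recorded for each.

```shell
# Hypothetical helper: print FailedScheduling events for each Pending pod.
# Assumes kubectl is already configured for the target cluster.
pending_pod_events() {
  ns="${1:-aerospike}"   # namespace; defaults to the one used on this page
  kubectl -n "$ns" get pods --field-selector=status.phase=Pending -o name |
  while read -r pod; do
    name="${pod#pod/}"   # strip the "pod/" prefix that -o name adds
    echo "== $name"
    kubectl -n "$ns" get events \
      --field-selector "involvedObject.kind=Pod,involvedObject.name=$name,reason=FailedScheduling"
  done
}

# Usage: pending_pod_events aerospike
```

The Pending phase also covers pods that are already scheduled but still pulling images or running init containers, so a pod listed here might have no FailedScheduling events. After applying a fix, `kubectl -n aerospike wait --for=condition=Ready pod/aerocluster-0-0 --timeout=300s` blocks until the pod is ready or the timeout expires.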

Pods keep crashing

After an Aerospike cluster is created or updated, pods might remain stuck in the “Error” or “CrashLoopBackOff” status, as in the following example:

Command
Terminal window
kubectl get pods -n aerospike
Example output
NAME READY STATUS RESTARTS AGE
aerocluster-0-0 0/1 Error 0 48s
aerocluster-0-1 0/1 CrashLoopBackOff 2 48s

Check the following logs to see if pod initialization failed or the Aerospike instance stopped.

  1. Check the init container logs.

    Terminal window
    kubectl -n aerospike logs aerocluster-0-0 -c aerospike-init
  2. Check the Aerospike logs.

    Terminal window
    kubectl -n aerospike logs aerocluster-0-0 -c aerospike-server
  3. Fix the issue shown in the logs. Possible reasons include:

    • Missing or incorrect feature-key file - Fix by deleting the Aerospike secret and recreating it with the correct feature-key file.
    • Bad Aerospike configuration - AKO tries to validate the configuration before applying it to the cluster. However, it’s still possible to misconfigure the Aerospike instance. The offending parameter is logged in the Aerospike logs. Fix the configuration and apply it again to the cluster. See Aerospike configuration change for details.
  4. If the fix changes the AerospikeCluster CR, apply the updated file.

    Terminal window
    kubectl apply -f CLUSTER_CR.yaml
  5. Validate that the pod is running and the restart count stops increasing.

    Terminal window
    kubectl -n aerospike get pods
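
For a pod in CrashLoopBackOff, the current container might have just restarted and logged very little, so the logs of the previous run are often more useful. The sketch below is one possible way to combine the log and validation steps above. `crash_logs` is a hypothetical helper name, and the container names are the ones used in the commands on this page.

```shell
# Hypothetical helper: show recent logs from both containers of a crashing pod,
# then print the restart count of each container.
crash_logs() {
  ns="$1"; pod="$2"
  for c in aerospike-init aerospike-server; do
    echo "== $c"
    # --previous reads the last terminated run; fall back to the current run
    kubectl -n "$ns" logs "$pod" -c "$c" --previous --tail=50 2>/dev/null ||
      kubectl -n "$ns" logs "$pod" -c "$c" --tail=50
  done
  # Restart count per container; it should stop increasing after the fix
  kubectl -n "$ns" get pod "$pod" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.restartCount}{"\n"}{end}'
}

# Usage: crash_logs aerospike aerocluster-0-1
```

Running the helper twice a few minutes apart shows whether the restart count is still climbing.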

Error connecting to the cluster from outside Kubernetes

If pod status and asadm show that the cluster is healthy but clients outside Kubernetes cannot connect, ensure that the firewall allows inbound traffic to the Kubernetes cluster on the Aerospike ports.

  1. Confirm that the Aerospike pods are running.

    Terminal window
    kubectl -n aerospike get pods
  2. Check the firewall rules for the Aerospike ports. See Port access for details.

  3. Validate the external connection with asadm. Replace the host, port, username, and password with values for your deployment.

    Terminal window
    asadm -h HOST:PORT -U USERNAME -P PASSWORD --services-alternate

    See Connecting with asadm for more connection examples.
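
A failed asadm connection does not by itself distinguish a blocked port from wrong credentials or TLS settings. As a rough sketch, a plain TCP probe from the client machine can narrow this down before you run asadm. `check_aerospike_port` is a hypothetical helper name, and the sketch assumes an `nc` (netcat) build that supports the `-z` and `-w` options.

```shell
# Hypothetical helper: check that HOST:PORT accepts TCP connections.
# Assumes an nc (netcat) build that supports -z (probe only) and -w (timeout).
check_aerospike_port() {
  target="$1"
  host="${target%:*}"    # text before the last colon
  port="${target##*:}"   # text after the last colon
  if nc -z -w 5 "$host" "$port"; then
    echo "reachable: $host port $port"
  else
    echo "unreachable: $host port $port (check firewall rules)"
    return 1
  fi
}

# Usage: check_aerospike_port HOST:PORT
```

If the probe succeeds but asadm still fails, the firewall is less likely to be the cause, and the connection options passed to asadm are worth checking next.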

Metadata missing after pod restart

If secondary index definitions, security metadata, or other cluster metadata are missing after a pod restart while namespace data still appears present, the Aerospike work directory might be using ephemeral storage. The work directory contains System MetaData (SMD), which needs persistent file system storage when namespace data is persistent.

Use the following steps to check and fix the AerospikeCluster custom resource (CR).

  1. Check the CR for the following issues:

    • spec.storage.volumes or spec.rackConfig.racks[].storage.volumes does not mount a persistent volume at the Aerospike work directory path, such as /opt/aerospike.
    • spec.validationPolicy.skipWorkDirValidate is set to true, which skips the AKO check for persistent work directory storage.
  2. Update your cluster CR file so the work directory uses a persistent volume in the global or rack-level storage configuration. Adding or changing persistent volumes in spec.storage or spec.rackConfig.racks[].storage requires storage scaling because AKO does not add, remove, or change persistent storage volumes dynamically.

  3. Apply the updated CR file.

    Terminal window
    kubectl apply -f CLUSTER_CR.yaml
  4. Validate that the live CR has work directory storage configured and that skipWorkDirValidate is not set to true.

    Terminal window
    kubectl -n aerospike get aerospikecluster AEROSPIKE_CLUSTER_NAME -o yaml

    Confirm that the global or rack-level storage.volumes entry mounts a persistent volume at the work directory path, such as /opt/aerospike.
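
Reading the full YAML output can be tedious for large CRs. The sketch below extracts only the relevant fields with JSONPath. `check_workdir` is a hypothetical helper name. The field paths `aerospike.path` and `source.persistentVolume.storageClass` under each volume are assumptions about the CR layout, so compare them with your own CR before relying on the result. The sketch checks only the global `spec.storage.volumes` list, not rack-level storage.

```shell
# Hypothetical helper: inspect the live CR for work directory storage.
# The JSONPath field names are assumptions about the AerospikeCluster CR layout.
check_workdir() {
  ns="$1"; cluster="$2"; workdir="${3:-/opt/aerospike}"
  skip="$(kubectl -n "$ns" get aerospikecluster "$cluster" \
    -o jsonpath='{.spec.validationPolicy.skipWorkDirValidate}')"
  if [ "$skip" = "true" ]; then
    echo "WARNING: skipWorkDirValidate is true"
  fi
  # One line per global volume: name, Aerospike path, storage class
  kubectl -n "$ns" get aerospikecluster "$cluster" \
    -o jsonpath='{range .spec.storage.volumes[*]}{.name}{"\t"}{.aerospike.path}{"\t"}{.source.persistentVolume.storageClass}{"\n"}{end}' |
  awk -F '\t' -v dir="$workdir" '
    $2 == dir && $3 != "" { found = 1; print "persistent volume " $1 " mounted at " dir }
    END { if (!found) print "no global persistent volume at " dir "; check rack-level storage" }'
}

# Usage: check_workdir aerospike AEROSPIKE_CLUSTER_NAME /opt/aerospike
```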

For more information, see Work directory and System MetaData (SMD) and Validation policy.

Events

Kubernetes generates events for errors, resource state changes, and occasionally for informational messages. Common errors include pod scheduling failures, storage unavailability, missing secrets, and missing storage classes. When the Aerospike cluster is not deployed as expected, events can help you find the cause.

To help troubleshoot issues, AKO also generates events that report state changes, errors, and informational messages for the cluster it is working on.

Run kubectl get events --namespace NAMESPACE_NAME --field-selector involvedObject.name=AEROSPIKE_CLUSTER_NAME to see the events generated by AKO for an Aerospike cluster. For example, this command displays the events generated for the cluster aerocluster in the aerospike namespace:

Command
Terminal window
kubectl -n aerospike --field-selector involvedObject.name=aerocluster get events
Cluster events
LAST SEEN TYPE REASON OBJECT MESSAGE
90s Normal WaitMigration aerospikecluster/aerocluster [rack-0] Waiting for migrations to complete
92s Normal RackRollingRestart aerospikecluster/aerocluster [rack-0] Started Rolling restart
2m8s Normal PodWaitSafeDelete aerospikecluster/aerocluster [rack-0] Waiting to safely restart Pod aerocluster-0-1
92s Normal PodRestarted aerospikecluster/aerocluster [rack-0] Restarted Pod aerocluster-0-1
92s Normal PodWaitSafeDelete aerospikecluster/aerocluster [rack-0] Waiting to safely restart Pod aerocluster-0-0
61s Normal PodRestarted aerospikecluster/aerocluster [rack-0] Restarted Pod aerocluster-0-0
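
The events in the example output above are not in chronological order, because kubectl does not sort events by time unless asked to. As a small sketch, the same query can be restricted to the AerospikeCluster object and sorted by last-seen time. `cluster_events` is a hypothetical helper name.

```shell
# Hypothetical helper: AKO events for one cluster, oldest first.
cluster_events() {
  ns="$1"; cluster="$2"
  kubectl -n "$ns" get events \
    --field-selector "involvedObject.kind=AerospikeCluster,involvedObject.name=$cluster" \
    --sort-by=.lastTimestamp
}

# Usage: cluster_events aerospike aerocluster
```

Filtering on the object kind as well as the name avoids matching other resources that happen to share the cluster's name.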

To see all the events in the cluster’s namespace, run the following command, replacing NAMESPACE_NAME with the namespace. The example output below is from the aerospike namespace.

Command
Terminal window
kubectl -n NAMESPACE_NAME get events
Namespace events
LAST SEEN TYPE REASON OBJECT MESSAGE
15m Normal Killing pod/aerocluster-0-0 Stopping container aerospike-server
15m Normal Scheduled pod/aerocluster-0-0 Successfully assigned aerospike/aerocluster-0-0 to ip-10-0-146-203.ap-south-1.compute.internal
15m Normal AddedInterface pod/aerocluster-0-0 Add eth0 [10.131.1.30/23] from openshift-sdn
15m Normal Pulled pod/aerocluster-0-0 Container image "docker.io/aerospike/aerospike-kubernetes-init:2.5.1" already present on machine
15m Normal Created pod/aerocluster-0-0 Created container aerospike-init
15m Normal Started pod/aerocluster-0-0 Started container aerospike-init
15m Normal Pulled pod/aerocluster-0-0 Container image "docker.io/aerospike/aerospike-server-enterprise:8.1.2.0" already present on machine
15m Normal Created pod/aerocluster-0-0 Created container aerospike-server
15m Normal Started pod/aerocluster-0-0 Started container aerospike-server
16m Normal Killing pod/aerocluster-0-1 Stopping container aerospike-server
16m Normal Scheduled pod/aerocluster-0-1 Successfully assigned aerospike/aerocluster-0-1 to ip-10-0-146-203.ap-south-1.compute.internal
16m Normal AddedInterface pod/aerocluster-0-1 Add eth0 [10.131.1.28/23] from openshift-sdn
16m Normal Pulled pod/aerocluster-0-1 Container image "docker.io/aerospike/aerospike-kubernetes-init:2.5.1" already present on machine
16m Normal Created pod/aerocluster-0-1 Created container aerospike-init
16m Normal Started pod/aerocluster-0-1 Started container aerospike-init
16m Normal Pulled pod/aerocluster-0-1 Container image "docker.io/aerospike/aerospike-server-enterprise:8.1.2.0" already present on machine
16m Normal Created pod/aerocluster-0-1 Created container aerospike-server
16m Normal Started pod/aerocluster-0-1 Started container aerospike-server
15m Normal SuccessfulCreate statefulset/aerocluster-0 create Pod aerocluster-0-0 in StatefulSet aerocluster-0 successful
16m Normal SuccessfulCreate statefulset/aerocluster-0 create Pod aerocluster-0-1 in StatefulSet aerocluster-0 successful
16m Normal WaitMigration aerospikecluster/aerocluster [rack-0] Waiting for migrations to complete
16m Normal RackRollingRestart aerospikecluster/aerocluster [rack-0] Started Rolling restart
16m Normal PodWaitSafeDelete aerospikecluster/aerocluster [rack-0] Waiting to safely restart Pod aerocluster-0-1
16m Normal PodRestarted aerospikecluster/aerocluster [rack-0] Restarted Pod aerocluster-0-1
16m Normal PodWaitSafeDelete aerospikecluster/aerocluster [rack-0] Waiting to safely restart Pod aerocluster-0-0
15m Normal PodRestarted aerospikecluster/aerocluster [rack-0] Restarted Pod aerocluster-0-0
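
In a busy namespace, Normal events such as the ones above can bury the few Warning events that point to a problem. One possible approach, sketched below, is to keep only Warning events and count them by reason and object. `warning_summary` is a hypothetical helper name.

```shell
# Hypothetical helper: count Warning events by reason and involved object,
# most frequent first.
warning_summary() {
  ns="$1"
  kubectl -n "$ns" get events --field-selector type=Warning --no-headers \
    -o custom-columns=REASON:.reason,KIND:.involvedObject.kind,NAME:.involvedObject.name |
  sort | uniq -c | sort -rn
}

# Usage: warning_summary aerospike
```

Each output line starts with the number of matching events, followed by the reason, the object kind, and the object name.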

Operator logs

AKOCTL collectinfo logs

akoctl is a Krew plugin for AKO. Its collectinfo command collects logs from a cluster or namespace.

See AKOCTL for more information.

List the AKO pods:

Command
Terminal window
kubectl -n operators get pod | grep aerospike-operator-controller-manager
Example output
aerospike-operator-controller-manager-677f99497c-qrtcl 2/2 Running 0 9h
aerospike-operator-controller-manager-677f99497c-z9t6v 2/2 Running 0 9h

To get the log for an AKO pod, run the following command with one of the pod names listed above:

kubectl -n operators logs aerospike-operator-controller-manager-677f99497c-qrtcl -c manager

Add the -f flag to follow the logs continuously.

kubectl -n operators logs -f aerospike-operator-controller-manager-677f99497c-qrtcl -c manager

AKO logs the series of steps it follows to apply user changes, along with any errors and warnings.
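
Operator logs can be long, and more than one AKO pod might be running. The following sketch, which is a convenience wrapper and not part of AKO, scans recent log output from every AKO pod for lines that mention errors or warnings. `operator_problems` is a hypothetical helper name. The `operators` namespace and the `manager` container are taken from the commands above.

```shell
# Hypothetical helper: show recent error and warning lines from all AKO pods.
operator_problems() {
  ns="${1:-operators}"; since="${2:-1h}"
  kubectl -n "$ns" get pods -o name |
  grep aerospike-operator-controller-manager |
  while read -r pod; do
    echo "== $pod"
    # Case-insensitive match; adjust the pattern to the log format you see
    kubectl -n "$ns" logs "$pod" -c manager --since="$since" | grep -iE 'error|warn'
  done
}

# Usage: operator_problems operators 30m
```

The text match is deliberately loose and can include lines that only mention the word "error", so treat the output as a starting point for reading the full log.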
