# Troubleshooting Aerospike on Kubernetes

This page describes common issues with Aerospike Kubernetes Operator (AKO) deployments and how to resolve them.

## Pods stuck in pending state

After an Aerospike cluster has been created or updated, its pods may remain stuck in the “Pending” status:

```bash
kubectl get pods -n aerospike
```

Example output

```text
NAME              READY   STATUS    RESTARTS   AGE
aerocluster-0-0   1/1     Pending   0          48s
aerocluster-0-1   1/1     Pending   0          48s
```

Use the `kubectl describe` command to find the reason for scheduling failure. The `Events` section shows the reason for the pod not being scheduled.

```bash
kubectl -n aerospike describe pod aerocluster-0-0
```

Example output

```text
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age                    From               Message
  ----     ------            ----                   ----               -------
  Warning  FailedScheduling  9m27s (x3 over 9m31s)  default-scheduler  0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.
  Warning  FailedScheduling  20s (x9 over 9m23s)    default-scheduler  0/1 nodes are available: 1 node(s) didn't match Pod's node affinity.
```

Possible reasons include:

-   Storage class incorrect or not created
-   Node(s) didn’t match the pod’s node affinity - invalid zone, region, rack label, or other parameter for the rack configured for this pod
-   Insufficient CPU or memory resources available to schedule more pods
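Each of these causes can be checked from the command line. The following is a minimal sketch, assuming the `aerospike` namespace used in the examples on this page. The function name `check_pending_causes` is illustrative and not part of AKO; it only wraps standard `kubectl` commands.

```bash
# Hypothetical helper: gather the information needed to diagnose Pending pods.
# Usage: check_pending_causes [namespace]   (namespace defaults to "aerospike")
check_pending_causes() {
  local ns="${1:-aerospike}"

  # Does the storage class referenced by the cluster exist?
  echo "== Storage classes =="
  kubectl get storageclass

  # Are the PersistentVolumeClaims bound?
  echo "== PersistentVolumeClaims in ${ns} =="
  kubectl -n "$ns" get pvc

  # Do the node labels match the zone, region, or rack settings?
  echo "== Node labels =="
  kubectl get nodes --show-labels

  # Is there enough unallocated CPU and memory?
  echo "== Allocated resources per node =="
  kubectl describe nodes | grep -A 8 "Allocated resources"
}
```

Run `check_pending_causes aerospike` and compare the results with the storage class, rack, and resource settings in your cluster configuration.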

## Pods keep crashing

After an Aerospike cluster has been created or updated, its pods may remain stuck in the “Error” or “CrashLoopBackOff” status:

```bash
kubectl get pods -n aerospike
```

Example output

```text
NAME              READY   STATUS             RESTARTS   AGE
aerocluster-0-0   1/1     Error              0          48s
aerocluster-0-1   1/1     CrashLoopBackOff   2          48s
```

Check the following logs to see if pod initialization failed or the Aerospike server stopped.

Init logs:

```bash
kubectl -n aerospike logs aerocluster-0-0 -c aerospike-init
```

Server logs:

```bash
kubectl -n aerospike logs aerocluster-0-0 -c aerospike-server
```

Possible reasons include:

-   Missing or incorrect feature-key file - Fix by deleting the Aerospike secret and recreating it with the correct feature-key file.
-   Bad Aerospike configuration - AKO tries to validate the configuration before applying it to the cluster. However, it’s still possible to misconfigure the Aerospike server. The offending parameter is logged in the server logs. Fix the configuration and apply it again to the cluster. See [Aerospike configuration change](https://aerospike.com/docs/kubernetes/4.3.x/reference/config-reference) for details.
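When a container is in a crash loop, the current container instance may not have logged anything useful yet. The sketch below wraps two standard `kubectl` operations: reading the logs of the previous container instance with `--previous`, and recreating a secret from local files. The function names are illustrative, and the secret name and file path are assumptions to replace with the values from your deployment.

```bash
# Hypothetical helper: print logs from the previous (crashed) instance of the
# Aerospike server container.
# Usage: previous_server_logs <pod> [namespace]
previous_server_logs() {
  local pod="$1" ns="${2:-aerospike}"
  kubectl -n "$ns" logs "$pod" -c aerospike-server --previous
}

# Hypothetical helper: delete and recreate the secret that holds the
# feature-key file. The secret name and path are supplied by the caller.
# Usage: recreate_feature_key_secret <secret-name> <file-or-directory> [namespace]
recreate_feature_key_secret() {
  local secret="$1" path="$2" ns="${3:-aerospike}"
  kubectl -n "$ns" delete secret "$secret"
  kubectl -n "$ns" create secret generic "$secret" --from-file="$path"
}
```

For example, `previous_server_logs aerocluster-0-1` shows the output of the last failed start of the second pod in the listing above.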

## Error connecting to the cluster from outside Kubernetes

If the pod status and `asadm` confirm that the cluster is running correctly, ensure that the firewall allows inbound traffic to the Kubernetes cluster on the Aerospike ports. See [Connecting with `asadm`](https://aerospike.com/docs/kubernetes/4.3.x/install/deploy/connect#connect-with-asadm) and [Port access](https://aerospike.com/docs/kubernetes/4.3.x/install/deploy/connect#port-access) for details.
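A quick way to test whether a port is reachable is to probe it from the client machine. The following is a minimal sketch, assuming `nc` (netcat) is installed on the client. The function name `check_port` is illustrative, and the host and port are placeholders: 3000 is the default Aerospike service port, but the port exposed outside Kubernetes may differ in your deployment.

```bash
# Hypothetical helper: test whether a TCP port is reachable from this machine.
# Usage: check_port <host> <port>
check_port() {
  local host="$1" port="$2"
  # -z: probe the port without sending data; -w 5: give up after 5 seconds
  if nc -z -w 5 "$host" "$port"; then
    echo "reachable: ${host}:${port}"
  else
    echo "unreachable: ${host}:${port}"
    return 1
  fi
}
```

If `check_port <node-address> <port>` reports the port as unreachable while the pods are healthy, a firewall or security group rule is a likely cause.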

## Events

Kubernetes generates events on errors, on resource state changes, and at times for informational messages. Common errors include pod scheduling failures, storage unavailability, missing secrets, and missing storage classes. When the Aerospike cluster is not deployed as expected, events can help you debug the problem and find its cause.

To help troubleshoot issues, AKO generates events indicating state changes, errors and informational messages for the cluster it is working on.

Run `kubectl get event --namespace [namespace] --field-selector involvedObject.name=[cluster name]` to see the events generated by AKO for an Aerospike cluster. For example, this command displays the events generated for cluster `aerocluster` in the `aerospike` namespace:

```bash
kubectl -n aerospike --field-selector involvedObject.name=aerocluster get events
```

Output

```text
LAST SEEN   TYPE     REASON               OBJECT                         MESSAGE
90s         Normal   WaitMigration        aerospikecluster/aerocluster   [rack-0] Waiting for migrations to complete
92s         Normal   RackRollingRestart   aerospikecluster/aerocluster   [rack-0] Started Rolling restart
2m8s        Normal   PodWaitSafeDelete    aerospikecluster/aerocluster   [rack-0] Waiting to safely restart Pod aerocluster-0-1
92s         Normal   PodRestarted         aerospikecluster/aerocluster   [rack-0] Restarted Pod aerocluster-0-1
92s         Normal   PodWaitSafeDelete    aerospikecluster/aerocluster   [rack-0] Waiting to safely restart Pod aerocluster-0-0
61s         Normal   PodRestarted         aerospikecluster/aerocluster   [rack-0] Restarted Pod aerocluster-0-0
```

To see all the events in the cluster’s namespace, run the following command with that namespace. In this example, the namespace is `aerospike`.

```bash
kubectl -n aerospike get events
```

Output

```text
LAST SEEN   TYPE     REASON               OBJECT                         MESSAGE
15m         Normal   Killing              pod/aerocluster-0-0            Stopping container aerospike-server
15m         Normal   Scheduled            pod/aerocluster-0-0            Successfully assigned aerospike/aerocluster-0-0 to ip-10-0-146-203.ap-south-1.compute.internal
15m         Normal   AddedInterface       pod/aerocluster-0-0            Add eth0 [10.131.1.30/23] from openshift-sdn
15m         Normal   Pulled               pod/aerocluster-0-0            Container image "docker.io/aerospike/aerospike-kubernetes-init:2.5.0" already present on machine
15m         Normal   Created              pod/aerocluster-0-0            Created container aerospike-init
15m         Normal   Started              pod/aerocluster-0-0            Started container aerospike-init
15m         Normal   Pulled               pod/aerocluster-0-0            Container image "docker.io/aerospike/aerospike-server-enterprise:8.1.1.0" already present on machine
15m         Normal   Created              pod/aerocluster-0-0            Created container aerospike-server
15m         Normal   Started              pod/aerocluster-0-0            Started container aerospike-server
16m         Normal   Killing              pod/aerocluster-0-1            Stopping container aerospike-server
16m         Normal   Scheduled            pod/aerocluster-0-1            Successfully assigned aerospike/aerocluster-0-1 to ip-10-0-146-203.ap-south-1.compute.internal
16m         Normal   AddedInterface       pod/aerocluster-0-1            Add eth0 [10.131.1.28/23] from openshift-sdn
16m         Normal   Pulled               pod/aerocluster-0-1            Container image "docker.io/aerospike/aerospike-kubernetes-init:2.5.0" already present on machine
16m         Normal   Created              pod/aerocluster-0-1            Created container aerospike-init
16m         Normal   Started              pod/aerocluster-0-1            Started container aerospike-init
16m         Normal   Pulled               pod/aerocluster-0-1            Container image "docker.io/aerospike/aerospike-server-enterprise:8.1.1.0" already present on machine
16m         Normal   Created              pod/aerocluster-0-1            Created container aerospike-server
16m         Normal   Started              pod/aerocluster-0-1            Started container aerospike-server
15m         Normal   SuccessfulCreate     statefulset/aerocluster-0      create Pod aerocluster-0-0 in StatefulSet aerocluster-0 successful
16m         Normal   SuccessfulCreate     statefulset/aerocluster-0      create Pod aerocluster-0-1 in StatefulSet aerocluster-0 successful
16m         Normal   WaitMigration        aerospikecluster/aerocluster   [rack-0] Waiting for migrations to complete
16m         Normal   RackRollingRestart   aerospikecluster/aerocluster   [rack-0] Started Rolling restart
16m         Normal   PodWaitSafeDelete    aerospikecluster/aerocluster   [rack-0] Waiting to safely restart Pod aerocluster-0-1
16m         Normal   PodRestarted         aerospikecluster/aerocluster   [rack-0] Restarted Pod aerocluster-0-1
16m         Normal   PodWaitSafeDelete    aerospikecluster/aerocluster   [rack-0] Waiting to safely restart Pod aerocluster-0-0
15m         Normal   PodRestarted         aerospikecluster/aerocluster   [rack-0] Restarted Pod aerocluster-0-0
```
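In a busy namespace, most events are of type `Normal`, which can bury the ones that point to a problem. The following is a minimal sketch that lists only `Warning` events, sorted by time, using the standard `--field-selector` and `--sort-by` options of `kubectl get`. The function name `warning_events` is illustrative and not part of AKO.

```bash
# Hypothetical helper: list only Warning events in a namespace, oldest first.
# Usage: warning_events [namespace]   (namespace defaults to "aerospike")
warning_events() {
  local ns="${1:-aerospike}"
  kubectl -n "$ns" get events \
    --field-selector type=Warning \
    --sort-by=.lastTimestamp
}
```

Scheduling failures such as the `FailedScheduling` events shown earlier on this page are reported as `Warning` events, so they appear in this filtered list.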

## Operator logs

### AKOCTL collectinfo logs

`akoctl` is a [Krew](https://krew.sigs.k8s.io/) plugin for AKO. Its `collectinfo` command collects logs from a cluster or namespace.

See [AKOCTL](https://aerospike.com/docs/kubernetes/4.3.x/manage/akoctl) for more information.

List the AKO pods:

```bash
kubectl -n operators get pod | grep aerospike-operator-controller-manager
```

Output

```text
aerospike-operator-controller-manager-677f99497c-qrtcl   2/2     Running   0          9h
aerospike-operator-controller-manager-677f99497c-z9t6v   2/2     Running   0          9h
```

To get the log for an AKO pod, run the following command with one of the pod names listed above:

```bash
kubectl -n operators logs aerospike-operator-controller-manager-677f99497c-qrtcl -c manager
```

Add the `-f` flag to follow the logs continuously.

```bash
kubectl -n operators logs -f aerospike-operator-controller-manager-677f99497c-qrtcl -c manager
```

AKO logs the series of steps it follows to apply user changes, along with any errors and warnings.
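Because the operator log can be long, it often helps to narrow it to a recent time window and to lines that mention errors. The following is a minimal sketch using the standard `--since` option of `kubectl logs` and a case-insensitive `grep`. The function name `operator_errors` is illustrative, and matching on the word "error" is an assumption about the log text rather than a documented AKO log format.

```bash
# Hypothetical helper: show recent AKO manager log lines that mention errors.
# Usage: operator_errors <pod> [since]   (since defaults to "1h")
operator_errors() {
  local pod="$1" since="${2:-1h}"
  kubectl -n operators logs "$pod" -c manager --since="$since" | grep -i "error"
}
```

For example, `operator_errors aerospike-operator-controller-manager-677f99497c-qrtcl 30m` limits the search to the last 30 minutes of the log for that pod.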