# Troubleshooting Aerospike on Kubernetes

## Pods stuck in pending state

After you create or update an Aerospike cluster, pods may remain stuck in the `Pending` status:

> Output:
> 
> ```sh
> NAME              READY   STATUS    RESTARTS   AGE
> aerocluster-0-0   1/1     Pending   0          48s
> aerocluster-0-1   1/1     Pending   0          48s
> ```

Use the `kubectl describe` command to find the reason for the scheduling failure:

```sh
kubectl -n aerospike describe pod aerocluster-0-0
```

The `Events` section of the output shows why the pod was not scheduled.

> Output:
> 
> ```sh
> QoS Class:       Burstable
> Node-Selectors:  <none>
> Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
>                  node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
> Events:
>   Type     Reason            Age                    From               Message
>   ----     ------            ----                   ----               -------
>   Warning  FailedScheduling  9m27s (x3 over 9m31s)  default-scheduler  0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.
>   Warning  FailedScheduling  20s (x9 over 9m23s)    default-scheduler  0/1 nodes are available: 1 node(s) didn't match Pod's node affinity.
> ```

Possible reasons include:

-   Storage class incorrect or not created.
-   1 node(s) didn’t match the pod’s node affinity - Invalid zone, region, rack label, or other parameter for the rack configured for this pod.
-   Insufficient CPU or memory available to schedule more pods.
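
Each of these causes can be checked with standard `kubectl` queries. The following is a minimal sketch that bundles them into one helper. The function name `check_pending_causes` and the default namespace `aerospike` are illustrative assumptions and are not part of AKO.

```sh
# Hypothetical helper (not part of AKO): gather the information needed to
# tell the causes above apart. Pass the namespace as the first argument.
check_pending_causes() {
  ns="${1:-aerospike}"

  # Storage: the storage class named in the cluster spec must exist, and
  # each PersistentVolumeClaim should reach the Bound state.
  kubectl get storageclass
  kubectl -n "$ns" get pvc

  # Node affinity: compare the node labels (zone, region, rack label)
  # with the rack configuration.
  kubectl get nodes --show-labels

  # Resources: show how much CPU and memory is already requested per node.
  kubectl describe nodes | grep -A 8 "Allocated resources"
}
```

For example, `check_pending_causes aerospike` prints the storage classes, the claims in the `aerospike` namespace, the node labels, and the per-node resource allocation in one pass.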

## Pods keep crashing

After you create or update an Aerospike cluster, pods may get stuck in the `Error` or `CrashLoopBackOff` status:

> Output:
> 
> ```sh
> NAME              READY   STATUS             RESTARTS   AGE
> aerocluster-0-0   1/1     Error              0          48s
> aerocluster-0-1   1/1     CrashLoopBackOff   2          48s
> ```

Check the following logs to see if pod initialization failed or the Aerospike server stopped.

Init logs:

```sh
kubectl -n aerospike logs aerocluster-0-0 -c aerospike-init
```

Server logs:

```sh
kubectl -n aerospike logs aerocluster-0-0 -c aerospike-server
```

Possible reasons include:

-   Missing or incorrect feature key file - Fix by deleting the Aerospike secret and recreating it with the correct feature key file.
-   Bad Aerospike configuration - AKO tries to validate the configuration before applying it to the cluster. However, it’s still possible to misconfigure the Aerospike server. The offending parameter is logged in the server logs and should be fixed and applied again to the cluster. See [Aerospike configuration change](https://aerospike.com/docs/kubernetes/4.1.x/reference/config-reference) for details.
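
When a pod is in `CrashLoopBackOff`, the container that is currently running may have just restarted and logged very little. The `--previous` flag of `kubectl logs` prints the log of the prior, terminated instance instead. The following sketch loops over the two container names used above; the function name `previous_logs` is a hypothetical helper, not part of AKO.

```sh
# Hypothetical helper (not part of AKO): print the logs of the previous,
# terminated instance of each container in a pod.
previous_logs() {
  ns="$1"
  pod="$2"
  for container in aerospike-init aerospike-server; do
    echo "== $container (previous instance) =="
    # --previous fails if the container has never restarted; note it and go on.
    kubectl -n "$ns" logs "$pod" -c "$container" --previous \
      || echo "no previous instance for $container"
  done
}
```

For example, `previous_logs aerospike aerocluster-0-1` shows what the crashed containers of pod `aerocluster-0-1` logged before they were restarted.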

## Error connecting to the cluster from outside Kubernetes

If the cluster runs fine as verified by the pod status and `asadm`, ensure that the firewall allows inbound traffic to the Kubernetes cluster on the Aerospike ports. See [Connecting with `asadm`](https://aerospike.com/docs/kubernetes/4.1.x/install/deploy/connect#connect-with-asadm) and [Port access](https://aerospike.com/docs/kubernetes/4.1.x/install/deploy/connect#port-access) for details.
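
A quick TCP check from the client machine can indicate whether a firewall is blocking traffic. This is a minimal sketch that assumes `bash` (for its `/dev/tcp` redirection) and the coreutils `timeout` command are available. The function name `port_open` is illustrative, and the address and port you pass are your own: 3000 is the Aerospike default service port, but your cluster may expose a different one.

```sh
# Hypothetical helper: succeed if a TCP connection to host:port can be
# opened within 3 seconds. Requires bash (for /dev/tcp) and `timeout`.
port_open() {
  host="$1"
  port="$2"
  # Inside the child shell, $0 is the host and $1 is the port.
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}
```

For example, `port_open <node-address> 3000 && echo reachable || echo blocked`, where `<node-address>` is a placeholder for an address your client uses to reach the cluster. A failure only shows that the port is unreachable from that machine; it does not identify which firewall rule is responsible.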

## Events

Kubernetes generates events on errors, on resource state changes, and at times for informational messages. Common errors include pod scheduling failures, storage unavailability, missing secrets, and missing storage classes. When the Aerospike cluster is not deployed as expected, events can help you find the cause.

To help troubleshoot issues, AKO generates events indicating state changes, errors, and informational messages for the cluster it is working on.

Run `kubectl get event --namespace [namespace] --field-selector involvedObject.name=[cluster name]` to see the events generated by AKO for an Aerospike cluster. For example, this command displays the events generated for cluster `aerocluster` in the `aerospike` namespace:

```shell
kubectl -n aerospike --field-selector involvedObject.name=aerocluster get events
```

> Output:
> 
> ```shell
> LAST SEEN   TYPE     REASON               OBJECT                         MESSAGE
> 90s         Normal   WaitMigration        aerospikecluster/aerocluster   [rack-0] Waiting for migrations to complete
> 92s         Normal   RackRollingRestart   aerospikecluster/aerocluster   [rack-0] Started Rolling restart
> 2m8s        Normal   PodWaitSafeDelete    aerospikecluster/aerocluster   [rack-0] Waiting to safely restart Pod aerocluster-0-1
> 92s         Normal   PodRestarted         aerospikecluster/aerocluster   [rack-0] Restarted Pod aerocluster-0-1
> 92s         Normal   PodWaitSafeDelete    aerospikecluster/aerocluster   [rack-0] Waiting to safely restart Pod aerocluster-0-0
> 61s         Normal   PodRestarted         aerospikecluster/aerocluster   [rack-0] Restarted Pod aerocluster-0-0
> ```

To see all the events in the cluster’s namespace, run the following command with that namespace. In this example, the namespace is `aerospike`.

```sh
kubectl -n aerospike get events
```

> Output:
> 
> ```shell
> LAST SEEN   TYPE     REASON               OBJECT                         MESSAGE
> 15m         Normal   Killing              pod/aerocluster-0-0            Stopping container aerospike-server
> 15m         Normal   Scheduled            pod/aerocluster-0-0            Successfully assigned aerospike/aerocluster-0-0 to ip-10-0-146-203.ap-south-1.compute.internal
> 15m         Normal   AddedInterface       pod/aerocluster-0-0            Add eth0 [10.131.1.30/23] from openshift-sdn
> 15m         Normal   Pulled               pod/aerocluster-0-0            Container image "docker.io/aerospike/aerospike-kubernetes-init:2.3.0" already present on machine
> 15m         Normal   Created              pod/aerocluster-0-0            Created container aerospike-init
> 15m         Normal   Started              pod/aerocluster-0-0            Started container aerospike-init
> 15m         Normal   Pulled               pod/aerocluster-0-0            Container image "docker.io/aerospike/aerospike-server-enterprise:8.1.0.0" already present on machine
> 15m         Normal   Created              pod/aerocluster-0-0            Created container aerospike-server
> 15m         Normal   Started              pod/aerocluster-0-0            Started container aerospike-server
> 16m         Normal   Killing              pod/aerocluster-0-1            Stopping container aerospike-server
> 16m         Normal   Scheduled            pod/aerocluster-0-1            Successfully assigned aerospike/aerocluster-0-1 to ip-10-0-146-203.ap-south-1.compute.internal
> 16m         Normal   AddedInterface       pod/aerocluster-0-1            Add eth0 [10.131.1.28/23] from openshift-sdn
> 16m         Normal   Pulled               pod/aerocluster-0-1            Container image "docker.io/aerospike/aerospike-kubernetes-init:2.3.0" already present on machine
> 16m         Normal   Created              pod/aerocluster-0-1            Created container aerospike-init
> 16m         Normal   Started              pod/aerocluster-0-1            Started container aerospike-init
> 16m         Normal   Pulled               pod/aerocluster-0-1            Container image "docker.io/aerospike/aerospike-server-enterprise:8.1.0.0" already present on machine
> 16m         Normal   Created              pod/aerocluster-0-1            Created container aerospike-server
> 16m         Normal   Started              pod/aerocluster-0-1            Started container aerospike-server
> 15m         Normal   SuccessfulCreate     statefulset/aerocluster-0      create Pod aerocluster-0-0 in StatefulSet aerocluster-0 successful
> 16m         Normal   SuccessfulCreate     statefulset/aerocluster-0      create Pod aerocluster-0-1 in StatefulSet aerocluster-0 successful
> 16m         Normal   WaitMigration        aerospikecluster/aerocluster   [rack-0] Waiting for migrations to complete
> 16m         Normal   RackRollingRestart   aerospikecluster/aerocluster   [rack-0] Started Rolling restart
> 16m         Normal   PodWaitSafeDelete    aerospikecluster/aerocluster   [rack-0] Waiting to safely restart Pod aerocluster-0-1
> 16m         Normal   PodRestarted         aerospikecluster/aerocluster   [rack-0] Restarted Pod aerocluster-0-1
> 16m         Normal   PodWaitSafeDelete    aerospikecluster/aerocluster   [rack-0] Waiting to safely restart Pod aerocluster-0-0
> 15m         Normal   PodRestarted         aerospikecluster/aerocluster   [rack-0] Restarted Pod aerocluster-0-0
> ```
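
In a busy namespace, the `Normal` events can bury the ones that matter. Events support a field selector on `type`, so the listing can be narrowed to warnings and sorted by time. The following is a minimal sketch; the function name `warning_events` and the default namespace `aerospike` are illustrative assumptions.

```sh
# Hypothetical helper: list only Warning events in a namespace, sorted by
# the time they were last seen (oldest first).
warning_events() {
  ns="${1:-aerospike}"
  kubectl -n "$ns" get events \
    --field-selector type=Warning \
    --sort-by=.lastTimestamp
}
```

For example, `warning_events aerospike` would surface events such as the `FailedScheduling` warnings that Kubernetes records for unschedulable pods, without the routine container lifecycle events.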

## Operator logs

### AKOCTL collectinfo logs

`akoctl` is a [Krew](https://krew.sigs.k8s.io/) plugin for the Kubernetes Operator that uses the command `collectinfo` to collect logs from a cluster or namespace.

See [AKOCTL](https://aerospike.com/docs/kubernetes/4.1.x/manage/akoctl) for more information.

List the AKO pods:

```sh
kubectl -n operators get pod | grep aerospike-operator-controller-manager
```

> Output:
> 
> ```shell
> aerospike-operator-controller-manager-677f99497c-qrtcl   2/2     Running   0          9h
> aerospike-operator-controller-manager-677f99497c-z9t6v   2/2     Running   0          9h
> ```

Use a pod name from the output above to get the log for an AKO pod:

```sh
kubectl -n operators logs aerospike-operator-controller-manager-677f99497c-qrtcl -c manager
```

Add the `-f` flag to follow the logs continuously.

```sh
kubectl -n operators logs -f aerospike-operator-controller-manager-677f99497c-qrtcl -c manager
```

The series of steps AKO follows to apply user changes is logged, along with any errors and warnings.
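
Because the operator log also records every routine step, it can help to filter it down to the lines that mention a problem. The following sketch assumes the log level appears as plain text in each line, which may not match the operator's actual log format; the function name `errors_only` is a hypothetical helper.

```sh
# Hypothetical filter: keep only log lines that mention an error or a
# warning. Assumes the level appears as text in each line; adjust the
# pattern if the operator's log format differs.
errors_only() {
  grep -iE 'error|warn'
}

# Usage (pod name taken from the listing above):
#   kubectl -n operators logs aerospike-operator-controller-manager-677f99497c-qrtcl -c manager | errors_only
```

The match is case-insensitive and deliberately broad, so it can also catch informational lines that merely contain the word "error".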