# Node maintenance for Aerospike on Kubernetes

When performing Kubernetes node maintenance (such as version upgrades, patching, or hardware changes), you need to safely migrate Aerospike pods off the affected nodes. The Aerospike Kubernetes Operator (AKO) provides multiple approaches to handle this:

| Approach | Use Case | Storage Type |
| --- | --- | --- |
| [Safe Pod Eviction](#safe-pod-eviction-webhook) | `kubectl drain` operations | Any |
| [Scheduling Policies](#scheduling-policy) | Planned migrations of Aerospike pods | Network-attached |
| [K8sNodeBlockList](#k8snodeblocklist) | Planned migrations of Aerospike pods | Any |

## Safe pod eviction webhook

AKO provides a webhook that intercepts pod eviction API calls triggered by commands like `kubectl drain` or by Kubernetes node scale-down from cluster autoscalers such as Karpenter. The webhook blocks Aerospike pod eviction API calls and safely migrates those pods to other Kubernetes nodes, running all data migration safety checks before a pod is removed.

This feature works with both network-attached and local-attached storage configurations. It is disabled by default.

### Enabling safe pod eviction

::: caution
While safe pod eviction manages API-driven requests, managed workflows like node upgrades on cloud services or a cluster autoscaler like Karpenter often bypass these protections by using a **Drain** → **Wait** → **Force-Delete** flow.

Even if the webhook denies the initial eviction to protect data, these systems eventually force-delete the pod after a pre-set timeout. If this occurs before Aerospike completes its data migration, the process may fail or restart, leading to potential data instability.

`safePodEviction.timeoutSeconds` controls how long the webhook waits before responding to each eviction request. It does not limit how long an Aerospike data migration can run.

To prevent disruption to Aerospike migrations:

-   Avoid automatic Kubernetes node upgrades that force-delete pods. For example, GKE specifically lists this as a limitation; see [Limitations of local PersistentVolumes](https://docs.cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/local-ssd-raw#limitations_of_local_persistentvolumes).
-   Avoid forced termination (such as Karpenter `terminationGracePeriod`). If forced termination is required, set the grace period high enough for full data migration.
:::
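For Karpenter-managed nodes, the forced-termination deadline is the `terminationGracePeriod` field on the `NodePool` template. The following sketch shows where it might be set; the pool name and duration are illustrative, not recommendations:

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: aerospike-pool   # hypothetical pool name
spec:
  template:
    spec:
      # Illustrative value: allow ample time for Aerospike data
      # migration before Karpenter force-terminates the node.
      terminationGracePeriod: 12h
```

Size this duration to comfortably exceed your cluster's worst-case migration time.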

To enable the safe pod eviction webhook, set the `ENABLE_SAFE_POD_EVICTION` environment variable to `true` in the operator deployment.

If you installed the operator using Helm, enable it by setting the value during installation or upgrade:

```bash
helm upgrade aerospike-kubernetes-operator aerospike/aerospike-kubernetes-operator \
  --set safePodEviction.enable="true"
```

Or add it to your `values.yaml`:

```yaml
# Enable the eviction webhook to safely block Aerospike pod evictions during node maintenance.
# Also enables the Prometheus metric aerospike_ako_eviction_webhook_requests_total
# (labels: eviction_namespace, decision).
safePodEviction:
  enable: "true"
  # Eviction webhook timeout in seconds
  timeoutSeconds: "20"
```

If you installed the operator using OLM (Operator Lifecycle Manager), patch the Subscription to add the environment variable:

```bash
kubectl -n operators patch subscription SUBSCRIPTION_NAME \
  --type='merge' \
  -p '{"spec":{"config":{"env":[{"name":"ENABLE_SAFE_POD_EVICTION","value":"true"}]}}}'
```

### Using kubectl drain

::: note
If pods are using local-attached storage, you must specify those local storage classes in the `spec.Storage.LocalStorageClasses` field of the CR before draining Kubernetes nodes. AKO uses this field to delete the corresponding local volumes so that the pods can be easily migrated out of the Kubernetes nodes.
:::
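As an illustration, if your volumes are provisioned by a local storage class, the CR might declare it like this (the class name `local-ssd` is hypothetical; use the storage class your volumes actually reference):

```yaml
spec:
  storage:
    # Storage classes that provision node-local volumes. AKO deletes
    # volumes of these classes so pods can reschedule to other nodes.
    localStorageClasses:
      - local-ssd   # illustrative class name
```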

Once the safe pod eviction webhook is enabled, you can use standard Kubernetes commands to drain nodes:

```bash
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
```

The webhook intercepts the eviction request for pods that belong to an AerospikeCluster and denies it. For non-Aerospike pods, the eviction request is passed through without modification.

When the webhook blocks an eviction, it sets the annotation `aerospike.com/eviction-blocked` on the pod. AKO detects this annotation and starts safely migrating the Aerospike pods. Wait for the AerospikeCluster to reach the `Completed` phase before retrying the drain command:

```bash
kubectl -n NAMESPACE wait --for=jsonpath='{.status.phase}'=Completed aerospikecluster/CLUSTER_NAME --timeout=300s
```

## Scheduling policy

### Network-attached storage

For clusters using network-attached storage (such as cloud provider block storage), you can migrate pods by updating scheduling policies in the CR. The pods can move freely between nodes since the storage follows them.

Setting scheduling policies such as `affinity`, taints and tolerations, and `nodeSelector` can migrate the pods to a different node pool so that the current node pool can be brought down. Set `RollingUpdateBatchSize` to expedite this process by migrating pods in batches.

For example, you can set the following `nodeAffinity` in the `podSpec` section of the Custom Resource (CR) file. AKO performs a rolling restart of the cluster and migrates the pods based on the scheduling policies.

The following `nodeAffinity` ensures that pods are migrated to a node pool named `upgrade-pool`. AKO restarts the pods and moves them to nodes with the node label `cloud.google.com/gke-nodepool: upgrade-pool`.

```yaml
podSpec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: cloud.google.com/gke-nodepool
                operator: In
                values:
                  - upgrade-pool
```
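To migrate pods in batches during this rolling restart, the batch size can be set under `rackConfig` in the CR. A sketch, assuming the `rollingUpdateBatchSize` field of the AerospikeCluster API; the value here is illustrative:

```yaml
spec:
  rackConfig:
    # Restart and migrate up to 2 pods at a time instead of one by one.
    # Accepts an absolute count or a percentage such as "20%".
    rollingUpdateBatchSize: 2
```

Larger batches shorten the migration window but reduce the number of nodes serving traffic at once, so size the batch against your availability requirements.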

## K8sNodeBlockList

### Local-attached storage

When Kubernetes pods use local-attached storage, they cannot move to different Kubernetes nodes because of volume node affinity. This prevents a rolling restart with a different scheduling policy from working.

However, you can use the `K8sNodeBlockList` feature to migrate the pods out of the given Kubernetes nodes when using local storage.

`K8sNodeBlockList` specifies the list of Kubernetes node names from which you want to migrate pods. AKO reads this configuration and safely migrates pods off these nodes.

If pods are using network-attached storage, AKO migrates the pods out of their Kubernetes nodes without additional configuration. If pods are using local-attached storage, you must specify those local storage classes in the `spec.Storage.LocalStorageClasses` field of the CR. AKO uses this field to delete the corresponding local volumes so that the pods can be easily migrated out of the Kubernetes nodes.

This process uses the `RollingUpdateBatchSize` parameter defined in your CR to migrate pods in batches for efficiency.

The following example CR includes a `spec.K8sNodeBlockList` section with two nodes defined:

```yaml
apiVersion: asdb.aerospike.com/v1
kind: AerospikeCluster
metadata:
  name: aerocluster
  namespace: aerospike
spec:
  k8sNodeBlockList:
    - gke-test-default-pool-b6f71594-1w85
    - gke-test-default-pool-b6f71594-9vm2
  size: 4
  image: aerospike/aerospike-server-enterprise:8.1.1.0
  rackConfig:
    namespaces:
      - test
    racks:
      - id: 1
      - id: 2
...
```