# Scaling Aerospike on Kubernetes

Each Kubernetes pod runs a single Aerospike Database instance, which is equivalent to an Aerospike node in non-Kubernetes deployments.

You can scale Aerospike on Kubernetes _horizontally_ by adjusting the number of pods in your cluster, or _vertically_ by adjusting the resources available to them. You can also scale Aerospike namespace storage, either by migrating pods to a new rack revision or by replacing a rack with a new one. Both approaches work around the limitation that StatefulSet PersistentVolumeClaims cannot be resized after creation.

-   See [Horizontal scaling](#horizontal-scaling) to scale the number of pods.
-   See [Vertical scaling](#vertical-scaling) to scale the CPU/memory resources allocated to the pod containers.
-   See [Aerospike namespace storage scaling](#aerospike-namespace-storage-scaling) to scale Aerospike namespace storage.
-   See [Kubernetes cluster autoscaling](#kubernetes-cluster-autoscaling) to scale the Kubernetes nodes in the cluster.

## Horizontal scaling

Scaling horizontally means adding or removing Kubernetes pods from the cluster.

The Custom Resource (CR) file controls the number of pods in a rack. When you change the cluster size in the CR file, Aerospike Kubernetes Operator (AKO) adds pods to the rack following the rack order defined in the CR.

AKO distributes the pods equally across all racks. If any pods remain after equal distribution, they are distributed following the rack order.

In a cluster of two racks and five Kubernetes pods:

-   After equal pod distribution, both racks have two pods, with one left over as a remainder.
-   The remaining pod goes to Rack 1, resulting in Rack 1 having three pods and Rack 2 having two pods.
-   If the cluster size is scaled up to six pods, a new pod is added to Rack 2.
-   Scaling down follows the rack order and removes pods with the goal of equal distribution.
    -   In this example of two racks and six pods, scaling down to four pods results in two racks with two pods each.
    -   The third pod on Rack 1 goes down first, followed by the third pod on Rack 2.
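
The two-rack example above corresponds to a `rackConfig` like the following sketch (the rack IDs and zones are illustrative). With `spec.size: 5`, AKO places three pods on rack `1` and two on rack `2`:

```yaml
spec:
  size: 5
  rackConfig:
    racks:
      # Rack order determines where the remainder pod goes.
      - id: 1          # 2 pods + 1 remainder = 3 pods
        zone: us-central1-a
      - id: 2          # 2 pods
        zone: us-central1-b
```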

### Scale a cluster horizontally

For this example, the cluster is deployed using a CR file named `aerospike-cluster.yaml`.

1.  Change the `spec.size` field in the CR file to scale the cluster up or down to the specified number of pods.
    
    ```yaml
    apiVersion: asdb.aerospike.com/v1
    kind: AerospikeCluster
    metadata:
      name: aerocluster
      namespace: aerospike
    spec:
      size: 2
      image: aerospike/aerospike-server-enterprise:8.1.1.0
    ```
    
2.  Use `kubectl` to apply the change.
    
    ```sh
    kubectl apply -f aerospike-cluster.yaml
    ```
    
3.  Check the pods.
    
    ```bash
    kubectl get pods -n aerospike
    ```
    
    Expected output:
    
    ```text
    NAME               READY   STATUS    RESTARTS   AGE
    aerocluster-0-0    1/1     Running   0          3m6s
    aerocluster-0-1    1/1     Running   0          3m6s
    ```
    

### Batch scale-down

You can scale down multiple pods within the same rack with a single scaling command by configuring [`scaleDownBatchSize`](https://aerospike.com/docs/kubernetes/4.3.x/reference/config-reference#rack-config) in the CR file. This parameter is a percentage or absolute number of rack pods that AKO scales down simultaneously when the cluster size is decreased.

::: note
Batch scale-down is not supported for [strong consistency (SC)](https://aerospike.com/docs/kubernetes/4.3.x/manage/configure/strong-consistency) clusters.
:::
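
As a sketch, assuming the field sits under `rackConfig` as described in the config reference, a cluster that removes up to two pods per rack at a time might look like this (the cluster size and rack IDs are illustrative):

```yaml
spec:
  size: 8
  rackConfig:
    # Remove up to 2 pods per rack simultaneously during scale-down.
    # Accepts an absolute number or a percentage string such as "25%".
    scaleDownBatchSize: 2
    racks:
      - id: 1
      - id: 2
```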

## Vertical scaling

Vertical scaling in Kubernetes adjusts the CPU and memory resources allocated to a pod's containers. This lets variable workloads, such as traffic peaks, get the resources they need without over-provisioning capacity.

The `spec.podSpec.aerospikeContainer.resources` parameter in the CR file governs the amount of compute resources (CPU or memory) available to each pod’s Aerospike container. Modifying this parameter causes a rolling restart of all pods with updated resources.

### Scale a cluster vertically

For this example, the cluster is deployed using a CR file named `aerospike-cluster.yaml`.

1.  Change the `spec.podSpec.aerospikeContainer.resources` field in the CR file to scale each pod’s Aerospike container resources.
    
    ```yaml
    apiVersion: asdb.aerospike.com/v1
    kind: AerospikeCluster
    metadata:
      name: aerocluster
      namespace: aerospike
    spec:
      size: 2
      image: aerospike/aerospike-server-enterprise:8.1.1.0
      podSpec:
        aerospikeContainer:
          resources:
            requests:
              memory: 2Gi
              cpu: 200m
            limits:
              memory: 10Gi
              cpu: 5000m
    ```
    
2.  Use `kubectl` to apply the change. This causes a rolling restart of the pods, after which the pods run with the updated resources.
    
    ```sh
    kubectl apply -f aerospike-cluster.yaml
    ```
    
3.  Check the pods.
    
    ```bash
    kubectl get pods -n aerospike
    ```
    
    Expected output:
    
    ```text
    NAME               READY   STATUS    RESTARTS   AGE
    aerocluster-0-0    1/1     Running   0          3m6s
    aerocluster-0-1    1/1     Running   0          3m6s
    ```
    

## Aerospike namespace storage scaling

AKO uses Kubernetes StatefulSets for deploying Aerospike clusters. StatefulSets use PersistentVolumeClaims (PVCs) for providing persistent storage. A PVC in a StatefulSet cannot be resized after creation, which prevents a simple solution for Aerospike namespace storage scaling.

AKO provides two approaches for storage scaling:

| Approach | Description | Benefits |
| --- | --- | --- |
| [Rack Revision](#storage-scaling-using-rack-revision) (Recommended) | Gradual pod-by-pod migration within the same rack | No double infrastructure cost, no rack-id changes, no client changes |
| [Rack Replacement](#storage-scaling-using-rack-replacement) | Replace the entire rack with a new one | Simple single-step operation |

### Storage scaling using Rack Revision

The **Rack Revision** feature enables safe, cost-effective storage updates by implementing a gradual pod-by-pod migration strategy. Instead of creating an entirely new rack, AKO migrates pods one at a time within the same rack while preserving the rack ID.

::: note
The `revision` field is a string that you can set to any value. A common practice is to use version-like strings such as `"v1"`, `"v2"`, etc., or timestamps.
:::

#### Benefits of Rack Revision

-   **No double infrastructure cost**: Pods are migrated gradually, so you don’t need to run two full racks simultaneously.
-   **No rack-id changes**: The rack ID remains the same, preserving rack topology.
-   **No client configuration changes**: Rack-aware clients don’t need to update their configuration.

#### How it works

When you update the `revision` field in a rack configuration along with storage changes, AKO:

1.  Creates new StatefulSets and PVCs with the new revision and updated storage configuration.
2.  Deletes the old revision pod and its PVCs.
3.  Scales up a new pod with the new revision and updated storage and waits for data migration.
4.  Repeats for each pod in the rack.

#### Example: Expanding storage with Rack Revision

In this example, the cluster is deployed using `aerospike-cluster.yaml`. The storage scaling process involves increasing the `ns` volume size from `3Gi` to `8Gi` and adding a new block volume, `ns2`, by updating the `revision` field.

Before scaling:

```yaml
apiVersion: asdb.aerospike.com/v1
kind: AerospikeCluster
metadata:
  name: aerocluster
  namespace: aerospike
spec:
  size: 2
  image: aerospike/aerospike-server-enterprise:8.1.1.0
  rackConfig:
    namespaces:
      - test
    racks:
      - id: 1
        revision: "v1"
        zone: us-central1-b
        storage:
          filesystemVolumePolicy:
            cascadeDelete: true
            initMethod: deleteFiles
          volumes:
            - name: workdir
              aerospike:
                path: /opt/aerospike
              source:
                persistentVolume:
                  storageClass: ssd
                  volumeMode: Filesystem
                  size: 1Gi
            - name: ns
              aerospike:
                path: /dev/sdf
              source:
                persistentVolume:
                  storageClass: ssd
                  volumeMode: Block
                  size: 3Gi
            - name: aerospike-config-secret
              source:
                secret:
                  secretName: aerospike-secret
              aerospike:
                path: /etc/aerospike/secret
  aerospikeConfig:
    service:
      feature-key-file: /etc/aerospike/secret/features.conf
    security: {}
    namespaces:
      - name: test
        replication-factor: 2
        storage-engine:
          type: device
          devices:
            - /dev/sdf
```

After scaling:

```diff
apiVersion: asdb.aerospike.com/v1
kind: AerospikeCluster
metadata:
  name: aerocluster
  namespace: aerospike
spec:
  size: 2
  image: aerospike/aerospike-server-enterprise:8.1.1.0
  rackConfig:
    namespaces:
      - test
    racks:
      - id: 1
-       revision: "v1"
+       revision: "v2"  # Increment the revision to trigger the storage update
        zone: us-central1-b
        storage:
          filesystemVolumePolicy:
            cascadeDelete: true
            initMethod: deleteFiles
          volumes:
            - name: workdir
              aerospike:
                path: /opt/aerospike
              source:
                persistentVolume:
                  storageClass: ssd
                  volumeMode: Filesystem
                  size: 1Gi
            - name: ns
              aerospike:
                path: /dev/sdf
              source:
                persistentVolume:
                  storageClass: ssd
                  volumeMode: Block
-                 size: 3Gi
+                 size: 8Gi
+           - name: ns2
+             aerospike:
+               path: /dev/sdg
+             source:
+               persistentVolume:
+                 storageClass: ssd
+                 volumeMode: Block
+                 size: 5Gi
            - name: aerospike-config-secret
              source:
                secret:
                  secretName: aerospike-secret
              aerospike:
                path: /etc/aerospike/secret
  aerospikeConfig:
    service:
      feature-key-file: /etc/aerospike/secret/features.conf
    security: {}
    namespaces:
      - name: test
        replication-factor: 2
        storage-engine:
          type: device
          devices:
            - /dev/sdf
+           - /dev/sdg
```

Apply the changes:

```shell
kubectl apply -f aerospike-cluster.yaml
```

AKO gradually migrates each pod to use the new revision with storage configuration while preserving the rack ID.

Check the pods with the `get pods` command:

```bash
kubectl get pods -n aerospike
```

Expected output:

```text
NAME                  READY   STATUS    RESTARTS   AGE
aerocluster-1-v2-0    1/1     Running   0          3m6s
aerocluster-1-v2-1    1/1     Running   0          5m12s
```

Notice that the pod names now include the revision (`v2`) as a suffix to the rack ID.

### Storage scaling using Rack Replacement

As an alternative to Rack Revision, you can replace an entire rack with a new one that has updated storage. This approach is simpler but requires running two racks simultaneously during the migration. AKO seamlessly migrates data from the old rack to the new rack and removes the old rack automatically.

::: note
This approach requires double infrastructure cost during migration and may require client configuration changes for rack-aware clients.
:::

#### Example: Expanding storage with Rack Replacement

In this example, the cluster is deployed using `aerospike-cluster.yaml`. The storage scaling process involves replacing an existing rack (`id: 1`) with a new rack (`id: 2`) that has increased storage.

The existing rack has two PersistentVolumes:

-   **`workdir` (1Gi):** Used as the Aerospike Database server work directory.
-   **`ns` (3Gi):** Used by the Aerospike namespace `test`.

The goal is to increase the `ns` volume size from `3Gi` to `8Gi`. Because Kubernetes does not allow resizing StatefulSet PVCs after creation, you must create a new rack with `id: 2` in the CR file with an updated storage configuration, then delete the old rack with `id: 1`.

Before scaling:

```yaml
apiVersion: asdb.aerospike.com/v1
kind: AerospikeCluster
metadata:
  name: aerocluster
  namespace: aerospike
spec:
  size: 2
  image: aerospike/aerospike-server-enterprise:8.1.1.0
  rackConfig:
    namespaces:
      - test
    racks:
      - id: 1
        zone: us-central1-b
        storage:
          filesystemVolumePolicy:
            cascadeDelete: true
            initMethod: deleteFiles
          volumes:
            - name: workdir
              aerospike:
                path: /opt/aerospike
              source:
                persistentVolume:
                  storageClass: ssd
                  volumeMode: Filesystem
                  size: 1Gi
            - name: ns
              aerospike:
                path: /dev/sdf
              source:
                persistentVolume:
                  storageClass: ssd
                  volumeMode: Block
                  size: 3Gi
            - name: aerospike-config-secret
              source:
                secret:
                  secretName: aerospike-secret
              aerospike:
                path: /etc/aerospike/secret
  aerospikeConfig:
    service:
      feature-key-file: /etc/aerospike/secret/features.conf
    security: {}
    namespaces:
      - name: test
        replication-factor: 2
        storage-engine:
          type: device
          devices:
            - /dev/sdf
```

After scaling:

```diff
apiVersion: asdb.aerospike.com/v1
kind: AerospikeCluster
metadata:
  name: aerocluster
  namespace: aerospike
spec:
  size: 2
  image: aerospike/aerospike-server-enterprise:8.1.1.0
  rackConfig:
    namespaces:
      - test
    racks:
      # Replace the old rack (id: 1) with a new rack (id: 2)
-     - id: 1
+     - id: 2
        zone: us-central1-b
        storage:
          filesystemVolumePolicy:
            cascadeDelete: true
            initMethod: deleteFiles
          volumes:
            - name: workdir
              aerospike:
                path: /opt/aerospike
              source:
                persistentVolume:
                  storageClass: ssd
                  volumeMode: Filesystem
                  size: 1Gi
            - name: ns
              aerospike:
                path: /dev/sdf
              source:
                persistentVolume:
                  storageClass: ssd
                  volumeMode: Block
-                 size: 3Gi
+                 size: 8Gi
            - name: aerospike-config-secret
              source:
                secret:
                  secretName: aerospike-secret
              aerospike:
                path: /etc/aerospike/secret
  aerospikeConfig:
    service:
      feature-key-file: /etc/aerospike/secret/features.conf
    security: {}
    namespaces:
      - name: test
        replication-factor: 2
        storage-engine:
          type: device
          devices:
            - /dev/sdf
```

#### Rack-configured storage after scaling

The `ns` volume is increased from `3Gi` to `8Gi`.

Save and exit the CR file, then use `kubectl` to apply the change.

```shell
kubectl apply -f aerospike-cluster.yaml
```

This creates a new rack (`id: 2`) with the updated `storage` section. AKO removes the old rack after verifying that the Aerospike server has migrated its data to the new rack.

Check the pods with the `get pods` command:

```bash
kubectl get pods -n aerospike
```

Expected output:

```text
NAME              READY   STATUS        RESTARTS   AGE
aerocluster-2-0   1/1     Running       0          3m6s
aerocluster-2-1   1/1     Running       0          3m6s
aerocluster-1-1   1/1     Terminating   0          30s
```

## Kubernetes cluster autoscaling

A Kubernetes node is a physical or virtual machine in the Kubernetes cluster infrastructure that runs pods. This is distinct from an Aerospike node, which is the Aerospike Database instance running inside a pod.

[Kubernetes Cluster Autoscaler](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler) automatically scales up the Kubernetes cluster when resources are insufficient for the workload, and scales down the cluster when Kubernetes nodes are underused for an extended period of time. See the [documentation hosted on GitHub](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler) for more details.

[Karpenter](https://github.com/aws/karpenter-provider-aws) is an open-source node lifecycle management project built for Kubernetes, with features that integrate with cloud provider workflows. See the [Karpenter documentation](https://karpenter.sh/docs/) for more details.

If Aerospike cluster pods only have in-memory and dynamic network-attached storage, both autoscalers scale up and down by adjusting the number of Kubernetes nodes and shifting the load automatically to prevent data loss.
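
As a rough sketch of what Karpenter provisioning looks like, a minimal NodePool might resemble the following. The pool name, capacity type, node class reference, and CPU limit here are illustrative assumptions, and field names vary across Karpenter versions, so consult the Karpenter documentation for your release:

```yaml
# Illustrative Karpenter NodePool; caps total provisioned CPU at 64 cores.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: aerospike-pool
spec:
  template:
    spec:
      requirements:
        # Prefer on-demand capacity for stateful database workloads.
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  limits:
    cpu: "64"
```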

### Kubernetes cluster autoscaling with local volumes

The primary challenge with autoscalers and local storage provisioners is ensuring the availability of local disks during Kubernetes node startup after the autoscaler scales up the node. When using local volumes, the ability to successfully autoscale depends on the underlying storage provisioner: Kubernetes SIGs [Storage Local Static Provisioner](https://github.com/kubernetes-sigs/sig-storage-local-static-provisioner), or [OpenEBS](https://openebs.io/).

The Kubernetes cluster autoscaler cannot add a new node when the underlying storage uses the Kubernetes static local storage provisioner. Scale up in this case must be done manually.

When scaling up using Karpenter, you can configure OpenEBS to automatically initialize local storage on newly provisioned nodes by specifying the appropriate volume setup commands, typically LVM commands such as `pvcreate` and `vgcreate`, in the node bootstrap configuration.

This ensures that the local volumes required by OpenEBS are prepared automatically as soon as a node joins the cluster. If you prefer using a local storage provisioner, you can apply a DaemonSet that runs the same `pvcreate` and `vgcreate` commands on every new node. See [Google's documentation for automatic bootstrapping on GKE](https://cloud.google.com/kubernetes-engine/docs/tutorials/automatically-bootstrapping-gke-nodes-with-daemonsets) for an example of using DaemonSets to bootstrap.
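
The DaemonSet approach can be sketched as follows. The namespace, image, device path (`/dev/nvme1n1`), and volume group name are assumptions for illustration; adapt them to your node hardware:

```yaml
# Hypothetical bootstrap DaemonSet: an init container runs LVM setup once
# per node, then a pause container keeps the pod alive so it isn't rescheduled.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: local-storage-bootstrap
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: local-storage-bootstrap
  template:
    metadata:
      labels:
        app: local-storage-bootstrap
    spec:
      initContainers:
        - name: lvm-setup
          image: ubuntu:22.04
          securityContext:
            privileged: true   # required to manage block devices on the host
          command:
            - /bin/sh
            - -c
            - |
              apt-get update && apt-get install -y lvm2
              # "|| true" makes the setup idempotent on pod restarts.
              pvcreate /dev/nvme1n1 || true
              vgcreate aerospike-vg /dev/nvme1n1 || true
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
```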

Neither autoscaler can scale down the nodes if any pod is running with local storage attached.