---
title: "Aerospike Backup Service performance tuning"
description: "Tune the Aerospike Backup Service process for throughput and efficiency."
---

# Aerospike Backup Service performance tuning


This page describes how to tune the Aerospike Backup Service (ABS) process for maximum throughput, compression ratio, and resource efficiency.

All performance tuning decisions must balance the two stages of the backup process:

-   Readers read records from Aerospike using primary-index queries.
-   Writers serialize, compress, encrypt, and move the data to storage.

::: note
This guide focuses on tuning signals (`metrics.pipeline`, duration, CPU, and memory) and directional recommendations. It does not include benchmarks or performance tests. Always validate final values with your own data and storage or network path.
:::

## What performance tuning means for ABS

Performance tuning for ABS means choosing configuration parameter values that finish backups faster without overloading the Aerospike cluster, the ABS host, or the storage path. The best setting depends on your goal:

-   Faster backups usually require more read or write parallelism.
-   Lower cluster impact usually requires read throttling or lower read parallelism.
-   Smaller backup files usually require more compression CPU.
-   Lower memory usage usually requires fewer writers or smaller upload buffers.

Tune one bottleneck at a time. Pick a goal, check the `pipeline` metric to find whether readers or writers are limiting throughput, adjust the parameters for that stage, and validate the result with representative data. Stop increasing concurrency when throughput gains become marginal or when CPU, memory, cluster load, or storage throttling become limiting factors.

## Backup process architecture

Each backup runs a pool of reader workers and a pool of writer workers, connected by buffered Go channels.

ABS publishes a `metrics.pipeline` value that reports how full these channel buffers are.

If readers fill the channels faster than writers drain them, the channels fill up and `pipeline` approaches capacity. If writers drain faster than readers fill, the channels stay empty and `pipeline` stays near `0`.

### Readers (controlled by `parallel`)

1.  Open a primary-index query against an Aerospike node for a partition range.
2.  Receive records using Aerospike binary protocol over TCP.
3.  Push each record into a channel buffer with capacity 256 per reader.

The constraint for readers is usually the query capacity of the Aerospike Database nodes. Common slowdown causes include node I/O bottlenecks, CPU contention from production read/write traffic, and network saturation between ABS and the cluster. A cluster over capacity results in `FAIL_FORBIDDEN` errors in the ABS logs.

### Writers (controlled by `parallel-write`)

1.  Pull a record from the channel buffer.
2.  Encode it into the `.asb` binary format.
3.  Compress with ZSTD if enabled.
4.  Encrypt if enabled.
5.  Write the encoded bytes into an in-memory buffer until it reaches the size limit set by `min-part-size`.
6.  Upload the buffer as one part of a multipart upload to object storage.
7.  When cumulative bytes reach `file-limit`, complete the current multipart upload and start a new file.

A writer stalls when any step in this chain is a bottleneck. The most common bottlenecks are CPU-bound ZSTD compression at high levels, network-bound storage uploads to S3, GCS, or Azure, and memory pressure from large upload buffers triggering garbage collection pauses.

The constraint for writers is usually CPU or storage. More writers running ZSTD on the same machine means more goroutines competing for the same cores. Each writer also allocates memory for its channel buffer, upload buffer, and encoder state. Total memory grows linearly with the number of writers and can overflow the container’s memory limit. Storage backends can throttle when they receive too many concurrent upload requests.
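
As a rough illustration of how writer memory scales, the sketch below estimates peak buffer usage from the defaults described later in this guide. The arithmetic is an approximation, not a measured value, and actual usage also depends on record size and Go garbage-collection behavior.

```yaml
# Illustrative writer-memory estimate with default settings (approximate).
parallel-write: 8        # defaults to the value of parallel
min-part-size: 5242880   # 5 MiB upload buffer per writer (default)
# Upload buffers:  8 writers × 5 MiB           ≈ 40 MiB
# Encoder state:   8 writers × ~1–8 MiB (ZSTD) ≈ 8–64 MiB
# Channel buffers: 8 writers × 256 records     (depends on record size)
```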

Always check the [`pipeline` metric](#checking-the-pipeline-metric) to identify the bottleneck stage before adding threads to either side.

## Checking the pipeline metric

Before changing any configuration, check the `pipeline` metric during a running backup:

```bash
curl http://ABS_HOST:8080/v1/backups/currentBackup/ROUTINE_NAME | \
  jq '{full: .full.metrics.pipeline, incremental: .incremental.metrics.pipeline}'
```

The response has separate `full` and `incremental` blocks, one per running job. The `metrics.pipeline` field inside each block reports the total number of records sitting in the channel buffers between readers and writers for that job.

Pipeline capacity equals `256 × parallel + 256 × parallel-write`. When `parallel-write` is not set, it defaults to `parallel`, so the capacity simplifies to `512 × parallel`.

| Configuration | Capacity (records) |
| --- | --- |
| `parallel=8` (default, both read and write) | 256×8 + 256×8 = 4096 |
| `parallel=4`, `parallel-write=8` | 256×4 + 256×8 = 3072 |
| `parallel=4`, `parallel-write=4` | 256×4 + 256×4 = 2048 |

A healthy pipeline can briefly reach `0` or capacity during normal fluctuations. In a busy system, watch for sustained readings near either extreme.

-   Pipeline stays near `0`: Writers are idle, waiting for data. Readers are the bottleneck.
-   Pipeline fluctuates across the range: Neither stage is a clear bottleneck. The configuration is near optimal.
-   Pipeline stays near capacity: Readers are blocked because the buffers are full. Writers are the bottleneck.

## Tuning workflow

1.  Start with defaults. Set `parallel: 8`. Leave `parallel-write` unset so it inherits the `parallel` value. Enable ZSTD at `compression.level: 3` to use the Default preset. Keep `file-limit: 250`. A starting-point configuration sketch appears after this list.
    
2.  Monitor the `pipeline` metric. Check `GET /v1/backups/currentBackup/ROUTINE_NAME` during a backup run.
    
3.  Match the `pipeline` reading to a stage in [Backup process architecture](#backup-process-architecture):
    
    -   Stays near `0`: Readers are constrained by query latency, network limits, or a low `parallel` setting. Increase `parallel` and check cluster health.
    -   Fluctuates across the range: Readers and writers are balanced. Configuration is close to optimal.
    -   Stays near capacity: Writers are constrained by compression CPU, upload latency, or a low `parallel-write` setting. Increase `parallel-write`, reduce the compression level, or increase storage bandwidth.
4.  Check system resources.
    
    -   CPU > 90%: Decrease `parallel-write` or lower the compression level.
    -   RAM high: Decrease `parallel-write` or `min-part-size`.
5.  Iterate and validate with short tests that use representative data. Steady-state throughput is reached within seconds of backup start, so test runs of 2–5 minutes are often enough to observe it.
    
6.  Scale horizontally. If a single ABS instance is fully saturated, use [partition-list slicing](https://aerospike.com/docs/database/tools/backup-and-restore/backup-service/parallel-backup) to split the workload across multiple instances.
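
As a starting point for step 1, a backup-policy sketch with these defaults might look like the following. The policy name `my-policy` is a placeholder, exact field placement can vary by ABS version, and the `compression.mode` field is an assumption about how ZSTD is enabled, so treat this as a sketch rather than a canonical configuration:

```yaml
backup-policies:
  my-policy:          # placeholder name
    parallel: 8       # read parallelism; parallel-write inherits this when unset
    compression:
      mode: ZSTD      # assumed field for enabling ZSTD
      level: 3        # Default preset
    file-limit: 250   # MiB per .asb file
```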
    

## Tuning parameters

This section describes the parameters that control reader and writer bottlenecks in more detail.

### Reader bottlenecks

Use these parameters when `pipeline` stays near `0`, which means writers are waiting for records.

#### `parallel` (read parallelism)

Controls the number of concurrent reader threads issuing primary-index queries against the Aerospike cluster.

When to increase: If `pipeline` stays near `0`, increase `parallel` incrementally until readers stop being the bottleneck. Throughput gains taper off once readers saturate the cluster’s query capacity. Stop increasing when gains are marginal.

When to decrease: You see `FAIL_FORBIDDEN` errors in ABS logs, which means the cluster’s query thread capacity is exhausted.

Default: `8`

::: note
`aerospike-clusters.CLUSTER_NAME.max-parallel-scans` caps the number of concurrent primary-index queries that ABS runs against a cluster. Primary-index queries were historically called scans. The limit is shared across every ABS routine that references the same cluster entry, but does not affect other Aerospike clients connected to the cluster. If multiple routines run at once, keep the sum of their active `parallel` values within `max-parallel-scans`.
:::
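
For example, the cap sits under the cluster definition; the cluster name below is a placeholder and connection settings are omitted:

```yaml
aerospike-clusters:
  my-cluster:               # placeholder name; connection settings omitted
    max-parallel-scans: 16  # shared cap across all ABS routines using this cluster
```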

#### `records-per-second` (RPS throttle)

Limits the number of records read from Aerospike per second across all readers. Use this to limit ABS’s impact on a production cluster without changing `parallel`.

When to set: Production database latency or resource usage rises during backups, but you still want to keep enough read parallelism to scan partitions efficiently.

When to increase: Backups are taking too long, the Aerospike cluster has headroom, and `pipeline` stays near `0` because writers are waiting for records.

When to decrease: Backups are competing with production traffic, or cluster CPU, disk, or network usage rises beyond the level you want to allow for backup work.

Default: no limit. Omitting the parameter or setting it to `0` disables the RPS throttle.

`records-per-second` limits returned records, not query concurrency. If the cluster is rejecting primary-index queries or running out of query threads, reduce `parallel` or lower `max-parallel-scans` instead.
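
For example, a policy sketch that keeps read parallelism high while capping cluster impact might look like this (the policy name and RPS value are illustrative):

```yaml
backup-policies:
  my-policy:
    parallel: 8                # enough readers to cover partitions efficiently
    records-per-second: 50000  # total records/s across all readers; 0 or omitted disables the throttle
```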

### Writer bottlenecks

Use these parameters when `pipeline` stays near capacity, which means readers are waiting for writers.

#### `parallel-write` (write parallelism)

Controls the number of threads for serialization, compression, encryption, and uploading. If not set, it defaults to the value of `parallel`.

When to increase: Pipeline is near capacity and you have CPU and memory headroom.

When to decrease: CPU is saturated above 90% or memory usage is high.

Default: the value of `parallel`.

Memory impact: Each writer allocates:

-   A 256-record channel buffer
-   A storage upload buffer sized to `min-part-size`, which defaults to 5 MiB
-   ZSTD encoder state if compression is enabled, sized between about 1 MiB and 8 MiB depending on preset

Reader and writer counts do not need to match. Set `parallel-write` based on writer bottlenecks and available CPU, memory, and storage bandwidth. ABS handles distribution between readers and writers.
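
For example, a sketch that limits cluster query load while giving the write stage more threads might look like this (placeholder policy name, illustrative values):

```yaml
backup-policies:
  my-policy:
    parallel: 4         # fewer readers to reduce cluster query load
    parallel-write: 8   # more writers for compression, encryption, and upload work
```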

#### `compression` (ZSTD)

ABS applies ZSTD compression per writer thread. The `compression.level` parameter accepts values from -1 through 22.

ABS uses a Go implementation of ZSTD that provides four discrete presets rather than 22 levels: the library maps any numeric level to one of the four. Values within the same bucket produce identical compression output, speed, and CPU usage. For meaningful configuration changes, use values at preset boundaries: `1`, `3`, `6`, and `10`.

| Level | Preset | Approx. zstd equivalent | Behavior |
| --- | --- | --- | --- |
| -1, 0–2 | Fastest | zstd 1 | Least CPU, largest files |
| 3–5 | Default | zstd 3 | Balanced speed and compression |
| 6–9 | Better | zstd 7–8 | More CPU, smaller files |
| 10–22 | Best | zstd 11 | Most CPU, smallest files |

If `compression.level` is omitted, the API default is `0`. For balanced compression, set `compression.level` to `3`.

Practical tips:

-   Start with `compression.level=3` to use the Default preset. You can also experiment with compression level `0` to get a baseline for speed and size when testing.
-   Compression helps most on high-latency or low-bandwidth storage paths when records are highly compressible.
-   If writer CPU saturates or `pipeline` trends toward capacity after enabling a more aggressive preset, reduce `compression.level`.
-   Move from level `3` to `6` only after validating with your own baseline runs.
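
For example, a policy sketch selecting the Default preset might look like this; because of the bucket mapping above, `3`, `4`, and `5` all produce identical results (the `mode` field is an assumption about how ZSTD is enabled in your version):

```yaml
backup-policies:
  my-policy:
    compression:
      mode: ZSTD   # assumed field for enabling ZSTD
      level: 3     # Default preset; 3, 4, and 5 map to the same bucket
```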

#### `file-limit` (max file size)

Controls the maximum size of individual `.asb` backup files, in MiB. The default is 250 MiB.

When a file reaches this limit, ABS starts writing a new file. Set to `0` for no limit, which produces one file per writer.

::: note
Very low values can create excess file rotation and object-operation overhead. Very high values, including `0`, keep each multipart upload open for the full backup. If a backup aborts, more data is stranded in incomplete multipart uploads until lifecycle rules or manual cleanup remove it.
:::
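
As a worked example, the defaults yield roughly `250 MiB ÷ 5 MiB = 50` multipart parts per file, so each multipart upload completes relatively quickly. The values below are the documented defaults; the placement is a sketch:

```yaml
backup-policies:
  my-policy:
    file-limit: 250   # MiB; ~50 parts per file at the default 5 MiB part size
    # file-limit: 0   # no limit: one file per writer, multipart upload stays open for the whole backup
```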

#### `min-part-size` (multipart upload chunk size)

Controls the minimum size of each part in a multipart upload to object storage. ABS buffers this amount of data in memory per writer before uploading a part.

The default is 5 MiB (5,242,880 bytes).

Minimum values vary by storage backend:

| Backend | Minimum `min-part-size` |
| --- | --- |
| S3 | 5 MiB |
| GCS | 256 KiB |
| Azure Blob | 1 MiB |

Memory impact: This parameter is the primary driver of upload buffer memory. Peak memory from chunk buffers alone is approximately `parallel-write × min-part-size`.

Larger `min-part-size` values reduce the number of chunk upload requests, which can improve throughput on high-latency connections. However, each writer holds this buffer in memory, so large values combined with high `parallel-write` risk out-of-memory conditions.
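
For example, raising the part size for a high-latency storage path multiplies buffer memory accordingly. The sketch below assumes the value is given in bytes, matching the 5,242,880-byte default:

```yaml
backup-policies:
  my-policy:
    parallel-write: 8
    min-part-size: 16777216   # 16 MiB per part
    # Peak chunk-buffer memory ≈ parallel-write × min-part-size = 8 × 16 MiB = 128 MiB
```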

#### `bandwidth` (write speed cap)

Throttles the total encoded backup write speed in MiB/s across all writers. Use this to prevent ABS from saturating a shared network connection.

When to increase: `pipeline` stays near capacity because the configured write cap is lower than the storage path can support, and the ABS host, network, and storage backend all have headroom.

When to decrease: Backup uploads are competing with other traffic, or the storage backend starts throttling concurrent writes.

Default: no limit. Omitting the parameter or setting it to `0` disables the bandwidth cap.

The limit is shared across the whole writer pipeline. Increasing `parallel-write` can improve serialization, compression, encryption, and upload concurrency, but it cannot raise total throughput above the configured `bandwidth` value.

With bandwidth limiting, the writer stage runs at a capped rate. If readers can produce records faster than that cap, the pipeline trends toward capacity; if not, readers remain the bottleneck and `pipeline` can stay low.
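
For example, a policy sketch that caps backup uploads to avoid saturating a shared link might look like this (the value is illustrative):

```yaml
backup-policies:
  my-policy:
    parallel-write: 8   # concurrency still helps within the cap
    bandwidth: 100      # MiB/s total across all writers; 0 or omitted disables the cap
```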