---
title: "Latency at scale"
description: "Generate a production-scale dataset and measure whether Aerospike feature retrieval stays sub-millisecond."
---

# Latency at scale

You’ve seen that feature retrieval is fast with 100 drivers and 3 features. Now you’ll scale both dimensions by adding more drivers and features, then measure what happens.

## Generate a larger dataset

Create 1,000 drivers with 12 features matching the realistic feature set from the previous page. You’ll write these to a separate Aerospike set so your tutorial data stays intact. The `sd_` prefix on the bin names below is short for `scaled-driver` and follows the same short-bin convention used earlier.

1.  Run `Cell 26` to generate 1,000 scaled driver records with 12 features.
2.  Run `Cell 27` to bulk write scaled drivers to Aerospike.

Cell 26: Generate 1,000 scaled driver records with 12 features

```python
import random

from pyspark.sql import Row

random.seed(123)  # fixed seed so the generated dataset is reproducible

scaled_drivers = []
for i in range(1, 1001):
    driver_id = f"scaled_driver_{i:04d}"

    # Tie the rating loosely to the decline rate, then clamp it between 3.5 and 5.0.
    decline_rate = random.uniform(0.01, 0.25)
    avg_rating = round(5.0 - (decline_rate * 4) + random.uniform(-0.3, 0.3), 2)
    avg_rating = max(3.5, min(5.0, avg_rating))

    scaled_drivers.append({
        "driver_id": driver_id,
        "sd_decl_rate": round(decline_rate, 4),
        "sd_avg_rating": avg_rating,
        "sd_trips_today": random.randint(0, 20),
        "sd_hrs_on_shift": round(random.uniform(0, 12), 1),
        "sd_consec_trips": random.randint(0, 10),
        "sd_dist_pickup": round(random.uniform(0.5, 20.0), 1),
        "sd_rider_rating": round(random.uniform(3.5, 5.0), 1),
        "sd_is_peak": random.choice([0, 1]),
        "sd_surge_mult": round(random.uniform(1.0, 3.0), 1),
        "sd_est_duration": random.randint(5, 60),
        "sd_accept_1h": round(random.uniform(0.5, 1.0), 2),
        "sd_total_trips": random.randint(3, 5000),
    })
```

Each record has 12 feature bins plus the ID, which is a realistic width for a production entity record.
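
The abbreviated names matter because Aerospike bin names are length-limited. The following sketch checks that every `sd_*` name fits, assuming a 15-character limit. `MAX_BIN_NAME_LEN`, `too_long`, and `longest` are names introduced here for illustration and are not part of the tutorial notebook.

```python
# Assumption: Aerospike bin names may be at most 15 characters long.
MAX_BIN_NAME_LEN = 15

bin_names = [
    "sd_decl_rate", "sd_avg_rating", "sd_trips_today",
    "sd_hrs_on_shift", "sd_consec_trips", "sd_dist_pickup",
    "sd_rider_rating", "sd_is_peak", "sd_surge_mult",
    "sd_est_duration", "sd_accept_1h", "sd_total_trips",
]

# Collect any names that would exceed the assumed limit.
too_long = [name for name in bin_names if len(name) > MAX_BIN_NAME_LEN]
longest = max(len(name) for name in bin_names)

print(f"{len(bin_names)} bins, longest name is {longest} characters")
print("Names over the limit:", too_long)
```

Several of these names sit exactly at the assumed limit, so a longer descriptive name such as `sd_hours_on_shift` would need the same kind of abbreviation.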

Use Spark for the bulk write because of its strength in batch ingestion.

Cell 27: Bulk write scaled drivers to Aerospike

```python
import time

scaled_df = spark.createDataFrame([Row(**d) for d in scaled_drivers])

start = time.perf_counter()
scaled_df.write \
    .mode("overwrite") \
    .format("aerospike") \
    .option("aerospike.write-set", "scaled-driver-features") \
    .option("aerospike.write-with-key", "driver_id") \
    .save()
elapsed = time.perf_counter() - start

print(f"Wrote {len(scaled_drivers)} drivers with 12 features each")
print(f"Bulk write time: {elapsed:.1f} seconds")
```

Expected output

```plaintext
Wrote 1000 drivers with 12 features each
Bulk write time: 3.2 seconds
```

This highlights why Spark is used for bulk operations in this tutorial: writing 1,000 records in one job takes only a few seconds.

## Benchmark feature retrieval

Now measure key-based Aerospike reads against this larger dataset. You’ll run 100 retrievals from the 1,000-driver dataset. Each retrieval fetches all 12 features for one sampled driver, and you’ll benchmark time per retrieval.

1.  Run `Cell 28` to benchmark 100 lookups against scaled data.

Cell 28: Benchmark 100 lookups against scaled data

```python
import random
import time

# The 12 feature bins to fetch for each driver.
all_features = [
    "sd_decl_rate", "sd_avg_rating", "sd_trips_today",
    "sd_hrs_on_shift", "sd_consec_trips", "sd_dist_pickup",
    "sd_rider_rating", "sd_is_peak", "sd_surge_mult",
    "sd_est_duration", "sd_accept_1h", "sd_total_trips"
]

# Sample 100 distinct driver IDs; the fixed seed keeps the sample reproducible.
random.seed(456)
sample_ids = [f"scaled_driver_{i:04d}" for i in random.sample(range(1, 1001), 100)]

timings = []
for driver_id in sample_ids:
    key = ('test', 'scaled-driver-features', driver_id)
    # Time only the read itself and record it in milliseconds.
    # as_client is the Aerospike client created earlier in the notebook.
    start = time.perf_counter()
    (_, _, bins) = as_client.select(key, all_features)
    elapsed = (time.perf_counter() - start) * 1000
    timings.append(elapsed)

# With 100 sorted samples, indexes 49, 94, and 98 hold the 50th, 95th, and 99th values.
timings.sort()

print("Feature retrieval benchmark: 100 lookups from 1,000-driver dataset (12 features each)")
print(f"  p50: {timings[49]:.2f} ms")
print(f"  p95: {timings[94]:.2f} ms")
print(f"  p99: {timings[98]:.2f} ms")
```

Expected output

```plaintext
Feature retrieval benchmark: 100 lookups from 1,000-driver dataset (12 features each)
  p50: 0.23 ms
  p95: 0.39 ms
  p99: 0.48 ms
```
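
Cell 28 reads its percentiles straight from fixed indexes (49, 94, and 98) of the sorted list. That corresponds to the nearest-rank method for exactly 100 samples. The sketch below shows the same calculation for any sample size. `nearest_rank` is a hypothetical helper, not part of the notebook, and the `sample` list is made-up stand-in data rather than measured timings.

```python
import math


def nearest_rank(sorted_values, pct):
    """Return the nearest-rank percentile of an ascending list.

    pct is an integer from 1 to 100.
    """
    if not sorted_values:
        raise ValueError("need at least one value")
    rank = math.ceil(pct * len(sorted_values) / 100)  # 1-based rank
    return sorted_values[rank - 1]


# Stand-in data: 100 made-up timings of 0.01 ms, 0.02 ms, ..., 1.00 ms.
sample = [round(0.01 * i, 2) for i in range(1, 101)]

p50 = nearest_rank(sample, 50)  # same element as sample[49]
p95 = nearest_rank(sample, 95)  # same element as sample[94]
p99 = nearest_rank(sample, 99)  # same element as sample[98]

print(p50, p95, p99)
```

With only 100 samples, the p99 value is simply the second-slowest lookup, so it can swing noticeably between runs. A larger sample gives a steadier tail estimate.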

## Compare: tutorial scale vs production scale

|  | Tutorial dataset | Scaled dataset |
| --- | --- | --- |
| **Drivers** | 100 | 1,000 |
| **Features per driver** | 3 | 12 |
| **p50 retrieval** | ~0.22 ms | ~0.23 ms |
| **p95 retrieval** | ~0.38 ms | ~0.39 ms |

The numbers are effectively identical. Ten times the drivers and four times the features did not change retrieval time.

## Why this works

Aerospike stores each entity as a single record, identified by primary key. Each lookup here is an O(1) hash read: Aerospike computes the key’s partition, locates the node that owns it, and returns the requested bins. None of these steps depends on how many other records exist in the set, and the number of bins in a record has only a negligible effect. (`get_feature_vector()` uses this same key-based read path.)
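
To illustrate why the partition step is independent of dataset size, here is a minimal sketch of mapping a key to one of 4,096 partitions using only the key itself. It is a simplified stand-in, not Aerospike's actual routine: Aerospike derives a RIPEMD-160 digest from the set name and key, while this sketch uses SHA-1 so that it runs with any Python build. `sketch_partition_id` and the way the set and key are joined are assumptions made for illustration.

```python
import hashlib

N_PARTITIONS = 4096  # Aerospike divides each namespace into 4,096 partitions


def sketch_partition_id(set_name: str, user_key: str) -> int:
    """Map a key to a partition ID using only the key itself.

    Stand-in only: SHA-1 replaces Aerospike's RIPEMD-160 digest, and the
    "set:key" encoding is an assumption, so IDs will not match a real cluster.
    """
    digest = hashlib.sha1(f"{set_name}:{user_key}".encode("utf-8")).digest()
    # Keep 12 bits of the digest, because 2**12 == 4096 partitions.
    return int.from_bytes(digest[:2], "little") & (N_PARTITIONS - 1)


pid = sketch_partition_id("scaled-driver-features", "scaled_driver_0042")
print(f"scaled_driver_0042 maps to partition {pid}")
```

The function never looks at any other record, so its cost is the same whether the set holds 1,000 drivers or many more.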

This means:

-   **Adding drivers doesn’t slow down lookups.** Whether you have 1,000 drivers or 100,000, the lookup path is the same.
-   **Adding features doesn’t slow down lookups.** More bins per record has negligible impact because the record is read in a single I/O operation.
-   **Scoring 50 candidate drivers costs about 12 ms total at p50.** That is 50 sequential lookups at roughly 0.23 ms each, which is well within a dispatch latency budget of a few hundred milliseconds and leaves room for model inference and network overhead.
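
The 12 ms estimate in the last bullet can be reproduced with a few lines of arithmetic. This sketch assumes the 50 lookups run one after another at the measured p50, and it uses a hypothetical 200 ms budget to stand in for "a few hundred milliseconds". All names below are illustrative.

```python
P50_MS = 0.23      # p50 per lookup, taken from the benchmark output
CANDIDATES = 50    # drivers scored per dispatch decision
BUDGET_MS = 200.0  # hypothetical dispatch budget, assumed for illustration

# Sequential lookups: total retrieval time is per-lookup time times the count.
retrieval_ms = P50_MS * CANDIDATES
remaining_ms = BUDGET_MS - retrieval_ms

print(f"Feature retrieval: {retrieval_ms:.1f} ms")
print(f"Left for inference and network: {remaining_ms:.1f} ms")
```

Because this uses the median, a real request that hits a few slower lookups will take somewhat longer. Substituting the p95 or p99 value gives a more conservative estimate.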

At these latencies, engineers can run diagnostic queries against production data, such as checking a specific driver’s features, spot-testing predictions, or debugging dispatch decisions, without materially affecting serving performance. The database does not distinguish between a serving request and a diagnostic query because both are key lookups.

-   I have measured feature retrieval latency on a larger dataset.
-   I understand why Aerospike’s architecture makes this possible.
