Latency at scale

You’ve seen that feature retrieval is fast with 100 drivers and 3 features. Now you’ll scale both dimensions by adding more drivers and features, then measure what happens.

Generate a larger dataset

Create 1,000 drivers with 12 features matching the realistic feature set from the previous page. You’ll write these to a separate Aerospike set so your tutorial data stays intact. The sd_* names below are abbreviated bin names (scaled-driver) following the same short-bin convention used earlier.

  1. Run Cell 26 to generate 1,000 scaled driver records with 12 features.
  2. Run Cell 27 to bulk write scaled drivers to Aerospike.

Cell 26: Generate 1,000 scaled driver records with 12 features

import random
from pyspark.sql import Row

random.seed(123)
scaled_drivers = []
for i in range(1, 1001):
    driver_id = f"scaled_driver_{i:04d}"
    decline_rate = random.uniform(0.01, 0.25)
    avg_rating = round(5.0 - (decline_rate * 4) + random.uniform(-0.3, 0.3), 2)
    avg_rating = max(3.5, min(5.0, avg_rating))
    scaled_drivers.append({
        "driver_id": driver_id,
        "sd_decl_rate": round(decline_rate, 4),
        "sd_avg_rating": avg_rating,
        "sd_trips_today": random.randint(0, 20),
        "sd_hrs_on_shift": round(random.uniform(0, 12), 1),
        "sd_consec_trips": random.randint(0, 10),
        "sd_dist_pickup": round(random.uniform(0.5, 20.0), 1),
        "sd_rider_rating": round(random.uniform(3.5, 5.0), 1),
        "sd_is_peak": random.choice([0, 1]),
        "sd_surge_mult": round(random.uniform(1.0, 3.0), 1),
        "sd_est_duration": random.randint(5, 60),
        "sd_accept_1h": round(random.uniform(0.5, 1.0), 2),
        "sd_total_trips": random.randint(3, 5000),
    })

Each record has 12 feature bins plus the ID, which is a realistic width for a production entity record.
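Because the sd_* names are deliberately abbreviated, a quick length check can catch a bin name that grows too long before the bulk write runs. The sketch below assumes a 15-byte bin-name limit; that value is an assumption for illustration, so confirm it against the documentation for your server version. The helper name oversized_bins is hypothetical and not part of the tutorial notebook.

```python
# Assumed limit for illustration only; confirm against your server version's docs.
MAX_BIN_NAME_BYTES = 15

feature_bins = [
    "sd_decl_rate", "sd_avg_rating", "sd_trips_today",
    "sd_hrs_on_shift", "sd_consec_trips", "sd_dist_pickup",
    "sd_rider_rating", "sd_is_peak", "sd_surge_mult",
    "sd_est_duration", "sd_accept_1h", "sd_total_trips",
]


def oversized_bins(names, limit=MAX_BIN_NAME_BYTES):
    """Return the names whose UTF-8 encoding is longer than the limit."""
    return [name for name in names if len(name.encode("utf-8")) > limit]


too_long = oversized_bins(feature_bins)
print(f"{len(feature_bins)} bins checked, {len(too_long)} over {MAX_BIN_NAME_BYTES} bytes")
```

Several of the names sit right at the assumed limit, which is why the abbreviations matter.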

Use Spark for the bulk write because of its strength in batch ingestion.

Cell 27: Bulk write scaled drivers to Aerospike

import time

scaled_df = spark.createDataFrame([Row(**d) for d in scaled_drivers])
start = time.perf_counter()
scaled_df.write \
    .mode("overwrite") \
    .format("aerospike") \
    .option("aerospike.write-set", "scaled-driver-features") \
    .option("aerospike.write-with-key", "driver_id") \
    .save()
elapsed = time.perf_counter() - start
print(f"Wrote {len(scaled_drivers)} drivers with 12 features each")
print(f"Bulk write time: {elapsed:.1f} seconds")

Expected output

Wrote 1000 drivers with 12 features each
Bulk write time: 3.2 seconds

This highlights why Spark is used for bulk operations in this tutorial: writing 1,000 records in one job takes only a few seconds.

Benchmark feature retrieval

Now measure key-based Aerospike reads against this larger dataset. You’ll run 100 retrievals from the 1,000-driver dataset. Each retrieval fetches all 12 features for one sampled driver, and you’ll benchmark time per retrieval.

  1. Run Cell 28 to benchmark 100 lookups against scaled data.

Cell 28: Benchmark 100 lookups against scaled data

import random
import time

all_features = [
    "sd_decl_rate", "sd_avg_rating", "sd_trips_today",
    "sd_hrs_on_shift", "sd_consec_trips", "sd_dist_pickup",
    "sd_rider_rating", "sd_is_peak", "sd_surge_mult",
    "sd_est_duration", "sd_accept_1h", "sd_total_trips"
]
random.seed(456)
# Sample 100 distinct driver IDs from the 1,000 written in Cell 27
sample_ids = [f"scaled_driver_{i:04d}" for i in random.sample(range(1, 1001), 100)]
timings = []
for driver_id in sample_ids:
    key = ('test', 'scaled-driver-features', driver_id)
    start = time.perf_counter()
    (_, _, bins) = as_client.select(key, all_features)
    elapsed = (time.perf_counter() - start) * 1000  # seconds to milliseconds
    timings.append(elapsed)
timings.sort()
# With 100 sorted samples, the 50th, 95th, and 99th values are at indices 49, 94, and 98
print("Feature retrieval benchmark: 100 lookups from 1,000-driver dataset (12 features each)")
print(f" p50: {timings[49]:.2f} ms")
print(f" p95: {timings[94]:.2f} ms")
print(f" p99: {timings[98]:.2f} ms")

Expected output

Feature retrieval benchmark: 100 lookups from 1,000-driver dataset (12 features each)
 p50: 0.23 ms
 p95: 0.39 ms
 p99: 0.48 ms
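The benchmark reads its percentiles from fixed positions in the sorted list, which works because exactly 100 samples are taken. If you change the sample size, those positions no longer line up. A minimal sketch of a nearest-rank helper that adapts to any sample count follows; the function name nearest_rank is hypothetical, and nearest-rank is one of several common percentile definitions, chosen here because it reproduces the positions used in the benchmark.

```python
def nearest_rank(sorted_values, pct):
    """Return the pct-th percentile (integer pct, 1-100) of an ascending list.

    Uses the nearest-rank method: the value at 1-based rank ceil(pct/100 * n).
    """
    if not sorted_values:
        raise ValueError("need at least one sample")
    n = len(sorted_values)
    rank = (pct * n + 99) // 100  # integer ceiling of pct * n / 100
    return sorted_values[max(rank, 1) - 1]


# Stand-in data: 100 ascending values, so each value equals its 1-based rank.
demo = [float(i) for i in range(1, 101)]
p50, p95, p99 = (nearest_rank(demo, p) for p in (50, 95, 99))
print(p50, p95, p99)
```

For 100 samples this selects the 50th, 95th, and 99th sorted values, the same ones the benchmark prints.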

Compare: tutorial scale vs scaled dataset

| Metric | Tutorial dataset | Scaled dataset |
|---|---|---|
| Drivers | 100 | 1,000 |
| Features per driver | 3 | 12 |
| p50 retrieval | ~0.22 ms | ~0.23 ms |
| p95 retrieval | ~0.38 ms | ~0.39 ms |

The numbers are effectively identical. Ten times the drivers and four times the features did not change retrieval time.

Why this works

Aerospike stores each entity as a single record, identified by primary key. Each lookup here is an O(1) hash read: Aerospike computes the key’s partition, locates the node that owns it, and returns the requested bins. None of these steps depends on how many other records exist in the set, and because all of a record’s bins are stored together, reading 12 bins instead of 3 adds very little. (get_feature_vector() uses this same key-based read path.)
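The routing step described above can be illustrated with a toy model. This is a sketch under stated assumptions, not Aerospike's implementation: it substitutes SHA-1 from the Python standard library for the real key digest, so the partition IDs it prints will not match a live cluster, and the partition_owner map with three node names is invented for the example. The point it shows is that the work done per lookup is one hash and one table access, regardless of how many records exist.

```python
import hashlib

N_PARTITIONS = 4096  # each Aerospike namespace is divided into 4096 partitions


def toy_partition_id(set_name, user_key):
    """Map a (set, key) pair to a partition ID.

    Stand-in hashing: SHA-1 is used here only for illustration and will not
    reproduce the partition IDs a real cluster computes.
    """
    digest = hashlib.sha1(f"{set_name}:{user_key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:2], "little") % N_PARTITIONS


# Hypothetical partition map for a three-node cluster: partition ID -> owner.
partition_owner = {pid: f"node-{pid % 3}" for pid in range(N_PARTITIONS)}


def route(set_name, user_key):
    """Return (partition ID, owning node) using one hash and one dict lookup."""
    pid = toy_partition_id(set_name, user_key)
    return pid, partition_owner[pid]


pid, node = route("scaled-driver-features", "scaled_driver_0042")
print(f"partition {pid} is owned by {node}")
```

Nothing in route() iterates over records, which is why growing the set from 100 to 1,000 drivers leaves the lookup path unchanged.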

This means:

  • Adding drivers doesn’t slow down lookups. Whether you have 1,000 drivers or 100,000, the lookup path is the same.
  • Adding features doesn’t slow down lookups. More bins per record has negligible impact because the record is read in a single I/O operation.
  • Scoring 50 candidate drivers costs about 12 ms total at p50 (50 sequential lookups × 0.23 ms ≈ 11.5 ms). That is well within a dispatch latency budget of a few hundred milliseconds, with room to spare for model inference and network overhead.
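The 12 ms estimate in the last bullet is simple arithmetic on the benchmark output. The sketch below works it through, assuming the 50 lookups run one after another and using the p50 and p99 values from the expected output above. The 200 ms budget is a hypothetical stand-in for "a few hundred milliseconds"; substitute your own figure.

```python
P50_MS = 0.23        # from the benchmark's expected output
P99_MS = 0.48        # from the benchmark's expected output
CANDIDATES = 50      # drivers scored per dispatch decision
BUDGET_MS = 200.0    # hypothetical dispatch budget, for illustration only

# Sequential lookups: total time is the per-lookup latency times the count.
typical_ms = CANDIDATES * P50_MS
pessimistic_ms = CANDIDATES * P99_MS  # as if every lookup hit the p99 latency

print(f"typical:     {typical_ms:.1f} ms ({typical_ms / BUDGET_MS:.0%} of budget)")
print(f"pessimistic: {pessimistic_ms:.1f} ms ({pessimistic_ms / BUDGET_MS:.0%} of budget)")
```

Even the pessimistic case, where every one of the 50 lookups takes as long as the p99 measurement, uses only a small fraction of the assumed budget.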

At these latencies, engineers can run diagnostic queries against production data, such as checking a specific driver’s features, spot-testing predictions, or debugging dispatch decisions, without materially affecting serving performance. The database does not distinguish between a serving request and a diagnostic query because both are key lookups.
