Latency at scale
You’ve seen that feature retrieval is fast with 100 drivers and 3 features. Now you’ll scale both dimensions by adding more drivers and features, then measure what happens.
Generate a larger dataset
Create 1,000 drivers with 12 features matching the realistic feature set from the previous page. You’ll write these to a separate Aerospike set so your tutorial data stays intact.
The sd_* names below are abbreviated bin names (sd for "scaled driver"). They follow the same short-bin-name convention used earlier.
- Run Cell 26 to generate 1,000 scaled driver records with 12 features.
- Run Cell 27 to bulk write the scaled drivers to Aerospike.
Cell 26: Generate 1,000 scaled driver records with 12 features
```python
import random
from pyspark.sql import Row

random.seed(123)  # fixed seed so the dataset is reproducible

scaled_drivers = []
for i in range(1, 1001):
    driver_id = f"scaled_driver_{i:04d}"

    # Ratings trend lower as the decline rate rises, clamped to [3.5, 5.0]
    decline_rate = random.uniform(0.01, 0.25)
    avg_rating = round(5.0 - (decline_rate * 4) + random.uniform(-0.3, 0.3), 2)
    avg_rating = max(3.5, min(5.0, avg_rating))

    scaled_drivers.append({
        "driver_id": driver_id,
        "sd_decl_rate": round(decline_rate, 4),
        "sd_avg_rating": avg_rating,
        "sd_trips_today": random.randint(0, 20),
        "sd_hrs_on_shift": round(random.uniform(0, 12), 1),
        "sd_consec_trips": random.randint(0, 10),
        "sd_dist_pickup": round(random.uniform(0.5, 20.0), 1),
        "sd_rider_rating": round(random.uniform(3.5, 5.0), 1),
        "sd_is_peak": random.choice([0, 1]),
        "sd_surge_mult": round(random.uniform(1.0, 3.0), 1),
        "sd_est_duration": random.randint(5, 60),
        "sd_accept_1h": round(random.uniform(0.5, 1.0), 2),
        "sd_total_trips": random.randint(3, 5000),
    })
```

Each record has 12 feature bins plus the ID. This is a realistic width for a production entity record.
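Before a bulk write, you can check that every generated record has the expected shape. The following sketch assumes records are plain dicts like those built in Cell 26. `EXPECTED_FEATURES` and `check_record_shape()` are hypothetical names introduced for illustration and are not part of the tutorial notebook.

```python
# Hypothetical sanity check (not part of the tutorial notebook): confirm that a
# generated record carries the ID plus exactly the 12 expected feature bins.
EXPECTED_FEATURES = [
    "sd_decl_rate", "sd_avg_rating", "sd_trips_today", "sd_hrs_on_shift",
    "sd_consec_trips", "sd_dist_pickup", "sd_rider_rating", "sd_is_peak",
    "sd_surge_mult", "sd_est_duration", "sd_accept_1h", "sd_total_trips",
]


def check_record_shape(record: dict) -> list:
    """Return a list of problems found in one driver record (empty if none)."""
    problems = []
    if "driver_id" not in record:
        problems.append("missing driver_id")
    feature_keys = set(record) - {"driver_id"}
    missing = set(EXPECTED_FEATURES) - feature_keys
    extra = feature_keys - set(EXPECTED_FEATURES)
    if missing:
        problems.append(f"missing features: {sorted(missing)}")
    if extra:
        problems.append(f"unexpected bins: {sorted(extra)}")
    # Cell 26 clamps the average rating to [3.5, 5.0]
    rating = record.get("sd_avg_rating")
    if rating is not None and not 3.5 <= rating <= 5.0:
        problems.append(f"sd_avg_rating out of range: {rating}")
    return problems
```

In the notebook, one way to use it is `assert all(not check_record_shape(d) for d in scaled_drivers)` after running Cell 26.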
Use Spark for the bulk write because of its strength in batch ingestion.
Cell 27: Bulk write scaled drivers to Aerospike
```python
import time

# Convert the list of dicts into a Spark DataFrame (one Row per driver)
scaled_df = spark.createDataFrame([Row(**d) for d in scaled_drivers])

start = time.perf_counter()
scaled_df.write \
    .mode("overwrite") \
    .format("aerospike") \
    .option("aerospike.write-set", "scaled-driver-features") \
    .option("aerospike.write-with-key", "driver_id") \
    .save()
elapsed = time.perf_counter() - start

print(f"Wrote {len(scaled_drivers)} drivers with 12 features each")
print(f"Bulk write time: {elapsed:.1f} seconds")
```

```
Wrote 1000 drivers with 12 features each
Bulk write time: 3.2 seconds
```

This shows why the tutorial uses Spark for bulk operations: writing all 1,000 records takes a single job and only a few seconds.
Benchmark feature retrieval
Now measure key-based Aerospike reads against the larger dataset. You'll run 100 retrievals, each fetching all 12 features for one randomly sampled driver, and record how long each retrieval takes.
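Cell 28 reads percentiles directly from the sorted list of 100 timings: index 49 for p50, 94 for p95, and 98 for p99. That is the nearest-rank method for exactly 100 samples. If you change the sample count, a small helper keeps the indexing correct. `nearest_rank_percentile()` is a hypothetical helper and is not part of the tutorial code.

```python
import math


def nearest_rank_percentile(sorted_values, p):
    """Nearest-rank percentile of an ascending list, for 0 < p <= 100.

    The rank is ceil(p / 100 * n); subtract 1 to get a 0-based list index.
    """
    if not sorted_values:
        raise ValueError("need at least one value")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    rank = math.ceil(p * len(sorted_values) / 100)
    return sorted_values[rank - 1]
```

With only 100 samples, p99 is set by a single observation (the 99th-smallest), so it is noisier than p50. More samples give steadier tail estimates.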
- Run Cell 28 to benchmark 100 lookups against the scaled data.
Cell 28: Benchmark 100 lookups against scaled data
```python
import random
import time

all_features = [
    "sd_decl_rate", "sd_avg_rating", "sd_trips_today", "sd_hrs_on_shift",
    "sd_consec_trips", "sd_dist_pickup", "sd_rider_rating", "sd_is_peak",
    "sd_surge_mult", "sd_est_duration", "sd_accept_1h", "sd_total_trips",
]

# Sample 100 distinct drivers from the 1,000 written in Cell 27
random.seed(456)
sample_ids = [f"scaled_driver_{i:04d}" for i in random.sample(range(1, 1001), 100)]

timings = []
for driver_id in sample_ids:
    key = ('test', 'scaled-driver-features', driver_id)
    start = time.perf_counter()
    (_, _, bins) = as_client.select(key, all_features)  # fetch all 12 bins
    elapsed = (time.perf_counter() - start) * 1000  # milliseconds
    timings.append(elapsed)

# Nearest-rank percentiles: with 100 sorted samples, p50 is index 49, and so on
timings.sort()
print("Feature retrieval benchmark: 100 lookups from 1,000-driver dataset (12 features each)")
print(f" p50: {timings[49]:.2f} ms")
print(f" p95: {timings[94]:.2f} ms")
print(f" p99: {timings[98]:.2f} ms")
```

```
Feature retrieval benchmark: 100 lookups from 1,000-driver dataset (12 features each)
 p50: 0.23 ms
 p95: 0.39 ms
 p99: 0.48 ms
```

Compare: tutorial scale vs production scale
| | Tutorial dataset | Scaled dataset |
|---|---|---|
| Drivers | 100 | 1,000 |
| Features per driver | 3 | 12 |
| p50 retrieval | ~0.22 ms | ~0.23 ms |
| p95 retrieval | ~0.38 ms | ~0.39 ms |
The numbers are effectively identical. Ten times the drivers and four times the features did not measurably change retrieval time.
Why this works
Aerospike stores each entity as a single record, identified by its primary key. Each lookup here is an effectively constant-time, hash-based read. The client hashes the key to determine its partition and sends the request to the node that owns that partition. That node then finds the record through its primary index and returns the requested bins. Locating the record does not depend on how many other records exist in the set. Reading a wider record adds little work because its bins are stored together. (get_feature_vector() uses this same key-based read path.)
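The partition step keeps lookup cost independent of data volume, because the target partition is a pure function of the key. The sketch below illustrates the idea only. Aerospike clients compute a RIPEMD-160 digest from the set name and user key, then use 12 bits of that digest to pick one of 4,096 partitions. This sketch makes several substitutions:

- It uses SHA-1 from the Python standard library, because RIPEMD-160 is not always available in `hashlib`.
- It uses a simplified key layout.
- It uses a made-up three-node ownership rule.

As a result, its partition numbers will not match a real cluster. `illustrative_partition_id()`, `owning_node()`, and `NODES` are hypothetical.

```python
import hashlib

N_PARTITIONS = 4096  # Aerospike divides each namespace into 4,096 partitions


def illustrative_partition_id(set_name: str, user_key: str) -> int:
    """Map a key to a partition ID.

    ILLUSTRATION ONLY: real Aerospike clients use a RIPEMD-160 digest with a
    specific byte layout; SHA-1 is a stand-in here, so IDs will not match.
    """
    digest = hashlib.sha1(f"{set_name}:{user_key}".encode()).digest()
    # Take the low 12 bits of the first two digest bytes (read little-endian)
    return int.from_bytes(digest[:2], "little") & (N_PARTITIONS - 1)


# Hypothetical three-node cluster; real clusters publish a partition map that
# assigns each partition a master node.
NODES = ["node-a", "node-b", "node-c"]


def owning_node(partition_id: int) -> str:
    # Made-up round-robin ownership, for illustration only
    return NODES[partition_id % len(NODES)]


pid = illustrative_partition_id("scaled-driver-features", "scaled_driver_0042")
print(pid, owning_node(pid))
```

Nothing in this computation looks at how many records the set holds. Going from 100 to 1,000 drivers therefore leaves the routing cost unchanged.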
Because the read path is the same for every key, this means:
- Adding drivers doesn’t slow down lookups. Whether you have 1,000 drivers or 100,000, the lookup path is the same.
- Adding features doesn't slow down lookups. Extra bins per record have negligible impact because all of a record's bins are stored together and fetched in a single read.
- Scoring 50 candidate drivers with sequential lookups costs about 12 ms in total at p50 (50 × 0.23 ms ≈ 11.5 ms). That is well within a dispatch latency budget of a few hundred milliseconds, leaving room for model inference and network overhead.
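The 12 ms figure is simple arithmetic on the measured p50, assuming the 50 candidate lookups run one after another. The sketch below makes that assumption explicit. The 300 ms budget is a hypothetical stand-in for "a few hundred milliseconds", and `sequential_lookup_cost_ms()` is a hypothetical helper.

```python
def sequential_lookup_cost_ms(n_candidates: int, per_lookup_ms: float) -> float:
    """Estimated total time for n key lookups issued one after another."""
    return n_candidates * per_lookup_ms


P50_LOOKUP_MS = 0.23      # p50 from the Cell 28 benchmark output
DISPATCH_BUDGET_MS = 300  # hypothetical budget ("a few hundred milliseconds")

feature_cost = sequential_lookup_cost_ms(50, P50_LOOKUP_MS)
remaining = DISPATCH_BUDGET_MS - feature_cost
print(f"Feature retrieval: {feature_cost:.1f} ms; "
      f"left for inference and network: {remaining:.1f} ms")
```

This treats every lookup as taking the p50 time. The result is a rough estimate of the total, not a percentile of it.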
At these latencies, engineers can run diagnostic queries against production data, such as checking a specific driver’s features, spot-testing predictions, or debugging dispatch decisions, without materially affecting serving performance. The database does not distinguish between a serving request and a diagnostic query because both are key lookups.