Customer case study
About Bigabid
Bigabid is a performance DSP that uses proprietary deep learning models to make automated media buying decisions in real time across billions of daily bid requests. The platform runs a complex ML architecture that requires feature freshness measured in minutes, not hours. To support real-time bidding and model inference, systems must operate within strict latency constraints, with P99 latency targets in the tens of milliseconds.
Challenge
As Bigabid’s ML models became more sophisticated and traffic scaled, the team identified that its real-time aggregation layer was becoming a bottleneck for model accuracy and operational resilience. The system needed to support fresher features, higher scale, and more variable workloads while maintaining strict latency and cost constraints in a cloud environment.
At the same time, a corporate-managed external data store used in production was scheduled to be discontinued, accelerating the timeline to rebuild the system.
A hard deadline from a forced sunset
A critical external production store was scheduled to shut down, forcing Bigabid to rebuild a core data capability under a fixed timeline. The replacement system had to be production-ready from day one, leaving little margin for solutions that introduced operational uncertainty or unpredictable performance.
Long-horizon feature windows for model accuracy
Bigabid’s models rely on user-level behavioral features aggregated over rolling windows of varying lengths to optimize delayed KPIs such as Day-7 and Day-30 ROI. Some of these windows extend up to two years and span roughly 10 TB of data, requiring continuous recomputation at scale. This creates a level of complexity that many DSP architectures avoid.
Highly variable traffic and data behavior
The platform’s workload fluctuates significantly throughout the day, with sharp spikes, quiet periods, skewed keys, hotspots, and late-arriving data after upstream delays or outages. Systems optimized only for steady-state conditions risk instability during peaks.
Tight latency and cost constraints
User-facing decision systems required rapid access to aggregated features within strict response budgets. Every extra millisecond on a feature lookup represents a missed auction, making latency a direct driver of revenue. At the same time, the platform had to operate within a defined cost envelope in AWS, ruling out architectures that relied on large in-memory datasets or heavy overprovisioning.
Solution
Bigabid determined early on that the serving layer required a database capable of predictable high-throughput access under changing load conditions. Pure in-memory systems were evaluated but quickly proved incompatible with the cost model required to maintain two years of state at this scale.
Aerospike was selected as the serving foundation. The team designed the aggregation pipeline around it, combining Spark-based pre-aggregation with Aerospike as the real-time state store. This architecture allowed the system to rely on fast key-value reads rather than repeated scans of the data lake.
Aerospike as the foundation for real-time serving
Aerospike provides the real-time serving layer for aggregated features. Unlike fully in-memory systems, its hybrid memory architecture allows large datasets to reside on SSD while maintaining fast key-based access through memory-resident indexes. This enables Bigabid to sustain millions of reads per second with predictable latency without the prohibitive cost of memory-only databases.
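At serving time, feature retrieval reduces to a single key-value read per user. The sketch below illustrates that access pattern under stated assumptions: a plain dictionary stands in for the Aerospike client, and the user key and bin names are hypothetical, not Bigabid's actual schema.

```python
# Illustrative feature lookup at bid time. A dict stands in for the
# Aerospike serving store; in production this would be one key-value
# read per user. Bin names below are hypothetical examples.
from typing import Optional

# Stand-in for the serving store: user_id -> aggregated feature bins.
feature_store = {
    "user-123": {"clicks_7d": 14, "installs_30d": 2, "spend_730d": 38.5},
}

def get_features(user_id: str) -> Optional[dict]:
    """Fetch pre-aggregated features with a single key-value read.

    Missing users return None so the bidder can apply a cold-start
    default instead of blocking the auction on a recompute.
    """
    return feature_store.get(user_id)

features = get_features("user-123")
```

The point of the pattern is that the read path never touches raw event history; every lookup is one keyed fetch against state that the pipeline keeps current.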
Pre-aggregation using fixed time chunks
Bigabid’s data engineering team designed a custom pre-aggregation pipeline that partitions behavioral data into fixed time chunks across multiple granularities. Rolling windows are assembled from these aggregates instead of scanning raw event history, enabling efficient incremental updates without reprocessing full history.
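The chunked approach can be sketched as follows: each fixed time chunk holds a partial aggregate, and a rolling window is simply the merge of the chunks it covers. The daily granularity, merge function, and day-keyed layout here are illustrative assumptions, not Bigabid's actual pipeline.

```python
# Illustrative pre-aggregation over fixed time chunks. Daily chunks
# (the granularity is an assumption) hold partial sums per user; a
# rolling window is assembled by merging the chunks it covers instead
# of rescanning raw event history.
from collections import defaultdict

# day index -> {user_id: partial aggregate (e.g. click count)}
daily_chunks = {
    0: {"user-123": 3},
    1: {"user-123": 5, "user-456": 2},
    2: {"user-123": 6},
}

def rolling_window(chunks, start_day, end_day):
    """Merge the fixed chunks covering [start_day, end_day] into one aggregate."""
    totals = defaultdict(int)
    for day in range(start_day, end_day + 1):
        for user, value in chunks.get(day, {}).items():
            totals[user] += value
    return dict(totals)

# A 3-day window is the merge of three daily chunks.
window = rolling_window(daily_chunks, 0, 2)
```

Advancing the window by one day then means adding one new chunk and dropping one old one, which is what makes incremental updates cheap relative to reprocessing the full history.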
Coordinated orchestration with clear state tracking
A lightweight orchestration layer tracks window state and data locations across the pipeline. Spark jobs run continuously and deterministically using this control layer, allowing the system to remain auditable, repeatable, and resilient to retries, late-arriving data, and partial failures.
Daily full-history snapshot
To eliminate repeated large-scale scans of historical data, Bigabid generates a daily snapshot containing the full two-year aggregation state and stores it in Aerospike using a blue-green swap process. During minute-level updates, the pipeline retrieves most historical state directly from Aerospike and touches the data lake only for incremental edges.
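A blue-green swap of this kind can be sketched as two alternating stores and an active pointer: the new snapshot is loaded in full on the inactive side, then the pointer flips so readers never observe a partially loaded snapshot. The two-store layout and pointer mechanism below are illustrative assumptions; in production the stores would be Aerospike sets or namespaces.

```python
# Illustrative blue-green snapshot swap. Two stores alternate roles:
# the daily snapshot is bulk-loaded into the inactive one, then the
# active pointer flips so reads always hit a complete snapshot.
# Plain dicts stand in for the Aerospike sets.
class BlueGreenStore:
    def __init__(self):
        self.stores = {"blue": {}, "green": {}}
        self.active = "blue"  # readers always use the active side

    def read(self, key):
        return self.stores[self.active].get(key)

    def load_snapshot(self, snapshot):
        """Load a full snapshot into the inactive side, then swap."""
        inactive = "green" if self.active == "blue" else "blue"
        self.stores[inactive] = dict(snapshot)  # readers unaffected
        self.active = inactive  # flip: the new snapshot goes live

store = BlueGreenStore()
store.load_snapshot({"user-123": {"spend_730d": 38.5}})
```

Because the load happens entirely on the inactive side, a failed or partial load leaves the previous day's snapshot serving untouched, which matches the zero-downtime requirement described above.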
"We don't just use infrastructure; we architect systems purpose-built for ML at scale. Aerospike gave us a real-time data engine that could keep pace with our models' demands at billions of requests per day; our team built the intelligence and orchestration layer on top to turn that into a competitive advantage."

Amit Attias
Co-Founder and CTO, Bigabid
Results
After introducing the daily history snapshot stored in Aerospike, Bigabid reduced end-to-end processing time from just over eight minutes to under four minutes. This exceeded the system’s five-minute freshness requirement while significantly lowering compute overhead and infrastructure cost.
The new architecture also improved model feature freshness from hours to minutes, contributing to an approximately 25 percent improvement in campaign performance for models using the latest feature sets.
Exceeded freshness requirements
The system consistently delivers updated aggregates well within the required freshness window. By retrieving most historical state directly from Aerospike rather than recomputing it from the data lake, the pipeline maintains sub-five-minute end-to-end freshness with headroom.
Lowered infrastructure costs
The architecture reduced the volume of data processed during each aggregation cycle and eliminated large-scale historical scans. This change reduced cloud infrastructure costs by roughly 50 percent while sustaining production throughput requirements.
Improved operational stability
The redesigned pipeline reduced memory pressure, large-scale compute bursts, and heavy scan workloads. As a result, the system is less sensitive to skewed traffic, spikes in demand, and delayed data arrivals. The migration was completed with zero downtime to production ML serving, maintaining bid response SLAs throughout.
Sustained high mixed workloads
In production, Bigabid runs a relatively small Aerospike cluster sustaining millions of operations per second with consistent latency. The system continues to handle mixed read and write workloads reliably under real advertising traffic.