How Dataminr scales real-time AI and generative analysis with Aerospike

About Dataminr
Dataminr is a global AI company that provides real-time alerts on high-impact events, risks, and threats. The platform processes billions of daily signals from over one million public sources across various media types, including text, images, video, audio, and machine-generated sensor data, covering more than 150 languages. Customers include major newsrooms, two-thirds of the Fortune 100, government agencies, first responders, and leading technology companies. Dataminr uses predictive AI to classify, enrich, and correlate signals, and generative AI to create live-updating event briefs that summarize unfolding situations for users.

Scaling real-time event detection during hypergrowth
Dataminr entered a period of hypergrowth that strained its data pipeline. Rising data volume, model complexity, and real-time requirements pushed existing systems beyond their limits, creating latency, cost, and architectural bottlenecks.
Exploding ingest volume and hyperscale growth
Dataminr’s ingest volume increased far faster than its legacy DynamoDB and Redis pipeline could evolve. Sudden spikes in global activity produced sustained traffic surges that exceeded the system’s processing capacity. The previous design struggled to absorb this growth at real-time speeds, causing backpressure in downstream enrichment steps and instability during peak demand.
Bottlenecks in AI enrichment and model orchestration
The enrichment pipeline relies on more than 50 AI models, each requiring rapid reads and writes as they add metadata and pass updated signals forward. The existing data layer struggled to support this high rate of small, sequential operations. As a result, coordination between models slowed down, causing latency spikes and delaying the reduction of raw content, which must be shrunk by more than 95 percent before alert generation.
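The read-enrich-write chain described above can be sketched as follows. This is an illustrative pattern only, not Dataminr's implementation: a plain Python dict stands in for the datastore, and the model names, signal fields, and labels are hypothetical.

```python
# Each model reads a signal's current metadata, adds its labels, and writes
# the record back so the next model can build on it. A dict stands in for
# the datastore; model names and fields are hypothetical.

def detect_language(record):
    return {"language": "en"}

def extract_entities(record):
    return {"entities": ["Aerospike", "Dataminr"]}

MODELS = [detect_language, extract_entities]  # the real pipeline chains 50+ models

def enrich(store, signal_id):
    """Run each model in sequence, persisting metadata between steps."""
    for model in MODELS:
        record = store[signal_id]        # rapid small read
        record.update(model(record))     # model adds its labels
        store[signal_id] = record        # rapid small write for the next model
    return store[signal_id]

store = {"sig-1": {"text": "Fire reported downtown."}}
enriched = enrich(store, "sig-1")
```

Because every model in the chain performs this read-modify-write round trip, per-operation latency in the datastore compounds across the whole pipeline, which is why small sequential operations became the bottleneck.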
Hitting the ceiling of in-memory operations
Dataminr reached the limits of its Redis-based in-memory system as data volume and model complexity surged. To maintain real-time event detection, the platform needed to process massive message volumes and add millions of labels per second at ultra-low latency. These in-memory constraints prevented Dataminr from scaling to larger message payloads, richer metadata, and higher throughput during periods of hypergrowth.
Dependence on earlier cloud data services
Parts of the enrichment and search workflow still relied on earlier cloud-native data services, such as DynamoDB for persistent storage and Redis for caching. This created a two-database pattern that added operational overhead, increased cost, and made it difficult to maintain low latency at scale. These systems were effective in earlier stages of the company but were not designed for the high-throughput, metadata-driven access patterns required as Dataminr expanded globally and added more AI and ML workloads.
Global expansion increased cost pressure
Managing real-time workloads across multiple regions and availability zones (AZs) amplified data movement costs and scaling challenges. Dataminr needed a more cost-efficient and predictable foundation as it expanded into new markets, including EU deployments focused on AI and ML search use cases.
High-throughput real-time pipeline powered by Aerospike
Dataminr adopted Aerospike on AWS as the low-latency, high-throughput datastore supporting its AI enrichment pipeline, ingest, and generative event brief creation. Aerospike replaced earlier databases, improved model orchestration, and allowed Dataminr to scale globally.
Unified datastore for predictive and generative AI workflows
Aerospike now serves as the central data layer powering predictive AI models and larger generative AI systems. Predictive models provide language detection, geodata, entity extraction, OCR, and logo detection. Larger neural networks and LLMs refine events and produce dynamic briefs. Aerospike supports both stages by supplying reliable sub-millisecond access to enrichment metadata and model outputs.
Modernized pipeline replacing DynamoDB and Redis
Dataminr simplified its architecture by replacing earlier cloud data services, including DynamoDB and Redis, with Aerospike. This removed operational friction, improved consistency, and consolidated real-time data access into a single platform. Aerospike delivers the throughput required to process billions of signals per day and millions of label updates per second.
High-volume ingestion of complex, multi-format data
As the content acquisition system normalizes, deduplicates, segments, and transcribes global inputs, Aerospike handles the rapid write rate and persistent scale required for downstream analysis. This supports Dataminr’s goal of enriching data while reducing raw volume by more than 95 percent before alert generation.
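The normalize-deduplicate-filter flow can be illustrated with a minimal sketch. The field names, the normalization rule, and the relevance threshold below are assumptions for illustration, not Dataminr's actual logic.

```python
# Hedged sketch of the volume-reduction step: normalize raw inputs, drop
# duplicates, and keep only signals above a relevance threshold. Fields
# and scoring are illustrative.

def normalize(raw):
    return {"text": raw["text"].strip().lower(), "score": raw.get("score", 0.0)}

def reduce_volume(raw_inputs, threshold=0.9):
    seen, kept = set(), []
    for raw in raw_inputs:
        signal = normalize(raw)
        if signal["text"] in seen:        # drop duplicates
            continue
        seen.add(signal["text"])
        if signal["score"] >= threshold:  # drop low-relevance content
            kept.append(signal)
    return kept

raw = [
    {"text": "Fire reported downtown ", "score": 0.95},
    {"text": "fire reported downtown", "score": 0.95},  # duplicate after normalization
    {"text": "lunch photo", "score": 0.10},             # low relevance
]
kept = reduce_volume(raw)  # only the unique, high-relevance signal survives
```

At production scale this same shape of filtering, applied across billions of daily inputs, is what drives the greater-than-95-percent reduction before alert generation.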
Support for larger message payloads and richer metadata
Aerospike allows Dataminr to increase the scope and size of messages processed during enrichment. The pipeline can now store larger representations, richer labels, and more metadata per signal, enabling broader detection across new modalities and emerging data patterns.
Cost-efficient scaling for global deployments
Aerospike’s architecture minimizes unnecessary data transfer and supports operational features such as rack awareness and flexible cluster sizing. This improves cost efficiency for multi-region workloads and supports Dataminr’s expansion into the EU with consistent performance.
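Rack awareness works by tagging each node with the rack (or availability zone) it lives in, so replicas are spread across zones and clients can prefer local reads. The fragment below is an illustrative Aerospike server namespace stanza; the namespace name and values are assumptions, not Dataminr's configuration.

```
# Illustrative stanza from aerospike.conf (names and values assumed)
namespace signals {
    replication-factor 2
    rack-id 1            # identifies this node's rack / availability zone
    storage-engine memory
}
```

On the client side, Aerospike clients can additionally be configured with a prefer-rack read replica policy, so reads are served from the local zone when possible, which is how cross-AZ data transfer is minimized.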
Real-time global alert detection with higher speed and fidelity
Dataminr strengthened its real-time detection platform and improved alert speed, delivering some alerts more than an hour before major news outlets while expanding its ability to analyze new data types. Aerospike enables faster enrichment, higher throughput, and more advanced predictive and generative AI use cases.
Ultra-low latency pipeline supports faster customer alerts
Aerospike powers the enrichment path that must operate within strict real-time windows. Faster processing helps Dataminr deliver alerts significantly ahead of traditional media, with documented cases of warnings delivered more than an hour before major outlets.
Scales to billions of daily inputs with stability
Dataminr now handles billions of signals per day across text, image, video, audio, and sensor modalities. Aerospike ensures stable, predictable performance even during extreme surges in global activity.
Reduction of raw content by more than 95 percent
Dataminr reduces raw input volume by more than 95 percent before generating alerts. This aggressive reduction pipeline, supported by Aerospike’s high throughput, ensures only the most relevant signals reach downstream predictive and generative AI models.
Millions of labels per second processed reliably
Aerospike’s low-latency writes allow Dataminr to add millions of labels per second, supporting metadata-driven detection, improved filtering, and high-quality model inputs. This capability is essential to preserving Dataminr’s identity as a real-time AI platform.
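One common pattern for sustaining label-write rates of this magnitude is to coalesce the updates destined for the same signal so each signal needs a single write per flush interval. The sketch below shows that pattern in isolation; the event shape and names are illustrative, and this is not necessarily how Dataminr batches its writes.

```python
# Coalesce per-signal label updates so three label events become two writes.
# Event shape and names are illustrative.
from collections import defaultdict

def coalesce(label_events):
    """label_events: iterable of (signal_id, label, value) tuples."""
    merged = defaultdict(dict)
    for signal_id, label, value in label_events:
        merged[signal_id][label] = value  # later events for the same label win
    return dict(merged)

events = [
    ("sig-1", "language", "en"),
    ("sig-1", "topic", "wildfire"),
    ("sig-2", "language", "es"),
]
batched = coalesce(events)  # two record writes instead of three
```

Coalescing trades a small buffering delay for far fewer round trips to the datastore, which matters when the write rate is measured in millions of labels per second.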
Higher quality insights and broader detection coverage
With rapid back-and-forth between models, Dataminr enhances its detection quality. Predictive AI enriches content, while Dataminr's generative system, ReGenAI, uses large language models to create continuously updating event briefs. Aerospike also enables Dataminr to process larger and more complex messages, expanding the range of data types and events the platform can detect.
Improved operational and cloud cost efficiency
By consolidating earlier databases into Aerospike and reducing multi-AZ data movement, Dataminr improved cost control while scaling globally. The architecture now supports rapid growth without proportional increases in infrastructure cost.
Get started with Aerospike
Aerospike provides high-performance, scalable data management with ultra-low latency, ideal for handling massive datasets and real-time applications.