How Myntra built a GenAI shopping assistant that remembers, recommends, and responds in milliseconds
Learn how Myntra built Maya, a real-time GenAI shopping assistant using Aerospike for sub-millisecond feature lookups, conversation persistence, and scale.
A shopper browsing the Myntra app pauses on a pair of insulated boots. They type a question to Maya, Myntra's GenAI shopping assistant: "I'm heading to Shimla next week, what should I buy?"
Maya responds almost instantly with tailored suggestions: layered jackets, thermal wear, and boots suited to mountain weather. The shopper can ask follow-up questions, refine their preferences, add items to their cart, and check out. If they return a week later, they can continue right where they left off.
Experiences like this are happening on a massive scale. Myntra, one of India's biggest fashion e-commerce platforms, serves half a million users at peak times and processes over 20,000 orders each minute during sales events.
Powering this experience requires infrastructure that can checkpoint conversations, fetch real-time personalization signals, and deliver responses quickly. This is the story of how Myntra built a real-time GenAI architecture with Aerospike as the backbone.
Why conversational commerce demands a different infrastructure
A typical naive LLM integration is stateless: it sends a prompt, gets a response, and forgets. Conversations with Maya, on the other hand, last through several back-and-forth exchanges, visits, and sometimes even days. As Narayana Pattipati, Principal Architect for Data and ML Platform at Myntra, explained: "We want to persist the session so that when users come back, they don't need to start from the beginning again."
The challenge is that in-memory caching layers are fast but volatile; a restart could wipe everything. Traditional disk-based databases, meanwhile, are durable but too slow for conversational latency. Each option solves only half the problem.
What Maya needed was a data store that could keep its state through restarts, failures, and infrastructure changes, while also allowing retrieval in less than a millisecond when the user returns.
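Myntra hasn't published Maya's schema, but the checkpoint/restore pattern it describes is straightforward. Below is a minimal in-memory sketch; in production the dictionary would be a durable key-value store such as an Aerospike set keyed by user ID, with the TTL enforced server-side. All names here are illustrative assumptions.

```python
import json
import time

class ConversationStore:
    """In-memory stand-in for a durable key-value store.
    Illustrates the checkpoint/restore pattern only."""

    def __init__(self, ttl_seconds=7 * 24 * 3600):
        self._records = {}   # user_id -> (expiry timestamp, serialized turns)
        self._ttl = ttl_seconds

    def checkpoint(self, user_id, turns):
        # Persist the full conversation after every exchange, with a TTL
        # so abandoned sessions eventually expire.
        self._records[user_id] = (time.time() + self._ttl, json.dumps(turns))

    def restore(self, user_id):
        # On the user's return, fetch prior turns so the assistant can
        # resume the session instead of starting from scratch.
        entry = self._records.get(user_id)
        if entry is None or entry[0] < time.time():
            return []        # no session, or it expired
        return json.loads(entry[1])

store = ConversationStore()
store.checkpoint("user-42", [
    {"role": "user", "content": "I'm heading to Shimla next week, what should I buy?"},
    {"role": "assistant", "content": "Layered jackets, thermal wear, and insulated boots."},
])
resumed = store.restore("user-42")  # days later: context is still there
```

The key property is that the checkpoint survives restarts of the assistant itself; only the store's durability and read latency determine whether "pick up where you left off" feels seamless.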
But persistence alone isn't enough. Maya also needs fresh data. Myntra processes millions of clickstream events each minute across app and web, and for recommendations to match a shopper's current intent, personalization features must reflect behavior from seconds ago rather than hours-old batch aggregates.
How Aerospike powers Maya and Myntra's real-time ML stack
After evaluating multiple databases, Myntra landed on Aerospike, which could handle both requirements: durable storage and sub-millisecond reads.
"We use Aerospike in multiple places in our entire data and AI stack," Pattipati said. "One is stream processing, as a state store where the lookups have to happen very fast. And then, as a real-time feature store, and also as a cache. And for Maya, we're also using Aerospike to checkpoint the entire conversation."
A unified online feature store for real-time personalization
Before consolidating on Aerospike, Myntra ran separate systems for feature storage, state management, and caching. Each had different latency profiles and failure modes, which meant fetching features for a single inference request required coordinating across multiple systems.
The team unified these functions into a single Aerospike-backed cluster. Spark Streaming pipelines now write behavior-driven features directly into Aerospike using Aerospike Connect for Spark, while inference services fetch features like user embeddings, session context, product vectors, and trend signals on demand.
Single-key lookups complete in under five milliseconds at P99, and batch fetches stay in the single-digit millisecond range. This leaves enough headroom for model inference within Myntra's 50-millisecond recommendation latency budget.
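The fetch side of this pattern can be sketched as a single batched call with a deadline derived from the overall recommendation budget. The feature names and groups below are hypothetical (Myntra's actual feature layout is not public), and the dictionary stands in for the feature store:

```python
import time

# Hypothetical feature groups one inference request might need.
FEATURE_STORE = {
    ("user", "u1"): {"embedding": [0.12, -0.40, 0.70]},
    ("session", "s9"): {"recent_categories": ["boots", "jackets"]},
    ("product", "p3"): {"vector": [0.5, 0.1, -0.2], "trend_score": 0.83},
}

def fetch_features(keys, budget_ms=50.0):
    """Batch-fetch pattern: gather every feature group for one request,
    aborting if the whole-request latency budget is exhausted."""
    deadline = time.monotonic() + budget_ms / 1000.0
    out = {}
    for kind, key in keys:
        if time.monotonic() > deadline:
            raise TimeoutError("feature fetch exceeded latency budget")
        out[(kind, key)] = FEATURE_STORE.get((kind, key))
    return out

features = fetch_features([("user", "u1"), ("session", "s9"), ("product", "p3")])
```

Batching matters because the budget is shared: every millisecond spent on feature retrieval is a millisecond unavailable to the model itself.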
Stateful stream processing with predictable performance
Processing millions of events per minute requires maintaining live state: recent user behavior, rolling summaries, and short-lived session context consumed by downstream models. Aerospike acts as the real-time state layer behind these streaming pipelines, sustaining concurrent reads and writes without latency spikes.
The team configures consistency per use case. For features requiring correctness guarantees, they use strong consistency mode; for maximum throughput, where eventual consistency is acceptable, they use AP mode. Because Aerospike avoids compactions and garbage-collection pauses, P99 latency remains predictable as data volumes grow and campaigns spike traffic.
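Myntra's configuration isn't public, but this kind of per-workload split maps to namespace-level settings in aerospike.conf. A hypothetical sketch, with placeholder namespace names and devices (strong consistency is an Enterprise Edition feature):

```
# Correctness-critical features: strong consistency mode
namespace ml_features_sc {
    replication-factor 2
    strong-consistency true
    storage-engine device {
        device /dev/nvme0n1
    }
}

# High-throughput state where eventual consistency is acceptable: AP mode
namespace ml_state_ap {
    replication-factor 2
    storage-engine device {
        device /dev/nvme1n1
    }
}
```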
Prediction caching for complex models
Some of Myntra’s models dynamically optimize homepage layouts by continuously testing and learning which widgets perform best, a machine-learning approach known as a multi-armed bandit. These models can take over 200 milliseconds to run, so serving them synchronously would break the user experience.
To solve this, predictions are stored in an Aerospike cache. Requests are served from the cache while asynchronous inference computes the next recommendation in the background. This lets Myntra run complex models in production without slowing down the user experience.
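The serve-from-cache, refresh-in-background pattern can be sketched as follows. This is an illustrative simulation, not Myntra's implementation: the refresh runs synchronously here for clarity, where a real system would use a worker pool, and the bandit model is a stub.

```python
import time

def slow_bandit_inference(user_id):
    """Stand-in for a multi-armed-bandit layout model that is too slow
    (~200 ms) to run on the request path. Sleep shortened for the example."""
    time.sleep(0.01)
    return {"widgets": ["trending", "deals", "for_you"]}

class PredictionCache:
    def __init__(self):
        self._cache = {}
        self._pending = []   # refresh queue; a real system uses async workers

    def serve(self, user_id, default):
        # Always answer from cache so the request path never waits on the
        # model; schedule a refresh so the *next* request sees fresher output.
        self._pending.append(user_id)
        return self._cache.get(user_id, default)

    def run_background_refresh(self):
        while self._pending:
            uid = self._pending.pop()
            self._cache[uid] = slow_bandit_inference(uid)

cache = PredictionCache()
fallback = {"widgets": ["default_layout"]}
first = cache.serve("u1", default=fallback)    # cache miss: fallback layout
cache.run_background_refresh()                 # asynchronous in production
second = cache.serve("u1", default=fallback)   # now served from the cache
```

The trade-off is staleness: each user sees the layout computed after their previous request, which is acceptable for widget ranking but wouldn't be for, say, inventory checks.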
Why these capabilities matter together
A single request to Maya can touch multiple real-time data paths. It restores conversation context so shoppers can pick up where they left off, fetches the latest user, session, and product signals for personalization, checks for precomputed predictions, and returns a response within a tight latency budget.
Using separate systems for each function would slow down coordination and increase operational complexity. By using Aerospike, Myntra achieved a single, reliable data layer that keeps consistent latency even under load.
From Community to Enterprise Edition: How Myntra deployed and scaled its architecture
Myntra had been running Aerospike Community Edition for over three years across multiple clusters. The setup worked well initially, but as the team began onboarding more machine learning workloads, they hit the Community Edition limits.
Before committing to Enterprise Edition, the team ran extensive evaluations. Over four to five months, they tested Aerospike against several other databases for their homepage personalization use case.
As Pattipati explained, "When we started onboarding multiple machine learning use cases on other databases, they hit the scale challenges. However, Aerospike scaled pretty well." The results were decisive. At 125,000 operations per second, Aerospike delivered P99 latency of 0.3 milliseconds with consistent, stable performance throughout the test.
With the evaluation complete, the team migrated to Enterprise Edition, consolidating multiple Community clusters into fewer, larger clusters. The migration took less than a month. It also coincided with an Azure region move, which meant relocating the entire Myntra tech stack with minimal disruption.
Aerospike’s Cross Datacenter Replication (XDR) enabled live migration with zero downtime: new data was replicated automatically to the destination cluster while the team restored historical data in parallel. As Nikhit Nair from Myntra's database team described it: "All that the application needs to do is just switch. A click of a button and you're done."
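The mechanics described here correspond to an XDR stanza in aerospike.conf on the source cluster, which continuously ships new writes to the destination. A hypothetical sketch with placeholder datacenter names, addresses, and namespace:

```
xdr {
    dc destination-cluster {
        node-address-port 10.0.1.10 3000
        namespace ml_features {
        }
    }
}
```

With new writes replicating automatically, the only remaining work is backfilling historical data, after which applications can be repointed at the destination.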
Operational confidence at festival scale
Myntra's peak sales events, including the Big Fashion Festival (BFF), were the ultimate stress test: half a million concurrent users and over 20,000 orders per minute. To prepare, the team ran aggressive failure simulations that included time synchronization issues, partial data unavailability, and full-region outages. None of these exercises compromised user service-level objectives.
The architecture scaled cost-effectively as well. Hybrid Memory Architecture keeps indexes in RAM while scaling storage on SSDs, avoiding the expense of pure in-memory deployments at Myntra's data volumes. The current footprint spans 25 Aerospike clusters and over 150 nodes, with the ML platform alone running on eight of them and handling tens of millions of transactions per second.
What's next for Maya and Myntra's GenAI stack
Maya is the flagship, but not the finish line. Myntra is extending GenAI across the customer journey and exploring new infrastructure capabilities to support it:
GenAI across the customer journey: Beyond Maya, Myntra has deployed GenAI for post-purchase support. Chat summarization and sentiment analysis are already live for customer care agents.
Continued platform evolution: Self-serve ML pipelines with one-day deployment cycles, and a move toward a unified data and ML platform architecture.
Outcomes at scale
The consolidation on Aerospike provided Myntra with a foundation that could handle the complexity of GenAI workloads and the needs of India's largest fashion sales events. With this architecture in place, Myntra is well-positioned to lead the next generation of AI-powered retail:
Sub-millisecond feature lookups at millions of requests per minute keep Maya's responses within conversational latency budgets, so the experience feels natural rather than robotic.
Conversation continuity across sessions allows shoppers to return to Maya and pick up where they left off, with context persisting across visits, devices, and days.
A single Aerospike layer now backs the feature store, state store, inference cache, and conversation persistence.
Festival-scale resilience holds up under half a million concurrent users and over 20,000 orders per minute, with stable P99 latency through peak events.