
Inside HDFC Bank’s multi-cloud, active-active architecture built for India’s next 100 million digital users

A deep look at how HDFC Bank built a multi-cloud active-active digital banking architecture to support UPI scale, real-time personalization, and always-on mobile experiences across India.

January 28, 2026 | 8 min read
Steve Tuohy
Director of Product Marketing

In the last decade, India’s digital ecosystem has expanded at a pace few traditional banking architectures were designed to absorb. A major catalyst has been the rise of the Unified Payments Interface (UPI), a system for instant, 24/7 mobile transfers directly between bank accounts.

As UPI adoption accelerated, mobile banking quickly became the default channel. Monthly UPI transaction volumes climbed into the tens of billions as mobile-first financial services expanded rapidly beyond major metropolitan centers into smaller cities and regions. Mobile apps became the primary, and often the only, way millions of customers interacted with their bank.

HDFC Bank, one of India's largest private-sector banks, was especially exposed to this shift. Operating at a national scale, the bank supported tens of millions of active customers each month, with roughly 97% of transactions occurring through digital channels. 

At that level of volume, architectural constraints surfaced directly in the customer experience. In 2019, just weeks after a new CIO stepped into the role, a major data center outage exposed the limits of HDFC Bank’s existing setup. The bank relied on a traditional primary-site architecture with a secondary site synchronized through log shipping, a model well suited to recovery but not to multiple locations actively serving live user traffic at the same time.

Competing in an always-on, real-time banking environment would require a different foundation. Rather than buying a packaged solution, HDFC Bank chose to design and build its next-generation digital core in-house, ensuring it could support continuous availability, geographic scale, and future personalization from the ground up.

How legacy banking architectures fall short in a digital-first era

Traditional banks now operate in a space shaped as much by FinTechs (e.g., PhonePe and Google Pay) as by other traditional financial institutions. Customers expect the same instant responses, continuous availability, and intuitive experiences whether they’re using a banking app or a payments app.

Unfortunately, the underlying architectures of many large banks were built for an earlier operating model. Classic primary-site-plus-disaster-recovery designs prioritized recoverability after outages but were never designed for continuous, real-time traffic.

Several architectural constraints became increasingly difficult to work around:

Asynchronous data propagation: A log-shipping approach records changes and replays them after the fact. While effective for backup and recovery, it was never designed to keep multiple sites concurrently active; some degree of data lag is unavoidable (see the sketch after this list).

Recovery time measured in minutes: In recovery-first architectures, recovery time objectives (RTOs) are typically measured in minutes or longer. For always-on mobile banking, even brief interruptions can wreck the customer experience, particularly when services are expected to remain continuously available.

Manual and operationally complex failover: Failover required coordinated decision-making and controlled cutover steps rather than being inherent to the system. Under peak load, this operational overhead slowed incident response and increased risk.

Latency tied to physical geography: Centralized primary-site designs and DNS-based routing were not built for mobile sessions. Users farther from the active site experienced higher end-to-end latency and noticeably slower app performance, especially as adoption spread nationwide.

Centralized data models under bursty, high-concurrency access patterns: Serving mobile-scale concurrency and real-time personalization from a centralized relational core strained the system's design. Fine-grained access patterns, such as tracking per-user state during login or absorbing bursty write activity from aggregators, created unpredictable load that traditional systems struggled to handle efficiently.
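
To make the first of these constraints concrete, here is a toy model of log shipping in Python (an illustration only, not HDFC Bank's actual pipeline): the primary appends committed changes to a log, and the secondary replays them only when the log ships, so reads served by the secondary can lag behind.

```python
import time

# Toy model of log shipping: the primary appends committed changes to a
# log, and the secondary replays them only when the log "ships". Reads
# served by the secondary can therefore lag behind -- the core reason a
# recovery-oriented replica cannot simply start serving live traffic.

SHIP_DELAY_SECONDS = 0.5  # hypothetical shipping interval

primary, secondary, change_log = {}, {}, []

def commit(key, value):
    """Apply a write on the primary and record it for later replay."""
    primary[key] = value
    change_log.append((key, value))

def ship_and_replay():
    """Deliver the accumulated log to the secondary and apply it."""
    while change_log:
        k, v = change_log.pop(0)
        secondary[k] = v

commit("balance:alice", 100)
print("secondary sees:", secondary.get("balance:alice"))  # None -> stale read
time.sleep(SHIP_DELAY_SECONDS)
ship_and_replay()
print("secondary sees:", secondary.get("balance:alice"))  # 100 -> caught up
```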


A fully active-active, multi-cloud mobile banking platform

HDFC needed a fundamentally different platform, not incremental optimization, with scale as the starting point. The mobile banking platform had to serve tens of millions of existing users from day one, without relying on any single active site. Every region needed to be able to handle live traffic, so failures at the data center, availability zone, or even cloud-provider level would be largely invisible to customers.

That availability requirement extended directly into the user experience. Each login could trigger real-time personalization: individualized flows, contextual actions, and dynamic content tailored to the customer in that moment. Delivering those experiences meant the platform needed to support fast, consistent reads and writes at massive scale, all within tight latency bounds. 
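
As an illustration of what that login-time read path can look like, here is a minimal sketch using the Aerospike Python client. The hostnames, namespace, set, and bin names are hypothetical placeholders, not HDFC Bank's actual schema; the 20-millisecond budget mirrors the latency bound described later in this post.

```python
import aerospike
from aerospike import exception as ex

# Hypothetical seeds, namespace, set, and bin names -- not HDFC Bank's schema.
config = {
    'hosts': [('aerospike-seed.example.internal', 3000)],
    'policies': {'read': {'total_timeout': 20}},  # read budget in milliseconds
}
client = aerospike.client(config).connect()

def personalization_for(user_id):
    """Fetch the per-user record that drives the first screen at login."""
    key = ('banking', 'personalization', user_id)
    try:
        _, _, bins = client.get(key)  # one low-latency key-value read
        return bins
    except ex.RecordNotFound:
        return {'layout': 'default'}  # fall back to a generic first screen

print(personalization_for('user-42'))
client.close()
```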

Payments introduced a similar set of expectations. From the user’s perspective, there is just one simple payment action. Behind the scenes, however, the system needed to dynamically select the appropriate clearing and settlement path without adding latency or operational complexity. Complexity had to be absorbed by the platform, not exposed to the customer.
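
To illustrate the idea (and only the idea), a simplified sketch of rail selection might look like the following. The rail limits and health flags are hypothetical stand-ins, not HDFC Bank's clearing logic.

```python
# Hypothetical rails, limits, and health flags -- an illustration of the
# routing idea only, not HDFC Bank's clearing logic.
RAILS = [
    {'name': 'UPI',  'max_amount': 100_000, 'healthy': True},
    {'name': 'IMPS', 'max_amount': 500_000, 'healthy': True},
    {'name': 'NEFT', 'max_amount': None,    'healthy': True},
]

def select_rail(amount):
    """Return the first healthy rail whose limit accommodates the amount."""
    for rail in RAILS:
        within_limit = rail['max_amount'] is None or amount <= rail['max_amount']
        if rail['healthy'] and within_limit:
            return rail['name']
    raise RuntimeError('no clearing path available')

print(select_rail(50_000))   # -> UPI
print(select_rail(250_000))  # -> IMPS
```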

Security requirements were equally demanding. Rather than relying on repeated one-time passwords, transactions would be protected through device binding, SIM verification, behavioral signals, and runtime application protections that could operate continuously and at scale.

How Aerospike uniquely enables HDFC

To meet these requirements, HDFC Bank turned to Aerospike. The result is a distributed data layer designed to remain fast, available, and manageable even as traffic volumes fluctuate and user activity spikes.

At the core of the new infrastructure is a multi-region design capable of delivering consistently low end-to-end latency. Distributing data close to where users are located means the platform can maintain response times under 20 milliseconds across regions, ensuring fast, predictable performance regardless of geography or load.
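
One way to express that locality with the Aerospike Python client is rack-aware reads, sketched below with hypothetical hosts and rack IDs: with rack awareness enabled, the client can prefer replicas in its own rack (for example, the nearest zone), keeping the read round trip short.

```python
import aerospike

# Hypothetical hosts and rack IDs. With rack awareness enabled, the client
# can prefer replicas in its own rack (e.g., the nearest zone or region),
# keeping the read round trip short.
config = {
    'hosts': [('as-mumbai.example.internal', 3000)],
    'rack_aware': True,
    'rack_id': 1,  # the rack/zone this application instance runs in
    'policies': {
        'read': {'replica': aerospike.POLICY_REPLICA_PREFER_RACK},
    },
}
client = aerospike.client(config).connect()

# Reads now favor the local rack's replica rather than crossing regions.
_, _, bins = client.get(('banking', 'personalization', 'user-42'))
client.close()
```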

Availability isn’t left to automation alone. The system is designed to give the bank control over when and how failover occurs, an important requirement in regulated banking environments where automatic failover can introduce its own risks. Aerospike’s flexible failover configuration makes it possible to balance automation with human oversight, preserving both resilience and control.

That approach to availability extends to a multi-cloud deployment strategy. Aerospike nodes are distributed across AWS, Google Cloud, and on-prem data centers, spanning multiple regions and availability zones. This creates multiple layers of protection. For example, localized failures can be absorbed within the same environment, while larger outages (e.g., the loss of an entire region or cloud provider) can be handled by shifting traffic elsewhere while services remain online.
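
A heavily simplified sketch of that traffic-shift idea follows, with hypothetical hostnames and none of the real operational safeguards: each environment is reachable through its own seed list, and the application reconnects to whichever cluster operators designate.

```python
import aerospike

# Hypothetical seed lists for each environment; real deployments would add
# health checks, replication guardrails, and operator approval steps.
CLUSTERS = {
    'aws-mumbai': [('as-aws.example.internal', 3000)],
    'gcp-delhi':  [('as-gcp.example.internal', 3000)],
    'onprem-dc1': [('as-dc1.example.internal', 3000)],
}

def connect_to(cluster_name):
    """Connect to whichever environment operators have designated."""
    return aerospike.client({'hosts': CLUSTERS[cluster_name]}).connect()

# Normal operation: serve from the nearest environment.
client = connect_to('aws-mumbai')

# During a regional or provider outage, operators flip the designation and
# the application reconnects elsewhere while data stays replicated.
client.close()
client = connect_to('gcp-delhi')
```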

Because Aerospike serves as a single, high-performance data layer, the architecture also avoids the need for separate caching systems, which often compensate for database latency. Real-time reads and writes are handled directly by Aerospike, delivering near in-memory performance without adding architectural complexity. The result is a simpler system that maintains tight service-level objectives while remaining easier to operate and reason about at scale.

How the architecture was deployed and proven at scale

Designing a new architecture was only the first step. Deploying it within a live, nationally scaled banking platform required extreme care, as downtime was not an option and existing systems were already supporting millions of active users.

To migrate without disruption, HDFC Bank moved 20 to 25 million user records from its relational databases to Aerospike using a one-time change data capture (CDC) pipeline. Both systems ran in parallel during the migration, with updates continuously synchronized from the legacy RDBMS to Aerospike.
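
A condensed sketch of that replay pattern might look like the following, with a hypothetical event format and an in-memory stand-in for the CDC feed; a production pipeline would consume from a real change-capture source.

```python
import aerospike
from aerospike import exception as ex

# Hypothetical event format and an in-memory stand-in for the CDC feed;
# a production pipeline would consume from a real change-capture source.
client = aerospike.client(
    {'hosts': [('aerospike-seed.example.internal', 3000)]}
).connect()

change_feed = [
    {'op': 'upsert', 'user_id': 'user-42',
     'bins': {'name': 'A. Kumar', 'segment': 'retail'}},
    {'op': 'delete', 'user_id': 'user-99', 'bins': None},
]

for event in change_feed:
    key = ('banking', 'users', event['user_id'])
    if event['op'] == 'upsert':
        client.put(key, event['bins'])  # put is idempotent, so replay is safe
    elif event['op'] == 'delete':
        try:
            client.remove(key)
        except ex.RecordNotFound:
            pass  # already absent; replay stays idempotent

client.close()
```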

That constraint also shaped how workloads were introduced. Rather than switching over all functionality at once, the team transitioned critical paths in stages, moving personalization logic and app-startup flows gradually to Aerospike and validating behavior at each step before expanding usage further.

Once in production, the team tested the architecture by running aggressive failure simulations (e.g., clock skew, partition loss, and full regional outages) to observe system behavior under stress. The platform’s stability gave the team confidence to run repeated failure tests, including rebalancing, traffic shifts, and recovery operations, without breaching SLAs. 
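
The measurement side of such testing can be sketched simply: time a stream of reads while faults are injected out-of-band, then compare the observed tail latency with the budget. The host and thresholds below are hypothetical; real chaos tooling would drive the fault injection itself.

```python
import time
import aerospike
from aerospike import exception as ex

# Hypothetical host and budget; fault injection (node kills, partitions,
# clock skew) would be driven by separate chaos tooling while this runs.
client = aerospike.client(
    {'hosts': [('aerospike-seed.example.internal', 3000)]}
).connect()
key = ('banking', 'personalization', 'user-42')

samples = []
for _ in range(1000):
    start = time.perf_counter()
    try:
        client.get(key)
    except ex.AerospikeError:
        pass  # a real harness would count errors against availability SLOs
    samples.append((time.perf_counter() - start) * 1000)  # milliseconds

samples.sort()
p99 = samples[int(len(samples) * 0.99)]
print(f'p99 read latency: {p99:.2f} ms (budget: 20 ms)')
client.close()
```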

By the time the architecture was fully deployed, it had been proven through sustained production use and deliberate failure testing at scale.

From mobile banking to broader transformation

HDFC’s new banking architecture extends well beyond a single application. By delivering active-active operation, predictable performance, and controlled resiliency in production, the bank established a model that can be applied to additional real-time workloads across the organization.

That model enables deeper real-time personalization driven by live behavioral signals as more customer interactions shift toward in-session decisioning. Experiences increasingly depend on data that is fresh, consistent, and available wherever the user is, and those requirements now apply to a growing set of customer-facing systems beyond mobile.

The same architecture also supports integrating machine learning directly into live user flows. With Aerospike acting as a real-time data layer and feature store, models can inform decisions at the moment of interaction without adding latency or operational complexity.
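
As a sketch of that pattern, here is a minimal example assuming a hypothetical 'features' set, bin names, and stand-in scoring weights; a real deployment would pass these live features to a served model rather than the toy linear function shown here.

```python
import aerospike
from aerospike import exception as ex

# Hypothetical 'features' set, bin names, and stand-in scoring weights;
# a real deployment would call a served model with these inputs.
client = aerospike.client(
    {'hosts': [('aerospike-seed.example.internal', 3000)]}
).connect()

def score_interaction(user_id):
    """Fetch live per-user features and apply a stand-in linear model."""
    try:
        _, _, features = client.get(('banking', 'features', user_id))
    except ex.RecordNotFound:
        features = {}
    return (0.7 * features.get('txn_velocity', 0.0)
            + 0.3 * features.get('session_depth', 0.0))

print(score_interaction('user-42'))
client.close()
```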

Importantly, this platform was built in-house rather than delivered as a packaged solution. Operating more like an internal startup, HDFC Bank’s engineering teams were able to move quickly, test aggressively, and evolve the architecture alongside real production demands, a significant advantage as digital banking continues to accelerate.

Outcomes at scale

HDFC’s results speak for themselves:

  • 125,000-plus concurrent end-user sessions sustained in load tests with chaos scenarios

  • Sub-20-millisecond latency across regions for faster logins and consistent performance across geographies

  • 4 active clusters across AWS, GCP, and on-prem

  • 20 to 25 million monthly active users supported with real-time reads, making per-user, first-screen personalization possible at a national scale

  • Continuous-service migration of 20 to 25 million user records from Oracle

HDFC’s architecture shows that large, complex banks can lead in digital infrastructure, not just follow. This is a blueprint for meeting fintech-grade expectations with banking-grade guarantees.

Try Aerospike Cloud

Break through barriers with the lightning-fast, scalable, yet affordable Aerospike distributed NoSQL database. With this fully managed DBaaS, you can go from start to scale in minutes.