Engineering AI-powered AdTech for planetary scale at InMobi
Discover how InMobi engineers AI-powered AdTech to be fast, rock-solid, and ready for the next big thing at planetary scale.
There's a lot of pressure in being India's first unicorn. Even more so when you're in the competitive AdTech space and getting ready to go head-to-head with giant household names at planetary scale. InMobi, a global mobile ad and commerce platform, has been handling this pressure since its founding in 2007.
AdTech has changed a lot since 2007 and is now undergoing another major transformation. The rise of artificial intelligence (AI) and machine learning (ML) is changing how consumers engage with content and how platforms process and act on data. InMobi’s data infrastructure, built for speed and scale, was already optimized to respond quickly to ad bid opportunities and deliver seamless user experiences. With Aerospike, they reached impressive targets:
2 to 3 million queries per second sustained on user store clusters
10-plus billion keys, 30-plus terabytes of data, stored across 50-plus nodes around the world
Sub-10-millisecond latency at P99
But with AI/ML, every customer interaction becomes a potential signal to improve targeting and personalization. This shift increases the need for real-time data capture, faster writes, and fresher inputs. It also creates new opportunities for platforms that can keep up.
InMobi Principal Engineer Sudheer Bhat spoke at the Aerospike Bangalore Summit about what's next for AdTech and InMobi, and about transitioning from traditional data stores to next-generation feature stores for advanced AI/ML applications operating at planetary scale.
Profile stores to feature stores
Many AdTech companies have spent years optimizing their data platforms for real-time decisioning at scale. Aerospike has long played a key role in that stack, supporting not just high-throughput reads but also the heavy write loads needed for user profiles and real-time bidding.
What’s changing now is the nature and intensity of those writes. AI/ML workloads introduce new pressures: every user interaction becomes a signal, writes are more frequent, and models often require versioned features with tight freshness windows. Supporting this shift requires smarter pipelines, rethinking data flow, and ensuring low-latency ingestion without overwhelming the system. For InMobi, that meant reexamining the entire process to support advanced AI features in Glances, a lock-screen mobile commerce platform, and their other products.
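As a rough illustration of what a tight freshness window can mean at the storage layer, the sketch below writes a just-computed feature with a short record expiration, so serving paths only ever read values that are still inside the window. The namespace, set, bin names, and five-minute TTL are assumptions for illustration, not InMobi's configuration.

```java
// Minimal sketch: write a freshly computed feature with a short TTL so that
// anything a model reads back is guaranteed to be recent. All names are assumed.
import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Bin;
import com.aerospike.client.Key;
import com.aerospike.client.policy.WritePolicy;

public class FreshFeatureWrite {
    public static void main(String[] args) {
        AerospikeClient client = new AerospikeClient("127.0.0.1", 3000);

        WritePolicy freshness = new WritePolicy();
        freshness.expiration = 300;   // record expires in 300 s: the freshness window

        Key userKey = new Key("features", "user_features", "user:12345");
        client.put(freshness, userKey,
                new Bin("ctr_1h", 0.042),                        // illustrative real-time signal
                new Bin("last_event_ts", System.currentTimeMillis()));

        client.close();
    }
}
```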
Doing less to do more: Finding efficiencies in process and architecture
Every data operation has a cost, and these costs can grow quickly at scale if not managed carefully. Over time, InMobi learned that scaling successfully is not just about doing more. It is about doing the right things in the most efficient way possible. As Bhat put it, "The only way to do something very fast is to do it in an efficient way. Do it right, or don't do it. Do less as much as possible."
That mindset came from hard-won experience. As systems grew more complex, so did the overhead. Duplicated logic, excessive round-trips, and reactive patches made it harder to meet latency targets. InMobi responded by simplifying where it mattered. They replaced ad hoc fixes with more deliberate architecture, used tools like Aerospike to consolidate functionality, and reduced unnecessary operations without sacrificing speed or accuracy.
What has allowed it all to work is understanding the different latency requirements for different operations. Serving data to an exchange needs to happen immediately, or you risk losing transactions; writing data to an ad platform profile store may not need sub-millisecond latencies, even though AI/ML feature stores now demand real-time writes. That understanding pushed InMobi to batch where appropriate. "Reads are batched, writes are batched, our streams actually do a key windowing for certain amounts of time. Batch it so you can improve the write throughputs," Bhat said.
To handle their workload, InMobi worked with Aerospike to implement a key-windowed micro-batching process. Events are buffered by key near the source, and then operations are performed on the batch as a whole rather than on every individual event. This system improves throughput, reduces latency spikes during busy windows, and still keeps data fresh enough for ads and even AI/ML.
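A minimal sketch of that pattern, using the Aerospike Java client, is shown below: events are buffered per user key for a short window and then flushed as a single multi-operation call per key, so one round trip replaces many. It is an illustrative reconstruction rather than InMobi's actual pipeline; the namespace, set, bin names, and window handling are assumptions.

```java
// Illustrative key-windowed micro-batcher: buffer events per user key, then flush
// each key's buffered operations in one Aerospike call per window. Names assumed.
import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Bin;
import com.aerospike.client.Key;
import com.aerospike.client.Operation;
import com.aerospike.client.policy.WritePolicy;

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class KeyWindowedBatcher {
    private final AerospikeClient client;
    private final WritePolicy writePolicy = new WritePolicy();
    // Per-key buffers: every event for the same user collapses into a single flush.
    private final Map<String, List<Operation>> buffers = new ConcurrentHashMap<>();

    public KeyWindowedBatcher(AerospikeClient client, long windowMillis) {
        this.client = client;
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(this::flush, windowMillis, windowMillis, TimeUnit.MILLISECONDS);
    }

    // Called once per incoming event; only appends to the key's in-memory buffer.
    public void onClick(String userId) {
        buffers.computeIfAbsent(userId, k -> Collections.synchronizedList(new ArrayList<>()))
               .add(Operation.add(new Bin("clicks", 1)));
    }

    // One server round trip per key per window instead of one per event.
    private void flush() {
        for (String userId : buffers.keySet()) {
            List<Operation> ops = buffers.remove(userId);
            if (ops == null || ops.isEmpty()) continue;
            Key key = new Key("users", "profiles", userId);
            client.operate(writePolicy, key, ops.toArray(new Operation[0]));
        }
    }
}
```

In this toy version, every click that lands inside the window becomes one increment in the buffered list, and the flush applies them all in a single operate call; a production buffer would also need back-pressure, retries, and failure handling.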
At the same time, complex calculations were moved from the application layer to the Aerospike server. The move brought data closer to compute to reduce round-trips, but also allowed for lower-level operations that could execute calculations and transforms with UDFs and expressions much faster than the abstracted functions at higher layers.
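The sketch below shows the shape of that idea, assuming server and Java client versions recent enough to support operation expressions: a raw counter is bumped and a derived bin is recalculated on the Aerospike node in one round trip, with no read-modify-write from the application layer. The bins and the derived metric are hypothetical.

```java
// Hypothetical single round trip: increment a raw counter and recompute a derived
// bin on the Aerospike node itself. Assumes "imprs" and "days_active" already
// exist as integer bins on the record; bin names and the formula are assumptions.
import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Bin;
import com.aerospike.client.Key;
import com.aerospike.client.Operation;
import com.aerospike.client.exp.Exp;
import com.aerospike.client.exp.ExpOperation;
import com.aerospike.client.exp.ExpWriteFlags;
import com.aerospike.client.exp.Expression;
import com.aerospike.client.policy.WritePolicy;

public class ServerSideTransform {
    public static void main(String[] args) {
        AerospikeClient client = new AerospikeClient("127.0.0.1", 3000);
        Key key = new Key("users", "profiles", "user:12345");

        // Derived value computed by the expression engine: impressions per active day.
        Expression imprRate = Exp.build(
                Exp.div(Exp.toFloat(Exp.intBin("imprs")),
                        Exp.toFloat(Exp.intBin("days_active"))));

        client.operate(new WritePolicy(), key,
                Operation.add(new Bin("imprs", 1)),                        // raw counter update
                ExpOperation.write("impr_rate", imprRate, ExpWriteFlags.DEFAULT));

        client.close();
    }
}
```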
The results brought total system latency below 10 milliseconds, with critical user profile lookups returning in under five milliseconds. That kind of speed is not just impressive on average. It is essential for meeting strict SLAs in a system where each request fans out to multiple services. When even one slow response can delay the entire transaction, performance must remain consistent across the board. InMobi relies on Aerospike not just for raw speed but for the ability to maintain low latency at high percentiles, often approaching P99.999. This level of reliability is what enables real-time AI at global scale.
Moving slow to move fast
That philosophy is now helping steer InMobi as it navigates into AI/ML waters. Instead of the typical "move fast, break things" tech approach, Bhat said they're taking their time and really thinking through implementation and application.
For infrastructure, that means fully embracing the Aerospike Kubernetes Operator. In keeping with InMobi's "do less" mantra, the Operator implementation gave InMobi's engineers a lot of their time back by automating cluster management across the entire data platform at any size. And in keeping with their cautious approach, InMobi took its time.
"It generally does not fail. And let's say, if it fails one day, what to do, and how we can really quickly come back," Bhat said. "What is our MTTR? We want some way of inducing failures and different types of failures and degradations." Time saved on managing basic cluster functions was instead spent on diving deep into chaos engineering: identifying ways to methodically break things to see where the failure points might be and how best to mitigate and respond to them.
Answering the mean time to detect (MTTD) and mean time to respond (MTTR) questions before things fail provides a stable foundation for building new platforms. Validating those answers at InMobi became a chaos engineering project that was anything but chaotic: a planned failure injection process allowed engineers to put hard numbers to "generally does not fail."
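In its simplest form, putting numbers to that looks something like the probe below: a canary read loop that timestamps when the cluster stops answering and when it starts again, turning "generally does not fail" into a measured recovery time. This is a hypothetical sketch, not InMobi's chaos tooling; the failure itself is induced separately, for example by deleting a node's pod.

```java
// Hypothetical availability probe run alongside induced failures: polls a known
// key and records how long reads stay unavailable, giving a measured MTTR.
import com.aerospike.client.AerospikeClient;
import com.aerospike.client.AerospikeException;
import com.aerospike.client.Key;

public class RecoveryProbe {
    public static void main(String[] args) throws InterruptedException {
        AerospikeClient client = new AerospikeClient("127.0.0.1", 3000);
        Key canary = new Key("users", "profiles", "canary");
        long outageStart = -1;

        while (true) {
            try {
                client.get(null, canary);                 // healthy read
                if (outageStart > 0) {
                    long mttrMs = System.currentTimeMillis() - outageStart;
                    System.out.println("Recovered; observed MTTR = " + mttrMs + " ms");
                    outageStart = -1;
                }
            } catch (AerospikeException e) {              // failure induced elsewhere
                if (outageStart < 0) outageStart = System.currentTimeMillis();
            }
            Thread.sleep(100);                            // probe every 100 ms
        }
    }
}
```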
That methodical approach to testing and exploring new approaches holds true across the entire stack and gives InMobi the confidence that when something new enters production, it will be the best possible implementation.
When Linux 5.1 introduced io_uring, an alternative asynchronous I/O API that minimizes system overhead, it showed a lot of promise for improving tail latencies for network-heavy, small-payload access patterns. The InMobi team is benchmarking it heavily against existing paths before diving in to ensure that the final implementation is justified and creates reliable improvements.
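That kind of benchmarking typically reduces to running the same small-read workload against a control path and an experimental path and comparing tail latencies. The harness below is a minimal client-side sketch of that; the io_uring change itself lives in the server and kernel, and the namespace, set, key space, and iteration count here are assumptions.

```java
// Hypothetical tail-latency harness: run the same small-read workload against a
// cluster (control build vs. experimental build) and report percentiles.
import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Key;

import java.util.Arrays;

public class TailLatencyBench {
    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "127.0.0.1";  // point at control or experiment cluster
        AerospikeClient client = new AerospikeClient(host, 3000);

        int iterations = 100_000;
        long[] latenciesUs = new long[iterations];
        for (int i = 0; i < iterations; i++) {
            Key key = new Key("users", "profiles", "user:" + (i % 10_000));  // small, hot key space
            long start = System.nanoTime();
            client.get(null, key);                                            // network-heavy, small payload
            latenciesUs[i] = (System.nanoTime() - start) / 1_000;
        }

        Arrays.sort(latenciesUs);
        System.out.printf("P50=%dus P99=%dus P99.9=%dus%n",
                latenciesUs[iterations / 2],
                latenciesUs[(int) (iterations * 0.99)],
                latenciesUs[(int) (iterations * 0.999)]);
        client.close();
    }
}
```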
Historically, InMobi used Aerospike UDFs to push logic into the database and minimize application layer overhead.
Now, with the release of Aerospike expressions, the team is exploring how to gain even better performance. Bhat and his team begin by writing UDFs, then see if they can rework them into an expression and compare the resulting latencies. Rather than diving in blind, a methodical "control, experiment, results" process is helping to shape a deep understanding of which approach works best in which specific use case.
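In spirit, that process looks something like the sketch below: the same derived value is computed once through a registered Lua UDF and once through an operation expression, and the two calls are timed. The module name, function, bins, and weights are assumptions for illustration; a real comparison would wrap each call in a percentile harness like the one above rather than timing single requests.

```java
// Hypothetical control/experiment pair: UDF call vs. the equivalent expression.
// Assumes a Lua module "profile_udfs" with function "blend_ctr" is registered,
// and that the record has double bins "ctr_1d" and "ctr_7d".
import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Key;
import com.aerospike.client.Record;
import com.aerospike.client.Value;
import com.aerospike.client.exp.Exp;
import com.aerospike.client.exp.ExpOperation;
import com.aerospike.client.exp.ExpReadFlags;
import com.aerospike.client.exp.Expression;

public class UdfVsExpression {
    public static void main(String[] args) {
        AerospikeClient client = new AerospikeClient("127.0.0.1", 3000);
        Key key = new Key("users", "profiles", "user:12345");

        // Control: server-side Lua UDF computes the blended CTR.
        long t0 = System.nanoTime();
        Object udfResult = client.execute(null, key, "profile_udfs", "blend_ctr",
                Value.get(0.7), Value.get(0.3));
        long udfNanos = System.nanoTime() - t0;

        // Experiment: the same blend as an operation expression, no Lua involved.
        Expression blend = Exp.build(
                Exp.add(Exp.mul(Exp.val(0.7), Exp.floatBin("ctr_7d")),
                        Exp.mul(Exp.val(0.3), Exp.floatBin("ctr_1d"))));
        long t1 = System.nanoTime();
        Record rec = client.operate(null, key,
                ExpOperation.read("blended_ctr", blend, ExpReadFlags.DEFAULT));
        long expNanos = System.nanoTime() - t1;

        System.out.println("UDF result: " + udfResult
                + ", expression result: " + rec.getValue("blended_ctr"));
        System.out.printf("UDF: %.3f ms, expression: %.3f ms (single-call sample)%n",
                udfNanos / 1e6, expNanos / 1e6);
        client.close();
    }
}
```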
Working through the hard questions and taking things one step at a time has allowed Bhat and his team to manage the transition and control the chaos. "So this is again, a very ambitious goal for us to have a centralized feature store, which will be a backbone for all of our ML-related stuff," Bhat said.
Change is InMobi's constant
InMobi has stayed competitive by focusing on operational excellence, adapting to change, and prioritizing efficiency at scale. As the company moves deeper into AI and ML, it is working closely with Aerospike to ensure that this transformation builds on the same principles that brought success to its core ad platform and newer products, such as Glances.
Fully transitioning to the Aerospike Kubernetes Operator is helping InMobi simplify operations and automate scaling for its distributed Aerospike clusters. The team is also finding more efficient ways to handle complex compute, starting with UDFs and now exploring Aerospike expressions for faster execution. Controlled testing of io_uring is underway to reduce latency spikes and improve consistency under heavy load. Together, these improvements are strengthening the foundation InMobi needs to support real-time AI workloads at global scale.
The finished, all-in-one feature store will give InMobi and its customers unprecedented access to all of the company's data, unlocking entirely new product pathways and growth opportunities. Building it on the Aerospike data platform fits in perfectly with the conservative course InMobi has charted: Aerospike has the latency, scale, and operational simplicity to "just work." That frees up engineers to innovate and implement more complex solutions, like concurrent feature versioning across models running simultaneously, all while performing the in-depth, methodical feature analysis that has helped InMobi get this far.
Previous evaluations of HBase and Cassandra, by comparison, required constant heavy performance tuning to meet targets. Aerospike just worked out of the box and provided low latency with minimal operations overhead while consistently meeting SLAs. That mix of speed and simplicity led to the decision to standardize on Aerospike.
"This is going to be the full-fledged feature store, and we are seeing how we can do this entire thing," Bhat said. "... This is something that we are excited about and something that we really want to do this year."
Keep reading
How mPokket scaled personalized lending for 20 million borrowers with Aerospike
How a hyperscale fintech company cut infrastructure costs by 75% and read latencies by 88% with Aerospike
How PhonePe runs real-time transactions, AI feature stores, and governance with Aerospike
Inside Flipkart’s journey to 90 million QPS: Scaling with Aerospike and Kubernetes
