How agentic reasoning is rewriting the rules of data infrastructure
Agentic AI compounds latency across every reasoning step. Learn why caching fails, what tail latency math means for your stack, and what production-grade agents actually demand.
Agentic reasoning is different from traditional artificial intelligence (AI) because it involves iterative, multi-step reasoning loops that plan, act, observe, and reflect. This creates latency and performance demands that most enterprise data systems were never intended to handle.
Unlike traditional LLM calls that generate a response in one pass, agentic AI systems make dozens to hundreds of sequential decisions, each potentially fanning out to multiple databases, APIs, and tools. The result is a potential compounding latency problem that breaks caching strategies, overwhelms conventional databases, and exposes weaknesses in an organization's data infrastructure. With Gartner projecting that 40% of enterprise applications will integrate task-specific AI agents by the end of 20261, up from less than 5% in 2025, you need to understand how agentic reasoning works and what it needs from data systems.
The history of iterative reasoning loops
Standard LLM inference is one pass: input goes in, tokens come out. It maps roughly to "System 1" thinking, which is fast, intuitive, and pattern-based, as described in Daniel Kahneman’s book "Thinking, Fast and Slow." Agentic reasoning operates more like "System 2": slow, deliberate, and self-correcting. The LLM is called multiple times in a structured loop: it reasons about the task, decides on an action, observes the result, and reasons again.
The foundational framework for this is ReAct (Reasoning + Acting), published by Shunyu Yao and colleagues at Princeton and Google Research in October 2022. ReAct interleaves reasoning traces ("Thoughts") with task-specific actions2 in a Thought → Action → Observation cycle that repeats until the task is complete.3 On benchmarks like HotPotQA, ReAct reduced the hallucination seen in pure chain-of-thought reasoning by grounding each step in external knowledge retrieval4 and verifying facts against real data sources.
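The Thought → Action → Observation cycle can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the ReAct authors' implementation: `scripted_model`, `lookup`, and the `max_steps` budget are hypothetical stand-ins for a real LLM call, a real retrieval tool, and a production step limit.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tool registry: tool name -> function taking one string argument.
Tools = Dict[str, Callable[[str], str]]


def lookup(term: str) -> str:
    """Stand-in for an external knowledge source (assumption: a tiny in-memory table)."""
    facts = {"ReAct": "ReAct was published in 2022."}
    return facts.get(term, "no result")


def scripted_model(history: List[str]) -> Tuple[str, str, str]:
    """Stand-in for an LLM call. Returns (thought, action, argument).

    A real agent would send `history` to a model; this stub follows a fixed script.
    """
    if not any(line.startswith("Observation:") for line in history):
        return ("I should check the source.", "lookup", "ReAct")
    last_observation = history[-1].split(": ", 1)[1]
    return ("The observation answers the question.", "finish", last_observation)


def run_agent(question: str, tools: Tools, max_steps: int = 5) -> Tuple[str, List[str]]:
    """Repeat Thought -> Action -> Observation until 'finish' or the step limit."""
    history = [f"Question: {question}"]
    for _ in range(max_steps):
        thought, action, arg = scripted_model(history)
        history.append(f"Thought: {thought}")
        if action == "finish":
            return arg, history
        observation = tools[action](arg)  # every action is one external data call
        history.append(f"Action: {action}[{arg}]")
        history.append(f"Observation: {observation}")
    return "step limit reached", history


answer, trace = run_agent("When was ReAct published?", {"lookup": lookup})
```

Even in this toy version, one question costs two model calls and one data call, and the history grows with every iteration.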
ReAct is not the only pattern. Chain-of-thought prompting, introduced by Jason Wei and colleagues at Google Research in January 2022, demonstrated that asking models to show their reasoning steps dramatically improves performance on math and logic tasks.5 PaLM-540B, with just eight CoT examples, surpassed fine-tuned GPT-3 on the GSM8K math benchmark.6 Tree of Thoughts, also from Yao's group in May 2023, extended this to a tree-structured exploration of multiple reasoning paths7 with evaluation and backtracking, improving GPT-4's success rate on the Game of 24 from 4% to 74%.8, 9 Reflexion, published by Noah Shinn and colleagues in March 2023, added self-critique: the agent evaluates its own failures, stores reflections in episodic memory, and uses them to improve on subsequent attempts.10
What unites these approaches is the core agentic loop: plan, execute, observe, reflect, repeat. Andrew Ng formalized this in March 2024 as four agentic design patterns: Reflection, Tool Use, Planning, and Multi-Agent Collaboration.11 These have become the standard vocabulary for the field. The practical consequence is that a single user request no longer maps to a single inference call. A complex agentic task can trigger 30 to 100+ model completions in a single trajectory, according to Anthropic's evaluations on TAU-bench airline customer service tasks. Scale AI's ToolComp benchmark found that 85% of compositional prompts require at least three tool calls, and roughly 20% require seven or more.12
Meanwhile, reasoning models like OpenAI's o1 and o3, DeepSeek-R1, and Anthropic's Claude with extended thinking add another dimension. These models invest additional computation at inference time, generating internal chains of thought before producing a final answer.13 OpenAI's o1 scored 83.3% on AIME 202414 versus GPT-4o's roughly 13.4%. DeepSeek-R1, trained largely through reinforcement learning, matched o1 on multiple benchmarks while being open-source.15 Claude's extended thinking mode improves performance logarithmically with the number of thinking tokens allocated.16 This "test-time compute" scaling means that reasoning quality improves with more inference-time computation,17 but at a direct cost in latency and resource consumption per request.
Enterprise adoption is real but fragile
The enterprise world is moving fast on agentic AI, though the gap between experimentation and production remains wide. McKinsey's 2025 State of AI survey, covering 1,993 respondents across 105 countries, found that 23% of organizations are scaling agentic AI18 in at least one function, with an additional 39% experimenting. However, nearly two-thirds remain in pilot mode. Only about 6% of respondents qualify as AI high performers, defined as those attributing more than 5% of EBIT to AI.
Gartner's data tells a similar story. A May 2025 poll of 147 CIOs found that just 24% had deployed a few AI agents (fewer than a dozen)19 and only 4% had deployed more than a dozen; half were still researching or experimenting. Gartner projects that agentic AI could drive roughly 30% of enterprise application software revenue by 2035,20 surpassing $450 billion, up from 2% in 2025. But the firm also warns that more than 40% of agentic AI projects will be canceled by the end of 2027,21 due to escalating costs, unclear business value, or inadequate risk controls. By October 2025, Gartner noted that agentic AI supply already exceeded demand, with only about 130 of thousands of vendors offering genuine agentic capabilities,22 the rest constituting "agent washing."
Deloitte's 2026 State of AI in Enterprise report, surveying 3,235 leaders across 24 countries, found that only 25% of organizations have moved 40% or more of their AI pilots into production.23 The biggest barriers are not model capabilities but infrastructure readiness: 48% cite data searchability and 47% cite data reusability24 as obstacles, while roughly 60% point to legacy system integration and risk/compliance concerns. PwC's May 2025 survey of 308 US executives found 79% of organizations already adopting AI agents and 88% planning budget increases,25 but only 17% had achieved "full adoption" across agentic workflows.
Industry adoption varies.
JPMorgan Chase's Coach AI tool helps advisors respond 95% faster26 during market volatility.
In healthcare, organizations report $3.20 return for every $1 invested within 14 months,27 with one health system finding that AI clinical assistants reduced documentation time by 42%.28
Gartner predicts agentic AI will autonomously resolve 80% of common customer service issues by 2029,29 reducing operational costs by 30%.
Why every reasoning step affects latency
Here is where the infrastructure story gets urgent. Traditional AI inference is a request-response pattern: one call, one response, one latency measurement. But agentic AI introduces iterative, multi-step reasoning where each step may require data retrieval, tool execution, or external API calls. The latency of an agentic interaction is not the latency of one call, but the compounded latency of every call in the chain.
Jeff Dean and Luiz André Barroso's seminal 2013 paper "The Tail at Scale" established the mathematics of this problem for distributed systems,30 and that math applies directly to agentic workloads.31 Their core finding: if each individual server has a 99th-percentile latency of one second (with a typical response of 10ms), fanning out to 100 such servers means 63% of user requests will take more than one second.32 Real measurements from a Google production service showed that 99th-percentile latency grew from 10ms for a single leaf request to 140ms when waiting for all leaves to respond, a 14x amplification. Waiting for just the slowest 5% of responses accounted for half of that total.33
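The 63% figure follows from one line of probability. The sketch below assumes each server's latency is independent of the others, which is a simplification, and the function name is illustrative.

```python
def prob_any_slow(fan_out: int, per_call_slow_prob: float) -> float:
    """Probability that at least one of `fan_out` independent calls is slow.

    A request that waits for every call is slow whenever any single call is slow.
    """
    return 1.0 - (1.0 - per_call_slow_prob) ** fan_out


# Each server exceeds its p99 latency 1% of the time.
for n in (1, 10, 100):
    print(n, round(prob_any_slow(n, 0.01), 3))
# prints:
# 1 0.01
# 10 0.096
# 100 0.634
```

A 1-in-100 event at the server becomes the common case at the request level once the fan-out reaches 100.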
Paul Cavallaro's mathematical formalization makes the implications concrete: to achieve p90 parent latency with fan-out to 10 children, you must care about the p99 latency of each child.34 Fan out to 100 children, and you need p99.99 child latency to maintain p99 at the parent level.
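Running the same relationship in reverse gives the per-child target. This is a sketch that again assumes independent child latencies; `required_child_quantile` is an illustrative name, not a function from the cited post.

```python
def required_child_quantile(parent_quantile: float, fan_out: int) -> float:
    """Quantile each child must meet so that all `fan_out` children together
    meet `parent_quantile`, assuming independent children.

    Solves q_child ** fan_out == parent_quantile for q_child.
    """
    return parent_quantile ** (1.0 / fan_out)


p90_over_10 = required_child_quantile(0.90, 10)    # about 0.9895, close to p99
p99_over_100 = required_child_quantile(0.99, 100)  # about 0.9999, i.e. p99.99
```

Every tenfold increase in fan-out adds roughly one more "nine" to the percentile the data layer has to hold.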
Agentic AI systems face a compounding effect worse than either fan-out or sequential chains alone. They combine sequential reasoning steps, which adds latency at every step, with parallel fan-out within each step, where latency is dominated by the slowest component. Research found that in deep service chains where five or more services are chained sequentially, 10–20ms latency increases per hop accumulate into latency increases of 100ms or more.35
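A small model makes the combination concrete: within a step, latency is the maximum over the parallel calls, and across steps the latencies add. The numbers below are hypothetical and chosen only to show one outlier dominating a trajectory.

```python
from typing import List


def step_latency_ms(child_latencies_ms: List[float]) -> float:
    """A parallel fan-out step finishes when its slowest call returns."""
    return max(child_latencies_ms)


def trajectory_latency_ms(steps: List[List[float]]) -> float:
    """Sequential steps add up; each step costs its slowest parallel call."""
    return sum(step_latency_ms(children) for children in steps)


# Hypothetical trajectory: three reasoning steps, three data calls per step.
steps = [
    [5.0, 6.0, 4.0],
    [5.0, 250.0, 6.0],  # a single tail-latency outlier
    [4.0, 5.0, 7.0],
]
total_ms = trajectory_latency_ms(steps)  # 6 + 250 + 7 = 263.0
```

Eight of the nine calls return in 7ms or less, yet the trajectory takes 263ms because the one slow call sits on the critical path.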
Industry latency benchmarks for agentic systems reflect this reality. Simple agent queries target p50 under 500ms and p95 under 1 second. Complex workflows aim for p50 under 2 seconds and p95 under 4 seconds. Multi-agent orchestration pushes targets to p50 under 3 seconds and p95 under 6 seconds.36 Voice AI agents show a mean round-trip latency of 934ms,37 ranging from 417ms to over 3 seconds, well above the 300ms human turn-taking benchmark.38
Caching doesn’t help with agentic access patterns
The first response to latency problems is typically to add caching, but for agentic workloads, this usually doesn’t work. A cache makes hits fast, but misses are still slow, and in fan-out architectures, a single cache miss among many parallel calls dominates the overall response time.
Agentic workloads make this worse because they generate structurally low cache hit rates,39 according to research. Traditional web applications create predictable, repetitive access patterns where hot data results in efficient caches. But because agent reasoning paths are less predictable, each agent call follows a different reasoning chain depending on context, which generates unique data access patterns with little reuse potential across different agent sessions. Semantic caching, or caching based on query similarity rather than exact match, is a little better but has its own problems. Testing found that semantic cache misses more than doubled latency,40 and that model updates change embeddings, breaking matches. Vector drift causes misses even for similar queries, and users phrase things differently enough to defeat similarity thresholds.
For an agentic system making five parallel data calls per reasoning step, even a 99% cache hit rate means a roughly 5% chance that at least one call misses on any given step. Over 10 sequential reasoning steps, the probability of encountering at least one cache miss climbs to roughly 40%. In a fan-out architecture, one slow response determines the latency of the entire step.
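The arithmetic can be checked directly, and extended to show what a miss costs. The sketch assumes independent calls, and the 1ms hit and 50ms miss latencies are hypothetical values picked for illustration.

```python
def prob_at_least_one_miss(hit_rate: float, calls: int) -> float:
    """Chance that at least one of `calls` independent lookups misses the cache."""
    return 1.0 - hit_rate ** calls


def expected_step_latency_ms(hit_rate: float, fan_out: int,
                             hit_ms: float, miss_ms: float) -> float:
    """Expected latency of one fan-out step: the step is as slow as a miss
    whenever any of its parallel calls misses."""
    p_miss = prob_at_least_one_miss(hit_rate, fan_out)
    return (1.0 - p_miss) * hit_ms + p_miss * miss_ms


per_step = prob_at_least_one_miss(0.99, 5)             # about 0.049
per_trajectory = prob_at_least_one_miss(0.99, 5 * 10)  # about 0.395
# Hypothetical latencies: 1 ms on a hit, 50 ms on a miss.
avg_step_ms = expected_step_latency_ms(0.99, 5, hit_ms=1.0, miss_ms=50.0)
```

Under these assumptions, a cache that is right 99% of the time still leaves the average step more than three times slower than a pure hit.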
What agentic workloads need from databases
Agentic systems create different database access patterns. Instead of human-paced, predictable bursts with idle periods, agents generate continuous, 24/7 query traffic with no throttling, no batching, and no allowance for maintenance windows. The data itself is different, too. Agent context consists of small, fast, session-specific data objects, which some call "micro-datasets," that are constantly created, queried, updated, and retired within seconds. Scale this to millions of concurrent agents, and the result is a workload profile that looks nothing like traditional OLTP or OLAP patterns.
This creates five requirements for the data layer supporting agentic systems:
Bounded tail latency, not average latency. Averages don’t work for fan-out architectures. A system with p50 of 5ms and p99 of 500ms is unusable for agentic workloads; one with p50 of 5ms and p99 of 15ms may work. What matters is the gap between median and tail; the ratio between p99 and p50 must be small and stable under load.
Cache-independent performance. When an agentic system makes a data request that has never been seen before, which is the common case due to non-deterministic reasoning paths, the response must still be fast. The underlying storage engine must deliver acceptable latency without relying on a caching tier for performance.
Predictable behavior under volatile access patterns. Garbage collection pauses, compaction storms, lock contention, and performance degradation during access pattern shifts are all sources of tail latency spikes that propagate through agentic reasoning chains. The data layer must avoid hidden background operations that create periodic, unpredictable latency spikes.
Support for both sequential and parallel access. Agentic workloads combine low-latency point lookups (individual context retrieval), efficient batch operations (parallel fan-out within a reasoning step), and consistent write performance (persisting agent state and guardrail validations).
Predictable responses even with concurrent users. When multiple agents access shared data simultaneously, one agent's burst must not affect another's latency. Workloads need to be isolated at the infrastructure level, not just at the application level.
The infrastructure must be designed for the workload from the beginning.
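The first requirement, a small and stable ratio between p99 and p50, is straightforward to measure. Below is a minimal sketch using a nearest-rank percentile; the two sample sets are synthetic and mirror the 5ms/15ms and 5ms/500ms profiles described above.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100.0))
    return ordered[rank - 1]


def tail_ratio(samples: Sequence[float], tail_pct: float = 99.0) -> float:
    """Ratio of tail latency to median latency; closer to 1 means a tighter tail."""
    return percentile(samples, tail_pct) / percentile(samples, 50.0)


# Synthetic latency samples in milliseconds, 100 requests each.
bounded = [5.0] * 98 + [15.0] * 2
spiky = [5.0] * 98 + [500.0] * 2

print(tail_ratio(bounded))  # 3.0
print(tail_ratio(spiky))    # 100.0
```

Both systems report the same median, so a dashboard that tracks only averages or p50 would not tell them apart.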
The effect of multi-agent orchestration
As the industry moves from single-agent to multi-agent systems, where multiple specialized agents collaborate on complex tasks, the compounding problem gets worse. Gartner reported a 1,445% surge in multi-agent system inquiries from Q1 2024 to Q2 2025.41
Multi-agent architectures typically follow one of several patterns:
The supervisor pattern uses a planning agent that breaks tasks into multiple pieces and delegates them to specialized sub-agents.
The concurrent fan-out pattern sends the same input to multiple agents for parallel analysis from different perspectives.
The sequential pipeline chains agents in order, each processing the previous agent's output.
Each pattern creates different data access profiles, but all share one trait: they multiply the number of data calls per user request.
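The three patterns can be sketched with `asyncio` to show how data calls add up. Everything here is a hypothetical stand-in: `DataLayer.fetch` simulates one database or API call, and each agent is assumed to make exactly one call per task, which real agents usually exceed.

```python
import asyncio
from typing import Dict, List


class DataLayer:
    """Stand-in for a shared data layer that counts requests."""

    def __init__(self) -> None:
        self.calls = 0

    async def fetch(self, agent: str) -> str:
        self.calls += 1
        await asyncio.sleep(0)  # yield control, as a real network call would
        return f"context for {agent}"


async def run_agent(db: DataLayer, name: str, task: str) -> str:
    context = await db.fetch(name)
    return f"{name} handled '{task}' using {context}"


async def supervisor(db: DataLayer, task: str, workers: List[str]) -> List[str]:
    """A planner splits the task and delegates one piece to each sub-agent."""
    await db.fetch("planner")
    pieces = [f"{task} (part {i + 1})" for i in range(len(workers))]
    return list(await asyncio.gather(
        *(run_agent(db, w, p) for w, p in zip(workers, pieces))))


async def concurrent_fan_out(db: DataLayer, task: str, agents: List[str]) -> List[str]:
    """Every agent analyzes the same input in parallel."""
    return list(await asyncio.gather(*(run_agent(db, a, task) for a in agents)))


async def sequential_pipeline(db: DataLayer, task: str, agents: List[str]) -> str:
    """Each agent processes the previous agent's output."""
    output = task
    for agent in agents:
        output = await run_agent(db, agent, output)
    return output


async def count_calls() -> Dict[str, int]:
    counts = {}
    for pattern in (supervisor, concurrent_fan_out, sequential_pipeline):
        db = DataLayer()
        await pattern(db, "summarize account", ["a", "b", "c"])
        counts[pattern.__name__] = db.calls
    return counts


counts = asyncio.run(count_calls())
```

With three agents, one user request becomes three or four data calls even in this stripped-down model. The supervisor and fan-out patterns issue theirs concurrently, while the pipeline issues them back to back.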
A platform with 100,000 users, each running 10 tasks via agents, each testing 10 branches, generates 10 million concurrent database interactions. Multi-agent fan-out creates parallel database access where multiple specialized agents query simultaneously. The orchestration layer itself, which includes managing agent registries, routing, state stores, and supervision, adds its own data requirements and latency overhead.
Managing context between agents is also hard. The database must remain consistent while multiple agents are using it simultaneously. Eventual consistency isn’t good enough when multiple agents depend on a shared database for autonomous decision-making.
Full context forwarding between agents is simple but expensive, with costs growing as handoff chains lengthen.
Structured context objects reduce token overhead from 5,000–20,000 tokens to 200–500 per handoff, but require careful schema design.
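A structured context object might look like the following. The schema is a hypothetical example, not a standard: the field names are assumptions chosen to show the idea of passing a compact summary instead of the full conversation history.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class HandoffContext:
    """Hypothetical schema for a compact agent-to-agent handoff."""
    task_id: str
    goal: str
    decisions: List[str] = field(default_factory=list)
    facts: Dict[str, str] = field(default_factory=dict)
    open_questions: List[str] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize to a stable, compact string for the next agent's prompt."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, payload: str) -> "HandoffContext":
        return cls(**json.loads(payload))


handoff = HandoffContext(
    task_id="t-001",
    goal="Rebook the customer on the next available flight",
    decisions=["Original flight confirmed canceled"],
    facts={"loyalty_tier": "gold"},
    open_questions=["Does the customer accept a connection?"],
)
restored = HandoffContext.from_json(handoff.to_json())
```

The design cost is the one noted above: every field has to be agreed on up front, and anything the schema omits is lost at the handoff.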
The Model Context Protocol (MCP), created by Anthropic in late 2024 and donated to the Linux Foundation in December 2025, has emerged as the standard for agent-to-tool connectivity, along with Google's Agent2Agent (A2A) protocol, which handles inter-agent communication with 150+ supporting organizations. Forrester predicts that 30% of enterprise application vendors will launch their own MCP servers42 in 2026. These protocols standardize communication but do not solve the underlying performance challenge: they simply make it easier for more agents to generate more data calls.
Governance and oversight are required
Agentic AI introduces risks that are different from traditional AI systems. McKinsey's 2025 analysis found that 80% of organizations have already encountered risky behaviors from AI agents, including improper data exposure and unauthorized system access, and identified five novel risk drivers unique to agentic systems:
Chained vulnerabilities, where a flaw in one agent cascades across the system
Cross-agent task escalation
Synthetic-identity risk
Untraceable data leakage
Data corruption propagation
The problem is that a minor error in an early reasoning step propagates through the entire multistage task43 window, biasing all subsequent planning toward irreversible failure. In the Open Worldwide Application Security Project’s (OWASP) Top 10 for Agentic Applications, ASI08 describes cascading failures as more dangerous than traditional distributed system failures because agent-to-agent communications occur in natural language, agents operate in multi-turn loops, and broad permissions amplify localized errors.
Moreover, only 21% of organizations have mature agent governance models44, according to Deloitte. NIST launched its AI Agent Standards Initiative45 in February 2026, the first US government program dedicated to interoperability and security standards for agentic AI, with sector-specific listening sessions in healthcare, finance, and education.
The industry is converging on three oversight models:
Human-in-the-loop requires approval before agent actions take effect, which is appropriate for high-risk situations.
Human-on-the-loop provides supervision after completion, with humans reviewing outcomes and flagging exceptions.
Human-out-of-the-loop grants full autonomy only for low-risk, well-understood tasks.
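In code, choosing among the three models comes down to a routing policy. The sketch below is one hypothetical policy: the risk labels and the rule that irreversible actions always need approval are assumptions for illustration, not a published standard.

```python
from enum import Enum


class Oversight(Enum):
    HUMAN_IN_THE_LOOP = "approve before the action takes effect"
    HUMAN_ON_THE_LOOP = "review after completion"
    HUMAN_OUT_OF_THE_LOOP = "full autonomy"


def choose_oversight(risk: str, reversible: bool) -> Oversight:
    """Map an action's risk profile to an oversight model.

    Assumption: `risk` is one of "low", "medium", or "high", assigned upstream.
    """
    if risk == "high" or not reversible:
        return Oversight.HUMAN_IN_THE_LOOP
    if risk == "low":
        return Oversight.HUMAN_OUT_OF_THE_LOOP
    return Oversight.HUMAN_ON_THE_LOOP


refund = choose_oversight("high", reversible=False)  # needs approval first
lookup = choose_oversight("low", reversible=True)    # runs autonomously
```

Keeping the policy in one function makes it easy to audit and to tighten as incidents reveal actions that were classified too leniently.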
But incorporating human oversight into agentic AI requires comprehensive audit trails that capture not just actions but also prompts, decisions, internal state changes, intermediate reasoning, and outputs. Writing all of that for every reasoning step adds substantial load to the system.
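A per-step audit record could be sketched as follows. The field names are illustrative assumptions, and the in-memory list stands in for whatever durable store a real deployment would write to.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class AuditRecord:
    """One entry per reasoning step. Field names are illustrative assumptions."""
    agent_id: str
    step: int
    prompt: str
    reasoning: str
    decision: str
    state_change: Dict[str, Any]
    output: str


class AuditTrail:
    """Append-only log held in memory; a real system would persist each line."""

    def __init__(self) -> None:
        self._lines: List[str] = []

    def append(self, record: AuditRecord) -> None:
        self._lines.append(json.dumps(asdict(record), sort_keys=True))

    def replay(self) -> List[Dict[str, Any]]:
        """Return every record in write order for review."""
        return [json.loads(line) for line in self._lines]


trail = AuditTrail()
trail.append(AuditRecord(
    agent_id="billing-agent",
    step=1,
    prompt="Customer asks for a refund on order 42",
    reasoning="Order is within the refund window",
    decision="call refund tool",
    state_change={"order_42": "refund_pending"},
    output="Refund request submitted",
))
entries = trail.replay()
```

Each reasoning step produces at least one such write, so a 100-step trajectory adds 100 writes on top of the reads it already generates.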
Aerospike and agentic reasoning
Agentic reasoning turns single-call interactions into complex, multi-step workflows with a different performance profile. Each LLM reasoning step is an opportunity for tail latency to accumulate. Each fan-out multiplies the probability of a slow response. And the unpredictable nature of agent reasoning defeats the caching strategies that traditional systems rely on.
Enterprises that successfully deploy agentic AI at scale will be those that recognize this is a data infrastructure problem. They need an underlying data layer that provides bounded, predictable, cache-independent latency at the tail consistently, under the volatile and unpredictable access patterns that autonomous agents create. Aerospike is built for this operating reality. Its patented Hybrid Memory Architecture delivers sub-millisecond P99 latency without relying on a warm cache, its performance remains stable as access patterns shift, and it scales horizontally without the tail latency amplification that breaks agentic reasoning chains. Organizations such as Adobe, AppsFlyer, and PhonePe already depend on Aerospike for predictable performance under volatile, high-concurrency workloads that agentic systems generate.
The question facing every infrastructure team is no longer whether agentic AI is coming, but whether their data systems are ready for what it needs.
Frequently asked questions about agentic reasoning
Find answers to common questions below to help you learn more and get the most out of Aerospike.
Traditional AI models process input and generate output in a single pass.46 Agentic AI systems operate through iterative reasoning loops, planning actions, executing them via tools and APIs, observing results, and reflecting before continuing.47 The key distinctions are autonomy (pursuing goals independently), tool use (interacting with external systems), planning (decomposing complex tasks), and reflection (self-correcting based on feedback).48 A standard LLM call might take one inference step; an agentic task routinely involves 3 to 100+ steps.
The canonical loop proceeds as follows: First, the agent receives a goal and reasons about how to decompose it into subtasks. Second, it selects and invokes a tool, a database query, API call, web search, or code execution. Third, it observes the result. Fourth, it evaluates whether the result is sufficient, whether errors occurred, and whether the plan needs adjustment.49 This cycle repeats, with each iteration adding to the agent's context, until the task is complete or a step limit is reached.50 Frameworks like ReAct formalize this as Thought → Action → Observation sequences.51
Beyond GPU compute for model inference, agentic systems demand data infrastructure with bounded tail latency (not just low average latency), cache-independent performance (since agentic access patterns defeat traditional caching), deterministic behavior under volatile workloads, and support for both high-throughput parallel reads and consistent sequential access.52 The data layer must perform reliably regardless of whether the query has been seen before, because non-deterministic reasoning paths mean most queries are novel.
Technically yes, but responsibly, no, at least not for consequential decisions. Simulation testing shows agents fail multi-step tasks roughly 70% of the time without structured oversight.53 The industry consensus, reflected in NIST guidelines, OpenAI's governance framework, and the EU AI Act, is that human oversight should be proportionate to risk. Low-risk, well-understood tasks can operate with human-on-the-loop monitoring; high-risk or irreversible actions require human-in-the-loop approval.54
The primary risks include error compounding across multi-step chains (where a single hallucination propagates through all subsequent reasoning), cascading failures in multi-agent systems,55 prompt injection attacks, unauthorized data access,56 and the irreversibility problem: agents taking actions that cannot be undone.57 CrowdStrike and Mandiant data show that 1 in 8 enterprise security breaches now involves an agentic system. Agent-involved breach incidents grew 340% year-over-year between 2024 and 2025.58
Multiple specialized agents collaborate via orchestration patterns — supervisor agents delegate to sub-agents, agents process in parallel then merge results, or agents debate in shared contexts.59 Each pattern multiplies data access requirements. Communication between agents uses protocols like MCP (for tool access)60 and A2A (for agent-to-agent messaging).61 The key challenge is maintaining coherent shared state under concurrent access while keeping latency bounded across the entire multi-agent interaction.
Footnotes
Gartner, "Gartner Predicts 40% of Enterprise Apps Will Feature Task-Specific AI Agents by 2026, Up From Less Than 5% in 2025," Gartner Newsroom press release, 26 August 2025, https://www.gartner.com/en/newsroom/press-releases/2025-08-26-gartner-predicts-40-percent-of-enterprise-apps-will-feature-task-specific-ai-agents-by-2026-up-from-less-than-5-percent-in-2025
Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao, "ReAct: Synergizing Reasoning and Acting in Language Models," arXiv preprint (ICLR 2023 camera-ready), 2022, https://arxiv.org/abs/2210.03629
Wollen Labs, "Navigating Modern LLM Agent Architectures: Multi-Agents, Plan-and-Execute, ReWOO, Tree of Thoughts and ReAct," Wollen Labs blog, https://www.wollenlabs.com/blog-posts/navigating-modern-llm-agent-architectures-multi-agents-plan-and-execute-rewoo-tree-of-thoughts-and-react
Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao, "ReAct: Synergizing Reasoning and Acting in Language Models," arXiv preprint (PDF), 2022, https://arxiv.org/pdf/2210.03629
Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, and Denny Zhou, "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models," arXiv preprint, 2022, https://arxiv.org/abs/2201.11903
Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, and Karthik Narasimhan, "Tree of Thoughts: Deliberate Problem Solving with Large Language Models," OpenReview (NeurIPS 2023), 2023, https://openreview.net/forum?id=_VjQlMeSB_J
Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, and Karthik Narasimhan, "Tree of Thoughts: Deliberate Problem Solving with Large Language Models," Semantic Scholar, 2023, https://www.semanticscholar.org/paper/Tree-of-Thoughts:-Deliberate-Problem-Solving-with-Yao-Yu/2f3822eb380b5e753a6d579f31dfc3ec4c4a0820
Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, and Karthik Narasimhan, "Tree of Thoughts: Deliberate Problem Solving with Large Language Models," Princeton University Research Publications, 2023, https://collaborate.princeton.edu/en/publications/tree-of-thoughts-deliberate-problem-solving-with-large-language-m-2/
Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, and Karthik Narasimhan, "Tree of Thoughts: Deliberate Problem Solving with Large Language Models," arXiv preprint, 2023, https://arxiv.org/abs/2305.10601
Noah Shinn, Federico Cassano, Edward Berman, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao, "Reflexion: Language Agents with Verbal Reinforcement Learning," arXiv preprint, 2023, https://arxiv.org/abs/2303.11366
ML Notes, "4 Agentic Design Patterns and 4 Key Skills," ML Notes Substack, https://mlnotes.substack.com/p/4-agentic-design-patterns-and-4-key
Scale AI, "Tool Use & Agentic Capabilities Enterprise Leaderboard," Scale Labs, https://labs.scale.com/leaderboard/tool_use_enterprise
Ksenia Se, "Test-Time Compute: A Deep Dive," Hugging Face blog, https://huggingface.co/blog/Kseniase/testtimecompute
OpenAI, "Learning to Reason with LLMs," OpenAI Research, 2024, https://openai.com/index/learning-to-reason-with-llms/
Epoch AI, "What Went Into Training DeepSeek-R1," Gradient Updates, Epoch AI, https://epoch.ai/gradient-updates/what-went-into-training-deepseek-r1
Anthropic, "Visible Extended Thinking," Anthropic News, https://www.anthropic.com/news/visible-extended-thinking
OpenAI, "Learning to Reason with LLMs," OpenAI Research, 2024, https://openai.com/index/learning-to-reason-with-llms/
McKinsey & Company, "The State of AI," QuantumBlack by McKinsey, https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai
Gartner, "Gartner Predicts That Guardian Agents Will Capture 10–15% of the Agentic AI Market by 2030," Gartner Newsroom press release, 11 June 2025, https://www.gartner.com/en/newsroom/press-releases/2025-06-11-gartner-predicts-that-guardian-agents-will-capture-10-15-percent-of-the-agentic-ai-market-by-2030
Gartner, "Gartner Predicts 40% of Enterprise Apps Will Feature Task-Specific AI Agents by 2026, Up From Less Than 5% in 2025," Gartner Newsroom press release, 26 August 2025, https://www.gartner.com/en/newsroom/press-releases/2025-08-26-gartner-predicts-40-percent-of-enterprise-apps-will-feature-task-specific-ai-agents-by-2026-up-from-less-than-5-percent-in-2025
Gartner, "Gartner Predicts Over 40% of Agentic AI Projects Will Be Canceled by End of 2027," Gartner Newsroom press release, 25 June 2025, https://www.gartner.com/en/newsroom/press-releases/2025-06-25-gartner-predicts-over-40-percent-of-agentic-ai-projects-will-be-canceled-by-end-of-2027
Gartner, "Gartner Says Agentic AI Supply Exceeds Demand, Market Correction Looms," Gartner Newsroom press release, 7 October 2025, https://www.gartner.com/en/newsroom/press-releases/2025-10-07-gartner-says-agentic-ai-supply-exceeds-demand-market-correction-looms
Deloitte, "State of AI Report 2026," Deloitte Press Room, https://www.deloitte.com/us/en/about/press-room/state-of-ai-report-2026.html
Deloitte Insights, "Agentic AI Strategy," Tech Trends 2026, https://www.deloitte.com/us/en/insights/topics/technology-management/tech-trends/2026/agentic-ai-strategy.html
PwC, "AI Agent Survey," PwC Tech Effect, https://www.pwc.com/us/en/tech-effect/ai-analytics/ai-agent-survey.html
Reuters, "JPMorgan Says AI Helped Boost Sales, Add Clients in Market Turmoil," Reuters Business, 5 May 2025, https://www.reuters.com/business/finance/jpmorgan-says-ai-helped-boost-sales-add-clients-market-turmoil-2025-05-05/
Vellum AI, "AI Agent Use Cases: A Guide to Unlock AI ROI," Vellum blog, https://www.vellum.ai/blog/ai-agent-use-cases-guide-to-unlock-ai-roi
Becker's Hospital Review, "Healthcare Enters the AI Agent Era," Becker's Healthcare Information Technology, https://www.beckershospitalreview.com/healthcare-information-technology/ai/healthcare-enters-ai-agent-era/
Gartner, "Gartner Predicts Agentic AI Will Autonomously Resolve 80% of Common Customer Service Issues Without Human Intervention by 2029," Gartner Newsroom press release, 5 March 2025, https://www.gartner.com/en/newsroom/press-releases/2025-03-05-gartner-predicts-agentic-ai-will-autonomously-resolve-80-percent-of-common-customer-service-issues-without-human-intervention-by-20290
Jeffrey Dean and Luiz André Barroso, "The Tail at Scale," Communications of the ACM, Vol. 56, No. 2, pp. 74–80, 2013, https://research.google/pubs/the-tail-at-scale/
Jeffrey Dean and Luiz André Barroso, "The Tail at Scale," Communications of the ACM, Vol. 56, No. 2, pp. 74–80, 2013, https://cseweb.ucsd.edu/classes/sp18/cse291-c/post/schedule/p74-dean.pdf
Aayan Anand, "The Tail at Scale: Concepts, Techniques and Impact," Medium, https://aayanand.medium.com/the-tail-at-scale-concepts-techniques-and-impact-106b69b5c770
Jeffrey Dean and Luiz André Barroso, "The Tail at Scale," Communications of the ACM, Vol. 56, No. 2, pp. 74–80, 2013, https://cseweb.ucsd.edu/classes/sp18/cse291-c/post/schedule/p74-dean.pdf
Paul Cavallaro, "Fanouts and Percentiles," Paul Cavallaro's blog, https://paulcavallaro.com/blog/fanouts-and-percentiles/
Journal of Information Systems Engineering and Management, article 13473, JISEM, https://jisem-journal.com/index.php/journal/article/download/13473/6334/22797
Aviso, "How to Evaluate AI Agents: Latency, Cost, Safety, ROI," Aviso blog, https://www.aviso.com/blog/how-to-evaluate-ai-agents-latency-cost-safety-roi
Vignesh Ethiraj, Ashwath David, Sidhanth Menon, and Divya Vijay, "Toward Low-Latency End-to-End Voice Agents for Telecommunications Using Streaming ASR, Quantized LLMs, and Real-Time TTS," arXiv preprint, 2025, https://arxiv.org/html/2508.04721v1
Antje S. Meyer, "Timing in Conversation," Journal of Cognition, Vol. 6, No. 1, article 20, 2023, https://pmc.ncbi.nlm.nih.gov/articles/PMC10077995/
Songze Liu, et al., "Adaptive Cache Pollution Control for Large Language Model Inference Workloads Using Temporal CNN-Based Prediction and Priority-Aware Replacement," arXiv preprint, 2025, https://arxiv.org/pdf/2512.14151
Catchpoint, "Semantic Caching: What We Measured and Why It Matters," Catchpoint blog, https://www.catchpoint.com/blog/semantic-caching-what-we-measured-why-it-matters
Gartner, "Multiagent Systems," Gartner articles, https://www.gartner.com/en/articles/multiagent-systems
Forrester, "Predictions 2026: AI Agents, Changing Business Models, and Workplace Culture Impact Enterprise Software," Forrester Blogs, https://www.forrester.com/blogs/predictions-2026-ai-agents-changing-business-models-and-workplace-culture-impact-enterprise-software/
Jiaxin Zhang, Prafulla Kumar Choubey, Kung-Hsiang Huang, Caiming Xiong, and Chien-Sheng Wu, "Agentic Uncertainty Quantification," arXiv preprint (Salesforce AI Research), 2026, https://arxiv.org/html/2601.15703
Deloitte, "State of AI Report 2026," Deloitte Press Room, https://www.deloitte.com/us/en/about/press-room/state-of-ai-report-2026.html
National Institute of Standards and Technology (NIST), "Announcing AI Agent Standards Initiative for Interoperable and Secure Systems," NIST News & Events, February 2026, https://www.nist.gov/news-events/news/2026/02/announcing-ai-agent-standards-initiative-interoperable-and-secure
Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao and Karthik Narasimhan, "Tree of Thoughts: Deliberate Problem Solving with Large Language Models," Hugging Face Papers, 2023, https://huggingface.co/papers/2305.10601
Data Science Dojo, "Agentic LLM in 2025," Data Science Dojo blog, https://datasciencedojo.com/blog/agentic-llm-in-2025/
"AI Agent," Wikipedia, https://en.wikipedia.org/wiki/AI_agent
IBM, "What is Agentic Reasoning?," IBM Think Topics, https://www.ibm.com/think/topics/agentic-reasoning
Hugging Face, "Agent Steps and Structure," AI Agents Course, Unit 1, https://huggingface.co/learn/agents-course/en/unit1/agent-steps-and-structure
Wollen Labs, "Navigating Modern LLM Agent Architectures: Multi-Agents, Plan-and-Execute, ReWOO, Tree of Thoughts and ReAct," Wollen Labs blog, https://www.wollenlabs.com/blog-posts/navigating-modern-llm-agent-architectures-multi-agents-plan-and-execute-rewoo-tree-of-thoughts-and-react
World Economic Forum, "3 Obstacles to AI Adoption and Innovation — and How to Overcome Them," WEF Stories, December 2025, https://www.weforum.org/stories/2025/12/3-obstacles-to-ai-adoption-and-innovation-and-how-to-overcome-them/
Elementum AI, "Human-in-the-Loop Agentic AI," Elementum AI blog, https://www.elementum.ai/blog/human-in-the-loop-agentic-ai
iMerit, "The Rise of Agentic AI: Why Human-in-the-Loop Still Matters," iMerit Resources, https://imerit.net/resources/blog/the-rise-of-agentic-ai-why-human-in-the-loop-still-matters-una/
Adversa AI, "Cascading Failures in Agentic AI: Complete OWASP ASI08 Security Guide 2026," Adversa AI blog, https://adversa.ai/blog/cascading-failures-in-agentic-ai-complete-owasp-asi08-security-guide-2026/
Arunkumar V., Gangadharan G.R., and Rajkumar Buyya, "Agentic Artificial Intelligence (AI): Architectures, Taxonomies, and Evaluation of Large Language Model Agents," arXiv preprint, 2026, https://arxiv.org/html/2601.12560v1
Stellar Cyber, "Agentic AI Security Threats," Stellar Cyber learn, https://stellarcyber.ai/learn/agentic-ai-securiry-threats/
Digital Applied, "AI Agent Security 2026: 1 in 8 Breaches Involve Agentic Systems," Digital Applied blog, https://www.digitalapplied.com/blog/ai-agent-security-2026-1-in-8-breaches-agentic-systems
Springs, "Everything You Need to Know About Multi-AI Agents in 2024: Explanation, Examples and Challenges," Springs knowledge base, https://springsapps.com/knowledge/everything-you-need-to-know-about-multi-ai-agents-in-2024-explanation-examples-and-challenges
Ben Dickson, "LLM Tool Use and Agentic AI," BD Tech Talks, 29 December 2025, https://bdtechtalks.com/2025/12/29/llm-tool-use-agentic-ai/
Machine Learning Mastery, "7 Agentic AI Trends to Watch in 2026," Machine Learning Mastery blog, https://machinelearningmastery.com/7-agentic-ai-trends-to-watch-in-2026/