LangGraph for fast, recoverable, and observable agent workflows
Analyze how LangGraph handles latency, checkpointing, replay, and fan-out under production load, and why predictable state storage determines agent reliability.
Enterprise buyers don’t judge agent frameworks by how convenient they are for developers. They judge them by whether they run reliably in real production systems: quickly, accurately, traceably, and resiliently when something goes wrong.
That’s why LangGraph adoption has scaled. It’s an orchestration framework for agents that’s built for real-world use: clear control over what the agent does, and a way to keep its state intact, which are two things that tend to break first once an agent leaves the demo stage.
MLPerf Inference benchmarks underscore the importance of interactive latency metrics, such as time to first token (TTFT) and time per output token (TPOT). In many real-time applications, systems target TTFT well under a second and TPOT measured in tens of milliseconds to maintain responsive user experiences. Those kinds of latency budgets leave little room for agent orchestration overhead in workflows, where each additional step may introduce I/O, serialization, or state persistence before the model can continue generating output.
Enterprises also need reliable recovery. If a run pauses for human approval, or a tool call times out, the system must resume from a known good point without repeating side effects such as writing a ticket twice or executing a payment action twice. Finally, enterprises need the system to behave predictably under load. An agent that checkpoints state every step and stores long-term memory creates a repeatable pattern of reads and writes. That pattern can be sized and engineered, but only if:
The framework makes state boundaries explicit, so you know what gets saved and why
The storage tier delivers stable latency even when traffic is high
What makes LangGraph different
LangGraph is a low-level orchestration framework and runtime for building long-running, stateful agents, developed by the same team behind the LangChain ecosystem. It is intentionally narrower than general LLM libraries because it focuses on orchestration and execution semantics rather than integrations and prompt composition.
That focus is the practical difference that shows up during production. A framework that treats the agent loop as a “black box” makes it harder to insert enterprise controls such as approval steps, reproducible replay, and persisted state checkpoints that save progress along the way. LangGraph has these built in.
LangChain chains collapse many application flows into a linear sequence of calls unless the developer constructs custom branching logic. Instead, LangGraph models the workflow as a graph and supports cycles directly. That matters because most agents do not execute once and exit. They loop between planning, tool use, and synthesis. In a graph model, that loop is explicit and inspectable.
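The plan/tool-use/synthesis loop can be sketched in plain Python. This is a framework-agnostic illustration of why an explicit graph with cycles is inspectable; the node names and routing convention here are illustrative, not LangGraph's actual API.

```python
# Each node updates shared state and returns the name of the next node,
# making every edge in the loop explicit and inspectable.

def plan(state):
    state["steps"].append("plan")
    return "act" if state["tools_left"] > 0 else "synthesize"

def act(state):
    state["steps"].append("act")
    state["tools_left"] -= 1
    return "plan"          # cycle back: agents loop, they don't run once and exit

def synthesize(state):
    state["steps"].append("synthesize")
    return None            # terminal node

NODES = {"plan": plan, "act": act, "synthesize": synthesize}

def run(state, entry="plan"):
    node = entry
    while node is not None:
        node = NODES[node](state)   # each transition is an explicit, traceable edge
    return state

result = run({"steps": [], "tools_left": 2})
```

Because the loop is data (a node table plus explicit transitions) rather than hidden control flow, it can be logged, interrupted, or replayed at any boundary.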
Unlike agent frameworks that focus on having the agent act independently, LangGraph is built around controllability. For example, look at how frameworks handle tool execution. In a measured multi-agent workflow, some systems introduce seconds of agent deliberation before invoking a tool, while others run tools directly. The directness is not always better, but it is measurable, and it is a design choice. LangGraph uses explicit orchestration and direct execution patterns rather than mandatory deliberation overhead, which means less internal overhead, more predictable timing, and more control over the workflow.
In comparison with batch and job orchestrators such as Apache Airflow, LangGraph is designed for interactive per-request execution with streaming outputs and persisted state snapshots. A batch orchestrator is useful for offline pipelines, but it doesn’t fit a user-facing agent where the same request needs controlled branching, per-step state tracking, and output streamed from the model as it’s generated rather than delivered only when the full response is complete.
How LangGraph runs and how that changes latency and failures
LangGraph executes workflows in discrete supersteps: within each one, a node runs, updates shared graph state, and produces messages that determine the next set of nodes to execute.
Each step has three phases:
Planning which nodes to execute
Executing the selected nodes in parallel
Applying their state updates
For latency, this means that intermediate updates within a superstep are not visible to other nodes until the next step. That structure makes race conditions less likely and makes rerunning a workflow from a previous point practical, because the system treats each superstep as a boundary with a well-defined input and output.
The upshot is that checkpoints align with supersteps. Using a checkpointer, LangGraph can persist state at step boundaries, which allows workflows to resume from previously saved graph states. Even a simple example in the documentation saves four snapshots during a basic run. But the number of snapshots depends on how many steps the agent takes, not how long it runs.
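A toy model makes the write pattern concrete: one snapshot of the input state plus one snapshot per superstep boundary. This is a deliberately simplified stand-in for a real checkpointer, not LangGraph's checkpointer interface.

```python
import copy

class InMemoryCheckpointer:
    """Toy checkpointer: persists one snapshot per superstep boundary."""
    def __init__(self):
        self.snapshots = {}          # thread_id -> list of state snapshots
    def save(self, thread_id, state):
        self.snapshots.setdefault(thread_id, []).append(copy.deepcopy(state))
    def latest(self, thread_id):
        return self.snapshots[thread_id][-1]

def run_supersteps(steps, state, checkpointer, thread_id):
    checkpointer.save(thread_id, state)        # snapshot of the input state
    for step in steps:
        state = step(state)                    # updates apply at the boundary
        checkpointer.save(thread_id, state)    # one write per superstep
    return state

saver = InMemoryCheckpointer()
steps = [lambda s: {**s, "n": s["n"] + 1} for _ in range(3)]
final = run_supersteps(steps, {"n": 0}, saver, thread_id="t1")
# A three-step run leaves four snapshots, matching the pattern the
# documentation's basic example shows.
```

The snapshot count grows with the number of supersteps, which is exactly why checkpoint write volume scales with agent complexity rather than run duration.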
For instance, if an agent takes 12 steps per request and the system handles 2,000 requests per second, the storage layer must absorb 24,000 checkpoint writes per second, plus reads for resuming or inspecting workflows. That write rate, not average traffic, is what the storage tier has to be sized for in production.
The same mechanism covers crash recovery and resuming from pauses. LangGraph can pause a workflow and resume it later, even after a delay as long as a week, by saving progress at key steps. When resuming, the workflow doesn’t jump back to the exact line where it stopped; it replays steps from a safe starting point. To make sure nothing happens twice by accident, any step with unpredictable results or side effects, such as sending email messages or charging a card, must be wrapped as a separate task. Workflows should be designed so that replaying a step doesn’t break anything and always produces the same result.
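The idempotency requirement can be sketched as a task wrapper that records results by task ID, so a replay returns the cached result instead of re-executing the side effect. The names and the dict-backed "log" are illustrative, not LangGraph's task API.

```python
completed = {}        # durable task-result log (a dict stands in for storage)
charges = []          # the external side effect we must not repeat

def task(task_id, fn, *args):
    """Run fn once per task_id; replays return the recorded result."""
    if task_id in completed:          # replay path: skip re-execution
        return completed[task_id]
    result = fn(*args)
    completed[task_id] = result       # persist the result before moving on
    return result

def charge_card(amount):
    charges.append(amount)            # pretend this hits a payment API
    return f"charged:{amount}"

first = task("charge-order-42", charge_card, 100)   # original run
replay = task("charge-order-42", charge_card, 100)  # replay after a crash
```

After the replay, the workflow sees the same return value, but the payment API was called exactly once. This is the same pattern durable workflow engines and payment systems use.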
Persistence, memory, and the data decisions that determine the worst case
LangGraph distinguishes between two kinds of “state” in workflows, and enterprises should handle each type in its own way.
Thread-local execution state and checkpointing
LangGraph saves the progress of each workflow run under a unique thread identifier (thread_id). This saved state lets you resume workflows after interruptions, inspect past runs, and recover from failures. Without the ID, the system can’t track or resume the workflow.
For production checkpointers, the LangGraph reference documentation recommends the Postgres-backed checkpoint saver rather than the in-memory saver, because in-memory persistence is lost whenever a process restarts, which is routine in enterprise environments.
Cross-thread long-term memory and stores
LangGraph provides a long-term memory system that remembers information across multiple conversations. It lets you organize data, search by similarity, and automatically expire old memory (time to live, or TTL). This is important for enterprise agents that need to keep user-level context between sessions, not just during one conversation.
Semantic search is disabled by default. To use it, you have to set up an index and use a compatible store. This avoids unnecessary work, but it also means teams need to decide in advance whether they want to use semantic search.
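The store's core contract (namespaced keys plus TTL expiry) can be modeled in a few lines. This is a toy stand-in to show the behavior, not LangGraph's actual store interface, and the fake clock exists only to keep the example deterministic.

```python
import time

class MemoryStore:
    """Toy cross-thread store with namespaces and lazy TTL eviction."""
    def __init__(self, clock=time.time):
        self._data = {}
        self._clock = clock

    def put(self, namespace, key, value, ttl=None):
        expires = self._clock() + ttl if ttl else None
        self._data[(namespace, key)] = (value, expires)

    def get(self, namespace, key):
        entry = self._data.get((namespace, key))
        if entry is None:
            return None
        value, expires = entry
        if expires is not None and self._clock() >= expires:
            del self._data[(namespace, key)]   # expired: evict on read
            return None
        return value

# Fake clock so the TTL behavior is reproducible
now = [0.0]
store = MemoryStore(clock=lambda: now[0])
store.put(("user", "u1"), "prefers", "dark mode", ttl=60)
fresh = store.get(("user", "u1"), "prefers")   # within TTL
now[0] = 61.0
stale = store.get(("user", "u1"), "prefers")   # past TTL
```

The namespace tuple is what lets one store hold per-user, per-team, or per-application memory side by side, while TTL keeps old entries from accumulating forever.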
Benchmarks and tradeoffs that show real-world cost and latency
It’s easy to get lost in abstract ideas about LangGraph. For enterprises, it’s better to look at actual measurements, such as how many tokens are used, how long tool calls take, and how much extra time the framework adds, to predict real-world performance.
Orchestration overhead compared with LLM generation time
A published benchmark comparing agentic orchestration frameworks ran the same five-agent workflow 100 times and measured pipeline latency, token usage, and phase-level gaps. In that benchmark, LangGraph finished more than twice as fast as the open-source CrewAI agent platform. The same benchmark found that most of CrewAI’s latency came from the time between an agent deciding to use a tool and the tool actually running, where about five seconds of a nine-second latency segment came from that gap. It also reported that AutoGen was less token-efficient, because LangGraph passes only necessary state changes rather than full conversation histories.
Here’s what that means for enterprise architects: Token volume is not just cost, but also latency, because tokens take both prompt processing and generation time. In a workflow with multiple agents, frameworks that propagate full histories and verbose intermediate steps inflate both.
Tool calling accuracy and why tool catalogs must stay small
The more tools you have, the less accurate tool selection becomes, and OpenAI makes that an explicit guideline. The OpenAI function calling guide recommends keeping the number of functions small, fewer than 20 at any one time, for higher accuracy. Function definitions are injected into the system message, count against the context limit, and are billed as input tokens, so a large catalog costs both accuracy and money.
By default, every request tells the agent about all available tools. If the system includes only the relevant tools instead, the request is smaller and faster. Rather than presenting every tool on every call, enterprises often store tool definitions in a database and fetch only the relevant ones per request. This speeds up decision-making at the cost of some operational complexity, such as managing retrieval and keeping cached data consistent, but those are problems companies already know how to handle.
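Per-request tool routing can be sketched as scoring each tool description against the request and keeping only the top-k. Real systems use vector similarity over embedded descriptions; keyword overlap keeps this sketch dependency-free, and the catalog entries are hypothetical.

```python
# Hypothetical tool catalog: name -> short description used for routing.
CATALOG = {
    "create_ticket":  "open a support ticket for a customer issue",
    "refund_payment": "refund a customer payment or charge",
    "lookup_order":   "look up an order status by order id",
    "send_email":     "send an email message to a customer",
}

def route_tools(request, catalog, k=2):
    """Return the k tool names whose descriptions best match the request."""
    words = set(request.lower().split())
    scored = sorted(
        catalog,
        key=lambda name: -len(words & set(catalog[name].split())),
    )
    return scored[:k]   # only these definitions go into the prompt

tools = route_tools("refund the payment for order 1234", CATALOG)
```

Only the selected definitions are injected into the system message, which keeps the tool count under OpenAI's recommended limit and trims input tokens on every call.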
LLM serving throughput and the limit on end-to-end latency
No matter how good your workflow or orchestrator is, it can’t make the LLM respond faster than the model allows. Benchmarks such as MLPerf measure how quickly the model produces its first token (TTFT) and each following token (TPOT), giving realistic numbers for interactive performance.
For example, large models, such as Llama 2 70B, often generate tens of tokens per second depending on hardware and serving configuration. These figures help set expectations for real-world responsiveness, not just ideal lab results.
Specialized LLM-serving engines are designed to handle more requests at once without slowing down individual responses. For example, vLLM serves two to four times more requests than earlier systems at comparable latency by managing memory and batching more efficiently. This sets the realistic performance ceiling for any agent framework that runs the model itself.
Running LangGraph in production with the controls enterprises need
In real-world workflows, companies need tools to keep operations safe, track what’s happening, and debug issues. LangGraph includes these features as part of the workflow system, so teams don’t have to build them from scratch around a hidden agent process.
Human approval without timeouts
LangGraph’s interrupts pause execution at specific points and wait for external input before continuing. When an interrupt triggers, LangGraph saves the graph state using its persistence layer and then waits indefinitely until execution resumes. Human approval gates let someone review and approve high-risk steps in a workflow, such as updating a system, sending a customer message, or escalating an incident.
When a workflow pauses to wait for a human, it doesn’t waste computing power, but saves its state and resumes later. This makes the difference between workflows that safely involve people and workflows that crash if they take too long.
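The pattern behind an approval gate can be sketched as "persist, return a pending marker, resume later from the checkpoint" rather than blocking a process. Function names and the dict-backed checkpoint store are illustrative, not LangGraph's interrupt API.

```python
saved = {}   # stands in for durable checkpoint storage

def run_until_approval(thread_id, state):
    """Do the pre-approval work, persist state, and release compute."""
    state["draft"] = f"refund {state['amount']}"    # work done before the gate
    saved[thread_id] = state                        # checkpoint the state
    return {"status": "pending_approval", "thread_id": thread_id}

def resume(thread_id, approved):
    """Reload from the checkpoint and finish, hours or days later."""
    state = saved.pop(thread_id)
    if not approved:
        return {"status": "rejected"}
    return {"status": "done", "action": state["draft"]}

pending = run_until_approval("t1", {"amount": 100})
# ...time passes; no process is blocked while waiting...
final = resume("t1", approved=True)
```

Between the two calls, no thread, connection, or GPU is held open; the workflow exists only as persisted state, which is why an approval can safely arrive a week later.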
Debug workflows by replaying past runs
LangGraph time travel supports resuming execution from a prior checkpoint, replaying and modifying state, and then continuing down a new path. Resuming past execution produces a new fork in history. This matches how enterprise debugging works. A team needs to reproduce a failure with the same inputs, then test a fix by changing state and rerunning from the point of divergence. Time travel provides that without inventing a custom event-sourced run log.
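The fork-from-a-checkpoint idea can be modeled with a per-step snapshot history: pick a prior checkpoint, patch its state, and rerun the remaining steps to create a new branch of history. This is a toy model of the concept, not LangGraph's time travel API.

```python
import copy

def run(state, steps, history):
    """Execute steps, recording a snapshot at every boundary."""
    history.append(copy.deepcopy(state))       # checkpoint 0: the input
    for step in steps:
        state = step(state)
        history.append(copy.deepcopy(state))   # one checkpoint per step
    return state

double = lambda s: {**s, "x": s["x"] * 2}

history = []
original = run({"x": 1}, [double, double, double], history)   # x: 1 -> 2 -> 4 -> 8

# Fork: rewind to the checkpoint after step 1, patch the state, rerun the rest.
forked_state = {**history[1], "x": 10}
fork_history = []
forked = run(forked_state, [double, double], fork_history)    # x: 10 -> 20 -> 40
```

The original history is untouched; the fork is a new run seeded from an old checkpoint, which is exactly the reproduce-then-test-a-fix loop enterprise debugging needs.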
Durable execution that forces correct side effect design
When workflows call external services, they must be able to save their progress and resume safely.
To do this, workflows should be deterministic and idempotent, and any side effects or non-deterministic operations should be wrapped in task boundaries so they are not repeated during replay. This is the same thing that workflow engines and payment systems do, just applied to LLM tools.
Deploying without custom infrastructure
In hosted LangGraph deployments, such as LangGraph Platform, checkpointing is handled automatically by the runtime. Enterprises that use a standard platform for deployment can avoid building custom storage systems and writing lots of custom code just for state management.
LangGraph resumes a workflow after a crash or waits for human input
Durable execution in LangGraph persists state, so a workflow resumes without repeating completed steps, even after long delays. LangGraph keeps workflows safely restartable by replaying steps deterministically and isolating actions that affect the outside world, so they aren’t accidentally performed twice. Interrupts persist state and wait indefinitely until the system resumes execution.
Artificial intelligence foundations: Generative AI, agentic AI, and AI workflows
Enterprises evaluating LangGraph are usually not starting with orchestration. They are starting with generative AI, then realizing the need for agentic control.
Generative AI refers to models that produce new content, such as text, code, images, or structured outputs based on learned patterns in large datasets. In enterprise contexts, generative AI systems are typically large language models (LLMs) run behind APIs or self-hosted inference stacks. These models excel at reasoning over unstructured input, synthesizing responses, extracting structured data, and generating draft artifacts.
However, by themselves, generative models don’t remember past requests and may give different answers each time. They also aren’t built to manage multi-step workflows or guarantee that external actions happen safely and predictably.
Agentic AI builds on generative AI by embedding models inside decision loops. Agentic systems do more than answer prompts. They plan, call tools, evaluate intermediate results, and iterate toward a goal. In practice, this means combining an LLM with external capabilities such as APIs, databases, search systems, and internal services. The agent decides when to call those tools, interprets results, and continues execution.
This introduces autonomy, but it also introduces complexity. Every tool call becomes a potential side effect. Every loop takes time and uses tokens. Every branching decision creates additional states that must be tracked if the system needs to resume or audit execution later.
AI workflows sit between pure generation and fully autonomous agents. An AI workflow may involve multiple model invocations arranged in a predefined sequence, such as “classify, then retrieve, then generate,” without giving the model full autonomy to decide the control path. Many enterprise use cases start here because workflows are easier to reason about, test, and constrain. Over time, as requirements grow to include dynamic tool selection, exception handling, and human-in-the-loop approvals, these workflows evolve toward agentic architectures.
The enterprise shift from generative AI to agentic AI is driven by real operational demands. Simple generation drafts and summarizes. But customer support automation, incident response, sales enrichment, fraud review, and internal copilots require multi-step reasoning with access to systems of record. At that point, the architecture must answer questions that go beyond model quality:
How is state preserved across steps and across sessions?
How are tool side effects prevented from executing twice during retries?
How are long-running tasks paused and resumed without losing context?
How is execution audited and replayed for compliance and debugging?
How is latency bounded when each additional step uses tokens and reads and writes data?
These are not model questions. They are system questions.
This is where orchestration frameworks such as LangGraph come in.
Generative AI provides probabilistic reasoning.
Agentic AI introduces tool use and goal-directed loops.
AI workflows impose structure.
LangGraph supplies explicit control flow, persisted state boundaries, and durable execution semantics that meet enterprise reliability requirements.
In other words, generative AI produces tokens. Agentic AI produces actions. AI workflows produce coordinated sequences. Orchestration determines whether those sequences run at scale without breaking correctness, latency budgets, or recovery guarantees.
Aerospike and LangGraph
LangGraph is built for the part of agent development that enterprises struggle with most: controlling execution over time while preserving state, auditability, and recovery behavior. But where does the system keep fast, durable state for checkpoints, memory, caches, and routing metadata while keeping latency low? That is a data infrastructure problem as much as it is an agent framework choice.
That question becomes more urgent under fan-out and superstep amplification.
Fan-out happens when one user request triggers multiple parallel branches inside a graph. A planning node may spawn several tool calls. A retrieval step may query multiple data sources. A validation phase may execute parallel checks. Those branches often execute within the same superstep and then synchronize. Each branch reads state, writes intermediate results, and contributes to the next checkpoint.
Superstep amplification compounds this effect. Because LangGraph persists state at superstep boundaries, the number of checkpoint writes scales with the number of supersteps, not just with request count. If a workflow averages twelve supersteps and includes three-way fan-out in several of them, the number of state changes per user request grows quickly. This produces a predictable but high-volume pattern of small, latency-sensitive reads and writes.
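A back-of-the-envelope calculation shows how fast the write volume grows. The 12 supersteps and 2,000 requests per second come from the earlier example; the assumption that three-way fan-out occurs in four of the supersteps is illustrative, standing in for the text's "several."

```python
# Assumed workload shape (fanout_steps = 4 is an illustrative assumption).
supersteps = 12
fanout_steps, fanout_width = 4, 3
requests_per_sec = 2_000

# Each fan-out superstep contributes one state write per branch;
# every other superstep contributes a single write.
writes_per_request = (supersteps - fanout_steps) + fanout_steps * fanout_width
writes_per_sec = writes_per_request * requests_per_sec

print(writes_per_request, writes_per_sec)   # 20 writes/request, 40,000 writes/sec
```

Under these assumptions, adding fan-out to a third of the supersteps raises the checkpoint write rate from 24,000 to 40,000 per second before counting any reads, which is why the storage tier, not the orchestrator, often sets the scaling limit.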
This is where predictable low latency matters more than raw throughput.
In agent systems, the data tier sits directly on the path that produces the model’s output tokens. A checkpoint write that stalls for 20 or 30 milliseconds delays the next model invocation, which delays time to first token for the user. When hundreds or thousands of concurrent agent runs hit superstep boundaries at roughly the same time, a storage tier without stable latency becomes visible to every user.
Aerospike is commonly used in enterprise architectures for these high fan-out, low-latency key-value access patterns. It is engineered for predictable performance under sustained high request rates, rather than opportunistic cache hits. Agent workloads require that predictability. Session state, checkpoint blobs, routing metadata, semantic cache entries, and user memory lookups all become part of the synchronous execution path.
The practical implication is straightforward. Under fan-out and superstep amplification, the storage tier must sustain:
A high volume of small writes for checkpoint persistence
A high volume of small reads for resume, memory access, and tool routing
Low and stable worst-case latency under bursty synchronization patterns
Operational durability so that restart or failover does not invalidate workflows that are in process
Many enterprises already use Aerospike as the low-latency data tier for session state, user profiles, fraud features, and other workloads. When LangGraph-based agents are introduced, those same characteristics are important for orchestration. Every superstep boundary and every memory lookup is another transaction that takes up time.
Predictable low-latency state under fan-out is what allows agentic workflows to scale without violating TTFT targets, blowing up worst-case response times, or compromising recovery guarantees.
Frequently asked questions about LangGraph
Find answers to common questions below to help you learn more and get the most out of Aerospike.
What is LangGraph best suited for?
LangGraph implements stateful agent workflows where control flow must be explicit, persisted, and recoverable. It’s useful for agents that run longer than one request, require human approvals, or need resumability after failures. The framework provides persistence through checkpointing and supports interrupts and time travel replay, which enterprises need.
How does LangGraph differ from LangChain?
LangChain provides broad LLM integrations and composable building blocks, and often expresses flow as linear chains unless a developer builds custom routing. In contrast, LangGraph focuses on orchestration as a graph, which makes cycles, branching, and step boundaries explicit. That difference matters in production because interrupts, resumable execution, and replay require explicit state boundaries and deterministic control flow.
Does LangGraph reduce latency?
LangGraph reduces orchestration overhead in workflows where other frameworks propagate large histories or introduce tool deliberation cycles. In one benchmark, running a fixed five-agent workflow 100 times, LangGraph completed more than twice as fast as CrewAI and used tokens efficiently by passing state deltas rather than full conversation histories. Actual end-to-end latency remains bounded by LLM serving TTFT and TPOT constraints, which MLPerf formalizes for interactive scenarios.
How can teams keep tool calling accurate as tool catalogs grow?
The most direct strategy is to reduce the number of tools presented per request. OpenAI recommends aiming for fewer than 20 functions at any one time for higher accuracy and states that function definitions count as input tokens because they are injected into the system message. Architecturally, this encourages teams to use routing strategies that select a subset of tools per request, including vector retrieval of tool descriptions and semantic caching patterns that reduce repeated retrieval work.