LangGraph in Production: Fixing the Agent Gap

Agents that ace evals still fail in production. Discover how LangGraph's error handling primitives and MCP's expanding surface area close the gap.

Philip

22 Apr 2026

Why agents collapse after demos, how LangGraph's error taxonomy maps to real primitives, and why MCP is becoming the connective tissue for enterprise tool integration.

Summary

LangGraph's error handling primitives and the context layer problem are converging into a single question: why do agents that work in demos collapse in production? This issue covers the architectural gap between orchestration and context, what LangGraph 1.1.9 fixes under the hood, and how MCP is quietly becoming the connective tissue for tool integration across crypto, enterprise, and browser-based agents.

The Production Gap Nobody Wants to Talk About

You can build an agent that passes every eval you write and still watch it destroy itself on day three in production. The failure mode is almost never the model. It is the plumbing around the model, specifically how errors propagate, how context degrades across turns, and how tool calls fail silently when the environment changes.

This week's news is dense, but it points in one direction. Three separate threads are converging: LangGraph's push toward production-grade error handling, the enterprise context layer problem finally getting named clearly, and MCP expanding its surface area into wallets, browsers, and beyond.

Error Classification Is the Skill Nobody Teaches

The most practically useful framing to come out of the LangGraph ecosystem recently is the four-class error taxonomy: Transient, LLM-Recoverable, User-Fixable, and Unexpected. This sounds obvious until you watch an agent retry a user-fixable error 47 times, rack up token costs, and return a timeout.

The mapping to LangGraph primitives is direct. Transient errors (network blips, rate limits) get RetryPolicy. LLM-Recoverable errors, where the model made a bad tool call but can self-correct given the error message, get stored in state as data and fed back into the reasoning loop. User-Fixable errors trigger interrupt(), surfacing to a human rather than burning retries. Unexpected errors get caught and escalated without the system pretending it knows how to handle them.
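
The mapping above can be sketched in plain Python before any graph code exists. This is a minimal illustration, not LangGraph's API: the exception types `ToolArgumentError` and `MissingCredentialsError`, the `classify` function, and the strategy strings are all hypothetical names chosen for the example. The point is only that classification happens first and the handling strategy follows from the class.

```python
from enum import Enum


class ErrorClass(Enum):
    TRANSIENT = "transient"
    LLM_RECOVERABLE = "llm_recoverable"
    USER_FIXABLE = "user_fixable"
    UNEXPECTED = "unexpected"


# Hypothetical exception types standing in for whatever your tools raise.
class ToolArgumentError(Exception):
    """The model called a tool with arguments the tool rejected."""


class MissingCredentialsError(Exception):
    """Only a human can supply what is missing."""


def classify(exc: Exception) -> ErrorClass:
    """Assign an exception to one of the four classes."""
    if isinstance(exc, (TimeoutError, ConnectionError)):
        return ErrorClass.TRANSIENT
    if isinstance(exc, ToolArgumentError):
        return ErrorClass.LLM_RECOVERABLE
    if isinstance(exc, MissingCredentialsError):
        return ErrorClass.USER_FIXABLE
    # Anything we did not anticipate is escalated, never retried blindly.
    return ErrorClass.UNEXPECTED


# One handling strategy per class, described in words for this sketch.
STRATEGY = {
    ErrorClass.TRANSIENT: "retry with backoff (RetryPolicy)",
    ErrorClass.LLM_RECOVERABLE: "store the error in state and loop back to the model",
    ErrorClass.USER_FIXABLE: "pause and ask a human (interrupt())",
    ErrorClass.UNEXPECTED: "escalate without retrying",
}


def plan_for(exc: Exception) -> str:
    """Look up the handling strategy for an exception."""
    return STRATEGY[classify(exc)]
```

With a table like this in place, the agent that retried a user-fixable error 47 times would have stopped at the first failure, because `plan_for` never returns a retry strategy for that class.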

Errors Belong In State, Not The Void

The critical architectural insight is this: errors are state, not exceptions. If you throw them away, the LLM cannot adjust. If you store them in the graph state, the model can see what failed, why it failed, and route differently. This is not a minor implementation detail. It is the difference between an agent that recovers gracefully and one that hallucinates its way past a broken tool call.
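
A small sketch of what "errors are state" can look like. It uses plain Python dictionaries rather than a real LangGraph graph, and everything in it is an assumption made for illustration: the `AgentState` fields, the `lookup_price` tool, and the three-attempt cutoff. The node returns the error message as a state update instead of raising, and the router decides the next step by reading that field.

```python
from typing import Optional, TypedDict


class AgentState(TypedDict):
    query: str
    result: Optional[str]
    last_error: Optional[str]
    attempts: int


def lookup_price(symbol: str) -> str:
    """Hypothetical tool: rejects anything that is not an upper-case ticker."""
    if not symbol.isupper():
        raise ValueError(f"unknown symbol {symbol!r}; expected an upper-case ticker")
    return f"{symbol}: 101.50"


def tool_node(state: AgentState) -> dict:
    """Run the tool; on failure, return the error as a state update."""
    try:
        return {"result": lookup_price(state["query"]), "last_error": None}
    except ValueError as exc:
        # The message is kept so the model can read it on the next turn.
        return {"last_error": str(exc), "attempts": state["attempts"] + 1}


def route(state: AgentState) -> str:
    """Choose the next step from what is recorded in state."""
    if state["last_error"] is None:
        return "done"
    return "model" if state["attempts"] < 3 else "escalate"


state: AgentState = {"query": "aapl", "result": None, "last_error": None, "attempts": 0}
state = {**state, **tool_node(state)}  # merge the update, as a graph runtime would
```

After the failed call, `state["last_error"]` holds the tool's own explanation and `route(state)` sends control back to the model, which now has what it needs to correct the ticker.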

LangGraph 1.1.9 ships a fix for ReplayState propagation to subgraphs on plain resume. This matters specifically if you are running nested graphs with checkpointing. Before this fix, a resume operation could incorrectly propagate replay context into child subgraphs, causing them to re-execute already-completed steps. Small release, significant correctness fix for anyone running complex multi-agent topologies with human-in-the-loop interrupts.

Errors are data, not exceptions. If your agent discards error context rather than storing it in graph state, you are throwing away the LLM's best signal for self-correction.

Context Is a Four-Layer Problem, Not a RAG Problem

Most teams treat context as a retrieval problem. They add a vector store, tune their embeddings, and wonder why the agent still gives wrong answers on multi-step tasks. The retrieval is fine. The context architecture is broken.

The four-layer context problem breaks down as follows. First, there is the immediate context window, what the model can see right now. Second, there is episodic context, what happened earlier in this session. Third, there is semantic context, what the model knows about the domain and the user's goals. Fourth, there is structural context, the implicit rules of the environment the agent is operating in.

Tooling Solves the Wrong Layer Entirely

Current tooling handles layer one adequately and layer four barely at all. The result is agents that are locally coherent but globally confused. They answer the current question correctly but violate a constraint established three turns ago, or fail to understand that they are operating inside a financial compliance workflow where certain actions require explicit confirmation.
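
One way to make the layers concrete is to give each its own slot and check actions against the structural one. The sketch below is an assumption-heavy illustration, not an established pattern from any library: the `AgentContext` fields, the `check_action` function, and the `wire_transfer` action are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AgentContext:
    # Layer 1: what the model can see right now.
    window: list[str] = field(default_factory=list)
    # Layer 2: what happened earlier in this session.
    episodic: list[str] = field(default_factory=list)
    # Layer 3: domain knowledge and the user's goals.
    semantic: dict[str, str] = field(default_factory=dict)
    # Layer 4: environment rules, here the actions that need explicit confirmation.
    requires_confirmation: set[str] = field(default_factory=set)


def check_action(ctx: AgentContext, action: str, confirmed: bool = False) -> str:
    """Gate an action on the structural layer before it is executed."""
    if action in ctx.requires_confirmation and not confirmed:
        return "needs_confirmation"
    return "allowed"


ctx = AgentContext(
    semantic={"domain": "financial compliance workflow"},
    requires_confirmation={"wire_transfer"},
)
```

Here `check_action(ctx, "wire_transfer")` returns `"needs_confirmation"` regardless of what is in the window, which is the property a retrieval-only setup cannot give you.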

Enterprise Orchestration Is Not the Same as Agent Architecture

The MuleSoft integration story is worth examining critically. The claim of up to 40% latency reduction through Anypoint Platform integration with LLMs like GPT-4o is stated without methodology. Forty percent compared to what baseline, measured at what stage of the pipeline, under what load conditions? The number is unverifiable as presented.

What is structurally interesting, though, is the architectural argument underneath it: when you give an LLM a unified data interface instead of forcing it to reason across disconnected APIs, you reduce the prompt complexity required to ground each response. Fewer tools, cleaner schemas, less ambiguity in the tool call space. That is a real benefit, even if the specific number is not trustworthy. The mistake is conflating enterprise integration middleware with agent architecture. They solve adjacent problems. MuleSoft handles the data plumbing. It does not solve the context degradation problem across a long agentic session.

Retrieval is not the same as context. You can retrieve perfectly and still give an agent no coherent picture of the task it is actually doing.

MCP Is Eating the Tool Integration Layer

Model Context Protocol showed up in three distinct environments this week: isolated sandbox environments for browser-use agents, enterprise orchestration layers, and crypto wallet infrastructure.

The WAIaaS implementation is the most structurally interesting. It exposes 45 tools for wallet management, DeFi protocol interaction, NFTs, and payment automation through an MCP server that Claude Desktop connects to directly. The three-layer architecture (daemon, MCP server, Claude Desktop) eliminates the need for custom REST API integration per tool. If this pattern holds, it means any agent runtime that speaks MCP gets crypto capabilities without bespoke integration work.

MCP Quietly Separates Tools From Agents

The practical implication for builders is sharper than the crypto angle suggests. MCP is functioning as an abstraction layer that decouples agent capability from agent runtime. You build the tool server once. Any MCP-compatible client consumes it. This is the same bet that LangChain made on tool abstraction in 2023, but MCP is doing it at the protocol level rather than the library level. Protocol-level abstractions are stickier.
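
To see why the protocol level matters, it helps to look at the messages themselves. MCP is built on JSON-RPC 2.0, and clients discover and invoke tools with the `tools/list` and `tools/call` methods. The sketch below handles just those two messages in-process, using only the standard library. It is not a compliant server: it omits the initialization handshake, transports, and argument validation, and the `get_balance` tool is a hypothetical stub, not one of the WAIaaS tools.

```python
import json


def get_balance(args: dict) -> str:
    """Hypothetical stub tool; a real server would query a wallet backend."""
    return f"balance for {args['address']}: 0 (stub)"


TOOLS = {
    "get_balance": {
        "description": "Return the balance for a wallet address (stubbed).",
        "inputSchema": {
            "type": "object",
            "properties": {"address": {"type": "string"}},
            "required": ["address"],
        },
        "handler": get_balance,
    }
}


def handle(raw: str) -> str:
    """Answer a single JSON-RPC request string with a response string."""
    req = json.loads(raw)
    method = req["method"]
    if method == "tools/list":
        # Advertise each tool's name, description, and JSON Schema.
        tools = [
            {"name": name, "description": t["description"], "inputSchema": t["inputSchema"]}
            for name, t in TOOLS.items()
        ]
        result = {"tools": tools}
    elif method == "tools/call":
        params = req["params"]
        text = TOOLS[params["name"]]["handler"](params.get("arguments", {}))
        result = {"content": [{"type": "text", "text": text}], "isError": False}
    else:
        # -32601 is the standard JSON-RPC "method not found" code.
        error = {"code": -32601, "message": "Method not found"}
        return json.dumps({"jsonrpc": "2.0", "id": req.get("id"), "error": error})
    return json.dumps({"jsonrpc": "2.0", "id": req["id"], "result": result})
```

Nothing in `handle` knows which agent runtime is on the other end. Any client that can send these two messages can use the tool, which is the decoupling described above.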

Browser-Use Agents Still Need Isolation You Can Reason About

The AIO Sandbox approach of combining browser-use with MCP in a programmatically isolated environment addresses a real operational concern. Browser-use agents are among the most dangerous in terms of side effects. They can click, submit, navigate, and exfiltrate in ways that are difficult to audit after the fact. An isolated, programmable sandbox where you can inspect state before and after each action is not a development convenience. It is a safety requirement if you are running these agents against anything that matters.
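
The "inspect state before and after each action" requirement can be illustrated with a small wrapper. This is a generic sketch and does not reflect the AIO Sandbox API: the `AuditedSandbox` class, its dictionary-based state, and the `submit_form` action are hypothetical stand-ins for a real browser environment.

```python
import copy
from typing import Callable


class AuditedSandbox:
    """Runs actions against isolated state and records a before/after snapshot."""

    def __init__(self, state: dict):
        # Work on a private copy so the caller's object is never mutated.
        self.state = copy.deepcopy(state)
        self.log: list[dict] = []

    def run(self, name: str, action: Callable[[dict], None]) -> dict:
        before = copy.deepcopy(self.state)
        action(self.state)
        after = copy.deepcopy(self.state)
        self.log.append({"action": name, "before": before, "after": after})
        return after


def submit_form(state: dict) -> None:
    """Hypothetical browser action with a side effect on the page state."""
    state["url"] = "https://example.com/thanks"
    state["submitted"] = True


page = {"url": "https://example.com/form", "submitted": False}
sandbox = AuditedSandbox(page)
sandbox.run("submit_form", submit_form)
```

Each log entry pairs an action name with the state on either side of it, so an audit can answer what changed and when without replaying the session.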

Tool abstraction at the protocol level is more durable than tool abstraction at the library level. If you are building tool servers today, build them to the MCP spec, not to a specific SDK wrapper.

What to Build With This

The plan-and-execute architecture used in FinAdvise, a finance assistant built on LangGraph, demonstrates that the pattern is mature enough for domain-specific production deployment. The reported 30% increase in user engagement is a product metric, not a model metric, so treat it as directional signal rather than evidence of architectural superiority. What matters is that plan-and-execute in LangGraph is now a documented, deployable pattern with enough production surface area to debug from.

Three things to do before your next agent goes to production

Classify your error types before you write error handlers. Map each error class to a LangGraph primitive. Retrying user-fixable errors is the most common and most expensive mistake.

Audit your context layers, not just your retrieval. Ask whether your agent has episodic context across turns and whether it knows the structural constraints of the environment it is operating in.

Build tool servers to the MCP spec if you are building for agent consumption. The ecosystem is converging there faster than alternatives.

The Bottom Line

LangGraph's error taxonomy (Transient, LLM-Recoverable, User-Fixable, Unexpected) is the most actionable production framework in the ecosystem right now. Use it.

Context failure is architecturally distinct from retrieval failure. Fixing RAG does not fix context.

MCP is becoming the protocol-level abstraction for tool integration across runtimes, environments, and now blockchain infrastructure.

LangGraph 1.1.9's ReplayState fix is small but critical for anyone running nested graphs with checkpoints and human-in-the-loop interrupts.

The gap between demo performance and production stability is still almost entirely a plumbing problem, not a model problem.

Sources: Medium: LLM (April 22, 2026), Medium: Agentic AI (April 21, 2026), DEV.to (April 21, 2026), GitHub: LangGraph Releases, Medium: LangChain (April 21, 2026), Dev.to: AI tag (April 21, 2026), Medium: AI Agents (April 21, 2026)