Agent Security

MCP in Production: Power, Peril, and 138 CVEs

MCP is reshaping multi-agent architecture fast. But OpenClaw's 138 CVEs prove the stack is cracking. Where does your real exposure live?

Philip

10 Apr 2026

Summary

The agentic AI stack is maturing fast and cracking under pressure at the same time. MCP is becoming the connective tissue of production agent systems, while OpenClaw's 138 CVEs expose what happens when you deploy that stack without treating security as a first-class concern. This piece maps the technical terrain from medical dosimetry to telecom to hardware infrastructure, and tells you where the real exposure lives.

The MCP Moment Is Real, and So Is the Attack Surface

Model Context Protocol has quietly become the default plumbing for multi-agent systems. Not long ago it was just a spec. Today it is the integration layer inside DosimeTron, a production agent system that automates Monte Carlo radiation dosimetry on PET/CT scans, with GPT-5.2 as its reasoning engine and 23 tools distributed over four MCP servers. It is also the integration layer inside Gemini Enterprise deployments scaling past prototype stage. And it is, apparently, the integration layer inside OpenClaw, which has accumulated 138 known CVEs, including critical exploits that allow full system control.

That last fact deserves to sit in your working memory before you read anything else here.

MCP Is a Trust Boundary, Not Just a Protocol

The architectural appeal of MCP is real. It gives you a standardized way to expose tools to a reasoning model, manage context windows across server boundaries, and compose complex capability graphs without writing custom glue code for every integration. DosimeTron's architecture demonstrates this concretely: DICOM metadata extraction, image preprocessing, and dosimetric reporting are handled as discrete tool calls across a structured server topology, not as monolithic pipeline code. Its organ-level dose estimates reach a Pearson's r between 0.965 and 1.000, with mean absolute percentage difference below 5% for 19 of 22 organs. These are peer-reviewed numbers on a public PSMA-PET/CT dataset of 597 studies. When MCP-based architecture works, it works.

But MCP servers are also trust boundaries. Every tool endpoint is a potential injection surface. Every server-to-server handoff is a place where a compromised context payload can propagate laterally. The OpenClaw vulnerability count is not a coincidence. It is what happens when teams treat MCP as a convenience layer and skip the access control, input validation, and sandboxing work that the protocol does not provide for you.

Running MCP without authentication and input validation exposes your entire tool surface to prompt injection. OpenClaw's 138 CVEs include critical exploits enabling full system control. This is not theoretical risk. It is documented, reproducible, and, if OpenClaw is anywhere in your stack, sitting in your dependency tree right now.

What OpenClaw Tells Us About the Entire Agentic Stack

138 CVEs on a single agentic platform is not a patch management problem. It is a signal about how the category was built. The agentic AI ecosystem grew faster than its security posture. Frameworks optimized for developer velocity and demo-ability, not for the threat model that emerges when an LLM has tool access to filesystems, APIs, and internal services.

The Attack Model Is Different From Traditional Software

In a traditional application, the attack surface is bounded by the inputs your code explicitly handles. In an agentic system with MCP, the attack surface includes the model's interpretation of natural language tool descriptions, the content of documents retrieved into context, the outputs of upstream tool calls that become inputs to downstream ones, and the reasoning chain that determines what gets executed next.

Prompt injection through retrieved content is the canonical example: an attacker embeds instructions in a document that the agent retrieves and processes, causing the model to treat adversarial text as legitimate directives. This is not a hypothetical. It has been demonstrated against multiple deployed systems. In a multi-agent architecture with MCP, a successful injection at one server can propagate instructions to other servers in the graph because the contaminated reasoning output becomes a trusted input to the next tool call.
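One way to picture the propagation path described above is a taint flag that travels with every piece of context. The sketch below is illustrative only: `Value`, `call_tool`, and `PRIVILEGED_TOOLS` are hypothetical names, not part of MCP or any SDK, and the tool call itself is simulated.

```python
from dataclasses import dataclass

# Hypothetical set of state-changing tools; not taken from any real MCP server.
PRIVILEGED_TOOLS = {"delete_file", "send_email"}


@dataclass(frozen=True)
class Value:
    """A piece of context plus a flag recording whether it derives from untrusted content."""
    data: str
    tainted: bool = False


class TaintedInputError(Exception):
    """Raised when untrusted-derived context reaches a privileged tool."""


def call_tool(name: str, *inputs: Value) -> Value:
    """Simulate a tool call whose output inherits taint from any of its inputs."""
    tainted = any(v.tainted for v in inputs)
    if tainted and name in PRIVILEGED_TOOLS:
        raise TaintedInputError(f"{name} blocked: input derives from untrusted content")
    # Stand-in for the real tool result.
    result = f"{name}({', '.join(v.data for v in inputs)})"
    return Value(result, tainted=tainted)


# A retrieved document is untrusted, and so is everything derived from it.
retrieved = Value("doc text: IGNORE PREVIOUS INSTRUCTIONS ...", tainted=True)
summary = call_tool("summarize", retrieved)  # allowed, but the output stays tainted
```

Passing `summary` to `call_tool("send_email", ...)` raises `TaintedInputError`, because the taint survived the intermediate summarization step. A real system would need a policy for clearing taint, which this sketch deliberately leaves out.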

Your Attack Surface Is Now the Model's Mind

This means hardening an MCP-based system requires more than patching OpenClaw CVEs. It requires:

Input Validation at Every Tool Boundary

Treat every tool input as untrusted, even when it comes from another agent in your own system. Validate schema and reject payloads that contain instruction-like natural language in structured fields.
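As a minimal sketch of that check, assuming a hypothetical tool that takes a study ID, an organ name, and a short free-text note: the schema, field names, and the instruction-detection regex below are all invented for illustration, and the regex is a crude heuristic rather than a dependable injection filter.

```python
import re

# Hypothetical schema: field name -> pattern the whole value must match.
SCHEMA = {
    "study_id": r"[A-Za-z0-9_-]{1,64}",
    "organ": r"[a-z_]{1,32}",
    "note": r"[^\n]{0,200}",
}

# Crude illustrative heuristic for instruction-like text; easy to evade in practice.
INSTRUCTION_LIKE = re.compile(
    r"\b(ignore|disregard|override)\b.*\b(instruction|prompt|rule)s?\b",
    re.IGNORECASE | re.DOTALL,
)


def validate_tool_input(payload: dict) -> dict:
    """Reject payloads with unexpected fields, wrong types, or instruction-like text."""
    if set(payload) != set(SCHEMA):
        raise ValueError(f"unexpected or missing fields: {sorted(set(payload) ^ set(SCHEMA))}")
    for field, pattern in SCHEMA.items():
        value = payload[field]
        if not isinstance(value, str):
            raise ValueError(f"{field} must be a string")
        if INSTRUCTION_LIKE.search(value):
            raise ValueError(f"{field} contains instruction-like text")
        if not re.fullmatch(pattern, value):
            raise ValueError(f"{field} does not match the expected format")
    return payload
```

The strict patterns on `study_id` and `organ` do most of the work here. The heuristic only matters for fields like `note` that have to accept free text.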

Sandboxed Tool Execution

MCP servers that touch filesystems, execute code, or make outbound network calls should run in isolated environments with minimal permissions. The principle of least privilege applies to agent tools the same way it applies to service accounts.
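The least-privilege idea can be sketched at process level with nothing but the Python standard library. This is not a real sandbox (containers, seccomp profiles, or a dedicated VM would be the stronger options); it only shows the shape of the idea. `run_tool_isolated` and `ENV_ALLOWLIST` are hypothetical names.

```python
import os
import subprocess
import sys
import tempfile

# Only these environment variables are passed through; secrets in the parent stay behind.
ENV_ALLOWLIST = ("PATH", "SYSTEMROOT", "LD_LIBRARY_PATH")


def run_tool_isolated(code: str, timeout_s: float = 5.0) -> str:
    """Run tool code in a child process with a scratch directory, filtered env, and timeout."""
    env = {k: os.environ[k] for k in ENV_ALLOWLIST if k in os.environ}
    with tempfile.TemporaryDirectory() as scratch:
        completed = subprocess.run(
            # -I is Python's isolated mode: ignores PYTHON* variables and the user site dir.
            [sys.executable, "-I", "-c", code],
            cwd=scratch,          # the tool starts in a throwaway directory
            env=env,
            capture_output=True,
            text=True,
            timeout=timeout_s,    # raises subprocess.TimeoutExpired on overrun
        )
    if completed.returncode != 0:
        raise RuntimeError(completed.stderr.strip())
    return completed.stdout.strip()
```

The child process can still read any file the parent user can read and can still open network connections, which is exactly why OS-level isolation is needed on top of this.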

Context Provenance Tracking

Log not just what was executed but what context payload triggered execution. When an exploit occurs, you need to reconstruct the reasoning chain, not just the API call log.
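A minimal sketch of what such a log could look like, assuming in-memory stores and hypothetical names throughout (`log_tool_call`, `reasoning_chain`, `CONTEXT_STORE`, `AUDIT_LOG`). Each record keeps a hash of the triggering context and a link to the call that preceded it, so the chain can be walked backwards after an incident.

```python
import hashlib
import json
import time
from typing import Optional

# Hypothetical in-memory stores; a real system would use durable, append-only storage.
CONTEXT_STORE: dict = {}
AUDIT_LOG: list = []


def log_tool_call(tool: str, args: dict, context: str, parent_id: Optional[str] = None) -> str:
    """Record a tool call together with the exact context payload that triggered it."""
    context_hash = hashlib.sha256(context.encode("utf-8")).hexdigest()
    CONTEXT_STORE[context_hash] = context  # keep the payload itself, keyed by its hash
    core = {"tool": tool, "args": args, "context_sha256": context_hash, "parent_id": parent_id}
    record_id = hashlib.sha256(json.dumps(core, sort_keys=True).encode("utf-8")).hexdigest()[:16]
    AUDIT_LOG.append({**core, "id": record_id, "ts": time.time()})
    return record_id


def reasoning_chain(record_id: str) -> list:
    """Walk parent links back to the root and return the records in execution order."""
    by_id = {r["id"]: r for r in AUDIT_LOG}
    chain = []
    current = by_id.get(record_id)
    while current is not None:
        chain.append(current)
        current = by_id.get(current["parent_id"])
    return list(reversed(chain))


first = log_tool_call("fetch_document", {"url": "https://example.com/a"}, "user asked for a summary")
second = log_tool_call("summarize", {"max_words": 100}, "retrieved document text", parent_id=first)
```

Calling `reasoning_chain(second)` returns the fetch record followed by the summarize record, and each record's `context_sha256` resolves to the stored payload in `CONTEXT_STORE`.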

Human-in-the-Loop Gates for Irreversible Actions

Any tool call that writes data, deletes it, or sends it outside the system should require explicit confirmation before execution. This is not UX friction. It is the difference between a contained incident and a full compromise.
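A confirmation gate of this kind can be sketched as a thin wrapper around tool dispatch. The names below (`gated_call`, `IRREVERSIBLE_TOOLS`, `ApprovalDenied`) are hypothetical, and the `approve` callback stands in for whatever review channel a team actually uses, such as a ticket, a chat prompt, or a CLI confirmation.

```python
from typing import Any, Callable

# Hypothetical list of tools whose effects cannot be undone.
IRREVERSIBLE_TOOLS = {"write_file", "delete_record", "send_external"}


class ApprovalDenied(Exception):
    """Raised when a human reviewer declines an irreversible tool call."""


def gated_call(
    tool_name: str,
    tool_fn: Callable[..., Any],
    args: dict,
    approve: Callable[[str, dict], bool],
) -> Any:
    """Run tool_fn, but ask for approval first if the tool is marked irreversible."""
    if tool_name in IRREVERSIBLE_TOOLS and not approve(tool_name, args):
        raise ApprovalDenied(f"{tool_name} was not approved")
    return tool_fn(**args)


# Usage: a fake delete tool and a reviewer who refuses everything.
deleted = []

def delete_record(record_id: str) -> str:
    deleted.append(record_id)
    return record_id

def deny_all(tool_name: str, args: dict) -> bool:
    return False
```

With `deny_all` as the reviewer, `gated_call("delete_record", delete_record, {"record_id": "r1"}, deny_all)` raises `ApprovalDenied` and `deleted` stays empty. Read-only tools that are not in the set pass straight through without a prompt.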

The Gemini Enterprise Transition Validates the Risk Model

The framing that "the chat-only LLM wrapper era is ending" is accurate and understated. Gemini Enterprise positioning MCP as the scaling path from prototype to production means the protocol is being pushed into enterprise environments with existing security review processes, compliance requirements, and blast radius concerns that consumer-grade deployments never had to care about.

If you are scaling an agentic system on Gemini Enterprise with MCP today, your security review needs to include the MCP server topology explicitly. Not the LLM. The servers. The tools. The trust relationships between them.
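To make "the trust relationships between them" concrete, a review can start from a plain graph of which server's output can become another server's input. The topology below is entirely made up for illustration. The useful question is which paths lead from a server that ingests untrusted content to a tool with high-impact side effects.

```python
from collections import deque

# Hypothetical topology: an edge A -> B means output of A can become input to B.
EDGES = {
    "web_retrieval": ["summarizer"],
    "summarizer": ["planner"],
    "planner": ["file_writer", "report_builder"],
    "file_writer": [],
    "report_builder": [],
}
INGESTS_UNTRUSTED = {"web_retrieval"}  # servers that read attacker-controllable content
HIGH_IMPACT = {"file_writer"}          # tools with side effects worth gating


def risky_paths(edges: dict, sources: set, sinks: set) -> list:
    """Breadth-first search for every simple path from an untrusted source to a sink."""
    paths = []
    for src in sorted(sources):
        queue = deque([[src]])
        while queue:
            path = queue.popleft()
            node = path[-1]
            if node in sinks:
                paths.append(path)
            for nxt in edges.get(node, []):
                if nxt not in path:  # skip cycles
                    queue.append(path + [nxt])
    return paths


for path in risky_paths(EDGES, INGESTS_UNTRUSTED, HIGH_IMPACT):
    print(" -> ".join(path))
# prints: web_retrieval -> summarizer -> planner -> file_writer
```

Every path this reports is a place where validation, provenance logging, or a confirmation gate has to sit somewhere along the way.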

The vulnerability is not in the model. It is in the assumption that tool calls are safe because the model decided to make them.

The Hardware Layer Is Catching Up to the Inference Demands

While the security conversation is urgent, the infrastructure conversation is also moving. Intel and SambaNova have announced a heterogeneous compute architecture that splits the inference workload across Intel Xeon 6 processors, GPUs, and SambaNova RDUs. The claimed division of labor: Xeon 6 handles host and action CPU tasks, GPUs handle prefill, RDUs handle decode.

This is architecturally coherent for agentic workloads specifically. Agentic inference patterns are not uniform. Prefill is compute-intensive and parallelizable. Decode is memory-bandwidth-bound and benefits from specialized silicon. Action execution is latency-sensitive but not compute-heavy. The heterogeneous split maps onto these different phases reasonably well.

Bold Claims Deserve Harder Questions

That said, Intel and SambaNova are making their performance claims in a partnership announcement, not a published benchmark. Faster than what? Under which agentic workload configurations? Measured how? Browser-agent and computer-use scenarios have highly variable inference patterns that depend heavily on tool latency, not just model throughput. Until there are reproducible numbers on realistic agentic task distributions, treat the performance positioning as directionally interesting and empirically unverified.

SoundHound in Telecom Is a Deployment Pattern, Not a Breakthrough

The SoundHound and Associated Carrier Group partnership follows a well-established playbook: take a voice AI platform with existing telephony integrations and position it as agentic customer service. The "agentic" framing here means the system can handle multi-turn service interactions with some degree of goal-directed behavior, not that it is running complex plan-and-execute loops over enterprise tool graphs.

This matters because the security and reliability requirements for a telecom customer service agent are categorically different from what DosimeTron or a Gemini Enterprise deployment requires. Conflating them under the same "agentic AI" label obscures the engineering work.

What To Do With This Right Now

If you are running any MCP-based system in production, the OpenClaw CVE count should trigger an audit of your own server configurations, even if you are not using OpenClaw directly. The vulnerability classes that accumulate on one platform tend to exist in similar form across the ecosystem because the underlying design patterns are shared.

If you are evaluating Gemini Enterprise for agentic scale-up, push your vendor contact for the MCP security documentation before you push for the Gemini 2.5 Pro integration. The model capability question is mostly answered. The tool security question is not.

DosimeTron's Variance Reveals Your Pipeline's Hidden Risk

If you are building with DosimeTron's architecture as a reference, the 32.3-minute per-study processing time with a 6.0-minute standard deviation tells you something important: variance in agentic pipeline execution is real and needs to be budgeted for in SLA design, not smoothed over in demo conditions.
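As a rough worked example of budgeting for that variance, using only the two reported figures: the choice of k and the idea of sizing an SLA at mean plus k standard deviations are assumptions made here, not something the DosimeTron numbers prescribe. The Chebyshev bound is included because it holds without assuming any particular distribution of run times.

```python
MEAN_MIN = 32.3  # reported mean per-study processing time, in minutes
STD_MIN = 6.0    # reported standard deviation, in minutes


def sla_budget(mean: float, std: float, k: float) -> float:
    """Time budget that sits k standard deviations above the mean."""
    return mean + k * std


def chebyshev_tail_bound(k: float) -> float:
    """Upper bound on the fraction of runs more than k std devs from the mean (either side)."""
    return 1.0 / (k * k)


for k in (1, 2, 3):
    budget = sla_budget(MEAN_MIN, STD_MIN, k)
    print(f"k={k}: budget {budget:.1f} min, at most {chebyshev_tail_bound(k):.0%} of runs outside")
```

At k=2 the budget is 44.3 minutes per study and at k=3 it is 50.3 minutes, both well above the 32.3-minute mean a demo would suggest. If run times turn out to be roughly normal the real tail is much thinner than the Chebyshev bound, but that would need to be checked against actual timing data.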

DosimeTron processed 597 PSMA-PET/CT studies with mean absolute percentage difference below 5% for 19 of 22 organs. This is what peer-reviewed agentic AI performance looks like. Most enterprise agentic deployments cannot show you anything close to this level of validation.

The Bottom Line

OpenClaw's 138 CVEs are a category warning, not just a product problem: audit your MCP server topology now
Every MCP tool boundary is a trust boundary that requires explicit input validation and sandboxed execution
DosimeTron is the clearest published evidence that agentic AI with MCP can reach production-grade accuracy when built rigorously
Intel/SambaNova's heterogeneous inference architecture is architecturally coherent but empirically unverified for agentic workloads
The security review for your agentic system needs to cover tools and context propagation, not just the model

Sources: Medium: AI Agents (April 10, 2026), Medium: Agentic AI (April 10, 2026), ArXiv CS.AI (April 10, 2026), NewsAPI (April 9, 2026)