AI Agent Security: The 2026 Infrastructure Gap

Two thirds of firms have already suffered security incidents involving AI agents. Why system prompts fail as policy layers, and how a just-in-time firewall architecture closes the gap.

Philip

22 Apr 2026

Summary

Two thirds of firms have already suffered security incidents from unchecked AI agents, and the tooling to stop the bleeding is arriving in fragments. This piece examines SupraWall's just-in-time firewall architecture, what the Cloud Security Alliance data actually implies for practitioners, and why the infrastructure gap under agentic systems is the real engineering problem of 2026.

The Security Numbers Are Not Hypothetical

The Cloud Security Alliance report landed this week with a number that should stop any engineering leader mid-sprint: two thirds of firms have experienced a cybersecurity incident directly attributable to unchecked AI agents. Not theoretical exposure. Not pen-test findings. Actual incidents, with data exposure, operational disruption, and financial losses already recorded.

This is the cost of moving fast on agent deployments without solving the control plane first. The pattern recurs across production agent stacks that shipped without enforcement tooling: the model behaves well in eval, it behaves well in staging, and then in production it calls a tool it was never supposed to call, because no hard enforcement existed between intent and execution.

System Prompts Are Not a Security Boundary

The architectural mistake is treating the system prompt as a policy layer. It is not. A system prompt is a probabilistic suggestion. Under adversarial prompting, under distribution shift, under any sufficiently complex multi-step ReAct loop, a system prompt will eventually fail to prevent a tool call it was supposed to prevent. This is not a criticism of any specific model. It is a structural property of instruction-following systems. Probabilistic compliance is not compliance.

The gap between "the agent was told not to do this" and "the agent cannot do this" is exactly where the two-thirds number lives.

What SupraWall Actually Does

SupraWall is an open-source firewall for AI agents, Apache 2.0 licensed, released this week by a developer who spent 14 months building it. Its core pattern is a hard gate that intercepts every tool call before execution and evaluates it against a real-time policy. The agent cannot talk its way past this gate, because the gate is not in the prompt.
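To make the hard-gate pattern concrete, here is a minimal Python sketch of a default-deny gate sitting between an agent and its tools. It is not SupraWall's API: `ToolGate`, `Rule`, `PolicyDenied`, and the example tools are all hypothetical names invented for illustration, and a real policy engine would be considerably richer.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


class PolicyDenied(Exception):
    """Raised when the gate blocks a tool call before it executes."""


def _any_args(args: Dict[str, Any]) -> bool:
    # Default matcher: the rule applies to every argument set.
    return True


@dataclass(frozen=True)
class Rule:
    tool: str                                   # tool name the rule covers
    allow: bool                                 # True = allow, False = deny
    matches: Callable[[Dict[str, Any]], bool] = _any_args


class ToolGate:
    """Hypothetical gate: every tool call goes through call(), never direct."""

    def __init__(self, tools: Dict[str, Callable[..., Any]], rules: List[Rule]):
        self._tools = tools
        self._rules = rules

    def call(self, tool: str, args: Dict[str, Any]) -> Any:
        # First matching rule wins; anything unmatched is denied by default.
        for rule in self._rules:
            if rule.tool == tool and rule.matches(args):
                if rule.allow:
                    return self._tools[tool](**args)
                raise PolicyDenied(f"{tool} denied by rule")
        raise PolicyDenied(f"{tool} denied: no matching allow rule")


# Illustrative tools and policy (assumptions, not from any real deployment).
def read_file(path: str) -> str:
    return f"contents of {path}"


def delete_table(name: str) -> str:
    return f"dropped {name}"


gate = ToolGate(
    tools={"read_file": read_file, "delete_table": delete_table},
    rules=[
        Rule("read_file", allow=True,
             matches=lambda a: a["path"].startswith("/srv/docs/")),
        Rule("delete_table", allow=False),
    ],
)
```

The point of the sketch is the shape, not the rule language: the decision is made in ordinary code outside the model, and a call that no rule explicitly allows never reaches the tool.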

The second architectural decision is just-in-time credential injection. Rather than baking credentials into the agent context at initialization, SupraWall pulls from a vault at the moment a tool call passes policy evaluation. The agent never holds live credentials in its working context. If a prompt injection attack or a runaway plan-and-execute loop attempts to exfiltrate credentials, there is nothing to exfiltrate because the credentials were never there.

Why JIT Injection Matters Downstream

In a standard agentic setup running on something like LangGraph or AutoGen, credentials are typically injected at session start and travel through the entire execution graph. Every node in that graph, every sub-agent, every tool wrapper has access to the full credential scope for the duration of the session. This is convenient. It is also the reason a single compromised node can escalate to full environment access.

JIT injection breaks that blast radius. Each tool call receives only the credentials scoped to that specific action, issued at that specific moment, and gone immediately after. The agent's memory never contains a live AWS key or a database connection string. This does not eliminate all attack surfaces, but it makes the most common escalation path structurally unavailable.
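A rough Python sketch of the JIT idea, under stated assumptions: `FakeVault` is an in-memory stand-in for a real secrets manager, and `scoped_credential`, `run_tool`, and the `db:read` scope name are hypothetical. None of this reflects SupraWall's actual implementation. The executor, not the agent, requests a short-lived token for one scope and revokes it as soon as the tool returns.

```python
import secrets
import time
from contextlib import contextmanager


class FakeVault:
    """In-memory stand-in for a secrets manager issuing short-lived tokens."""

    def __init__(self):
        self._live = {}  # token -> (scope, expiry on the monotonic clock)

    def issue(self, scope, ttl_seconds=30):
        token = secrets.token_hex(16)
        self._live[token] = (scope, time.monotonic() + ttl_seconds)
        return token

    def revoke(self, token):
        self._live.pop(token, None)

    def is_valid(self, token, scope):
        entry = self._live.get(token)
        return (entry is not None
                and entry[0] == scope
                and time.monotonic() < entry[1])


@contextmanager
def scoped_credential(vault, scope):
    # Issue a credential for exactly one scope, revoke it on exit,
    # even if the tool raises.
    token = vault.issue(scope)
    try:
        yield token
    finally:
        vault.revoke(token)


def run_tool(vault, tool, scope, args):
    # The agent supplies only the tool and its arguments. The token is
    # injected here and never enters the agent's context.
    with scoped_credential(vault, scope) as token:
        return tool(token=token, **args)


# Illustrative usage with a hypothetical database tool.
vault = FakeVault()
seen_tokens = []


def query_db(token, sql):
    seen_tokens.append(token)
    if not vault.is_valid(token, "db:read"):
        raise PermissionError("invalid or expired token")
    return f"ran {sql!r}"


result = run_tool(vault, query_db, "db:read", {"sql": "SELECT 1"})
```

After `run_tool` returns, the token the tool saw is already revoked, so anything that leaked it into logs or model context would be holding a dead credential.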

Compliance Claims Outrun the Actual Audits

The project also claims EU AI Act readiness. No independent audit of that claim exists yet, so treat it as a design intention rather than a certification. What it signals is that the developer was building with compliance requirements in mind from day one, which is a different starting point than retrofitting governance onto an existing system.

The Infrastructure Gap Is Structural, Not a Sprint

The Cloud Security Alliance data and the SupraWall release are symptoms of the same underlying condition: agentic systems have outpaced the infrastructure designed to support them. This is not a novel observation, but the 2026 version of this problem is more specific and more urgent than the 2024 version.

In 2024, the infrastructure gap was mostly about orchestration. Teams were gluing together LangChain, custom tool wrappers, and ad-hoc retry logic. The failures were mostly reliability failures: agents that looped, hallucinated tool inputs, or timed out. Annoying, but recoverable.

The 2026 Gap Is About Control, Not Performance

The 2026 gap is about control and enforcement. Agents are now being handed real permissions: write access to databases, execution rights on cloud infrastructure, the ability to send communications on behalf of users. The failure mode is no longer "the agent gave a wrong answer." The failure mode is "the agent deleted production data" or "the agent sent a phishing email to a customer list."

NeoCognition, which raised $40 million in seed funding this week, is attacking the reliability gap from the model side. Their approach trains agents through on-the-job experience rather than pre-training, building domain-specific world models that they claim will lift the current 50 percent completion rate on complex agentic tasks. The 50 percent figure is plausible for multi-step tasks in novel environments, but it comes from the company itself and has not been independently validated. The mechanism is interesting: rather than generalizing from a large pre-training corpus, the agent specializes through accumulated experience on its target domain. Whether this holds at production scale is the question their $40 million is supposed to answer.

Capability Without Control Is Just Faster Damage

The distinction worth making: NeoCognition is trying to make agents more capable. SupraWall is trying to make the agents that already exist less dangerous. Both problems are real. The security problem is more urgent because the agents are already deployed.

What You Should Actually Do

If you are running agents in production with real tool permissions today, the immediate priority is auditing what your agents can do, not what you told them to do. Pull your tool schemas. List every permission scope those tools carry. Ask whether a compromised or misbehaving agent with access to those tools could cause an incident that would show up in a Cloud Security Alliance report. If the answer is yes, you are in the two-thirds bucket by exposure even if you have not been in it by incident yet.
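The audit described above can start as a very small script. The Python sketch below assumes your tool schemas can be flattened into dictionaries with a `name` and a list of `scopes`; that shape, the scope names, and the `HIGH_RISK` set are all assumptions made for illustration, so substitute whatever your orchestration layer actually exposes.

```python
# Scopes that could cause a reportable incident if misused.
# These names are hypothetical; replace them with your own.
HIGH_RISK = {"db:write", "cloud:exec", "email:send"}


def audit_tools(tool_schemas):
    """Return (tool name, risky scopes) for every tool holding a high-risk scope."""
    findings = []
    for tool in tool_schemas:
        risky = sorted(set(tool.get("scopes", [])) & HIGH_RISK)
        if risky:
            findings.append((tool["name"], risky))
    return findings


# Illustrative inventory, not taken from any real system.
tool_schemas = [
    {"name": "search_docs", "scopes": ["docs:read"]},
    {"name": "run_migration", "scopes": ["db:write", "db:read"]},
    {"name": "notify_customer", "scopes": ["email:send"]},
]

findings = audit_tools(tool_schemas)
for name, scopes in findings:
    print(f"{name}: {', '.join(scopes)}")
```

Every tool this flags is one where the only thing standing between the agent and an incident is the agent's own judgment, unless a gate says otherwise.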

SupraWall is worth evaluating as a drop-in enforcement layer, particularly if you are running on open-source orchestration where no native policy gate exists. The Apache 2.0 license removes procurement friction. The JIT credential architecture directly addresses the most common escalation vector. The codebase is fresh and has not been battle-tested at scale by external teams yet, so run it in a staging environment, review the policy evaluation logic yourself, and do not deploy it into a critical path without understanding what happens when the policy gate itself fails.
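One way to reason about gate failure is to decide up front that the gate fails closed. The Python sketch below is a generic illustration, not a description of how SupraWall behaves: `guarded_call` and the policy functions are hypothetical. If policy evaluation raises or returns anything other than an explicit `True`, the tool does not run.

```python
def guarded_call(evaluate_policy, tool, args):
    """Run tool(**args) only on an explicit allow; block on any gate failure."""
    try:
        allowed = evaluate_policy(tool.__name__, args)
    except Exception as exc:
        # Fail closed: an unreachable or crashing policy gate blocks the call.
        raise PermissionError(
            f"{tool.__name__} blocked: policy gate unavailable"
        ) from exc
    if allowed is not True:
        raise PermissionError(f"{tool.__name__} blocked by policy")
    return tool(**args)


# Illustrative tool and policy functions (assumptions for the example).
executed = []


def send_email(to, body):
    executed.append(to)
    return "sent"


def allow_all(tool_name, args):
    return True


def broken_policy(tool_name, args):
    raise TimeoutError("policy service did not respond")
```

Failing closed trades availability for safety: an outage in the policy service stops the agent from acting at all. Whether that is acceptable depends on the workload, which is exactly why it is worth testing in staging before the gate sits on a critical path.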

Separate Reasoning From Policy, Always

The broader architecture principle applies regardless of which specific tool you use: separate the policy enforcement layer from the agent's reasoning layer. The agent decides what to do. A separate, hard-coded system decides what it is allowed to do. These two systems should not be the same system.

Three Layers Every Production Agent Stack Needs

Policy enforcement at the tool call boundary, implemented as a hard gate rather than a prompt instruction and evaluated against explicit allow/deny rules

Credential isolation through JIT injection or equivalent, ensuring no live secrets persist in agent working memory or context windows

Audit logging at the tool execution level, capturing what was called, with what parameters, and what the policy decision was, separate from model-level logging
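The third layer is the easiest to sketch. The Python example below writes one JSON line per tool call with the tool name, parameters, and policy decision. The field names and the `log_tool_call` helper are hypothetical, and the sketch assumes parameters contain no secrets, which is only safe if credentials are injected outside the agent's arguments.

```python
import io
import json
from datetime import datetime, timezone


def log_tool_call(sink, tool, params, decision, reason=""):
    """Append one JSON line describing a tool call and its policy decision."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "tool": tool,
        "params": params,        # assumed to hold no credentials
        "decision": decision,    # "allow" or "deny"
        "reason": reason,
    }
    sink.write(json.dumps(record, sort_keys=True) + "\n")
    return record


# Illustrative usage with an in-memory sink; a real deployment would write
# to append-only storage kept separate from model-level logs.
sink = io.StringIO()
log_tool_call(sink, "read_file", {"path": "/srv/docs/a.txt"}, "allow")
log_tool_call(sink, "delete_table", {"name": "users"}, "deny",
              reason="no matching allow rule")
entries = [json.loads(line) for line in sink.getvalue().splitlines()]
```

Logging denied calls matters as much as logging allowed ones: a run of denials against the same tool is often the first visible sign of a prompt injection attempt or a looping plan.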

The Bottom Line

Two thirds of firms have hit real security incidents from unchecked agents, and the number will grow as tool permissions expand
System prompts are probabilistic, not enforceable. A hard policy gate at the tool call boundary is the minimum viable security architecture for any agent with real permissions
JIT credential injection closes the most common escalation path by ensuring agents never hold live secrets in context
NeoCognition's experience-based learning approach is technically interesting but unvalidated at scale. The reliability problem is real but secondary to the security problem for teams already in production
Audit your tool permission scopes now. The question is not whether your agent would misuse them. It is whether it is structurally able to.

Sources: Dev.to: AI tag (April 22, 2026), The Decoder (April 22, 2026), The Next Web AI (April 22, 2026), NewsAPI (April 21, 2026)