40 Threat Scenarios for MCP, A2A and AG-UI

Last week I published the Open Agent Threat Format (OATF) specification, a YAML-based format for describing and reproducing attacks against AI agents. Today I'm releasing the companion to that specification: the OATF Scenario Registry, a browsable threat library of 40 scenarios covering three protocols shaping how agentic AI connects to tools, other agents and users.

The registry is at oatf.dev. Every scenario is a valid OATF v0.1 document with a structured attack timeline and protocol-level message flow diagrams. Each one includes references to the original research and mappings to MITRE ATT&CK, MITRE ATLAS and OWASP Top 10 for LLMs.

This is a threat library, not a validated benchmark. The scenarios document known attack patterns from published security research, or model plausible threats by combining independently demonstrated components. They validate against the OATF v0.1 schema, they describe real threats. What they don't yet have is statistical validation data from repeated execution runs. That comes next. I didn't want to hold this off while that work continues, because the catalogue itself is useful right now to anyone building, auditing or defending agent infrastructure.

What the registry covers

The 40 scenarios span three protocols:

MCP (Model Context Protocol) has the deepest coverage with 19 scenarios, plus two more that span MCP and A2A. This reflects the state of the research: MCP has been in production longest and has attracted the most adversarial attention. Scenarios range from the canonical tool description injection (OATF-001) through supply chain attacks (OATF-007), credential theft (OATF-021) and cross-tenant data exposure (OATF-038).

A2A (Agent-to-Agent) has 11 scenarios covering inter-agent communication risks. These include Agent Card spoofing (OATF-009), transitive trust chain abuse (OATF-018), task hijacking (OATF-022) and push notification webhook abuse (OATF-032).

AG-UI (Agent User Interaction Protocol) has 8 scenarios targeting the agent-to-frontend boundary. Message list injection (OATF-011), shared state manipulation (OATF-017), stream hijacking (OATF-029) and lifecycle event spoofing (OATF-030) are representative.

The two cross-protocol scenarios, the pivot attack (OATF-015) and the multi-hop prompt injection chain (OATF-013), model what happens when MCP, A2A and AG-UI operate together with no security validation at the boundary seams.

A few to start with

If you want a sense of the range: OATF-002 models a temporal rug pull where a malicious MCP server builds trust with legitimate responses before swapping its tool definition mid-session. OATF-015 chains a poisoned MCP response through an A2A delegation into AG-UI rendering, crossing three trust boundaries with no validation at any seam. OATF-027 uses ANSI escape sequences to hide prompt injection payloads in plain sight inside terminal-based MCP clients. Each of these is worth a longer write-up, and I'll dig into several individually in future posts.

Coverage matrix and framework mappings

Every scenario in the registry maps to at least one technique in MITRE ATT&CK, MITRE ATLAS, or the OWASP Top 10 for LLMs. The coverage matrix provides an interactive view of which established techniques have OATF scenarios and, just as usefully, which don't yet.

Some observations from the current mapping:

OWASP LLM01 (Prompt Injection) appears in nearly half the scenarios. This is unsurprising. Prompt injection is the root cause of most agent-level attacks, even when the delivery mechanism varies. What the scenario library makes visible is how many different surfaces prompt injection can enter through: tool descriptions, tool responses, Agent Card fields, AG-UI message lists, shared state objects, prompt templates, and artifact payloads.

MITRE ATLAS AML.T0061 (AI Agent Tools) and AML.T0058 (AI Agent Context Poisoning) together cover over a third of the scenarios. These are newer technique identifiers that didn't exist when most agent frameworks were designed. The scenarios in the registry provide concrete, structured examples of what these abstract technique descriptions look like in practice.

The A2A and AG-UI mappings are thinner. This reflects both the relative maturity of adversarial research on these protocols and the fact that MITRE ATLAS and OWASP have not yet developed technique taxonomies specific to agent-to-agent or agent-to-frontend interactions. As those taxonomies develop, the OATF mappings will expand.

Building your own scenarios

The registry includes a web-based editor for creating and validating new OATF scenarios. You can write YAML directly, load an existing scenario for modification, and preview the structured output before exporting.

If you find a threat that should be in the registry, contributions are welcome via pull requests to the oatf-spec/scenarios repository. The bar for inclusion is a scenario that describes a real, documented or plausible threat, follows the OATF v0.1 schema, and includes at least one framework mapping.

What comes next

The immediate next step is execution. ThoughtJack, an open-source adversarial testing runtime, will be the first tool to run these scenarios against live agent deployments and measure statistical success rates. That is what turns a threat library into a benchmark. Beyond that, the registry will grow. 40 scenarios is a reasonable starting point, but there are known attack patterns not yet covered, and new research appears weekly. The OAuth confused deputy surface alone (OATF-020) probably warrants three or four additional scenarios as MCP's auth model evolves.

The long-term goal is a leaderboard: objective, reproducible measurements of how different LLM and agent framework combinations perform against the full scenario library. The registry and the validation pipeline are the foundations it needs.

The registry is at oatf.dev. The specification is at oatf.io. The scenario source is at github.com/oatf-spec/scenarios. All of it is open.

40 Threat Scenarios for MCP, A2A and AG-UI

What the registry covers

A few to start with

Coverage matrix and framework mappings

Building your own scenarios

What comes next

More from this blog

Depth vs breadth: the two kinds of AI agent security testing

A Declarative Schema for MCP Attacks: Why We Need One

Command Palette

What the registry covers

A few to start with

Coverage matrix and framework mappings

Building your own scenarios

What comes next

More from this blog