A2A and MCP Agent Security: Identity, Delegation, and Audit Trails
Protocol security is about who may act, not about how the model behaves.
Prompt injection gets most of the security attention in LLM systems, and it deserves attention, but it is not the whole problem once agents start calling tools and delegating work to other agents.
MCP gives an agent structured access to files, APIs, databases, and ticketing systems. A2A lets one agent send tasks, messages, and artifacts to another agent that may belong to a different team, vendor, or runtime. Those protocols are useful precisely because they cross trust boundaries, which means identity, authorization, delegation limits, and audit trails become first-class architecture rather than optional hardening.

This article is the canonical guide for agent protocol security in the LLM Architecture cluster. It covers threat models, identity, gateways, registries, delegation, and production checklists. For input validation, output filtering, and prompt safety patterns, see LLM Guardrails in Practice instead.
Guardrails vs Protocol Security vs Runtime Policy
These three layers solve different problems and fail in different ways when conflated.
LLM guardrails operate on model input and output: blocking injection patterns, filtering harmful content, validating JSON shape, and enforcing tone or compliance rules on generated text. They protect the conversation layer.
Protocol security operates on agent boundaries: who may call which MCP tool, which agent may delegate to which peer, what OAuth scopes attach to a task, and whether a downstream agent may act on a user’s behalf. It protects the action layer.
Runtime policy sits between them: a policy engine that evaluates requests against rules regardless of whether the trigger was natural language or a structured protocol call. It can require human approval before a tool executes, block egress to unknown domains, or deny delegation when scope exceeds the originating user.
My opinion is blunt: guardrails without protocol security produce polite chatbots that still exfiltrate data through a tool call. Protocol security without guardrails produces well-authenticated agents that still follow malicious instructions embedded in an artifact. You need both, plus runtime policy for high-risk actions.
Threat Model for A2A and MCP Agent Systems
Start with assets and adversaries, not with a shopping list of controls.
Assets worth protecting: user data in prompts and artifacts, credentials for MCP servers, production systems reachable through tools, agent reputation, billing accounts tied to token usage, and audit integrity.
Realistic adversaries: external users abusing public agent endpoints, compromised MCP servers returning poisoned tool results, malicious agents misrepresenting skills in Agent Cards, insiders over-delegating authority, and supply-chain tampering with tool metadata that manipulates model behavior.
Malicious or compromised tools (MCP)
An MCP server is code plus data exposed to the model. A hostile server can return misleading tool descriptions, exfiltrate arguments passed by the model, or perform actions beyond what the user intended when the host executes tool calls without scoped credentials.
Malicious or impersonated agents (A2A)
An agent that accepts tasks may be evil, compromised, or simply over-permissioned. Agent Cards describe capabilities; they do not prove identity unless you verify signatures, TLS, and issuer trust.
Confused deputy
Agent B holds permission to access a finance API. Agent A, with lower privilege, asks B to “summarize this invoice” while smuggling a transfer instruction in an artifact. B executes using its own credentials unless delegation scope is enforced end to end.
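One way to enforce scope end to end is for B to act only with the intersection of what the caller was delegated and what B itself holds. The following is a minimal sketch of that idea; the scope names and the `effective_scopes` and `authorize_action` helpers are hypothetical and not part of A2A or MCP.

```python
def effective_scopes(caller_scopes: set[str], deputy_scopes: set[str]) -> set[str]:
    """Authority the deputy may exercise for this task: never more than the caller held."""
    return caller_scopes & deputy_scopes


def authorize_action(required_scope: str, caller_scopes: set[str], deputy_scopes: set[str]) -> bool:
    """Allow an action only if both the caller and the deputy hold the required scope."""
    return required_scope in effective_scopes(caller_scopes, deputy_scopes)


# Agent A was delegated read-only invoice access; Agent B also holds transfer rights.
agent_a_scopes = {"invoice:read"}
agent_b_scopes = {"invoice:read", "payment:transfer"}

can_summarize = authorize_action("invoice:read", agent_a_scopes, agent_b_scopes)
can_transfer = authorize_action("payment:transfer", agent_a_scopes, agent_b_scopes)
```

With this check in place, the smuggled transfer instruction fails at the authorization step regardless of what B's model decided to do with the artifact.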
Over-broad permissions and hidden delegation chains
User approves one step. The orchestrator silently chains three A2A hops and five MCP calls. The user never sees the full graph, but the organization is still accountable for the outcome.
Prompt injection through artifacts and cross-agent messages
Injection is not only a user-message problem. A PDF artifact, a web page fetched by a tool, or a message from Agent C can carry instructions aimed at Agent D’s model. Treat all protocol-carried content as untrusted input at the model boundary.
Poisoned or misleading Agent Cards
Descriptions and skill names are prompt surface area. A card that advertises safe_read_only_analysis while accepting write-capable backends is a social-engineering layer, not a technical guarantee.
Identity Model for Multi-Agent Systems
Protocol security begins with clear identity types and what each one is allowed to prove.
| Identity type | What it represents | Typical proof |
|---|---|---|
| Human user | End user or operator who initiated work | OIDC session, SSO token |
| Agent service | Deployed agent runtime (orchestrator, specialist) | OAuth client credentials, mTLS cert |
| MCP server | Tool provider process | API key, mTLS, scoped service account |
| Task / session | Unit of work spanning hops | task ID, trace ID, delegated scope token |
A2A’s Agent Card advertises supported authentication schemes (OAuth 2.0, API keys, mTLS, and similar patterns aligned with OpenAPI practice) and skills with optional security requirements. The card is discovery metadata, not a trust anchor. Clients obtain credentials out of band and send them in standard HTTP headers on every request; servers must validate on every call and return 401 or 403 when auth or scope fails.
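The per-request validation described above can be sketched as a small check that separates authentication failures (401) from scope failures (403). This is an illustrative sketch only: the in-memory `TOKENS` table, the token value, and the `skill:` scope names are assumptions, and a real server would validate a signed token or call an introspection endpoint instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Credential:
    subject: str
    scopes: frozenset[str]


# Hypothetical token table standing in for real token validation.
TOKENS = {
    "token-orchestrator": Credential("agent:orchestrator", frozenset({"skill:summarize"})),
}


def check_request(headers: dict[str, str], required_scope: str) -> int:
    """Return the HTTP status the server should answer with for this call."""
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    if scheme.lower() != "bearer" or token not in TOKENS:
        return 401  # caller is not authenticated
    if required_scope not in TOKENS[token].scopes:
        return 403  # caller is known but lacks the scope for this skill
    return 200
```

The function runs on every call rather than once per session, which matches the requirement that servers validate each request.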
Internal vs external views of the same agent
Production agents often publish a public Agent Card with a limited skill list and a richer authenticated card for internal callers. The A2A specification allows extended cards for authenticated clients. Use that split deliberately: partners should not see internal skills, and internal orchestrators should not rely on public discovery alone for authorization.
Authentication and Authorization for MCP and A2A
Authentication answers who is calling. Authorization answers what they may do.
MCP tool access
For each MCP connection, define:
- which agent host may connect
- which tools are enabled for that host
- which OS user or service account executes side effects
- whether the human user must approve each mutating call
Prefer tool allowlists over “connect everything” MCP configs. A coding agent does not need payroll MCP servers on the same profile as a public support bot.
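The four decisions listed above can be captured as one profile per agent host. The sketch below is a hypothetical data model, not an MCP configuration format; the `McpProfile` class, tool names, and service account names are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class McpProfile:
    host: str                       # which agent host may connect
    run_as: str                     # service account that executes side effects
    read_tools: frozenset[str]      # enabled tools without side effects
    mutating_tools: frozenset[str] = frozenset()
    approve_mutations: bool = True  # human must approve each mutating call


def decide(profile: McpProfile, tool: str) -> str:
    """Return "allow", "needs_approval", or "deny" for a tool call on this profile."""
    if tool in profile.read_tools:
        return "allow"
    if tool in profile.mutating_tools:
        return "needs_approval" if profile.approve_mutations else "allow"
    return "deny"  # anything not listed is off by default


coding_agent = McpProfile(
    host="coding-agent",
    run_as="svc-coding-agent",
    read_tools=frozenset({"fs.read", "git.diff"}),
    mutating_tools=frozenset({"fs.write"}),
)
```

Because unlisted tools are denied, adding a payroll server to the environment does not silently expose it to the coding agent.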
A2A agent access
For each agent peer relationship, define:
- which caller agent IDs may invoke which skills
- maximum delegation depth
- which artifact types may cross the boundary
- whether user context must propagate as signed claims
Map OAuth scopes (or equivalent) to skills, not to blanket agent admin. Least privilege at the token layer beats hope at the prompt layer.
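A peer relationship defined this way can be checked before a task is accepted. The sketch below assumes a hypothetical `PeerPolicy` record and `admit_task` function; the agent IDs, skill names, and artifact media types are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeerPolicy:
    skills_by_caller: dict[str, frozenset[str]]  # caller agent ID -> skills it may invoke
    max_delegation_depth: int
    allowed_artifact_types: frozenset[str]
    require_signed_user_claims: bool


def admit_task(policy: PeerPolicy, caller_id: str, skill: str, depth: int,
               artifact_types: set[str], has_signed_user_claims: bool) -> str:
    """Return "accept" or a denial reason for an incoming task."""
    if skill not in policy.skills_by_caller.get(caller_id, frozenset()):
        return "deny: skill not allowed for caller"
    if depth > policy.max_delegation_depth:
        return "deny: delegation too deep"
    if not artifact_types <= policy.allowed_artifact_types:
        return "deny: artifact type not allowed"
    if policy.require_signed_user_claims and not has_signed_user_claims:
        return "deny: missing user claims"
    return "accept"


billing_policy = PeerPolicy(
    skills_by_caller={"agent:orchestrator": frozenset({"invoice.summarize"})},
    max_delegation_depth=2,
    allowed_artifact_types=frozenset({"application/pdf", "text/plain"}),
    require_signed_user_claims=True,
)
```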
Gateway-enforced vs per-agent policy
Per-agent policy works when one team owns the whole graph and releases are coordinated. Gateway-enforced policy works when multiple teams, tenants, or vendors share an agent network and you need one place to enforce allowlists, rate limits, and audit.
A2A Gateway as the Control Plane
An A2A gateway is not strictly required by the protocol, but it becomes necessary when agent traffic needs centralized governance.
A gateway typically handles:
- authentication termination and token exchange
- routing to the correct agent service by skill or tenant
- policy checks before tasks are accepted or forwarded
- protocol version negotiation
- rate limiting and abuse detection
- structured audit emission on every task transition
When a gateway is overkill vs necessary
A gateway is often overkill for a single orchestrator and two specialist agents in one Kubernetes namespace maintained by one team. It becomes necessary when partners invoke your agents, when multiple business units share infrastructure, when compliance requires uniform logging, or when you cannot trust every agent implementation to enforce policy correctly.
Pair an A2A gateway with an MCP gateway (or MCP proxy) so tool access receives the same treatment: identity, allowlists, egress controls, and audit at the tool boundary rather than only at the chat UI.
Partner-facing vs internal Agent Cards
Publish different discovery metadata for external and internal callers. External cards expose narrow skills and stricter auth. Internal cards may list maintenance or admin skills but must never be reachable without stronger authentication than the public card implies.
Agent Registry and Discovery Security
Discovery is part of the attack surface. Anyone who controls what agents appear “available” controls where orchestrators send work.
Registry vs well-known Agent Card URLs
Small deployments use well-known URLs per agent (/.well-known/agent-card.json). Enterprise deployments add a registry that indexes agent IDs, versions, endpoints, owners, and policy tags. The registry is a policy object: entries should record which tenants may discover which agents, not only where they live.
Versioning, deprecation, and ownership
Registry records need owners, change history, and deprecation dates. An orchestrator that caches Agent Cards must refresh on TTL and verify signatures where supported. Stale cards are how retired skills keep receiving traffic long after a vulnerability is patched.
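A TTL-based card cache might look like the sketch below. The `AgentCardCache` class and its `fetch` and `verify` callables are hypothetical: `fetch` stands in for an HTTP GET of the card URL, and `verify` is a placeholder for whatever signature check the deployment supports.

```python
import time
from typing import Callable


class AgentCardCache:
    """Caches Agent Cards and refetches them once the TTL has elapsed."""

    def __init__(self, fetch: Callable[[str], dict], ttl_seconds: float,
                 verify: Callable[[dict], bool] = lambda card: True,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._verify = verify  # placeholder hook; the default accepts every card
        self._clock = clock
        self._entries: dict[str, tuple[float, dict]] = {}

    def get(self, url: str) -> dict:
        now = self._clock()
        cached = self._entries.get(url)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        card = self._fetch(url)
        if not self._verify(card):
            raise ValueError(f"Agent Card from {url} failed verification")
        self._entries[url] = (now, card)
        return card
```

Injecting the clock keeps expiry testable, and failing closed on verification means a tampered card never replaces a previously trusted entry.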
Enterprise internal networks vs external partners
Internal agent meshes can rely on mTLS and private DNS. Partner agents need explicit federation rules, contractually scoped skills, and stronger artifact inspection because you do not control their runtime.
Delegation Across Agent Boundaries
Delegation is where A2A security is won or lost. When Agent A sends a task to Agent B, three questions must have crisp answers:
- Whose authority is being exercised? The user’s, A’s service account, or a blended delegated token?
- What is B allowed to do with that authority? Read-only analysis, or mutating tools on A’s behalf?
- Who is accountable if B exceeds scope? A, B, the gateway policy, or the human who approved an unclear prompt?
Propagating user intent vs over-delegation
Pass signed delegation claims that include user ID, original task ID, allowed skills, expiry, and maximum hop count. Downstream agents must reject tasks that expand scope silently. If B needs higher privilege than A held, transition to input_required and obtain explicit human approval rather than upgrading tokens invisibly.
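As a rough illustration, delegation claims can be signed and then narrowed at each hop. The sketch below uses an HMAC over a JSON payload purely for brevity; the claim names, the token layout, and the helper functions are assumptions, and a production system would more likely use a standard signed token format with asymmetric keys.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional


def sign_claims(claims: dict, key: bytes) -> str:
    """Serialize claims and append an HMAC-SHA256 signature."""
    payload = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    signature = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return f"{payload.decode()}.{signature}"


def verify_claims(token: str, key: bytes, now: float) -> Optional[dict]:
    """Return the claims if the signature is valid and unexpired, else None."""
    payload, _, signature = token.rpartition(".")
    expected = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected.encode(), signature.encode()):
        return None
    claims = json.loads(base64.urlsafe_b64decode(payload))
    return claims if now < claims["exp"] else None


def narrow_for_next_hop(parent: dict, requested_skills: set[str]) -> Optional[dict]:
    """Derive child claims; None means the hop limit is spent or scope would expand."""
    if parent["hops_left"] <= 0 or not requested_skills <= set(parent["skills"]):
        return None
    return {**parent, "skills": sorted(requested_skills), "hops_left": parent["hops_left"] - 1}


root_claims = {
    "user_id": "u-1",
    "task_id": "t-1",
    "skills": ["invoice.read", "invoice.summarize"],
    "exp": 1000,
    "hops_left": 1,
}
```

A `None` result from `narrow_for_next_hop` is the point at which the task would move to input_required instead of proceeding with broader authority.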
Human-in-the-loop approval flows for risky delegation are covered in A2A Streaming and Async Tasks for Long-Running Agent Workflows, where input_required is a first-class task state rather than an error.
Separate reasoning from execution permissions
An agent may need broad read access to plan while write tools sit behind approval. Split credentials or use distinct MCP profiles for planning vs execution so a model mistake cannot immediately mutate production.
Audit Trails and Answer Provenance
If you cannot reconstruct a delegation chain, you cannot explain an incident, pass an audit, or dispute a billing anomaly.
Log at three layers:
- Gateway: authentication result, policy decision, routed agent ID, task ID, parent task ID, rate-limit events.
- Agent: task state transitions, messages sent/received, model/tool invocations (arguments redacted as needed), artifacts created, delegation outward.
- MCP server: tool name, caller agent ID, user context, success/failure, latency, rows affected or resource IDs (policy permitting).
Correlate with trace ID across all layers. Observability for LLM Systems covers instrumentation backends; this article defines what must be captured so those backends have meaningful signal.
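To make the correlation concrete, the sketch below models one audit record shared by all three layers and rebuilds a delegation chain from parent task IDs. The `AuditEvent` fields and the `delegation_chain` helper are hypothetical; they illustrate the minimum linkage, not a prescribed schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AuditEvent:
    layer: str                     # "gateway", "agent", or "mcp"
    trace_id: str                  # shared across every layer for one request
    task_id: str
    parent_task_id: Optional[str]  # None for the root task
    actor: str
    action: str
    outcome: str


def delegation_chain(events: list[AuditEvent], task_id: str) -> list[str]:
    """Walk parent links from a task back to the root; returns root-first task IDs."""
    parents = {e.task_id: e.parent_task_id for e in events}
    chain: list[str] = []
    seen: set[str] = set()
    current: Optional[str] = task_id
    while current is not None and current not in seen:  # guard against cycles
        seen.add(current)
        chain.append(current)
        current = parents.get(current)
    return list(reversed(chain))


events = [
    AuditEvent("gateway", "tr-1", "t-1", None, "user:42", "task.accepted", "allow"),
    AuditEvent("agent", "tr-1", "t-2", "t-1", "agent:orchestrator", "task.delegated", "ok"),
    AuditEvent("mcp", "tr-1", "t-3", "t-2", "agent:billing", "tool.call:invoices.read", "ok"),
]
```

If any layer omits the parent task ID, the walk stops early, which is exactly the gap that makes an incident unexplainable.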
Final answer provenance should answer: which user, which orchestrator task, which specialist agents, which tools, which artifacts influenced the text the user saw, and which policy gates fired along the way.
Runtime Policy, Egress, and Secrets
Runtime policy engines (OPA, Cedar, custom rule services) evaluate structured events: “tool X with args Y for user Z.” They complement guardrails because they do not depend on the model behaving well.
Human approval belongs in runtime policy for irreversible or high-cost actions: payments, external email, production config changes, privilege grants.
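The shape of such a rule can be sketched in plain Python. This is not OPA or Cedar syntax; the event fields, the `HIGH_RISK_TOOLS` set, and the three decision strings are assumptions made for illustration.

```python
# Hypothetical set of irreversible or high-cost tools.
HIGH_RISK_TOOLS = {"payments.send", "email.send_external", "config.apply_prod", "iam.grant"}


def evaluate(event: dict) -> str:
    """Decide on a structured tool-call event without consulting the model."""
    if event["tool"] not in event.get("granted_tools", ()):
        return "deny"
    if event["tool"] in HIGH_RISK_TOOLS and not event.get("approved_by"):
        return "require_approval"
    return "allow"


payment_event = {
    "user": "u-1",
    "tool": "payments.send",
    "args": {"amount": 120},
    "granted_tools": ["payments.send", "invoices.read"],
}
```

The decision depends only on the structured event, so it holds even when the model has been manipulated into requesting the call.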
Egress controls limit which domains MCP servers and agents may call. An agent that can both read secrets and POST to arbitrary URLs is a data-loss incident waiting to happen.
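An application-level egress check is one layer of that control, ideally backed by network policy as well. The sketch below uses exact host matching against a hypothetical allowlist; the hostnames are placeholders.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; real deployments would load this from policy.
ALLOWED_EGRESS_HOSTS = {"api.internal.example", "tickets.example.com"}


def egress_allowed(url: str) -> bool:
    """Permit only HTTPS requests to exactly matching allowlisted hosts."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    return parts.scheme == "https" and host in ALLOWED_EGRESS_HOSTS
```

Exact matching is deliberate: suffix or substring checks are easy to bypass with hosts such as `tickets.example.com.evil.test`.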
Secrets never belong in Agent Cards or prompts. MCP hosts should inject short-lived credentials at execution time from a secrets manager. For transport encryption, key management, and baseline infra security patterns, see Architectural Patterns for Securing Data.
Push notification webhooks in async A2A flows need the same rigor: verify sender identity, reject stale events, and never treat a webhook payload as authorization on its own.
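One common way to meet the first two requirements is a shared-secret signature over a timestamp plus the body, combined with a freshness window. The scheme below is an assumption for illustration and is not prescribed by the A2A specification; the function names and the five-minute window are hypothetical.

```python
import hashlib
import hmac

MAX_AGE_SECONDS = 300  # assumed freshness window


def sign_webhook(body: bytes, timestamp: str, secret: bytes) -> str:
    """Signature the sender would attach, covering both timestamp and body."""
    return hmac.new(secret, timestamp.encode() + b"." + body, hashlib.sha256).hexdigest()


def verify_webhook(body: bytes, timestamp: str, signature: str,
                   secret: bytes, now: float) -> bool:
    """Accept only fresh events whose signature matches the shared secret."""
    try:
        sent_at = float(timestamp)
    except ValueError:
        return False
    if not abs(now - sent_at) <= MAX_AGE_SECONDS:  # also rejects NaN timestamps
        return False
    expected = sign_webhook(body, timestamp, secret)
    return hmac.compare_digest(expected.encode(), signature.encode())
```

A passing check establishes only who sent the event and that it is recent. The receiver should still fetch the task state over an authenticated call before acting on it.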
Reference Security Architecture
The following diagram summarizes a production-oriented layout for "A2A outside, MCP inside" deployments at scale.
The orchestrator sees specialist agents through A2A. Specialists see tools through MCP. Users never receive raw MCP credentials, and partners never receive internal skill surfaces without policy review.
For protocol concepts (Agent Cards, tasks, artifacts), see What Is the A2A Protocol?. For adoption and enterprise framing, see Google A2A Protocol in 2026. For topology when many agents coordinate, see Multi-Agent Orchestration Patterns.
Production Checklist for A2A and MCP Security
Before exposing agent protocols beyond a trusted sandbox, verify:
Identity and auth
- No anonymous agents in production paths
- Every MCP and A2A call authenticated on every request
- OAuth scopes or equivalent mapped to skills/tools, not blanket admin
- Public vs authenticated Agent Card views defined intentionally
Delegation and policy
- Delegation tokens carry user ID, task ID, scope, expiry, hop limit
- Downstream agents reject scope expansion without explicit approval
- High-risk tools require runtime policy or human approval
- Reasoning and execution use separate credentials where possible
Discovery and registry
- Agent registry entries have owners and version history
- Agent Cards refreshed on TTL; signatures verified where supported
- Partner agents federated with explicit skill allowlists
Audit and observability
- Gateway, agent, and MCP layers emit correlated audit events
- Delegation chains logged with parent and child task IDs
- Artifact provenance recorded for final answers
- Trace IDs connect to observability backends
Abuse and resilience
- Rate limits per user, agent, and tenant
- Timeout policies on delegated tasks
- Egress allowlists on tool servers
- Secrets in a manager, not in cards, prompts, or repos
Conclusion
A2A and MCP interoperability is powerful because agents and tools can compose across team and vendor boundaries, but that power is unsafe without identity, authorization, delegation limits, and audit design. Guardrails protect the model conversation; protocol security protects the actions agents take on behalf of users.
Treat Agent Cards as advertisements, delegation as a signed contract, MCP tools as privileged code execution, and audit logs as the evidence chain you will need when something interesting happens at 2 a.m.
Build the gateway when governance needs a single throat to choke. Split credentials before you split agents. Log every hop so the answer “the model decided” is never the final incident report.
Frequently Asked Questions
What is the difference between LLM guardrails and A2A and MCP agent security? Guardrails constrain model input and output. Protocol security constrains who may invoke tools, delegate tasks, and act on whose behalf across MCP and A2A with identity, authorization, and audit trails.
How should agent identity work in an A2A deployment? Separate human, agent service, and task identities. Validate credentials on every request, use scoped tokens, and treat Agent Cards as discovery metadata rather than proof of trust.
What is the confused deputy problem in multi-agent systems? It occurs when a privileged agent or tool performs a sensitive action because a less privileged caller smuggled instructions through delegation or artifacts. Enforce scope at every hop.
Do you need an A2A gateway in production? Single-team internal deployments may enforce policy per agent. Multi-tenant, multi-vendor, or partner-facing networks usually need a gateway for centralized auth, routing, rate limits, and audit.
What should an A2A and MCP audit log contain? User ID, agent ID, task ID, parent task ID, tool calls, policy decisions, artifacts, and timestamps correlated with trace IDs across gateway, agent, and MCP layers.
Sources
- A2A Protocol – Enterprise-ready security topics: https://github.com/a2aproject/A2A/blob/main/docs/topics/enterprise-ready.md
- A2A Protocol – Specification overview: https://a2a-protocol.org/latest/specification/
- A2A Protocol – Streaming and push notification security: https://a2a-protocol.org/latest/topics/streaming-and-async/