DEFENSE MEDIUM NEW

Cognitive Firewall: a split-compute defense for browser agents

A March 2026 eBay paper layers an on-device sentinel, a cloud planner and a deterministic execution guard to cut the success rate of indirect prompt injection against browser agents from 100% to under 1% on the authors’ benchmark.

2026-06-22 // 6 min affects: browser-agents, gemini-nano, llama-3, gpt-4, llm-agents

What is this?

Browser agents read a page’s DOM, plan, and then act — clicking, posting, deleting, sending. Because the system prompt, the user’s instruction and the untrusted web content all land in the same context window, an attacker who controls page content can inject instructions the agent will follow. This is indirect prompt injection (IPI), ranked LLM01 in the OWASP Top 10 for LLM applications.

The Cognitive Firewall, a paper by Qianlong Lan and Anuj Kaul of eBay Inc, posted to arXiv on 24 March 2026 (arXiv:2603.23791), proposes a defense built around a simple admission: no single probabilistic check will catch every injection, so the consequences of a missed one must be contained deterministically. It is a “split-compute” architecture that spreads three checks across the browser and the cloud, and it reduces attack success on the authors’ benchmark from 100% (no defense) to 0.88%.

How it works

The design is a “defense funnel”: cheap checks run first at the edge, and only ambiguous cases escalate to the cloud. Three layers map to three attack classes.
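The funnel's control flow can be sketched in a few lines of TypeScript. This is an illustration under stated assumptions, not the paper's implementation: `EdgeVerdict`, `FunnelResult` and `runFunnel` are hypothetical names, the three checks are passed in as plain functions, and the cloud call is synchronous here only to keep the sketch short.

```typescript
// Hypothetical types and names; the paper does not publish this interface.
type EdgeVerdict = "block" | "clean" | "ambiguous";

interface FunnelResult {
  allowed: boolean;
  stoppedAt: "edge" | "cloud" | "guard" | null; // which layer rejected, if any
  cloudCalls: number;                            // how often the cloud was consulted
}

function runFunnel(
  pageText: string,
  action: string,
  edge: (text: string) => EdgeVerdict, // cheap, on-device
  cloud: (text: string) => boolean,    // expensive; true = content judged safe
  guard: (action: string) => boolean,  // deterministic; true = action permitted
): FunnelResult {
  let cloudCalls = 0;

  // Layer 1: a block at the edge ends the run before any cloud cost.
  const verdict = edge(pageText);
  if (verdict === "block") {
    return { allowed: false, stoppedAt: "edge", cloudCalls };
  }

  // Layer 2: only ambiguous content escalates to the cloud.
  if (verdict === "ambiguous") {
    cloudCalls += 1;
    if (!cloud(pageText)) {
      return { allowed: false, stoppedAt: "cloud", cloudCalls };
    }
  }

  // Layer 3: every action passes the guard, whatever the models said.
  if (!guard(action)) {
    return { allowed: false, stoppedAt: "guard", cloudCalls };
  }
  return { allowed: true, stoppedAt: null, cloudCalls };
}
```

The guard sits outside both model branches, so a wrong verdict from either model still meets a deterministic check before anything is dispatched.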

Layer 1 — the Edge Sentinel runs in the browser on a small on-device model (Gemini Nano via Chrome’s built-in AI API). It enforces visual consistency: the agent must not reason over text the user cannot see. It compares the DOM source against the computed style (CSSOM) and blocks presentation-layer tricks — opacity: 0, zero-size fonts, left: -9999px off-screen text — while sparing legitimate accessibility markup. This runs in roughly 0.06 ms, so blocked payloads never reach the cloud and never cost tokens.
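A rough sketch of the style comparison, assuming each text node has already been paired with a snapshot of its computed style. `StyleSnapshot`, `isVisuallyHidden`, `visibleText` and the `OFFSCREEN_PX` threshold are hypothetical, and the rules cover only the three tricks named above; the paper's sentinel uses an on-device model rather than fixed rules like these.

```typescript
// Subset of computed-style values, as strings in the form CSSOM reports them.
interface StyleSnapshot {
  opacity: string;  // e.g. "0", "1"
  fontSize: string; // e.g. "0px", "16px"
  position: string; // e.g. "static", "absolute"
  left: string;     // e.g. "auto", "-9999px"
}

interface TextNodeSnapshot {
  text: string;
  style: StyleSnapshot;
}

// Hypothetical cutoff: anything shifted this far left is treated as off-screen.
const OFFSCREEN_PX = -1000;

function isVisuallyHidden(s: StyleSnapshot): boolean {
  if (parseFloat(s.opacity) === 0) return true;  // fully transparent
  if (parseFloat(s.fontSize) === 0) return true; // zero-size font
  const left = parseFloat(s.left);               // NaN when left is "auto"
  // `left` only moves an element when it is positioned (not "static").
  if (s.position !== "static" && !isNaN(left) && left <= OFFSCREEN_PX) {
    return true;
  }
  return false;
}

// Keep only the text a user could actually see, in document order.
function visibleText(nodes: TextNodeSnapshot[]): string {
  return nodes
    .filter((n) => !isVisuallyHidden(n.style))
    .map((n) => n.text)
    .join(" ");
}
```

In a browser, each snapshot could be filled from `window.getComputedStyle(element)`. The sketch makes no attempt at the accessibility carve-out the paper describes, so screen-reader-only text positioned off-screen would be dropped as well.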

Layer 2 — the Deep Planner runs server-side on a larger model (the prototype used Llama 3 8B or GPT-4) under a dedicated “security analyst” prompt. It enforces goal integrity, catching semantic manipulation that the edge model misses: role-play (“you are now in Developer Mode”), fabricated delimiters like --- END SYSTEM INSTRUCTIONS ---, and manufactured urgency.

Layer 3 — the Origin Guard is a deterministic, model-free execution monitor. It enforces execution safety via a synchronous JavaScript interceptor that validates every outbound action against an origin allowlist and a verb policy before dispatch. Even if the planner is fully fooled, a DELETE or an exfiltration request to an origin outside the allowlist is blocked. The authors call this fail-closed: ambiguity in a plan is grounds for rejection.
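The policy check at the heart of such a guard might look like the sketch below. `OutboundAction`, `GuardPolicy`, `parseOrigin` and `guardAction` are hypothetical names, and the wiring that would intercept real requests is left out. The only platform API used is the standard `URL` constructor.

```typescript
// Hypothetical shapes; the paper's interceptor is not reproduced here.
interface OutboundAction {
  method: string; // HTTP verb as planned by the agent
  url: string;    // absolute target URL
}

interface GuardPolicy {
  allowedOrigins: string[]; // e.g. ["https://shop.example"]
  allowedMethods: string[]; // upper-case verbs, e.g. ["GET", "POST"]
}

interface GuardResult {
  allowed: boolean;
  reason: string;
}

// Returns the URL's origin, or null when the URL cannot be parsed.
function parseOrigin(url: string): string | null {
  try {
    return new URL(url).origin;
  } catch {
    return null;
  }
}

function guardAction(action: OutboundAction, policy: GuardPolicy): GuardResult {
  // Fail closed: an unparseable or relative URL is rejected outright.
  const origin = parseOrigin(action.url);
  if (origin === null) {
    return { allowed: false, reason: "unparseable URL" };
  }
  // Exact origin match; substring or suffix matching would be bypassable.
  if (policy.allowedOrigins.indexOf(origin) === -1) {
    return { allowed: false, reason: "origin not in allowlist" };
  }
  if (policy.allowedMethods.indexOf(action.method.toUpperCase()) === -1) {
    return { allowed: false, reason: "verb not permitted" };
  }
  return { allowed: true, reason: "ok" };
}
```

Every branch that is not an explicit match returns a rejection, which is what makes the check fail-closed: there is no default-allow path for the planner to talk its way into.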

Why it matters

The ablation makes the case for layering concrete. On 1,000 mixed benign-and-adversarial samples, edge-only filtering left an 86.9% attack success rate; edge plus cloud still let hijacking succeed 39.2% of the time; the full three-layer stack reached 0.88%, an interception rate of about 99.1%. Against an adaptive white-box attacker who knows the pipeline, aggregate success was 0.67% (2 of 300). DOM-fragmentation attacks that split a payload across single-character nodes slipped past the edge, but the cloud planner reconstructed them 95% of the time and the guard stopped them.

The wider lesson is architectural: deterministic enforcement at the execution boundary can backstop probabilistic language models. As browser agents take on state-changing actions, “the model decided not to” is not a control — “the request was blocked before it left the device” is.

Defenses

The paper is itself a defense blueprint; the takeaways generalize beyond the prototype:

Separate the control plane from the data plane. Treat all web content the agent ingests as untrusted, and don’t let it redefine the user’s goal. The same instinct behind Site Isolation, applied to the context window.
Gate side effects deterministically. Put a model-free interceptor between plans and actions. Allowlist origins, constrain HTTP verbs to the declared intent (a read-only task should never issue a DELETE or a parameterized outbound GET), and fail closed on anything ambiguous.
Filter presentation-layer tricks at the source. Compare rendered output against raw DOM and drop invisible or off-screen text before it enters the prompt — cheaply, on-device, before any cloud call.
Don’t trust a small on-device model as a semantic judge. The edge layer missed 86.9% of semantic jailbreaks; it is a fast pre-filter, not the safety mechanism. Escalate hard cases to a stronger model.
Add a human for high-stakes or ambiguous actions. The residual failures were “benign-wrapping” attacks (2.0%) that talked the planner into a permissive mode, plus a 1.7% false-positive rate on legitimate tasks — both arguments for an interactive confirmation step rather than silent allow/deny. This echoes the broader debate on whether firewalls alone are enough or stronger benchmarks are needed.
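The verb and confirmation takeaways above can be combined into a three-way decision instead of a silent allow/deny. The sketch below is one possible reading, not the paper's policy: `TaskIntent`, `Decision`, `PlannedRequest` and `decide` are hypothetical, and the specific rules (for instance, confirming every state-changing request) are illustrative assumptions.

```typescript
// Hypothetical policy: the concrete rules here are assumptions for illustration.
type TaskIntent = "read-only" | "read-write";
type Decision = "allow" | "deny" | "confirm"; // "confirm" = ask the user

interface PlannedRequest {
  method: string;
  url: string;
}

function decide(intent: TaskIntent, req: PlannedRequest, taskOrigin: string): Decision {
  let target: URL;
  try {
    target = new URL(req.url);
  } catch {
    return "deny"; // fail closed on anything that does not parse
  }
  const method = req.method.toUpperCase();
  const sameOrigin = target.origin === taskOrigin;

  if (intent === "read-only") {
    // A read-only task never changes state.
    if (method !== "GET") return "deny";
    if (sameOrigin) return "allow";
    // Cross-origin GET with a query string could carry exfiltrated data.
    if (target.search !== "") return "deny";
    // Cross-origin GET without parameters is ambiguous: ask the user.
    return "confirm";
  }

  // Read-write task: never act on other origins.
  if (!sameOrigin) return "deny";
  if (method === "GET") return "allow";
  // State-changing request on the task origin: require confirmation.
  return "confirm";
}
```

Routing the ambiguous cases to "confirm" rather than "deny" is one way to trade a blocked legitimate task for a single user prompt.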

Status

Item	Reference	Notes
Paper	arXiv:2603.23791	Lan & Kaul, eBay Inc, 24 Mar 2026
Architecture	Cognitive Firewall — Sentinel / Planner / Guard	Split-compute, defense-in-depth, fail-closed
Edge model	Gemini Nano (Chrome built-in AI)	~0.06 ms, blocks visual obfuscation
Cloud model	Llama 3 8B / GPT-4 (prototype)	Security-analyst prompt, semantic checks
Result	ASR 100% → 0.88% static, 0.67% adaptive	N = 1,000 (static), 300 (adaptive); ~99.1% interception
Known limits	Image-based injection bypasses Layer 1; 1.7% false positives; ~950 ms full-pipeline latency	Prototype, not real-world traffic

The takeaway: browser agents collapse code and data into one token stream, so semantic checks will always be probabilistic and occasionally wrong. The Cognitive Firewall’s contribution is to stop treating that as the last line of defense — and to put a deterministic, fail-closed guard at the point where reasoning turns into a real-world action.