Oracle poisoning: corrupting the knowledge graph an agent reasons over
A paper published on arXiv on May 10, 2026 defines Oracle Poisoning: corrupt the knowledge graph an agent queries at runtime, and the agent reaches wrong conclusions through correct reasoning. Across nine models, trust in poisoned data hit 100% under directed agentic queries.
What is this?
On May 10, 2026, researchers Ben Kereopa-Yorke, Guillermo Diaz, Holly Wright, Reagan Johnston, Ron F. Del Rosario and Timothy Lynar published Oracle Poisoning: Corrupting Knowledge Graphs to Weaponise AI Agent Reasoning (arXiv:2605.09822, cs.CR/cs.AI). They define Oracle Poisoning as an attack class in which an adversary corrupts a structured knowledge graph that an AI agent queries at runtime through tool-use, so the agent reaches incorrect conclusions through correct reasoning.
The distinction from prompt injection is the whole point. Prompt injection tampers with an agent’s instructions; Oracle Poisoning tampers with the data the agent reasons over. The model is never tricked into misbehaving — it faithfully retrieves a fact from a trusted tool and reasons soundly from it. The fact is simply false. This is the same family of integrity problem explored for graph-based retrieval in work like KEPo (knowledge-evolution poisoning of graph RAG, ACM Web Conference 2026), but demonstrated here against a production-scale agentic system rather than a benchmark.
How it works
Many agents now treat a knowledge graph as an authoritative oracle: a tool call returns nodes and edges (entities, relationships, claims), and the agent folds those results into its answer. The paper studies a 42-million-node production code knowledge graph and shows six attack scenarios where an adversary alters graph contents — for example, injecting a fabricated claim that a component is secure.
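The trust boundary described above can be made concrete with a toy sketch. Everything here is hypothetical and for illustration only: `ToyGraph`, `Claim`, `assess`, the `audit_status` predicate and the `auth-service` component are invented names, not the paper's system, schema, or payloads. The agent's reasoning is replaced by a deterministic function so the effect is easy to see: the logic is identical in both runs, and only the stored data differs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    subject: str    # entity the claim is about
    predicate: str  # relationship or property name
    value: str      # asserted value


class ToyGraph:
    """In-memory stand-in for a knowledge graph behind a query tool."""

    def __init__(self, claims):
        self._claims = list(claims)

    def query(self, subject):
        # The "tool call": return every stored claim about the subject.
        return [c for c in self._claims if c.subject == subject]


def assess(graph, component):
    """Deterministic stand-in for the agent's reasoning step.

    The rule is sound given true inputs, but it cannot tell whether
    the stored claim is true; it trusts whatever the oracle returns.
    """
    claims = graph.query(component)
    audited = any(
        c.predicate == "audit_status" and c.value == "secure" for c in claims
    )
    return "safe to deploy" if audited else "needs review"


clean = ToyGraph([Claim("auth-service", "audit_status", "unaudited")])
tampered = ToyGraph([Claim("auth-service", "audit_status", "secure")])

print(assess(clean, "auth-service"))     # needs review
print(assess(tampered, "auth-service"))  # safe to deploy
```

Nothing in `assess` changed between the two calls, which is why defenses that inspect instructions or model behaviour have nothing to catch.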
The evaluation uses real SDK tool-use across nine models from three providers (N=30 per model): the model autonomously invokes a graph query tool and reasons from the results. The findings:
- Directed queries: every tested model accepted poisoned data at 100% once the attacker reached “moderate sophistication” (level L2). In 269 of 270 valid trials the models accepted fabricated security claims.
- Open-ended prompts: trust dropped to 3–55%. The authors flag this as a prompt-framing confound and report both conditions rather than cherry-picking.
- A sophistication gradient with break points: trust flips from 0% to 100% past a minimum attacker skill level, reframing the question as how much effort an attacker needs, not whether it works.
- Delivery mode is a first-order confound. Evaluating the same payload inline can produce false negatives: GPT-5.1 showed 0% trust when content was tested inline, but 100% under both simulated and real agentic tool-use. Testing a model in a chat box does not tell you how its agent will behave.
No exploit string is needed to understand the lesson, and none is reproduced here: the mechanism is data integrity, not a clever prompt.
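Because delivery mode can flip the result, an evaluation harness should probably tally trust rates per delivery path rather than pooling them. The sketch below shows one way to do that bookkeeping. The record fields (`model`, `delivery`, `valid`, `accepted`), the function names and the trial records are all hypothetical; the data is synthetic and is not taken from the paper.

```python
from collections import defaultdict


def trust_rates(trials):
    """Return {(model, delivery): share of valid trials that accepted the claim}."""
    accepted = defaultdict(int)
    valid = defaultdict(int)
    for t in trials:
        if not t["valid"]:
            continue  # exclude malformed or refused runs from the denominator
        key = (t["model"], t["delivery"])
        valid[key] += 1
        accepted[key] += int(t["accepted"])
    return {key: accepted[key] / valid[key] for key in valid}


def delivery_gaps(rates, baseline="inline", agentic="tool_use"):
    """Per model, agentic trust rate minus baseline trust rate."""
    gaps = {}
    for (model, delivery), rate in rates.items():
        if delivery == agentic and (model, baseline) in rates:
            gaps[model] = rate - rates[(model, baseline)]
    return gaps


# Synthetic records for one hypothetical model; not results from the paper.
trials = [
    {"model": "model-a", "delivery": "inline", "valid": True, "accepted": False},
    {"model": "model-a", "delivery": "inline", "valid": True, "accepted": False},
    {"model": "model-a", "delivery": "inline", "valid": False, "accepted": False},
    {"model": "model-a", "delivery": "tool_use", "valid": True, "accepted": True},
    {"model": "model-a", "delivery": "tool_use", "valid": True, "accepted": True},
]

rates = trust_rates(trials)
gaps = delivery_gaps(rates)
print(gaps)  # {'model-a': 1.0}
```

A large positive gap for a model is the signal that an inline-only test would have reported a false negative for it.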
Why it matters
Agentic systems increasingly outsource “ground truth” to retrieval layers — knowledge graphs, vector stores, internal wikis — on the assumption that retrieved data is trustworthy. Oracle Poisoning shows that assumption is load-bearing and largely unguarded. If an attacker can write to the oracle, the agent becomes a confident, well-reasoned conduit for the attacker’s claims, and the usual defenses (alignment, instruction hierarchies, prompt-injection filters) never engage because no instruction was injected.
The authors note that the attack appears to generalise across the knowledge-graph ecosystem, based on their analysis of four additional platforms. The practical exposure surface is anywhere an agent has, or can be steered toward, a mutable shared knowledge store: code-intelligence graphs, CMDBs, threat-intel graphs, and RAG corpora with write paths.
Defenses
The paper evaluates five defenses and is candid that only one is decisive. The takeaways:
- Read-only access control eliminates the direct mutation vector — if agents and untrusted writers cannot modify the oracle, the cleanest attack path closes. Treat the knowledge graph as a privileged datastore with strict write authorization and audit logging.
- The remaining four defenses are partial and model-dependent; do not rely on any single one. Layer them.
- Provenance and integrity on graph contents: sign or attribute claims, track who wrote each node/edge, and surface confidence/source to the reasoning step rather than presenting retrieved facts as unconditioned truth.
- Test under real tool-use, not inline. Because delivery mode flips results, security evaluations and red-team exercises must exercise the actual agentic path, or they will report false negatives.
- Constrain trust in retrieved claims: require corroboration for high-impact assertions (e.g., “X is secure”), and keep humans in the loop for decisions that hinge on a single retrieved fact.
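The provenance and corroboration ideas above can be sketched with Python's standard library. This is a minimal illustration under stated assumptions, not the paper's defence: the function names, the record layout and the two-author threshold are hypothetical, and a single shared HMAC key is a simplification. A real deployment would more likely use per-writer asymmetric signatures so that one compromised key cannot vouch for every author.

```python
import hashlib
import hmac
import json


def _payload(claim, author):
    # Canonical byte encoding so signing and verification agree.
    return json.dumps({"claim": claim, "author": author}, sort_keys=True).encode("utf-8")


def sign_claim(claim, author, key):
    """Return a record carrying the claim, its author and an HMAC-SHA256 tag."""
    tag = hmac.new(key, _payload(claim, author), hashlib.sha256).hexdigest()
    return {"claim": claim, "author": author, "tag": tag}


def verify_claim(record, key):
    """True only if the tag matches the claim and author as stored."""
    expected = hmac.new(
        key, _payload(record["claim"], record["author"]), hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, record.get("tag", ""))


def corroborated(records, claim, key, min_authors=2):
    """True if enough distinct authors hold valid signatures on the same claim."""
    authors = {
        r["author"] for r in records if r["claim"] == claim and verify_claim(r, key)
    }
    return len(authors) >= min_authors


key = b"demo-key-do-not-reuse"  # hypothetical; load from a secret store in practice
claim = {"subject": "auth-service", "predicate": "audit_status", "value": "secure"}

signed = sign_claim(claim, "scanner", key)
edited = dict(signed, claim=dict(claim, subject="payments-service"))  # altered after signing

print(verify_claim(signed, key))   # True
print(verify_claim(edited, key))   # False
print(corroborated([signed], claim, key))  # False
print(corroborated([signed, sign_claim(claim, "auditor", key)], claim, key))  # True
```

The agent-facing tool would then pass only verified records to the reasoning step, along with author and corroboration status, so a high-impact claim backed by a single writer can be escalated to a human instead of being accepted as fact.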
Status
| Item | Value |
|---|---|
| Paper | Oracle Poisoning (arXiv:2605.09822) |
| Published | May 10, 2026 |
| Class | Knowledge-graph / oracle data poisoning (distinct from prompt injection) |
| Tested | 9 models, 3 providers, 42M-node production code graph |
| Headline result | 100% trust in poisoned data under directed agentic queries (L2) |
| Strongest defense | Read-only access control (others partial, model-dependent) |
Key date: May 10, 2026 — first empirical demonstration of knowledge-graph poisoning against a production-scale agentic system. This article summarizes publicly available research for defensive purposes and reproduces no attack payload.