The agent that writes its own logs: why self-reported agent audit trails can't be trusted
If a compromised agent produces its own activity log, it can omit, alter, or fabricate what it did. Three June 2026 efforts — arXiv's Notarized Agents, an IETF agent-audit-trail draft, and SCITT — converge on the same fix: move the trust boundary off the agent.
In brief When an AI agent records its own audit trail, the thing being logged and the thing writing the log are the same process — so a compromised or buggy agent can quietly drop, edit, or invent entries, and the operator has no independent way to notice. A paper published 2 June 2026, Notarized Agents (arXiv 2606.04193), names this structural flaw and proposes inverting the trust boundary: let the service that receives the agent’s call sign a receipt of what it observed. The same week’s IETF agent-audit-trail draft and the maturing SCITT transparency-log work point the same direction. This is a defensive and governance gap, not an exploit — but it undermines every after-the-fact investigation that assumes the agent’s log is true.
What is this?
Agent observability is usually built on a quiet assumption: the agent emits a trace of its own tool calls, and we trust that trace when something goes wrong. Notarized Agents (Juan Figuera, arXiv 2606.04193, submitted 2 June 2026) states the problem plainly: “the entity producing the activity log is the same entity whose activity is being logged.” If an attacker takes over the agent — or the operator running it — the log becomes whatever they want it to say. Omitted exfiltration, rewritten arguments, fabricated approvals: all are invisible to anyone reconstructing events later.
This matters now because regulation is about to lean on those logs. The IETF draft-sharif-agent-audit-trail-00 (Raza Sharif) notes that the EU AI Act (Regulation 2024/1689) mandates automatic event recording for high-risk AI systems effective August 2026, and maps its format to SOC 2, ISO/IEC 42001, and PCI DSS v4.0.1. An audit obligation is only as good as the integrity of what it audits.
How it works
The weakness is not a payload; it is a trust topology. A self-logging agent sits on both sides of the boundary:
# Self-reported trail (today's default): one writer, no witness
agent --> tool call --> [agent writes log entry] --> store
^ |
\---- same process controls both ----/
# A compromised agent simply does not write the incriminating line.
Three June-2026 designs converge on moving the writer off the agent:
- Receiver-side attestation. In Notarized Agents, the protocol Sello has the service that receives a call sign a receipt of what it saw, encrypt it (HPKE) to the agent owner’s public key bound to the authorization token via JWS, and publish it to a witness-cosigned Merkle transparency log. The owner later reconstructs a tamper-evident trail without trusting the agent or its operator. The authors are explicit about residual limits — a suppression attack, service collusion, and an adoption-incentive problem.
- Hash-chained records. The IETF agent-audit-trail draft links JSON records with SHA-256 hash chaining (per RFC 8785) plus optional ECDSA signatures, so a deleted or altered middle entry breaks the chain.
- Append-only transparency. SCITT generalises the pattern: signed statements committed to an append-only log that issues verifiable receipts.
The common move is the same one Certificate Transparency made for the web PKI: stop asking the actor to vouch for itself, and anchor evidence somewhere it cannot silently rewrite.
Why it matters
Most agent security debate is about preventing bad actions — prompt injection, the lethal trifecta, tool-argument validation. Audit-trail integrity is about what happens after: incident response, forensics, compliance, and liability all assume you can reconstruct what the agent actually did. If that record is self-attested, a single agent compromise poisons every downstream investigation, and “the logs show nothing happened” becomes meaningless. With high-risk-system logging duties arriving in August 2026, the gap moves from academic to regulatory.
Defenses
- Treat the agent’s self-report as untrusted by default. Anchor critical evidence where the agent cannot rewrite it — a write path the agent process does not control.
- Log at the receiver, not just the caller. Have tool servers, MCP servers, and downstream APIs record what they observed (caller identity, arguments, outcome), independently of the agent’s own trace, so the two can be cross-checked.
- Make tampering detectable. Hash-chain records (SHA-256, RFC 8785) and sign them; a broken chain or missing signature is a hunting signal. This is cheap and available today without adopting a full protocol.
- Append-only, off-host storage. Ship logs to a sink the agent and its operator can’t delete from (append-only object storage, a SIEM, or a transparency service). Controlling the write path is controlling the truth.
- Track the standards, don’t hand-roll crypto. Follow SCITT, the IETF agent-audit-trail draft, and receipt-protocol work (Signet, SCITT, Sello) rather than inventing bespoke notarisation — and remember none fully closes suppression or collusion.
Status
| Item | Reference | Date | Notes |
|---|---|---|---|
| Notarized Agents / Sello | arXiv 2606.04193 | 2026-06-02 | Receiver-attested receipts; Merkle transparency log; names suppression & collusion limits |
| Agent Audit Trail (AAT) | draft-sharif-agent-audit-trail-00 | expires 2026-09-29 | JSON + SHA-256 hash chaining (RFC 8785), optional ECDSA; maps EU AI Act / SOC 2 / ISO 42001 |
| SCITT architecture | draft-ietf-scitt-architecture | IETF WG | Append-only transparency log, signed statements, verifiable receipts |
| EU AI Act logging | Regulation 2024/1689 | 2026-08 (high-risk) | Mandatory automatic event recording |
The right framing isn’t “add more logging.” It’s that a log written by the actor it describes is a statement of intent, not evidence. The fix — proven by Certificate Transparency and now being ported to agents — is to move the writer off the agent and anchor receipts somewhere it cannot silently edit.