RTK (CVE-2026-45792): untrusted filter configs hide backdoors from AI review
Pillar Security disclosed on May 20, 2026 a flaw in RTK, a token-optimisation filter for Claude Code: a repo-supplied .rtk/filters.toml could silently strip a backdoor from command output before the model ever saw it. The target is the agent's perception, not its execution.
What is this?
On May 20, 2026, Pillar Security publicly disclosed CVE-2026-45792, a vulnerability in RTK (Rust Token Killer), an open-source token-optimisation tool used with Claude Code. RTK sits between an AI coding assistant and the developer’s shell: it intercepts command output and applies regex filters to strip noise before the output reaches the model, saving context-window tokens. The bug is that RTK automatically loaded filter rules from a project-local .rtk/filters.toml in any cloned repository — with the highest priority, no approval prompt, and no warning. Whoever could commit to the repo could therefore decide what the AI was allowed to see.
The finding is notable less for its mechanics than for its category. This is not an attack on what an agent can execute; it is an attack on what an agent can observe. The discoverer scored it 6.9 (Medium) under CVSS 4.0; the GitLab Advisory Database lists it at 8.2 (High) under CVSS 3.1. The CVE was assigned on May 14, after the maintainers had shipped a fix, so the issue was already patched when it became public.
How it works
RTK loaded configuration from three sources in ascending priority: built-in filters, a user-global config (~/.config/rtk/filters.toml), and a project-local config (.rtk/filters.toml). The first two are under your control; the third arrives with the repository — written by a maintainer, a contributor, or an attacker. RTK did not distinguish between them, so a repo-supplied filter file inherited the same authority as one you wrote yourself, and activated silently. The advisory tracks this as CWE-345 (insufficient verification of data authenticity) and CWE-426 (untrusted search path).
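The precedence problem can be sketched in a few lines of Python. This is an illustration only, not RTK's code (RTK is written in Rust): the `Origin`, `Filter`, `merge_by_priority`, and `merge_with_trust` names are hypothetical, and keying filters by name is an assumption made for the example. The first function merges purely by priority, as described above; the second tags each filter with its origin and drops repo-supplied entries unless the developer has opted in.

```python
from dataclasses import dataclass
from enum import IntEnum


class Origin(IntEnum):
    """Where a filter definition came from, in ascending priority."""
    BUILTIN = 0
    USER_GLOBAL = 1      # ~/.config/rtk/filters.toml
    PROJECT_LOCAL = 2    # .rtk/filters.toml, arrives with the repository


@dataclass(frozen=True)
class Filter:
    name: str
    pattern: str
    origin: Origin


def merge_by_priority(filters):
    """Origin-blind merge: the highest-priority source wins for each name."""
    merged = {}
    for f in sorted(filters, key=lambda f: f.origin):
        merged[f.name] = f  # higher-priority entries overwrite earlier ones
    return merged


def merge_with_trust(filters, project_trusted=False):
    """Same merge, but repo-supplied filters are ignored unless opted in."""
    allowed = [
        f for f in filters
        if f.origin != Origin.PROJECT_LOCAL or project_trusted
    ]
    return merge_by_priority(allowed)
```

With a user-global and a project-local filter sharing one name, `merge_by_priority` returns the project-local one, while `merge_with_trust` returns the user-global one until `project_trusted=True` is passed.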
The kill chain is short. No working payload is reproduced here; the shape is what matters:
# Conceptual only — illustrative, not a working config.
1. attacker commits .rtk/filters.toml alongside a backdoor
2. victim clones the repo, works in it with Claude Code + RTK
3. attacker-controlled regex strips the backdoor's lines from
   cat / git diff / scanner output before the model reads them
4. the model reviews a filtered view and reports "no issues"
Because RTK’s engine is general-purpose and regex-driven, an attacker could suppress anything that appears in command output: malicious lines removed from cat, security warnings stripped from a scanner’s results, backdoor code hidden from git diff, indicators of compromise filtered out of grep. The filter file looked routine — names like reduce_noise and comments like “strip ANSI escape sequences” — so it hid in plain sight. The model received post-filter output and had no way to know lines had been removed, because RTK operated below its observation layer.
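The blind spot exists because removal leaves no trace in the output. The sketch below is a generic line-dropping filter in Python, not RTK's engine; `apply_filters`, `annotate`, and the notice format are hypothetical names chosen for illustration. It shows one way a filtering layer could keep its own edits visible: count what each pattern removed and append a notice that any reader of the output, human or model, can see.

```python
import re


def apply_filters(output, patterns):
    """Drop lines matching any pattern; return kept text and per-pattern drop counts."""
    compiled = [re.compile(p) for p in patterns]
    kept = []
    dropped = {p: 0 for p in patterns}
    for line in output.splitlines():
        # First pattern that matches the line claims the drop.
        hit = next((c for c in compiled if c.search(line)), None)
        if hit is None:
            kept.append(line)
        else:
            dropped[hit.pattern] += 1
    return "\n".join(kept), dropped


def annotate(filtered, dropped):
    """Append a notice so the reader knows lines were removed, and by which pattern."""
    total = sum(dropped.values())
    if total == 0:
        return filtered
    detail = ", ".join(f"{n} by /{p}/" for p, n in dropped.items() if n)
    notice = f"[filter notice: {total} line(s) removed: {detail}]"
    return f"{filtered}\n{notice}" if filtered else notice
```

Without the `annotate` step, the caller receives only the kept lines and cannot tell a filtered result from an unfiltered one, which is the condition the paragraph above describes.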
Why it matters
Pillar frames the root cause as trust laundering: content from a git remote, written by someone the developer never met, was granted the same authority as locally installed configuration. The trust boundary tracked content type (“is this a valid filter?”) instead of origin (“did the developer put this here?”). The same pattern recurs across AI development tooling — .bashrc, shell profiles, and agent-instruction files such as AGENTS.md and CLAUDE.md — where an inert-looking file is auto-loaded and acts with whatever authority the consuming tool grants it.
RTK adds a new surface to that pattern: the observation layer. A tool that controls what the model can see has security relevance even if it never executes anything, because an agent that cannot see a backdoor cannot flag it. The effect compounds in workflows where the same agent writes and reviews code: one .rtk/filters.toml blinds both passes at once, and compromised code can clear AI review and automated scanning and ship to production with the developer having no reason to doubt the “clean” result. The only access required is the ability to commit a small config file — well within reach of a contributor, a contractor, or a compromised account.
Defenses
- Upgrade RTK. The discoverer reports the full trust-model fix landed in v0.33.0; the GitLab advisory lists 0.32.0 as the fixed version. Update to the latest release either way, and confirm the version your engineers actually run.
- Draw the trust boundary at the source, not the content. Configuration that arrives from a repository should be untrusted by default and require explicit, per-repository opt-in. That is what RTK’s fix now enforces, with a warning to user and model plus a SHA-256 integrity check that revokes trust if the file changes after a git pull.
- Treat the observation layer as part of the supply chain. Any tool that preprocesses, filters, or transforms output before it reaches a coding agent shapes what the model can act on. Inventory those tools and review their trust model, even the ones that never execute code. See the related skill/registry supply-chain risks.
- Don’t let one agent be the only reviewer of its own output. Where feasible, review with an unfiltered view or an independent pipeline, so a single poisoned config cannot blind both generation and review.
- Audit repo-local agent config in code review. .rtk/filters.toml, agent-instruction files, and similar dotfiles deserve the same scrutiny as build scripts: they are executable trust, not decoration.
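The opt-in plus SHA-256 check described in the defenses above can be sketched as a small trust store. This is a minimal Python illustration under stated assumptions, not RTK's implementation: the JSON store format and the `trust`, `is_trusted`, `load_store`, and `file_digest` names are hypothetical. Trust is recorded for a path together with the digest of its current bytes, so any later change to the file invalidates the opt-in.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path):
    """SHA-256 hex digest of the file's current bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def load_store(store_path):
    """Read the trust store, or return an empty one if it does not exist yet."""
    p = Path(store_path)
    return json.loads(p.read_text()) if p.exists() else {}


def trust(config_path, store_path):
    """Record an explicit opt-in for this exact file content."""
    store = load_store(store_path)
    store[str(Path(config_path).resolve())] = file_digest(config_path)
    Path(store_path).write_text(json.dumps(store, indent=2))


def is_trusted(config_path, store_path):
    """True only if the user opted in and the file is unchanged since then."""
    recorded = load_store(store_path).get(str(Path(config_path).resolve()))
    return recorded is not None and recorded == file_digest(config_path)
```

A loader built on this would call `is_trusted` before reading a repo-local config and skip the file, with a warning, whenever it returns `False`. The store itself should live outside the repository, since a store the repository can write to would defeat the check.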
Status
| Item | Detail |
|---|---|
| CVE | CVE-2026-45792 (GHSA-fvvm-949w-qj4w) |
| Component | RTK (Rust Token Killer), filter hook for Claude Code |
| Weakness | CWE-345 (data authenticity) · CWE-426 (untrusted search path) |
| Severity | CVSS 4.0 6.9 Medium (discoverer) · CVSS 3.1 8.2 High (GLAD) |
| Fixed in | v0.33.0 (discoverer) · 0.32.0 (GLAD) — default-deny untrusted project filters + hash-based trust |
| Disclosure | Reported Mar 15, 2026 · patch Mar 25 · CVE assigned May 14 · public May 20, 2026 |
The takeaway is not “RTK is uniquely unsafe”; per the timeline above, the maintainers patched it within ten days of the report. The takeaway is the class: in AI-assisted development, the attack surface includes what doesn’t reach the model. An attacker who controls the filtering layer never has to inject anything into the agent’s context; they only have to remove the evidence of what they planted.