INDIRECT INJECTION MEDIUM

Discourse AI XSS (CVE-2026-27740): when LLM output is trusted as HTML

A flagged post, an AI moderator, an htmlSafe call. The Discourse AI plugin treated LLM output as trusted markup, turning indirect prompt injection into Staff-side XSS. Published March 19, 2026.

2026-05-26 // 6 min affects: discourse-ai, discourse < 2026.3.0-latest.1, discourse < 2026.2.1, discourse < 2026.1.2

What is this?

On March 19, 2026, the Discourse team published GHSA-95hc-42c6-wvvr and assigned CVE-2026-27740 — a stored cross-site scripting flaw in the Discourse AI plugin, triggered through indirect prompt injection of an LLM moderator. The bug landed in NVD that same day with a CVSS 4.0 score of 5.1 (MEDIUM) and is fixed in Discourse 2026.3.0-latest.1, 2026.2.1 and 2026.1.2.

The shape of the bug is small and instructive: an AI triage automation reads a flagged post, asks an LLM to summarise the reason, and renders that summary in the Staff Review Queue. The renderer calls htmlSafe on the LLM’s response. An attacker crafts a post designed to manipulate the model into returning a <script> tag inside its “reason”, and the payload fires the next time a moderator opens the queue. This is OWASP LLM05 — Improper Output Handling — in the wild (OWASP LLM Top 10).

How it works

Three actors are in play: the attacker who writes a post, the AI triage job that calls an LLM, and the Staff member who opens the Review Queue.

[Attacker post]
   │ contains content designed to manipulate the LLM's "reason" field
   ▼
[Discourse AI triage automation]
   │ llm_triage.rb — sends the post to the configured LLM
   ▼
[LLM response]
   │ contains attacker-controlled markup, e.g. a script-style payload
   ▼
[I18n template "discourse_automation.scriptables.llm_triage.flagged_post"]
   │ interpolates llm_response and automation_name as `htmlSafe`
   ▼
[Review Queue UI in Staff browser]
   │ payload executes in an authenticated admin session
   ▼
[Session token / admin actions / config tampering]

The root cause is documented in the patch commit 44b84439. Before the fix, plugins/discourse-ai/lib/automation/llm_triage.rb passed the LLM’s reply straight into the translation call:

I18n.t(
  "discourse_automation.scriptables.llm_triage.flagged_post",
  base_path: Discourse.base_path,
  llm_response: result,                 # raw LLM output
  automation_name: automation&.name.to_s
)

The flagged_post template rendered those values with htmlSafe, Ember’s explicit opt-out from HTML escaping (the front-end counterpart of Rails’ html_safe). The patch wraps both fields with ERB::Util.html_escape:

llm_response: ERB::Util.html_escape(result),
automation_name: ERB::Util.html_escape(automation&.name.to_s),

Equivalent fixes ship in plugins/discourse-ai/lib/personas/tools/flag_post.rb and plugins/discourse-ai/lib/agents/tools/flag_post.rb. No exploit payload is reproduced here; the public commit diff is the canonical reference for defenders.
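
To see what the escaping step changes, here is a minimal sketch that uses only the Ruby standard library. The TEMPLATE string and the render_flag_reason helper are hypothetical stand-ins: the real translation text lives in the Discourse locale files and is not reproduced here, and Kernel#format stands in for I18n.t interpolation. Only ERB::Util.html_escape is taken from the patch.

```ruby
require "erb"

# Hypothetical stand-in for the flagged_post translation string.
TEMPLATE = "Flagged by %{automation_name}: %{llm_response}"

# Mirrors the shape of the patched call: both interpolated values are
# HTML-escaped before they reach a template that is later marked safe.
def render_flag_reason(llm_response:, automation_name:)
  format(
    TEMPLATE,
    llm_response: ERB::Util.html_escape(llm_response),
    automation_name: ERB::Util.html_escape(automation_name.to_s)
  )
end

# A generic markup string, not the exploit: after escaping it is inert text.
puts render_flag_reason(
  llm_response: "<script>alert(1)</script>",
  automation_name: "triage"
)
```

With escaping in place the browser displays the angle brackets as literal characters instead of parsing them as a tag, which is the whole fix.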

The user interaction (UI:P in the CVSS vector) is the Staff member opening the queue — which they are paid to do. That is what makes a CVSS 5.1 “medium” feel heavier than its score in practice: the trigger is part of the moderator’s normal workflow.

Why it matters

Two patterns make this CVE worth attention beyond Discourse itself.

The first is the trust boundary mistake. The Discourse AI plugin treated the LLM as an internal, trusted component — the same way you would trust a server-side template — and so its output was rendered as markup. But the LLM’s output is a function of attacker-controlled input. Every byte of an LLM response inherits the trust level of the most adversarial token in its context. The right mental model is the one the OWASP LLM project pushes in LLM05: an LLM is an untrusted user whose output must be sanitised for every downstream sink — HTML, SQL, shell, file paths, URLs. The Discourse codebase is otherwise careful about this on the post-content path, where Markdown is sanitised by an allow-list; the AI feature simply skipped that step because the data “came from us”.
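
The "sanitise for every downstream sink" rule can be sketched as a single choke point that encodes model output according to where it is going. This is an illustration under assumptions, not Discourse code: encode_for_sink and the sink names are hypothetical, and only standard-library encoders are used. SQL is deliberately absent, because the safe approach there is a parameterised query rather than string escaping.

```ruby
require "erb"
require "shellwords"
require "uri"

# Hypothetical helper: encode one untrusted string for a named sink.
# Unknown sinks fail closed instead of passing the value through.
def encode_for_sink(untrusted, sink)
  case sink
  when :html  then ERB::Util.html_escape(untrusted)
  when :shell then Shellwords.escape(untrusted)
  when :url   then URI.encode_www_form_component(untrusted)
  else raise ArgumentError, "no encoder registered for sink #{sink.inspect}"
  end
end

llm_reason = %q(<b>x</b>; echo "hi")
puts encode_for_sink(llm_reason, :html)
puts encode_for_sink(llm_reason, :shell)
puts encode_for_sink(llm_reason, :url)
```

The design point is the else branch: a new sink added without an encoder raises instead of silently rendering raw model output.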

The second is the moderator-as-target twist that comes with AI triage features generally. Every product that ships an “LLM reviews user content for the admin” workflow inherits the same delivery vehicle: the attacker writes content that the admin’s tooling renders inside the admin’s own session. The same pattern has appeared in customer-support copilots, security-orchestration AI summarisers, and SOC ticket triage. If your application has an LLM summarising untrusted input into an admin-facing UI, you have the precondition for this class of bug. Treat the audit of those rendering paths as a P1.

Defenses

The fix Discourse shipped is the floor, not the ceiling. The general lessons are larger than one plugin.

Escape LLM output at every HTML sink. Treat htmlSafe, html_safe, dangerouslySetInnerHTML, v-html, bypassSecurityTrustHtml and friends as security-sensitive primitives that require a written justification. For LLM output specifically, the safe default is the framework’s standard HTML escape (ERB::Util.html_escape, Rails’ automatic escaping of any string not marked html_safe, React’s default text rendering, Django’s autoescape). If you need rich text from an LLM, pass it through an allow-list sanitiser (Sanitize gem, DOMPurify, bleach) and never render it raw.
Enable Content Security Policy and report violations. Discourse enables CSP by default since 2.2, which materially constrains what a <script> injected into the Review Queue can do. For any AI-facing admin UI, a strict CSP without unsafe-inline and with a reporting endpoint (report-uri, or its successor report-to) is the second line of defence when a sink slips through. CSP violation reports also give you the detection signal called out in SentinelOne’s advisory writeup.
Constrain the LLM’s output shape. AI triage does not need to emit HTML. Force the model into structured output — JSON schema, Pydantic / Zod validation, OpenAI structured outputs, Anthropic tool-use — and reject responses that contain HTML or markup tokens before they reach a renderer. A reason: string field validated against a regex of allowed characters would have killed this CVE upstream of the templating layer.
Threat-model every “LLM summarises user content for admin” path. Inventory the AI features your product ships, list the LLM input sources (user posts, emails, tickets, web fetches), and list the rendering sinks (admin panels, Slack notifications, email digests, SIEM enrichment). Each (source, sink) pair is a candidate for the same bug. Map it, decide on the sanitisation contract, and write a regression test that asserts a payload-like string round-trips as text, not markup.
Patch and rotate. If you run Discourse with the AI plugin enabled, update to one of the patched releases (2026.3.0-latest.1, 2026.2.1, 2026.1.2). If you cannot upgrade immediately, the temporary workaround in the advisory is to disable AI triage automation scripts. If any flagged post was opened in the Review Queue before you patched, treat the moderator session tokens issued in that window as potentially exposed; Discourse’s GHSA page describes the rotation flow.
Read the patch, not the score. The CVSS 5.1 reflects the low-privilege precondition and required user interaction, both of which are inherent to a moderation workflow. The realistic blast radius is “any admin who does their job opens the queue”, which is closer to a session-takeover bug than a typical medium-severity XSS. Use the patch commits — 44b84439, 8ae7cb24, ed70949f — as your reference for grep-equivalent reviews of your own LLM-output rendering paths.
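
The "constrain the LLM’s output shape" defence above can be sketched as a validator that sits between the model and the renderer. Everything here is an assumption for illustration: the JSON contract with flag and reason keys, the REASON_PATTERN allow-list, and the parse_triage_output and InvalidTriageOutput names are hypothetical and do not describe how Discourse AI parses responses.

```ruby
require "json"
require "erb"

# Hypothetical allow-list: letters, digits, spaces and basic punctuation.
# No angle brackets, ampersands, double quotes or newlines. Note \A and \z:
# in Ruby, ^ and $ match per line and would let a second line slip through.
REASON_PATTERN = /\A[\p{L}\p{N} .,;:!?()'\-]{1,500}\z/

class InvalidTriageOutput < StandardError; end

# Parses the assumed {"flag": bool, "reason": string} contract and rejects
# anything else before it can reach a template.
def parse_triage_output(raw)
  data = JSON.parse(raw)
  raise InvalidTriageOutput, "expected a JSON object" unless data.is_a?(Hash)

  flag = data["flag"]
  reason = data["reason"]
  raise InvalidTriageOutput, "flag must be a boolean" unless [true, false].include?(flag)
  unless reason.is_a?(String) && reason.match?(REASON_PATTERN)
    raise InvalidTriageOutput, "reason is missing or contains disallowed characters"
  end

  # Escape anyway: validation and output encoding are separate layers.
  { flag: flag, reason: ERB::Util.html_escape(reason) }
rescue JSON::ParserError => e
  raise InvalidTriageOutput, "not valid JSON: #{e.message}"
end

puts parse_triage_output('{"flag": true, "reason": "Spam link in first post."}')

begin
  parse_triage_output('{"flag": true, "reason": "<script>alert(1)</script>"}')
rescue InvalidTriageOutput => e
  puts "rejected: #{e.message}"
end
```

The second call doubles as the regression test suggested in the threat-modelling item: a payload-like string must either be rejected or come back as escaped text, never as markup.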

Status

Item	Reference	Date	Notes
CVE published	NVD	2026-03-19	CVE-2026-27740
GitHub Security Advisory	Discourse	2026-03-19	GHSA-95hc-42c6-wvvr
Patched releases	Discourse	2026-03	`2026.3.0-latest.1`, `2026.2.1`, `2026.1.2`
CVSS 4.0	NVD	2026-03-19	5.1 MEDIUM, CWE-79
Vulnerable code paths	Patch commits	2026-03	`llm_triage.rb`, `flag_post.rb` (personas + agents)
Workaround	Advisory	2026-03-19	Disable AI triage automation

The right summary of CVE-2026-27740 is not “Discourse had an XSS bug” — that part is routine. It is “an LLM was treated as a trusted server-side component, and its output went through htmlSafe”. The same trust mistake exists in any product that lets a model write into an admin-facing render path. The Discourse patch is a useful litmus test: if you cannot point at the line in your own codebase where LLM output gets HTML-escaped before reaching the browser, you have not finished fixing this class of bug.