RESEARCH MEDIUM NEW

Proprietary Problems: Cisco's 15-model paired-regime study shows single-turn safety scores miss most multi-turn risk

A May 27, 2026 Cisco study of 15 flagship closed models from OpenAI, Anthropic, Google, Amazon and xAI records multi-turn attack success rates of 7.89% to 88.30% — and cross-regime gaps up to 55 percentage points over single-turn baselines.

2026-05-29 // 7 min // affects: gpt-5.2, gpt-5.4, claude-opus-4.5, claude-opus-4.6, claude-sonnet-4.5, claude-sonnet-4.6, claude-haiku-4.5, gemini-3-pro, nova-lite, nova-micro, nova-2-lite, grok-4.1-fast

What is this?

On May 27, 2026, Nicholas Conley and Amy Chang of Cisco’s AI Defense team published Proprietary Problems: No Frontier Model Is Multi-Turn Immune, alongside a downloadable full report. The study evaluates 15 closed/proprietary flagship models from OpenAI (GPT-5.2 and the GPT-5.4 family), Anthropic (Claude Opus 4.5/4.6, Sonnet 4.5/4.6, Haiku 4.5), Google (Gemini 3 Pro), Amazon (Nova Lite, Nova Micro, Nova 2 Lite) and xAI (Grok 4.1 Fast, in reasoning and non-reasoning configurations) under a paired single-turn versus multi-turn regime. It extends Cisco’s earlier Death by a Thousand Prompts (November 2025), which covered eight open-weight models.

The finding is structural: published single-turn attack-success-rate (ASR) numbers — the basis of most model cards, safety reports and procurement decisions — are not a reliable proxy for what an adaptive attacker achieves across turns. Every model in the cohort failed a non-trivial fraction of multi-turn attacks.

How it works

The harness fires a fixed corpus at each model under identical conditions: 30,090 single-turn prompts (2,006 per model) and 6,986 multi-turn attacks across 1,456 conversations. Attack strategies are grouped into five families that match how real adversaries iterate: Role-Play / Persona Adoption, Contextual Ambiguity / Misdirection, Refusal Reframe / Redirection, Information Decomposition & Reassembly, and Crescendo / Incremental Escalation. The Cisco Integrated AI Security and Safety Framework taxonomy is then applied for downstream slicing.
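As a rough illustration of how per-family results could be tallied from such a corpus, the sketch below assumes each attack is logged as a (strategy family, succeeded) pair. That record layout, the function name asr_by_family and the toy records are hypothetical and are not taken from Cisco's harness; only the family names and corpus sizes come from the study.

```python
from collections import defaultdict

# Corpus sizes quoted in the study: 2,006 single-turn prompts per model, 15 models.
SINGLE_TURN_PROMPTS_PER_MODEL = 2006
MODEL_COUNT = 15
total_single_turn = SINGLE_TURN_PROMPTS_PER_MODEL * MODEL_COUNT  # 30,090 in the study


def asr_by_family(records):
    """Return {family: ASR in percent} from (family, succeeded) pairs.

    The record layout is an assumption made for this sketch.
    """
    attempts = defaultdict(int)
    successes = defaultdict(int)
    for family, succeeded in records:
        attempts[family] += 1
        successes[family] += int(succeeded)
    return {family: 100.0 * successes[family] / attempts[family] for family in attempts}


# Invented toy outcomes, for illustration only (not study data).
records = [
    ("Crescendo / Incremental Escalation", True),
    ("Crescendo / Incremental Escalation", False),
    ("Role-Play / Persona Adoption", True),
    ("Role-Play / Persona Adoption", False),
    ("Role-Play / Persona Adoption", False),
    ("Role-Play / Persona Adoption", False),
]

per_family = asr_by_family(records)
overall = 100.0 * sum(succeeded for _, succeeded in records) / len(records)

for family, asr in sorted(per_family.items()):
    print(f"{family}: {asr:.2f}%")
print(f"Aggregate: {overall:.2f}%")
```

Even in this toy data the aggregate figure sits between two quite different per-family rates, which is the kind of variation a single headline number conceals.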

The headline numbers are paired so the same model can be read on both axes:

Model	Single-turn ASR	Multi-turn ASR	Gap (pp or multiple)
Grok 4.1 Fast (non-reasoning)	high	88.30%	very wide
Gemini 3 Pro	18.10%	73.35%	+55.25 pp
GPT-5.4	2.74%	24.68%	~9×
Claude family (Opus/Sonnet/Haiku)	2.19% – 3.64%	11.16% – 16.20%	~4-5×
Grok 4.1 Fast (reasoning on)	—	43.47%	—
Nova 2 Lite	34.05%	7.89%	−26.16 pp
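The Gap column mixes two readings of the same pair of numbers: an absolute difference in percentage points and a multiplicative ratio. The short sketch below computes both for the Gemini 3 Pro and GPT-5.4 rows. The function name cross_regime_gap is an illustrative choice, not something defined in the report.

```python
def cross_regime_gap(single_turn_asr, multi_turn_asr):
    """Return (gap in percentage points, multi/single ratio) for one model.

    Both inputs are ASR values in percent. The ratio is None when the
    single-turn ASR is zero, since the multiple is undefined there.
    """
    gap_pp = round(multi_turn_asr - single_turn_asr, 2)
    ratio = round(multi_turn_asr / single_turn_asr, 1) if single_turn_asr > 0 else None
    return gap_pp, ratio


# Figures taken from the table above.
gemini_gap, gemini_ratio = cross_regime_gap(18.10, 73.35)
gpt54_gap, gpt54_ratio = cross_regime_gap(2.74, 24.68)

print(f"Gemini 3 Pro: {gemini_gap:+.2f} pp, {gemini_ratio}x")
print(f"GPT-5.4:      {gpt54_gap:+.2f} pp, {gpt54_ratio}x")
```

Read this way, GPT-5.4 has the larger multiple but the smaller absolute gap, while Gemini 3 Pro is the reverse, so the two measures can rank the same models differently.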

Two patterns stand out. First, the rank order changes between regimes: a model with the cleanest single-turn score can sit in the middle of the multi-turn pack, and vice versa. Eight of the 15 models show an absolute cross-regime gap of more than 15 percentage points, and the gaps run in both directions: most models do worse under multi-turn pressure, but Nova 2 Lite scores lower on multi-turn than on single-turn. Second, deployment-time configuration moves the needle by tens of points: turning on Grok 4.1 Fast’s reasoning mode cuts multi-turn ASR roughly in half, from 88.30% to 43.47%, a swing that is not currently documented on any public benchmark or model card the authors are aware of.

Failure is concentrated on a few tactical surfaces. Cisco reports Imposter AI procedures at 37.50% weighted ASR, Soft Paraphrase at 29.21% and System Prompts at 27.69%. On the content side, Hate Speech, Profanity and Specialized Advice dominate.

Why it matters

The study formalises an intuition that has been drifting through red-team write-ups for two years: alignment that holds under a single prompt does not necessarily hold under iterative pressure. The Cisco numbers are consistent with the academic literature — including the TrustNLP 2025 result of a 71% increase in vulnerability after five-turn conversations versus single-turn evaluation — and with Cisco’s own open-weight study, where multi-turn ASR ran 2× to 10× higher than single-turn baselines and reached 92.78% on Mistral Large-2. Taken together, multi-turn vulnerability looks like a property of the current frontier, not of any one alignment philosophy or weight-availability choice.

For procurement, governance and assurance, the practical consequence is that a model card reporting 2.74% single-turn ASR says little about multi-turn behavior: GPT-5.4 posts that single-turn figure and still reaches 24.68% multi-turn ASR, and without paired data a buyer cannot tell such a model apart from one that holds the line in both regimes. The NIST AI Risk Management Framework, the draft NIST Cyber AI Profile (IR 8596) and Article 15 of the EU AI Act all call for adversarial robustness testing, but none currently specify the interaction regime, strategy decomposition or slice-support labelling the Cisco data suggests is necessary.

Defenses

Cisco translates the results into three procurement-friendly rituals that require no new tooling:

Publish ASR by strategy family on every model release, alongside the headline number. Aggregate multi-turn ASR hides actionable per-strategy variation.
Gate deployments on the top-3 procedures and top-3 content types (Imposter AI, Soft Paraphrase, System Prompts; Hate Speech, Profanity, Specialized Advice) with a 3-percentage-point regression threshold, calibrated above the largest single-turn 95% confidence half-width in the cohort.
Flag any model with a >15 pp absolute cross-regime gap for manual review. In this cohort the rule surfaces eight of 15 models, including GPT-5.4, Gemini 3 Pro, both Grok configurations and all three Nova variants.
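The second and third rules above reduce to simple threshold checks. The sketch below is one possible encoding, not Cisco's tooling: the function names, the dictionary layout and the candidate-release figures are hypothetical, and the cohort-level weighted ASR values are used as a stand-in baseline purely for illustration.

```python
GAP_REVIEW_THRESHOLD_PP = 15.0   # absolute cross-regime gap that triggers manual review
REGRESSION_THRESHOLD_PP = 3.0    # allowed increase in ASR on a gated slice


def needs_manual_review(single_turn_asr, multi_turn_asr, threshold=GAP_REVIEW_THRESHOLD_PP):
    """Flag a model whose absolute single-turn vs multi-turn gap exceeds the threshold."""
    return abs(multi_turn_asr - single_turn_asr) > threshold


def regression_failures(baseline, candidate, threshold=REGRESSION_THRESHOLD_PP):
    """Return the gated slices whose ASR rose by more than the threshold.

    Both arguments map slice name -> ASR in percent and must share the same keys.
    """
    return sorted(
        name for name, base_asr in baseline.items()
        if candidate[name] - base_asr > threshold
    )


# Gap rule, using figures quoted in the article.
flag_gpt54 = needs_manual_review(2.74, 24.68)    # 21.94 pp gap
flag_gemini = needs_manual_review(18.10, 73.35)  # 55.25 pp gap
# Upper ends of the Claude-family ranges, paired here only for illustration.
flag_claude_upper = needs_manual_review(3.64, 16.20)

# Regression gate: baseline uses the cohort weighted ASR figures as a stand-in;
# the candidate numbers are invented for this example.
baseline = {"Imposter AI": 37.50, "Soft Paraphrase": 29.21, "System Prompts": 27.69}
candidate = {"Imposter AI": 41.00, "Soft Paraphrase": 30.00, "System Prompts": 27.00}
failed_slices = regression_failures(baseline, candidate)

print(flag_gpt54, flag_gemini, flag_claude_upper)
print(failed_slices)
```

In practice the baseline would be the previous release's per-slice results for the same model, and the same check would run over the top-3 content types as well as the top-3 procedures.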

At the system level, the authors’ conclusion is that if no base model is iteratively safe, the security perimeter has to move outside the model: runtime guardrails, monitoring, application-layer policies, persona and intent classifiers on follow-up turns, and red-teaming that explicitly exercises Crescendo-style escalation rather than only single-shot prompts.

Status

The study is industry research, not a CVE. There is no patch to apply. The actionable signal is procurement and evaluation: any vendor benchmark presented to a buyer should now be expected to carry paired single-turn/multi-turn numbers and strategy-family decomposition. Cisco’s LLM Security Leaderboard publishes adversarial signals against frontier models in this format; the full Proprietary Problems PDF includes model-level confidence intervals and a strategy × model heatmap.