Cisco: Frontier AI Models Fail Multi-Turn Attacks, Benchmarks Miss It

Cisco's AI threat intelligence team published research this week showing that nearly every closed frontier AI model — across OpenAI, Anthropic, Google, Amazon, and xAI — collapses under realistic multi-turn attacks. The published safety benchmarks vendors lean on, the researchers say, miss almost all of this behavior.

The findings, released May 27, pose a serious problem for enterprise buyers: the safety scores cited in vendor marketing don't predict how a model actually holds up under real attack conditions.

What Cisco Tested

The Cisco team evaluated 15 closed flagship models from the five biggest AI labs. The test scale was substantial: 30,090 single-turn prompts and 6,986 multi-turn attacks across 1,456 conversations.

Single-turn attacks are what most published safety benchmarks measure — one prompt, one judgment of whether the model refused appropriately. Multi-turn attacks are different. Attackers build context over multiple messages, adopt personas, paraphrase rejected requests, and escalate gradually. That's how real-world adversaries operate.
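To make the distinction concrete, here is a minimal Python sketch of how the two attack success rates (ASR) could be tallied. It is an illustration, not Cisco's scoring method: the function names, the made-up outcomes, and the rule that a conversation counts as broken if any single turn succeeds are all assumptions.

```python
from typing import List


def single_turn_asr(outcomes: List[bool]) -> float:
    """Fraction of one-shot prompts where the attack succeeded."""
    return sum(outcomes) / len(outcomes)


def multi_turn_asr(conversations: List[List[bool]]) -> float:
    """Fraction of conversations with at least one successful turn.

    Assumed criterion: one successful turn is enough to count the
    whole conversation as a successful attack.
    """
    broken = sum(any(turns) for turns in conversations)
    return broken / len(conversations)


# Made-up outcomes for illustration only.
# True means the model complied with a harmful request.
single = [False, False, False, True]
convos = [
    [False, False, True],   # model gives in on the third turn
    [False, False, False],  # model holds throughout
    [False, True, False],   # model gives in on the second turn
    [False, False, False],
]

print(f"single-turn ASR: {single_turn_asr(single):.0%}")  # 25%
print(f"multi-turn ASR:  {multi_turn_asr(convos):.0%}")   # 50%
```

The point of the sketch is the unit of measurement: a single-turn score grades each prompt in isolation, while a multi-turn score grades the whole conversation, so an attacker gets several chances to succeed.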

The gap between the two attack styles turned out to be enormous.

The Numbers Are Brutal

Multi-turn attack success rates ranged from 7.89% to 88.30% across the 15 models tested. A few specific results:

OpenAI's GPT-5.4: 2.74% single-turn attack success → 24.68% multi-turn (roughly a 9x increase)
Google's Gemini 3 Pro: 18.10% → 73.35% (roughly a 4x increase)
Anthropic's Claude: passed 97% of single-turn tests but still failed 16% of multi-turn attacks

Why this matters: a model that looks 97% safe on single-turn tests can still fail roughly 1 in 6 multi-turn attacks. For enterprises deploying AI in customer-facing or sensitive contexts, that's not a rounding error. It's a regulatory and reputational problem waiting to happen.
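The ratios behind these figures can be recomputed directly from the reported percentages. The short Python check below uses only the numbers quoted above; the variable and function names are illustrative.

```python
# Attack success rates in percent, as quoted above: (single-turn, multi-turn).
reported = {
    "GPT-5.4": (2.74, 24.68),
    "Gemini 3 Pro": (18.10, 73.35),
}


def fold_increase(single_turn: float, multi_turn: float) -> float:
    """How many times larger the multi-turn rate is than the single-turn rate."""
    return multi_turn / single_turn


multipliers = {
    name: round(fold_increase(single, multi), 2)
    for name, (single, multi) in reported.items()
}
print(multipliers)  # {'GPT-5.4': 9.01, 'Gemini 3 Pro': 4.05}

# A 16% multi-turn failure rate expressed as "1 in N".
claude_multi_turn_failure_pct = 16.0
one_in_n = 100 / claude_multi_turn_failure_pct
print(one_in_n)  # 6.25
```

So GPT-5.4's multi-turn rate is about nine times its single-turn rate, Gemini 3 Pro's is about four times, and a 16% failure rate works out to one failure in every 6.25 attacks.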

The Attack Techniques

The Cisco researchers cataloged the techniques attackers used. Single-turn attack procedures produced weighted success rates of 37.5% for "Imposter AI" (impersonating another AI system), 29.2% for soft paraphrase attacks (gently rewording requests until they slip through), and 27.7% for system-prompt attacks (trying to override the underlying instructions).

But it was the multi-turn approaches that did the real damage. Attackers reframe their request mid-conversation, build context over several turns, adopt a sympathetic persona ("I'm a researcher studying harm"), and escalate in small steps. Each individual turn looks fine. The aggregate doesn't.

This pattern matches what red teamers have been saying informally for over a year: single-prompt jailbreaks are mostly patched. The way you actually break a frontier model now is by having a conversation with it.

Why the Benchmarks Miss This

Most public AI safety benchmarks — including the ones vendors quote in safety cards and system cards — measure single-turn behavior. That's a methodology choice, not a comment on the underlying threat. Single-turn is cheaper to grade, easier to standardize, and faster to run.

Cisco's argument is that this methodology gives a misleading picture of how safe a model is in deployment. The research explicitly calls out that the gap between published scores and observed multi-turn resilience is wide enough to misrank leading models — meaning the "safest" model on the leaderboard might not actually be the safest in practice.

The researchers concluded that multi-turn vulnerability is a structural property of current frontier AI, not an artifact of any particular lab's training choices. In other words: nobody has solved this.

What This Means for Enterprises

For the enterprises piling into AI deployment in 2026 — financial services, healthcare, legal — the Cisco research lands at an uncomfortable moment. Most procurement teams currently lean on vendor safety reports and public benchmarks when picking a model. Those signals are, per Cisco, materially weaker than they look.

Practically, security teams should be asking vendors:

What's your multi-turn attack success rate, not just single-turn?
Do you have red-team data on persona attacks and gradual escalation?
How is the deployed system (not just the base model) tested under adversarial conversation?

The good news: defenses do exist. Conversation-level monitoring, guardrail systems that look at the trajectory of a chat rather than individual messages, and rate-limited persona challenges can all help. They're just not part of most baseline AI deployments.
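As a rough illustration of what trajectory-level monitoring means, the Python sketch below keeps a decaying running total of per-message risk and flags the conversation once that total crosses a limit, even if no single message does. Everything here is hypothetical: the `TrajectoryMonitor` class, the thresholds, the decay factor, and the per-message risk scores are assumptions chosen for the example, not anything described in Cisco's report or taken from a real guardrail product.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class TrajectoryMonitor:
    # Hypothetical per-message risk scorer returning a value in [0, 1].
    score_turn: Callable[[str], float]
    per_turn_limit: float = 0.8    # assumed threshold for a single message
    trajectory_limit: float = 1.5  # assumed threshold for the running total
    decay: float = 0.9             # older turns count slightly less
    running: float = 0.0
    history: List[float] = field(default_factory=list)

    def observe(self, message: str) -> str:
        """Score one user message and return a decision for it."""
        risk = self.score_turn(message)
        self.history.append(risk)
        self.running = self.running * self.decay + risk
        if risk >= self.per_turn_limit:
            return "block_turn"
        if self.running >= self.trajectory_limit:
            return "flag_conversation"
        return "allow"


# Toy scorer: looks up made-up risk values for four escalating messages.
toy_scores = {"turn 1": 0.4, "turn 2": 0.5, "turn 3": 0.6, "turn 4": 0.7}
monitor = TrajectoryMonitor(score_turn=lambda msg: toy_scores[msg])

demo_decisions = [monitor.observe(msg) for msg in toy_scores]
print(demo_decisions)
# ['allow', 'allow', 'allow', 'flag_conversation']
```

No message in the demo reaches the per-turn limit of 0.8, so a filter that checks each message in isolation would let all four through. The running total crosses 1.5 on the fourth turn, which is the kind of gradual escalation a conversation-level check is meant to catch.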

What Industry Watchers Are Saying

Security industry coverage has been blunt. SiliconANGLE summarized the report as finding that "no closed frontier AI model is safe from multi-turn attacks." Network World framed it as Cisco showing standard AI safety benchmarks "miss the real threat."

Within AI safety circles, the research is being received as confirmation of what red teamers already suspected. Researchers have warned for months that single-turn benchmarks were the easy version of the problem and that real attack patterns look conversational.

What's Next

Cisco's data is likely to feed into the next round of AI safety benchmarking work, and pressure on vendors to publish multi-turn attack success rates will probably grow. Expect at least some of the major labs to respond — either by publishing their own multi-turn numbers or by funding independent third-party evaluations.

Regulators are paying attention too. The EU AI Act's high-risk system requirements will eventually need adversarial testing standards, and "multi-turn resilience" is the obvious gap to close.

The Bottom Line

The takeaway for anyone building on top of frontier AI: don't trust single-turn safety numbers as a proxy for deployment safety. Real attackers don't send one prompt — they have a conversation. And per Cisco's data, every major frontier model is meaningfully more vulnerable to that than the public benchmarks suggest.

If your AI deployment depends on the model refusing harmful requests, you need conversation-level monitoring, not just a good safety score.
