NSA Tests Anthropic's Mythos AI to Find Microsoft Security Flaws

The National Security Agency is running Anthropic's flagship AI model, Claude Mythos, against Microsoft software to hunt for cybersecurity vulnerabilities — and according to a Bloomberg report citing a U.S. official and a second person familiar with the matter, the agency's cyber team has been "impressed" with what the model can do.

It's the most concrete signal yet that frontier AI is moving from research demos into the operational core of how the U.S. government finds and fixes software flaws.

What the NSA is actually doing

The NSA was one of roughly 40 organizations Anthropic granted early access to Mythos, the same model that currently leads the GPQA Diamond reasoning benchmark with a 94.6% score. The agency's cybersecurity directorate has been pointing Mythos at popular software — Microsoft products are the named example — and benchmarking its bug-finding output against the sovereign cyber tools the NSA already uses.

The work is not a one-off experiment. It's a structured comparison. NSA staff feed code and binaries into the model, ask it to identify potential security flaws, and then measure how its findings stack up against the agency's existing pipeline. The early read, per Bloomberg's sources: Mythos is fast, efficient, and surfaces leads at a scale that would be hard to match with human researchers alone.
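As a rough illustration of what a structured comparison like this could involve, the sketch below compares two sets of reported findings and computes how many model-reported leads survive triage. It is not the NSA's method, which has not been disclosed. The `Finding` record, the file names, and the sample data are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical record for one reported flaw (not a real NSA or Anthropic format)."""
    file: str
    line: int
    kind: str


def compare(model: set[Finding], baseline: set[Finding]) -> dict[str, set[Finding]]:
    """Split findings into those both tools reported and those unique to each."""
    return {
        "both": model & baseline,
        "model_only": model - baseline,
        "baseline_only": baseline - model,
    }


def precision(confirmed: int, reported: int) -> float:
    """Share of reported leads that triage confirmed as real; 0.0 if nothing was reported."""
    return confirmed / reported if reported else 0.0


# Made-up sample data: three leads from the model, two from an existing pipeline.
model_findings = {
    Finding("parser.c", 120, "buffer-overflow"),
    Finding("auth.c", 44, "logic-error"),
    Finding("net.c", 310, "race-condition"),
}
baseline_findings = {
    Finding("parser.c", 120, "buffer-overflow"),
    Finding("io.c", 77, "use-after-free"),
}

result = compare(model_findings, baseline_findings)
# Assume, for illustration, that triage confirmed 2 of the model's 3 leads.
model_precision = precision(confirmed=2, reported=len(model_findings))
```

The `model_only` bucket is where a new tool would add value, and the precision figure is what determines how much analyst time goes to chasing leads that turn out to be nothing.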

What the agency has not disclosed is whether Mythos has actually turned up any new vulnerabilities. Officials told Bloomberg they did not know what, if any, security bugs the testing had produced — which could mean nothing material has been found, or simply that those findings live behind classification rules that prevent disclosure.

Why Microsoft is the test target

Microsoft's footprint inside U.S. government IT is enormous. Windows, Office, Exchange, Active Directory, Azure, and a long list of enterprise services are deployed at virtually every federal agency. A single high-severity vulnerability in any of those products is a national security issue, which is why the NSA has historically maintained dedicated teams that hunt for flaws in Microsoft code before adversaries do.

Microsoft itself is in on the work. In an April 22 security blog post, the company said it is collaborating with Anthropic through "Project Glasswing" and plans to incorporate Claude Mythos into its Security Development Lifecycle. Anything the AI finds inside Microsoft's own scanning will move through the Microsoft Security Response Center process and ship as either Update Tuesday patches or out-of-band fixes.

So there are two parallel programs running here: Microsoft using Mythos defensively to harden its own code, and the NSA using Mythos to independently probe Microsoft software the U.S. government depends on. Both can be true, and both raise the bar.

Why this matters

AI-driven vulnerability discovery has been a holy grail for the cybersecurity industry for a decade. The promise: a tool that can read code at machine speed, reason about how data flows through it, and surface the subtle bugs — race conditions, logic errors, memory mishandling — that humans miss. The risk, of course, is that the same tool can be turned around and used by attackers.

The NSA test is significant because it bears on a real question: whether frontier reasoning models can do useful work on real code at scale. Models like Mythos appear to have crossed that threshold, which reframes how every software vendor needs to think about security. If a U.S. intelligence agency can run an LLM against your binaries and surface flaws faster than you can patch them, so can a well-resourced adversary.

For Anthropic, this is also commercially significant. The company is currently locked out of Department of Defense contracts after refusing to sign "any lawful purpose" terms — yet the NSA, a separate agency, is happily running its model on classified problems. That's because Anthropic has not blocked national security or cybersecurity use of Claude. The company's red lines are around fully autonomous lethal weapons and domestic mass surveillance, not defensive cyber. The NSA work is a clean fit.

What experts are saying

Cybersecurity researchers tracking the Bloomberg report flagged two reactions. The first is straightforward: this validates a wave of AI-for-vulnerability-research startups that have raised significant funding over the past 18 months. If the NSA finds Mythos useful for code analysis, smaller security teams almost certainly will too.

The second reaction is more cautious. Vulnerability researchers note that "the model surfaces leads" is not the same as "the model finds real, exploitable bugs at scale." LLMs are still prone to hallucinated findings, confidently flagging code as buggy when it isn't, and triaging false positives is itself expensive. The NSA's benchmarking work could eventually settle that question publicly, but for now most of the data sits behind classification.

What's next

Three things to watch.

First, whether any of the eight Pentagon-cleared AI vendors that just got IL6 and IL7 access (a separate Friday announcement) will be tasked with similar cyber roles. The NSA work suggests demand is real.

Second, whether Microsoft's Project Glasswing produces a public timeline for AI-assisted patches. If the company can credibly say "Mythos found this CVE, we patched it in N days," it becomes a template the rest of the software industry will copy.

Third, whether Anthropic's broader government story shifts. Today the company is a defense outsider but a cyber insider. That's a workable position, but it depends on the lines holding.

The bottom line

A frontier AI model is now part of how the U.S. government hunts for security flaws in the software it runs every day. That's not a hypothetical — it's reportedly happening at the NSA right now. Whether you build software, run a security team, or just use Microsoft products at work, the calculus around how vulnerabilities get found and fixed has just changed. The era of AI-assisted offensive and defensive cyber is already here.
