Microsoft's AI Found 16 Windows Flaws Humans Missed

Microsoft revealed today that an internal AI system called MDASH — built from more than 100 specialized agents working together — found 16 Windows vulnerabilities that were patched in this month's Patch Tuesday release. Four of those flaws are rated critical remote code execution bugs, meaning an attacker could take over an unpatched machine over the network with no user interaction required.

What Is MDASH?

MDASH stands for Multi-model Agentic Scanning Harness. Microsoft's Autonomous Code Security team built it to do something that's extremely hard to do at scale: find exploitable bugs in a massive, decades-old codebase like Windows before attackers do.

The system orchestrates more than 100 specialized AI agents, each focused on a different vulnerability class. Some agents hunt for memory corruption bugs; others look for race conditions; others specialize in authentication flaws. They run in parallel, comparing notes and debating findings before escalating anything to a human reviewer. The goal is to stop a vulnerability from entering the wild by finding it first, faster, and at a volume no security team could match alone.

Microsoft says MDASH is "model-agnostic," meaning it mixes frontier models (the biggest, most capable AI systems) with smaller distilled models — so it can run both deep analysis and rapid, low-cost screening at the same time.

The May 2026 Findings

This month's MDASH discoveries center on two Windows subsystems: IKEv2 (the protocol Windows uses for VPN and IPSec connections) and TCP/IP's handling of IPv6 traffic. The two most severe findings include:

CVE-2026-33824 (CVSS 9.8): A double-free vulnerability in ikeext.dll. An unauthenticated attacker can send a specially crafted IKEv2 packet to any Windows machine with IPSec enabled, triggering remote code execution — no login required, no target interaction needed.

CVE-2026-33827 (CVSS 8.1): A race condition in tcpip.sys. An attacker on the same network segment can send a malformed IPv6 packet to cause remote code execution when IPSec is active.

Both flaws exist deep in Windows networking code that runs at high privilege. The fact that MDASH found them — before any external researcher or attacker reported them — is notable. Microsoft says 100% of cases in its private test driver were caught with zero false positives.

Why This Matters for Security

Security teams have long dreamed of automated tools that can find real, exploitable bugs rather than just flagging code patterns. Traditional static analysis tools generate enormous amounts of noise — thousands of warnings, most of which aren't actually exploitable. Bug bounty programs rely on human researchers who are creative but expensive and slow at scale.

MDASH changes the math. On the public CyberGym benchmark — a suite of 1,507 real-world vulnerability reproduction tasks drawn from open-source projects — MDASH scored 88.45%, about five percentage points ahead of the next-best system. (Anthropic's Claude Mythos, which the NSA has also been testing for security research, held the previous benchmark record before MDASH's debut.)

The real-world signal is even more striking: in retrospective testing against five years of confirmed vulnerabilities in Windows networking components, MDASH caught 96% of known bugs in one driver and 100% in another. Humans reviewing the same code had missed them.

What This Means for Attackers and Defenders

There are two ways to read this announcement. For defenders, MDASH represents a genuine leap: a system that can continuously audit a massive codebase and surface critical flaws in days or weeks rather than the months or years it might take a human team to stumble across the same issue.

For attackers, the same technology points in an unsettling direction. If Microsoft's defenders can use 100 AI agents to find bugs, well-resourced adversaries can deploy similar systems offensively. The same agentic techniques that make vulnerability research faster for the good guys also lower the barrier for sophisticated threat actors to find zero-days of their own.

Microsoft's security team is clearly aware of this arms race dynamic — the blog post announcing MDASH explicitly frames the system as "defense at AI speed," a phrase that acknowledges attackers are accelerating too.

Expert Reaction

Security researchers have flagged the benchmark numbers as a landmark moment. "88% on CyberGym isn't a toy demo — that's production-grade bug hunting," wrote security researcher Katie Moussouris on X. "This is the first time I've seen an AI system outperform the collective output of a major bug bounty program on a reproducible benchmark."

Others pointed to the governance implications. The system's findings went through human validation before being patched — Microsoft says reviewers confirmed every MDASH finding before it made it into Patch Tuesday. That human-in-the-loop step matters: an AI that flags false positives at scale could consume more engineering time than it saves.

What's Next

Microsoft says MDASH is now running continuously across Windows, scanning new code commits and flagging potential issues before they ship. The company hasn't announced plans to sell or license the technology externally, though security-as-a-service applications seem like a natural next step.

Patch Tuesday for May 2026 addressed 138 vulnerabilities in total across Microsoft products, with 30 rated critical — a large batch by recent standards. Security teams should prioritize the IKEv2 and TCP/IP fixes (CVE-2026-33824 and CVE-2026-33827 in particular) given their high CVSS scores and their location in code that's exposed on any network-connected Windows machine running IPSec.

The bottom line: AI has crossed a meaningful threshold in security research. Microsoft's MDASH isn't a research prototype — it's finding real bugs that shipped in real patches. The question for every security team now isn't whether AI will change vulnerability research. It already has.

Microsoft's AI Found 16 Windows Flaws Humans Missed

Microsoft's AI Found 16 Windows Flaws Humans Missed

What Is MDASH?

The May 2026 Findings

Why This Matters for Security

What This Means for Attackers and Defenders

Expert Reaction

What's Next

Sources

Don't fall behind

Related Articles

Anthropic Launches Claude Science and Enters Drug Discovery

AI Uncovers Squidbleed, a 29-Year-Old Squid Proxy Bug

Anthropic Launches Claude Fable 5: Its Most Capable Model Yet