Google, Microsoft & xAI Now Submit AI Models for US Government Testing Before Launch

The US government just gained significantly more oversight of frontier AI. Google DeepMind, Microsoft, and xAI have all signed agreements with the Commerce Department's Center for AI Standards and Innovation (CAISI) — giving federal evaluators access to powerful AI models before those systems ever reach the public.

This announcement, confirmed this week, means five of the most powerful AI labs in the world are now formally part of the US government's pre-deployment review program. OpenAI and Anthropic joined earlier, and now the program covers virtually the entire frontier AI landscape.

Why This Is Happening Now

The timing isn't accidental. This expansion follows a wake-up call from Anthropic's Mythos model — a powerful system that demonstrated it could identify network vulnerabilities at scale. That disclosure pushed AI safety concerns from a theoretical debate into an urgent policy issue.

The White House has been weighing a formal executive order to vet new AI models, with National Economic Council Director Kevin Hassett comparing the potential process to FDA drug approval. The CAISI program expansion represents a significant step in that direction, even before any executive order is signed.

The agreements build on existing partnerships established in 2024, which have since been renegotiated to reflect directives from the Commerce Secretary and the administration's AI Action Plan.

What CAISI Actually Does

CAISI (the Center for AI Standards and Innovation, housed within NIST and formerly known as the US AI Safety Institute) serves as the government's primary interface for evaluating commercial AI systems. Its work isn't rubber-stamp approval: evaluators actively probe for risks before a model ships to the public.

Evaluations cover what the government calls "demonstrable risks": cybersecurity vulnerabilities, biosecurity implications, and the potential for AI to assist in chemical weapons development. So far, CAISI has completed more than 40 model assessments.

The scope also extends beyond pre-launch. CAISI can continue testing after deployment, meaning models remain under government scrutiny even after they reach users.

The program includes an interagency task force called TRAINS (Testing Risks of AI for National Security), which now draws participants from more than 10 federal agencies operating under CAISI leadership.

What the Companies Agreed To

Google DeepMind, Microsoft, and xAI will share unreleased versions of their AI models with government evaluators. The agreements cover "testing, collaborative research, and best practice development related to commercial AI systems."

That's a meaningful commitment. It requires companies to give the government access to models that haven't been fine-tuned, safety-filtered, or otherwise prepared for public release: the raw, most capable versions of these systems.

CAISI also recently published its evaluation of DeepSeek V4 Pro, the most capable Chinese AI model to date. That assessment found the model lags behind the US frontier by roughly 8 months across domains including cyber, software engineering, natural sciences, and mathematics — a rare piece of public comparative intelligence on where the competition stands.

What This Means for the Industry

This is the most significant expansion of formal AI oversight in the United States since the Biden administration's 2023 executive order. But it's voluntary — none of these agreements are legally mandated. Companies are signing on because the political winds are clearly shifting, and getting ahead of regulation is typically better than reacting to it.

The question everyone in the industry is asking: does voluntary compliance become the template for mandatory compliance? If the program proves effective at identifying real risks without slowing development too much, the answer is probably yes.

For enterprises evaluating AI vendors, this creates a new signal worth tracking. A company's participation in — and transparency about — government testing will increasingly become a proxy for trustworthiness in regulated industries like finance, healthcare, and defense.

What's Next

The White House is still drafting an executive order to formalize AI vetting processes, though no timeline has been set. The CAISI program is likely to serve as the operational foundation for whatever framework emerges.

If you're building on top of frontier models from any of these five labs, the key takeaway is that pre-deployment government review is now the norm, not the exception. That's likely to make major model releases more deliberate — and potentially slower — but it also provides a degree of institutional confidence that hasn't existed before.

The frontier AI market just got a new gatekeeper. How that changes the pace of innovation remains to be seen.
