GLM-5.1 Tops SWE-Bench Pro, Beating GPT-5.4 and Claude
Krasa AI
2026-04-12
4-minute read
Z.ai (formerly Zhipu AI) just released GLM-5.1, an open-source AI model that has claimed the number one spot on SWE-Bench Pro — the gold standard for measuring how well AI can solve real-world software engineering tasks. The model outperforms both GPT-5.4 and Claude Opus 4.6, and it was trained entirely on Huawei chips with zero Nvidia involvement.
Why this matters: for the first time, an open-source model has definitively beaten every closed frontier model on the most demanding coding benchmark in the industry. That changes the economics of AI-powered software development for everyone.
What GLM-5.1 Actually Is
GLM-5.1 is a 754-billion-parameter Mixture-of-Experts model with 40 billion parameters active per token. It supports a 200,000-token context window and can generate outputs up to 131,000 tokens long. The model is a post-training upgrade to the earlier GLM-5, meaning Z.ai took an already strong base and refined it specifically for agentic coding tasks.
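The MoE arithmetic is easy to check from the published figures alone: only a small fraction of the weights fire on each token, which is what keeps inference cost closer to a 40B dense model than a 754B one. A quick back-of-envelope sketch:

```python
# Back-of-envelope check of GLM-5.1's stated MoE figures (754B total, 40B active).
total_params = 754e9   # total parameters, per the release
active_params = 40e9   # parameters active per token

active_fraction = active_params / total_params
print(f"Active fraction per token: {active_fraction:.1%}")  # ~5.3% of the weights
```

In other words, each forward pass touches roughly one in twenty parameters, which is the core trade-off MoE architectures make: dense-model quality targets at a sparse-model compute budget.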
The key numbers: on SWE-Bench Pro, which tests a model's ability to resolve real GitHub issues using just an instruction prompt and a 200K context window, GLM-5.1 scored 58.4. That edges out GPT-5.4 at 57.7, Claude Opus 4.6 at 57.3, and Gemini 3.1 Pro at 54.2.
The 8-Hour Workday Feature
The most striking capability isn't a benchmark score — it's endurance. GLM-5.1 can work continuously and autonomously on a single task for up to eight hours. That's not a typo. Eight hours of sustained, autonomous execution on a complex software project.
Previous models would typically produce a basic skeleton and declare the task complete. GLM-5.1 takes a fundamentally different approach. In demonstrations, the model autonomously built out a complete web application including a file browser, terminal emulator, text editor, system monitor, and even functional games. It then iteratively polished the styling and interaction logic until it delivered a visually consistent, production-quality result.
Why this matters: this is the closest any AI model has come to replacing a full workday of developer effort on a single complex task. It moves AI coding assistants from "helpful autocomplete" to "autonomous teammate."
The Huawei Chip Story
GLM-5.1 was trained entirely on Huawei's Ascend chips — no Nvidia hardware was used at any point in the process. This is significant for two reasons.
First, it demonstrates that competitive frontier models can be built without access to Nvidia's dominant GPU ecosystem. Second, it shows that Chinese AI labs are making genuine progress on the domestic chip supply chain, despite ongoing US export restrictions designed to limit China's access to advanced AI hardware.
Z.ai hasn't disclosed detailed training costs, but the fact that they achieved frontier-competitive results on non-Nvidia hardware suggests the cost-performance gap between Huawei and Nvidia silicon may be narrowing faster than many analysts expected.
Open Source Under MIT License
GLM-5.1 is released under the MIT license, making it one of the most permissively licensed frontier-class models available. The weights are publicly available on both Hugging Face and ModelScope, meaning anyone can download, modify, and deploy the model commercially.
This stands in sharp contrast to the trend toward closed models. Meta's recent Muse Spark abandoned the company's open-source tradition entirely. OpenAI's GPT-5.4 remains proprietary. Against that backdrop, Z.ai releasing a benchmark-topping model as fully open source is a deliberate strategic statement.
What This Means for Developers
For individual developers and small teams, GLM-5.1 represents a genuine alternative to expensive API subscriptions. If you're running complex coding tasks, you can now self-host a model that matches or exceeds what you'd get from OpenAI or Anthropic — at the cost of your own compute.
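"At the cost of your own compute" is worth making concrete. A rough memory estimate, assuming nothing beyond the published 754B parameter count (actual serving footprints also need KV-cache and activation memory on top of this):

```python
# Rough weight-memory footprint for self-hosting a 754B-parameter MoE model.
# Note: all experts must stay resident even though only 40B params are active per token.
total_params = 754e9

for precision, bytes_per_param in [("BF16", 2), ("FP8", 1), ("INT4", 0.5)]:
    terabytes = total_params * bytes_per_param / 1e12
    print(f"{precision}: ~{terabytes:.2f} TB of weight memory")
```

Even at aggressive 4-bit quantization the weights alone approach 0.4 TB, so "self-host" here realistically means a multi-GPU node rather than a workstation.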
For enterprises, the eight-hour autonomous execution capability opens up new workflows. Complex migration projects, large-scale refactoring, and multi-file feature implementations could potentially be delegated to GLM-5.1 with minimal human oversight.
The broader implication is competitive pressure. When an open-source model tops the leaderboard, it forces closed-model providers to justify their pricing with capabilities that go beyond raw benchmark performance — better safety, reliability, integration, and support.
The Bottom Line
GLM-5.1 is a milestone for open-source AI. A Chinese lab, using Chinese-made chips, has produced a model that outperforms every Western frontier model on the most rigorous coding benchmark available — and released it for free. Whether you're a developer evaluating coding assistants or an investor tracking the AI landscape, this is a development worth watching closely. The weights are on Hugging Face right now if you want to try it yourself.