GLM-5.1 Tops SWE-Bench Pro, Beating GPT-5.4 and Claude
Krasa AI
2026-04-12
4-minute read
Z.ai (formerly Zhipu AI) just released GLM-5.1, an open-source AI model that has claimed the number one spot on SWE-Bench Pro — the gold standard for measuring how well AI can solve real-world software engineering tasks. The model outperforms both GPT-5.4 and Claude Opus 4.6, and it was trained entirely on Huawei chips with zero Nvidia involvement.
Why this matters: for the first time, an open-source model has definitively beaten every closed frontier model on the most demanding coding benchmark in the industry. That changes the economics of AI-powered software development for everyone.
What GLM-5.1 Actually Is
GLM-5.1 is a 754-billion-parameter Mixture-of-Experts model with 40 billion parameters active per token. It supports a 200,000-token context window and can generate outputs up to 131,000 tokens long. The model is a post-training upgrade to the earlier GLM-5, meaning Z.ai took an already strong base and refined it specifically for agentic coding tasks.
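The MoE arithmetic is easy to check from the published figures alone: only a small fraction of the weights fire on each token, which is what keeps inference cost closer to a 40B dense model than a 754B one. A quick back-of-envelope sketch:

```python
# Back-of-envelope check of GLM-5.1's stated MoE figures (754B total, 40B active).
total_params = 754e9   # total parameters, per the release
active_params = 40e9   # parameters active per token

active_fraction = active_params / total_params
print(f"Active fraction per token: {active_fraction:.1%}")  # ~5.3% of the weights
```

In other words, each forward pass touches roughly one in twenty parameters, which is the core trade-off MoE architectures make: dense-model quality targets at a sparse-model compute budget.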
The key numbers: on SWE-Bench Pro, which tests a model's ability to resolve real GitHub issues using just an instruction prompt and a 200K context window, GLM-5.1 scored 58.4. That edges out GPT-5.4 at 57.7, Claude Opus 4.6 at 57.3, and Gemini 3.1 Pro at 54.2.
The 8-Hour Workday Feature
The most striking capability isn't a benchmark score — it's endurance. GLM-5.1 can work continuously and autonomously on a single task for up to eight hours. That's not a typo. Eight hours of sustained, autonomous execution on a complex software project.
Previous models would typically produce a basic skeleton and declare the task complete. GLM-5.1 takes a fundamentally different approach. In demonstrations, the model autonomously built out a complete web application including a file browser, terminal emulator, text editor, system monitor, and even functional games. It then iteratively polished the styling and interaction logic until it delivered a visually consistent, production-quality result.
Why this matters: this is the closest any AI model has come to replacing a full workday of developer effort on a single complex task. It moves AI coding assistants from "helpful autocomplete" to "autonomous teammate."
The Huawei Chip Story
GLM-5.1 was trained entirely on Huawei's Ascend chips — no Nvidia hardware was used at any point in the process. This is significant for two reasons.
First, it demonstrates that competitive frontier models can be built without access to Nvidia's dominant GPU ecosystem. Second, it shows that Chinese AI labs are making genuine progress on the domestic chip supply chain, despite ongoing US export restrictions designed to limit China's access to advanced AI hardware.
Z.ai hasn't disclosed detailed training costs, but the fact that they achieved frontier-competitive results on non-Nvidia hardware suggests the cost-performance gap between Huawei and Nvidia silicon may be narrowing faster than many analysts expected.
Open Source Under MIT License
GLM-5.1 is released under the MIT license, making it one of the most permissively licensed frontier-class models available. The weights are publicly available on both Hugging Face and ModelScope, meaning anyone can download, modify, and deploy the model commercially.
This stands in sharp contrast to the trend toward closed models. Meta's recent Muse Spark abandoned the company's open-source tradition entirely. OpenAI's GPT-5.4 remains proprietary. Against that backdrop, Z.ai releasing a benchmark-topping model as fully open source is a deliberate strategic statement.
What This Means for Developers
For individual developers and small teams, GLM-5.1 represents a genuine alternative to expensive API subscriptions. If you're running complex coding tasks, you can now self-host a model that matches or exceeds what you'd get from OpenAI or Anthropic — at the cost of your own compute.
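"At the cost of your own compute" is worth making concrete. A rough memory estimate, assuming nothing beyond the published 754B parameter count (actual serving footprints also need KV-cache and activation memory on top of this):

```python
# Rough weight-memory footprint for self-hosting a 754B-parameter MoE model.
# Note: all experts must stay resident even though only 40B params are active per token.
total_params = 754e9

for precision, bytes_per_param in [("BF16", 2), ("FP8", 1), ("INT4", 0.5)]:
    terabytes = total_params * bytes_per_param / 1e12
    print(f"{precision}: ~{terabytes:.2f} TB of weight memory")
```

Even at aggressive 4-bit quantization the weights alone approach 0.4 TB, so "self-host" here realistically means a multi-GPU node rather than a workstation.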
For enterprises, the eight-hour autonomous execution capability opens up new workflows. Complex migration projects, large-scale refactoring, and multi-file feature implementations could potentially be delegated to GLM-5.1 with minimal human oversight.
The broader implication is competitive pressure. When an open-source model tops the leaderboard, it forces closed-model providers to justify their pricing with capabilities that go beyond raw benchmark performance — better safety, reliability, integration, and support.
The Bottom Line
GLM-5.1 is a milestone for open-source AI. A Chinese lab, using Chinese-made chips, has produced a model that outperforms every Western frontier model on the most rigorous coding benchmark available — and released it for free. Whether you're a developer evaluating coding assistants or an investor tracking the AI landscape, this is a development worth watching closely. The weights are on Hugging Face right now if you want to try it yourself.