Huawei's Ascend 950 'Supernode' Powers DeepSeek V4 Without Nvidia
Krasa AI
2026-04-24
Huawei confirmed on Friday that it provided the compute backbone for DeepSeek's newly released V4 model, combining large clusters of its Ascend 950 AI accelerators under a proprietary "Supernode" interconnect fabric. The partnership is the most public evidence to date that China's domestic AI stack can train a frontier-class model without Nvidia silicon.
For U.S. policymakers and for Nvidia, that's a meaningful moment. For China, it's a validation that three years of aggressive investment in domestic chip capacity is starting to pay off at the frontier.
What Supernode is
Huawei's Supernode technology is a high-bandwidth interconnect system that stitches together thousands of Ascend 950 chips into a single logical training cluster. It's Huawei's answer to Nvidia's NVLink and InfiniBand-based fabrics: the engineering glue that lets thousands of accelerators act like one giant chip for training purposes.
Details on Supernode's bandwidth, topology, and latency are sparse. Huawei has disclosed that the system exists and is being used in production, but hasn't published benchmark data that would allow a direct comparison with Nvidia's fabrics. Analysts outside China say the best guess is that Supernode is roughly competitive with Nvidia's H100-era interconnect but not yet at parity with the Blackwell generation.
The Ascend 950 itself is Huawei's latest AI accelerator, positioned as a direct competitor to Nvidia's H100 and H200 chips. It's been shipping in volume since late 2025 and is the current workhorse of China's domestic AI industry.
Why this matters
U.S. export controls have, since late 2022, progressively tightened the restrictions on what Nvidia silicon can be sold to Chinese customers. The H100, H200, and Blackwell generation are all off the table. The downgraded H20 variant was also restricted earlier this year.
The policy goal was to limit China's ability to train frontier models. The result, so far, has been to accelerate the domestic Chinese chip industry instead. DeepSeek's V4, trained on Ascend 950 clusters, is the first frontier-class model publicly acknowledged to have been built entirely outside the Nvidia stack.
The upshot: the policy assumption that restricting chip exports would slow China's AI frontier is now empirically testable. V4's benchmarks will decide whether that assumption holds.
The technical claim
DeepSeek has not published detailed training infrastructure numbers for V4, but the model itself, a 1.6T-parameter mixture-of-experts with 49B active parameters and a 1M-token context, is by any measure the product of a frontier-scale training run. Analysts estimate the training cost at between $50M and $200M, depending on assumptions about chip count and training time.
If that estimate holds, V4's training cost would be roughly in line with the figures Chinese analysts have cited, and substantially below what U.S. frontier labs spend on comparable training runs. Whether that reflects a real efficiency advantage or state-subsidized pricing on Ascend chips is the open question.
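For readers who want to see how sensitive such estimates are to their inputs, here is a minimal back-of-the-envelope sketch in Python. The parameter counts come from the figures above; the chip counts, training durations, and per-chip-hour price are purely hypothetical placeholders, not numbers reported by Huawei, DeepSeek, or any analyst.

```python
# Figures quoted in the article.
TOTAL_PARAMS = 1.6e12   # 1.6T total parameters
ACTIVE_PARAMS = 49e9    # 49B active parameters


def active_fraction(active: float, total: float) -> float:
    """Share of the model's total parameters that are active."""
    return active / total


def training_cost_usd(chips: int, days: float, usd_per_chip_hour: float) -> float:
    """Simple rental-style estimate: chips x hours x hourly price."""
    return chips * days * 24 * usd_per_chip_hour


if __name__ == "__main__":
    share = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
    print(f"Active parameter share: {share:.2%}")

    # HYPOTHETICAL scenarios (chips, days, USD per chip-hour).
    # None of these inputs are reported figures; they only illustrate
    # how the estimate moves with chip count and training time.
    scenarios = [
        (20_000, 60, 2.0),
        (50_000, 80, 2.0),
    ]
    for chips, days, price in scenarios:
        cost = training_cost_usd(chips, days, price)
        print(f"{chips:,} chips x {days} days at ${price:.2f}/chip-hour: ${cost / 1e6:.1f}M")
```

With these placeholder inputs the two scenarios come out at about $58M and $192M, which shows how plausible-looking changes in chip count and schedule alone can span most of the $50M to $200M range. The sketch ignores everything other than raw chip-hours, such as staff, failed runs, and data costs.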
Industry impact
For Nvidia, the strategic picture is complicated. The company's official position is that China represents a small and shrinking share of its revenue, and that its core markets in the U.S. and allied countries more than make up the difference. That's true at the revenue level. But the longer-term concern is whether a mature Chinese AI silicon ecosystem eventually exports to third countries — the Middle East, Southeast Asia, parts of Africa — where U.S. export controls don't apply and price sensitivity is higher.
For the U.S. policy community, V4 is going to reopen the debate about whether the chip export regime is working. The hawks will argue it's slowing China's AI frontier by some margin and should be tightened further. The skeptics will argue it's accelerating Chinese self-sufficiency and creating a long-term strategic competitor in a market where there wouldn't have been one otherwise.
For enterprise buyers, the practical effect is more immediate. V4 is open-source and the weights are downloadable now. Whether U.S. and European enterprises actually deploy it is a separate question — one that increasingly depends on geopolitics rather than pure technical merit.
What analysts are saying
Early reactions from chip analysts outside China were measured. The consensus read is that Supernode + Ascend 950 is a meaningful engineering accomplishment but is still roughly one generation behind the Nvidia state of the art. Whether that gap widens, narrows, or holds steady over the next two years is the single most important question for the long-term trajectory of U.S.-China AI competition.
Bloomberg reported that Huawei is privately telling Chinese AI labs that a successor to the Ascend 950 — informally referred to as the Ascend 960 — is expected to tape out later this year and deploy in 2027. If that timeline holds, it would be the first Chinese chip to credibly compete with Nvidia's Blackwell generation on per-chip performance.
What's next
Huawei has not confirmed the Ascend 960 timeline publicly. DeepSeek has said V4's general availability will follow the preview release within weeks, along with a technical paper that may or may not disclose training infrastructure details.
Watch for two things. First, how independent benchmark houses score V4 against U.S. frontier models in the coming weeks: the training hardware story only matters if the model itself is actually competitive. Second, whether other major Chinese AI labs (Alibaba's Qwen team, Moonshot, Zhipu) publicly commit to Huawei silicon for their next training runs. Broader adoption would turn V4's individual achievement into an ecosystem shift.
Bottom line
The Ascend 950 + Supernode story is not about one model. It's about whether China can build and sustain a parallel AI hardware stack independent of U.S. export controls. Friday's launch is the strongest evidence yet that the answer is yes. How much that matters depends on whether Huawei can keep closing the gap with Nvidia over the next two generations — or whether the gap starts widening again.