ShengShu Launches Vidu Q3, Lands $293M From Alibaba Cloud

Chinese AI video startup ShengShu Technology launched Vidu Q3 Reference-to-Video globally today and simultaneously confirmed a nearly RMB 2 billion (about $293 million) Series B round led by Alibaba Cloud. The one-two announcement puts ShengShu in direct competition with OpenAI's Sora and Google's Veo, and it signals that Alibaba is betting its cloud unit's future on owning the video layer of generative AI.

The launch is more than an incremental update. Vidu Q3 Reference-to-Video lets creators feed the model multiple reference inputs — subjects, environments, costumes, props, and visual styles — and stitch them into a single coherent video. That's a capability gap that has held back production use of Sora and Veo so far.

What Vidu Q3 Actually Does

Vidu Q3 Reference-to-Video is designed around story-driven creation rather than one-shot clips. Creators can combine reference images of a character, a setting, and a specific wardrobe, then generate video that keeps all of them consistent across shots.

The model supports six types of cinematic effects: particle systems, fluid simulation, dynamic motion, camera movement, transitions, and lighting. It generates up to 16 seconds of synchronized audio and video in a single pass, with multi-shot composition, camera control, background music, sound effects, and multilingual dialogue all handled natively in the model rather than bolted on through separate pipelines.

That native audio-video synchronization is a piece many video AI pipelines still handle separately, with audio added through other tools after the visuals are generated. Vidu Q3 treats the two as a single generation problem.

Why the Benchmark Ranking Matters

At launch, Vidu Q3 took the No. 1 global spot on the Artificial Analysis video model benchmark — the closest thing the field has to an objective scoreboard. That ranking is likely to shift as competitors respond, but it matters right now because it forces enterprise buyers to actually evaluate a Chinese model rather than defaulting to U.S. options.

The model plugs into ShengShu's broader product ecosystem: Vidu Agent (autonomous video creation), Vidu Claw (a pro creator tool), and the consumer Vidu App. The goal is an end-to-end system: one that takes a creative brief and produces finished video, instead of forcing teams to stitch together a half-dozen tools.

The Alibaba Cloud Bet

The Series B is arguably the bigger strategic story. Alibaba Cloud led the nearly RMB 2 billion round, with participation from Andon Haitang, China Internet Investment Fund, TAL Education Group, Luminous Ventures, and existing shareholders including Baidu Ventures. The round brings ShengShu's total funding to over $500 million since its 2023 founding.

Why this matters: Alibaba Cloud is Alibaba's most important growth engine, and its competitive position in China depends on having best-in-class generative AI to sell alongside compute. Leading the ShengShu round tells the market Alibaba expects Vidu to be the video model that runs on its cloud — and it's willing to underwrite the cost of getting there.

Tencent and ByteDance have their own video AI pushes (Hunyuan Video and Doubao Seaweed, respectively), but neither has landed a similar cloud-anchored enterprise partnership this cleanly.

The AGI Narrative

ShengShu's official framing is that the funding will advance its "Foundation World Model" — an AI system designed to simulate and understand both physical and digital environments. The company describes two pillars: a World Generation Model (WGM) powering the Vidu family, and a World Action Model (WAM) designed for physical-world interaction.

This mirrors the direction DeepMind and others are taking with world models as a potential route to AGI. Whether that framing holds up, or whether Vidu ends up being, in practice, a very good video product, is an open question. For now, the world-model language is probably more useful as a recruiting and fundraising narrative than as a concrete product roadmap.

What This Means for the U.S. Labs

Sora and Veo have had the U.S. enterprise video AI market largely to themselves for the past year. That changes today. An Alibaba-backed, benchmark-leading competitor with native audio-video generation and a real reference-to-video workflow means U.S. buyers now have a credible alternative — and a pricing check.

For creators and production studios, the practical upshot is that reference-to-video is now a feature competitors have to match. Consistent characters across shots, wardrobe fidelity, and native audio generation are the capabilities that separate demo-ware from production tools. Vidu Q3 is the first widely available model that bundles them.

How to Access It

Vidu Q3 Reference-to-Video is live today through Vidu.com, the Vidu App, and the Vidu API. Integrations with third-party platforms including WaveSpeedAI and Novita AI are also available, which lowers switching costs for teams already running pipelines on those providers.

Expect a pricing war. ShengShu will almost certainly aim to undercut Sora and Veo on cost-per-second to win share, and Alibaba Cloud has every reason to help. For any team currently paying for AI video generation, running a bake-off against Vidu Q3 in the next two weeks is worth the effort.
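For teams that do run such a bake-off, a sticker price per second can mislead if one model needs more retries to produce usable footage. The sketch below is one way to compare effective cost per usable second. The provider labels, prices, and acceptance counts are placeholder assumptions for illustration, not real pricing or test results, and the `BakeoffRun` structure is hypothetical rather than part of any vendor API.

```python
from dataclasses import dataclass


@dataclass
class BakeoffRun:
    provider: str               # label only; not tied to any real price list
    price_per_second: float     # USD per generated second (placeholder value)
    seconds_generated: float    # total seconds of video requested in the test
    seconds_accepted: float     # seconds a reviewer judged usable

    def cost_per_usable_second(self) -> float:
        """Total spend divided by the seconds that survived review."""
        if self.seconds_accepted <= 0:
            return float("inf")  # nothing usable: effectively unbounded cost
        total_spend = self.price_per_second * self.seconds_generated
        return total_spend / self.seconds_accepted


def rank_runs(runs: list[BakeoffRun]) -> list[BakeoffRun]:
    """Return runs ordered by effective cost, cheapest first."""
    return sorted(runs, key=lambda r: r.cost_per_usable_second())


# Placeholder numbers for illustration only; not real prices or results.
runs = [
    BakeoffRun("model_a", price_per_second=0.10,
               seconds_generated=160, seconds_accepted=80),
    BakeoffRun("model_b", price_per_second=0.15,
               seconds_generated=160, seconds_accepted=160),
]

for run in rank_runs(runs):
    print(f"{run.provider}: ${run.cost_per_usable_second():.3f} per usable second")
```

In this made-up case the model with the lower list price ends up more expensive per usable second, because half of its output was rejected in review. The same comparison works with whatever acceptance criteria a team already uses for its footage.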

The bottom line: today's dual announcement makes ShengShu the most credible non-U.S. player in generative video, with the Alibaba partnership giving it both capital and distribution. The next question is how fast OpenAI and Google respond — and whether enterprise buyers start treating the Chinese model as a mainstream option rather than a curiosity.

Sources: ShengShu Vidu Q3 Launch | Bloomberg on Alibaba's $300M Bet | CNBC Coverage
