HappyHorse-1.0: What Creators Need to Know
Alibaba's anonymous model topped every benchmark. We break down the scores, the controversy, and what it means for your workflow.
What is HappyHorse-1.0?
HappyHorse-1.0 is a 15-billion-parameter video generation model built by the Future Life Lab inside Alibaba's Taotian Group. It processes text, image, video, and audio tokens together in a single unified transformer — a clean architecture that eliminates the multi-stage pipelines most competitors rely on.
The model appeared on the Artificial Analysis Video Arena on April 7, 2026 with no attribution. Three days later Alibaba confirmed ownership, BABA stock jumped 4%, and the AI video generation community had a new benchmark leader to evaluate.
The benchmark scores
HappyHorse dominates the Artificial Analysis leaderboard across every category:
- Text-to-video (no audio): Elo 1,365 — 95 points ahead of Seedance 2.0 (1,270) and 118 ahead of Kling 3.0 (1,247)
- Text-to-video (with audio): Elo 1,230 — narrower lead over Seedance 2.0 (1,221)
- Image-to-video: Elo 1,415 — the largest gap of any category
These are blind human-preference scores, not automated metrics. Real evaluators watched pairs of clips and picked the better one. That makes the margins meaningful: a 118-point Elo gap means HappyHorse was preferred roughly 66% of the time in head-to-head matchups against Kling 3.0.
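That 66% figure falls out of the standard Elo expected-score formula, E = 1 / (1 + 10^(-gap/400)). A quick sketch using the gaps from the leaderboard above:

```python
def elo_win_probability(delta: float) -> float:
    """Expected win rate of the higher-rated model given an Elo gap `delta`."""
    return 1.0 / (1.0 + 10.0 ** (-delta / 400.0))

p_kling = elo_win_probability(118)     # ~0.66 preference rate vs Kling 3.0
p_seedance = elo_win_probability(95)   # ~0.63 preference rate vs Seedance 2.0
```

The same formula shows why the 9-point audio-included gap is a near tie: it implies only about a 51% preference rate.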
What it does well
Motion consistency is the headline. Characters and objects move through scenes without the flickering artifacts that still plague most models. Physical interactions — cloth draping, water splashing, objects with correct weight — approach the world-accurate simulation standard that Sora 2 set.
Audio is the other standout. HappyHorse generates synced dialogue in seven languages — Mandarin, Cantonese, English, Japanese, Korean, German, and French — in a single pass alongside video. Generation speed is competitive at roughly 38 seconds per 1080p clip on an H100, with support for 16:9, 9:16, and five other aspect ratios.
The stealth launch controversy
The anonymous submission was deliberate. By withholding Alibaba's name, HappyHorse earned its leaderboard position through blind preference alone — no brand halo, no marketing push. The strategy worked as intended.
The controversy came after. Alibaba described HappyHorse as "open," but as of mid-April no model weights were downloadable, no GitHub repository existed, and HuggingFace pages showed "Coming Soon." API testing through Alibaba Cloud Bailian began April 27, with commercial release planned for May 2026. The gap between "open" claims and actual availability drew criticism from the research community.
How it compares to models you can use today
Benchmarks measure preference in controlled tests. Production workflows care about availability, features, and ecosystem maturity. Here is the current picture:
- Kling 3.0 is the only model with multi-shot sequences — up to 6 camera cuts in a single generation with consistent characters across every cut. Max clip length is 15 seconds with native audio. Benchmark rank: #3.
- Sora 2 still sets the bar for photoreal physics and texture fidelity. 12-second clips, native audio, and the most natural lighting of any model. Not yet ranked on the current leaderboard.
- Veo 3.1 offers the most precise camera direction of any model — dolly, crane, tracking, and orbital movements execute faithfully from the prompt. 8-second clips with native audio.
- Seedance 2.0 is the speed-first option, rendering most clips in under 60 seconds: fast enough to iterate 10 times while a slower model finishes once. Benchmark rank: #2, just 9 points behind HappyHorse on audio-included tests.
HappyHorse leads on raw preference scores but lacks proven multi-shot support, has no public production track record, and its API is days old. The four models above have months of creator feedback, stable APIs, and established tooling around them.
Should you wait for HappyHorse?
If you need to ship content this week, the answer is no. Every major use case — narrative sequences, photoreal hero shots, fast social content, director-controlled camera work — is covered by models available on PonPon right now.
If you are evaluating the market, keep HappyHorse on your radar. The benchmark scores are real and the architecture is elegant. Once the API stabilizes and independent creators run it through real production workflows, we will know whether the leaderboard advantage translates to better finished work.
The practical move: open the multi-model workspace, generate the same prompt across Kling 3.0, Sora 2, Veo 3.1, and Seedance 2.0, and let the output decide. Benchmarks inform — your own eyes choose.
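The fan-out step above can be sketched as a simple parallel dispatch loop. Everything here is a placeholder: `generate_clip` stands in for whatever client call your workspace exposes, and the model identifiers are illustrative strings, not a real PonPon API.

```python
from concurrent.futures import ThreadPoolExecutor

# Models to compare side by side; names mirror the article's shortlist.
MODELS = ["kling-3.0", "sora-2", "veo-3.1", "seedance-2.0"]

def generate_clip(model: str, prompt: str) -> dict:
    # Placeholder for a real video-generation request; returns metadata only.
    return {"model": model, "prompt": prompt, "status": "queued"}

def fan_out(prompt: str) -> list[dict]:
    # Submit the same prompt to every model in parallel, preserving order.
    with ThreadPoolExecutor(max_workers=len(MODELS)) as pool:
        return list(pool.map(lambda m: generate_clip(m, prompt), MODELS))

results = fan_out("a horse galloping through shallow surf at golden hour")
```

The point of the pattern is that one prompt produces four comparable outputs in a single pass, so the comparison stays blind to brand the same way the Arena's evaluations are.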