WAN 2.7: What Creators Need to Know
Alibaba's open-source video suite topped the leaderboard with a 27-billion-parameter MoE architecture. Here is what the model does, what it costs, and how it compares.
What Is WAN 2.7?
WAN 2.7 is a suite of four AI video models released by Alibaba's Wan research team in early April 2026 under the Apache 2.0 license. Unlike most commercial models that focus on a single generation mode, WAN 2.7 bundles text-to-video, image-to-video, reference-to-video with voice cloning, and instruction-based video editing into one package — each built on a shared 27-billion-parameter Mixture-of-Experts (MoE) transformer backbone.
The model debuted at the top of the Artificial Analysis text-to-video leaderboard with an Elo score of 1762, surpassing LTX-2.3 Pro (1484) and Alibaba's previous benchmark leader HappyHorse-1.0 (1446) by a wide margin. For creators watching the AI video generation landscape, WAN 2.7 marks the point where open-source models stopped trailing commercial ones and started leading.
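To put the score gap in concrete terms, the sketch below converts Elo differences into an expected head-to-head preference rate. It assumes the standard Elo expectation formula with a 400-point scale; the leaderboard's exact rating method is not described here, so treat the percentages as an approximation. The function name is illustrative.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected share of head-to-head wins for A over B (standard Elo, 400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Elo scores quoted in the paragraph above
WAN_27 = 1762
RIVALS = {"LTX-2.3 Pro": 1484, "HappyHorse-1.0": 1446}

for name, rating in RIVALS.items():
    p = elo_expected_score(WAN_27, rating)
    print(f"WAN 2.7 vs {name}: expected preference rate {p:.0%}")
```

Under that assumption, a 278-point lead corresponds to WAN 2.7 being preferred in roughly 83% of blind comparisons against LTX-2.3 Pro, and a 316-point lead to roughly 86% against HappyHorse-1.0.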
The Four Models Inside WAN 2.7
Text-to-video (T2V) generates clips from natural language prompts with native 1080p output and no upscaling artifacts. The standout feature is Thinking Mode — a chain-of-thought reasoning system that analyzes your prompt before generation begins. It parses spatial relationships, plans composition, determines subject placement and lighting direction, then generates. The result is more spatially coherent output, particularly for complex multi-object scenes.
Image-to-video (I2V) animates reference images with strong identity preservation. The model supports a 9-grid layout approach that lets you provide multiple reference angles for more detailed scene composition — a practical improvement over single-image conditioning.
Reference-to-video with voice cloning (R2V) generates character video from a single reference image plus an audio sample. The model clones the voice from the audio and synchronizes lip movements to the generated speech. This puts character-driven content — testimonials, explainers, social media presenters — within reach of creators who have no filming setup.
Video editing (VideoEdit) takes an existing clip and applies instruction-based modifications. Describe the change in natural language — "change the background to a beach at sunset," "make the jacket red instead of blue" — and the model executes while preserving the original motion and timing.
Key Technical Details
- Architecture: 27B-parameter Mixture-of-Experts transformer with selective expert routing
- Output resolution: Native 1080p HD without upscaling
- Reference support: Up to 5 simultaneous reference images for complex multi-subject scenes
- Frame control: First-and-last-frame specification — you define both the starting and ending frames, and WAN 2.7 generates the motion between them
- License: Apache 2.0 (fully open for commercial use)
- API pricing: From $0.10 per second of generated video via Alibaba Cloud
- Self-hosting: Model repository on GitHub (15,000+ stars). Alibaba's established release pattern is for open weights to follow 4-8 weeks after the cloud launch, so confirm the weights are published before planning a self-hosted deployment
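For readers unfamiliar with "selective expert routing", here is a minimal sketch of top-k gating, the general technique used by Mixture-of-Experts layers. It is not WAN 2.7's implementation: the expert count, the top-k value, and the softmax gate are all assumptions chosen for illustration, and the "experts" are toy scalar functions.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits, top_k=2):
    """Pick the top_k experts for one token and renormalize their gate weights."""
    probs = softmax(router_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    return {i: probs[i] / norm for i in chosen}


def moe_layer(token, experts, router_logits, top_k=2):
    """Weighted sum of the selected experts' outputs; unselected experts never run."""
    gates = route_token(router_logits, top_k)
    return sum(weight * experts[i](token) for i, weight in gates.items())


# Toy setup (hypothetical): 8 experts that each scale their input differently
experts = [lambda x, s=s: x * s for s in range(1, 9)]
random.seed(0)
logits = [random.gauss(0, 1) for _ in experts]

print("selected experts and weights:", route_token(logits))
print("layer output:", moe_layer(1.0, experts, logits))
```

The point of the design is that only a few experts run per token, so compute per step is lower than the total parameter count suggests, even though all 27 billion parameters still have to be stored.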
How It Compares to Models You Can Use Today
Leaderboard scores measure preference in controlled blind tests. Production workflows care about availability, features, ecosystem maturity, and time-to-output. Here is the current picture:
- Kling 3.0 remains the only consumer model with multi-shot sequences — up to 6 camera cuts with consistent characters across every cut. Max clip length is 15 seconds with native audio. WAN 2.7 does not offer equivalent multi-shot capability.
- Sora 2 is no longer available following its April 2026 shutdown, but it set the standard for photoreal physics, skin texture, and lighting fidelity. The quality benchmark it established is the one WAN 2.7 is now being measured against.
- Seedance 2.0 renders most clips in under 60 seconds, fast enough to iterate 10 times while a more complex model finishes once. WAN 2.7's generation speed on the cloud API is competitive, but self-hosted performance depends entirely on your hardware.
- Veo 3.1 offers the most precise camera direction of any model. Dolly, crane, tracking, and orbital movements execute faithfully from prompt descriptions.
WAN 2.7 leads on raw text-to-video preference scores and offers the broadest feature set of any single model (four modes in one). But it lacks proven multi-shot support, its API is weeks old, and self-hosting a 27B-parameter model requires significant GPU infrastructure.
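A rough sense of what "significant GPU infrastructure" means comes from the weights alone. The sketch below multiplies the 27 billion parameters by the bytes each one occupies at common numeric precisions. The precision options are assumptions (the article does not say which formats WAN 2.7 ships in), and the figures exclude activations, text encoders, and other runtime overhead, so real requirements will be higher.

```python
PARAMS = 27e9  # parameter count stated in the article

# Bytes per parameter at common precisions (assumed options, not confirmed for WAN 2.7)
BYTES_PER_PARAM = {"fp32": 4.0, "fp16/bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Memory needed to hold the weights only, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


for precision, size in BYTES_PER_PARAM.items():
    print(f"{precision:>10}: {weight_memory_gb(PARAMS, size):6.1f} GB")
```

At 16-bit precision the weights alone come to about 54 GB, which is why a single 24 GB consumer card would need aggressive quantization or offloading. Note that a Mixture-of-Experts model still needs every expert in memory even though only some run per token.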
The Open-Source Factor
The Apache 2.0 license is the real story. Previous open-source video models (Wan 2.1, HunyuanVideo) proved that researchers could build competitive models with open weights. WAN 2.7 shows they can build the top-ranked one.
For creators, the practical implications depend on technical comfort:
- If you self-host: Zero per-generation cost after hardware investment. Full control over model parameters, fine-tuning for specific styles or characters, no content moderation filters, no watermarks. Requires serious GPU hardware (minimum 24GB VRAM for the smallest configuration, 80GB+ recommended for full quality).
- If you use the API: $0.10 per second is competitive with commercial platforms. You get WAN 2.7's quality without the hardware overhead, but you also accept Alibaba Cloud's infrastructure and policies.
- If you use aggregation platforms: You get access to multiple commercial models (Kling 3.0, Veo 3.1, Seedance 2.0) through one interface, with shared credits, side-by-side comparison, and the ability to pick the right model per task rather than committing to one.
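The self-host versus API trade-off above comes down to simple arithmetic, sketched below. Only the $0.10 per second API price comes from the article. The hardware cost, the running cost per generated second, and the monthly volume are hypothetical placeholders to replace with your own numbers, and the function names are illustrative.

```python
API_PRICE_PER_SECOND = 0.10  # USD per generated second, the "from" price quoted above


def api_cost(clip_seconds: float, clips: int) -> float:
    """Total API spend for a batch of equal-length clips."""
    return API_PRICE_PER_SECOND * clip_seconds * clips


def break_even_seconds(hardware_cost: float, running_cost_per_second: float = 0.0) -> float:
    """Generated seconds at which self-hosting becomes cheaper than the API."""
    saving = API_PRICE_PER_SECOND - running_cost_per_second
    if saving <= 0:
        raise ValueError("self-hosting never breaks even at this running cost")
    return hardware_cost / saving


# Hypothetical inputs: 500 ten-second clips, a $30,000 GPU server,
# and $0.02 of power and upkeep per generated second.
batch = api_cost(clip_seconds=10, clips=500)
seconds = break_even_seconds(hardware_cost=30_000, running_cost_per_second=0.02)

print(f"API cost for 500 ten-second clips: ${batch:,.2f}")
print(f"Break-even after {seconds:,.0f} generated seconds ({seconds / 3600:.0f} hours of video)")
```

If your expected output falls well short of the break-even volume, the API or an aggregation platform is likely the cheaper route.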
Should You Switch to WAN 2.7?
If you need multi-shot narrative sequences or a production-tested ecosystem with months of creator feedback, the commercial models available today still have clear advantages. WAN 2.7's four modes are individually strong but lack the integration and workflow polish that dedicated commercial models provide.
If you are building custom video pipelines, training domain-specific models, or operating at a scale where per-generation costs matter significantly, WAN 2.7 is now the most capable open-source foundation available.
The practical move for most creators: generate the same prompt across the commercial models you have access to, and let the output quality, not the leaderboard position, drive your choice. If WAN 2.7 becomes available on your platform, evaluate it like any other model: on what it produces, not what its benchmark says.