BACH: AI Multi-Shot Films in 30 Seconds
Video Rebirth's $80M bet on industrial-grade AI video just shipped. Here's what BACH does, where it ranks, and what it means for creators already using top models.
Video Rebirth, the Singapore-based AI video company founded by former Tencent distinguished scientist Dr. Wei Liu, launched BACH on May 7, 2026. The company calls it the first industrial-grade AI video engine built around directorial intent — understanding what a director wants to see on screen, not just what words describe.
The timing is deliberate. The AI video generation field has consolidated around a handful of proven production models, and BACH is positioning itself as the next serious contender. Here is what the launch actually delivers, how the architecture works, what the benchmarks say, and how it compares to models creators are already shipping with.
What BACH does differently
BACH's headline feature is Montage — the ability to generate up to 30-second multi-shot films from text instructions and reference images in a single workflow. That means multiple camera angles, cuts, and transitions rendered together, not stitched after the fact. The output is native 1080p at 30fps — not interpolated, not upscaled.
Video Rebirth frames BACH around four dimensions of creative intent: character identity, emotional performance, camera language, and narrative structure. Where most models treat these as independent prompt parameters, BACH attempts to resolve them jointly so that a character's facial expression stays consistent with their dialogue while the camera executes a specific movement.
The engine also generates audio alongside video in the same pass — sound effects, voiceover, and background music are part of the output, not a post-production step. This is a capability shared by the current top-tier models, but BACH claims tighter integration between audio events and shot transitions within the Montage workflow.
The architecture: DDiT and PNA
BACH is built on two proprietary components. Dual Diffusion Transformer (DDiT) handles directorial control — it translates professional cinematographic instructions into shot composition, camera movement, and scene transitions. The architecture is designed to interpret standard film-production vocabulary (establishing shot, close-up, over-the-shoulder) rather than requiring users to describe camera behavior in abstract terms.
Physics-Native Attention (PNA) is the character and physics layer. Instead of learning character identity from surface-level features like hair color and clothing, PNA builds identity from skeletal structure, skin tone, proportional relationships, and the muscular dynamics that drive facial expressions. The goal is to maintain character consistency not just visually but physically — the same person moves the same way across every cut.
In practice, you describe a scene with shot-by-shot instructions and BACH produces a coherent multi-cut sequence. Character identity, wardrobe, and setting persist across cuts without the drift that plagues most multi-generation workflows.
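To make the shot-by-shot workflow concrete, here is a sketch of what a multi-shot brief of the kind Montage consumes might look like. This is an illustrative data structure only; Video Rebirth has not published BACH's prompt format, and every field name here is hypothetical.

```python
# Hypothetical shot-by-shot brief for a multi-shot generation workflow.
# Field names are illustrative -- BACH's actual API has not been published.
shot_list = {
    "character": "woman in a red raincoat, mid-30s",   # identity held constant across cuts
    "setting": "rain-soaked city alley at night",
    "shots": [
        {"type": "establishing shot",   "camera": "slow crane down",  "duration_s": 8},
        {"type": "close-up",            "camera": "static",           "duration_s": 7},
        {"type": "over-the-shoulder",   "camera": "handheld",         "duration_s": 8},
        {"type": "wide shot",           "camera": "tracking left",    "duration_s": 7},
    ],
}

# The per-shot durations must fit inside Montage's 30-second ceiling.
total = sum(shot["duration_s"] for shot in shot_list["shots"])
print(f"total runtime: {total}s")
```

The point of structuring the brief this way is that the shot types use standard film-production vocabulary, which is exactly what DDiT is designed to interpret, rather than freeform descriptions of camera behavior.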
How it ranks on the leaderboard
BACH 1.0 Preview was independently evaluated on the Artificial Analysis Video Arena — the largest blind human-preference benchmark for AI video, where real evaluators watch pairs of clips and pick the better one without knowing which model made each. BACH debuted at #6 globally on the text-to-video leaderboard (no audio category), with performance comparable to Vidu Q3 Pro and Kling 3.0 Omni.
For context, the current top five proprietary models with audio on the text-to-video leaderboard are:
- HappyHorse-1.0 (Alibaba) — Elo 1,355
- Seedance 2.0 (ByteDance) — Elo 1,272
- Kling 3.0 (Kuaishou) — Elo 1,250
- Kling 3.0 Omni (Kuaishou) — Elo 1,234
- Grok Imagine Video (xAI) — Elo 1,233
BACH enters a competitive field where the top models are separated by meaningful but not insurmountable gaps. The 122-point spread between #1 and #5 means blind evaluators preferred HappyHorse over Grok Imagine roughly 67% of the time, a real edge but not an overwhelming one.
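A rating gap on a blind-preference arena translates directly into an expected win rate via the standard Elo expectation formula. A quick check against the table above:

```python
def elo_win_probability(rating_a: float, rating_b: float) -> float:
    """Expected probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

# Spread between #1 (HappyHorse, 1355) and #5 (Grok Imagine Video, 1233)
p = elo_win_probability(1355, 1233)
print(f"{p:.0%}")  # -> 67%
```

The same formula explains why the image-to-video leaderboard is described as tighter: the 74-point gap between HappyHorse and PixVerse V6 corresponds to only about a 60% preference rate.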
On image-to-video, the leaderboard is tighter. HappyHorse leads at Elo 1,397, with Seedance 2.0 at 1,348, Grok Imagine Video at 1,327, and PixVerse V6 at 1,323. BACH has not yet appeared on the image-to-video leaderboard — a gap that limits its usefulness for creators whose workflow starts from reference photos.
The $80 million thesis
Video Rebirth raised $80 million in March 2026 specifically to build what it calls the first industrial-grade AI video engine. The funding went toward the DDiT and PNA architectures, with the explicit goal of moving AI video from creative experiments into repeatable production pipelines.
The company's thesis is that current models excel at single-shot generation but break down when creators need multi-shot narrative sequences with consistent characters across cuts. BACH's Montage feature is the direct answer — a single workflow that outputs a complete mini-film rather than isolated clips that need manual assembly.
Dr. Wei Liu, who led video AI research at Tencent before founding Video Rebirth, is betting that the bottleneck in professional AI video adoption is not visual quality — the top models are good enough — but workflow integration. A 30-second multi-shot film that ships from one tool is worth more to a production team than six perfect individual clips that need manual assembly in Premiere.
How BACH compares to models you can use today
BACH's multi-shot capability invites direct comparison with Kuaishou's flagship Kling 3.0, which currently offers up to 6 camera cuts in a single generation with character consistency across every shot. Kling 3.0 also supports native lip-synced dialogue in five languages and 15-second clips at 1080p, mature features with months of production feedback behind them.
The 30-second output length is genuinely new. No competing model generates single clips beyond 20 seconds. If BACH delivers on this consistently, it addresses a real pain point: creators currently chain shorter clips in pipeline builders to construct longer sequences, which works but requires manual continuity management.
Grok Imagine Video, xAI's entry that debuted at #1 on multiple leaderboards earlier this year, offers a different value proposition: strong single-shot quality at 720p with 15-second clips and a recently added "Extend from Frame" feature for chaining shots. Its Elo scores (1,233 text-to-video, 1,327 image-to-video) put it just above BACH's debut position.
For pure visual fidelity, Sora 2 still sets the standard for physics accuracy and texture realism. For precise directorial camera work (dolly, crane, and tracking shots executed faithfully from prompt instructions), Veo 3.1 remains the benchmark. And for sheer iteration speed, models with sub-60-second rendering let creators test ten directions in the time a premium model takes to finish one.
BACH's gap: it is a day-one product. The #6 ranking is respectable but not leading. It lacks the ecosystem maturity, the proven API stability, and the months of creator feedback that established models have built.
The broader landscape: specialization and compliance
BACH's launch fits a larger pattern emerging in May 2026. The AI video market is splitting into two tiers. General-purpose models compete on breadth — photorealism, speed, camera control, audio quality. Specialized engines like BACH compete on depth within a specific workflow, in this case multi-shot narrative production.
Another trend shaping the field is content provenance. The EU AI Act, Article 50, requires AI-generated content to carry machine-detectable provenance metadata by August 2, 2026. The C2PA standard (now ISO/IEC 22144) — backed by Adobe, Microsoft, Google, OpenAI, and Meta — is the de facto implementation. Professional distribution platforms increasingly reject AI video without C2PA metadata, making compliance a production requirement rather than a nice-to-have.
BACH's positioning as an "industrial-grade" engine suggests C2PA compliance is on the roadmap, though no specifics have been announced. For creators shipping professional work today, verifying that your generation platform supports C2PA metadata is worth adding to the pre-flight checklist.
What to do now
BACH is available at bach.art with complimentary credits for new users. It is worth testing if your workflow centers on multi-shot narrative sequences and you need clips longer than 15 seconds.
For single-shot hero content, fast social iteration, photoreal product shots, or precise camera direction, the models available today cover the full range. Open the multi-model workspace, run the same brief across Kling 3.0, Seedance 2.0, and Veo 3.1, and let the output decide.
The practical move for most creators: keep shipping with proven models, test BACH on a side project, and revisit once the engine has a few months of real-world feedback. The leaderboard tells you which model wins a blind comparison. Your production timeline tells you which model ships reliable footage on deadline. Those are not always the same answer.