Up to 5 reference images for guided edits
Supply reference images to guide the edit — a photo of a specific outfit, a target background, or a character face. HappyHorse maps each reference to the relevant element in the source video.
Video-to-video (v2v) editing takes an existing video clip as input and applies targeted modifications guided by text instructions and optional reference images. Unlike text-to-video (which generates from scratch), v2v preserves the original clip's motion, timing, and spatial layout while changing specified elements — a character's clothing, the background environment, or the visual style. The source video acts as a motion scaffold that constrains the generation.
Target individual elements without affecting the rest of the frame. Change a character's shirt color, replace a prop on a desk, or swap a logo on a wall — the surrounding scene stays untouched.
Apply scene-wide transformations: convert daytime footage to night, apply an anime or oil-painting style, or shift the entire color palette. The motion path and character actions remain identical.
The source clip's motion trajectory, camera movement, and character poses are extracted and used as constraints. Your edit changes the appearance but not the choreography of the original footage.
Describe edits in plain English: "Change the red dress to a blue business suit" or "Replace the office background with a tropical beach." No mask drawing, keyframing, or timeline editing required.
1. Go to PonPon Video and select HappyHorse from the model dropdown, then switch to the video-to-video editing mode.
2. Upload the video clip you want to edit. Supported formats include MP4 and MOV. Shorter clips (3–10 seconds) produce the most consistent results.
3. If your edit involves specific visual targets — a particular outfit, a character's face, a background photo — upload them as reference images. Each reference guides the model toward your intended result.
4. Describe what you want changed. Be specific about which elements to modify and which to keep: *"Replace the character's casual T-shirt with a formal navy blazer. Keep the background and lighting unchanged."*
5. Click Generate and compare the edited output with your source clip side by side. Check that the motion matches and the edit was applied consistently across all frames.
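Conceptually, the steps above boil down to one edit request: a source clip, an instruction, and optional reference images. The sketch below assembles such a request as a plain dictionary — the field names, the MP4/MOV check, and the 3–10 second guidance mirror this guide, but the structure itself is an illustrative assumption, not PonPon's actual API:

```python
# Illustrative sketch only: field names and validation rules are
# assumptions based on this guide, not a documented PonPon API.
MAX_REFERENCES = 5             # HappyHorse accepts up to 5 reference images
RECOMMENDED_SECONDS = (3, 10)  # clip lengths that edit most consistently

def build_edit_request(source_path, prompt, references=(), duration_s=None):
    """Assemble a video-to-video edit request as a plain dict."""
    if len(references) > MAX_REFERENCES:
        raise ValueError(f"at most {MAX_REFERENCES} reference images")
    if not source_path.lower().endswith((".mp4", ".mov")):
        raise ValueError("supported formats: MP4, MOV")
    request = {
        "model": "HappyHorse",
        "mode": "video-to-video",
        "source": source_path,
        "prompt": prompt,
        "references": list(references),
    }
    lo, hi = RECOMMENDED_SECONDS
    if duration_s is not None and not (lo <= duration_s <= hi):
        # Not an error: longer clips work, but consistency may drop.
        request["warning"] = "clips of 3-10 s produce the most consistent edits"
    return request
```

Validating the clip length up front, rather than failing on it, matches how the guide frames 3–10 seconds as a recommendation rather than a hard limit.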
Change the character's white T-shirt and jeans to a formal black tuxedo with a bow tie. Keep the same walking motion, background, and lighting. Maintain the character's face and hairstyle exactly.
Model: HappyHorse · Mode: Video-to-video · References: 1 (tuxedo photo) · Source: 6s clip
Replace the indoor office background with an outdoor rooftop terrace overlooking a city skyline at sunset. Keep the character's appearance, position, and gestures identical. Match the lighting to golden hour.
Model: HappyHorse · Mode: Video-to-video · References: 1 (rooftop photo) · Source: 8s clip
Convert this live-action clip to detailed anime style — Studio Ghibli-inspired with soft watercolor backgrounds and cel-shaded characters. Preserve all motion, facial expressions, and timing exactly.
Model: HappyHorse · Mode: Video-to-video · References: 0 · Source: 5s clip
Replace the person in this clip with the character from [person1] reference image. Keep the exact same body movement, gestures, and scene. Match the new character's clothing to the original scene's style.
Model: HappyHorse · Mode: Video-to-video · References: 1 (face photo) · Source: 6s clip
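All four example prompts follow the same pattern: state what changes, then explicitly pin down what must stay the same. A tiny helper (purely hypothetical — the model only ever sees the final string) can enforce that habit:

```python
def compose_edit_prompt(change, keep):
    """Join a change instruction with explicit keep-unchanged clauses.

    `change` describes what to modify; `keep` lists elements to preserve.
    Illustrative only: this just formats text in the recommended pattern.
    """
    if not keep:
        raise ValueError("always state which elements should stay unchanged")
    kept = ", ".join(keep)
    return f"{change.rstrip('.')}. Keep the {kept} unchanged."
```

For example, `compose_edit_prompt("Change the red dress to a blue business suit", ["background", "lighting"])` yields a prompt in the same shape as the wardrobe-swap example above.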
Fashion brands and content creators can show the same scene with different outfits. Film once, then use v2v editing to generate variations — a model walking in three different dresses from a single source clip.
Swap a plain studio background for a beach, a cityscape, or a branded set. Useful for real estate virtual staging videos, travel content previews, or shooting on a budget without location access.
Convert live-action footage to anime, watercolor, or cyberpunk aesthetics for music videos, social campaigns, or pitch decks. The original performance and timing stay intact.
Swap a character's face using a reference photo while preserving their body movement and scene context. Useful for personalizing template videos for different clients or audiences.
| | HappyHorse V2V Editing | Other V2V Tools |
|---|---|---|
| Reference images | Up to 5 reference images to guide the edit — outfit photos, face refs, background targets | Kling O3: v2v editing with reference support but fewer simultaneous refs |
| Edit precision | Both local (single object) and global (full scene) edits via natural language | Many tools require mask drawing for local edits — more precise but slower |
| Motion preservation | Source video motion fully preserved — edits change appearance only | Some tools partially re-generate motion, causing timing drift on longer clips |
| Mask requirement | No masks needed — describe the edit target in text | Photoshop/After Effects plugins require manual mask drawing per frame or region |
| Best for | Quick creative variations: wardrobe swaps, background changes, style transfers | Kling O3: finer control for complex compositing. Manual tools: pixel-precise corrections |
Longer source videos increase the chance of temporal inconsistencies in the edit. For best results, use 3–10 second clips. For longer sequences, split into segments and edit each separately.
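One way to apply the split-and-edit approach (a sketch, not a built-in PonPon feature) is to divide the total duration into equal segments that each fit inside the recommended window, which avoids leaving a uselessly short tail segment:

```python
import math

def plan_segments(total_seconds, max_len=10.0):
    """Split a clip duration into equal-length (start, end) segments,
    each no longer than max_len seconds."""
    if total_seconds <= 0:
        raise ValueError("duration must be positive")
    n = math.ceil(total_seconds / max_len)   # fewest segments that fit
    step = total_seconds / n                 # equal length per segment
    return [(round(i * step, 3), round((i + 1) * step, 3)) for i in range(n)]
```

A 25-second clip, for instance, becomes three segments of about 8.3 seconds each rather than 10 + 10 + 5; each segment can then be edited separately and rejoined.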
Explicitly state which elements should stay the same: "Keep the character's face, hairstyle, and background unchanged." Without this, the model may apply broader changes than intended.
Changing both the wardrobe and background in one pass can reduce quality. For complex transformations, do one edit at a time — swap the outfit first, then use the output as input for the background change.
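This one-edit-at-a-time pipeline is easy to express as a simple fold: each pass takes the previous pass's output as its new source clip. In the sketch below, `run_edit` is a stub standing in for an actual HappyHorse generation call (hypothetical — substitute whatever mechanism you use to submit edits):

```python
def chain_edits(source, prompts, run_edit):
    """Apply edit prompts one pass at a time, feeding each output
    clip back in as the next pass's source."""
    clip = source
    for prompt in prompts:
        clip = run_edit(clip, prompt)  # e.g. one v2v generation per prompt
    return clip
```

For a wardrobe-plus-background transformation you would pass two prompts: the outfit swap first, then the background change applied to the outfit-swapped output.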
If the source clip has motion blur, compression artifacts, or low lighting, the edit will inherit those issues. Start with clean, well-lit source footage for the best edited output.
Join thousands of creators, agencies, and brands who use PonPon every day.