Prompting for video

A practical method for AI video prompts on PonPon: shot structure, the camera presets the models understand, pacing, model-specific tips, and fixing common failures.

A good video prompt reads like a shot description a director hands a camera operator. It names the subject, the action, the camera, and the light — and resists cramming three shots into one.

A reliable structure

Write in this order:

Subject — who or what, specific. "A young woman in a red raincoat."
Action — the single thing that changes during the clip. "walks toward the camera and looks up."
Setting — where, and what's around. "on a rain-slicked city street at night, neon reflected in puddles."
Camera — the move. "slow dolly in, eye level."
Light & mood — "cool blue light, cinematic, moody."

A young woman in a red raincoat walks toward the camera and looks up, on a rain-slicked city street at night with neon reflections, slow dolly in at eye level, cool cinematic light. 9:16, 5 seconds.
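The ordering above can be sketched as a small helper that joins the five parts into one prompt. This is only an illustration: the `Shot` class and `build_prompt` function are hypothetical names, not part of PonPon, and the aspect ratio and duration are appended as plain text exactly as in the example prompt.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    subject: str  # who or what, specific
    action: str   # the single thing that changes during the clip
    setting: str  # where, and what's around
    camera: str   # one primary move
    light: str    # light and mood


def build_prompt(shot: Shot, aspect: str = "9:16", seconds: int = 5) -> str:
    """Join the parts in the recommended order, then append the format hints."""
    description = ", ".join(
        [f"{shot.subject} {shot.action}", shot.setting, shot.camera, shot.light]
    )
    return f"{description}. {aspect}, {seconds} seconds."


shot = Shot(
    subject="A young woman in a red raincoat",
    action="walks toward the camera and looks up",
    setting="on a rain-slicked city street at night with neon reflections",
    camera="slow dolly in at eye level",
    light="cool cinematic light",
)
print(build_prompt(shot))
```

Keeping the parts in separate fields makes it easy to swap only the camera move or the light between takes while the subject stays first.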

Camera language the models understand

PonPon's Studio timeline exposes the exact camera moves the models respond to — use these terms in any prompt:

Push In / Pull Out — move toward or away from the subject.
Pan Left / Right, Tilt Up / Down — rotate the camera in place.
Tracking — follow alongside a moving subject.
Orbit — circle around the subject.
Crane Up, Aerial — rise above the scene.
Handheld — loose, organic movement.
Dolly Zoom — the vertigo effect.
Static — a locked-off shot.

Tip

Name one primary move per clip. Asking for a "pan and zoom and orbit" all at once usually produces mush — the model can't honor three directions in a few seconds.
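One way to check the one-move rule before generating is to scan a prompt for the camera terms listed above. The sketch below is a simple phrase matcher with hypothetical function names; it only recognizes the exact terms from the list and will miss variants such as "orbits" or a bare "pan".

```python
import re

# Camera terms copied from the list above, lowercased for matching.
CAMERA_MOVES = [
    "push in", "pull out", "pan left", "pan right", "tilt up", "tilt down",
    "tracking", "orbit", "crane up", "aerial", "handheld", "dolly zoom", "static",
]


def find_camera_moves(prompt: str) -> list[str]:
    """Return the listed camera terms that appear in the prompt as whole phrases."""
    text = prompt.lower()
    return [m for m in CAMERA_MOVES if re.search(rf"\b{re.escape(m)}\b", text)]


def has_single_move(prompt: str) -> bool:
    """True when exactly one listed camera term is named."""
    return len(find_camera_moves(prompt)) == 1


print(find_camera_moves("A dog runs along a beach, slow push in, warm light"))
print(has_single_move("Pan left, then orbit, then dolly zoom around the statue"))
```

A prompt that names more than one term is a candidate for trimming down to a single primary move.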

One action per shot

The most common mistake is describing a whole scene with multiple events. A clip is only a few seconds — give it one beat. If you need a sequence, generate each shot separately and assemble in Flow, or use the multi-shot timeline in Studio on Kling 3.0 to direct several cuts in one generation.
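Splitting a sequence into separate shots can be sketched as one prompt per beat, with the subject, setting, and light repeated so the shots match when assembled. The function name and the sample beats are made up for illustration.

```python
def shot_prompts(subject: str, setting: str, light: str,
                 beats: list[tuple[str, str]]) -> list[str]:
    """Build one prompt per beat; each beat is an (action, camera move) pair."""
    return [
        f"{subject} {action}, {setting}, {camera}, {light}."
        for action, camera in beats
    ]


beats = [
    ("walks toward the camera", "slow push in"),
    ("looks up at the sky", "static"),
]
prompts = shot_prompts(
    "A young woman in a red raincoat",
    "on a rain-slicked city street at night",
    "cool cinematic light",
    beats,
)
for p in prompts:
    print(p)
```

Each resulting prompt carries a single action and a single camera move, and is generated as its own clip.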

Note

Prompts have an upper length limit that varies by model, and PonPon won't trim an over-long one: the generation fails instead of running. Put the essentials first; one clear beat works better than piled-on detail anyway.
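Because an over-long prompt fails rather than being trimmed, it can help to check the length before submitting. The sketch below assumes the limit is counted in characters and uses a made-up limit of 200; neither is stated in this article, so substitute the real figure for the model you are using.

```python
def chars_over_limit(prompt: str, max_chars: int) -> int:
    """Return how many characters the prompt exceeds the limit by (0 if it fits)."""
    return max(0, len(prompt) - max_chars)


# Hypothetical limit for illustration only; real limits vary by model.
ASSUMED_MAX_CHARS = 200

prompt = (
    "A young woman in a red raincoat walks toward the camera and looks up, "
    "on a rain-slicked city street at night with neon reflections, "
    "slow dolly in at eye level, cool cinematic light."
)
over = chars_over_limit(prompt, ASSUMED_MAX_CHARS)
if over:
    print(f"Cut at least {over} characters, starting with the least essential detail.")
else:
    print("Prompt fits within the assumed limit.")
```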

Pacing and length

Keep clips short while iterating; judge the motion, then commit to a longer render.
Pacing words visibly change the result: "slow", "unhurried", and "gentle" produce calmer motion than "quick", "snappy", and "energetic".

Match the model to the shot

Veo 3.1 — the most precise camera direction, plus native audio. Reach for it when the move matters.
Kling 3.0 — best for dialogue (lip-sync) and multi-shot sequences.
Sora 2 — when physics and texture realism carry the shot.
Seedance 2.0 — fast, expressive, vertical-first social clips.

Note

A working prompt structure transfers across models — the same shot description runs on any of them. Try one prompt on two or three models and keep the best take rather than rewriting per model.
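The pairings above, combined with the advice to try two or three models, can be sketched as a small lookup that puts the best-fit model first. The priority labels ("camera", "dialogue", and so on) and the function name are invented for this sketch; only the model names and their strengths come from the list above.

```python
# Priority labels are made up for this sketch; the pairings follow the list above.
BEST_FIT = {
    "camera": "Veo 3.1",
    "audio": "Veo 3.1",
    "dialogue": "Kling 3.0",
    "multi-shot": "Kling 3.0",
    "realism": "Sora 2",
    "social": "Seedance 2.0",
}
ALL_MODELS = ["Veo 3.1", "Kling 3.0", "Sora 2", "Seedance 2.0"]


def shortlist(priority: str, count: int = 2) -> list[str]:
    """Best-fit model first, then the others in list order, up to `count` models."""
    first = BEST_FIT[priority]
    rest = [m for m in ALL_MODELS if m != first]
    return ([first] + rest)[:count]


print(shortlist("dialogue"))
print(shortlist("realism", count=3))
```

The same prompt text is then run unchanged on each model in the shortlist, and the best take is kept.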

Fixing common problems

Problem	Try this
Warping faces or hands	Simpler action, slower motion, or start from a clean image via image-to-video
Camera ignores your direction	Name one explicit move from the list above; drop competing directions
Too much happening	Cut to a single action; split into multiple shots
Off-brand look	Provide a Start Frame instead of describing the style in words
Wrong subject emphasis	Put the subject first; remove background clutter

Lock the look with a first frame

When the *style* matters more than the surprise, generate or upload a still and animate it with a Start Frame in the video generator. You stop gambling on the look and only ask the model to handle motion. For the fundamentals, revisit Text-to-video basics.

