How to Write AI Video Prompts
The difference between a mediocre AI video and a cinematic one is the prompt. Here's how to write prompts that get the result you want.
The prompt is the single biggest factor in AI video quality. The same model can produce garbage or cinema depending on how you write it. We tested thousands of prompts across Sora 2, Kling 3.0, Veo 3.1, and Seedance 2.0. Here's what works.
The four-part prompt formula
Every good video prompt has four components:
1. Subject — Who or what is in the frame
- Bad: "a person"
- Good: "a woman in her 30s with short black hair, wearing a navy linen blazer"
2. Action — What's happening
- Bad: "walking"
- Good: "walks briskly through a crowded farmer's market, picking up a peach and inspecting it"
3. Setting — Where and when
- Bad: "outside"
- Good: "outdoor farmer's market at golden hour, string lights overhead, blurred crowd in background"
4. Camera — How it's shot
- Bad: (omitted entirely)
- Good: "medium tracking shot from the right side, shallow depth of field, slight handheld movement"
Put them together: *"A woman in her 30s with short black hair, wearing a navy linen blazer, walks briskly through a crowded farmer's market at golden hour. She picks up a peach and inspects it. String lights overhead, blurred crowd in background. Medium tracking shot from the right side, shallow depth of field, slight handheld movement."*
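Because the formula is four pieces of text joined in a fixed order, it can be kept as structured data and rendered on demand. Below is a minimal Python sketch. The `VideoPrompt` class and its field names are hypothetical, made up for illustration; no video model exposes them, and the output is just the text you would paste into a prompt box. It assumes subject and action share one sentence (so a subject ending in an appositive carries its own comma) and that setting and camera each become their own sentence, so it won't reproduce the combined example word for word.

```python
from dataclasses import dataclass


@dataclass
class VideoPrompt:
    """Hypothetical container for the four-part formula (not any model's API)."""
    subject: str
    action: str
    setting: str
    camera: str

    def __post_init__(self) -> None:
        # Reject empty parts: the formula calls for all four.
        for name in ("subject", "action", "setting", "camera"):
            if not getattr(self, name).strip():
                raise ValueError(f"missing {name}: every part of the formula needs content")

    def render(self) -> str:
        # Subject and action share the opening sentence; setting and camera
        # each become their own sentence.
        pieces = [f"{self.subject.strip()} {self.action.strip()}", self.setting, self.camera]
        sentences = []
        for piece in pieces:
            text = piece.strip().rstrip(".")
            sentences.append(text[0].upper() + text[1:] + ".")
        return " ".join(sentences)


prompt = VideoPrompt(
    subject="a woman in her 30s with short black hair, wearing a navy linen blazer,",
    action="walks briskly through a crowded farmer's market, picking up a peach and inspecting it",
    setting="outdoor farmer's market at golden hour, string lights overhead, blurred crowd in background",
    camera="medium tracking shot from the right side, shallow depth of field, slight handheld movement",
)
print(prompt.render())
```

Keeping the parts separate pays off during iteration: you can change only the camera line, say, and regenerate while everything else stays fixed.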
Lighting keywords that work
Lighting is the fastest way to improve output quality. Models respond strongly to these terms:
- Golden hour — warm, directional, long shadows
- Overcast — soft, even, no harsh shadows
- Neon-lit — saturated color, urban night
- Backlit / rim light — subject silhouetted against light
- Candlelit — warm, flickering, intimate
- Studio lighting — clean, controlled, commercial look
- Dappled sunlight — light filtered through leaves
Camera movement vocabulary
Veo 3.1 has the best camera control, but all models respond to these terms:
- Dolly in/out — camera moves toward or away from subject
- Pan left/right — camera rotates horizontally
- Tilt up/down — camera rotates vertically
- Crane up/down — camera rises or lowers vertically
- Tracking shot — camera follows subject movement
- Steadicam / smooth — stable, floating movement
- Handheld — slight natural shake
- Drone / aerial — overhead perspective
- Whip pan — fast rotational movement
Model-specific tips
Sora 2 responds best to detailed environment descriptions. Spend extra words on materials, textures, and lighting. Mention specific camera lenses ("shot on 85mm", "anamorphic lens") for style control.
Kling 3.0 excels when you write multi-shot prompts. Structure them as a shot list: "Shot 1: wide establishing shot of... Shot 2: close-up of... Shot 3: over-the-shoulder..." The model will generate cuts with consistent characters.
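The shot-list structure is regular enough to generate from a plain list of shot descriptions. A minimal Python sketch, assuming you want the "Shot N: ..." text as a single prompt string; `shot_list` is a hypothetical helper, not part of any Kling tool or SDK, and the example shots are invented for illustration.

```python
def shot_list(shots: list[str]) -> str:
    """Number each description as 'Shot N: ...' and join them into one prompt (hypothetical helper)."""
    if not shots:
        raise ValueError("a shot list needs at least one shot")
    lines = []
    for number, description in enumerate(shots, start=1):
        # Normalize trailing periods so every shot ends with exactly one.
        text = description.strip().rstrip(".")
        lines.append(f"Shot {number}: {text}.")
    return " ".join(lines)


kling_prompt = shot_list([
    "wide establishing shot of a rain-soaked harbor at dawn",
    "close-up of a fisherman's weathered hands coiling rope",
    "over-the-shoulder shot of the fisherman watching a boat leave the harbor",
])
print(kling_prompt)
```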
Veo 3.1 rewards precise camera language. Use specific cinematography terms rather than vague descriptions. "Slow 180-degree orbital shot" works better than "camera moves around the subject."
Seedance 2.0 works best with shorter, punchier prompts. Don't overload it with detail — focus on the one key action and mood. It's optimized for social-ready vertical content.
Common mistakes to avoid
1. Too vague: "a cool video of a city" gives the model nothing to work with.
2. Too long: 500-word prompts confuse models. Keep it under 150 words.
3. Contradictory instructions: "dark moody lighting in a bright sunny field." Pick one.
4. Describing what NOT to do: "no blur, no grain" doesn't work. Describe what you want, not what you don't want.
5. Ignoring camera: if you don't specify camera movement, the model picks one at random.
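Several of these mistakes can be caught mechanically before you spend a generation. Below is a rough Python lint. The 150-word ceiling comes from the list above; the 15-word floor for "too vague", the negation pattern, and the camera-term list (adapted from the camera vocabulary) are assumptions chosen for illustration. Contradictions need a human read, so it doesn't try to detect them. Treat it as a heuristic: it will miss some problems and flag some fine prompts.

```python
import re

MAX_WORDS = 150  # ceiling from the list above
MIN_WORDS = 15   # assumption: a rough floor for "too vague"; tune it to your own prompts

# Assumed camera vocabulary, adapted from the camera-movement list; not exhaustive.
CAMERA_TERMS = [
    "dolly", "pan", "tilt", "crane", "tracking", "steadicam", "handheld",
    "drone", "aerial", "close-up", "wide shot", "medium shot", "orbital",
]
# Matches negative phrasing such as "no blur" or "without grain".
NEGATION = re.compile(r"\b(?:no|without|avoid)\s+\w+", re.IGNORECASE)


def lint_prompt(prompt: str) -> list[str]:
    """Return heuristic warnings for the mistakes above; contradictions are left to a human."""
    warnings = []
    word_count = len(prompt.split())
    if word_count < MIN_WORDS:
        warnings.append(f"possibly too vague: only {word_count} words")
    if word_count > MAX_WORDS:
        warnings.append(f"too long: {word_count} words (aim for under {MAX_WORDS})")
    negated = [m.group(0) for m in NEGATION.finditer(prompt)]
    if negated:
        warnings.append("negative phrasing (" + ", ".join(negated) + "): describe what you want instead")
    lowered = prompt.lower()
    # Whole-word match so "pan" doesn't fire on words like "panel".
    if not any(re.search(rf"\b{re.escape(term)}\b", lowered) for term in CAMERA_TERMS):
        warnings.append("no camera direction found: the model will choose one")
    return warnings


print(lint_prompt("a cool video of a city"))
```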
Iteration strategy
Your first generation is a draft, not a final product. The workflow:
1. Start with a simple prompt covering all four parts.
2. Generate and review.
3. Add detail where the model missed your intent.
4. Remove instructions the model seems to ignore.
5. Regenerate. A great result usually takes 3–5 iterations.
Use Canvas to generate multiple variations at once and compare them side by side. This is faster than iterating one at a time.
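A batch of variations is easiest to compare when each one differs from the base in a known way. The Python sketch below builds such a grid by crossing a few lighting options with a few camera moves while holding subject, action, and setting fixed. It only produces prompt text: it does not call Canvas or any model, and the `variations` helper and the option lists are illustrative assumptions.

```python
from itertools import product


def variations(base: str, lighting: list[str], camera: list[str]) -> list[str]:
    """Pair every lighting option with every camera option; the base stays fixed."""
    prompts = []
    for light, move in product(lighting, camera):
        # Capitalize only the first letter so any proper nouns in `move` survive.
        prompts.append(f"{base} {light}. {move[0].upper()}{move[1:]}.")
    return prompts


base = "A woman in her 30s with short black hair walks through a crowded farmer's market"
grid = variations(
    base,
    lighting=["at golden hour", "under an overcast sky", "in dappled sunlight"],
    camera=["medium tracking shot from the right side", "slow dolly in"],
)
for prompt in grid:
    print(prompt)
```

Three lighting options times two camera moves gives six prompts, a manageable batch to review side by side. The grid grows multiplicatively, so adding a third axis quickly produces more variants than you'd want to compare.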