Output formats and limits
What you can set on PonPon outputs — aspect ratio, resolution, clip length, batch count, reference and file limits — with the concrete ranges and why they depend on the model.
Most output limits come from the model, not PonPon — so the exact options change when you switch models. The picker always shows what the current model supports; this page gives you the ranges and the file facts.
Aspect ratio
| Output | Options |
|---|---|
| Images | From 21:9 to 2:3 — including 1:1 (avatars, feed), 16:9 (banners), 9:16 (stories), plus "auto" to match a reference image |
| Video | 16:9 (YouTube), 9:16 (TikTok / Reels / Shorts), 1:1 (feed) — 1:1 is hidden when you start from an image, since the frame sets the ratio |
Image resolution
Image resolution runs from 0.5K to 4K in the steps below; which steps are available depends on the model:
| Key | Pixels |
|---|---|
| 0.5K | 512 |
| 1K | 1024 |
| 2K | 2048 |
| 4K | 4096 |
For example, GPT Image 2 exposes 1K / 2K / 4K. Higher resolution costs more credits.
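If you keep your own notes or scripts around these settings, the ladder can be sketched as a small lookup. This is an illustration only, not a PonPon API: the names `RESOLUTION_LADDER`, `MODEL_STEPS`, and `pixels_for` are hypothetical, and the only model entry taken from this page is GPT Image 2.

```python
# Resolution ladder from the table above: key -> pixels.
RESOLUTION_LADDER = {"0.5K": 512, "1K": 1024, "2K": 2048, "4K": 4096}

# Steps offered per model. Only the GPT Image 2 entry comes from this page;
# add other models from what the picker shows you.
MODEL_STEPS = {"GPT Image 2": ["1K", "2K", "4K"]}


def pixels_for(model: str, key: str) -> int:
    """Return the pixel value for a resolution key, if the model offers that step."""
    if key not in RESOLUTION_LADDER:
        raise ValueError(f"unknown resolution key: {key}")
    if key not in MODEL_STEPS.get(model, []):
        raise ValueError(f"{model} does not offer {key}")
    return RESOLUTION_LADDER[key]


print(pixels_for("GPT Image 2", "2K"))
```

Asking for a step the model does not list (for instance 0.5K on GPT Image 2) raises an error here, which mirrors the picker simply not showing that option.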
Video resolution & length
Both are set by the model. As a concrete reference, Sora 2 outputs 1080p at up to 24 fps, in clips up to 12 seconds per generation; Sora 2 Pro raises the ceiling, and Kling 3.0 generates native 4K. Keep clips short while dialing in the shot, then commit to a longer render. For longer pieces, sequence several clips in Flow or Studio.
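To plan a longer piece, it can help to work out how many clips you need before you start. The sketch below assumes the 12-second Sora 2 ceiling mentioned above as the default cap; `plan_clips` is a hypothetical helper, not part of PonPon, and other models will have different caps.

```python
import math


def plan_clips(total_seconds: float, max_clip_seconds: float = 12.0) -> list[float]:
    """Split a target duration into equal clips no longer than the per-generation cap."""
    if total_seconds <= 0 or max_clip_seconds <= 0:
        raise ValueError("durations must be positive")
    # Smallest number of clips that keeps every clip within the cap.
    count = math.ceil(total_seconds / max_clip_seconds)
    # Spread the duration evenly so no clip ends up as a short leftover.
    return [total_seconds / count] * count


print(plan_clips(30))
```

A 30-second piece comes out as three 10-second clips rather than 12 + 12 + 6, which keeps the pacing even when you sequence them in Flow or Studio.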
Count, batches & references
| Limit | Value |
|---|---|
| Reference images per generation | up to 10 |
| Images per batch | your choice — pick the best |
| Midjourney V8 outputs | always 4 per generation |
| Concurrent generations per account | up to 10 (images, video, and audio combined) |
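If you are lining up more work than the concurrency limit allows, one simple way to think about it is in waves of at most 10. The sketch below is a planning aid under that assumption: `in_waves` and the job names are hypothetical, and in practice a slot may free up as soon as any single generation finishes rather than when a whole wave does.

```python
def in_waves(jobs: list[str], limit: int = 10) -> list[list[str]]:
    """Group queued jobs into waves no larger than the concurrent-generation limit."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return [jobs[i:i + limit] for i in range(0, len(jobs), limit)]


# 25 hypothetical jobs (images, video, and audio count toward the same limit).
queue = [f"job-{n}" for n in range(1, 26)]
for number, wave in enumerate(in_waves(queue), start=1):
    print(f"wave {number}: {len(wave)} jobs")
```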
Audio
In the audio studio, sound-effect and music clips let you set the length directly; voiceover and dubbing length follow your script or source. Speech covers 31 languages.
File formats
| Type | Download |
|---|---|
| Image | PNG or JPG (transparent PNG from background removal) |
| Video | standard MP4 |
| Audio | MP3 |
The pattern
If an option you want isn't there, it's almost always because the current model doesn't offer it. Switch models and it may appear. "Choosing a model" maps which models do what, while "Text-to-video basics" and "Image generation basics" cover these controls in context.
Related articles
- Choosing a model: How to pick the right AI model on PonPon: what each image and video model is best at, a quick decision table, a worked comparison, head-to-head matchups, and Fast vs Pro tiers.
- Text-to-video basics: How video generation works on PonPon: text-to-video vs image-to-video, choosing models like Veo 3.1, Sora 2 and Kling 3.0, and the Edit and Motion Control tabs.
- Image generation basics: Write a good image prompt, choose between models like GPT Image 2, Nano Banana Pro and Seedream 5.0, use reference images, and edit results with the annotate tools.