Is Runway Gen-4.5 better than Kling 3.0?

Gen-4.5 leads on Artificial Analysis's blind preference benchmark. Kling 3.0 leads on features — multi-shot storyboards, 15-second clips, and native audio in six languages. The better model depends on what your project requires.

Can I use Runway Agent on PonPon?

Runway Agent is exclusive to Runway's platform. PonPon offers its own AI agent that orchestrates across 30+ models, routing each generation step to the best-suited engine rather than limiting output to a single model family.

What are Runway Characters used for?

They power real-time conversational avatars for customer support, training, advertising, and gaming. Any image becomes a live video agent that lip-syncs and gestures during conversation at 24 fps with under two seconds of latency.

Should I switch to Runway or stay multi-model?

If you want workflow simplicity and only need Runway's model quality, the integrated stack is compelling. If you want the best output for each specific shot type, access to all leading models through one interface gives you more flexibility.

May 19, 2026 · PonPon Team

Runway's May 2026 Triple Launch

Gen-4.5 claimed #1 on Video Arena. Agent turns conversations into finished films. Characters makes any image a live avatar.

Runway shipped three distinct products in May 2026, each targeting a different layer of the video creation stack. Gen-4.5 competes on raw generation quality. Runway Agent competes on workflow automation. Runway Characters competes on real-time interactive media. Together, they represent the most aggressive single-month expansion any AI video company has attempted.

The timing is not accidental. With Sora gone and Google's Gemini Omni still months from public access, the competitive window is wide. Runway is using it to redefine what an AI video platform is — from a generation tool to an autonomous creative partner.

Gen-4.5: Number One on Video Arena

Gen-4.5 launched to immediate benchmark dominance, claiming the #1 position on Artificial Analysis's Video Arena with an Elo score of 1,247. The model beat every competitor in blind human-preference testing — evaluators watched pairs of clips and selected the output they preferred, with no knowledge of which model produced which clip.
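
To make the ranking mechanics concrete, here is a minimal sketch of how blind pairwise votes can be turned into Elo-style ratings. It uses the textbook Elo update. The article does not describe Artificial Analysis's exact rating method, so the K-factor, the starting rating of 1,000, and the model names below are illustrative assumptions rather than the benchmark's actual configuration.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one blind comparison.

    k=32 is an assumed K-factor, not a value taken from the benchmark.
    """
    expected_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    delta = k * (score_a - expected_a)
    # Elo is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


# Hypothetical models and votes: (clip A's model, clip B's model, A preferred?)
ratings = {"model_x": 1000.0, "model_y": 1000.0}
votes = [
    ("model_x", "model_y", True),
    ("model_x", "model_y", True),
    ("model_x", "model_y", False),
]

for a, b, a_won in votes:
    ratings[a], ratings[b] = update_elo(ratings[a], ratings[b], a_won)

print(ratings)
```

Because each update depends only on which clip the evaluator preferred, the rating reflects relative preference on isolated clips and says nothing about features such as clip length or audio.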

The quality improvements are specific and measurable. Objects move with realistic weight and momentum. Liquids flow with accurate fluid dynamics. Surface materials — metal, fabric, glass, skin — render with texture detail that previous models approximated but never matched. Physical causality tracks correctly: a ball hitting water produces a splash proportional to its mass and velocity, not a generic splash effect.

Gen-4.5 supports text-to-video, image-to-video, and video-to-video generation. Pricing runs from $15/month on Standard to $95/month for Unlimited, positioning it as a premium tool rather than a mass-market play.

The benchmark lead, however, comes with context. Kling 3.0 still offers multi-shot storyboard sequences that Gen-4.5 lacks — the ability to generate six consistent camera cuts in a single pass. Veo 3.1 still leads on camera control fidelity, where directorial prompts like "slow dolly left while tracking the subject" execute with more precision. Benchmark scores measure preference on isolated clips; production workflows care about features beyond single-clip quality.

Runway Agent: From Conversation to Finished Video

Released May 13, Runway Agent is a conversational interface that handles the full video production pipeline: concept development, story beats, visual direction, multi-scene generation, voiceover, dialogue, music, and final assembly. The user describes what they need. The agent proposes a direction. Together they iterate until the agent generates a ready-to-publish video.

This is not prompt-to-video. It is a multi-turn creative collaboration where the agent maintains context across the entire conversation — remembering the brand guidelines you specified in message one while adjusting the color grading you requested in message twelve. The output is high-resolution, multi-shot video assembled with transitions, audio, and pacing that would previously require a human editor.

The target users are brand teams, marketing agencies, and filmmakers who currently spend hours in editing software assembling clips into sequences. Runway Agent collapses that assembly step into a conversation that takes minutes.

For multi-model users, the comparison point is multi-model pipeline orchestration, which routes work across multiple generation engines rather than locking creators into a single model. The architectural difference matters. Runway Agent produces output exclusively from Runway's models, while multi-model agents can route each scene to the engine best suited for that specific shot, for example cinematic sequences through one model and speed-critical social cuts through another.
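
The routing idea can be sketched in a few lines. This is an illustration only, not how PonPon or any specific agent is implemented: the `Shot` type, the shot-type labels, and the engine names (`engine_cinematic`, `engine_fast`) are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shot:
    description: str
    shot_type: str  # hypothetical labels, e.g. "cinematic" or "social"


# Hypothetical mapping from shot type to the engine assumed to suit it best.
ROUTING_TABLE = {
    "cinematic": "engine_cinematic",
    "social": "engine_fast",
}
DEFAULT_ENGINE = "engine_cinematic"


def route(shot: Shot) -> str:
    """Pick an engine for one shot, falling back to a default for unknown types."""
    return ROUTING_TABLE.get(shot.shot_type, DEFAULT_ENGINE)


def plan(shots: list[Shot]) -> list[tuple[str, str]]:
    """Return (description, engine) pairs for a whole sequence."""
    return [(shot.description, route(shot)) for shot in shots]


storyboard = [
    Shot("Wide establishing shot at dusk", "cinematic"),
    Shot("Quick vertical product teaser", "social"),
]
for description, engine in plan(storyboard):
    print(f"{engine}: {description}")
```

A single-vendor agent is the special case where every entry in the table points to the same model family, which is simpler to operate but removes the per-shot choice.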

Runway Characters: Real-Time Video Avatars

The third launch, introduced May 4, is the most technically distinctive. Runway Characters transforms any single image — a photograph, a cartoon mascot, a stylized illustration — into a fully expressive video avatar that responds in real time during live conversations.

Built on Runway's GWM-1 (General World Model), the system produces natural lip-sync, facial expressions, eye movements, and head gestures at 24 frames per second in HD. End-to-end latency from when a user stops speaking to when the character begins responding is 1.75 seconds. Model inference time per frame is 37 milliseconds.
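
A quick arithmetic check shows how these figures fit together. The sketch below assumes frames are produced one at a time and that the 37 ms inference time is the only per-frame cost. Both are simplifying assumptions, since the article does not describe the serving pipeline.

```python
FPS = 24                       # stated output frame rate
INFERENCE_MS_PER_FRAME = 37.0  # stated model inference time per frame
END_TO_END_LATENCY_S = 1.75    # stated response latency

# Time available to produce each frame if playback is to stay real-time.
frame_budget_ms = 1000.0 / FPS

# Slack left per frame after inference (assumes no other per-frame cost).
headroom_ms = frame_budget_ms - INFERENCE_MS_PER_FRAME

# Highest frame rate that 37 ms per frame could sustain when run sequentially.
max_sustainable_fps = 1000.0 / INFERENCE_MS_PER_FRAME

# How many frames of playback the response latency corresponds to.
frames_in_latency = END_TO_END_LATENCY_S * FPS

print(f"frame budget: {frame_budget_ms:.2f} ms")
print(f"headroom:     {headroom_ms:.2f} ms")
print(f"max fps:      {max_sustainable_fps:.2f}")
print(f"latency:      {frames_in_latency:.0f} frames")
```

Under these assumptions the per-frame budget at 24 fps is about 41.67 ms, so 37 ms of inference leaves roughly 4.67 ms of slack per frame. The 1.75-second response latency is a separate figure, equal to 42 frames of playback time.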

The enterprise API is live at dev.runwayml.com. Fortune 10 technology companies, Hollywood studios, global advertising agencies, and gaming companies are already using Characters for customer support avatars, internal training, experiential advertising, and immersive game worlds.

This is a different market from video generation. Characters does not compete with Kling or Veo — it competes with virtual avatar platforms and interactive media tools. The bet is that the same world-model architecture that generates cinematic video can also power real-time conversational agents, and that combining both under one platform creates a defensible ecosystem.

The Strategic Pattern

Runway's May launch sequence follows a clear logic: own the quality benchmark (Gen-4.5), own the workflow (Agent), and own the interactive layer (Characters). Each product reinforces the others. Characters uses the same GWM-1 architecture that powers Gen-4.5. Agent assembles its output using Gen-4.5 as the generation engine. The moat is not any single product — it is the integrated stack.

This strategy has a well-documented trade-off. Integrated stacks optimize for internal coherence at the cost of flexibility. Every output comes from Runway's models, which means every output carries Runway's specific strengths and limitations. When Gen-4.5 struggles with a particular scene type, there is no fallback to a model that handles it better.

The alternative architecture — a multi-model workspace that routes each task to the best available engine — sacrifices stack integration for output optimization. Neither approach is universally superior. The right choice depends on whether a creator values workflow simplicity (Runway) or output flexibility (multi-model).

What Multi-Model Users Should Watch

Three observations for creators who work across platforms:

Gen-4.5's benchmark lead will be challenged quickly. Kling 3.5 launched in the same month. Google's Veo upgrades are rolling out post-I/O. Benchmark leadership in AI video has never lasted more than one quarter. The generation quality gap between top models is narrowing to the point where model selection matters less than prompt craft and workflow design.

Agentic video creation is the real battleground. Runway Agent is the first polished implementation of conversational video production, but it will not be the last. The value of automated pipeline tools increases every time a new agent ships — creators who already work in multi-step orchestration workflows will adapt fastest.

Real-time avatars open a market that generation tools cannot address. Characters is not a feature — it is a new category. Customer support, training, gaming, and live commerce all need conversational video agents. This market will grow independently of the generation quality race.