Can Viewers Tell AI-Generated Video from Real?
We showed AI-generated and real video clips to 200 people and asked them to identify which was which. The results were not what we expected.
There is a persistent question in every conversation about AI-generated content: can people tell? The assumption used to be an obvious yes: AI video had a distinct uncanny quality that made it immediately identifiable. But models have improved dramatically. We decided to test the current state of things.
The setup
We compiled 20 video clips: 10 generated with current AI models (Sora 2, Kling 3.0, Veo 3.1, and Seedance 2.0) and 10 shot with traditional cameras. We matched the clips across similar categories — landscapes, product shots, urban scenes, nature, and abstract motion — so the subject matter would not be a giveaway.
Each clip was 5-8 seconds long. We presented them to approximately 200 participants through an online survey, asking them to identify each clip as "AI-generated" or "real footage." We also asked them to explain what influenced their decision.
This was an informal test, not a peer-reviewed study. The sample was self-selected and skewed toward people interested in technology. But the results are informative.
The headline results
Overall accuracy: 58 percent. Participants correctly identified AI-generated versus real footage 58 percent of the time. Random guessing would produce 50 percent accuracy. The margin above chance is narrow.
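Whether 58 percent actually clears chance depends on the number of judgments. A quick binomial z-test makes the point; this is our own sketch, and it treats all 200 participants times 20 clips as 4,000 independent judgments, which is optimistic since repeated answers from one person are correlated.

```python
import math

def z_vs_chance(accuracy, n_trials, p_chance=0.5):
    """Z-score of an observed accuracy against chance under a binomial model."""
    se = math.sqrt(p_chance * (1 - p_chance) / n_trials)  # standard error at chance
    return (accuracy - p_chance) / se

# 200 participants x 20 clips = 4000 judgments, treated as independent
# (an optimistic assumption: one participant's 20 answers are correlated).
z = z_vs_chance(0.58, 200 * 20)
print(f"z = {z:.1f}")  # prints "z = 10.1", well past the ~1.96 cutoff for p < 0.05
```

So the 8-point margin is statistically real at this sample size even though it is practically narrow: participants beat a coin flip, but not by much.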
AI identification accuracy: 52 percent. When shown an AI-generated clip, participants correctly identified it as AI only 52 percent of the time — essentially a coin flip.
Real footage identification: 64 percent. Participants were better at correctly identifying real footage, likely because they defaulted to "real" when uncertain.
Confidence was misplaced. Participants who expressed high confidence in their answers were not significantly more accurate than those who expressed low confidence. Feeling certain did not correlate with being correct.
What gives AI video away
The 52 percent of AI clips that participants correctly identified clustered around specific visual cues. Here is what tipped off the participants who got it right.
Physics anomalies. Water that does not quite flow naturally. Fabric that moves slightly wrong. Objects that interact with their environment in ways that feel subtly off. These are the most reliable tells, but they require careful viewing — most people do not scrutinize casual video for physics accuracy.
Hand and finger artifacts. Human hands remain a weak point for generative models. Extra fingers, merged fingers, or unnatural hand positions appeared in some character clips. This is improving rapidly — Kling 3.0 handles hands better than previous models — but it remains an occasional giveaway.
Temporal inconsistency. Objects that change subtly between frames. A pattern on a surface that shifts. An edge that flickers. These are most visible in clips with static elements where the eye has time to notice inconsistency.
Too-perfect lighting. Several participants noted that AI-generated clips sometimes had "too perfect" lighting — uniformly beautiful in a way that real footage rarely achieves without extensive setup. This is an ironic tell: the AI produces lighting that looks too good.
Uncanny texture. A subtle smoothness to surfaces that lack the micro-detail of real-world textures. This is most noticeable in close-up shots where viewers expect to see pores, grain, and imperfection.
What does not give it away
Equally interesting is what participants expected to be tells but were not.
Camera movement. Participants expected AI video to have unnatural camera movement. In practice, models like Veo 3.1 produce camera movements that are indistinguishable from physical camera rigs. Several participants incorrectly identified smooth Veo 3.1 camera moves as "obviously real because the camera work is too good for AI."
Color and grading. AI-generated video matches the color science and grading of professional footage. Participants could not reliably use color as a distinguishing factor.
Composition. The compositional quality of AI-generated frames is comparable to professional cinematography. Several participants incorrectly labeled well-composed AI clips as real and poorly composed real footage as AI.
Resolution and sharpness. At the viewing sizes typical for online video, resolution differences are not apparent. AI-generated video at 720p-1080p is visually indistinguishable from camera footage at the same resolution.
Results by content category
Landscape and nature: 54 percent accuracy. The hardest category to distinguish. AI models generate convincing natural environments. Participants performed barely above chance.
Urban scenes: 56 percent accuracy. Street scenes, buildings, and city environments. AI models handle these well, though occasional text rendering issues on signs provided clues.
Product shots: 55 percent accuracy. Close-up product visuals. The controlled lighting and simple compositions of product shots play to AI's strengths.
Human subjects: 63 percent accuracy. The easiest category to identify correctly, driven primarily by hand artifacts and subtle facial inconsistencies. This is the category where AI generation still has the most room to improve.
Abstract motion: 53 percent accuracy. Abstract visual content — flowing liquids, geometric motion, particle effects. Virtually indistinguishable from real footage of similar subjects.
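A per-category breakdown like the one above is straightforward to compute from raw survey rows. A minimal sketch, with a hypothetical row format of our own invention (participant ID, clip category, whether the clip was AI, and the participant's guess):

```python
from collections import defaultdict

# Hypothetical rows: (participant_id, category, clip_is_ai, guessed_ai)
responses = [
    (1, "landscape", True, True),
    (1, "landscape", False, True),
    (2, "human", True, True),
    (2, "human", False, False),
]

correct = defaultdict(int)
total = defaultdict(int)
for _, category, clip_is_ai, guessed_ai in responses:
    total[category] += 1
    correct[category] += (guessed_ai == clip_is_ai)  # True counts as 1

for category in total:
    print(f"{category}: {correct[category] / total[category]:.0%}")
```

The same tally over all rows, rather than per category, yields the overall accuracy figure.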
What this means for content creators
The practical implication is clear: for most content applications, viewers cannot reliably distinguish AI-generated video from real footage. This has several consequences.
Quality is sufficient for professional use. If your audience cannot tell the difference, the functional quality is equivalent. AI-generated content can serve the same communication purpose as traditionally produced content in most contexts.
Transparency matters more than detection. Since viewers cannot reliably identify AI content through viewing, transparency depends on disclosure rather than visual detection. If your use case requires audiences to know the content is AI-generated, you need to tell them — they will not figure it out on their own.
The uncanny valley is behind us. For the categories we tested, AI video has crossed the uncanny valley. The output does not trigger the "something is wrong" response that earlier AI video provoked.
Human subjects remain the frontier. Content featuring human characters and faces is the area where AI generation is most likely to produce identifiable artifacts. This is improving with each model generation, but it is the area to be most careful with today.
The detection arms race
As AI-generated video becomes indistinguishable from real footage, detection tools become important. Several organizations are developing AI detection systems, and some models embed invisible watermarks in their output.
But detection is an arms race. As detection improves, generation improves to evade detection. The long-term solution is not technological detection but cultural practices: clear labeling, transparent disclosure, and ethical guidelines for when AI-generated content is appropriate.
Our takeaway
The question "can viewers tell" already has a different answer than it did a year ago, and the answer a year from now will be different again. The trajectory is clear: AI-generated video is becoming indistinguishable from real footage across an expanding range of content categories.
For content creators, this means the technology is ready for production use. For audiences, it means developing media literacy around AI-generated content is increasingly important. For the industry, it means establishing disclosure norms now, before detection becomes impossible.
The models available on PonPon — Sora 2, Kling 3.0, Veo 3.1, Seedance 2.0 — represent the current state of the art. The visual quality is there. The question is no longer whether AI can produce convincing video, but how to use that capability responsibly.