From Case Studies to Prompts: A Step-by-Step Guide to Making a Viral Veo3 Video

June 19, 2025 | Jasper

Hello everyone. Today I will show you how to make the most popular category of AI video right now.

Whether or not you follow AI, you have almost certainly seen this kind of video on social media recently.

A gorilla or some other creature chatters into the camera, or characters are interviewed on the spot at various fictionalized historical events.

Each one racks up a high number of likes and views, both in China and abroad.

The content is trivial, yet once I start watching I can't stop, and I am someone who sees AI content all the time.

This category has even made its way into advertising and marketing. A dental clinic on Instagram used to post ads every day, and each ad got only a few thousand views. After switching to "Bigfoot Boy"-style videos, its ads got 560,000 views!

With Veo3, the production cost of AI videos has dropped a lot, so now may be a very good time to get started with AI video production. You only need to run a couple of simple generations and your work is done.

You may think that the ideas behind the videos above all came from the creators themselves, but they did not. Most of these popular AI videos were produced almost entirely by AI, from concept to prompt to generation. The only things a human needs to do are select ideas, copy prompts, and click generate.

So in this article, I will not only teach you how to use video models, but also how to use various tools to analyze videos and generate new ideas. I will give you prompt templates and show you how to automate everything from ideation to generation. My process is close to an engineering pipeline, and you could even use it to build a video agent product.

Let’s take a look at two similar AI videos I made using this process.

Isn't that awesome? Okay, from now on switch off your brain and let's get started!

How to analyze viral videos

First, I will show you how to analyze videos and expand your ideas. Here I use NotebookLM. You may have only used NotebookLM to analyze YouTube videos that have voiceovers, but Gemini can actually analyze the video frames too, so you can let NotebookLM analyze the creative form of any viral YouTube video. You can even add multiple viral videos for cross-analysis.

To begin, open NotebookLM, create a new notebook, and paste the URL of the viral YouTube video you found into Add Source.

You can keep adding sources here. I added four popular AI videos made with Veo3 at once: two in first-person Vlog form and two in interview form. Then write a prompt to have NotebookLM start analyzing. Here is my prompt; change the first part and you can apply it to any kind of creative video.

These four videos were generated and edited using Google's newly released Veo 3 video model. They are very popular on YouTube. Analyze the dialogue and on-screen content of each shot of each video in detail, and then summarize the reasons for their popularity.

You can see that NotebookLM's analysis is very detailed. It outputs the storyboards and dialogue of each video, and it also explains well why these videos became popular. After watching them I had some ideas of my own about why they took off, but nothing as detailed and complete as NotebookLM's analysis.

The universal formula for Veo3 pseudo-documentary hit videos: four core elements

Core engine: huge contrast (Contrast Engine)

This is the fundamental source of all the laughs. The videos succeed by forcing two completely unrelated elements to collide, which produces an absurd comedic effect.

Time contrast: Use the most modern formats (Vlogs, street interviews) to present ancient or fictional content (Titanic, Vikings, Bigfoot, Stormtroopers).

Identity contrast: Let characters that should be mysterious, serious, or evil (Bigfoot, Vikings, Stormtroopers) show an ordinary (even "loser") side, full of everyday life and human weakness.

Situational contrast: Against an extremely dangerous or grand backdrop, the characters care about trivial daily matters. For example, just before the ship hits the iceberg, the passengers are thinking about tomorrow morning's toast; on a battlefield under artillery fire, the Stormtroopers are building snowmen.

Form of expression: the immersive feel of "pseudo-documentary" (Authentic Format)

The videos all imitate the look of real documentary footage, which gives the audience the illusion that "this seems to be real" and makes the contrast even stronger.

Pseudo-interview (Pseudo-Interview): As in "Titanic" and "Vikings", a serious news interview format makes the absurd answers even funnier.

First-person Vlog (First-Person Vlog): As in "Bigfoot Boy" and "Stormtroopers", selfie sticks and point-of-view shots greatly enhance the sense of immersion, as if you were watching these characters' own social media posts.

Content cornerstone: using "shared knowledge" (Shared Knowledge)

These videos never build a world from scratch. Instead, they stand on the shoulders of giants and draw on the audience's existing knowledge and stereotypes.

Historical events: The audience knows that the Titanic will sink and that the Vikings are warlike.

Pop culture/IP: The audience knows the Stormtroopers and Vader from "Star Wars".

Cultural memes: The audience is familiar with the legend of Bigfoot or the stereotypes of certain groups (such as outdoor enthusiasts who drive Subarus).

This greatly lowers the cost of understanding for the audience: the jokes need no setup and land immediately.

Key to spreading: strong "workplace/life resonance" (Relatable Complaints)

The most relatable and funniest part of these videos is that they pull every grand narrative back down to the gripes and complaints of ordinary people.

"Voice of the workers": The Stormtroopers call Vader a "bastard boss" and complain about the poor working environment, dangerous tasks, and unreliable colleagues. Every office worker can relate.

Daily troubles: The elevator operators on the Titanic complain about their tedious work, and the Viking women complain about always washing bloody clothes. These everyday details make the characters instantly vivid and resonate strongly with the audience.

How to expand video creativity

We now have the creative logic and storyboard descriptions for this type of popular video. Next, we need AI to help us expand our ideas based on this context. You can open any AI model you are used to; I am using Gemini here.

Send it the NotebookLM analysis from the previous step, and then tell it:

I will send you some recently popular videos generated by AI video models, along with the reasons for their popularity. Combine this material to give me some ideas in a similar first-person Vlog format. For each storyboard (8 seconds), describe in detail the environment and characters as well as the content and tone of the speech, and where appropriate insert lines that break the fourth wall, such as asking for likes.

Here we start with a first-person Vlog video. The script and content produced here may not end up in the final prompts, but we still need the model to output them: detailed storyboard content and spoken lines are what let us judge the quality of an idea, which titles and short descriptions alone cannot do. After that, choose from the ideas it outputs. Prioritize themes that are easier to realize and closer to reality, because they generate better results. Here I chose the overworked goblin laborer, which has strong contrast and shows the grand world of "Dungeons and Dragons" from the perspective of a minor character.

For the fictional interview, I chose the idea of a pirate annual meeting where the pirates speak in finance-industry jargon. The contrast is strong and will resonate with working people.

Prompt generation

If you feel that part of a storyboard needs changing, you can ask the model to revise it until the content is right. I was lazy here and went straight to the next step without making any changes: generating the prompts. This part is relatively simple. I will give you prompt templates for the first-person Vlog and interview scenarios. Have the AI output a prompt for each storyboard based on the results of the discussion so far and the prompt template.

Prompt generation for the first-person Vlog type: [Goblin Cleaner] is a good idea. Put the full prompt description of each storyboard in a single paragraph, including audio-related content. Each storyboard is 8 seconds, so pay attention to the length of the lines and don't exceed the time limit.

Use this template to generate the prompt: A cinematic, handheld selfie-style shot of [a detailed character description, e.g., a sci-fi explorer in a sleek silver spacesuit]. They hold the camera at arm's length, and their [specific arm/hand description, e.g., armored silver gauntlet] is clearly visible in the frame as they show a [specific emotional expression, e.g., look of pure awe]. The scene is a [detailed location and time of day, e.g., bioluminescent alien jungle at twilight], and behind them, [describe the key background element, e.g., massive, pulsating mushroom-like trees] cast a [specific lighting quality, e.g., vibrant purple and blue light] across the landscape. The character looks directly into the camera and speaks in a [specific tone of voice, e.g., breathless, excited whisper]: "[Your Dialogue Here]". (Optional: For extra control, add specs like Lens: wide-angle with shallow focus or describe a camera pan).
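If you would rather fill the template programmatically than paste it into a chat, the following is a minimal sketch of one way to do it. The slot names (`character`, `arm`, `expression`, and so on) are labels I made up for the bracketed parts of the template, the optional lens note is left out, and the dialogue line is only a placeholder.

```python
from string import Formatter

# The Vlog template above, with each bracketed part turned into a named slot.
# The slot names are hypothetical labels, not part of the original template.
VLOG_TEMPLATE = (
    "A cinematic, handheld selfie-style shot of {character}. "
    "They hold the camera at arm's length, and their {arm} is clearly visible "
    "in the frame as they show a {expression}. "
    "The scene is a {location}, and behind them, {background} cast a {lighting} "
    "across the landscape. "
    "The character looks directly into the camera and speaks in a {tone}: "
    '"{dialogue}".'
)


def fill_template(template: str, **slots: str) -> str:
    """Fill every slot, raising ValueError if one is missing or blank."""
    required = {name for _, name, _, _ in Formatter().parse(template) if name}
    missing = sorted(n for n in required if not slots.get(n, "").strip())
    if missing:
        raise ValueError(f"missing slots: {', '.join(missing)}")
    return template.format(**slots)


# Example values are taken from the "e.g." hints in the template itself;
# the dialogue is a placeholder line.
prompt = fill_template(
    VLOG_TEMPLATE,
    character="a sci-fi explorer in a sleek silver spacesuit",
    arm="armored silver gauntlet",
    expression="look of pure awe",
    location="bioluminescent alien jungle at twilight",
    background="massive, pulsating mushroom-like trees",
    lighting="vibrant purple and blue light",
    tone="breathless, excited whisper",
    dialogue="Day one on this planet, and the trees are glowing. Hit like!",
)
print(prompt)
```

Failing loudly on a blank slot is a deliberate choice here: a prompt with an unfilled bracket tends to waste a generation.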

Prompt generation for the fictional interview type: [Pirate Meeting] is a good idea. Put the full prompt description of each storyboard in a single paragraph, including audio-related content. Each storyboard is 8 seconds, so pay attention to the length of the lines and do not exceed the time limit.

Use this template to generate prompts: A cinematic, medium handheld interview shot featuring [a detailed character description, e.g., a fearsome pirate captain in a captain's coat with a Bluetooth earpiece]. They display a [specific emotional expression, e.g., look of confident authority] as they speak, with [describe the key background element, e.g., pirates mingling near a makeshift bar] visible in the slightly out-of-focus background. The atmosphere is thick with [describe environmental sounds, e.g., the murmur of distant conversations and the clinking of tankards]. Flickering [specific lighting quality, e.g., torchlight] illuminates the character, casting dynamic shadows. Crucially, the character looks slightly off-camera, addressing an unseen interviewer. They speak in a [specific tone of voice, e.g., a fast-talking, confident finance-bro voice]: "[Your Dialogue Here]". (Optional: For extra control, specify lens details like 'shot on a 50mm lens with a shallow depth of field' or describe camera movement like 'a slow push-in during the dialogue').
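Both instructions above cap each storyboard at 8 seconds, so it can help to sanity-check a line of dialogue before generating. The sketch below is a rough estimate only: the speaking rate of 2.5 words per second is my own assumption, not a Veo3 figure, and you should tune it to the voice you are asking for.

```python
CLIP_SECONDS = 8.0        # storyboard length used in this article
WORDS_PER_SECOND = 2.5    # assumed average speaking rate; adjust to taste


def estimate_seconds(dialogue: str, words_per_second: float = WORDS_PER_SECOND) -> float:
    """Estimate how long a line takes to say, based on its word count."""
    return len(dialogue.split()) / words_per_second


def fits_clip(dialogue: str, clip_seconds: float = CLIP_SECONDS,
              words_per_second: float = WORDS_PER_SECOND) -> bool:
    """Return True if the estimated speaking time fits inside one clip."""
    return estimate_seconds(dialogue, words_per_second) <= clip_seconds


line = "Welcome to the annual meeting. Plunder is up, morale is down."
print(f"{estimate_seconds(line):.1f}s, fits: {fits_clip(line)}")
```

A word count is crude, but it is usually enough to catch a line that is obviously too long for one clip.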

At this point, our preparation is basically complete. Because of Veo3's strong prompt adherence, stability, and audio generation capabilities, we can skip the image generation, voice generation, lip sync, and sound effect matching steps that an image-to-video workflow requires. If you have made similar videos in the past, you know how troublesome that was. Any one of the steps we just skipped could go wrong and multiply the workload several times over.

Generate Video

Next we can generate the video. If you want to keep things simple, I recommend getting a Gemini Pro membership and generating in Gemini. Open the Gemini app, select the video button under the input box, enter the prompt, and press Enter.

If you don't mind the hassle, you can use Flow (labs.google/fx/zh/tools/flow/), a Google product built specifically for video generation.

Once inside, create a project first, then switch the model to Veo3 Fast in the input box settings. This option is very cheap. If your video has no sound, it is because you didn't switch the model, so don't ask me why! If you want higher quality, you can use the Quality model, but it is very expensive. Then enter the prompt and wait for the video to generate.

When downloading here, you can choose to upscale the result to 1080p, which makes the video sharper.

Video merging and post-processing

Finally, you need to combine the clips. Since the videos generated by Veo3 are basically finished, you only need to use Jianying or another tool to stitch the clips together and export the result. Everyone should know how to do this. If the dialogue is in English, you can click the subtitles option at the top to have Jianying generate subtitles automatically.
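If you want to script the stitching step instead of using an editor, one option (not the workflow used in this article) is FFmpeg's concat demuxer. The sketch below assumes `ffmpeg` is installed and that all clips share the same codec and resolution, which lets `-c copy` join them without re-encoding. The file names are hypothetical.

```python
import subprocess
from pathlib import Path


def build_concat_list(clips) -> str:
    """Build the text of a concat list file, one "file '...'" line per clip."""
    lines = []
    for clip in clips:
        # Inside single quotes, a literal quote is written as '\''
        escaped = str(clip).replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def build_ffmpeg_command(list_file, output) -> list:
    """Build the ffmpeg command that joins the listed clips without re-encoding."""
    return ["ffmpeg", "-f", "concat", "-safe", "0",
            "-i", str(list_file), "-c", "copy", str(output)]


def stitch(clips, output, list_file="clips.txt") -> None:
    """Write the list file, then run ffmpeg (raises if ffmpeg fails)."""
    Path(list_file).write_text(build_concat_list(clips), encoding="utf-8")
    subprocess.run(build_ffmpeg_command(list_file, output), check=True)


# Hypothetical file names; call stitch(...) once the clips exist on disk.
print(build_concat_list(["shot1.mp4", "shot2.mp4", "shot3.mp4"]))
```

Relative paths in the list file are resolved relative to the list file itself, so it is simplest to keep it in the same folder as the clips. Subtitles would still need a separate tool.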

Every step removed from the AI video production process expands the pool of creators by more than 10 times. Many people have good ideas and a feel for what gets traffic, but they cannot produce content because they lack the technical skills and understanding of AI. Veo3 has already made production very cheap. If video agents arrive that also handle packaging and subtitles, the number of AI video creators will grow by more than 100 times. We may see that day arrive this year.