Kling 2.6 is a native audio-video model: one click yields a 5–10-second clip that pairs visuals with lip-synced narration, dialogue, singing, and ambient sound, with no post-production needed. It offers text-to-video and image-to-video paths, bilingual Chinese-English support, and credits-based pricing, cutting video creation from hours to seconds.
On the beach, the waves crash against the shore. [Young Caucasian male] wearing a backward baseball cap, holding a camera and taking a selfie, with a smile at the corner of his mouth. [Young Caucasian male, sunny voice] says: "The weather is amazing today! All my worries feel totally gone. I've been needing a day like this—sun, breeze, just the sound of the waves." The camera is in vlog close-up style.
Visual: In a tidy living room, a white robotic vacuum sits in the center, with no clutter around it. Dialog: [Narrator, soft female voice] accompanied by the gentle sound of vacuuming: "Are you still troubled by dust in hard-to-reach corners? This robotic vacuum features edge-to-edge cleaning, leaving no gaps behind—making your life easier and effortless!" The camera closely follows the vacuum's path as it cleans.
In a bright rehearsal room, sunlight streams through the window, and a standing microphone is placed in the center of the room. [Campus band female lead singer] stands in front of the microphone with her eyes closed, while the other members stand around her. [Campus band female lead singer, full voice] leads: "I will try to fix you, with all my heart and soul..." The background is an a cappella harmony, and the camera slowly circles around the band members.
Visual: In front of an outdoor shopping mall, a crowd gathers, cheering. Dialog: [African-American male reporter] stands next to the crowd, holding a microphone, his body slightly turned. [African-American male reporter, steady voice] says: "Now we can see the atmosphere here is absolutely electric. Let's go check it out together! There's so much happening all at once." Background: Cheerful crowd noises and event BGM, with occasional close-ups of the event.
Visual: On a comedy stage, the spotlight is focused on the center, while the audience remains in the shadows. Dialog: [Stand-up comedian] holds a microphone on stage, slightly swaying his body. [Stand-up comedian, humorous male voice]: "My gym trainer said the first step is the hardest... Lies! The first step is easy. It's the 5,000th step that's trying to murder you!" After finishing, the comedian shrugs and raises his hands. Background: Laughter and applause from the audience, with the camera focused on the comedian's face.
A scene in Antarctica with towering ice formations, the overall tone being a cold, white, frigid color palette. The glacier cracks with a loud noise, followed by the sound of ice shattering, as the engines of the research team's snowmobiles roar. The camera follows the retreating research team and the collapsing ice towers.
In a sports news studio, the screen behind the sports anchor is showing a basketball game replay. [Sports anchor] sits behind the news desk, tapping his fingers lightly on the table. [Sports anchor, clear and strong voice] says: "Look at this clutch play! He stepped up when it mattered most, hitting the shot that decided the championship! This game-winning shot sealed the victory outright." Background: Cheers from the live game, with the camera focusing on the sports anchor's face.
On a street stage, the audience stands around. [Young rapper] wears a silver chain and a black hoodie, swaying his body to the beat. [Young rapper, dynamic male voice] raps: "Yo, pavement to stage, flow lit, crowd goin’ wild! Mic in my grip, dreams unchained, let the rhythm ride! Raw vibe, sharp rhymes, keep the energy high—this is how we fly, no need to deny! Grind hard, spit fire, make the moment mine, street-born rhythm, let times shine!" The camera focuses on the young Caucasian rapper's movements.
In a cinematic rainy-day café, rain splashes against the window, with a cool, blue-green tone overall. [Blonde French woman] walks in and sits down, her hair slightly damp, gazing directly at the camera. [Blonde French woman, low voice]: "You don't remember the moment, you just remember the feeling." The camera then focuses on a bottle of golden perfume that appears in the center, zooming in on the blonde French woman's face.
Documentary vignettes, e-commerce explainers, highlight reels—lock the frame and let the model pace the narration, ambience and micro-sound design for you.


Interviews, sketches, sitcom beats—whoever speaks gets the right face, voice and timing; switch roles without cross-talk or voice bleed.


ASMR whispers, glossy ads, art-house shorts—drop impossible visuals, mood-matched SFX and micro-narrative into the same prompt and watch the surreal become real.




To get the best performance out of Kling 2.6 when creating talking-head or music-driven videos, treat the prompt like a miniature screenplay: tell the engine where we are, who is there, what they do, how they sound, and how you want it shot.
Stick to the order and punctuation shown below—Kling 2.6 was trained on exactly this syntax.
Scene (place & time) + Cast (who) + Action (what moves) + Audio (voice / music / SFX) + Look (style, mood, camera).
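As a minimal sketch of the five-part order above, the helper below assembles a prompt from its parts. The name `build_prompt` and the choice to join the parts with a single space are illustrative assumptions. This is only string assembly on the caller's side, not an official Kling API.

```python
def build_prompt(scene: str, cast: str, action: str, audio: str, look: str) -> str:
    """Join the five prompt parts in the order Scene, Cast, Action, Audio, Look.

    Hypothetical helper: it only builds text and does not call any Kling service.
    """
    cleaned = []
    for part in (scene, cast, action, audio, look):
        part = part.strip()
        if not part:
            continue  # skip parts the caller left empty
        # Close each part with a period unless it already ends a sentence or quote.
        if part[-1] not in '.!?"”':
            part += "."
        cleaned.append(part)
    return " ".join(cleaned)


prompt = build_prompt(
    scene="On the beach, the waves crash against the shore",
    cast="[Young Caucasian male] wearing a backward baseball cap",
    action="He holds a camera and takes a selfie",
    audio='[Young Caucasian male, sunny voice] says: "The weather is amazing today!"',
    look="The camera is in vlog close-up style",
)
print(prompt)
```

Keeping the parts as separate arguments makes it easy to swap one element, for example the camera style, while the order stays fixed.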
To get accurate audio, design your prompt with reference to the following approaches.
Jessica, 27, Micro-Influencer, Austin
I typed ‘sun-kissed rooftop brunch, [F] “Is it too early for mimosas?” + playful + fast + high’ and Kling 2.6 spat out a reel where my own avatar clinked glasses to a fizzy pop, lips locked to every syllable. Posted at 9 a.m., hit 120k views by lunch. My sponsor DM’d ‘more of those, please.’ I’m literally shaking.
Marcus, 34, Indie Rapper, Berlin
Fed the bar “Speed so fast, rhyme so sharp” + trap + confident, added a grimy U-Bahn backdrop. Kling 2.6 gave me back a clip where my virtual self spits inside a rattling carriage, hi-hats rattling the poles. Dropped it on TikTok: first 10k streams in an hour, no studio, no engineer. My label guy just texted: ‘We’re shelving the MV budget.’
Luna, 22, ASMRtist, Montreal
Prompted pine forest + birds & wind + wide outdoor reverb plus a whisper track. The video opens on macro dew, my lips brushing the mic, every breath synced to a distant woodpecker. Upload hit #1 on the ASMR tag overnight. Fans swear they smelled pine. I filmed it in pajamas at 2 a.m.
Ethan, 41, SaaS Founder, San Jose
Needed a product demo by EOD. Typed ‘bright loft office, [M] “Cut onboarding time to minutes” + confident + medium + normal’ and dropped in our UI mock-up. Kling 2.6 returned a walk-through: cursor moves, voice lands, subtle whoosh on every click. Board loved it; our CAC dropped 18%.
Chloe & Omar, 30/32, Travel-Vlog Couple, Valencia
Gave Kling 2.6 a honeymoon selfie on the beach, added ‘[Chloe, giggling] “We eloped!” [Omar, proud] “And saved you a seat at the after-party.”’ Output: waves crash in rhythm with their laughter, her veil flaps exactly on beat. Posted as our wedding announcement. Mom cried in Michigan, friends spammed heart emojis. We shot our love story in flip-flops. Kling 2.6 is the third person in this marriage, and we’re thrilled.
Dmitri, 45, Film-Teacher & Dad, Portland
My 12-year-old wanted a sci-fi short. We wrote: ‘neon garage lab, [Dad, excited] “Initiate warp drive!” [SFX: thrumming engine]’. Kling 2.6 rendered a lens-flare masterpiece: our faces glowing, voices robotic, exhaust whooshing in 5.1. We premiered it on the living-room wall; popcorn everywhere.
Kling 2.6 is the world’s first native audio-video diffusion model. Type one line or upload one image and it returns a 5–10-second, broadcast-ready clip where lip-synced speech, singing, ambient sound and on-screen motion are locked together—no editing suite, Foley session or re-recording required.
Every phoneme, mouth shape and micro-gesture is predicted in the same latent space as the soundtrack. The model stamps a frame-by-frame “audio-visual hash” so cadence, facial emotion and camera move never drift—even if you switch languages or swap voices mid-clip.
Yes. Use the mini-screenplay syntax:
M/F “Line.” + emotion + speed + pitch + “camera push-in 15 %.”
Kling 2.6 reads the order and punctuation you type and translates it straight into performance—no keyframes, no prompt engineering hacks.
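A minimal sketch of composing one line in the mini-screenplay syntax above. The function name `speech_line` is hypothetical, and wrapping the speaker tag in square brackets follows the `[F]` and `[M]` usage seen in the user examples on this page rather than a documented rule.

```python
def speech_line(speaker: str, line: str, emotion: str, speed: str,
                pitch: str, camera: str = "") -> str:
    """Build: [speaker] "line" + emotion + speed + pitch (+ "camera note").

    Hypothetical helper for assembling prompt text; it does not call Kling.
    """
    fields = [f'[{speaker}] "{line}"', emotion, speed, pitch]
    if camera:
        # The optional camera direction goes last, quoted like the spoken line.
        fields.append(f'"{camera}"')
    return " + ".join(fields)


example = speech_line("F", "Is it too early for mimosas?", "playful", "fast", "high")
print(example)
```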
Absolutely. Label each speaker or lyric line with [CN] or [EN] and the model will auto-switch phoneme sets, keeping lip shapes, accent color and rhyme scheme intact—perfect for cross-market ads or Chinese-English duets without manual dubbing.
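If you label lines by hand, a small script can prefix the tags for you. The sketch below assumes the tag goes at the start of each line, and it guesses the language with a simple heuristic: any character in the CJK Unified Ideographs block (U+4E00 to U+9FFF) marks the line as Chinese. Both the tag placement and the heuristic are assumptions for illustration.

```python
def language_tag(text: str) -> str:
    """Return "[CN]" if the text contains a CJK Unified Ideograph, else "[EN]"."""
    has_cjk = any("\u4e00" <= ch <= "\u9fff" for ch in text)
    return "[CN]" if has_cjk else "[EN]"


def tag_lines(lines: list[str]) -> list[str]:
    """Prefix every speaker or lyric line with its language tag."""
    return [f"{language_tag(line)} {line}" for line in lines]


tagged = tag_lines(["The weather is amazing today!", "今天天气真好！"])
for line in tagged:
    print(line)
```

Lines that mix both languages are tagged `[CN]` by this heuristic, so split them per language first if you need separate tags.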
Write it like a script:
Alex,angry “How could you!” Sam,calm “I told the truth.”
“Lyrics” + belting + joyful + pop
Object:door Action:slam + SFX:bang
Kling 2.6 renders dialogue, vocals and spot effects in a single pass, each source isolated yet phase-locked—no manual mixing required.
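The three script-style lines above (dialogue, lyrics, spot effects) can be generated with small formatters. This is a sketch that mirrors the punctuation of those examples as they appear here. The function names are hypothetical, and whether Kling requires exactly this spacing is not confirmed by this page.

```python
def dialogue_line(name: str, mood: str, text: str) -> str:
    """Format one spoken line, e.g. Alex,angry “How could you!”"""
    return f"{name},{mood} “{text}”"


def lyric_line(lyrics: str, technique: str, mood: str, genre: str) -> str:
    """Format a sung line: “Lyrics” + technique + mood + genre."""
    return f"“{lyrics}” + {technique} + {mood} + {genre}"


def sfx_line(obj: str, action: str, sfx: str) -> str:
    """Format a spot effect tied to an on-screen object and action."""
    return f"Object:{obj} Action:{action} + SFX:{sfx}"


script = " ".join([
    dialogue_line("Alex", "angry", "How could you!"),
    dialogue_line("Sam", "calm", "I told the truth."),
])
print(script)
print(lyric_line("Lyrics", "belting", "joyful", "pop"))
print(sfx_line("door", "slam", "bang"))
```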
100 %. Every Kling 2.6 render comes with a worldwide, royalty-free license—ads, client pitches, streaming docs, resale, NFTs, no extra fees, no attribution needed.
