Wan2.5 - Ali Tongyi’s New Multimodal Generative Model Officially Launched
September 25, 2025 | Zoey
What is Wan2.5?
Wan2.5 (Tongyi Wanxiang 2.5) is Alibaba's newest multimodal generative model, currently available as the Wan2.5-Preview version.
The model has four core functions: text-to-video, image-to-video, text-to-image, and image editing. For the first time, it can also generate video and synchronized audio simultaneously. It supports 1080P HD video at 24 frames per second (fps) and automatically generates voiceovers, sound effects, or background music to accompany the visuals.
Wan2.5 can also render Chinese and English text within images, produce complex charts and artistic posters, and perform one-click image editing, further lowering the barrier to entry. Its native multimodal architecture lets users drive creation with nothing more than text prompts or audio.
With its powerful generative capabilities, Wan2.5 has captured the attention of advertisers, e-commerce platforms, and film and television studios, and it can now be easily accessed via the Tongyi Wanxiang website and the Alibaba Cloud Bailian platform.
Main functions of Wan2.5
1. Video Generation: Wan2.5 natively supports audio-visual synchronization, producing high-fidelity audio (including human voices, ASMR, ambient sound, and music), supporting multiple languages, and allowing audio-driven video generation. Video quality has also seen a major upgrade, with support for cinematic-quality clips of up to 10 seconds at 1080P and 24fps (see the request sketch after this list). The expanded capacity for temporal and spatial information improves narrative ability and makes motion and structure more stable.
2. Image Generation: The model further improves the integration of text and imagery, accurately reproducing visual effects and styles, rendering text more reliably, and producing complex non-pictorial structures such as charts, flow charts, and architecture diagrams.
3. Image Editing: Through conversational interaction, users can modify or create single or multiple images. Natural language understanding and instruction-following have improved significantly: the model combines input images and prompts to produce edits that require reasoning or to apply effects within images and videos.
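As a rough illustration of how the video settings quoted above (1080P, 24fps, clips up to 10 seconds, optional generated audio) might be expressed when calling the model, here is a minimal Python sketch. The field names and validation limits are assumptions drawn from the figures in this article, not a documented Wan2.5 API.

```python
# Hypothetical request settings for a Wan2.5 video generation call.
# Field names are illustrative assumptions; only the limits (1080P, 24 fps,
# up to 10 seconds, optional generated audio) come from the specs quoted above.
from dataclasses import dataclass


@dataclass
class VideoRequest:
    prompt: str                    # text description of the desired clip
    resolution: str = "1080p"      # article cites 1080P HD output
    fps: int = 24                  # article cites 24 frames per second
    duration_seconds: int = 10     # article cites clips of up to 10 seconds
    generate_audio: bool = True    # voiceover, sound effects, or background music
    audio_language: str = "en"     # multiple languages are supported

    def validate(self) -> None:
        """Reject settings outside the limits quoted in the article."""
        if self.duration_seconds > 10:
            raise ValueError("Wan2.5-Preview clips are limited to 10 seconds")
        if self.fps != 24 or self.resolution != "1080p":
            raise ValueError("Only 1080p at 24 fps is documented here")


request = VideoRequest(prompt="A solo panda gives a tour of the Golden Gate Bridge")
request.validate()
```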
How to use Wan2.5?
- Go to the official website: Open the official Tongyi Wanxiang website (or viddo ai wan-2.5), complete account registration, and log in.
- Choose a module: After logging in, visit the homepage and choose a module, such as "Video Generation," "Image Generation," or "Image Editing," based on your needs.
- Add instructions or upload materials: After the module is open, follow the prompts to enter a text description or upload images, audio, or other materials.
- Generate content: Click the "Generate" button and wait while the system processes your request. Once your videos, images, or other content have been generated, you can view, download, or share the results.
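The steps above cover the web interface. Because the model is also exposed through the Alibaba Cloud Bailian platform, programmatic access is possible as well. Hosted video generation services typically follow a submit-then-poll pattern for long-running jobs; the sketch below illustrates that pattern only, with a placeholder endpoint, credential, and response fields, since this article does not document the actual API.

```python
# Submit-and-poll sketch for a hosted video generation service.
# The endpoint, header, and response fields are placeholders, not the real
# Bailian API; only the overall submit/poll workflow is illustrated.
import os
import time

import requests

API_BASE = "https://example.invalid/wan2.5"  # placeholder, not a real endpoint
API_KEY = os.environ["WAN_API_KEY"]          # hypothetical credential variable


def generate_video(prompt: str) -> str:
    """Submit a generation job, poll until it finishes, and return the video URL."""
    headers = {"Authorization": f"Bearer {API_KEY}"}

    # 1. Submit the job with the prompt and desired output settings.
    job = requests.post(
        f"{API_BASE}/video-generation",
        headers=headers,
        json={"prompt": prompt, "resolution": "1080p", "fps": 24},
        timeout=30,
    ).json()

    # 2. Poll until the service reports the job as finished.
    while True:
        status = requests.get(
            f"{API_BASE}/jobs/{job['id']}", headers=headers, timeout=30
        ).json()
        if status["state"] == "succeeded":
            return status["video_url"]
        if status["state"] == "failed":
            raise RuntimeError(status.get("error", "generation failed"))
        time.sleep(5)


print(generate_video("Examine the details of the lipstick and zoom in"))
```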
Application scenarios of Wan2.5
- Advertising Production: Advertising agencies can generate creative videos and imagery that closely mimic standard advertising themes, greatly increasing efficiency and diversity in content production at reduced cost.
- E-commerce Content Creation: E-commerce platforms and merchants can generate dynamic product showcase videos and posters that make products more attractive and strengthen purchase intent.
- Film and Television Production: Film and television production teams can generate preliminary video scripts, scene designs, and special-effects previews, helping directors and screenwriters validate ideas quickly while reducing production risk.
- Educational Content Creation: Educational institutions and teachers can generate lively instructional content such as videos, scientific diagrams, and flow charts that make material easier to understand and improve the learning experience.
Generation examples
Video generation examples:
Prompt: Examine the details of the lipstick and zoom in. The camera needs macro-level zoom and rotation, smoothly focusing on the subtle shimmer and texture of the lipstick. The soundtrack should be strong, with rhythmic drum beats and a high-gloss, advertisement-like atmosphere.
Prompt: Solo Panda gives a tour of the Golden Gate Bridge
Image generation and image editing examples:
Examples include changing hairstyles, changing hair color, replacing backgrounds, generating 3D figures, one-click style transfer, and more.
Conclusion
In summary, Wan2.5, working alongside Midjourney V7 within Quark's "Create" AI, offers not only polished image generation but also capable video creation, complete with audio-video synchronization and sound effects. It adapts readily to commercial spots, creative shorts, and playful imaginative scenes. With a modest price point and a free trial, creators can let their imaginations run. We can hardly wait to see what it does for your creativity.