What is HappyHorse-1.0? A Preview of the Next Generation AI Video Model
April 8, 2026 | Zoey
HappyHorse-1.0 is the latest winner of the AI analytics leaderboard. However, nothing is currently known about the team that developed it. The leaderboard results themselves are confirmed, but many unknowns remain about how HappyHorse-1.0 can actually be used.
The AI Video Model competition has drawn a lot of attention because one of its submissions, HappyHorse-1.0, had never been reported before. It is surprising that a model with no known development team or brand could score highly enough to win an AI video competition, and this has sparked numerous informative and interesting discussions across the AI video industry.
The competition's scoring is unusual: it is based solely on anonymous user votes and the Elo rating system, rather than on vendors' self-assessments or published data. It is therefore seen as less vendor-influenced than comparable benchmarks, but it is also more exposed to sample bias and short-term fluctuations.
What was the starting point for HappyHorse-1.0's sudden popularity?
Under this evaluation setup, HappyHorse-1.0 suddenly topped the list:
· Text-to-video (without audio): Elo 1333, ranked #1;
· Image-to-video (without audio): Elo 1392, ranked #1;
· Tasks with audio: ranked #2 across the board.
HappyHorse scored 30 to 60 points higher than Seedance 2.0 on the audio-free tasks, which is a considerable lead in Elo terms (a 60-point gap corresponds to an expected win rate of roughly 58 to 59%).
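The arithmetic behind that parenthetical is easy to verify with the standard Elo expectation formula (a quick sketch, not tied to any particular leaderboard implementation):

```python
# Sanity-check of the Elo claim above: expected win probability
# for a given rating gap, using the standard Elo expectation formula.
def elo_expected(diff: float) -> float:
    """Probability that the higher-rated model wins one pairwise vote."""
    return 1.0 / (1.0 + 10 ** (-diff / 400))

for gap in (30, 60):
    print(f"{gap}-point gap -> {elo_expected(gap):.1%} expected win rate")
```

A 30-point gap works out to roughly 54%, and a 60-point gap to roughly 58-59%, matching the figure quoted above.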
However, this sudden surge in popularity still needs to be viewed rationally:
The recent voting round had fewer respondents, which causes scores to fluctuate more widely. The vote counts have not been published, so we do not know how stable the ratings are. And once audio is added to the tasks, the model's advantage shrinks considerably and it drops to second place.
For now, HappyHorse-1.0 can be summarized as an outstanding performer on a reasonably reliable evaluation platform, and its results can be cited with that caveat in mind.
Regarding HappyHorse-1.0: Currently known information
Much of this information comes from the brand's own website (happyhorse-ai.com) and should be treated accordingly; as of this writing (4/8/2026), none of the technical data has been verified by any independent party.
1. Model Architecture (Unverified)
According to its website, HappyHorse-1.0 consists of 40 Transformer self-attention layers. The first four and last four layers are dedicated to individual modalities, while the 32 middle layers share parameters across text, image, video, and audio. All four modalities are modeled jointly in one unified token sequence; no classic cross-attention is used. The same source estimates the parameter count at about 15 billion (15B), but this has not been independently verified.
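The reported layout can be expressed as a simple layer plan. This is purely illustrative: the structure below is an assumption reconstructed from the description above, not released code.

```python
# Illustrative sketch of the reported layout: 40 self-attention layers,
# with the first 4 and last 4 dedicated to individual modalities and the
# 32 middle layers sharing parameters across all modalities.
MODALITIES = ("text", "image", "video", "audio")

def build_layer_plan(total: int = 40, specific_each_end: int = 4) -> list[dict]:
    plan = []
    for i in range(total):
        if i < specific_each_end or i >= total - specific_each_end:
            # One parameter branch per modality at the ends of the stack.
            plan.append({"index": i, "kind": "modality_specific",
                         "branches": MODALITIES})
        else:
            # Middle layers: one set of weights over the unified token sequence.
            plan.append({"index": i, "kind": "shared"})
    return plan

plan = build_layer_plan()
shared = sum(1 for layer in plan if layer["kind"] == "shared")
print(f"{len(plan)} layers total, {shared} shared")  # 40 layers total, 32 shared
```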
2. Multilingual audio and video generation capability (unverified)
The official documentation lists many languages in which audio and video can be generated, including but not limited to: Chinese, English, Japanese, Korean, German, and French. Some pages also list Cantonese, along with improved lip-synchronization.
As of this writing, there are no publicly available models, APIs, or reproducible tests for these capabilities; the only evidence that they exist is the official documentation itself.
3. Unified generation pipeline (partially consistent with observed results)
HappyHorse-1.0 is claimed to be a single architecture handling T2V, I2V, and combined image + video generation. The leaderboard supports this: the same model name is listed across many different tasks. It is therefore likely that HappyHorse is one end-to-end system rather than several independent models stitched together.
The leaderboard also shows evidence of some audio generation capability: HappyHorse ranks 2nd in the "Including Audio" category, so it is competitive there but has not yet taken the lead.
Unverified Information
AI Model Team: The team behind this model is not currently known; it may be based somewhere in Asia, but no organization has claimed ownership or provided any evidence of involvement.
Open Source: The model's official website states that "weights and inference code are completely open," but the GitHub and Hugging Face links currently display "Coming Soon."
Hardware Requirements and Performance: Inference for a 5-second clip reportedly takes 1.8 seconds at 256p and approximately 36 seconds at 1080p (these figures have not been independently validated).
Speculation on WAN 3.0: Why Does It Remain Unresolved?
Rumors in the community hold that HappyHorse-1.0 is an internal test build (i.e. not for public use) of the next-generation (WAN 3.0) model in Alibaba's WAN series, on the premise that its release pattern mimics the earlier "Pony Alpha/GLM-5" rollout. However, the following points weigh against that theory, so it remains speculation with no evidence or confirmation from any internal source:
· The architecture design looks nothing like that of any previously released WAN model.
· No model weights or API fingerprints have ever been released.
· No internal source has confirmed the connection.
Why "Mysterious Origins" Are Irrelevant to Quality Signals
Elo ratings are determined by the quality of blind-test outputs, not by the developer's reputation or background. Voters do not know which model they are voting on. If a model does well in blind testing, its ability to create videos has been demonstrated regardless of who made it.
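The mechanism described above can be sketched with the standard Elo update rule. The K-factor and the example ratings below are assumptions for illustration, since the leaderboard has not published its actual parameters.

```python
# Sketch of one blind-vote Elo update, using the standard rule.
# K = 32 is an assumed K-factor, not the leaderboard's published value.
K = 32

def expected(r_a: float, r_b: float) -> float:
    """Expected score of model A against model B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update(r_a: float, r_b: float, a_won: bool) -> tuple[float, float]:
    """Apply one vote's outcome to both ratings (a zero-sum exchange)."""
    delta = K * ((1.0 if a_won else 0.0) - expected(r_a, r_b))
    return r_a + delta, r_b - delta

# An upset vote against the higher-rated model moves both ratings noticeably:
new_a, new_b = update(1333, 1273, a_won=False)
print(f"{new_a:.1f} vs {new_b:.1f}")
```

Because every vote is anonymous and pairwise, the rating a model ends up with depends only on how often voters prefer its outputs.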
No Access: Current State Undefined
As of April 2026, HappyHorse-1.0:
· Has no public API
· Has no downloadable weights
· Has no pricing information or service-level agreements (SLAs).
It performs well on leaderboard results but is not yet available for production use.
Signs of Progress
1. Downloadable weights and inference code
2. Hugging Face model card and license
3. Publicly available API with documented pricing
If any of the above become available, HappyHorse-1.0 could become a legitimate production option.
Positioning within the Current Video Generation Landscape
As of early April 2026, the top five providers on the T2V (Text-to-Video) leaderboard (excluding audio) are ranked as follows:
(1) HappyHorse-1.0, with an Elo score of 1340 (no public API)
(2) Seedance 2.0 720p, with an Elo score of 1273 (no public API)
(3) SkyReels V4, with an Elo score of 1245 (offers a public API; $7.20/minute)
(4) Kling 3.0 1080p Pro, with an Elo score of 1241 (offers a public API; $13.44/minute)
(5) PixVerse V6, with an Elo score of 1240 (offers a public API; $5.40/minute).
Although HappyHorse-1.0 boasts the highest overall quality, SkyReels V4, Kling 3.0, and PixVerse V6 currently offer the most accessible, production-ready options.
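To make the listed per-minute prices concrete, here is a minimal cost sketch for a ten-minute generation budget at each provider's quoted rate (prices as listed above; actual billing terms may differ):

```python
# Rough cost comparison for the three providers with public APIs,
# at the per-minute prices listed above.
PRICES_PER_MINUTE = {  # USD per minute of generated video
    "SkyReels V4": 7.20,
    "Kling 3.0 1080p Pro": 13.44,
    "PixVerse V6": 5.40,
}

minutes = 10  # e.g. a batch of ten one-minute clips
costs = {name: rate * minutes for name, rate in PRICES_PER_MINUTE.items()}
for name, cost in sorted(costs.items(), key=lambda kv: kv[1]):
    print(f"{name}: ${cost:.2f} for {minutes} minutes")
```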
Frequently Asked Questions
Who does the community believe created "HappyHorse-1.0"?
"Unknown." Some believe it was created by an Asian development team, but there has been no public confirmation.
Is "HappyHorse-1.0" currently available for use?
"Not yet." Both the GitHub and Hugging Face pages still say "coming soon."
Is "HappyHorse-1.0" the same product as "WAN version 3"?
"We cannot confirm this at this point." The speculation stems from the two models' similar release patterns.
How are the leaderboard scores calculated?
"Blind votes + Elo scores."
When will the weights be released for "HappyHorse-1.0"?
"No date has been confirmed." There are no public commitments at this time.
In short: the leaderboard numbers are valid data, but everything else about this model has yet to be confirmed.

