Alibaba's AI video model that topped the global Artificial Analysis leaderboard after debuting anonymously.
Last verified · 2026-06-23 · by Moe Ameen
HappyHorse-1.0 is an AI video generation model from Alibaba. It first drew attention in early April 2026, when it appeared anonymously on the Artificial Analysis Video Arena (a blind, head-to-head leaderboard where people vote on which of two clips better matches a prompt) and climbed to the top of the text-to-video and image-to-video rankings without naming its maker. On April 10, 2026, Alibaba confirmed it had built the model, according to reporting from CNBC, Bloomberg, and Caixin. The work is attributed to the Future Life Lab inside Alibaba's Taotian Group, a team led by Zhang Di, a veteran of the Chinese AI video-generation scene.
The model handles both text-to-video and image-to-video in one pipeline. Its standout technical claim is native, single-pass audio: instead of generating silent footage and adding sound later, it produces video and synchronized audio together, including dialogue with on-screen lip-sync across several languages. Reporting and early access notes describe short clips — on the order of five to eight seconds — at up to 1080p in standard vertical and landscape aspect ratios. Treat any single benchmark score, clip length, or resolution figure as a snapshot; these shift as the model updates.
It helps to keep two Alibaba video efforts separate. HappyHorse is a hosted model that topped the leaderboard. Wan (Tongyi Wanxiang) is Alibaba's open-weight video line, also highly ranked but distinct — different weights, different access, different version numbers. When someone says "Alibaba's AI video model," they may mean either, so confirm which one a tool or article is referring to.
Access has been rolling out in stages rather than launching all at once: limited testing first, then broader availability through partners such as fal.ai, with API access expected via Alibaba Cloud's Model Studio. Official pricing had not been set at the time of writing. Usage is metered per second of generated video rather than billed as a flat subscription, so check the provider you use for current rates.
HappyHorse gets you a striking few-second clip, increasingly with its own audio and lip-sync. What it does not get you is a finished, on-brand post sized for each platform, or the supporting content a launch needs around it. That last mile is the whole job, and it is where Kompozy takes over. Drop a HappyHorse export into Kompozy and it burns in branded captions, reframes the clip to each destination's aspect ratio, and composites it with b-roll or a music bed into a Clipped Short or Marketing Short. It then fans the same idea out into a carousel, a quote card, and platform-native captions written in your voice through the Persona Brief, and schedules and publishes the whole set from one queue across its nine connected platforms, including TikTok, Reels, YouTube Shorts, Facebook, LinkedIn, X, Pinterest, and Threads.
The pairing also hedges against model churn. A leaderboard No. 1 can change month to month, so building your recurring format on whichever generator is ranked first is fragile. Kompozy generates the video HappyHorse can't — a HeyGen talking-head Persona Short with auto-captions, an avatar composited into a brand-exact HyperFrames template, a listicle video over a portrait clip — so your on-brand identity stays put while you swap in HappyHorse clips as accent footage whenever it earns the shot. The model makes the scene; Kompozy makes the schedule.
It is an AI video generation model from Alibaba that handles text-to-video and image-to-video. It appeared anonymously on the Artificial Analysis Video Arena in early April 2026 and climbed to No. 1; Alibaba confirmed on April 10, 2026 that it built the model, with hosted access rolling out via partners and Alibaba Cloud Model Studio.
No. HappyHorse is a separate, hosted model that topped the leaderboard. Wan (Tongyi Wanxiang) is Alibaba's open-weight video line — also strong but distinct, with different weights, versions, and access. Confirm which one a tool or article means before relying on its specs.
Yes. Its notable feature is native, single-pass audio — it generates video and synchronized sound together, including spoken dialogue with on-screen lip-sync in several languages, rather than adding audio after the fact. Capabilities evolve, so verify the current language list and limits with your provider.
Pricing had not been officially fixed at the time of writing. Access is metered by usage — per second of generated video through providers like fal.ai or Alibaba Cloud — rather than a flat subscription, and comparable models run roughly a few cents to about half a dollar per second. Check the provider you use for current rates.
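Because billing is per second, a clip's cost is roughly its length multiplied by the provider's rate, times the number of clips you generate. The minimal Python sketch below works through that arithmetic for an eight-second clip. The two rates are illustrative placeholders at each end of the rough range above, not published HappyHorse prices, and `estimate_cost` is a hypothetical helper, not part of any provider's API.

```python
def estimate_cost(clip_seconds: float, rate_per_second: float, clips: int = 1) -> float:
    """Estimate metered cost: seconds generated x per-second rate x number of clips.

    Hypothetical helper for illustration only; real bills depend on the provider.
    """
    if clip_seconds < 0 or rate_per_second < 0 or clips < 0:
        raise ValueError("inputs must be non-negative")
    return round(clip_seconds * rate_per_second * clips, 2)


# Placeholder rates (USD per second) spanning the rough range quoted above.
# These are assumptions, not HappyHorse's actual pricing.
LOW_RATE = 0.05
HIGH_RATE = 0.50

if __name__ == "__main__":
    for rate in (LOW_RATE, HIGH_RATE):
        single = estimate_cost(8, rate)
        batch = estimate_cost(8, rate, clips=10)
        print(f"8 s clip at ${rate:.2f}/s: ${single:.2f} each, ${batch:.2f} for 10 clips")
```

The spread matters more than either endpoint: at these placeholder rates, the same ten-clip batch differs in cost by a factor of ten, which is why checking your provider's current rate is worth doing before you commit to a recurring format.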
HappyHorse generates the clip but does not publish it. Bring the export into Kompozy to add branded captions, reframe it per platform, composite it into a Clipped Short or Marketing Short, fan it into a carousel and captions in your voice, and schedule and publish across TikTok, Reels, YouTube Shorts, X, LinkedIn, and more from one queue.