5 min read · Oimi AI

Oimi AI Launches Alibaba HappyHorse: Text-to-Video, First/Last Frame & Reference Image Generation

Tags: Alibaba HappyHorse, Oimi AI, AI Video, Text to Video, First/Last Frame, Reference Image, Video Generation

Alibaba's HappyHorse has become one of the most talked-about AI video models. The anonymous model that topped the Artificial Analysis Video Arena with an ELO of 1332 has been confirmed to come from Alibaba.

Today, Oimi AI officially launches all three HappyHorse video generation modes. Each one is available with a single click in the infinite Canvas.

Try HappyHorse Video Generation Now

Combine image, video, music, and text workflows seamlessly in a single creative space.

Open Oimi Canvas

What is Alibaba HappyHorse?

HappyHorse-1.0 is an AI video generation model from Alibaba that entered the Artificial Analysis Video Arena under a pseudonym in April 2026. It quickly claimed the #1 spot on the Text to Video leaderboard with an ELO score of 1332 — a 59-point lead over the runner-up. It also ranks #1 in Image to Video (no audio).
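To put the 59-point lead in perspective, here is a rough sketch that assumes the leaderboard behaves like a standard Elo system with a 400-point logistic scale. That scale and the function name are assumptions for illustration; the arena's actual rating method may differ.

```python
def expected_win_rate(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected share of head-to-head votes for A under the standard Elo formula."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


happyhorse = 1332             # score quoted in the article
runner_up = happyhorse - 59   # the article states a 59-point lead

# Share of blind comparisons HappyHorse would be expected to win
# against the runner-up, under the assumed 400-point scale.
print(f"{expected_win_rate(happyhorse, runner_up):.1%}")
```

Under those assumptions the lead corresponds to winning roughly 58% of direct comparisons against the runner-up, which is a clear but not overwhelming margin.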

With exceptional motion consistency, physical realism, and long-take stability, HappyHorse is widely regarded as one of the strongest AI video models of 2026, surpassing Seedance 2.0, Kling 3.0, and Sora 2 in certain scenarios.

Three HappyHorse Modes on Oimi AI

Oimi AI integrates all three HappyHorse generation modes in its infinite Canvas, each designed for different creative workflows:

1. Text-to-Video (T2V)

Describe what you want in natural language, and HappyHorse generates a high-quality video from scratch. Perfect for rapid ideation and creative exploration.

  • Aspect ratios: 16:9, 9:16, 1:1, 4:3, 3:4
  • Duration: 3–15 seconds
  • No images required — pure text-driven generation

2. First & Last Frame Image-to-Video (I2V)

Upload a first frame alone, or a first and last frame pair, and HappyHorse generates a video that starts from the first frame and, when a last frame is supplied, transitions smoothly to it. This is one of the most popular AI video workflows because it gives you precise control over the start and end of your video.

  • Supports 1–2 images (first frame, last frame)
  • Aspect ratio follows the first frame image
  • Duration: 3–15 seconds

Typical workflow: generate start and end frames using GPT-Image-2 or Nano Banana Pro in the Oimi Canvas, then connect them to a HappyHorse video node for one-click animation.

3. Reference Image-to-Video (R2V)

Upload up to 9 reference images along with a text prompt, and HappyHorse generates a new video that matches the style and content of your references. Ideal for maintaining visual consistency across shots.

  • Supports up to 9 reference images
  • Aspect ratios: 16:9, 9:16, 3:4, 4:3, 1:1
  • Duration: 3–15 seconds
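
The limits listed for the three modes can be summarized as a small pre-flight check. This is a minimal sketch, not Oimi AI's API: `VideoRequest`, `validate`, and the mode strings are hypothetical names, and the minimum of one image for R2V is an assumption (the article only states the upper limit of 9).

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Limits quoted in the article; every name here is hypothetical.
ASPECT_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4"}
MIN_SECONDS, MAX_SECONDS = 3, 15
# (min images, max images) per mode; the R2V minimum of 1 is an assumption.
IMAGE_LIMITS = {"t2v": (0, 0), "i2v": (1, 2), "r2v": (1, 9)}


@dataclass
class VideoRequest:
    mode: str                            # "t2v", "i2v", or "r2v"
    prompt: str
    duration_s: int
    aspect_ratio: Optional[str] = None   # left unset for i2v
    images: List[str] = field(default_factory=list)


def validate(req: VideoRequest) -> List[str]:
    """Return a list of problems; an empty list means the request fits the limits."""
    if req.mode not in IMAGE_LIMITS:
        return [f"unknown mode {req.mode!r}"]
    problems = []
    if not MIN_SECONDS <= req.duration_s <= MAX_SECONDS:
        problems.append(f"duration must be {MIN_SECONDS}-{MAX_SECONDS} s")
    low, high = IMAGE_LIMITS[req.mode]
    if not low <= len(req.images) <= high:
        problems.append(f"{req.mode} expects {low}-{high} images, got {len(req.images)}")
    if req.mode == "i2v":
        # In I2V the aspect ratio follows the first frame, so none is chosen.
        if req.aspect_ratio is not None:
            problems.append("i2v takes its aspect ratio from the first frame")
    elif req.aspect_ratio not in ASPECT_RATIOS:
        problems.append(f"unsupported aspect ratio {req.aspect_ratio!r}")
    if req.mode in ("t2v", "r2v") and not req.prompt.strip():
        problems.append("a text prompt is required")
    return problems


# Example: an R2V request that is too long and carries too many references.
print(validate(VideoRequest("r2v", "city at dusk", 20, "16:9", ["ref.png"] * 10)))
```

Returning a list of problems rather than raising on the first one lets a caller show every issue at once, which suits a form-style interface like a canvas node.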

How to Use HappyHorse on Oimi AI

Using HappyHorse in the Oimi Canvas takes three steps:

  1. Open Oimi Canvas — go to Oimi Canvas, double-click to add a video node
  2. Select HappyHorse from the model dropdown — choose from three generation modes: HappyHorse T2V, HappyHorse I2V (First/Last Frame), or HappyHorse R2V (Reference Image)
  3. Configure and generate — enter your prompt, select duration (3–15s) and aspect ratio, then hit generate

For I2V or R2V modes, simply drag image nodes into the canvas and connect them to the video node. Oimi Canvas automatically detects upstream images — no manual uploading needed.
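
One way to picture the upstream-image detection described above is as a lookup over the canvas graph. The sketch below is purely illustrative: `Node`, `upstream_images`, and the edge representation are hypothetical and say nothing about how Oimi Canvas is actually implemented.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Node:
    node_id: str
    kind: str  # "image", "video", "audio", or "text"


def upstream_images(video_id: str, nodes: List[Node],
                    edges: List[Tuple[str, str]]) -> List[str]:
    """Return ids of image nodes wired directly into the given video node.

    edges holds (source_id, target_id) pairs in the order they were connected.
    """
    by_id = {n.node_id: n for n in nodes}
    return [src for src, dst in edges
            if dst == video_id and by_id[src].kind == "image"]


# A tiny canvas: two image nodes and a text note feeding one video node.
nodes = [Node("first", "image"), Node("last", "image"),
         Node("note", "text"), Node("clip", "video")]
edges = [("first", "clip"), ("last", "clip"), ("note", "clip")]

frames = upstream_images("clip", nodes, edges)
print(frames)
```

In this toy model the two connected image nodes would serve as the first and last frames for I2V, while the text node is ignored by the image lookup.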

HappyHorse vs Seedance 2.0 vs Sora 2: Which to Choose?

Oimi AI offers HappyHorse, Seedance 2.0, Sora 2, Grok3, Veo 3.1 and other top-tier video models. Switch freely based on your needs:

Feature | HappyHorse | Seedance 2.0 | Sora 2
Text-to-Video | | |
First/Last Frame | ✅ 2 frames | ✅ 2 frames |
Reference Images | ✅ up to 9 | ✅ up to 4 |
Max Duration | 15s | 15s | 15s
ELO Ranking | #1 | #2 | #20
Aspect Ratios | 5 options | 7 options | 1 option

For the highest video quality, HappyHorse is the top choice. For audio-enabled generation, Seedance 2.0 with audio mode is the way to go. In the Oimi Canvas, you can mix and match models freely to build a complete image-to-video production pipeline.

Why Use HappyHorse on Oimi AI Canvas?

  • Multi-model aggregation: HappyHorse, Seedance 2.0, Sora 2, Grok3, Veo 3.1, GPT-Image-2, Nano Banana Pro and more, all in one place
  • Visual workflow: a drag-and-drop infinite canvas supporting video, image, audio, and text nodes. Connect them to build complete workflows.
  • First/Last Frame workflow: generate start and end frames with image models, then pipe them into HappyHorse for smooth transitions
  • Up to 9 reference images: fully leverage HappyHorse's multi-reference R2V capability

Frequently Asked Questions

What video generation modes does HappyHorse support?

HappyHorse supports three modes: Text-to-Video (T2V, generate video from text alone), Image-to-Video with first/last frame control (I2V, upload 1-2 images as start/end frames), and Reference Image-to-Video (R2V, upload up to 9 reference images combined with text prompts).

How long can HappyHorse videos be?

On Oimi AI, HappyHorse supports generating videos from 3 to 15 seconds. You can choose the duration freely.

What aspect ratios does HappyHorse support?

Text-to-Video (T2V) and Reference Image mode (R2V) support 16:9, 9:16, 1:1, 4:3, and 3:4. First/Last Frame mode (I2V) automatically follows the aspect ratio of the first frame image.
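
To see which ratio a given first frame implies, the pixel dimensions can be reduced to lowest terms. This is a minimal sketch with a hypothetical helper name; whether Oimi AI keeps the exact ratio or snaps it to a nearby preset is not stated in the article.

```python
from math import gcd


def frame_aspect_ratio(width: int, height: int) -> str:
    """Reduce pixel dimensions to a W:H ratio in lowest terms."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    divisor = gcd(width, height)
    return f"{width // divisor}:{height // divisor}"


print(frame_aspect_ratio(1920, 1080))  # 16:9
print(frame_aspect_ratio(1080, 1920))  # 9:16
print(frame_aspect_ratio(1024, 768))   # 4:3
```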

Is HappyHorse better than Seedance 2.0?

According to the Artificial Analysis blind-test leaderboard, HappyHorse ranks #1 in non-audio video generation (ELO 1332), with Seedance 2.0 at #2. HappyHorse has a slight edge in visual quality and motion consistency, while Seedance 2.0 performs better in audio-included generation.

How do I use HappyHorse on Oimi AI?

Go to the Oimi Canvas page, add a video node, and select "HappyHorse" from the model dropdown. You can drag in image nodes and connect them to use the first/last frame and reference image features.
