ACM Multimedia 2025 Grand Challenge

Advancing the frontiers of identity-preserving generative video models

Challenge Overview

In this grand challenge, we introduce the Identity-Preserving Video Generation (IPVG) task, which requires maintaining the consistency of a given reference identity throughout the text-to-video generation process. This year's IPVG grand challenge includes two tracks:

Facial Identity-Preserving Video Generation
Full-body Identity-Preserving Video Generation

To further motivate and challenge the academic and industrial research communities, we have released a new dataset, the Identity-Preserving Video Benchmark (VIP-200K), consisting of approximately 500,000 video-prompt pairs covering 200,000 unique identities.

Task Description

This year we will focus on two tasks:

01

Facial identity-preserving text-to-video generation

Given textual prompts and reference facial images of the target identity, the goal is to synthesize temporally consistent videos that align with the prompts while preserving the reference identity.

02

Full-body identity-preserving text-to-video generation

This track extends the first by enforcing identity-preserving constraints on hairstyle, face, clothing, and other attributes across generated frames.

Contestants must develop an identity-preserving video generation system based on the VIP-200K dataset. For evaluation, systems must generate at least one video for each identity-prompt pair in the test set.

Datasets

To formalize the task of identity-preserving text-to-video generation, we provide the following datasets:

Training Dataset

The VIP-200K training set contains 500,000 videos, each paired with a textual prompt and one or more identity images.

Testing Dataset

The test set contains 200 unseen person IDs. Each ID comes with portrait images and five textual prompts for video generation, yielding 1,000 test pairs in total.

Dataset     Context                          Source                         #Videos    #IDs      #Hours   #Prompts
VIP-200K    Video-prompt-identity triplets   Automatic crawling from web    500,000    200,000   1,700    500,000

To access the dataset, please register via this form and download it from the HuggingFace Dataset page.
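As a minimal download sketch, assuming the data is hosted as a standard HuggingFace dataset repository: the repository id below is a placeholder, not the official one, so substitute the id provided after registration.

# Minimal download sketch. The repository id is a placeholder;
# replace it with the official VIP-200K repo id obtained after registration.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="example-org/VIP-200K",   # placeholder, not the official id
    repo_type="dataset",
    local_dir="./VIP-200K",
)
print("Dataset downloaded to", local_dir)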

Submission Format

Each team can submit up to three runs, with one designated as the primary run for performance comparison. Submissions must follow this directory structure:

submission/
├── id001/
│   ├── prompt1.mp4
│   ├── prompt2.mp4
│   ├── prompt3.mp4
│   ├── prompt4.mp4
│   └── prompt5.mp4
...
├── id200/
│   ├── prompt1.mp4
│   ├── prompt2.mp4
│   ├── prompt3.mp4
│   ├── prompt4.mp4
│   └── prompt5.mp4

Videos must be encoded in H.264 and saved in MP4 format. Non-compliant submissions may be disqualified.
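The layout and codec can be checked locally before uploading. The following sketch is not an official validator; it assumes ffprobe (from FFmpeg) is on the PATH and checks for 200 ID folders, each containing five H.264-encoded MP4 files.

# Unofficial pre-submission check: verifies the expected folder layout and
# that each video's first stream is H.264. Assumes ffprobe is installed.
import pathlib
import subprocess

def video_codec(path: pathlib.Path) -> str:
    """Return the codec name of the first video stream via ffprobe."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "stream=codec_name",
         "-of", "default=noprint_wrappers=1:nokey=1", str(path)],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()

root = pathlib.Path("submission")
problems = []
for i in range(1, 201):                      # id001 ... id200
    id_dir = root / f"id{i:03d}"
    for j in range(1, 6):                    # prompt1.mp4 ... prompt5.mp4
        video = id_dir / f"prompt{j}.mp4"
        if not video.is_file():
            problems.append(f"missing {video}")
        elif video_codec(video) != "h264":
            problems.append(f"{video} is not H.264")

print("OK" if not problems else "\n".join(problems))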

Evaluation Metric

Videos will be assessed based on:

01

Identity Preservation

Measured by feature similarity between generated frames and the reference identity image, complemented by manual annotation scores (see the sketch at the end of this section).

02

Video Quality

Evaluated via visual quality, motion dynamics, and text alignment using both objective metrics and human assessment.

Final scores combine objective and subjective evaluation results.
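To illustrate the automatic side of identity preservation (this is not the official challenge metric), the sketch below computes the mean cosine similarity between a reference face embedding and embeddings of sampled generated frames. The embed_face function is a placeholder for any face-recognition embedder of your choice.

# Illustrative identity-similarity score, not the official challenge metric.
# embed_face() is a placeholder for a face-recognition embedder of your choice.
import numpy as np

def embed_face(image: np.ndarray) -> np.ndarray:
    """Placeholder: return an L2-normalized face embedding for one image."""
    raise NotImplementedError("plug in a face recognition model here")

def identity_similarity(reference: np.ndarray, frames: list[np.ndarray]) -> float:
    """Mean cosine similarity between the reference face and sampled frames."""
    ref = embed_face(reference)
    sims = [float(np.dot(ref, embed_face(frame)))   # embeddings are unit-normalized
            for frame in frames]
    return float(np.mean(sims))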

Participation

The challenge is team-based. Participants may enter one or both tracks. Teams may have multiple members, but an individual may not join more than one team.

The top three teams in each track will receive awards. Accepted submissions qualify for the conference's grand challenge award.

Timeline

March 8, 2025

Website & Call for Participation

March 15, 2025

Dataset release

June 5, 2025

Testing set release

June 20, 2025

Results submission

June 26, 2025

Evaluation results announcement

June 30, 2025

Paper submission deadline

Paper Submission

Please follow the ACM Multimedia 2025 Grand Challenge guidelines for paper submission.