Challenge Overview
In this grand challenge, we introduce the Identity-Preserving Video Generation (IPVG) task, which requires keeping a given reference identity consistent throughout the text-to-video generation process. This year's IPVG grand challenge includes two tracks: facial identity-preserving and full-body identity-preserving text-to-video generation.
To further motivate and challenge the academic and industrial research communities, we have released a new dataset: Identity-Preserving Video Benchmark (VIP-200K), consisting of approximately 500,000 video-prompt pairs with 200,000 unique identities.
Task Description
This year we will focus on two tasks:
Facial identity-preserving text-to-video generation
Given videos and their corresponding prompts, plus reference facial images of the identity, the goal is to synthesize temporally consistent videos that align with the prompts while preserving the reference identity.
Full-body identity-preserving text-to-video generation
This track extends the first by enforcing identity-preserving constraints on hairstyle, face, clothing, and other attributes across generated frames.
Contestants must develop an identity-preserving video generation system based on the VIP-200K dataset. For evaluation, systems must generate at least one video per
Datasets
To formalize the task of identity-preserving text-to-video generation, we provide the following datasets:
Training Dataset
500,000 videos in VIP-200K, each coupled with a textual prompt and one or more identity images.
Testing Dataset
200 unseen person IDs. Each ID has portrait images and five textual prompts for video generation, totaling 1,000 test pairs.
| Dataset | Context | Source | #Videos | #IDs | #Hours | #Prompts |
|---|---|---|---|---|---|---|
| VIP-200K | Video-prompt-identity triplets | Automatic crawling from web | 500,000 | 200,000 | 1,700 | 500,000 |
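As a rough sanity check on the scale of the data, the figures in the table imply an average clip length and an average number of videos per identity. The short Python sketch below only does that arithmetic; the constants are copied from the table, which the overview describes as approximate, so the results are estimates rather than official statistics.

```python
# Figures copied from the VIP-200K row of the table (approximate values).
NUM_VIDEOS = 500_000
NUM_IDENTITIES = 200_000
TOTAL_HOURS = 1_700

# Average duration of one clip, in seconds.
avg_clip_seconds = TOTAL_HOURS * 3600 / NUM_VIDEOS

# Average number of videos available for each identity.
videos_per_identity = NUM_VIDEOS / NUM_IDENTITIES

print(f"average clip length: {avg_clip_seconds:.2f} s")   # 12.24 s
print(f"videos per identity: {videos_per_identity:.1f}")  # 2.5
```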
To access the dataset, please register via this form and download from
Submission Format
Each team can submit up to three runs, with one primary run evaluated for performance comparison. Submissions must follow this structure:
submission/
├── id001/
│   ├── prompt1.mp4
│   ├── prompt2.mp4
│   ├── prompt3.mp4
│   ├── prompt4.mp4
│   └── prompt5.mp4
...
├── id200/
│   ├── prompt1.mp4
│   ├── prompt2.mp4
│   ├── prompt3.mp4
│   ├── prompt4.mp4
│   └── prompt5.mp4
Videos must be encoded in H.264 and saved in MP4 format. Non-compliant submissions may be disqualified.
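Before uploading, it may help to check that a run contains every expected file. The Python sketch below is one way to do that, not an official validator. It assumes the folders elided by "..." in the layout are numbered contiguously from id001 to id200 with three-digit zero padding, and the helper names `expected_files` and `find_missing` are hypothetical. It checks only file presence; it does not verify that a video is actually H.264-encoded.

```python
from pathlib import Path

NUM_IDS = 200         # test identities, per the Testing Dataset description
PROMPTS_PER_ID = 5    # prompts per identity


def expected_files(num_ids=NUM_IDS, prompts_per_id=PROMPTS_PER_ID):
    """Return the relative paths implied by the submission layout.

    Assumption: folder names are contiguous and zero-padded (id001..id200).
    """
    return [
        Path(f"id{i:03d}") / f"prompt{p}.mp4"
        for i in range(1, num_ids + 1)
        for p in range(1, prompts_per_id + 1)
    ]


def find_missing(root, num_ids=NUM_IDS, prompts_per_id=PROMPTS_PER_ID):
    """List the expected files that are absent under `root`."""
    root = Path(root)
    return [
        rel
        for rel in expected_files(num_ids, prompts_per_id)
        if not (root / rel).is_file()
    ]
```

Calling `find_missing("submission")` on a complete run should return an empty list; any paths it does return are the videos still to be generated.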
Evaluation Metric
Videos will be assessed based on:
Identity Preservation
Feature similarity with the reference identity image and manual annotation scores.
Video Quality
Evaluated via visual quality, motion dynamics, and text alignment using both objective metrics and human assessment.
Final scores combine objective and subjective evaluation results.
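The identity-preservation criterion above refers to feature similarity with the reference identity image. The challenge description does not say which feature extractor or similarity measure is used, so the Python sketch below is only an illustration of one common choice: cosine similarity between a reference embedding and per-frame embeddings, averaged over frames. The function names and the averaging step are assumptions, and the embeddings are presumed to come from some face or person recognition model that is not shown here.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("feature vectors must be non-zero")
    return dot / (norm_a * norm_b)


def identity_score(reference_feature, frame_features):
    """Mean cosine similarity between the reference and each frame.

    Assumption: averaging over frames is an illustrative choice,
    not the challenge's documented scoring rule.
    """
    if not frame_features:
        raise ValueError("at least one frame feature is required")
    sims = [cosine_similarity(reference_feature, f) for f in frame_features]
    return sum(sims) / len(sims)
```

A score near 1 would mean the generated frames stay close to the reference identity in feature space, while a score that drops over the course of a video would suggest identity drift.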
Participation
The challenge is team-based. Participants can enter one or both tracks. Teams can have multiple members, but individuals cannot be in multiple teams.
The top three teams in each track will receive awards. Accepted submissions qualify for the conference's grand challenge award.
Timeline
Website & Call for Participation
Dataset release
Testing set release
Results submission
Evaluation results announcement
Paper submission deadline
Paper Submission
Please follow the ACM Multimedia 2025 Grand Challenge guidelines for paper submission.