PVChat: Personalized Video Chat with One-Shot Learning
Abstract
PVChat is a personalized video chat system built on one-shot learning: it recognizes a specific individual and answers questions about them after seeing just a single example. Existing video chat systems struggle with personalized information, typically requiring extensive training data and often failing to distinguish between similar identities.
The approach combines a specialized data collection pipeline, identity-preserving video generation, and ReMoH (ReLU Routing Mixture-of-Heads), an attention mechanism for enhanced learning of subject-specific characteristics.
PVChat demonstrates superior performance on personalized video understanding benchmarks, significantly outperforming existing models in accurately answering questions about personalized information while maintaining strong general video understanding capabilities.
Key Contributions
- One-Shot Personalized Learning: First video chat system capable of accurate personalized understanding with just one example
- Systematic Data Collection: Comprehensive pipeline for generating high-quality personalized video data with identity preservation
- ReMoH Technique: A ReLU Routing Mixture-of-Heads attention mechanism for better learning of subject-specific characteristics
- Robust Identity Recognition: Hard negative sampling strategy ensures accurate discrimination between similar identities
Method Overview
PVChat employs a systematic data collection and training pipeline designed for personalized video understanding. The framework incorporates the following key components:
- Identity-Preserving Generation: Uses DeepFaceLab and InternVideo2 for high-quality face processing and extraction of demographic characteristics
- Video Variation Synthesis: Employs ConsisID, LivePortrait, and PhotoMaker to generate diverse backgrounds and expressions
- Hard Negative Sampling: Selects challenging similar faces to improve discrimination capabilities
- ReMoH Training: ReLU-routed mixture-of-heads attention for enhanced characteristic learning
The framework addresses key challenges in personalized video chat: limited training examples, identity confusion with similar faces, and maintaining general video understanding. By leveraging one-shot learning and sophisticated data augmentation, PVChat achieves robust personalized video understanding capabilities.
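The hard negative sampling step can be pictured as a nearest-neighbor search over face embeddings: the people who look most like the target become the negatives the model must learn to reject. The sketch below is a minimal illustration under stated assumptions, not the authors' pipeline. The function names, the toy 3-dimensional embeddings, and the use of cosine similarity are all hypothetical; in practice the embeddings would come from a face encoder.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_hard_negatives(anchor, candidates, k=2):
    """Return the k identities whose embeddings are most similar to the anchor.

    anchor: embedding of the target person (hypothetical face-encoder output)
    candidates: dict mapping identity name -> embedding of a different person
    """
    scored = [(cosine_similarity(anchor, emb), name)
              for name, emb in candidates.items()]
    # Most similar first: these are the "hardest" negatives.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]


# Toy embeddings, made up for illustration only.
anchor = [1.0, 0.0, 0.0]
pool = {
    "person_a": [0.9, 0.1, 0.0],
    "person_b": [0.0, 1.0, 0.0],
    "person_c": [0.7, 0.7, 0.0],
    "person_d": [-1.0, 0.0, 0.0],
}
hard_negatives = select_hard_negatives(anchor, pool, k=2)
print(hard_negatives)
```

Picking the most similar faces rather than random ones makes the negatives informative: a random stranger is easy to reject, while a look-alike forces the model to rely on fine-grained identity cues.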
Results & Performance
PVChat demonstrates state-of-the-art performance on personalized video understanding tasks:
- One-Shot Learning: Accurately recognizes and responds to queries about specific individuals from single examples
- Identity Discrimination: Successfully distinguishes between similar-looking individuals where other models fail
- Robust Understanding: Maintains strong performance across diverse backgrounds and expressions
- General Capability Retention: Preserves excellent general video understanding while adding personalized capabilities
The method's ability to achieve accurate personalized video understanding with minimal examples makes it particularly valuable for applications requiring personalized AI assistants, custom video analytics, and adaptive human-computer interaction systems.
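One simple way to check identity discrimination is to score yes/no presence questions separately on clips that contain the target person and on clips that contain a similar-looking person. The sketch below is an assumed evaluation helper, not the benchmark protocol used for PVChat; the function name and the toy records are hypothetical and do not represent reported results.

```python
def split_accuracy(records):
    """Accuracy on positive and negative clips, computed separately.

    records: list of (target_present, model_said_yes) boolean pairs.
    Returns a dict with keys "positive" and "negative"; a value is None
    when there are no clips of that kind.
    """
    totals = {"positive": [0, 0], "negative": [0, 0]}  # [correct, count]
    for target_present, model_said_yes in records:
        key = "positive" if target_present else "negative"
        totals[key][1] += 1
        if target_present == model_said_yes:
            totals[key][0] += 1
    return {key: (correct / count if count else None)
            for key, (correct, count) in totals.items()}


# Made-up records for illustration only.
records = [
    (True, True),    # target present, model answered yes
    (True, True),
    (True, False),   # missed the target
    (False, False),  # look-alike correctly rejected
    (False, True),   # look-alike mistaken for the target
]
result = split_accuracy(records)
print(result)
```

Reporting the two numbers separately matters because a model that always answers "yes" would score perfectly on positive clips while failing every look-alike.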
Technical Innovation
PVChat introduces several technical innovations to address the unique challenges of personalized video chat:
- One-Shot Learning Framework: Enables rapid personalization from minimal examples
- Identity-Preserving Data Generation: Maintains consistent identity while varying context and expression
- Hard Negative Mining: Improves discrimination through strategic selection of challenging similar identities
- ReMoH Technique: ReLU Routing Mixture-of-Heads attention for enhanced characteristic learning
- Balanced Training Strategy: Maintains general video understanding while adding personalized capabilities
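As a rough illustration of head-level routing, the sketch below assumes that ReMoH gates each attention head with a ReLU-activated router score, so a head with a non-positive score is switched off for that token. This is an assumption about the mechanism rather than a detail given on this page. The function names are hypothetical, the head outputs are assumed to be already projected to a common dimension, and the training objectives are not modeled.

```python
def relu(x):
    """Rectified linear unit: negative scores become exactly zero."""
    return x if x > 0.0 else 0.0


def route_heads(head_outputs, router_logits):
    """Mix per-head outputs for one token using ReLU-gated router scores.

    head_outputs: list of H vectors, one per attention head, same length
    router_logits: list of H raw router scores for this token
    Returns (mixed_vector, gates).
    """
    if len(head_outputs) != len(router_logits):
        raise ValueError("need exactly one router logit per head")
    gates = [relu(score) for score in router_logits]
    dim = len(head_outputs[0])
    mixed = [0.0] * dim
    for gate, output in zip(gates, head_outputs):
        if gate == 0.0:
            continue  # this head is inactive for the current token
        for i in range(dim):
            mixed[i] += gate * output[i]
    return mixed, gates


# Toy values, made up for illustration only.
heads = [[1.0, 0.0], [0.0, 2.0], [4.0, 4.0]]
logits = [0.5, -1.0, 2.0]
mixed, gates = route_heads(heads, logits)
print(gates, mixed)
```

Unlike a fixed top-k selection, a ReLU gate lets the number of active heads vary from token to token, since any head whose score drops to zero or below simply contributes nothing.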
Conference Poster
Presented at ICCV 2025.
Impact & Applications
PVChat represents a significant advancement in personalized AI systems, with applications including:
- Personalized AI Assistants: Enable AI systems to recognize and respond to specific individuals with minimal training
- Video Analytics: Custom video understanding for security, monitoring, and content analysis
- Education Technology: Adaptive learning systems that recognize individual students and their progress
- Healthcare Applications: Patient-specific video monitoring and interaction systems
- Human-Computer Interaction: Natural and personalized interactions with AI systems
Citation
@inproceedings{shi2025pvchat,
title={PVChat: Personalized Video Chat with One-Shot Learning},
author={Shi, Yufei and Yan, Weilong and Xu, Gang and Li, Yumeng and Chen, Yucheng and Li, Zhenxi and Yu, Fei Richard and Li, Ming and Yeo, Si Yong},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
pages={23321--23331},
year={2025}
}
Published in ICCV 2025
IEEE/CVF International Conference on Computer Vision