PVChat: Personalized Video Chat with One-Shot Learning

Yufei Shi1,5† Weilong Yan2† Gang Xu4 Yumeng Li3 Yucheng Chen1,5 Zhenxi Li1,5 Fei Richard Yu4 Ming Li4(✉) Si Yong Yeo1,5(✉)
1MedVisAI Lab    2National University of Singapore    3Nankai University    4Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ)    5Lee Kong Chian School of Medicine, Nanyang Technological University
† Equal contribution.    (✉) Email: yufei005@e.ntu.edu.sg, yanweilong@u.nus.edu
Video demonstration of PVChat's personalized video chat capabilities

Abstract

PVChat introduces a novel approach to personalized video chat with one-shot learning, enabling accurate identity recognition and content understanding from minimal examples. Traditional video chat systems struggle with personalized information recognition, requiring extensive training data and often failing to distinguish between similar identities.

Our method uses one-shot learning for personalized video understanding: after seeing just a single example of a specific individual, the system can recognize that individual and answer queries about them. The approach combines a specialized data collection pipeline, identity-preserving video generation, and the novel ReMoH (ReLU Routing Mixture-of-Heads) attention mechanism for enhanced characteristic learning.

PVChat demonstrates superior performance on personalized video understanding benchmarks, significantly outperforming existing models in accurately answering questions about personalized information while maintaining strong general video understanding capabilities.

PVChat One-Shot Learning Examples
Figure 1. Examples of PVChat's ability with one-shot learning (e.g., <Nz> and <Ab>). PVChat can answer questions about the personalized information correctly while other models fail.

Key Contributions

  • One-Shot Personalized Learning: First video chat system capable of accurate personalized understanding with just one example
  • Systematic Data Collection: Comprehensive pipeline for generating high-quality personalized video data with identity preservation
  • ReMoH Technique: Novel ReLU Routing Mixture-of-Heads attention mechanism for better learning of specialized characteristics
  • Robust Identity Recognition: Hard negative sampling strategy ensures accurate discrimination between similar identities

Method Overview

PVChat employs a systematic data collection and training pipeline specifically designed for personalized video understanding:

PVChat Data Collection Pipeline
Figure 2. The systematic data collection pipeline. For positive data collection, the original videos are processed by DeepFaceLab for high-quality face extraction and by InternVideo2 for demographic characteristics, both of which improve identity preservation. ConsisID then uses the identity information to generate videos with varied backgrounds, while LivePortrait with PhotoMaker generates videos with different motion and expression. To make the model's perception robust, hard negative samples are either generated from faces retrieved by similar-face search or sampled from the CelebV-HQ dataset. These negative samples help the model recognize both identity and content accurately.
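
The similar-face retrieval step for hard negatives can be illustrated with a minimal sketch. It assumes each face is already represented by an embedding vector and ranks candidates by cosine similarity to the subject; the function names, the toy 3-D embeddings, and the `person_*` identities are hypothetical and are not taken from the PVChat code.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve_hard_negatives(query, gallery, k=2):
    """Return the k gallery identities most similar to the query embedding.

    `query` is the subject's face embedding. `gallery` maps a (hypothetical)
    identity name to that person's embedding and must not contain the subject.
    """
    scored = [(cosine_similarity(query, emb), name) for name, emb in gallery.items()]
    scored.sort(reverse=True)  # most similar (hardest) candidates first
    return [name for _, name in scored[:k]]


# Toy embeddings for illustration; real ones would come from a face-recognition model.
subject = [1.0, 0.0, 0.0]
gallery = {
    "person_a": [0.9, 0.1, 0.0],
    "person_b": [0.0, 1.0, 0.0],
    "person_c": [0.8, 0.0, 0.6],
    "person_d": [-1.0, 0.0, 0.0],
}
hard_negatives = retrieve_hard_negatives(subject, gallery, k=2)
print(hard_negatives)
```

Choosing the most similar identities, rather than random ones, is what makes the negatives "hard": the model has to separate the subject from people it could plausibly confuse them with.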

The framework incorporates the following key components:

  • Identity-Preserving Generation: Uses DeepFaceLab for high-quality face extraction and InternVideo2 for demographic characteristics
  • Video Variation Synthesis: Employs ConsisID, LivePortrait, and PhotoMaker to generate diverse backgrounds and expressions
  • Hard Negative Sampling: Selects challenging similar faces to improve discrimination capabilities
  • ReMoH Training: ReLU-routed mixture-of-heads attention for enhanced characteristic learning
PVChat Training Pipeline and ReMoH
Figure 3. (a) The training pipeline of our method. (b) The proposed ReMoH technique for better learning of specialized characteristics.

The framework addresses key challenges in personalized video chat: limited training examples, identity confusion with similar faces, and maintaining general video understanding. By leveraging one-shot learning and sophisticated data augmentation, PVChat achieves robust personalized video understanding capabilities.
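
As a rough illustration of how positive and negative clips might be turned into supervision for a personalized token such as <Nz>, the sketch below pairs each clip with an identity question and a yes/no answer. The `QASample` structure, the question wording, and the file names are assumptions made for this example; they do not reproduce the paper's actual prompt templates.

```python
from dataclasses import dataclass


@dataclass
class QASample:
    video: str      # path to a video clip (hypothetical file names below)
    question: str
    answer: str


def build_existence_qa(token, positive_videos, negative_videos):
    """Pair each clip with an identity question; negatives get a 'No' answer."""
    question = f"Is {token} in this video?"
    samples = [
        QASample(v, question, f"Yes, {token} is in this video.")
        for v in positive_videos
    ]
    samples += [
        QASample(v, question, f"No, {token} is not in this video.")
        for v in negative_videos
    ]
    return samples


samples = build_existence_qa(
    "<Nz>",
    positive_videos=["pos_0.mp4"],
    negative_videos=["neg_0.mp4", "neg_1.mp4"],
)
for s in samples:
    print(s.video, "|", s.answer)
```

Including negative clips with explicit "No" answers gives the model a reason to check identity instead of answering "Yes" to every question that mentions the token.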

Results & Performance

PVChat demonstrates state-of-the-art performance on personalized video understanding tasks:

  • One-Shot Learning: Accurately recognizes and responds to queries about specific individuals from single examples
  • Identity Discrimination: Successfully distinguishes between similar-looking individuals where other models fail
  • Robust Understanding: Maintains strong performance across diverse backgrounds and expressions
  • General Capability Retention: Preserves excellent general video understanding while adding personalized capabilities

The method's ability to achieve accurate personalized video understanding with minimal examples makes it particularly valuable for applications requiring personalized AI assistants, custom video analytics, and adaptive human-computer interaction systems.

Technical Innovation

PVChat introduces several technical innovations to address the unique challenges of personalized video chat:

  • One-Shot Learning Framework: Enables rapid personalization from minimal examples
  • Identity-Preserving Data Generation: Maintains consistent identity while varying context and expression
  • Hard Negative Mining: Improves discrimination through strategic selection of challenging similar identities
  • ReMoH Technique: ReLU-routed mixture-of-heads attention for enhanced characteristic learning
  • Balanced Training Strategy: Maintains general video understanding while adding personalized capabilities

Conference Poster

Presented at ICCV 2025.

Impact & Applications

PVChat represents a significant advancement in personalized AI systems, with applications including:

  • Personalized AI Assistants: Enable AI systems to recognize and respond to specific individuals with minimal training
  • Video Analytics: Custom video understanding for security, monitoring, and content analysis
  • Education Technology: Adaptive learning systems that recognize individual students and their progress
  • Healthcare Applications: Patient-specific video monitoring and interaction systems
  • Human-Computer Interaction: Natural and personalized interactions with AI systems

Citation

@inproceedings{shi2025pvchat,
  title={PVChat: Personalized Video Chat with One-Shot Learning},
  author={Shi, Yufei and Yan, Weilong and Xu, Gang and Li, Yumeng and Chen, Yucheng and Li, Zhenxi and Yu, Fei Richard and Li, Ming and Yeo, Si Yong},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  pages={23321--23331},
  year={2025}
}

Published in ICCV 2025

IEEE/CVF International Conference on Computer Vision