3DXTalker: Unifying Identity, Lip Sync, Emotion, and Spatial Dynamics in Expressive 3D Talking Avatars

PyTorch arXiv Project Page HF Dataset


3DXTalker generates identity-consistent, expressive 3D talking avatars from a single reference image and speech audio, with accurate lip synchronization, controllable emotional expression, and natural head-pose dynamics. Expressive facial animation is achieved through data-curated identity modeling, rich audio representations, and controllable spatial dynamics. By introducing frame-wise amplitude and emotional cues beyond standard speech embeddings, 3DXTalker delivers precise lip synchronization and nuanced expression modulation. Built on a flow-matching transformer architecture, it generates natural head-pose motion while supporting stylized control, integrating lip synchronization, emotional expression, and head-pose dynamics within a unified framework.
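
For intuition only, the sketch below shows one way frame-wise amplitude cues could be computed from raw audio and concatenated with speech and emotion embeddings to form per-frame conditioning. It is a conceptual illustration, not the repository's implementation; the 16 kHz / ~25 fps framing, the 768-dimensional embeddings, and the variable names are assumptions.

# Conceptual sketch only: per-frame conditioning from speech embeddings,
# emotion embeddings, and a frame-wise amplitude cue.
import torch

def frame_amplitude(wav: torch.Tensor, hop: int = 640) -> torch.Tensor:
    # RMS amplitude per audio frame; hop=640 samples ~ 25 fps at 16 kHz (assumed).
    frames = wav.unfold(-1, hop, hop)            # (num_frames, hop)
    return frames.pow(2).mean(-1).sqrt()         # (num_frames,)

wav = torch.randn(16000 * 2)                     # placeholder: 2 s of 16 kHz audio
num_frames = wav.numel() // 640
speech_emb = torch.randn(num_frames, 768)        # placeholder for WavLM features
emotion_emb = torch.randn(num_frames, 768)       # placeholder for emotion2vec features

amp = frame_amplitude(wav).unsqueeze(-1)         # (num_frames, 1) amplitude cue
cond = torch.cat([speech_emb, emotion_emb, amp], dim=-1)
print(cond.shape)                                # per-frame conditioning vector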

TODO

  • Release the 3DTalking benchmark dataset
    • Release the raw dataset
    • Release the processed dataset
  • Release the data processing code
  • Release the training and inference code
  • Release the pretrained models

Installation

  • Python 3.10
  • PyTorch 2.2.2
  • CUDA 12.1
  • PyTorch3D 0.7.7
conda create -n env_3DXTalker python=3.10
conda activate env_3DXTalker
pip install -r requirements.txt

For some users, the PyTorch3D compilation fails during the requirements install but succeeds when run separately afterwards. If that happens, try:

pip install "git+https://github.com/facebookresearch/pytorch3d.git@v0.7.7"

Download Pretrained Audio Encoders

  1. Download the emotion2vec_plus_base model and place it in ./pretrained_models/:

    # Create directory
    mkdir -p pretrained_models/emotion2vec_plus_base
    
    # Option 1: Using git-lfs (recommended)
    cd pretrained_models
    git lfs install
    git clone https://huggingface.co/iic/emotion2vec_plus_base
    
    # Option 2: Manual download from https://huggingface.co/iic/emotion2vec_plus_base
    # Download all files to ./pretrained_models/emotion2vec_plus_base/
  2. Download microsoft/wavlm-base-plus (audio encoder):

    # Option 1: Auto-download on first run (recommended)
    # The model will be automatically downloaded from HuggingFace when you run training
    
    # Option 2: Pre-download manually
    cd pretrained_models
    git lfs install
    git clone https://huggingface.co/microsoft/wavlm-base-plus
    
    # Then update config/default_config.yaml:
    # audio_encoder_repo: './pretrained_models/wavlm-base-plus'

    Expected directory structure:

    pretrained_models/
    ├── emotion2vec_plus_base/
    │   ├── config.json
    │   ├── pytorch_model.bin
    │   └── ...
    └── wavlm-base-plus/          # Optional (auto-downloads if not present)
        ├── config.json
        ├── pytorch_model.bin
        └── ...
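
To confirm that both encoders load, you can run a short check such as the sketch below. It is not part of the repository: it assumes the directory layout above, the transformers package for WavLM, and the funasr package, which emotion2vec models are typically loaded with.

# Sanity check that both pretrained audio encoders load.
from transformers import WavLMModel
from funasr import AutoModel

wavlm = WavLMModel.from_pretrained("microsoft/wavlm-base-plus")   # or the local path
print("WavLM hidden size:", wavlm.config.hidden_size)             # 768 for base-plus

emotion2vec = AutoModel(model="./pretrained_models/emotion2vec_plus_base")
print("emotion2vec_plus_base loaded")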
    

Data Preparation and Preprocessing

  1. Download the raw video datasets from the following links: V0-GRID; V1-RAVDESS; V2-MEAD; V3-VoxCeleb2; V4-HDTF; V5-CelebV-HQ

    If you prefer not to process the data yourself, we also provide the processed data on Hugging Face.

  2. Run data curation (duration, noise, language, sync, resolution normalization).

  • Set raw_video_dir in data_prepare/data_curation_pipeline.py to your raw video folder.
cd data_prepare
python data_curation_pipeline.py

Output will be in data_prepare/final_curated_videos/.
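
For illustration only, a duration filter of the kind applied during curation might look like the sketch below, using ffprobe via Python's subprocess module. This is not the repository's pipeline, and the 3–30 s bounds are assumptions made for the example.

# Illustrative duration filter (hypothetical thresholds), using ffprobe.
import subprocess, pathlib

def video_duration(path: str) -> float:
    # Return the duration of a video in seconds via ffprobe.
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1", path],
        capture_output=True, text=True, check=True,
    )
    return float(out.stdout.strip())

raw_dir = pathlib.Path("raw_videos")             # placeholder input folder
kept = [p for p in raw_dir.glob("*.mp4")
        if 3.0 <= video_duration(str(p)) <= 30.0]
print(f"{len(kept)} videos pass the duration filter")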

  3. Rename videos for dataset indexing.
  • Edit dataset_name, input_dir, and output_dir in data_prepare/rename.py if needed.
  • By default it expects input at data_prepare/Scaled_videos and outputs to data_prepare/Renamed_videos.
cd data_prepare
python rename.py
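
The exact naming scheme is defined in data_prepare/rename.py. Purely as an illustration of this kind of indexing step (the pattern below is hypothetical, not the repository's scheme), a renamer might look like:

# Hypothetical renaming sketch; the real scheme lives in data_prepare/rename.py.
import pathlib, shutil

dataset_name = "VoxCeleb2"                            # example dataset tag
input_dir = pathlib.Path("data_prepare/Scaled_videos")
output_dir = pathlib.Path("data_prepare/Renamed_videos")
output_dir.mkdir(parents=True, exist_ok=True)

for idx, src in enumerate(sorted(input_dir.glob("*.mp4"))):
    dst = output_dir / f"{dataset_name}_{idx:06d}.mp4"    # assumed naming pattern
    shutil.copy2(src, dst)
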
  4. Download EMOCA-related assets (models and FLAME files).
bash gdl_apps/EMOCA/demos/download_assets.sh
  5. Run EMOCA reconstruction to extract FLAME parameters.
  • Edit data_root_dir and dataset_name in gdl_apps/EMOCA/demos/my_recons_video.py.
  • data_root_dir should contain <dataset_name>/all_videos_path.txt.
python gdl_apps/EMOCA/demos/my_recons_video.py \
  --dataset_name VoxCeleb2 \
  --output_folder video_output \
  --model_name EMOCA_v2_lr_mse_20
  6. Dataset directory structures are described in DATASET_STRUCTURE.md
