We propose the Masked Temporal Interpolation Diffusion (MTID) model for procedure planning in instructional videos. The core idea is to use intermediate latent visual features, generated by a latent-space temporal interpolation module, to provide comprehensive visual information for mid-state supervision. These generated visual features are fed directly into the action reasoning model, ensuring that the intermediate supervision is effectively applied to the action reasoning task through end-to-end training.
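The idea above can be sketched as follows. This is a minimal, illustrative NumPy version, not the paper's actual module: the real MTID interpolation weights are learned and the intermediate latents are further refined by network layers, whereas here the weights are fixed to a uniform schedule purely to show the interpolation step.

```python
import numpy as np

def interpolate_latents(start_feat, goal_feat, horizon):
    """Sketch of latent-space temporal interpolation: produce `horizon`
    intermediate visual features between the observed start and goal
    features. Weights are a fixed uniform schedule here; in MTID they
    are learned and the outputs are refined end-to-end.

    start_feat, goal_feat: (batch, feat_dim) arrays.
    Returns: (batch, horizon, feat_dim).
    """
    # one interpolation coefficient in (0, 1) per intermediate step
    w = np.arange(1, horizon + 1) / (horizon + 1)      # (horizon,)
    w = w.reshape(1, horizon, 1)
    # convex combination of start and goal features at each step
    return (1 - w) * start_feat[:, None, :] + w * goal_feat[:, None, :]
```

With `horizon=3`, the middle step lands exactly halfway between the start and goal features; the surrounding steps lean toward the start and goal respectively.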
In a conda env with cuda available, run:
conda create --name MTID python==3.10
conda activate MTID
pip install -r requirements.txt
- Download datasets & features
cd ./dataset/{dataset_name}
bash download.sh
where {dataset_name} is one of: crosstask, coin, NIV
Alternatively, you can find the datasets on Hugging Face.
- Train the transformer for task category prediction with a single GPU.
python train_mlp.py --name=train_mlp_test --dataset=crosstask_how --gpu=0 --horizon=3
The trained transformer will be saved in ./save_max_mlp, and JSON files for the training and testing data will be generated. Then run temp.py to generate JSON files with the predicted task class for testing:
python temp.py --num_thread_reader=1 --resume --batch_size=32 --gpu=0 --batch_size_val=32 --ckpt_path=/path
- Train MTID: Move the file generated by temp.py to the location specified in dataset/environments_config.json and run:
python main_distributed.py --dataset=crosstask_how --name=main_test --gpu=0 --base_model=predictor --horizon=3
To train the baseline variants, modify temporalPredictor.py to remove the 'time_mlp' modules, and modify diffusion.py to change the initial noise, the 'training' functions, and the p_sample_loop process.
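For orientation when editing diffusion.py, the p_sample_loop referenced above follows the standard DDPM reverse-sampling pattern. The sketch below is a generic NumPy version of that loop, not the repo's actual implementation (which differs in its initial noise, conditioning, and masking); `model(x, t)` is assumed to predict the noise added at step t.

```python
import numpy as np

def p_sample_loop(model, shape, betas, seed=0):
    """Generic DDPM reverse sampling loop (illustrative sketch).
    model(x, t) -> predicted noise eps at timestep t.
    betas: per-step noise schedule, length T.
    Returns a denoised sample of the given shape.
    """
    rng = np.random.default_rng(seed)
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)
    x = rng.standard_normal(shape)              # initial noise x_T
    for t in reversed(range(len(betas))):
        eps = model(x, t)
        # posterior mean: (x - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t)
        coef = betas[t] / np.sqrt(1.0 - alpha_bar[t])
        mean = (x - coef * eps) / np.sqrt(alphas[t])
        if t > 0:
            # add noise at intermediate steps; the final step is deterministic
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(shape)
        else:
            x = mean
    return x
```

Changing the "initial noise" corresponds to replacing the `x = rng.standard_normal(shape)` line; changing the sampling process corresponds to modifying the loop body.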
Note: numbers may vary from run to run for PDPP and MTID.
All results are written to the log files in the out folder. If you want to run inference separately, use the following command:
python inference.py --resume --base_model=predictor --gpu=0 --ckpt_path=/path
To evaluate the baseline variants, modify temporalPredictor.py to remove the 'time_mlp' modules and modify diffusion.py to change the initial noise and the p_sample_loop process. num_sampling (L26) in uncertain.py should be set to 1.
Set the checkpoint path (L348) in uncertain.py to the model being evaluated and run:
nohup python uncertain.py --gpu=1 --num_thread_reader=1 --cudnn_benchmark=1 --pin_memory --base_model=predictor --resume --batch_size=32 --batch_size_val=32 --evaluate > out/result.log 2>&1 &
@inproceedings{zhou2025masked,
title={Masked Temporal Interpolation Diffusion for Procedure Planning in Instructional Videos},
author={Yufan Zhou and Zhaobo Qi and Lingshuai Lin and Junqi Jing and Tingting Chai and Beichen Zhang and Shuhui Wang and Weigang Zhang},
booktitle={ICLR},
year={2025},
}
We thank the authors of PDPP and diffusers for sharing their code.
