Official repository for our CVPR 2025 paper on temporal grounding in long videos.
- Upload data and model checkpoints.
- Release code for sidekick encoder training.
Set up the environment with conda:

```shell
conda env create -f environment.yml
conda activate decafnet
```
Install NMS:

```shell
cd ./libs/nms
python setup_nms.py install --user
```
Download the released data and checkpoints from Google Drive.
Update the data paths in the config YAMLs under `libs/core/` to match your local filesystem. Common fields to check are:

- `data.anno_file`
- `data.vid_feat_dir`
- `data.text_feat_dir`
- `data.text_cls_fname`
- `data.clip_token_fname`
- `data.sidekick_vid_feat_dir` / `data.sidekick_vid_load`
- `data.video_dir`
- `encoder.pretrain`
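For orientation, the relevant part of a config might look like the sketch below. This is illustrative only: the field names come from the list above, but the nesting and example paths are our assumptions, so follow the released YAMLs as the source of truth.

```yaml
# Illustrative sketch — paths are placeholders, check the released YAMLs.
data:
  anno_file: /data/ego4d/annotations/nlq_val.json
  vid_feat_dir: /data/ego4d/features/egovlp_video
  text_feat_dir: /data/ego4d/features/egovlp_text
  text_cls_fname: /data/ego4d/features/text_cls.pkl
  clip_token_fname: /data/ego4d/features/clip_tokens.pkl
  sidekick_vid_feat_dir: /data/ego4d/features/sidekick_video
  sidekick_vid_load: true
  video_dir: /data/ego4d/videos_256
encoder:
  pretrain: /data/ckpts/encoder_pretrain.pth
```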
We release the pre-extracted features used by this codebase.
If you want to regenerate the Ego4D-NLQ features, the corresponding feature-generation code is provided in the egovlp branch. That branch contains the code paths for generating:

- EgoVLP video features
- sentence-level EgoVLP text features
- the packaged `nlq_32x8_d2.json`
- CLIP token features
Goalstep encoder training / extraction requires resized raw Ego4D videos in addition to the released processed features.
To create them:

- download the official Ego4D source videos,
- resize each video so the shortest side is 256,
- place the resized videos under the directory used by `data.video_dir`.
The encoder configs assume videos are stored as `<video_uid>.mp4` under `data.video_dir`.
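The resize step above can be sketched as follows. This is not the repo's actual preprocessing script: the helper names are ours, and the use of ffmpeg's `scale` filter is one reasonable way to get a shortest-side-256 output.

```python
import subprocess
from pathlib import Path

def shortest_side_256(width: int, height: int) -> tuple[int, int]:
    """Target (width, height) with the shortest side scaled to 256 px.

    The longer side is rounded to an even number, which most codecs require.
    """
    scale = 256 / min(width, height)
    long_side = round(max(width, height) * scale / 2) * 2
    return (long_side, 256) if width >= height else (256, long_side)

def resize_video(src: Path, dst_dir: Path) -> None:
    """Resize one video with ffmpeg so its shortest side is 256 px.

    Hypothetical helper; '-2' lets ffmpeg pick the matching even dimension.
    """
    dst_dir.mkdir(parents=True, exist_ok=True)
    vf = "scale='if(gt(iw,ih),-2,256)':'if(gt(iw,ih),256,-2)'"
    subprocess.run(
        ["ffmpeg", "-y", "-i", str(src), "-vf", vf, str(dst_dir / src.name)],
        check=True,
    )
```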
- Ego4D-NLQ: `libs/core/ego4d_nlq_30.yaml`, `libs/core/ego4d_nlq_50.yaml`, `libs/core/ego4d_nlq_100.yaml`
- GoalStep: `libs/core/goalstep_30.yaml`, `libs/core/goalstep_50.yaml`, `libs/core/goalstep_100.yaml`
- MAD: `libs/core/mad.yaml`
- Charades-STA: `libs/core/charades_i3d.yaml`
- TACoS: `libs/core/tacos.yaml`
Launch experiments using `train.py` and the config YAMLs under `libs/core/`.

```shell
# example: train the DeCaf grounder on the Ego4D-NLQ dataset with a 30% saliency ratio
CUDA_VISIBLE_DEVICES=0 PYTHONPATH='.' python train.py --cfg libs/core/ego4d_nlq_30.yaml
```

You can also override config options from the command line (optional):

```shell
CUDA_VISIBLE_DEVICES=0 PYTHONPATH='.' python train.py \
    --cfg libs/core/ego4d_nlq_30.yaml \
    --set train.num_workers 4 aux.wandb_enable True aux.wandb_project downstream
```

Training logs and checkpoints are saved under the `log` folder.
Download grounder checkpoints from Google Drive.
Use `eval_grounder.py` with a checkpoint file path (`.pth`). The script automatically loads the matching `opt.yaml` from the same experiment directory.
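To make the checkpoint layout concrete, the resolution logic can be sketched as below. `find_opt_yaml` is a hypothetical helper, not the script's actual code; it only assumes the `<exp_dir>/models/<epoch-step>.pth` layout used by the released checkpoints.

```python
from pathlib import Path

def find_opt_yaml(ckpt_path: str) -> Path:
    """Resolve an experiment's opt.yaml from one of its checkpoint paths.

    Assumes checkpoints live in <exp_dir>/models/ and opt.yaml in <exp_dir>.
    """
    exp_dir = Path(ckpt_path).parent.parent  # strip the .pth name and models/
    return exp_dir / "opt.yaml"
```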
```shell
CUDA_VISIBLE_DEVICES=0 PYTHONPATH='.' python eval_grounder.py --ckpt /path/to/ckpts/nlq_30/models/6-36000.pth
CUDA_VISIBLE_DEVICES=0 PYTHONPATH='.' python eval_grounder.py --ckpt /path/to/ckpts/nlq_50/models/7-38000.pth
CUDA_VISIBLE_DEVICES=0 PYTHONPATH='.' python eval_grounder.py --ckpt /path/to/ckpts/nlq_100/models/6-34000.pth
CUDA_VISIBLE_DEVICES=0 PYTHONPATH='.' python eval_grounder.py --ckpt /path/to/ckpts/goalstep_30/models/13-208000.pth
CUDA_VISIBLE_DEVICES=0 PYTHONPATH='.' python eval_grounder.py --ckpt /path/to/ckpts/goalstep_50/models/12-204000.pth
CUDA_VISIBLE_DEVICES=0 PYTHONPATH='.' python eval_grounder.py --ckpt /path/to/ckpts/goalstep_100/models/11-190000.pth
CUDA_VISIBLE_DEVICES=0 PYTHONPATH='.' python eval_grounder.py --ckpt /path/to/ckpts/mad/models/9-27000.pth
```

Useful optional flags:

- `--dryrun` for quick checks
- `--set ...` for evaluation-time config overrides
- NLQ: `libs/core/sidekick_nlq.yaml`
- GoalStep: `libs/core/sidekick_goalstep.yaml`
Use `train.py` with the `task: encoder` configs.

```shell
CUDA_VISIBLE_DEVICES=0 PYTHONPATH='.' python train.py --cfg libs/core/sidekick_nlq.yaml
CUDA_VISIBLE_DEVICES=0 PYTHONPATH='.' python train.py --cfg libs/core/sidekick_goalstep.yaml
```

For release usage, prefer the `train.py` / `eval_grounder.py` / `extract_sidekick_feature.py` entry points (not `run.py`).
The release code can directly load a packaged encoder checkpoint root and run feature extraction through `extract_sidekick_feature.py`.
```shell
CUDA_VISIBLE_DEVICES=0 PYTHONPATH='.' python extract_sidekick_feature.py \
    --root /path/to/encoder_exp_or_ckpts_encoder/nlq \
    --ckpt 10-15000 \
    --split val

CUDA_VISIBLE_DEVICES=0 PYTHONPATH='.' python extract_sidekick_feature.py \
    --root /path/to/encoder_exp_or_ckpts_encoder/goalstep \
    --ckpt 2-36000 \
    --split val

# multi-gpu
PYTHONPATH='.' torchrun --standalone --nproc_per_node=8 extract_sidekick_feature.py \
    --root /path/to/encoder_exp_or_ckpts_encoder/nlq \
    --ckpt 10-15000 \
    --split val
```

```bibtex
@inproceedings{Lu2025DeCafNet,
  title={DeCafNet: Delegate and Conquer for Efficient Temporal Grounding in Long Videos},
  author={Zijia Lu and A S M Iftekhar and Gaurav Mittal and Tianjian Meng and Xiawei Wang and Cheng Zhao and Rohith Kukkala and Ehsan Elhamifar and Mei Chen},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2025},
}
```

