We release ChronoEdit’s full training infrastructure and codebase, enabling distributed inference and large-scale fine-tuning of pretrained video diffusion models, including WAN 2.1 and Cosmos 2.5.
The training framework and system design are directly inherited from the Cosmos 2.5 release.
Note
The codebase has only been tested on NVIDIA A100 and NVIDIA H100 GPUs with CUDA 12.9 environments.
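Before starting the setup, it can help to confirm that the machine's CUDA toolkit matches the tested version. The helper below is a hypothetical stdlib-only sketch (not part of the ChronoEdit repo) that parses `nvcc --version` output and compares it against CUDA 12.9:

```python
import re
import shutil
import subprocess

TESTED = (12, 9)  # CUDA version the codebase was tested with

def parse_cuda_version(nvcc_output: str) -> tuple:
    """Extract (major, minor) from `nvcc --version` output."""
    m = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if not m:
        raise ValueError("could not find a CUDA release string")
    return (int(m.group(1)), int(m.group(2)))

if shutil.which("nvcc"):
    out = subprocess.run(["nvcc", "--version"],
                         capture_output=True, text=True).stdout
    ver = parse_cuda_version(out)
    status = "matches tested setup" if ver >= TESTED else "older than tested 12.9"
    print(f"CUDA {ver[0]}.{ver[1]}: {status}")
else:
    print("nvcc not found; is the CUDA toolkit installed and on PATH?")
```

A mismatch does not necessarily mean the build will fail, only that the configuration is outside what the release notes as tested.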
Download the DCP checkpoint from Hugging Face:

```bash
hf download nvidia/ChronoEdit-14B --local-dir checkpoints/ChronoEdit-14B
```

To enable full model training, ensure transformer-engine (with patch fixes) is properly installed:
```bash
conda env create -f environment.yml -n chronoedit_full
conda activate chronoedit_full
pip install torch==2.7.1 torchvision==0.22.1
pip install -r requirements.txt

# Limit the number of build threads to prevent OOM during flash-attention compilation
export MAX_JOBS=16
pip install flash-attn==2.6.3

# Patch Transformer Engine header linking issues in conda environments
ln -sf $CONDA_PREFIX/lib/python3.10/site-packages/nvidia/*/include/* $CONDA_PREFIX/include/
ln -sf $CONDA_PREFIX/lib/python3.10/site-packages/nvidia/*/include/* $CONDA_PREFIX/include/python3.10

pip install "transformer-engine[pytorch]"
conda install -y -c conda-forge "libstdcxx-ng>=13.2.0" "libgcc-ng>=13.2.0"
```

Verify the installation:
```bash
python -c "import transformer_engine as te; print('TE import OK')"
```

Note
If you encounter the following error:

```text
Command 'ldconfig -p | grep 'libnvrtc'' returned non-zero exit status 1.
```

it indicates that CUDA is installed inside the conda environment rather than in the default system path. To resolve this, manually link the CUDA environment:

```bash
source scripts/TE_patch.sh
```

The following command runs inference on a single GPU.
```bash
PYTHONPATH=$(pwd) python -m torch.distributed.run --nproc_per_node=1 --master_port=12340 -m scripts.run_inference \
  --experiment edit_14B_skip_pe8 \
  --checkpoint_path ./checkpoints/ChronoEdit-14B/nvidia/chronoedit_14B/model.pth \
  --save_root outputs/0903_wan_edit/image_edit_video_prior \
  --num_frames 2 \
  --resolution "720p" \
  --guidance 5.0 \
  --prompt "Add a person wearing red jacket" \
  --input_image_fp assets/images/input.jpg
```

Note
If you encounter the following error:

```text
[rank1]: ImportError: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by ~/.triton/cache/{...}/cuda_utils.so)
```

it indicates that a library was previously compiled on the same device against a newer libc version. To resolve this, simply delete the Triton cache:

```bash
rm -rf ~/.triton/cache
```

The following command runs distributed inference on two GPUs with Context Parallel Ring Attention.
```bash
PYTHONPATH=$(pwd) python -m torch.distributed.run --nproc_per_node=2 --master_port=12340 -m scripts.run_inference \
  --experiment edit_14B_skip_pe8 \
  --checkpoint_path ./checkpoints/ChronoEdit-14B/nvidia/chronoedit_14B/model.pth \
  --save_root outputs/0903_wan_edit/image_edit_video_prior \
  --num_frames 2 \
  --resolution "720p" \
  --guidance 5.0 \
  --prompt "Add a person wearing red jacket" \
  --input_image_fp assets/images/input.jpg
```

The following command tests training with context parallelism. Training was tested on NVIDIA A100 and NVIDIA H100 GPUs with 80 GB of memory.
```bash
torchrun --nproc_per_node=4 --master_port=12341 -m scripts.train --config=chronoedit/_src/configs/chronoedit/config.py -- experiment="edit_14B_skip_pe8_mock"
```

ChronoEdit supports three checkpoint formats, each optimized for a different use case. See the checkpoint guidance doc.
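The two-GPU inference command above relies on Context Parallel Ring Attention, which shards the sequence across ranks: each rank keeps its query shard while key/value shards rotate around the ring, with partial results merged by an online softmax. The single-process NumPy sketch below is only a toy illustration of that accumulation scheme, not the repo's actual distributed implementation:

```python
import numpy as np

def ring_attention(q, k, v, num_ranks):
    """Toy single-process simulation of ring attention.

    The sequence is split into `num_ranks` shards; each simulated rank
    keeps its query shard and accumulates attention via an online softmax
    while key/value shards "rotate" one hop per step.
    """
    d = q.shape[-1]
    qs = np.split(q, num_ranks)  # query shards (stay put)
    ks = np.split(k, num_ranks)  # key shards (rotate)
    vs = np.split(v, num_ranks)  # value shards (rotate)

    outputs = []
    for r in range(num_ranks):
        m = np.full(qs[r].shape[0], -np.inf)  # running row max
        l = np.zeros(qs[r].shape[0])          # running softmax normalizer
        acc = np.zeros_like(qs[r])            # unnormalized output
        for step in range(num_ranks):
            b = (r + step) % num_ranks        # KV block arriving this step
            scores = qs[r] @ ks[b].T / np.sqrt(d)
            m_new = np.maximum(m, scores.max(axis=1))
            p = np.exp(scores - m_new[:, None])
            scale = np.exp(m - m_new)         # rescale old partial sums
            acc = acc * scale[:, None] + p @ vs[b]
            l = l * scale + p.sum(axis=1)
            m = m_new
        outputs.append(acc / l[:, None])
    return np.concatenate(outputs)

def full_attention(q, k, v):
    """Reference: ordinary full softmax attention."""
    s = q @ k.T / np.sqrt(q.shape[-1])
    w = np.exp(s - s.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ v
```

Because the online softmax is mathematically exact, the ring result agrees with full attention up to floating-point error; the real implementation overlaps the KV transfer with compute so the ring exchange hides communication latency.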