EECS 498 Foundations of Large Language Models

Final Project: Diffusion-LM with Bigger Models and Smaller Steps

Group members: Zesen Zhao, Yaoxin (Selina) Li, Keqian Wang, Kevin Chen

Reference

This code is forked from XiangLi1999/Diffusion-LM, the repository for Diffusion-LM Improves Controllable Text Generation (NeurIPS 2022).

Code contributions

We cleaned up many files in the codebase, either to accommodate our testing or to make tweaks and fixes; we also created several new files and made key changes to many others.


Conda Setup:

# setup on U-M Great Lakes
conda create -n env python=3.9.7
conda activate env
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
pip install -e improved-diffusion/ transformers/
pip install spacy==3.2.4 datasets==1.8.0 huggingface_hub==0.4.0 wandb
# magic
pip install --upgrade stanza stanza-spacy pydantic
# solves the mpi4py issue!!
module load gcc/8.2.0
module load openmpi/3.1.6
which mpicc  # prints the mpicc path; use it as MPICC in the next command
env MPICC=/home/<username>/.conda/envs/498/bin/mpicc pip install --no-cache-dir mpi4py
# original version
conda install mpi4py
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
pip install -e improved-diffusion/ 
pip install -e transformers/
pip install spacy==3.2.4
pip install datasets==1.8.0 
pip install huggingface_hub==0.4.0 
pip install wandb
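
After installing, a quick sanity check (a minimal standalone script, not part of the repo; the filename is hypothetical) confirms that the CUDA build of PyTorch, mpi4py, and the local transformers fork import correctly:

# check_env.py -- quick sanity check after the setup above
import torch
import transformers
from mpi4py import MPI

print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("transformers", transformers.__version__)
print("MPI world size:", MPI.COMM_WORLD.Get_size())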

Train Diffusion-LM:

cd improved-diffusion; mkdir diffusion_models;

python scripts/run_train.py --diff_steps 2000 --model_arch transformer --lr 0.0001 --lr_anneal_steps 200000 --seed 102 --noise_schedule sqrt --in_channel 16 --modality e2e-tgt --submit no --padding_mode block --app "--predict_xstart True --training_mode e2e --vocab_size 821 --e2e_train ../datasets/e2e_data " --notes xstart_e2e

python scripts/run_train.py --diff_steps 2000 --model_arch transformer --lr 0.0001 --lr_anneal_steps 400000 --seed 101 --noise_schedule sqrt --in_channel 128 --modality roc --submit no --padding_mode pad --app "--predict_xstart True --training_mode e2e --vocab_size 11043 --roc_train ../datasets/ROCstory " --notes xstart_e2e --bsz 64
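
The first command trains on the E2E dataset and the second on ROCStories; both use the sqrt noise schedule from the Diffusion-LM paper, which adds noise quickly at early steps. As a rough standalone illustration (a sketch only, not the repo's implementation in improved-diffusion), the schedule sets the cumulative signal level to alpha_bar(t) = 1 - sqrt(t/T + s) for a small offset s:

import numpy as np

def sqrt_alpha_bar(num_steps, s=1e-4):
    # cumulative alpha_bar for the 'sqrt' schedule: 1 - sqrt(t/T + s)
    # illustrative sketch only; see improved-diffusion for the actual schedule code
    t = np.arange(num_steps + 1) / num_steps
    return np.clip(1.0 - np.sqrt(t + s), 0.0, 0.999)

alpha_bar = sqrt_alpha_bar(2000)    # matches --diff_steps 2000 above
print(alpha_bar[0], alpha_bar[-1])  # near 1 (clean) at t=0, near 0 (pure noise) at t=T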


Decode Diffusion-LM:

mkdir generation_outputs

python scripts/batch_decode.py {path-to-diffusion-lm} -1.0 ema
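
Sampling in embedding space ends with a rounding step that maps each predicted continuous word vector back to the nearest token in the learned embedding table. A conceptual sketch of that rounding idea (hypothetical function and variable names, not the repo's exact code):

import torch

def round_to_tokens(x0, embedding_weight):
    # x0: (seq_len, dim) predicted embeddings; embedding_weight: (vocab, dim)
    # nearest-neighbor rounding: pick the closest vocabulary embedding per position
    dists = torch.cdist(x0, embedding_weight)   # (seq_len, vocab) pairwise distances
    return dists.argmin(dim=-1)                 # (seq_len,) token ids

# hypothetical shapes: 64-token sequence, 16-dim embeddings, vocab of 821 (E2E setting)
token_ids = round_to_tokens(torch.randn(64, 16), torch.randn(821, 16))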


Controllable Text Generation

First, train the classifier used to guide generation (e.g., a syntactic parser):

python train_run.py --experiment e2e-tgt-tree --app "--init_emb {path-to-diffusion-lm} --n_embd {16} --learned_emb yes " --pretrained_model bert-base-uncased --epoch 6 --bsz 10

Then, we can use the trained classifier to guide generation. (Currently, the classifier directory needs to be updated manually in scripts/infill.py; this will be cleaned up in a future release.)

python scripts/infill.py --model_path {path-to-diffusion-lm} --eval_task_ 'control_tree' --use_ddim True --notes "tree_adagrad" --eta 1. --verbose pipe
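
Conceptually, control nudges the diffusion latent toward sequences the classifier scores highly: at each denoising step, a few gradient steps are taken on the classifier's log-probability of the target attribute with respect to the latent (the "tree_adagrad" note presumably refers to Adagrad being used for these updates). A minimal sketch of that inner update with a hypothetical classifier_logp interface; the repo's infill.py implements the actual procedure:

import torch

def guide_latent(x_t, classifier_logp, num_grad_steps=3, step_size=0.1):
    # one round of gradient-based guidance on a diffusion latent x_t -- sketch only
    # classifier_logp(x) returns the log-probability of the desired attribute
    # (e.g. a target parse tree) given the latent
    x = x_t.detach().clone().requires_grad_(True)
    opt = torch.optim.Adagrad([x], lr=step_size)
    for _ in range(num_grad_steps):
        opt.zero_grad()
        loss = -classifier_logp(x)   # maximize the attribute's log-probability
        loss.backward()
        opt.step()
    return x.detach()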


Real command fed into train.py:

OPENAI_LOGDIR=diffusion_models/diff_e2e-tgt_block_rand16_transformer_lr0.0001_0.0_2000_sqrt_Lsimple_h128_s2_d0.1_sd102_xstart_e2e  TOKENIZERS_PARALLELISM=false python scripts/train.py   --checkpoint_path diffusion_models/diff_e2e-tgt_block_rand16_transformer_lr0.0001_0.0_2000_sqrt_Lsimple_h128_s2_d0.1_sd102_xstart_e2e --model_arch transformer --modality e2e-tgt --save_interval 50000 --lr 0.0001 --batch_size 64  --diffusion_steps 2000 --noise_schedule sqrt  --use_kl False --learn_sigma False  --image_size 8 --num_channels 128 --seed 102 --dropout 0.1 --in_channel 16 --out_channel 16 --padding_mode block --experiment random  --lr_anneal_steps 200000 --weight_decay 0.0 --num_res_blocks 2  --predict_xstart True --training_mode e2e --vocab_size 821  --e2e_train ../datasets/e2e_data

Our commands

BERT-large

OPENAI_LOGDIR=diffusion_models/diff_e2e-tgt_block_rand16_transformer_lr0.0001_0.0_2000_sqrt_Lsimple_h128_s2_d0.1_sd102_Bert_large TOKENIZERS_PARALLELISM=false python scripts/train.py --checkpoint_path diffusion_models/diff_e2e-tgt_block_rand16_transformer_lr0.0001_0.0_2000_sqrt_Lsimple_h128_s2_d0.1_sd102_Bert_large --model_arch transformer --modality e2e-tgt --save_interval 50000 --lr 0.0001 --batch_size 64 --diffusion_steps 2000 --noise_schedule sqrt --use_kl False --learn_sigma False --image_size 8 --num_channels 128 --seed 102 --dropout 0.1 --in_channel 16 --out_channel 16 --padding_mode block --experiment random --lr_anneal_steps 200000 --weight_decay 0.0 --num_res_blocks 2 --predict_xstart True --training_mode e2e --vocab_size 821 --e2e_train ../datasets/e2e_data

(Note: this run did not converge.)

Infilling

python3 diffusion_lm/improved-diffusion/scripts/infill_parrot.py --model_path diffusion_lm/diffusion-models/large/large.pt --eval_task_ 'infill' --use_ddim True --eta 1. --verbose pipe --diffusion_steps 2 --partial_seq_file diffusion_lm/improved-diffusion/out_gen/sample.json --notes 3

python3 diffusion_lm/improved-diffusion/scripts/infill_parrot.py --model_path diffusion_lm/checkpoints/ema_0.9999_200000_base.pt --eval_task_ 'infill' --use_ddim True --notes '1' --eta 1. --verbose yes --diffusion_steps 20 --train_args_name base --partial_seq 'PAD PAD PAD , PAD PAD PAD PAD , PAD PAD PAD PAD PAD PAD PAD .' --out_dir diffusion_lm/improved-diffusion/out_gen/diff_20
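
The --partial_seq argument is a space-separated template in which PAD marks the positions to be generated, while the remaining tokens (here the commas and the final period) are held fixed during infilling. A small helper for building such templates (hypothetical helper; the scripts also accept a JSON file via --partial_seq_file, as in the first command above):

def make_partial_seq(length, fixed_tokens):
    # build an infilling template of `length` slots where `fixed_tokens`
    # maps position -> token and every other slot is the PAD placeholder
    return " ".join(fixed_tokens.get(i, "PAD") for i in range(length))

# reproduces the template used in the second infilling command above
print(make_partial_seq(17, {3: ",", 8: ",", 16: "."}))
# PAD PAD PAD , PAD PAD PAD PAD , PAD PAD PAD PAD PAD PAD PAD .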

Presentation link for the paper: link
