Hi, thank you for releasing dParallel! The idea of learnable parallel decoding for diffusion LLMs is very exciting!
I have one technical question:
Since the paper does not report FLOPs, I’m trying to estimate the computational overhead of dParallel compared to the baseline diffusion LLM decoding.
May I ask:
1. How should FLOPs be counted for dParallel?
2. If possible, could you share the script or config you used to profile FLOPs?
I’d like to reproduce similar measurements.
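For context, here is the rough analytic estimate I am currently using; it assumes the standard ~2 × params FLOPs per token per forward pass and ignores the attention term, and all model/step numbers below are illustrative placeholders rather than dParallel's actual settings. Please correct me if this is the wrong way to count it for parallel decoding.

```python
def estimate_flops(num_params: float, seq_len: int, num_steps: int) -> float:
    """Total forward FLOPs ~= 2 * params * tokens * denoising steps.

    Assumes every denoising step runs a full forward pass over the
    whole sequence, which I believe is the case for diffusion LLM
    decoding without caching (an assumption on my part).
    """
    return 2.0 * num_params * seq_len * num_steps


# Hypothetical 8B-parameter model, 1024-token sequence:
baseline = estimate_flops(8e9, 1024, num_steps=1024)  # one token per step
parallel = estimate_flops(8e9, 1024, num_steps=128)   # eight tokens per step
print(f"FLOPs reduction: {baseline / parallel:.1f}x")
```

Under this counting, the FLOPs reduction would simply equal the reduction in denoising steps, which is why I want to confirm whether dParallel's per-step cost differs from the baseline's.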
Thanks again for the great work. Looking forward to your guidance!