Fix DeepSeek related bugs (#724, #548, #633) by Nyquist24 · Pull Request #731 · alibaba/Pai-Megatron-Patch

Nyquist24 · 2026-03-14T07:28:54Z

- Fix #724 ("pretrain_deepseek.py line 136 calls an undefined function; `get_gpt_decoder_layer_specs` does not exist anywhere in the repository"): remove the call to the undefined function `get_gpt_decoder_layer_specs` in `pretrain_deepseek.py` and reuse the already-computed `transformer_layer_spec`.
- Fix #548 ("example deepseek_v2 flash_attn option is useless"): remove the non-functional flash_attn option in deepseek_v2. MLA does not support flash attention, so the script now always uses TE Fused Attention.
- Fix #633 ("After hf2mcore conversion, the model is missing local_experts"): replace unsafe `str.replace` calls with `re.sub(..., count=1)` in `hf2mcore_deepseek_v2_moe.py` to prevent substring mismatches when renaming `local_experts` keys.
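The substring problem behind #633 can be illustrated with a minimal sketch. The key layout, the helper name `rename_expert`, and the regex pattern are assumptions made for illustration; they are not the exact strings used in `hf2mcore_deepseek_v2_moe.py`. The point is that a plain `str.replace` matches `local_experts.1` inside `local_experts.12`, while an anchored `re.sub` with `count=1` rewrites at most one exact index.

```python
import re


def rename_expert(key: str, old_idx: int, new_idx: int) -> str:
    """Rename one expert index in a checkpoint key (hypothetical helper).

    The negative lookahead (?!\\d) stops index 1 from matching inside 12,
    and count=1 limits the rewrite to the first occurrence.
    """
    pattern = rf"local_experts\.{old_idx}(?!\d)"
    return re.sub(pattern, f"local_experts.{new_idx}", key, count=1)


# Hypothetical key name, chosen only to show the failure mode.
key = "decoder.layers.3.mlp.experts.local_experts.12.linear_fc1.weight"

# Plain substring replacement corrupts expert 12 when renaming expert 1.
naive = key.replace("local_experts.1", "local_experts.0")

# The anchored regex leaves expert 12 untouched when asked to rename expert 1.
safe = rename_expert(key, 1, 0)

print(naive)
print(safe)
```

Here `naive` ends up containing `local_experts.02`, a key that no module expects, whereas `safe` is identical to the input because expert 1 does not appear in it.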

Changes

| File | Change |
| --- | --- |
| `examples/deepseek_v3/pretrain_deepseek.py` | Remove 10 lines of redundant/broken MTP spec code |
| `examples/deepseek_v2/run_mcore_deepseek.sh` | Replace FL conditional with fixed `NVTE_FUSED_ATTN=1` |
| `toolkits/.../hf2mcore_deepseek_v2_moe.py` | 3x `str.replace` → `re.sub(..., count=1)` |

Test plan

- Verify DeepSeek-V3 pretraining with `--mtp-num-layers` runs without `NameError`
- Verify DeepSeek-V2 training uses fused attention correctly
- Verify HF→mcore MoE checkpoint conversion with EP>1 produces correct `local_experts` keys
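The last check could be scripted roughly as follows. This is a sketch under stated assumptions, not part of the PR: the helper names, the key pattern `.local_experts.<n>.`, and the assumption that each expert-parallel rank holds `num_experts // ep_size` experts indexed locally from 0 are all illustrative and may not match the converter's actual output.

```python
import re

# Assumed key layout: "...experts.local_experts.<index>.<param>".
_EXPERT_RE = re.compile(r"\.local_experts\.(\d+)\.")


def local_expert_ids(keys):
    """Collect the distinct local expert indices (as strings) found in key names."""
    ids = set()
    for k in keys:
        m = _EXPERT_RE.search(k)
        if m:
            ids.add(m.group(1))
    return ids


def has_expected_local_experts(keys, num_experts, ep_size):
    """True if the keys contain exactly the indices 0 .. num_experts/ep_size - 1."""
    if num_experts % ep_size != 0:
        raise ValueError("num_experts must be divisible by ep_size")
    expected = {str(i) for i in range(num_experts // ep_size)}
    return local_expert_ids(keys) == expected


# Hypothetical key names for one EP rank with 8 experts and EP=4 (2 local experts).
good_keys = [
    "decoder.layers.1.mlp.experts.local_experts.0.linear_fc1.weight",
    "decoder.layers.1.mlp.experts.local_experts.1.linear_fc1.weight",
]
# Expert 1 is missing, as a broken rename might produce.
bad_keys = good_keys[:1]

print(has_expected_local_experts(good_keys, num_experts=8, ep_size=4))
print(has_expected_local_experts(bad_keys, num_experts=8, ep_size=4))
```

Indices are compared as strings so that a malformed index such as `00` is not silently treated as `0`. In practice the keys would come from the state dict saved for each rank.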


CLAassistant · 2026-03-14T07:29:01Z

All committers have signed the CLA.

Nyquist24 wants to merge 1 commit into alibaba:main from Nyquist24:fix/deepseek-bugs-724-548-633
