
Fix DeepSeek related bugs (#724, #548, #633)#731

Open
Nyquist24 wants to merge 1 commit into alibaba:main from Nyquist24:fix/deepseek-bugs-724-548-633

Conversation

@Nyquist24

Changes

| File | Change |
| --- | --- |
| examples/deepseek_v3/pretrain_deepseek.py | Remove 10 lines of redundant/broken MTP spec code |
| examples/deepseek_v2/run_mcore_deepseek.sh | Replace FL conditional with fixed `NVTE_FUSED_ATTN=1` |
| toolkits/.../hf2mcore_deepseek_v2_moe.py | Replace 3× `str.replace` with `re.sub(..., count=1)` |

Test plan

  • Verify DeepSeek-V3 pretraining with --mtp-num-layers runs without NameError
  • Verify DeepSeek-V2 training uses fused attention correctly
  • Verify HF→mcore MoE checkpoint conversion with EP>1 produces correct local_experts keys

- Fix alibaba#724: Remove the call to the undefined function `get_gpt_decoder_layer_specs`
  in pretrain_deepseek.py and reuse the already-computed `transformer_layer_spec`
- Fix alibaba#548: Remove the non-functional flash_attn option in deepseek_v2; since
  MLA does not support flash attention, always use TE Fused Attention
- Fix alibaba#633: Replace unsafe `str.replace` with `re.sub(count=1)` in
  hf2mcore_deepseek_v2_moe.py to prevent substring mismatch during
  local_experts key renaming
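To illustrate the #633 fix, here is a minimal sketch of why `str.replace` is unsafe for checkpoint key renaming. The key string below is a hypothetical example in the style of Megatron MoE state-dict keys, not copied from hf2mcore_deepseek_v2_moe.py; the point is that with EP>1 the same index digits can appear twice in one key (layer index and expert index), so an unanchored global replace clobbers both:

```python
import re

# Hypothetical checkpoint key: layer index 10 and expert index 10
# both appear in the dotted path.
key = "decoder.layers.10.mlp.experts.local_experts.10.linear_fc1.weight"

# Unsafe: str.replace rewrites EVERY occurrence of the substring,
# so the layer index is renamed along with the expert index.
bad = key.replace("10", "3")
# -> "decoder.layers.3.mlp.experts.local_experts.3.linear_fc1.weight"

# Safer: anchor the pattern to the local_experts segment and limit
# the substitution to a single match, as the PR does with count=1.
good = re.sub(r"local_experts\.10\.", "local_experts.3.", key, count=1)
# -> "decoder.layers.10.mlp.experts.local_experts.3.linear_fc1.weight"
```

Anchoring on the `local_experts.` prefix (and escaping the dots) ensures only the expert index is rewritten, which is what the EP>1 conversion path relies on.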
@CLAassistant

CLAassistant commented Mar 14, 2026

CLA assistant check
All committers have signed the CLA.

