
Commit 53ecdf2

Enhance Moonlight&DeepSeek-R1 ChatLearn/Verl Best Practice (#683)

Authored by jerryli1981 and Peng Li

* enhance verl-patch for moonlight and deepseek
* fix moonlight convergence with mcore0908
* fix verl patch dsr1
* update chatlearn

Co-authored-by: Peng Li <[email protected]>

1 parent ac71c8b · commit 53ecdf2
15 files changed
Lines changed: 1807 additions & 232 deletions

README.md

Lines changed: 3 additions & 2 deletions
```diff
@@ -7,9 +7,9 @@
 |Qwen3 |[ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen3/README.md)|[ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen3/README_chatlearn.md) | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen3/README_verl.md) |
 |Qwen3-VL |Coming Soon| Coming Soon | Coming Soon |
 |Qwen2.5-VL |[ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen2_5_vl/README.md)| [ReadMe](https://github.com/alibaba/ChatLearn/blob/main/docs/en/tutorial/tutorial_grpo_mcore_qwenvl.md) | Coming Soon |
-|Moonlight |[ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/moonlight/README.md)|[ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/moonlight/README_chatlearn.md)| [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/moonlight/README_verl.md) |
+|Moonlight |[ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/moonlight/README.md)|[ReadMe](https://github.com/alibaba/ChatLearn/blob/main/docs/zh/tutorial/tutorial_grpo_mcore_moonlight_and_deepseek.md)| [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/moonlight/README_verl.md) |
 |DeepSeek-V3 |[ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/deepseek_v3/README.md)| N/A | N/A |
-|DeepSeek-R1 | N/A |[ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/deepseek_v3/README_grpo.md)| [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/deepseek_v3/README_verl.md) |
+|DeepSeek-R1 | N/A |[ReadMe](https://github.com/alibaba/ChatLearn/blob/main/docs/zh/tutorial/tutorial_grpo_mcore_moonlight_and_deepseek.md)| [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/deepseek_v3/README_verl.md) |



@@ -19,6 +19,7 @@ English | [简体中文](./README_zh-CN.md)
 Pai-Megatron-Patch (https://github.com/alibaba/Pai-Megatron-Patch) is a deep learning training toolkit built for developers to train and predict LLMs & VLMs by using Megatron framework easily. With the continuous development of LLMs, the model structure and scale are rapidly evolving. Although these models can be conveniently manufactured using Transformers or DeepSpeed training framework, the training efficiency is comparably low. This phenomenon becomes even severer when the model scale exceeds 10 billion. The primary objective of Pai-Megatron-Patch is to effectively utilize the computational power of GPUs for LLM. This tool allows convenient training of commonly used LLM with all the accelerating techniques provided by Megatron-LM.

 What's New:
+- **Improve MLA models such as Moonlight/DeepSeek-V3 RL Training Stability and Efficiency with Context Parallel and Sequence Packing** [🔥🔥 2025.10.10]
 - **[Experimental]Support Qwen3-Next-80B-A3B Pre-Training using Megatron-Core** [🔥🔥 2025.09.22]
 - **Support Qwen3 & DeepSeek-R1 GRPO Reinforcement Training using Megatron-Core and Verl** [🔥🔥 2025.09.19]
 - **Support Moonlight GRPO Reinforcement Training using Megatron-Core and Verl** [🔥🔥 2025.09.11]
```
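The new 2025.10.10 entry refers to sharding long responses across GPUs (context parallel) and packing variable-length samples into dense micro-batches (sequence packing) during RL training. As a rough illustration only, overrides of roughly the following shape control these features in a verl Megatron launch; the key names below are assumptions, not taken from this commit, and should be checked against the patched verl configs under backends/rl/verl.

```bash
# Illustration only: these override names are assumptions; check the patched verl configs
# in backends/rl/verl for the actual knobs exposed by this repository.
EXTRA_OVERRIDES=(
    actor_rollout_ref.actor.megatron.context_parallel_size=2   # shard long sequences across 2 GPUs
    actor_rollout_ref.actor.use_dynamic_bsz=True                # pack variable-length samples per micro-batch
)
# Appended to an existing launch, e.g.:
# python ../qwen3/verl_entrypoint.py --config-path=../qwen3/verl_configs ... "${EXTRA_OVERRIDES[@]}"
```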

README_zh-CN.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -41,6 +41,7 @@ Pai-Megatron-Patch是各类开源大模型和Megatron训练加速引擎之间的
 - [Alibaba Cloud PAI wins both titles of the FewCLUE few-shot learning benchmark with large models](https://developer.aliyun.com/article/788081?spm=a2c6h.12873639.article-detail.17.11c5383cHpFZks&tlog=yuekan_8)

 What's New:
+- **Improve the RL training stability and efficiency of MLA models such as Moonlight/DeepSeek-V3 with Context Parallel and Sequence Packing** [🔥🔥 2025.10.10]
 - **[Experimental] Support Qwen3-Next-80B-A3B pre-training with Mcore** [🔥🔥 2025.09.22]
 - **Support GRPO reinforcement learning for Qwen3 and DeepSeek-R1 with Mcore + Verl** [🔥🔥 2025.09.19]
 - **Support GRPO reinforcement learning for Moonlight with Mcore + Verl** [🔥🔥 2025.09.11]
```

backends/rl/ChatLearn

Submodule ChatLearn updated 135 files

examples/deepseek_v3/run_mcore_deepseek_verl.sh

Lines changed: 38 additions & 27 deletions
```diff
@@ -1,44 +1,55 @@
 #!/bin/bash

 ray stop
-CURRENT_DIR="$( cd "$( dirname "$0" )" && pwd )"
-VERL_PATCH_PATH=$( dirname $( dirname ${CURRENT_DIR}))
-export PYTHONPATH=${VERL_PATCH_PATH}:${VERL_PATCH_PATH}/backends/megatron/Megatron-LM-250624:${VERL_PATCH_PATH}/backends/rl/verl:$PYTHONPATH
-export HYDRA_FULL_ERROR=1
-gsm8k_train_path=/mnt/data/datasets/gsm8k/train.parquet
-gsm8k_test_path=/mnt/data/datasets/gsm8k/test.parquet
-
-train_files="['$gsm8k_train_path']"
-test_files="['$gsm8k_test_path']"
-
-
-MODEL_PATH=/mnt/data/ckpts/huggingface/DeepSeek-V3-0324
-MCORE_MODEL_PATH=/mnt/data/ckpts/mcore/DeepSeek-R1-BF16-to-mcore
-
-
-# If you are using vllm<=0.6.3, you might need to set the following environment variable to avoid bugs:
-# export VLLM_ATTENTION_BACKEND=XFORMERS
-export CUDA_DEVICE_MAX_CONNECTIONS=1 # For megatron communication/computation overlapping
+rm -rf /tmp/ray/*

+export CUDA_DEVICE_MAX_CONNECTIONS=1
 export GPUS_PER_NODE=${MLP_WORKER_GPU:-${KUBERNETES_CONTAINER_RESOURCE_GPU:-8}}
 export RAY_num_server_call_thread=1
 export NNODES=${MLP_WORKER_NUM:-${WORLD_SIZE:-1}}
 export NODE_RANK=${MLP_WORKER_RACK_RANK_INDEX:-${MLP_ROLE_INDEX:-${RANK:-0}}}
 export MASTER_ADDR=${MLP_WORKER_0_HOST:-${MASTER_ADDR:-127.0.0.1}}
 export MASTER_PORT=${MLP_WORKER_0_PORT:-${MASTER_PORT:-1234}}

+CURRENT_DIR=$(pwd)
+MEGATRON_PATCH_PATH=$( dirname $( dirname ${CURRENT_DIR}))
+VERL_ROOT_PATH=${MEGATRON_PATCH_PATH}/backends/rl/verl
+export PYTHONPATH=${MEGATRON_PATCH_PATH}:${MEGATRON_PATCH_PATH}/backends/megatron/Megatron-LM-250624:${VERL_ROOT_PATH}:$PYTHONPATH
+
+export RAY_CGRAPH_get_timeout=200
+export CUDA_DEVICE_MAX_CONNECTIONS=1
+export RAY_num_server_call_thread=1
+export RAY_DEDUP_LOGS=0
+export VLLM_USE_RAY_SPMD_WORKER=1
+export VLLM_USE_RAY_COMPILED_DAG=1
+
+train_path=/mnt/data/datasets/MATH-lighteval/train.parquet
+test_path=/mnt/data/datasets/MATH-lighteval/test.parquet
+
+train_files="['$train_path']"
+test_files="['$test_path']"
+
+hf_ckpt_path=/mnt/data/ckpts/huggingface/DeepSeek-V3-0324-BF16
+mcore_ckpt_path=/mnt/data/ckpts/mcore/DeepSeek-V3-0324-BF16-to-mcore
+proj_name="jerry_debug"
+exp_name="test_deepseek_verl"
+export output_dir=${CURRENT_DIR}/verl_outputs/${exp_name}
+export WANDB_DIR=${output_dir}
+mkdir -p $output_dir/
+export log_dir=${output_dir}/logs
+mkdir -p $log_dir
+log_file=$log_dir/${exp_name}_rank${NODE_RANK}.log
+

-project_name='DAPO'
-exp_name='Test_Verl_Mcore_DeepSeek671b_Loss'
 adv_estimator=grpo
 use_kl_in_reward=True
 kl_coef=0.0
 use_kl_loss=True
 kl_loss_coef=0.0
 clip_ratio_low=0.2
 clip_ratio_high=0.28
-max_prompt_length=$((1024 * 2))
-max_response_length=$((1024 * 4))
+max_prompt_length=1536
+max_response_length=2048
 enable_overlong_buffer=True
 overlong_buffer_len=$((1024 * 4))
 overlong_penalty_factor=0.1
@@ -135,17 +146,17 @@ python ../qwen3/verl_entrypoint.py --config-path=../qwen3/verl_configs \
 +reward_model.reward_kwargs.overlong_buffer_cfg.log=False \
 +reward_model.reward_kwargs.max_resp_len=${max_response_length} \
 trainer.logger=['console'] \
-trainer.project_name="${project_name}" \
-trainer.experiment_name="${exp_name}" \
+trainer.project_name=${proj_name} \
+trainer.experiment_name=${exp_name} \
 trainer.n_gpus_per_node=${GPUS_PER_NODE} \
 trainer.nnodes=${NNODES} \
 trainer.val_before_train=False \
-trainer.test_freq=50000000 \
+trainer.test_freq=5 \
 trainer.save_freq=50000000 \
-trainer.total_epochs=10 \
+trainer.total_epochs=200 \
 trainer.total_training_steps=1000 \
 trainer.resume_mode=auto \
-trainer.log_val_generations=10 2>&1 | tee ${NNODES}nodes_verl_debug.log
+2>&1 | tee ${log_file} ; exit ${PIPESTATUS[0]}

 else
 ray start --block --address=${MASTER_ADDR}:6379
```
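As the unchanged tail of the script shows, the launcher branches on the node rank: rank 0 runs the `verl_entrypoint.py` driver, while every other node only joins the Ray cluster via `ray start --block`. The snippet below is a minimal sketch of that head/worker pattern, not the verbatim script; the head-node bootstrap (`ray start --head`) and the trimmed-down override list are assumptions to be checked against the full file.

```bash
# Sketch of the multi-node launch pattern in run_mcore_deepseek_verl.sh (not the verbatim script).
# NODE_RANK, MASTER_ADDR, NNODES, GPUS_PER_NODE and log_file come from the exports defined earlier.
if [ "${NODE_RANK}" -eq 0 ]; then
    # Head node: start the Ray head, then run the Hydra entrypoint with the overrides shown above.
    ray start --head --port=6379
    python ../qwen3/verl_entrypoint.py --config-path=../qwen3/verl_configs \
        trainer.nnodes=${NNODES} trainer.n_gpus_per_node=${GPUS_PER_NODE} \
        2>&1 | tee ${log_file}
    exit ${PIPESTATUS[0]}
else
    # Worker nodes: join the head's Ray cluster and block until the job finishes.
    ray start --block --address=${MASTER_ADDR}:6379
fi
```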

examples/moonlight/README_verl.md

Lines changed: 40 additions & 24 deletions
````diff
@@ -2,39 +2,60 @@

 This document provides a quick-start guide for GRPO training of the Moonlight model with the Verl, Mcore, and vLLM frameworks.

-## Environment Setup
-1. Prepare the Docker image
-We recommend running this example on PAI [DSW](https://help.aliyun.com/zh/pai/user-guide/create-and-manage-dsw-instances/)/[DLC](https://help.aliyun.com/zh/pai/user-guide/create-a-training-task?spm=a2c4g.11186623.help-menu-30347.d_3_3_5_5.2dfb1925l3QjwG); fill in the following image address to launch the instance:
+## Development Environment Setup
+We recommend building the image on top of nvcr.io/nvidia/pytorch:24.12-py3 in a DSW environment on the PAI platform.
 ```bash
-dsw-registry.cn-shanghai.cr.aliyuncs.com/pai-training-algorithm/chatlearn:torch2.6.0-vllm0.8.5-ubuntu24.04-cuda12.6-py312
-```
+# Install vLLM, Transformers, and the other ChatLearn dependencies
+pip install modelscope==1.30.0 tensordict==0.10.0 torchdata==0.11.0 codetiming==1.4.0 vllm==0.8.5 transformers==4.56.2 blobfile==3.0.0 numpy==1.26.4 accelerate==1.10.0 wandb==0.19.11 datasets==3.6.0 grpcio==1.71.0 omegaconf==2.3.0 hydra-core==1.3.2 msgspec==0.19.0 mathruler==0.1.0 pylatexenc==2.10 langgraph==0.6.6 ray[default]==2.46.0 -i https://mirrors.aliyun.com/pypi/simple/
+
+# Installing vLLM reinstalls PyTorch, so flash-attention and apex need to be reinstalled afterwards
+pip uninstall -y flash_attn && pip install https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/csrc/flash-attention/torch2.6.0-cu12x/flash_attn-2.4.2-cp312-cp312-linux_x86_64.whl --no-cache-dir -i https://mirrors.aliyun.com/pypi/simple/

-You can use the VPC address to speed up image pulling; adjust the address to your region. For example, a DSW instance in Shanghai can use the image `dsw-registry-vpc.cn-shanghai.cr.aliyuncs.com/pai-training-algorithm/chatlearn:torch2.6.0-vllm0.8.5-ubuntu24.04-cuda12.6-py312
-`.
+pip uninstall -y apex && pip install https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/csrc/apex/torch2.6.0-cuda12x/apex-0.1-cp312-cp312-linux_x86_64.whl --no-cache-dir -i https://mirrors.aliyun.com/pypi/simple/

-2. Prepare the code
+# Upgrade Transformer Engine
+pip uninstall -y transformer-engine transformer-engine-cu12 transformer-engine-torch
+git clone --recursive https://github.com/NVIDIA/TransformerEngine.git
+cd TransformerEngine
+git submodule update --init --recursive
+git checkout release_v2.7
+export CUDNN_PATH=/usr/local/lib/python3.12/dist-packages/nvidia/cudnn/
+cp /usr/local/lib/python3.12/dist-packages/nvidia/cudnn/include/* /usr/local/cuda/include/
+python setup.py bdist_wheel -vvv
+cd dist
+export NVTE_FRAMEWORK=pytorch
+pip install transformer_engine-2.7.0-cp312-cp312-linux_x86_64.whl --no-cache-dir -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.cloud.aliyuncs.com

+# Upgrade cuDNN to fix issues that show up when training MLA models
+pip install -U nvidia-cudnn-cu12==9.8.0.87 -i http://mirrors.cloud.aliyuncs.com/pypi/simple --trusted-host mirrors.cloud.aliyuncs.com
+
+```
+
+## Code Preparation
 ```bash
 git clone --recurse-submodules https://github.com/alibaba/Pai-Megatron-Patch.git
 ```

 ## Data Preparation
-The [GSM8k](https://modelscope.cn/datasets/AI-ModelScope/gsm8k) dataset is used as an example.
+The [MATH-lighteval](https://www.modelscope.cn/datasets/AI-ModelScope/MATH-lighteval) dataset is used as an example.
 ```bash
 # Download the dataset
 mkdir -p /mnt/data/datasets
-Prepare the GSM8K dataset by following the guide: https://verl.readthedocs.io/en/latest/examples/gsm8k_example.html
+modelscope download --dataset AI-ModelScope/MATH-lighteval --local_dir dataset/MATH-lighteval
+cd ~/Pai-Megatron-Patch/toolkits/verl_data_preprocessing
+python math_lighteval.py --input_dir dataset/MATH-lighteval --local_dir dataset/MATH-lighteval
+
 # Download the model weights
 modelscope download --model moonshotai/Moonlight-16B-A3B-Instruct --local_dir /mnt/data/ckpts/huggingface/Moonlight-16B-A3B-Instruct
 ```

 ## Code & Checkpoint Modifications
 ```bash
-vim ~/Pai-Megatron-Patch/backends/megatron/Megatron-LM-250624/megatron/training/tokenizer/tokenizer.py
-Change line 143 to
-self._tokenizer = transformers.AutoTokenizer.from_pretrained(
-pretrained_model_name_or_path=pretrained_model_name_or_path, trust_remote_code=True, **kwargs
-)
+vim ~/Pai-Megatron-Patch/backends/megatron/Megatron-LM-250908/megatron/core/models/gpt/gpt_layer_specs.py
+Change line 145 as follows
+linear_q_down_proj=backend.linear() -> linear_q_down_proj=backend.column_parallel_linear()
+linear_kv_down_proj=backend.linear() -> linear_kv_down_proj=backend.column_parallel_linear()
+
 vim /mnt/data/ckpts/huggingface/Moonlight-16B-A3B-Instruct/config.json
 Change "AutoModel" and "AutoModelForCausalLM" to:
 "auto_map": {
@@ -45,6 +66,8 @@ vim /mnt/data/ckpts/huggingface/Moonlight-16B-A3B-Instruct/config.json
 cp ~/Pai-Megatron-Patch/examples/moonlight/modeling_deepseek_pai.py /mnt/data/ckpts/huggingface/Moonlight-16B-A3B-Instruct
 ```

+vim
+
 ## Model Conversion

 Model format conversion can follow the conversion scripts provided by the [Pai-Megatron-Patch](https://github.com/alibaba/Pai-Megatron-Patch) project.
@@ -64,19 +87,12 @@ true \
 bf16
 ```

+## Model Conversion
+
 ## Training
 Run the following command to start training:

 ```bash
 cd ~/Pai-Megatron-Patch/examples/moonlight
 bash run_mcore_moonlight_verl.sh
-```
-
-## Monitoring with Wandb
-To log the training run with Wandb, use the following configuration:
-
-```bash
-export enable_wandb=True
-export wandb_project="Your-Wandb-Project-Name"
-export WANDB_API_KEY="Your-Wandb-api-key"
 ```
````
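Since the new environment setup builds Transformer Engine from source and upgrades cuDNN through pip, a quick sanity check after following the steps above can save a failed training run later. The snippet below is a minimal, optional check that is not part of the original tutorial; the expected versions are taken from the commands in the diff.

```bash
# Optional sanity check for the environment built above (not part of the original tutorial).
# Transformer Engine should report 2.7.0; the cuDNN version reflects whatever PyTorch picks up
# after the nvidia-cudnn-cu12==9.8.0.87 upgrade.
python -c "import torch, transformer_engine; print(torch.__version__, transformer_engine.__version__)"
python -c "import torch; print('cudnn', torch.backends.cudnn.version())"
python -c "import flash_attn, apex; print('flash_attn', flash_attn.__version__)"
```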
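The gpt_layer_specs.py change described in the Code & Checkpoint Modifications hunk is a mechanical two-line substitution, so it can also be applied non-interactively instead of editing the file in vim. The sed command below is offered only as a convenience sketch equivalent to the manual edit named in the diff; back up the file first.

```bash
# Non-interactive equivalent of the manual edit above: route the MLA q/kv down-projections
# through column-parallel linear layers. sed keeps a .bak copy of the original file.
SPEC=~/Pai-Megatron-Patch/backends/megatron/Megatron-LM-250908/megatron/core/models/gpt/gpt_layer_specs.py
sed -i.bak \
  -e 's/linear_q_down_proj=backend.linear()/linear_q_down_proj=backend.column_parallel_linear()/' \
  -e 's/linear_kv_down_proj=backend.linear()/linear_kv_down_proj=backend.column_parallel_linear()/' \
  "$SPEC"
```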
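Note that the data-preparation commands above write the preprocessed MATH-lighteval parquet files to a relative dataset/MATH-lighteval directory, while run_mcore_deepseek_verl.sh (shown earlier in this commit) reads /mnt/data/datasets/MATH-lighteval/{train,test}.parquet; the Moonlight launch script presumably follows the same convention. The snippet below is an assumed-path check and copy, not part of the original tutorial, to confirm the files end up where the training scripts expect them.

```bash
# Assumption: math_lighteval.py writes train.parquet/test.parquet under the relative dataset/ directory,
# and the training scripts read them from /mnt/data/datasets/MATH-lighteval.
cd ~/Pai-Megatron-Patch/toolkits/verl_data_preprocessing
mkdir -p /mnt/data/datasets/MATH-lighteval
cp dataset/MATH-lighteval/*.parquet /mnt/data/datasets/MATH-lighteval/
python -c "import pandas as pd; df = pd.read_parquet('/mnt/data/datasets/MATH-lighteval/train.parquet'); print(len(df), list(df.columns))"
```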

examples/moonlight/run_mcore_moonlight_chatlearn.sh

Lines changed: 3 additions & 4 deletions
```diff
@@ -18,11 +18,10 @@ done
 export CUSTOM_PORTS=$ports
 export num_device=$(($WORLD_SIZE * $GPUS_PER_NODE))

-CURRENT_DIR="$( cd "$( dirname "$0" )" && pwd )"
+CURRENT_DIR=$(pwd)
 MEGATRON_PATCH_PATH=$( dirname $( dirname ${CURRENT_DIR}))
 CHATLEARN_ROOT_PATH=${MEGATRON_PATCH_PATH}/backends/rl/ChatLearn
-CHATLEARN_KERNEL_PATH=${MEGATRON_PATCH_PATH}/backends/rl/ChatLearn/chatlearn
-export PYTHONPATH=${MEGATRON_PATCH_PATH}/backends/megatron/Megatron-LM-250624:${CHATLEARN_ROOT_PATH}:${CHATLEARN_KERNEL_PATH}:$PYTHONPATH
+export PYTHONPATH=${MEGATRON_PATCH_PATH}/backends/megatron/Megatron-LM-250624:${CHATLEARN_ROOT_PATH}:$PYTHONPATH

 export RAY_CGRAPH_get_timeout=200
 export CUDA_DEVICE_MAX_CONNECTIONS=1
@@ -34,7 +33,7 @@ export VLLM_USE_RAY_COMPILED_DAG=1
 hf_ckpt_path=/mnt/data/ckpts/huggingface/Moonlight-16B-A3B-Instruct
 mcore_ckpt_path=/mnt/data/ckpts/mcore/Moonlight-16B-A3B-Instruct-to-mcore
 exp_name="test_moonlight_16b"
-export output_dir=${MEGATRON_PATCH_PATH}/output/${exp_name}
+export output_dir=${CURRENT_DIR}/chatlearn_outputs/${exp_name}
 mkdir -p $output_dir/
 export log_dir=${output_dir}/logs
 mkdir -p $log_dir
```
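Because CURRENT_DIR is now taken from $(pwd) rather than from the script's own location, MEGATRON_PATCH_PATH and the output directories only resolve correctly when the script is launched from examples/moonlight. A minimal launch sketch, mirroring the invocation shown in README_verl.md above, would be:

```bash
# CURRENT_DIR=$(pwd) means the relative path derivation only works from the script's directory.
cd ~/Pai-Megatron-Patch/examples/moonlight
bash run_mcore_moonlight_chatlearn.sh
```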
