
Commit 53ecdf2

Enhance Moonlight&DeepSeek-R1 ChatLearn/Verl Best Practice (#683)

Authored by jerryli1981 and Peng Li

* enhance verl-patch for moonlight and deepseek
* fix moonlight convergence with mcore0908
* fix verl patch dsr1
* update chatlearn

Co-authored-by: Peng Li <[email protected]>

1 parent ac71c8b · commit 53ecdf2
15 files changed
Lines changed: 1807 additions & 232 deletions

README.md

Lines changed: 3 additions & 2 deletions
```diff
@@ -7,9 +7,9 @@
 |Qwen3 |[ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen3/README.md)|[ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen3/README_chatlearn.md) | [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen3/README_verl.md) |
 |Qwen3-VL |Coming Soon| Coming Soon | Coming Soon |
 |Qwen2.5-VL |[ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/qwen2_5_vl/README.md)| [ReadMe](https://github.com/alibaba/ChatLearn/blob/main/docs/en/tutorial/tutorial_grpo_mcore_qwenvl.md) | Coming Soon |
-|Moonlight |[ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/moonlight/README.md)|[ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/moonlight/README_chatlearn.md)| [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/moonlight/README_verl.md) |
+|Moonlight |[ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/moonlight/README.md)|[ReadMe](https://github.com/alibaba/ChatLearn/blob/main/docs/zh/tutorial/tutorial_grpo_mcore_moonlight_and_deepseek.md)| [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/moonlight/README_verl.md) |
 |DeepSeek-V3 |[ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/deepseek_v3/README.md)| N/A | N/A |
-|DeepSeek-R1 | N/A |[ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/deepseek_v3/README_grpo.md)| [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/deepseek_v3/README_verl.md) |
+|DeepSeek-R1 | N/A |[ReadMe](https://github.com/alibaba/ChatLearn/blob/main/docs/zh/tutorial/tutorial_grpo_mcore_moonlight_and_deepseek.md)| [ReadMe](https://github.com/alibaba/Pai-Megatron-Patch/blob/main/examples/deepseek_v3/README_verl.md) |



@@ -19,6 +19,7 @@ English | [简体中文](./README_zh-CN.md)
 Pai-Megatron-Patch (https://github.com/alibaba/Pai-Megatron-Patch) is a deep learning training toolkit built for developers to train and predict LLMs & VLMs by using Megatron framework easily. With the continuous development of LLMs, the model structure and scale are rapidly evolving. Although these models can be conveniently manufactured using Transformers or DeepSpeed training framework, the training efficiency is comparably low. This phenomenon becomes even severer when the model scale exceeds 10 billion. The primary objective of Pai-Megatron-Patch is to effectively utilize the computational power of GPUs for LLM. This tool allows convenient training of commonly used LLM with all the accelerating techniques provided by Megatron-LM.

 What's New:
+- **Improve MLA models such as Moonlight/DeepSeek-V3 RL Training Stability and Efficiency with Context Parallel and Sequence Packing** [🔥🔥 2025.10.10]
 - **[Experimental]Support Qwen3-Next-80B-A3B Pre-Training using Megatron-Core** [🔥🔥 2025.09.22]
 - **Support Qwen3 & DeepSeek-R1 GRPO Reinforcement Training using Megatron-Core and Verl** [🔥🔥 2025.09.19]
 - **Support Moonlight GRPO Reinforcement Training using Megatron-Core and Verl** [🔥🔥 2025.09.11]
```
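The new 2025.10.10 entry refers to sharding long responses across GPUs (context parallel) and packing variable-length samples into dense micro-batches (sequence packing) during RL training. As a rough illustration only, overrides of roughly the following shape control these features in a verl Megatron launch; the key names below are assumptions, not taken from this commit, and should be checked against the patched verl configs under backends/rl/verl.

```bash
# Illustration only: these override names are assumptions; check the patched verl configs
# in backends/rl/verl for the actual knobs exposed by this repository.
EXTRA_OVERRIDES=(
    actor_rollout_ref.actor.megatron.context_parallel_size=2   # shard long sequences across 2 GPUs
    actor_rollout_ref.actor.use_dynamic_bsz=True                # pack variable-length samples per micro-batch
)
# Appended to an existing launch, e.g.:
# python ../qwen3/verl_entrypoint.py --config-path=../qwen3/verl_configs ... "${EXTRA_OVERRIDES[@]}"
```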

README_zh-CN.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -41,6 +41,7 @@ Pai-Megatron-Patch是各类开源大模型和Megatron训练加速引擎之间的
 - [Alibaba Cloud PAI wins both titles of the FewCLUE few-shot learning benchmark with large models](https://developer.aliyun.com/article/788081?spm=a2c6h.12873639.article-detail.17.11c5383cHpFZks&tlog=yuekan_8)

 What's New:
+- **Improve the RL training stability and efficiency of MLA models such as Moonlight/DeepSeek-V3 with Context Parallel and Sequence Packing** [🔥🔥 2025.10.10]
 - **[Experimental] Support Qwen3-Next-80B-A3B pre-training with Mcore** [🔥🔥 2025.09.22]
 - **Support GRPO reinforcement learning for Qwen3 and DeepSeek-R1 with Mcore + Verl** [🔥🔥 2025.09.19]
 - **Support GRPO reinforcement learning for Moonlight with Mcore + Verl** [🔥🔥 2025.09.11]
```

backends/rl/ChatLearn

Submodule ChatLearn updated 135 files

examples/deepseek_v3/run_mcore_deepseek_verl.sh

Lines changed: 38 additions & 27 deletions
```diff
@@ -1,44 +1,55 @@
 #!/bin/bash

 ray stop
-CURRENT_DIR="$( cd "$( dirname "$0" )" && pwd )"
-VERL_PATCH_PATH=$( dirname $( dirname ${CURRENT_DIR}))
-export PYTHONPATH=${VERL_PATCH_PATH}:${VERL_PATCH_PATH}/backends/megatron/Megatron-LM-250624:${VERL_PATCH_PATH}/backends/rl/verl:$PYTHONPATH
-export HYDRA_FULL_ERROR=1
-gsm8k_train_path=/mnt/data/datasets/gsm8k/train.parquet
-gsm8k_test_path=/mnt/data/datasets/gsm8k/test.parquet
-
-train_files="['$gsm8k_train_path']"
-test_files="['$gsm8k_test_path']"
-
-
-MODEL_PATH=/mnt/data/ckpts/huggingface/DeepSeek-V3-0324
-MCORE_MODEL_PATH=/mnt/data/ckpts/mcore/DeepSeek-R1-BF16-to-mcore
-
-
-# If you are using vllm<=0.6.3, you might need to set the following environment variable to avoid bugs:
-# export VLLM_ATTENTION_BACKEND=XFORMERS
-export CUDA_DEVICE_MAX_CONNECTIONS=1 # For megatron communication/computation overlapping
+rm -rf /tmp/ray/*

+export CUDA_DEVICE_MAX_CONNECTIONS=1
 export GPUS_PER_NODE=${MLP_WORKER_GPU:-${KUBERNETES_CONTAINER_RESOURCE_GPU:-8}}
 export RAY_num_server_call_thread=1
 export NNODES=${MLP_WORKER_NUM:-${WORLD_SIZE:-1}}
 export NODE_RANK=${MLP_WORKER_RACK_RANK_INDEX:-${MLP_ROLE_INDEX:-${RANK:-0}}}
 export MASTER_ADDR=${MLP_WORKER_0_HOST:-${MASTER_ADDR:-127.0.0.1}}
 export MASTER_PORT=${MLP_WORKER_0_PORT:-${MASTER_PORT:-1234}}

+CURRENT_DIR=$(pwd)
+MEGATRON_PATCH_PATH=$( dirname $( dirname ${CURRENT_DIR}))
+VERL_ROOT_PATH=${MEGATRON_PATCH_PATH}/backends/rl/verl
+export PYTHONPATH=${MEGATRON_PATCH_PATH}:${MEGATRON_PATCH_PATH}/backends/megatron/Megatron-LM-250624:${VERL_ROOT_PATH}:$PYTHONPATH
+
+export RAY_CGRAPH_get_timeout=200
+export CUDA_DEVICE_MAX_CONNECTIONS=1
+export RAY_num_server_call_thread=1
+export RAY_DEDUP_LOGS=0
+export VLLM_USE_RAY_SPMD_WORKER=1
+export VLLM_USE_RAY_COMPILED_DAG=1
+
+train_path=/mnt/data/datasets/MATH-lighteval/train.parquet
+test_path=/mnt/data/datasets/MATH-lighteval/test.parquet
+
+train_files="['$train_path']"
+test_files="['$test_path']"
+
+hf_ckpt_path=/mnt/data/ckpts/huggingface/DeepSeek-V3-0324-BF16
+mcore_ckpt_path=/mnt/data/ckpts/mcore/DeepSeek-V3-0324-BF16-to-mcore
+proj_name="jerry_debug"
+exp_name="test_deepseek_verl"
+export output_dir=${CURRENT_DIR}/verl_outputs/${exp_name}
+export WANDB_DIR=${output_dir}
+mkdir -p $output_dir/
+export log_dir=${output_dir}/logs
+mkdir -p $log_dir
+log_file=$log_dir/${exp_name}_rank${NODE_RANK}.log
+

-project_name='DAPO'
-exp_name='Test_Verl_Mcore_DeepSeek671b_Loss'
 adv_estimator=grpo
 use_kl_in_reward=True
 kl_coef=0.0
 use_kl_loss=True
 kl_loss_coef=0.0
 clip_ratio_low=0.2
 clip_ratio_high=0.28
-max_prompt_length=$((1024 * 2))
-max_response_length=$((1024 * 4))
+max_prompt_length=1536
+max_response_length=2048
 enable_overlong_buffer=True
 overlong_buffer_len=$((1024 * 4))
 overlong_penalty_factor=0.1
@@ -135,17 +146,17 @@ python ../qwen3/verl_entrypoint.py --config-path=../qwen3/verl_configs \
 +reward_model.reward_kwargs.overlong_buffer_cfg.log=False \
 +reward_model.reward_kwargs.max_resp_len=${max_response_length} \
 trainer.logger=['console'] \
-trainer.project_name="${project_name}" \
-trainer.experiment_name="${exp_name}" \
+trainer.project_name=${proj_name} \
+trainer.experiment_name=${exp_name} \
 trainer.n_gpus_per_node=${GPUS_PER_NODE} \
 trainer.nnodes=${NNODES} \
 trainer.val_before_train=False \
-trainer.test_freq=50000000 \
+trainer.test_freq=5 \
 trainer.save_freq=50000000 \
-trainer.total_epochs=10 \
+trainer.total_epochs=200 \
 trainer.total_training_steps=1000 \
 trainer.resume_mode=auto \
-trainer.log_val_generations=10 2>&1 | tee ${NNODES}nodes_verl_debug.log
+2>&1 | tee ${log_file} ; exit ${PIPESTATUS[0]}

 else
 ray start --block --address=${MASTER_ADDR}:6379
```
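As the unchanged tail of the script shows, the launcher branches on the node rank: rank 0 runs the `verl_entrypoint.py` driver, while every other node only joins the Ray cluster via `ray start --block`. The snippet below is a minimal sketch of that head/worker pattern, not the verbatim script; the head-node bootstrap (`ray start --head`) and the trimmed-down override list are assumptions to be checked against the full file.

```bash
# Sketch of the multi-node launch pattern in run_mcore_deepseek_verl.sh (not the verbatim script).
# NODE_RANK, MASTER_ADDR, NNODES, GPUS_PER_NODE and log_file come from the exports defined earlier.
if [ "${NODE_RANK}" -eq 0 ]; then
    # Head node: start the Ray head, then run the Hydra entrypoint with the overrides shown above.
    ray start --head --port=6379
    python ../qwen3/verl_entrypoint.py --config-path=../qwen3/verl_configs \
        trainer.nnodes=${NNODES} trainer.n_gpus_per_node=${GPUS_PER_NODE} \
        2>&1 | tee ${log_file}
    exit ${PIPESTATUS[0]}
else
    # Worker nodes: join the head's Ray cluster and block until the job finishes.
    ray start --block --address=${MASTER_ADDR}:6379
fi
```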

examples/moonlight/README_verl.md

Lines changed: 40 additions & 24 deletions
````diff
@@ -2,39 +2,60 @@

 This document provides a quick-start guide for GRPO training of the Moonlight model with the Verl, Mcore, and vLLM frameworks.

-## Environment Setup
-1. Prepare the Docker image
-We recommend running this example on PAI [DSW](https://help.aliyun.com/zh/pai/user-guide/create-and-manage-dsw-instances/)/[DLC](https://help.aliyun.com/zh/pai/user-guide/create-a-training-task?spm=a2c4g.11186623.help-menu-30347.d_3_3_5_5.2dfb1925l3QjwG); fill in the following image address to launch the instance:
+## Development Environment Setup
+We recommend building the image on top of nvcr.io/nvidia/pytorch:24.12-py3 in a DSW environment on the PAI platform.
 ```bash
-dsw-registry.cn-shanghai.cr.aliyuncs.com/pai-training-algorithm/chatlearn:torch2.6.0-vllm0.8.5-ubuntu24.04-cuda12.6-py312
-```
+# Install vLLM, Transformers, and the other ChatLearn dependencies
+pip install modelscope==1.30.0 tensordict==0.10.0 torchdata==0.11.0 codetiming==1.4.0 vllm==0.8.5 transformers==4.56.2 blobfile==3.0.0 numpy==1.26.4 accelerate==1.10.0 wandb==0.19.11 datasets==3.6.0 grpcio==1.71.0 omegaconf==2.3.0 hydra-core==1.3.2 msgspec==0.19.0 mathruler==0.1.0 pylatexenc==2.10 langgraph==0.6.6 ray[default]==2.46.0 -i https://mirrors.aliyun.com/pypi/simple/
+
+# Installing vLLM reinstalls PyTorch, so flash-attention and apex need to be reinstalled afterwards
+pip uninstall -y flash_attn && pip install https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/csrc/flash-attention/torch2.6.0-cu12x/flash_attn-2.4.2-cp312-cp312-linux_x86_64.whl --no-cache-dir -i https://mirrors.aliyun.com/pypi/simple/

-You can use the VPC address to speed up image pulling; adjust the address to your region. For example, a DSW instance in Shanghai can use the image `dsw-registry-vpc.cn-shanghai.cr.aliyuncs.com/pai-training-algorithm/chatlearn:torch2.6.0-vllm0.8.5-ubuntu24.04-cuda12.6-py312
-`.
+pip uninstall -y apex && pip install https://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/csrc/apex/torch2.6.0-cuda12x/apex-0.1-cp312-cp312-linux_x86_64.whl --no-cache-dir -i https://mirrors.aliyun.com/pypi/simple/

-2. Prepare the code
+# Upgrade Transformer Engine
+pip uninstall -y transformer-engine transformer-engine-cu12 transformer-engine-torch
+git clone --recursive https://github.com/NVIDIA/TransformerEngine.git
+cd TransformerEngine
+git submodule update --init --recursive
+git checkout release_v2.7
+export CUDNN_PATH=/usr/local/lib/python3.12/dist-packages/nvidia/cudnn/
+cp /usr/local/lib/python3.12/dist-packages/nvidia/cudnn/include/* /usr/local/cuda/include/
+python setup.py bdist_wheel -vvv
+cd dist
+export NVTE_FRAMEWORK=pytorch
+pip install transformer_engine-2.7.0-cp312-cp312-linux_x86_64.whl --no-cache-dir -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.cloud.aliyuncs.com

+# Upgrade cuDNN to fix issues that show up when training MLA models
+pip install -U nvidia-cudnn-cu12==9.8.0.87 -i http://mirrors.cloud.aliyuncs.com/pypi/simple --trusted-host mirrors.cloud.aliyuncs.com
+
+```
+
+## Code Preparation
 ```bash
 git clone --recurse-submodules https://github.com/alibaba/Pai-Megatron-Patch.git
 ```

 ## Data Preparation
-The [GSM8k](https://modelscope.cn/datasets/AI-ModelScope/gsm8k) dataset is used as an example.
+The [MATH-lighteval](https://www.modelscope.cn/datasets/AI-ModelScope/MATH-lighteval) dataset is used as an example.
 ```bash
 # Download the dataset
 mkdir -p /mnt/data/datasets
-Prepare the GSM8K dataset by following the guide: https://verl.readthedocs.io/en/latest/examples/gsm8k_example.html
+modelscope download --dataset AI-ModelScope/MATH-lighteval --local_dir dataset/MATH-lighteval
+cd ~/Pai-Megatron-Patch/toolkits/verl_data_preprocessing
+python math_lighteval.py --input_dir dataset/MATH-lighteval --local_dir dataset/MATH-lighteval
+
 # Download the model weights
 modelscope download --model moonshotai/Moonlight-16B-A3B-Instruct --local_dir /mnt/data/ckpts/huggingface/Moonlight-16B-A3B-Instruct
 ```

 ## Code & Checkpoint Modifications
 ```bash
-vim ~/Pai-Megatron-Patch/backends/megatron/Megatron-LM-250624/megatron/training/tokenizer/tokenizer.py
-Change line 143 to
-self._tokenizer = transformers.AutoTokenizer.from_pretrained(
-pretrained_model_name_or_path=pretrained_model_name_or_path, trust_remote_code=True, **kwargs
-)
+vim ~/Pai-Megatron-Patch/backends/megatron/Megatron-LM-250908/megatron/core/models/gpt/gpt_layer_specs.py
+Change line 145 as follows
+linear_q_down_proj=backend.linear() -> linear_q_down_proj=backend.column_parallel_linear()
+linear_kv_down_proj=backend.linear() -> linear_kv_down_proj=backend.column_parallel_linear()
+
 vim /mnt/data/ckpts/huggingface/Moonlight-16B-A3B-Instruct/config.json
 Change "AutoModel" and "AutoModelForCausalLM" to:
 "auto_map": {
@@ -45,6 +66,8 @@ vim /mnt/data/ckpts/huggingface/Moonlight-16B-A3B-Instruct/config.json
 cp ~/Pai-Megatron-Patch/examples/moonlight/modeling_deepseek_pai.py /mnt/data/ckpts/huggingface/Moonlight-16B-A3B-Instruct
 ```

+vim
+
 ## Model Conversion

 Model format conversion can follow the conversion scripts provided by the [Pai-Megatron-Patch](https://github.com/alibaba/Pai-Megatron-Patch) project.
@@ -64,19 +87,12 @@ true \
 bf16
 ```

+## Model Conversion
+
 ## Training
 Run the following command to start training:

 ```bash
 cd ~/Pai-Megatron-Patch/examples/moonlight
 bash run_mcore_moonlight_verl.sh
-```
-
-## Monitoring with Wandb
-To log the training run with Wandb, use the following configuration:
-
-```bash
-export enable_wandb=True
-export wandb_project="Your-Wandb-Project-Name"
-export WANDB_API_KEY="Your-Wandb-api-key"
 ```
````
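Since the new environment setup builds Transformer Engine from source and upgrades cuDNN through pip, a quick sanity check after following the steps above can save a failed training run later. The snippet below is a minimal, optional check that is not part of the original tutorial; the expected versions are taken from the commands in the diff.

```bash
# Optional sanity check for the environment built above (not part of the original tutorial).
# Transformer Engine should report 2.7.0; the cuDNN version reflects whatever PyTorch picks up
# after the nvidia-cudnn-cu12==9.8.0.87 upgrade.
python -c "import torch, transformer_engine; print(torch.__version__, transformer_engine.__version__)"
python -c "import torch; print('cudnn', torch.backends.cudnn.version())"
python -c "import flash_attn, apex; print('flash_attn', flash_attn.__version__)"
```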
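The gpt_layer_specs.py change described in the Code & Checkpoint Modifications hunk is a mechanical two-line substitution, so it can also be applied non-interactively instead of editing the file in vim. The sed command below is offered only as a convenience sketch equivalent to the manual edit named in the diff; back up the file first.

```bash
# Non-interactive equivalent of the manual edit above: route the MLA q/kv down-projections
# through column-parallel linear layers. sed keeps a .bak copy of the original file.
SPEC=~/Pai-Megatron-Patch/backends/megatron/Megatron-LM-250908/megatron/core/models/gpt/gpt_layer_specs.py
sed -i.bak \
  -e 's/linear_q_down_proj=backend.linear()/linear_q_down_proj=backend.column_parallel_linear()/' \
  -e 's/linear_kv_down_proj=backend.linear()/linear_kv_down_proj=backend.column_parallel_linear()/' \
  "$SPEC"
```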
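Note that the data-preparation commands above write the preprocessed MATH-lighteval parquet files to a relative dataset/MATH-lighteval directory, while run_mcore_deepseek_verl.sh (shown earlier in this commit) reads /mnt/data/datasets/MATH-lighteval/{train,test}.parquet; the Moonlight launch script presumably follows the same convention. The snippet below is an assumed-path check and copy, not part of the original tutorial, to confirm the files end up where the training scripts expect them.

```bash
# Assumption: math_lighteval.py writes train.parquet/test.parquet under the relative dataset/ directory,
# and the training scripts read them from /mnt/data/datasets/MATH-lighteval.
cd ~/Pai-Megatron-Patch/toolkits/verl_data_preprocessing
mkdir -p /mnt/data/datasets/MATH-lighteval
cp dataset/MATH-lighteval/*.parquet /mnt/data/datasets/MATH-lighteval/
python -c "import pandas as pd; df = pd.read_parquet('/mnt/data/datasets/MATH-lighteval/train.parquet'); print(len(df), list(df.columns))"
```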

examples/moonlight/run_mcore_moonlight_chatlearn.sh

Lines changed: 3 additions & 4 deletions
```diff
@@ -18,11 +18,10 @@ done
 export CUSTOM_PORTS=$ports
 export num_device=$(($WORLD_SIZE * $GPUS_PER_NODE))

-CURRENT_DIR="$( cd "$( dirname "$0" )" && pwd )"
+CURRENT_DIR=$(pwd)
 MEGATRON_PATCH_PATH=$( dirname $( dirname ${CURRENT_DIR}))
 CHATLEARN_ROOT_PATH=${MEGATRON_PATCH_PATH}/backends/rl/ChatLearn
-CHATLEARN_KERNEL_PATH=${MEGATRON_PATCH_PATH}/backends/rl/ChatLearn/chatlearn
-export PYTHONPATH=${MEGATRON_PATCH_PATH}/backends/megatron/Megatron-LM-250624:${CHATLEARN_ROOT_PATH}:${CHATLEARN_KERNEL_PATH}:$PYTHONPATH
+export PYTHONPATH=${MEGATRON_PATCH_PATH}/backends/megatron/Megatron-LM-250624:${CHATLEARN_ROOT_PATH}:$PYTHONPATH

 export RAY_CGRAPH_get_timeout=200
 export CUDA_DEVICE_MAX_CONNECTIONS=1
@@ -34,7 +33,7 @@ export VLLM_USE_RAY_COMPILED_DAG=1
 hf_ckpt_path=/mnt/data/ckpts/huggingface/Moonlight-16B-A3B-Instruct
 mcore_ckpt_path=/mnt/data/ckpts/mcore/Moonlight-16B-A3B-Instruct-to-mcore
 exp_name="test_moonlight_16b"
-export output_dir=${MEGATRON_PATCH_PATH}/output/${exp_name}
+export output_dir=${CURRENT_DIR}/chatlearn_outputs/${exp_name}
 mkdir -p $output_dir/
 export log_dir=${output_dir}/logs
 mkdir -p $log_dir
```
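Because CURRENT_DIR is now taken from $(pwd) rather than from the script's own location, MEGATRON_PATCH_PATH and the output directories only resolve correctly when the script is launched from examples/moonlight. A minimal launch sketch, mirroring the invocation shown in README_verl.md above, would be:

```bash
# CURRENT_DIR=$(pwd) means the relative path derivation only works from the script's directory.
cd ~/Pai-Megatron-Patch/examples/moonlight
bash run_mcore_moonlight_chatlearn.sh
```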
