Commit 9e9c5f2

fix bug: stop raising ValueError("UnquantizedFusedMoEMethod uses the new modular kernel initialization logic for all but the CPU backend. CPU backend is monolithic. So this function should not be called.") from maybe_make_prepare_finalize
1 parent 93956d6 commit 9e9c5f2

3 files changed: 8 additions & 9 deletions

README.md

Lines changed: 1 addition & 2 deletions
````diff
@@ -592,8 +592,7 @@ LVLLM_GPU_PREFILL_MIN_BATCH_SIZE=4096
 ### Disable GPU Prefill
 ```bash
 # Disable GPU prefill
-LVLLM_GPU_PREFILL_MIN_BATCH_SIZE=0
-LVLLM_GPU_PREFILL_MIN_BATCH_SIZE=""
+LVLLM_GPU_PREFILL_MIN_BATCH_SIZE=0
 # 1024 to 8192, too large is meaningless (occupies too much VRAM and long startup time)
 --max-num-batched-tokens 4096
 ```
````
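
To make the documented behavior concrete, here is a minimal, hypothetical sketch of how an integer-valued env var like this is commonly parsed; the function names and the 4096 default are illustrative assumptions, not taken from the LVLLM source. It also shows why `LVLLM_GPU_PREFILL_MIN_BATCH_SIZE=""` is not a reliable way to disable GPU prefill: `int("")` raises ValueError, whereas `=0` parses cleanly.

```python
import os

# Hypothetical parsing of LVLLM_GPU_PREFILL_MIN_BATCH_SIZE (assumption: the real
# LVLLM parsing code is not part of this commit). With int() parsing, "0" disables
# GPU prefill cleanly, while an empty string would fail to parse -- consistent with
# the docs dropping the ="" form.
def gpu_prefill_min_batch_size(default: int = 4096) -> int:
    raw = os.environ.get("LVLLM_GPU_PREFILL_MIN_BATCH_SIZE")
    if raw is None:
        return default
    return int(raw)  # "0" -> 0 (disabled); "" -> ValueError


def gpu_prefill_enabled(num_batched_tokens: int) -> bool:
    # A threshold of 0 means GPU prefill is never used.
    threshold = gpu_prefill_min_batch_size()
    return threshold > 0 and num_batched_tokens >= threshold
```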

README_cn.md

Lines changed: 1 addition & 2 deletions
````diff
@@ -598,8 +598,7 @@ LVLLM_GPU_PREFILL_MIN_BATCH_SIZE=4096
 ### 关闭GPU预填充
 ```bash
 # 关闭GPU预填充
-LVLLM_GPU_PREFILL_MIN_BATCH_SIZE=0
-LVLLM_GPU_PREFILL_MIN_BATCH_SIZE=""
+LVLLM_GPU_PREFILL_MIN_BATCH_SIZE=0
 # 1024至8192,太大无意义(占用显存及启动时间过长)
 --max-num-batched-tokens 4096
 ```
````

vllm/model_executor/layers/fused_moe/unquantized_fused_moe_method.py

Lines changed: 6 additions & 5 deletions
```diff
@@ -64,11 +64,12 @@ def maybe_make_prepare_finalize(
         self,
         routing_tables: tuple[torch.Tensor, torch.Tensor, torch.Tensor] | None = None,
     ):
-        raise ValueError(
-            f"{self.__class__.__name__} uses the new modular kernel initialization "
-            "logic for all but the CPU backend. CPU backend is monolithic. "
-            "So this function should not be called."
-        )
+        pass
+        # raise ValueError(
+        #     f"{self.__class__.__name__} uses the new modular kernel initialization "
+        #     "logic for all but the CPU backend. CPU backend is monolithic. "
+        #     "So this function should not be called."
+        # )
 
     def select_gemm_impl(
         self,
```
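
For context, a minimal sketch of how the patched method reads after this commit, assuming only the class name and signature visible in the diff; the explicit `return None` and the comments are illustrative additions, not part of the patch.

```python
import torch


class UnquantizedFusedMoEMethod:
    # Sketch only: the real class lives in
    # vllm/model_executor/layers/fused_moe/unquantized_fused_moe_method.py and has
    # many more members; only the patched method is shown here.
    def maybe_make_prepare_finalize(
        self,
        routing_tables: tuple[torch.Tensor, torch.Tensor, torch.Tensor] | None = None,
    ) -> None:
        # The modular-kernel backends build their prepare/finalize objects through
        # the new initialization path; the CPU backend is monolithic and has nothing
        # to build here. After this commit the call is a no-op instead of raising
        # ValueError, so the caller simply receives None on the CPU path.
        return None
```

`return None` and the `pass` in the diff behave identically; spelling out the return value just documents that callers should treat the CPU path as having no prepare/finalize object to use.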
