Commit 9e9c5f2

fix bug: stop raising ValueError("UnquantizedFusedMoEMethod uses the new modular kernel initialization logic for all but the CPU backend. CPU backend is monolithic. So this function should not be called.") from maybe_make_prepare_finalize
1 parent 93956d6 commit 9e9c5f2

3 files changed: 8 additions & 9 deletions

README.md

Lines changed: 1 addition & 2 deletions
````diff
@@ -592,8 +592,7 @@ LVLLM_GPU_PREFILL_MIN_BATCH_SIZE=4096
 ### Disable GPU Prefill
 ```bash
 # Disable GPU prefill
-LVLLM_GPU_PREFILL_MIN_BATCH_SIZE=0
-LVLLM_GPU_PREFILL_MIN_BATCH_SIZE=""
+LVLLM_GPU_PREFILL_MIN_BATCH_SIZE=0
 # 1024 to 8192, too large is meaningless (occupies too much VRAM and long startup time)
 --max-num-batched-tokens 4096
 ```
````
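
To make the documented behavior concrete, here is a minimal, hypothetical sketch of how an integer-valued env var like this is commonly parsed; the function names and the 4096 default are illustrative assumptions, not taken from the LVLLM source. It also shows why `LVLLM_GPU_PREFILL_MIN_BATCH_SIZE=""` is not a reliable way to disable GPU prefill: `int("")` raises ValueError, whereas `=0` parses cleanly.

```python
import os

# Hypothetical parsing of LVLLM_GPU_PREFILL_MIN_BATCH_SIZE (assumption: the real
# LVLLM parsing code is not part of this commit). With int() parsing, "0" disables
# GPU prefill cleanly, while an empty string would fail to parse -- consistent with
# the docs dropping the ="" form.
def gpu_prefill_min_batch_size(default: int = 4096) -> int:
    raw = os.environ.get("LVLLM_GPU_PREFILL_MIN_BATCH_SIZE")
    if raw is None:
        return default
    return int(raw)  # "0" -> 0 (disabled); "" -> ValueError


def gpu_prefill_enabled(num_batched_tokens: int) -> bool:
    # A threshold of 0 means GPU prefill is never used.
    threshold = gpu_prefill_min_batch_size()
    return threshold > 0 and num_batched_tokens >= threshold
```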

README_cn.md

Lines changed: 1 addition & 2 deletions
````diff
@@ -598,8 +598,7 @@ LVLLM_GPU_PREFILL_MIN_BATCH_SIZE=4096
 ### 关闭GPU预填充
 ```bash
 # 关闭GPU预填充
-LVLLM_GPU_PREFILL_MIN_BATCH_SIZE=0
-LVLLM_GPU_PREFILL_MIN_BATCH_SIZE=""
+LVLLM_GPU_PREFILL_MIN_BATCH_SIZE=0
 # 1024至8192,太大无意义(占用显存及启动时间过长)
 --max-num-batched-tokens 4096
 ```
````

vllm/model_executor/layers/fused_moe/unquantized_fused_moe_method.py

Lines changed: 6 additions & 5 deletions
```diff
@@ -64,11 +64,12 @@ def maybe_make_prepare_finalize(
         self,
         routing_tables: tuple[torch.Tensor, torch.Tensor, torch.Tensor] | None = None,
     ):
-        raise ValueError(
-            f"{self.__class__.__name__} uses the new modular kernel initialization "
-            "logic for all but the CPU backend. CPU backend is monolithic. "
-            "So this function should not be called."
-        )
+        pass
+        # raise ValueError(
+        #     f"{self.__class__.__name__} uses the new modular kernel initialization "
+        #     "logic for all but the CPU backend. CPU backend is monolithic. "
+        #     "So this function should not be called."
+        # )
 
     def select_gemm_impl(
         self,
```
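
For context, a minimal sketch of how the patched method reads after this commit, assuming only the class name and signature visible in the diff; the explicit `return None` and the comments are illustrative additions, not part of the patch.

```python
import torch


class UnquantizedFusedMoEMethod:
    # Sketch only: the real class lives in
    # vllm/model_executor/layers/fused_moe/unquantized_fused_moe_method.py and has
    # many more members; only the patched method is shown here.
    def maybe_make_prepare_finalize(
        self,
        routing_tables: tuple[torch.Tensor, torch.Tensor, torch.Tensor] | None = None,
    ) -> None:
        # The modular-kernel backends build their prepare/finalize objects through
        # the new initialization path; the CPU backend is monolithic and has nothing
        # to build here. After this commit the call is a no-op instead of raising
        # ValueError, so the caller simply receives None on the CPU path.
        return None
```

`return None` and the `pass` in the diff behave identically; spelling out the return value just documents that callers should treat the CPU path as having no prepare/finalize object to use.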
