support wInt4aFp8 for moe by Wangzheee · Pull Request #518 · vllm-project/compressed-tensors

Wangzheee · 2025-11-12T12:53:12Z

SUMMARY:
Add Int4PackedQuantizationCompressor to support wInt4aFp8 (int4 weights, FP8 activations) for MoE models.

Companion llm-compressor PR: vllm-project/llm-compressor#2027
Companion sglang PR: sgl-project/sglang#11701

brian-dellabetta

Hi @Wangzheee, thanks for the contribution. Can you explain why this is needed, and why the PackedQuantizationCompressor can't be used instead? Perhaps the new class could subclass it to reduce redundant logic.

brian-dellabetta · 2025-11-14T20:33:47Z

+@BaseCompressor.register(name=CompressionFormat.int4_quantized.value)
+class Int4PackedQuantizationCompressor(BaseQuantizationCompressor):
+    """
+    Compresses a quantized model by packing every eight 4-bit weights into an int8


Suggested change

-    Compresses a quantized model by packing every eight 4-bit weights into an int8
+    Compresses a quantized model by packing every two 4-bit weights into an int8
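
The arithmetic behind the suggested wording: an int8 holds 8 bits, so it has room for two 4-bit values, while eight 4-bit values would need 32 bits. The sketch below illustrates this in plain Python. It is not the code from this PR; the helper names `pack_int4_pairs` and `unpack_int4_pairs` and the low-nibble-first ordering are assumptions made for illustration only.

```python
def pack_int4_pairs(values):
    """Pack signed 4-bit ints (range -8..7) two per byte.

    Hypothetical helper: the low-nibble-first layout is an assumption,
    not necessarily the layout used by the compressor in this PR.
    """
    if len(values) % 2 != 0:
        raise ValueError("need an even number of 4-bit values")
    packed = []
    for lo, hi in zip(values[0::2], values[1::2]):
        for v in (lo, hi):
            if not -8 <= v <= 7:
                raise ValueError(f"{v} does not fit in a signed 4-bit integer")
        # Mask each value to 4 bits (two's complement), then put `hi`
        # in the upper nibble and `lo` in the lower nibble.
        packed.append((lo & 0xF) | ((hi & 0xF) << 4))
    return bytes(packed)


def unpack_int4_pairs(packed):
    """Inverse of pack_int4_pairs: recover two signed 4-bit ints per byte."""
    values = []
    for byte in packed:
        for nibble in (byte & 0xF, byte >> 4):
            # Sign-extend: nibbles 8..15 encode -8..-1.
            values.append(nibble - 16 if nibble >= 8 else nibble)
    return values


weights = [-8, 7, 0, -1, 3, 4, -5, 2]
packed = pack_int4_pairs(weights)
assert len(packed) == len(weights) // 2   # two weights per byte
assert unpack_int4_pairs(packed) == weights
```

Eight weights occupy four bytes here, which matches the "two 4-bit weights into an int8" wording.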

dsikka · 2026-01-19T19:49:01Z

@Mergifyio refresh

mergify · 2026-01-19T19:49:15Z

refresh

✅ Pull request refreshed

mergify · 2026-01-19T19:49:43Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @Wangzheee.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Wangzheee added 3 commits November 12, 2025 20:48

support wInt4aFp8 for moe

649bbe4

fix

0ca8c35

fix

137705f

Wangzheee force-pushed the wInt4aFp8 branch from 4cff614 to 137705f Compare November 14, 2025 06:10

brian-dellabetta reviewed Nov 14, 2025


mergify Bot added the needs-rebase label Jan 19, 2026

Wangzheee wants to merge 3 commits into vllm-project:main from Wangzheee:wInt4aFp8

3 participants
