
support wInt4aFp8 for moe #518

Open
Wangzheee wants to merge 3 commits into vllm-project:main from Wangzheee:wInt4aFp8

Conversation


@Wangzheee Wangzheee commented Nov 12, 2025

SUMMARY:
Add Int4PackedQuantizationCompressor to support wInt4aFp8.

Companion llm-compressor PR: vllm-project/llm-compressor#2027
Companion sglang PR: sgl-project/sglang#11701

Collaborator

@brian-dellabetta brian-dellabetta left a comment


Hi @Wangzheee, thanks for the contribution. Can you explain why this is needed, and why the existing PackedQuantizationCompressor can't be used instead? Perhaps the new class could subclass it to reduce redundant logic.

@BaseCompressor.register(name=CompressionFormat.int4_quantized.value)
class Int4PackedQuantizationCompressor(BaseQuantizationCompressor):
"""
Compresses a quantized model by packing every eight 4-bit weights into an int8

Suggested change:
- Compresses a quantized model by packing every eight 4-bit weights into an int8
+ Compresses a quantized model by packing every two 4-bit weights into an int8
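(For context on the suggested wording: packing two signed 4-bit weights into one int8 means each byte holds one value in its low nibble and one in its high nibble. A minimal pure-Python sketch of that scheme, using hypothetical helper names rather than the compressor's actual code:)

```python
def pack_int4(weights):
    """Pack pairs of signed 4-bit values (each in [-8, 7]) into bytes.

    Hypothetical illustration of "every two 4-bit weights into an int8";
    not the actual Int4PackedQuantizationCompressor implementation.
    """
    assert len(weights) % 2 == 0, "need an even number of weights"
    packed = []
    for lo, hi in zip(weights[0::2], weights[1::2]):
        # Each value is masked to its unsigned 4-bit representation;
        # the second value of the pair occupies the high nibble.
        packed.append((lo & 0x0F) | ((hi & 0x0F) << 4))
    return bytes(packed)


def unpack_int4(packed):
    """Inverse of pack_int4: recover the signed 4-bit values."""
    out = []
    for b in packed:
        for nib in (b & 0x0F, (b >> 4) & 0x0F):
            # Sign-extend: nibbles >= 8 encode negative values.
            out.append(nib - 16 if nib >= 8 else nib)
    return out
```

With this layout, n int4 weights occupy n/2 bytes, which is what makes the "two per int8" docstring correction matter for anyone computing packed tensor shapes.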

Collaborator

dsikka commented Jan 19, 2026

@Mergifyio refresh


mergify Bot commented Jan 19, 2026

refresh

✅ Pull request refreshed


mergify Bot commented Jan 19, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @Wangzheee.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify Bot added the needs-rebase label Jan 19, 2026


3 participants