Fix Qwen Omni quantization model issue for long form audio generation #1698

Open

lvliang-intel wants to merge 15 commits into main from lvl/fix_omni_long_audio

Conversation

@lvliang-intel
Contributor

Description

Problem

The talker module was being quantized even though it should have been kept in float16. This caused
severe audio quality degradation for long-form audio generation.

Fix

Exclude the talker part from quantization to maintain float16 precision.
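A minimal sketch of the idea, using hypothetical names (`filter_quantizable_blocks` and the prefix list are illustrative; the actual AutoRound helpers differ):

```python
def filter_quantizable_blocks(block_names, excluded_prefixes=("talker.",)):
    """Drop any block whose name falls under an excluded prefix so it
    stays in float16/BF16 instead of being quantized.

    Hypothetical helper sketching the fix; not the real AutoRound API.
    """
    return [
        name for name in block_names
        if not any(name == p.rstrip(".") or name.startswith(p)
                   for p in excluded_prefixes)
    ]

blocks = ["thinker.model.layers", "talker.model.layers", "talker.codec"]
print(filter_quantizable_blocks(blocks))  # ['thinker.model.layers']
```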

Type of Change

  • Bug fix
  • New feature
  • Documentation update
  • Performance improvement
  • Code refactoring
  • Other (please specify):

Related Issues

https://huggingface.co/Intel/Qwen3-Omni-30B-A3B-Instruct-int4-AutoRound/discussions/1

Checklist Before Submitting

  • My code has been tested locally.
  • Documentation has been updated as needed.
  • New or updated tests are included where applicable.

Copilot AI review requested due to automatic review settings April 17, 2026 02:00
@lvliang-intel lvliang-intel force-pushed the lvl/fix_omni_long_audio branch from 9414d33 to d964c82 Compare April 17, 2026 02:04

Copilot AI left a comment

Pull request overview

This PR fixes long-form audio quality degradation for Qwen Omni models by excluding the talker module from quantization (keeping it in float16/BF16), while ensuring the model can still be exported correctly (including save-time handling for fused MoE expert tensors).

Changes:

  • Exclude talker blocks from default quantization block discovery for Qwen2.5-Omni and Qwen3-Omni-MoE.
  • Add MoE skip-prefix support so talker.* MoE modules remain fused during quantization, and expand fused expert tensors at save time.
  • Adjust missing-tensor copying/WOQ behavior to preserve talker.* tensors by exact key and prevent unintended quantization.
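
The skip-prefix behavior can be sketched as follows (the `MOE_SKIP_PREFIXES` value and helper name here are illustrative; the real logic lives in `replace_modules.py` and `moe_experts_interface.py`):

```python
# Hypothetical constant mirroring the MOE_SKIP_PREFIXES idea: MoE modules
# whose names fall under these prefixes keep their fused expert layout
# during quantization instead of being unfused.
MOE_SKIP_PREFIXES = ("talker.",)

def should_skip_unfuse(module_name, skip_prefixes=MOE_SKIP_PREFIXES):
    """Return True if a module should stay fused (i.e. not be unfused)."""
    return any(module_name.startswith(p) for p in skip_prefixes)

print(should_skip_unfuse("talker.model.layers.0.mlp"))   # True
print(should_skip_unfuse("thinker.model.layers.0.mlp"))  # False
```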

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

Summary per file:

  • auto_round/special_model_handler.py: Removes talker from the default block lists for Qwen Omni models and documents the rationale.
  • auto_round/modeling/fused_moe/replace_modules.py: Introduces MOE_SKIP_PREFIXES and wires it into MoE preparation.
  • auto_round/modeling/fused_moe/moe_experts_interface.py: Adds skip_prefixes to skip unfusing modules under excluded prefixes.
  • auto_round/compressors/shard_writer.py: Expands fused 3D expert params into per-expert 2D tensors at save time for skipped prefixes.
  • auto_round/utils/missing_tensors.py: Ensures talker.* tensors are treated as truly missing when absent and scopes WOQ quantization to block_name_to_quantize.
  • auto_round/utils/common.py: Adds a collapse_ignore_layers() helper to reduce per-layer ignore config churn.
  • auto_round/compressors/base.py: Filters predefined ignore layers to quantized blocks and collapses numbered ignore layers into regex patterns.
  • auto_round/modeling/fused_moe/qwen3_omni.py: Drops the talker replacement path; keeps only thinker replacement and documents save-time conversion.
  • test/test_cpu/models/test_omni_model.py: Updates assertions to reflect thinker-only default quantization and no talker replacement.
  • test/test_cpu/utils/test_shard_writer.py: Adds tests for save-time expansion of fused experts under skipped prefixes.
  • test/test_cpu/utils/test_missing_tensors.py: Adds tests ensuring missing talker tensors are copied exactly and never WOQ-quantized.
  • .claude/skills/readme.md: Adds a contributor skill documentation index.
  • .claude/skills/review-pr/SKILL.md: Adds a structured PR review checklist (project-specific).
  • .claude/skills/add-vlm-model/SKILL.md: Adds workflow documentation for integrating new VLMs.
  • .claude/skills/add-quantization-datatype/SKILL.md: Adds workflow documentation for adding quantization datatypes.
  • .claude/skills/add-inference-backend/SKILL.md: Adds workflow documentation for adding inference backends.
  • .claude/skills/add-export-format/SKILL.md: Adds workflow documentation for adding export formats.
  • .claude/skills/adapt-new-llm/SKILL.md: Adds workflow documentation for adapting new LLM architectures.
  • .claude/skills/adapt-new-diffusion-model/SKILL.md: Adds workflow documentation for adapting new diffusion architectures.
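
The collapse_ignore_layers() behavior described above (collapsing numbered ignore layers into regex patterns) can be sketched like this; the implementation below is a hypothetical illustration, not the actual helper from auto_round/utils/common.py:

```python
import re
from collections import defaultdict

def collapse_ignore_layers(layer_names):
    """Collapse ignore entries that differ only by a numeric layer index
    into one regex, e.g. talker.layers.0.mlp, talker.layers.1.mlp ->
    talker\\.layers\\.\\d+\\.mlp. Hypothetical sketch of the helper's idea.
    """
    groups = defaultdict(set)
    for name in layer_names:
        # Normalize the numeric index so sibling layers land in one group.
        groups[re.sub(r"\.\d+\.", ".<N>.", name)].add(name)
    collapsed = []
    for key, members in groups.items():
        if len(members) > 1:
            # Escape literal dots, then restore the index as a \d+ wildcard.
            collapsed.append(re.escape(key).replace(re.escape("<N>"), r"\d+"))
        else:
            collapsed.extend(members)
    return sorted(collapsed)

names = ["talker.layers.0.mlp", "talker.layers.1.mlp", "thinker.embed"]
print(collapse_ignore_layers(names))
```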

- Optionally: visual encoder blocks, audio encoder layers

talker is excluded by default because quantizing it has been observed to
degrade audio quality in long-form generation .

Copilot AI Apr 17, 2026


The docstring has an extra space before the period in "long-form generation ." which reads as a typo. Please remove the stray space for consistent documentation formatting.

Suggested change
degrade audio quality in long-form generation .
degrade audio quality in long-form generation.

Comment on lines +146 to +149
result.append((f"{prefix}.{i}.{split_name}.weight", chunk[i].clone()))
else:
for i in range(num_experts):
result.append((f"{prefix}.{i}.{attr_name}.weight", tensor[i].clone()))

Copilot AI Apr 17, 2026


_expand_fused_experts() clones each expert slice before passing it into _add_tensor(), but _add_tensor() immediately detaches and copies to CPU. The clone creates an extra full-size device-side copy and can significantly increase peak RAM/VRAM for large MoE weights. Prefer to avoid clone here (e.g., rely on detach().cpu() / contiguous() inside _add_tensor or just ensure the slice is contiguous) so save-time expansion doesn’t double memory usage.

Suggested change
result.append((f"{prefix}.{i}.{split_name}.weight", chunk[i].clone()))
else:
for i in range(num_experts):
result.append((f"{prefix}.{i}.{attr_name}.weight", tensor[i].clone()))
result.append((f"{prefix}.{i}.{split_name}.weight", chunk[i]))
else:
for i in range(num_experts):
result.append((f"{prefix}.{i}.{attr_name}.weight", tensor[i]))
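
The memory-friendlier expansion suggested above can be sketched as follows. This is an illustrative stand-in using NumPy (the real code operates on torch tensors in shard_writer.py, and the function name and signature here are hypothetical): slicing a contiguous fused array yields a view, so no full-size extra copy is made before the save path takes over.

```python
import numpy as np

def expand_fused_experts(prefix, attr_name, fused):
    """Expand a fused [num_experts, out_dim, in_dim] array into per-expert
    2D weights at save time. Slices are returned as contiguous views
    rather than explicit copies (the clone() the review flags), so
    expansion does not double peak memory for large MoE weights.
    """
    num_experts = fused.shape[0]
    return [
        (f"{prefix}.{i}.{attr_name}.weight", np.ascontiguousarray(fused[i]))
        for i in range(num_experts)
    ]

fused = np.zeros((4, 8, 16), dtype=np.float16)  # 4 experts, each 8x16
tensors = expand_fused_experts("talker.mlp.experts", "down_proj", fused)
print(tensors[0][0])  # talker.mlp.experts.0.down_proj.weight
```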

@lvliang-intel
Contributor Author

/azp run Unit-Test-CUDA-AutoRound

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@lvliang-intel
Contributor Author

/azp run Unit-Test-CUDA-AutoRound

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).
