add warnings for lm_head activation scale fallback #1728

Open
n1ck-guo wants to merge 1 commit into main from hengguo/add_log
Conversation

@n1ck-guo
Contributor

Description

Please briefly describe your main changes and the motivation behind them.

Type of Change

  • Bug fix
  • New feature
  • Documentation update
  • Performance improvement
  • Code refactoring
  • Other (please specify):

Related Issues

Fixes or relates to #

Checklist Before Submitting

  • My code has been tested locally.
  • Documentation has been updated as needed.
  • New or updated tests are included where applicable.

@n1ck-guo n1ck-guo requested review from Copilot and xin3he on April 23, 2026 06:56
Contributor

Copilot AI left a comment


Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds user-facing warnings to clarify behavior when lm_head static activation quantization lacks calibration statistics/inputs, especially when activation scale falls back to a default.

Changes:

  • Emit a warning during quantization when lm_head calibration inputs are missing under static activation quantization.
  • Emit a warning during unwrap when lm_head static activation stats are missing and scale fallback occurs/likely occurs.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Files reviewed:

  • auto_round/wrapper.py: adds a one-time warning during unwrap for lm_head when static activation stats are missing and a scale fallback is implied.
  • auto_round/compressors/base.py: adds a one-time warning when lm_head activation calibration inputs are missing for static activation quantization.

Comment thread: auto_round/wrapper.py

    ):
        logger.warning_once(
            "Static activation quantization for lm_head is not fully supported yet. "
            "lm_head activation statistics are missing, so activation scale falls back to unit scale."

Copilot AI Apr 23, 2026


The warning message states that activation scale “falls back to unit scale”, but this conditional only infers missing stats (act_max is None) and the presence of act_scale, without verifying that a unit-scale fallback actually happened. Consider softening the wording (e.g., “may fall back”) or tightening the condition to detect the actual fallback value/state so the warning is always accurate.

Suggested change:

-    "lm_head activation statistics are missing, so activation scale falls back to unit scale."
+    "lm_head activation statistics are missing, so activation scale may fall back to unit scale."
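The reviewer's alternative to softening the wording is tightening the condition. A minimal sketch of that idea, with several assumptions: the helper name `unit_scale_fallback_occurred` is hypothetical, the `act_max`/`act_scale` names follow the snippet above, and scales are represented as plain floats or lists rather than tensors:

```python
def unit_scale_fallback_occurred(act_max, act_scale, atol=1e-8):
    """Confirm that a unit-scale fallback actually happened, rather than
    inferring it from missing stats alone (hypothetical helper)."""
    if act_max is not None:
        return False  # calibration stats exist, so the scale was derived from them
    if act_scale is None:
        return False  # no scale stored yet; nothing to warn about
    scales = act_scale if isinstance(act_scale, (list, tuple)) else [act_scale]
    # Warn only when every stored scale is numerically the unit scale.
    return all(abs(float(s) - 1.0) <= atol for s in scales)
```

With a guard like this, the stronger wording ("falls back") stays accurate because the warning fires only after the fallback value is observed.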

Comment on lines +1986 to +1991 (auto_round/compressors/base.py)

    if "lm_head" in layer_name:
        logger.warning_once(
            "Static activation quantization for lm_head is not fully supported yet. "
            "If lm_head calibration inputs are missing, activation scale may fall back to unit scale "
            "or quantization may be skipped."
        )

Copilot AI Apr 23, 2026


There are now two separate lm_head static-activation warnings (here and in unwrapper_layer), with similar but not identical text. Because the message strings differ, warning_once will still emit both, which can be noisy and confusing. Consider centralizing this into a shared helper/message constant (or aligning the exact string) and/or choosing a single place to warn (calibration-time vs unwrap-time) so users see one consistent warning.

Suggested change (remove the duplicated warning):

-    if "lm_head" in layer_name:
-        logger.warning_once(
-            "Static activation quantization for lm_head is not fully supported yet. "
-            "If lm_head calibration inputs are missing, activation scale may fall back to unit scale "
-            "or quantization may be skipped."
-        )

@n1ck-guo
Contributor Author

/azp run Unit-Test-CUDA-AutoRound

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).


Labels

None yet

Projects

None yet


4 participants