add warnings for lm_head activation scale fallback #1728
Conversation
Signed-off-by: n1ck-guo <[email protected]>
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds user-facing warnings to clarify behavior when lm_head static activation quantization lacks calibration statistics/inputs, especially when activation scale falls back to a default.
Changes:
- Emit a warning during quantization when lm_head calibration inputs are missing under static activation quantization.
- Emit a warning during unwrap when lm_head static activation stats are missing and a scale fallback occurs or likely occurs.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| auto_round/wrapper.py | Adds a one-time warning during unwrap for lm_head when static activation stats are missing and a scale fallback is implied. |
| auto_round/compressors/base.py | Adds a one-time warning when lm_head activation calibration inputs are missing for static activation quantization. |
| ): | ||
| logger.warning_once( | ||
| "Static activation quantization for lm_head is not fully supported yet. " | ||
| "lm_head activation statistics are missing, so activation scale falls back to unit scale." |
The warning message states that activation scale “falls back to unit scale”, but this conditional only infers missing stats (act_max is None) and the presence of act_scale, without verifying that a unit-scale fallback actually happened. Consider softening the wording (e.g., “may fall back”) or tightening the condition to detect the actual fallback value/state so the warning is always accurate.
Suggested change:
-  "lm_head activation statistics are missing, so activation scale falls back to unit scale."
+  "lm_head activation statistics are missing, so activation scale may fall back to unit scale."
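The condition-tightening alternative the reviewer mentions could look like the minimal sketch below: only produce the warning when the unit-scale fallback is actually observed, not merely when stats are absent. The function name and the scalar-valued act_scale are hypothetical simplifications; the real code operates on tensors inside unwrapper_layer.

```python
# Hypothetical helper sketching the reviewer's suggestion: detect the actual
# unit-scale fallback state before warning, so the message is always accurate.
def lm_head_fallback_warning(act_max, act_scale):
    """Return the warning text if a unit-scale fallback is detected, else None."""
    if act_max is None and act_scale is not None and act_scale == 1.0:
        return (
            "Static activation quantization for lm_head is not fully supported yet. "
            "lm_head activation statistics are missing, so activation scale "
            "falls back to unit scale."
        )
    return None
```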
        if "lm_head" in layer_name:
            logger.warning_once(
                "Static activation quantization for lm_head is not fully supported yet. "
                "If lm_head calibration inputs are missing, activation scale may fall back to unit scale "
                "or quantization may be skipped."
            )
There are now two separate lm_head static-activation warnings (here and in unwrapper_layer), with similar but not identical text. Because the message strings differ, warning_once will still emit both, which can be noisy and confusing. Consider centralizing this into a shared helper/message constant (or aligning the exact string) and/or choosing a single place to warn (calibration-time vs unwrap-time) so users see one consistent warning.
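The centralization the reviewer suggests could be sketched as below: a single shared message constant plus a warn-once helper keyed on the message string, so the calibration-time and unwrap-time call sites deduplicate to one warning. The constant name and helper are assumptions for illustration, not part of auto_round's API.

```python
import logging

logger = logging.getLogger("auto_round")

# Hypothetical shared constant: both call sites would reference this exact
# string so deduplication-by-message emits it only once.
LM_HEAD_STATIC_ACT_WARNING = (
    "Static activation quantization for lm_head is not fully supported yet. "
    "If calibration inputs or activation statistics are missing, the activation "
    "scale may fall back to unit scale or quantization may be skipped."
)

_seen_warnings = set()

def warning_once(message):
    """Log `message` only the first time it is seen; return True if logged."""
    if message in _seen_warnings:
        return False
    _seen_warnings.add(message)
    logger.warning(message)
    return True
```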
/azp run Unit-Test-CUDA-AutoRound
Azure Pipelines successfully started running 1 pipeline(s).
Description
Please briefly describe your main changes, the motivation.
Type of Change
Related Issues
Fixes or relates to #
Checklist Before Submitting