Add MLX format export support for Apple Silicon and support VLM in AutoScheme #1732
Open
wenhuach21 wants to merge 37 commits into main
Conversation
- Support W2/W3/W4/W8 quantized model export to MLX format
- Compatible with mlx-lm for inference on Apple Silicon
- Handle cross-word bit packing for 3-bit quantization
- Flatten rope_parameters for mlx-lm compatibility
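The cross-word packing case arises because 32 is not divisible by 3, so a 3-bit value can straddle two 32-bit storage words. As an illustrative sketch only (not the PR's actual exporter code), contiguous little-endian 3-bit packing and unpacking might look like:

```python
def pack_3bit(values):
    """Pack 3-bit integers contiguously into 32-bit words.

    Because 32 is not divisible by 3, a value can straddle two words
    ("cross-word" packing): its low bits land at the end of one word
    and its high bits at the start of the next.
    """
    words = []
    acc = 0      # bit accumulator (arbitrary-precision int)
    nbits = 0    # number of valid bits currently in acc
    for v in values:
        assert 0 <= v < 8, "value must fit in 3 bits"
        acc |= v << nbits
        nbits += 3
        if nbits >= 32:
            words.append(acc & 0xFFFFFFFF)  # flush one full word
            acc >>= 32
            nbits -= 32
    if nbits:
        words.append(acc & 0xFFFFFFFF)      # flush the partial tail
    return words


def unpack_3bit(words, count):
    """Inverse of pack_3bit: recover `count` 3-bit values."""
    acc = 0
    nbits = 0
    out = []
    it = iter(words)
    for _ in range(count):
        while nbits < 3:                    # refill across word boundaries
            acc |= next(it) << nbits
            nbits += 32
        out.append(acc & 0x7)
        acc >>= 3
        nbits -= 3
    return out
```

The real exporter operates on tensors rather than Python lists, but the bit arithmetic is the essence of the cross-word case.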
Pull request overview
Note: Copilot was unable to run its full agentic suite in this review.
Adds MLX export + Apple Silicon inference support by introducing an MLX backend, an MLX-format exporter, and MLX-specific quantized linear layers, plus tests covering export/inference behavior (including mixed-bit and VLM config validation).
Changes:
- Introduces the mlx output format and an MLX exporter (export_to_mlx) that writes MLX-compatible config.json quantization blocks.
- Adds an MLX inference backend and QuantLinearMLX (including GPTQ→MLX post-init repacking on macOS).
- Adds pytest coverage for MLX/native + auto_round flows and helper paths for Qwen models.
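For context on the "MLX-compatible config.json quantization blocks": mlx-lm conventionally reads quantization settings from a `quantization` entry in config.json (global `bits`/`group_size`, with per-module overrides for mixed-bit models). The exact block this exporter emits is not shown in this thread; the following is a hypothetical sketch of that general shape, with illustrative layer names:

```python
import json

# Hypothetical mlx-lm-style quantization block for a mixed-bit model.
# Field names follow the common mlx-lm convention (global group_size/bits
# plus per-module overrides); they are not taken from this PR's output.
quantization = {
    "group_size": 64,
    "bits": 4,
    # example override: keep one projection at 8 bits
    "model.layers.0.self_attn.o_proj": {"group_size": 64, "bits": 8},
}
config = {"model_type": "qwen2", "quantization": quantization}
print(json.dumps(config, indent=2))
```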
Reviewed changes
Copilot reviewed 13 out of 14 changed files in this pull request and generated 7 comments.
Summary per file:
| File | Description |
|---|---|
| auto_round/export/export_to_mlx/export.py | Implements MLX packing + config.json generation (mixed-bit + VLM handling). |
| auto_round/formats.py | Registers mlx / auto_round:mlx formats and routes saving through the MLX exporter. |
| auto_round/inference/backend.py | Adds the mlx backend and OS-based backend filtering; adjusts backend requirements. |
| auto_round/inference/convert_model.py | Adds MPS device support and macOS GPTQ→MLX post-init conversion. |
| auto_round_extension/mlx/qlinear_mlx.py | Adds QuantLinearMLX with MLX-kernel forward + GPTQ→MLX repacking logic. |
| auto_round_extension/torch/qlinear_mlx.py | Adds a backward-compat shim to the new MLX module location. |
| auto_round/utils/common.py | Extends the supported format list with mlx and auto_round:mlx. |
| auto_round/schemes.py | Adds W5A16 and W6A16 preset schemes. |
| test/test_mlx/test_mlx_format.py | Adds a comprehensive MLX export/inference pytest suite (incl. mixed-bit + VLM config assertions). |
| test/helpers.py | Adds a new helper path variable for a Qwen3 VL 9B model. |
| test_mlx_export.py | Adds a standalone MLX export test script. |
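The PR summary also mentions flattening rope_parameters for mlx-lm compatibility: some Hugging Face configs nest rope settings under a `rope_parameters` key, while mlx-lm generally reads fields such as `rope_theta` from the top level of the config. A minimal sketch of what such a flattening step could look like (key names are illustrative, not taken from the PR's code):

```python
def flatten_rope_parameters(config: dict) -> dict:
    """Hoist nested rope settings to the top level of the config.

    If the config nests rope settings under "rope_parameters", copy each
    entry to the top level (without overwriting existing keys) and drop
    the nested dict, so consumers that expect flat keys can read them.
    """
    cfg = dict(config)  # shallow copy; leave the input untouched
    rope = cfg.pop("rope_parameters", None)
    if isinstance(rope, dict):
        for key, value in rope.items():
            cfg.setdefault(key, value)
    return cfg
```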
Comments suppressed due to low confidence (1)
auto_round/inference/backend.py:1
- The auto_awq:gemm backend no longer declares its dependency requirement, but dynamic_import_inference_linear(...) still imports awq.modules.linear.WQLinear_GEMM for AWQ backends. Without requirements=["autoawq"] (or the correct package name used in your environment), backend selection may succeed and then fail at runtime with an ImportError. Re-add an explicit requirement for the AWQ package so compatibility checks prevent selecting this backend when the dependency is missing.
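The failure mode this comment describes, selection succeeding and the import failing only at dispatch time, can be illustrated with a toy registry (not auto_round's actual backend API) that gates selection on declared requirements:

```python
import importlib.util

# Toy backend registry: each backend declares the packages it needs.
# If "requirements" is omitted, availability checks pass vacuously and
# the ImportError surfaces later, at kernel import time.
BACKENDS = {
    "auto_awq:gemm": {"requirements": ["awq"]},  # dependency declared up front
    "mlx": {"requirements": ["mlx"]},
}


def is_available(name: str) -> bool:
    """A backend is selectable only if all declared requirements import."""
    reqs = BACKENDS[name].get("requirements", [])
    return all(importlib.util.find_spec(r) is not None for r in reqs)
```

With an empty requirements list, `is_available` returns True even when the kernel package is absent, which is exactly the gap the review comment flags.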
wenhuach21 (Author): The visual module's grad is 0 in AutoScheme.
hshen14 approved these changes on Apr 24, 2026
Description
Tested with qwen3.5-4b and qwen3-0.6b. Since there are no devices available for testing, the out-of-the-box (OOB) support may not be sufficient.
Type of Change
Related Issues
Fixes or relates to #
Checklist Before Submitting