
Add MLX format export support for Apple Silicon and support vlm in AutoScheme#1732

Open
wenhuach21 wants to merge 37 commits into main from support_mlx

Conversation


@wenhuach21 wenhuach21 commented Apr 23, 2026

Description

Tested with qwen3.5-4b and qwen3-0.6b. Since no Apple Silicon device was available for testing, the out-of-the-box support may not be sufficient.

Type of Change

  • Bug fix
  • New feature
  • Documentation update
  • Performance improvement
  • Code refactoring
  • Other (please specify):

Related Issues

Fixes or relates to #

Checklist Before Submitting

  • My code has been tested locally.
  • Documentation has been updated as needed.
  • New or updated tests are included where applicable.

wenhuach and others added 14 commits April 19, 2026 16:30
- Support W2/W3/W4/W8 quantized model export to MLX format
- Compatible with mlx-lm for inference on Apple Silicon
- Handle cross-word bit packing for 3-bit quantization
- Flatten rope_parameters for mlx-lm compatibility
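The cross-word packing called out above exists because 3-bit values do not tile a 32-bit word evenly, so a value can straddle a word boundary. A minimal sketch of the idea, assuming little-endian bit order within each word (function names and layout are illustrative, not the exporter's actual code):

```python
def pack_3bit(values):
    """Pack 3-bit values (0..7) into 32-bit words.

    Since 3 does not divide 32, a value may straddle a word boundary:
    its low bits sit at the top of one word and its remaining high
    bits spill into the bottom of the next (cross-word packing).
    """
    n_words = (len(values) * 3 + 31) // 32
    words = [0] * n_words
    for i, v in enumerate(values):
        word, off = divmod(i * 3, 32)
        words[word] |= (v << off) & 0xFFFFFFFF
        if off > 29:                      # bits spilled past bit 31
            words[word + 1] |= v >> (32 - off)
    return words


def unpack_3bit(words, count):
    """Inverse of pack_3bit: recover `count` 3-bit values."""
    out = []
    for i in range(count):
        word, off = divmod(i * 3, 32)
        v = words[word] >> off
        if off > 29:                      # pull the spilled bits back in
            v |= words[word + 1] << (32 - off)
        out.append(v & 0b111)
    return out
```

4-bit and 8-bit widths divide 32 evenly and need no spill handling, which is why 3-bit is the special case the commit message singles out.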
Signed-off-by: Wenhua Cheng <[email protected]>
Copilot AI review requested due to automatic review settings April 23, 2026 08:43

Copilot AI left a comment


Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds MLX export + Apple Silicon inference support by introducing an MLX backend, an MLX-format exporter, and MLX-specific quantized linear layers, plus tests covering export/inference behavior (including mixed-bit and VLM config validation).

Changes:

  • Introduces mlx output format and MLX exporter (export_to_mlx) that writes MLX-compatible config.json quantization blocks.
  • Adds MLX inference backend and QuantLinearMLX (including GPTQ→MLX post-init repacking on macOS).
  • Adds pytest coverage for MLX/native + auto_round flows and helper paths for Qwen models.
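For context, mlx-lm discovers quantization settings from a `quantization` block in `config.json`, and mixed-bit export needs per-layer overrides inside that block. A hedged sketch of the expected shape (keys and the override layout are assumptions based on mlx-lm's convention, not copied from this PR):

```python
import json

# Assumed shape of the MLX quantization block written into config.json.
config = {
    "model_type": "qwen3",              # hypothetical model id
    "quantization": {
        "group_size": 64,               # per-group quantization granularity
        "bits": 4,                      # default bit width for all layers
        # mixed-bit: per-layer override keyed by module path (assumed form)
        "model.layers.0.mlp.down_proj": {"group_size": 64, "bits": 8},
    },
}
print(json.dumps(config["quantization"], indent=2))
```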

Reviewed changes

Copilot reviewed 13 out of 14 changed files in this pull request and generated 7 comments.

Show a summary per file

| File | Description |
| --- | --- |
| auto_round/export/export_to_mlx/export.py | Implements MLX packing + config.json generation (mixed-bit + VLM handling). |
| auto_round/formats.py | Registers mlx / auto_round:mlx formats and routes saving through the MLX exporter. |
| auto_round/inference/backend.py | Adds mlx backend and OS-based backend filtering; adjusts backend requirements. |
| auto_round/inference/convert_model.py | Adds MPS device support and macOS GPTQ→MLX post-init conversion. |
| auto_round_extension/mlx/qlinear_mlx.py | Adds QuantLinearMLX with MLX-kernel forward + GPTQ→MLX repacking logic. |
| auto_round_extension/torch/qlinear_mlx.py | Adds a backward-compat shim to the new MLX module location. |
| auto_round/utils/common.py | Extends the supported format list with mlx and auto_round:mlx. |
| auto_round/schemes.py | Adds W5A16 and W6A16 preset schemes. |
| test/test_mlx/test_mlx_format.py | Adds a comprehensive MLX export/inference pytest suite (incl. mixed-bit + VLM config assertions). |
| test/helpers.py | Adds a new helper path variable for a Qwen3 VL 9B model. |
| test_mlx_export.py | Adds a standalone MLX export test script. |
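The GPTQ→MLX repacking mentioned for qlinear_mlx.py has a simple mathematical core: GPTQ stores a per-group zero point while MLX stores a per-group bias, and the two parameterizations are interconvertible. A sketch of that relation only (a simplification; the actual repacking also rearranges the packed integer weights):

```python
def gptq_to_mlx_params(scales, zeros):
    """Convert GPTQ per-group (scale, zero) to MLX-style (scale, bias).

    GPTQ dequantizes as  w = scale * (q - zero)
    MLX  dequantizes as  w = scale * q + bias
    so the scale carries over unchanged and bias = -scale * zero.
    Works elementwise on scalars or NumPy arrays.
    """
    return scales, -scales * zeros
```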
Comments suppressed due to low confidence (1)

auto_round/inference/backend.py:1

  • The auto_awq:gemm backend no longer declares its dependency requirement, but dynamic_import_inference_linear(...) still imports awq.modules.linear.WQLinear_GEMM for AWQ backends. Without requirements=["autoawq"] (or the correct package name used in your environment), backend selection may succeed and then fail at runtime with an ImportError. Re-add an explicit requirement for the AWQ package so compatibility checks prevent selecting this backend when the dependency is missing.
# Copyright (c) 2024 Intel Corporation
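The failure mode the reviewer describes can be sketched generically: if a backend entry drops its package requirement, the availability check passes and the ImportError only surfaces at kernel-import time. Below, BackendInfo and its fields are hypothetical stand-ins, not auto_round's actual API:

```python
import importlib.util
from dataclasses import dataclass, field


@dataclass
class BackendInfo:  # hypothetical stand-in for the project's backend record
    name: str
    requirements: list = field(default_factory=list)

    def is_available(self) -> bool:
        # Offer this backend only when every required package is importable;
        # with requirements=[] this check trivially passes and the ImportError
        # is deferred to the first dynamic import of the kernel class.
        return all(importlib.util.find_spec(pkg) is not None
                   for pkg in self.requirements)


# With the requirement declared, selection is blocked when the package is absent.
awq_gemm = BackendInfo(name="auto_awq:gemm", requirements=["awq"])
```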

Comment thread test/helpers.py
Comment thread auto_round/inference/backend.py
Comment thread auto_round_extension/mlx/qlinear_mlx.py
Comment thread auto_round/inference/convert_model.py
Comment thread test/test_mlx/test_mlx_format.py
Comment thread test_mlx_export.py
Comment thread auto_round/export/export_to_mlx/export.py Outdated
@wenhuach21 wenhuach21 changed the title Support mlx Add MLX format export support for Apple Silicon Apr 23, 2026
@wenhuach21
Contributor Author

The visual module's gradient is 0 in AutoScheme.

@wenhuach21 wenhuach21 changed the title Add MLX format export support for Apple Silicon Add MLX format export support for Apple Silicon and support vlm in AutoScheme Apr 24, 2026