feat: batch text-only MLX VLM requests by qinxuye · Pull Request #4918 · xorbitsai/inference

qinxuye · 2026-05-16T17:08:02Z

Summary

enable continuous batching for text-only MLX VLM requests via the underlying language model
keep image requests on the mlx-vlm path
isolate MLX batch generator state per model instance and add a VLM text-only parallel regression test

Tests

python -m py_compile xinference/model/llm/mlx/core.py xinference/model/llm/mlx/tests/test_mlx.py
python -m black --check xinference/model/llm/mlx/core.py xinference/model/llm/mlx/tests/test_mlx.py
PYTHONPATH=... python -X faulthandler -m pytest xinference/model/llm/mlx/tests/test_mlx.py::test_mlx_vision_text_only_parallel_inference -q -s
PYTHONPATH=... python -X faulthandler -m pytest xinference/model/llm/mlx/tests/test_mlx.py::test_load_mlx_vision -q -s

gemini-code-assist

Code Review

This pull request enables continuous batching for text-only requests in MLX Vision models by introducing the _MLXLogitsModelAdapter and refactoring MLXBatchModel to use instance-level state instead of class-level variables. It also includes a new test case to verify parallel inference for text-only prompts. Feedback was provided to simplify the logic in _is_text_only_prompt for better readability and maintainability.
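
The page does not show how `_MLXLogitsModelAdapter` is implemented. As a rough illustration of what such a bridge might do, the sketch below assumes the VLM's language model returns a wrapper object with a `logits` field while the batching code expects bare logits. Every class name here (`FakeVLMOutput`, `FakeVLMLanguageModel`, `LogitsAdapter`) is hypothetical and uses plain Python lists in place of MLX arrays.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class FakeVLMOutput:
    """Hypothetical stand-in for a wrapped output carrying a `logits` field."""
    logits: List[float]


class FakeVLMLanguageModel:
    """Hypothetical stand-in for the language-model half of a VLM."""

    def __call__(self, tokens: List[int], cache: Any = None) -> FakeVLMOutput:
        # Pretend each token id maps to one logit value.
        return FakeVLMOutput(logits=[float(t) for t in tokens])


class LogitsAdapter:
    """Wraps a model so callers receive bare logits instead of a wrapper."""

    def __init__(self, language_model: Any) -> None:
        self._language_model = language_model

    def __call__(self, tokens: List[int], cache: Any = None) -> Any:
        out = self._language_model(tokens, cache=cache)
        # Unwrap when the result exposes `.logits`; otherwise pass it through.
        return getattr(out, "logits", out)

    def __getattr__(self, name: str) -> Any:
        # Only reached for attributes not found on the adapter itself.
        if name == "_language_model":
            raise AttributeError(name)
        # Delegate everything else to the wrapped model.
        return getattr(self._language_model, name)
```

With this shape, code written against a "call the model, get logits" interface can be handed `LogitsAdapter(model)` without knowing about the wrapper type.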

qinxuye · 2026-05-16T17:13:17Z

/gemini review

gemini-code-assist

Code Review

This pull request enables continuous batching for text-only requests in MLX vision models by introducing a logits adapter and routing logic. It also refactors MLXBatchModel to use instance-level storage for improved state isolation. A review comment suggests making the text-only prompt detection more robust by checking for additional modalities like video and audio.
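
One way to address the concern about other modalities is to treat a prompt as text-only only when every content part is explicitly text, so that image, video, audio, and unknown part types all fall back to the non-batched path. The sketch below is an assumption about the message shape (OpenAI-style `content` lists with a `type` key) and is not the PR's `_is_text_only_prompt`; the function name `is_text_only_messages` is hypothetical.

```python
from typing import Any, Dict, List


def is_text_only_messages(messages: List[Dict[str, Any]]) -> bool:
    """Return True only if no message carries a non-text content part."""
    for message in messages:
        content = message.get("content")
        # Plain string (or missing) content is text by definition.
        if content is None or isinstance(content, str):
            continue
        if not isinstance(content, list):
            # Unknown shape: be conservative and skip the batched path.
            return False
        for part in content:
            if isinstance(part, str):
                continue
            # Allowlist: anything other than an explicit text part
            # (image, video, audio, unknown types) counts as non-text.
            if not isinstance(part, dict) or part.get("type") != "text":
                return False
    return True
```

An allowlist is used here instead of enumerating each modality, so a newly introduced part type defaults to the safer route.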

qinxuye · 2026-05-16T17:18:15Z

/gemini review

gemini-code-assist

Code Review

This pull request refactors MLXBatchModel to use instance-level state instead of class-level variables and enables continuous batching for text-only prompts within MLXVisionModel. It introduces a _MLXLogitsModelAdapter to bridge mlx-vlm models with the mlx-lm API and includes new tests for prompt detection and parallel inference. Feedback was provided to move the _lock attribute in MLXBatchModel from the class level to the instance level to prevent unnecessary performance contention when multiple models are running in parallel.
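
The difference between a class-level and an instance-level lock can be shown with plain Python. The sketch below is illustrative only: `SharedLockBatcher` and `PerInstanceBatcher` are hypothetical names and do not reproduce `MLXBatchModel`.

```python
import threading
from typing import List


class SharedLockBatcher:
    # Class attribute: every instance shares this single lock object,
    # so unrelated models would serialize on each other.
    _lock = threading.Lock()


class PerInstanceBatcher:
    def __init__(self) -> None:
        # Instance attributes: each model gets its own lock and queue.
        self._lock = threading.Lock()
        self._pending: List[str] = []

    def submit(self, prompt: str) -> int:
        """Queue a prompt and return the queue length for this instance."""
        with self._lock:
            self._pending.append(prompt)
            return len(self._pending)
```

Two `PerInstanceBatcher` objects never contend for the same lock, while two `SharedLockBatcher` objects always do.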

feat: batch text-only MLX VLM requests

6d97129

XprobeBot added the feature label May 16, 2026

XprobeBot added this to the v2.x milestone May 16, 2026

gemini-code-assist Bot reviewed May 16, 2026

Comment thread xinference/model/llm/mlx/core.py Outdated

fix: simplify MLX VLM text-only prompt check

27de5d4

gemini-code-assist Bot reviewed May 16, 2026

Comment thread xinference/model/llm/mlx/core.py Outdated

fix: guard MLX VLM batching against other modalities

75941df

gemini-code-assist Bot reviewed May 16, 2026

Comment thread xinference/model/llm/mlx/core.py Outdated

qinxuye added 8 commits May 17, 2026 01:25

fix: isolate MLX batch locks per instance

c4d4625

test: cap MLX VLM text-only generation length

bb7481c

fix: reset MLX batch stream in worker thread

82790e4

fix: bind MLX batch stream before batch ops

ab2e7fb

fix: use default MLX stream for batch generation

22b5906

fix: avoid mlx-lm batch stream wrapper

b96c439

fix: run MLX generation on default stream

db38608

fix: pin mlx deps for metal stream regression

fe3f39f

qinxuye wants to merge 11 commits into xorbitsai:main from qinxuye:feat/mlx-vlm-batch
