feat: batch text-only MLX VLM requests #4918
Code Review
This pull request enables continuous batching for text-only requests in MLX vision models by introducing _MLXLogitsModelAdapter and refactoring MLXBatchModel to use instance-level state instead of class-level variables. It also adds a test that verifies parallel inference for text-only prompts. One review comment suggests simplifying the logic in _is_text_only_prompt to improve readability and maintainability.
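The review does not show the adapter's code. As a rough illustration only: assuming the batching path expects a model call that returns bare logits, while the vision model's language head returns an object carrying a logits attribute, an adapter can unwrap the result and delegate every other attribute to the wrapped model. All names below (LogitsModelAdapter, VLMOutput, FakeVLMLanguageModel) are hypothetical and do not reproduce the actual _MLXLogitsModelAdapter or the mlx-lm / mlx-vlm APIs.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class VLMOutput:
    """Hypothetical stand-in for a VLM forward result that wraps its logits."""
    logits: Any


class FakeVLMLanguageModel:
    """Toy model (hypothetical): one logit row per input token."""

    def __init__(self):
        self.layers = ["layer0", "layer1"]

    def __call__(self, inputs, cache=None):
        return VLMOutput(logits=[[float(t)] for t in inputs])


class LogitsModelAdapter:
    """Hypothetical adapter: makes a VLM language model return bare logits."""

    def __init__(self, model: Callable[..., Any]):
        self._model = model

    def __call__(self, inputs, cache=None):
        out = self._model(inputs, cache=cache)
        # Unwrap objects that carry a .logits attribute; pass plain results through.
        return getattr(out, "logits", out)

    def __getattr__(self, name):
        # Only called when normal lookup fails, so attributes such as
        # `layers` stay reachable on the wrapped model.
        return getattr(self._model, name)


adapter = LogitsModelAdapter(FakeVLMLanguageModel())
print(adapter([1, 2, 3]))  # [[1.0], [2.0], [3.0]]
print(len(adapter.layers))  # 2
```

Delegating unknown attributes keeps the adapter thin: the batching code sees the same model object it would for a text-only model, and only the return shape of the forward call changes.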
/gemini review
Code Review
This pull request enables continuous batching for text-only requests in MLX vision models by introducing a logits adapter and routing logic. It also refactors MLXBatchModel to use instance-level storage for improved state isolation. A review comment suggests making the text-only prompt detection more robust by checking for additional modalities like video and audio.
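A minimal sketch of the broader modality check and the routing it drives, assuming each request is a mapping whose non-text inputs live under the keys "images", "videos", and "audio". These key names, and the functions is_text_only_prompt and route, are hypothetical; the PR's actual request fields and _is_text_only_prompt signature may differ.

```python
from typing import Any, Mapping

# Hypothetical request keys for non-text modalities; real field names may differ.
MEDIA_KEYS = ("images", "videos", "audio")


def is_text_only_prompt(request: Mapping[str, Any]) -> bool:
    """Return True when the request carries no image, video, or audio input.

    Missing keys, None, and empty lists all count as "no media".
    """
    return not any(request.get(key) for key in MEDIA_KEYS)


def route(request: Mapping[str, Any]) -> str:
    # Text-only prompts can share the continuous-batching path;
    # anything with media falls back to per-request VLM generation.
    return "batch" if is_text_only_prompt(request) else "single"


print(route({"prompt": "hi"}))                        # batch
print(route({"prompt": "hi", "images": []}))          # batch
print(route({"prompt": "hi", "audio": ["clip.wav"]}))  # single
```

Keeping the modalities in one tuple makes the check a single expression, and supporting a new input type only means adding its key.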
/gemini review
Code Review
This pull request refactors MLXBatchModel to use instance-level state instead of class-level variables and enables continuous batching for text-only prompts within MLXVisionModel. It introduces _MLXLogitsModelAdapter to bridge mlx-vlm models with the mlx-lm API and adds tests for prompt detection and parallel inference. One review comment suggests moving the _lock attribute in MLXBatchModel from the class level to the instance level, so that multiple models running in parallel do not contend on a single shared lock.
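The difference between class-level and instance-level state can be shown with the standard library alone. The two classes below are hypothetical and stand in for MLXBatchModel only in spirit: with a class-level lock and queue, every instance shares one lock and one list; moving both into __init__ isolates each model.

```python
import threading


class ClassLevelStateModel:
    """Anti-pattern: state declared on the class is shared by every instance."""
    _lock = threading.Lock()
    _pending = []  # one list shared across all instances

    def submit(self, request):
        with self._lock:
            self._pending.append(request)  # mutates the shared class list
            return len(self._pending)


class InstanceLevelStateModel:
    """Each instance owns its lock and its pending queue."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = []

    def submit(self, request):
        with self._lock:
            self._pending.append(request)
            return len(self._pending)


a, b = ClassLevelStateModel(), ClassLevelStateModel()
a.submit("req-a")
print(b.submit("req-b"))   # 2: b sees a's request in the shared list
print(a._lock is b._lock)  # True: both instances contend on one lock

c, d = InstanceLevelStateModel(), InstanceLevelStateModel()
c.submit("req-c")
print(d.submit("req-d"))   # 1: queues are isolated
with c._lock:
    # d's lock is free even while c's is held, so the models never block each other.
    print(d._lock.acquire(blocking=False))  # True
    d._lock.release()
```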
Summary
Tests