
fix(metal): GDN bfloat16, PA scheduler, error handling, MLX SDPA fixes#2047

Open
emanueleDiVizio wants to merge 9 commits into EricLBuehler:master from emanueleDiVizio:fix/metal-fixes

emanueleDiVizio commented Apr 2, 2026

Summary

This PR fixes multiple correctness, performance, and stability issues encountered while running mistral.rs on Apple Silicon (M-series) with real multi-user inference workloads (Qwen3.5 MoE + Mixtral).

The changes focus on:

  • Metal backend correctness (GDN + KV cache)
  • Scheduler behaviour under load (PagedAttention)
  • Robustness in concurrent serving scenarios
  • MLX integration improvements for attention kernels

Several of these issues only surface under concurrent decode or long-running sessions.

Key changes

Scheduler (from upstream PRs #2031/#2034)

  • Fix O(N²) thrashing in the PagedAttention scheduler under mixed waiting/active workloads
  • Introduce FCFS (first-come, first-served) priority ordering to prevent starvation
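The scheduler code itself is not shown in this PR description, so the following is only a sketch of FCFS admission under assumed types: `Request`, `admit_fcfs`, and the block budget are hypothetical, not mistral.rs APIs. It illustrates the general idea of sorting the waiting queue by arrival once per step and admitting from the head until the first request that does not fit, instead of repeatedly rescanning and re-queueing the whole waiting list.

```rust
use std::collections::VecDeque;

/// Hypothetical request record (not mistral.rs's actual type).
/// `arrival` is a monotonically increasing counter stamped on enqueue.
#[derive(Debug, Clone, PartialEq)]
pub struct Request {
    pub id: u64,
    pub arrival: u64,
    pub blocks_needed: usize,
}

/// Admit waiting requests in first-come, first-served order.
///
/// The queue is sorted by arrival once per step, so a preempted request that
/// was pushed back regains its original position. Requests are then admitted
/// from the front until the first one that does not fit. Stopping there,
/// rather than skipping ahead to smaller, younger requests, keeps the oldest
/// request from starving.
pub fn admit_fcfs(waiting: &mut VecDeque<Request>, free_blocks: &mut usize) -> Vec<Request> {
    waiting.make_contiguous().sort_by_key(|r| r.arrival);
    let mut admitted = Vec::new();
    while let Some(front) = waiting.front() {
        if front.blocks_needed > *free_blocks {
            break; // head-of-line blocking is intentional: preserve FCFS order
        }
        // `front()` just returned Some, so `pop_front()` cannot be None here.
        let req = waiting.pop_front().unwrap();
        *free_blocks -= req.blocks_needed;
        admitted.push(req);
    }
    admitted
}
```

Note the design choice: a younger request that would fit is deliberately not admitted while an older one is blocked, which trades some utilization for a starvation-free ordering.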

GDN / Metal

  • Fix dtype mismatch (bfloat vs bfloat16_t) in Metal kernels
  • Add per-sequence fallback for concurrent decode when recurrent offsets diverge
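The per-sequence fallback can be pictured as a dispatch decision. In this sketch each decoding sequence carries an assumed recurrent-state `offset`; when all offsets agree, a single batched launch is planned, otherwise one launch per sequence. `DecodePlan` and `plan_decode` are hypothetical names, and the actual GDN kernel interface in mistral.rs may look quite different.

```rust
/// How a batch of decoding sequences would be dispatched (hypothetical).
#[derive(Debug, PartialEq)]
pub enum DecodePlan {
    /// All sequences share one recurrent offset: a single batched launch.
    Batched { offset: usize, seqs: Vec<usize> },
    /// Offsets diverge: one launch per (sequence index, offset) pair.
    PerSequence(Vec<(usize, usize)>),
}

/// `offsets[i]` is the assumed recurrent-state offset of sequence `i`.
pub fn plan_decode(offsets: &[usize]) -> DecodePlan {
    match offsets.first() {
        // Fast path: every sequence is at the same offset.
        Some(&first) if offsets.iter().all(|&o| o == first) => DecodePlan::Batched {
            offset: first,
            seqs: (0..offsets.len()).collect(),
        },
        // Fallback: offsets diverge (or the batch is empty).
        _ => DecodePlan::PerSequence(offsets.iter().copied().enumerate().collect()),
    }
}
```

The batched path stays the common case; the per-sequence path only costs extra launches when concurrent sequences have drifted apart.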

Stability

  • Replace the panic on client disconnect with graceful error handling
  • Return an error instead of panicking on block allocation failure (race condition)
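A minimal sketch of the panic-to-error pattern, using only the standard library. `ServeError`, `BlockPool`, and `send_token` are hypothetical stand-ins, not mistral.rs types. A send to a dropped `std::sync::mpsc` receiver returns `Err`, which is one way a disconnected client can surface; an allocation that finds too few free blocks (for example because a concurrent request consumed them after the scheduler's check) also returns `Err` for the caller to handle instead of unwrapping. An async server would more likely use a tokio channel, whose `send` likewise errors once the receiver is dropped.

```rust
use std::fmt;
use std::sync::mpsc::Sender;

/// Hypothetical error type for the serving path.
#[derive(Debug, PartialEq)]
pub enum ServeError {
    ClientDisconnected,
    OutOfBlocks { requested: usize, free: usize },
}

impl fmt::Display for ServeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ServeError::ClientDisconnected => write!(f, "client disconnected"),
            ServeError::OutOfBlocks { requested, free } => {
                write!(f, "requested {requested} KV blocks, only {free} free")
            }
        }
    }
}

impl std::error::Error for ServeError {}

/// Hypothetical pool of KV-cache block ids.
pub struct BlockPool {
    free: Vec<usize>,
}

impl BlockPool {
    pub fn new(num_blocks: usize) -> Self {
        Self { free: (0..num_blocks).collect() }
    }

    /// Returns `Err` instead of panicking when fewer than `n` blocks are free,
    /// e.g. if another request took them after the scheduler's capacity check.
    pub fn allocate(&mut self, n: usize) -> Result<Vec<usize>, ServeError> {
        if n > self.free.len() {
            return Err(ServeError::OutOfBlocks { requested: n, free: self.free.len() });
        }
        let split_at = self.free.len() - n;
        Ok(self.free.split_off(split_at))
    }
}

/// Streams a token to the client; a dropped receiver (client gone) becomes
/// `ServeError::ClientDisconnected` rather than a panic from `.unwrap()`.
pub fn send_token(tx: &Sender<String>, token: String) -> Result<(), ServeError> {
    tx.send(token).map_err(|_| ServeError::ClientDisconnected)
}
```

The caller can then log and drop the request on `ClientDisconnected`, or re-queue it on `OutOfBlocks`, without taking down the engine for every other user.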

Performance / Features

  • Increase Metal KV cache default max_seq_len (4K → 16K)
  • Add optional MLX SDPA backend with Metal flash attention (head_dim=256 support)
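To gauge what the 4K → 16K default means for memory, assume a contiguous cache preallocated up to `max_seq_len` (an assumption; the PR does not describe the allocation strategy). Per-sequence KV memory is then 2 (K and V) × layers × KV heads × head_dim × max_seq_len × bytes per element. The model dimensions used below (32 layers, 8 KV heads, head_dim 128, bf16) are illustrative and not taken from Qwen3.5 or Mixtral; `kv_cache_bytes` is a hypothetical helper.

```rust
/// Bytes for one sequence's K and V caches across all layers, assuming a
/// contiguous cache preallocated to `max_seq_len` tokens.
pub fn kv_cache_bytes(
    num_layers: u64,
    num_kv_heads: u64,
    head_dim: u64,
    max_seq_len: u64,
    bytes_per_elem: u64,
) -> u64 {
    // Factor 2: one tensor for keys, one for values.
    2 * num_layers * num_kv_heads * head_dim * max_seq_len * bytes_per_elem
}
```

Under these illustrative numbers, one sequence grows from 512 MiB at 4096 tokens to 2 GiB at 16384 tokens, so the higher default trades unified memory for longer contexts.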

Test plan

  • Validated on Apple Silicon (M-series)
  • Tested with Qwen3.5 MoE (GDN) and Mixtral
  • Scheduler fixes verified under concurrent request load
