v0.2.0

juntao released this 28 Mar 01:21

What's New

OpenAI-Compatible API Server

  • New asr-server binary with HTTP API for audio transcription
  • POST /v1/audio/transcriptions: multipart file upload with json, text, and verbose_json response formats
  • GET /v1/models and GET /health endpoints
  • CLI options: --model-dir, --host, --port, --language, -v
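A minimal client sketch for the transcription endpoint, using only the Python standard library. The form field names (file, response_format) and the "text" key in the JSON response follow OpenAI's transcription API and are assumptions about this server, as is the base URL; substitute whatever --host and --port the server was started with. OpenAI's API also takes a model field, which this sketch omits.

```python
import json
import os
import urllib.request
import uuid


def build_multipart(fields, file_field, filename, file_bytes,
                    file_content_type="application/octet-stream"):
    """Encode text fields plus one file as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode("utf-8")
        )
    parts.append(
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="{file_field}"; '
         f'filename="{filename}"\r\n'
         f"Content-Type: {file_content_type}\r\n\r\n").encode("utf-8")
        + file_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path, base_url="http://127.0.0.1:8000", response_format="json"):
    """Upload an audio file and return the transcript text.

    base_url is a hypothetical default, not taken from the release notes.
    """
    with open(path, "rb") as f:
        audio = f.read()
    body, content_type = build_multipart(
        {"response_format": response_format},  # assumed field name
        "file",                                # assumed field name
        os.path.basename(path),
        audio,
    )
    req = urllib.request.Request(
        f"{base_url}/v1/audio/transcriptions",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        payload = resp.read().decode("utf-8")
    # The "text" format returns plain text; the JSON formats are assumed
    # to carry the transcript under a "text" key, as in OpenAI's API.
    if response_format == "text":
        return payload
    return json.loads(payload)["text"]
```

With the server running, transcribe("sample.wav") would return the transcript string, where sample.wav is a placeholder file name.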

Pure Rust Audio Decoding

  • Replaced FFmpeg (C dependency) with Symphonia (pure Rust)
  • Supports MP3, FLAC, AAC, OGG, and WAV without any system dependencies
  • No longer requires brew install ffmpeg or the build-ffmpeg feature flag

MLX Performance Optimizations

  • Fused RmsNorm (mlx_fast_rms_norm)
  • Fused scaled dot-product attention (mlx_fast_sdpa) with native GQA support
  • Strategic eval() placement to bound lazy computation graphs
  • Pre-transposed weights and precomputed MRoPE cos/sin table
  • ~8% inference speedup on Apple Silicon (M4)
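For reference, this is the computation a fused RmsNorm kernel performs in a single pass, written as a plain-Python sketch. It is not the project's code: the function name and the eps default are illustrative assumptions.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Unfused reference: scale x by the inverse of its root mean square,
    then apply a learned per-channel weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]
```

Written unfused, this is several separate array operations (square, mean, rsqrt, two multiplies). A fused kernel computes the same result in one launch without materializing the intermediates.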

Bug Fixes

  • Fix attention scale in tch SDPA (multiply vs divide)
  • Fix GQA head expansion for tch backend

Performance (Apple M4 Mac Mini, 16GB)

Model   Audio          CLI     API Server
0.6B    8.0s English   2.35s   2.10s
0.6B    3.5s English   1.30s   1.05s
1.7B    8.0s English   6.26s   5.80s
1.7B    3.5s English   3.40s   3.06s
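One way to read these timings is as a real-time factor (processing time divided by audio duration), where a value below 1.0 means faster than real time. The metric is not part of the release notes; the sketch below only rearranges the numbers from the table.

```python
# (model, audio_seconds, cli_seconds, api_server_seconds), copied from the table
RESULTS = [
    ("0.6B", 8.0, 2.35, 2.10),
    ("0.6B", 3.5, 1.30, 1.05),
    ("1.7B", 8.0, 6.26, 5.80),
    ("1.7B", 3.5, 3.40, 3.06),
]


def real_time_factor(processing_s, audio_s):
    """Processing time per second of audio; below 1.0 is faster than real time."""
    return processing_s / audio_s


for model, audio_s, cli_s, api_s in RESULTS:
    print(f"{model} {audio_s:.1f}s audio: "
          f"CLI RTF {real_time_factor(cli_s, audio_s):.2f}, "
          f"API server RTF {real_time_factor(api_s, audio_s):.2f}")
```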