v0.2.0
juntao released this 28 Mar 01:21
What's New
OpenAI-Compatible API Server
New asr-server binary with HTTP API for audio transcription
POST /v1/audio/transcriptions — multipart file upload with json, text, and verbose_json response formats
GET /v1/models and GET /health endpoints
CLI options: --model-dir, --host, --port, --language, -v
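As a rough illustration of the upload endpoint, the sketch below assembles the raw HTTP request a client might send to POST /v1/audio/transcriptions using only the Rust standard library. The form field names `file` and `response_format` are assumed from the OpenAI transcription API and are not listed in these notes; the host, port 8080, boundary string, and file name are hypothetical placeholders.

```rust
// Sketch only: the form field names `file` and `response_format` are assumed
// from the OpenAI transcription API; these release notes do not list them.

/// Build a multipart/form-data body with one audio file part and one text field.
fn build_multipart(boundary: &str, filename: &str, audio: &[u8], response_format: &str) -> Vec<u8> {
    let mut body = Vec::new();
    // Part 1: the audio bytes, sent as an opaque binary file.
    body.extend_from_slice(format!("--{boundary}\r\n").as_bytes());
    body.extend_from_slice(
        format!("Content-Disposition: form-data; name=\"file\"; filename=\"{filename}\"\r\n").as_bytes(),
    );
    body.extend_from_slice(b"Content-Type: application/octet-stream\r\n\r\n");
    body.extend_from_slice(audio);
    body.extend_from_slice(b"\r\n");
    // Part 2: the requested response format (json, text, or verbose_json).
    body.extend_from_slice(format!("--{boundary}\r\n").as_bytes());
    body.extend_from_slice(b"Content-Disposition: form-data; name=\"response_format\"\r\n\r\n");
    body.extend_from_slice(response_format.as_bytes());
    body.extend_from_slice(b"\r\n");
    // Closing boundary marks the end of the form.
    body.extend_from_slice(format!("--{boundary}--\r\n").as_bytes());
    body
}

/// Prefix the body with an HTTP/1.1 request line and headers.
fn build_request(host: &str, port: u16, boundary: &str, body: &[u8]) -> Vec<u8> {
    let mut req = Vec::new();
    req.extend_from_slice(b"POST /v1/audio/transcriptions HTTP/1.1\r\n");
    req.extend_from_slice(format!("Host: {host}:{port}\r\n").as_bytes());
    req.extend_from_slice(
        format!("Content-Type: multipart/form-data; boundary={boundary}\r\n").as_bytes(),
    );
    req.extend_from_slice(format!("Content-Length: {}\r\n", body.len()).as_bytes());
    req.extend_from_slice(b"Connection: close\r\n\r\n");
    req.extend_from_slice(body);
    req
}

fn main() {
    // Placeholder bytes; a real client would read an audio file from disk.
    let audio = b"placeholder bytes, not real audio";
    let boundary = "example-boundary"; // hypothetical boundary string
    let body = build_multipart(boundary, "sample.wav", audio, "verbose_json");
    // Host and port are hypothetical; use whatever --host and --port were given.
    let request = build_request("127.0.0.1", 8080, boundary, &body);
    println!("request is {} bytes, body is {} bytes", request.len(), body.len());
}
```

Writing the returned bytes to a `std::net::TcpStream` connected to a running asr-server would submit the request; the sketch stops short of that so it runs without a server.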
Pure Rust Audio Decoding
Replaced FFmpeg (C dependency) with Symphonia (pure Rust)
Supports MP3, FLAC, AAC, OGG, and WAV without any system dependencies
No more brew install ffmpeg or build-ffmpeg feature flag needed
MLX Performance Optimizations
Fused RmsNorm (mlx_fast_rms_norm)
Fused scaled dot-product attention (mlx_fast_sdpa) with native GQA support
Strategic eval() placement to bound lazy computation graphs
Pre-transposed weights and precomputed MRoPE cos/sin table
~8% inference speedup on Apple Silicon (M4)
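For readers unfamiliar with what the fused RmsNorm kernel replaces, here is an unfused reference version in plain Rust. It is a minimal sketch of the standard RMS normalization formula, not the project's MLX code; the function name and the `eps` value are illustrative assumptions.

```rust
/// Reference (unfused) RMS normalization: y_i = x_i / sqrt(mean(x^2) + eps) * w_i.
/// A fused kernel produces the same result in a single pass over the data.
fn rms_norm(x: &[f32], weight: &[f32], eps: f32) -> Vec<f32> {
    assert_eq!(x.len(), weight.len(), "input and weight must have the same length");
    assert!(!x.is_empty(), "input must not be empty");
    // Mean of squares over the hidden dimension.
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    // Reciprocal of the root mean square, stabilized by eps.
    let inv_rms = 1.0 / (mean_sq + eps).sqrt();
    // Normalize, then apply the learned per-channel weight.
    x.iter().zip(weight).map(|(v, w)| v * inv_rms * w).collect()
}

fn main() {
    let x = [3.0_f32, 4.0];
    let w = [1.0_f32, 1.0];
    // eps = 1e-6 is an illustrative choice, not a value taken from the project.
    let y = rms_norm(&x, &w, 1e-6);
    println!("{y:?}");
}
```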
Bug Fixes
Fix attention scale in tch SDPA (multiply vs divide)
Fix GQA head expansion for tch backend
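Both fixes touch standard attention conventions, which the plain-Rust sketch below illustrates. It assumes the usual definitions: scores are multiplied by 1/sqrt(head_dim) (equivalently, divided by sqrt(head_dim)), and in GQA consecutive query heads share one KV head. These notes do not describe the exact bugs, and the function names are hypothetical, not taken from the tch backend.

```rust
/// Standard attention scale: scores are MULTIPLIED by this value.
fn attention_scale(head_dim: usize) -> f32 {
    1.0 / (head_dim as f32).sqrt()
}

/// Map a query head to the KV head it reads under grouped-query attention,
/// assuming consecutive query heads form one group.
fn kv_head_for_query(q_head: usize, n_q_heads: usize, n_kv_heads: usize) -> usize {
    assert!(n_kv_heads > 0 && n_q_heads % n_kv_heads == 0, "query heads must divide evenly");
    assert!(q_head < n_q_heads, "query head index out of range");
    let group_size = n_q_heads / n_kv_heads;
    q_head / group_size
}

/// Scaled dot-product scores of one query vector against a set of key vectors.
fn scaled_scores(q: &[f32], keys: &[Vec<f32>]) -> Vec<f32> {
    let scale = attention_scale(q.len());
    keys.iter()
        .map(|k| q.iter().zip(k).map(|(a, b)| a * b).sum::<f32>() * scale)
        .collect()
}

fn main() {
    // With 8 query heads and 2 KV heads, heads 0..=3 read KV head 0 and 4..=7 read KV head 1.
    for q_head in 0..8 {
        println!("query head {q_head} -> kv head {}", kv_head_for_query(q_head, 8, 2));
    }
    let scores = scaled_scores(&[1.0, 1.0, 1.0, 1.0], &[vec![2.0, 2.0, 2.0, 2.0]]);
    println!("{scores:?}");
}
```

Dividing by the scale instead of multiplying inflates the scores by a factor of head_dim, which saturates the softmax; that is the kind of error a "multiply vs divide" fix typically addresses.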
Performance (Apple M4 Mac Mini, 16GB)
| Model | Audio | CLI | API Server |
|-------|-------|-----|------------|
| 0.6B | 8.0s English | 2.35s | 2.10s |
| 0.6B | 3.5s English | 1.30s | 1.05s |
| 1.7B | 8.0s English | 6.26s | 5.80s |
| 1.7B | 3.5s English | 3.40s | 3.06s |
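One way to read these timings is as a real-time factor (processing time divided by audio duration, where values below 1.0 mean faster than real time). The sketch below computes it from the figures in the table; the function name is illustrative and the metric itself is not reported in these notes.

```rust
/// Real-time factor: seconds of processing per second of audio.
fn real_time_factor(processing_secs: f64, audio_secs: f64) -> f64 {
    assert!(audio_secs > 0.0, "audio duration must be positive");
    processing_secs / audio_secs
}

fn main() {
    // (model, audio seconds, CLI seconds, API server seconds), copied from the table.
    let rows = [
        ("0.6B", 8.0, 2.35, 2.10),
        ("0.6B", 3.5, 1.30, 1.05),
        ("1.7B", 8.0, 6.26, 5.80),
        ("1.7B", 3.5, 3.40, 3.06),
    ];
    for (model, audio, cli, api) in rows {
        println!(
            "{model} on {audio:.1}s audio: CLI RTF {:.2}, API server RTF {:.2}",
            real_time_factor(cli, audio),
            real_time_factor(api, audio)
        );
    }
}
```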