v0.2.0

Latest

juntao released this 28 Mar 01:21

0226270

What's New

OpenAI-Compatible API Server

New asr-server binary with HTTP API for audio transcription
POST /v1/audio/transcriptions — multipart file upload with json, text, and verbose_json response formats
GET /v1/models and GET /health endpoints
CLI options: --model-dir, --host, --port, --language, -v
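
A minimal client sketch for the transcription endpoint, using only the Python standard library. Assumptions not stated in these notes: the server is reachable at http://127.0.0.1:8000 (the default host and port are not given here), the upload field is named file and the format field response_format as in the OpenAI API, and no model field is required. Adjust these to match the actual server.

```python
import json
import os
import urllib.request
import uuid


def build_multipart(fields, file_field, filename, file_bytes,
                    content_type="application/octet-stream"):
    """Encode text fields plus one file as multipart/form-data.

    Returns (body_bytes, content_type_header).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\n'
        f'Content-Type: {content_type}\r\n\r\n'.encode("utf-8")
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path, base_url="http://127.0.0.1:8000", response_format="json"):
    """POST an audio file to /v1/audio/transcriptions and return the reply.

    base_url and the form field names are assumptions (see the text above).
    """
    with open(path, "rb") as f:
        audio = f.read()
    body, content_type = build_multipart(
        {"response_format": response_format},
        "file",
        os.path.basename(path),
        audio,
    )
    req = urllib.request.Request(
        f"{base_url}/v1/audio/transcriptions",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        raw = resp.read().decode("utf-8")
    # "text" returns plain text; "json" and "verbose_json" return JSON.
    return raw if response_format == "text" else json.loads(raw)
```

With a server running, transcribe("sample.wav", response_format="text") would return the transcript as a string; sample.wav is a placeholder file name.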

Pure Rust Audio Decoding

Replaced FFmpeg (C dependency) with Symphonia (pure Rust)
Supports MP3, FLAC, AAC, OGG, and WAV without any system dependencies
brew install ffmpeg and the build-ffmpeg feature flag are no longer needed

MLX Performance Optimizations

Fused RmsNorm (mlx_fast_rms_norm)
Fused scaled dot-product attention (mlx_fast_sdpa) with native GQA support
Strategic eval() placement to bound lazy computation graphs
Pre-transposed weights and precomputed MRoPE cos/sin table
~8% inference speedup on Apple Silicon (M4)
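
For reference, the math that a fused RmsNorm kernel computes in one pass can be written out unfused. This is a plain-Python sketch of the standard RMSNorm definition, not the MLX kernel itself; the eps default is an assumed value, not taken from this project.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Reference RMSNorm: scale x by 1/sqrt(mean(x^2) + eps), then by weight.

    eps is an assumed default; real models define their own value.
    """
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]
```

A fused implementation avoids materializing the intermediate mean and scaled tensors separately, which is where the saving over this step-by-step version would come from.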

Bug Fixes

Fix attention scale in tch SDPA (multiply vs divide)
Fix GQA head expansion for tch backend
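
The two fixes above touch conventions that are easy to get wrong. The sketch below shows the standard forms in plain Python, under the usual definitions: scores are multiplied by 1/sqrt(head_dim), and under grouped-query attention each KV head serves a contiguous group of query heads. The notes do not say which direction the original bug went, and the function names here are illustrative, not taken from the tch backend.

```python
import math


def kv_head_for(q_head, n_q_heads, n_kv_heads):
    """Map a query head index to the KV head it shares under GQA.

    Consecutive query heads share one KV head (interleaved repeat, not tiling).
    """
    group = n_q_heads // n_kv_heads
    return q_head // group


def attention_weights(q, keys):
    """Softmax over scaled dot products of one query against a list of keys."""
    scale = 1.0 / math.sqrt(len(q))  # scores are multiplied by this factor
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

With 8 query heads and 2 KV heads, query heads 0 to 3 map to KV head 0 and heads 4 to 7 map to KV head 1.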

Performance (Apple M4 Mac Mini, 16GB)

Model	Audio clip	CLI time	API Server time
0.6B	8.0s English	2.35s	2.10s
0.6B	3.5s English	1.30s	1.05s
1.7B	8.0s English	6.26s	5.80s
1.7B	3.5s English	3.40s	3.06s
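
One way to compare these rows is the real-time factor (processing time divided by audio duration). The short script below derives it from the figures in the table; the metric and the helper name are additions for illustration, not part of the release.

```python
def real_time_factor(processing_s, audio_s):
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    return processing_s / audio_s


# (model, audio seconds, CLI seconds, API server seconds), copied from the table.
rows = [
    ("0.6B", 8.0, 2.35, 2.10),
    ("0.6B", 3.5, 1.30, 1.05),
    ("1.7B", 8.0, 6.26, 5.80),
    ("1.7B", 3.5, 3.40, 3.06),
]

for model, audio_s, cli_s, api_s in rows:
    print(
        f"{model} {audio_s:.1f}s  "
        f"CLI RTF={real_time_factor(cli_s, audio_s):.2f}  "
        f"API RTF={real_time_factor(api_s, audio_s):.2f}"
    )
```

Every row comes out below 1.0, so both models transcribe these clips faster than real time on this machine.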
