Releases: second-state/qwen3_asr_rs
v0.2.0
What's New
OpenAI-Compatible API Server
- New `asr-server` binary with an HTTP API for audio transcription:
  - `POST /v1/audio/transcriptions`: multipart file upload with `json`, `text`, and `verbose_json` response formats
  - `GET /v1/models` and `GET /health` endpoints
- CLI options: `--model-dir`, `--host`, `--port`, `--language`, `-v`
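A minimal client sketch for the transcription endpoint, using only the Python standard library. It assumes the server accepts OpenAI-style form fields named `file` and `response_format` (the notes above only say "multipart file upload"), and `http://localhost:8080` is a placeholder for whatever `--host`/`--port` you start `asr-server` with. `build_transcription_request` is a hypothetical helper; it builds the request without sending it.

```python
import urllib.request
import uuid


def build_transcription_request(
    audio: bytes,
    filename: str,
    base_url: str = "http://localhost:8080",  # placeholder: match asr-server --host/--port
    response_format: str = "json",  # "json", "text", or "verbose_json"
) -> urllib.request.Request:
    """Build (but do not send) a multipart POST to /v1/audio/transcriptions.

    The form field names `file` and `response_format` follow OpenAI's
    transcription API; asr-server accepting them is an assumption.
    """
    boundary = uuid.uuid4().hex
    # Text field selecting the response format.
    fmt_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="response_format"\r\n\r\n'
        f"{response_format}\r\n"
    ).encode()
    # File field header; the raw audio bytes follow it.
    file_head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    # Terminate the file part, then close the multipart body.
    closing = f"\r\n--{boundary}--\r\n".encode()
    body = fmt_part + file_head + audio + closing
    return urllib.request.Request(
        f"{base_url}/v1/audio/transcriptions",
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```

To send it, read your audio file as bytes, pass the result to `urllib.request.urlopen(req)`, and decode the response body; with `response_format="text"` the body should be the plain transcript.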
Pure Rust Audio Decoding
- Replaced FFmpeg (C dependency) with Symphonia (pure Rust)
- Supports MP3, FLAC, AAC, OGG, and WAV without any system dependencies
- `brew install ffmpeg` and the `build-ffmpeg` feature flag are no longer needed
MLX Performance Optimizations
- Fused RmsNorm (`mlx_fast_rms_norm`)
- Fused scaled dot-product attention (`mlx_fast_sdpa`) with native GQA support
- Explicit `eval()` calls at chosen points to bound the size of lazy computation graphs
- Pre-transposed weights and a precomputed MRoPE cos/sin table
- ~8% inference speedup on Apple Silicon (M4)
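For reference, this is what a fused RmsNorm kernel computes, written as an unfused pure-Python sketch. It assumes the standard RMSNorm definition with a learned per-channel weight; the `eps=1e-6` default is an assumption (the real value comes from the model config), and this is not the project's MLX code.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Unfused reference RMSNorm over one vector (the last dimension).

    y_i = x_i / sqrt(mean(x^2) + eps) * weight_i
    A fused kernel produces the same result in one pass instead of
    separate square, mean, rsqrt, and multiply ops.
    """
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]
```

With unit weights and `eps=0`, the output always has a mean square of exactly 1, which is a quick sanity check when comparing a fused kernel against this reference.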
Bug Fixes
- Fix attention scale in the tch SDPA path (multiply vs. divide)
- Fix GQA head expansion in the tch backend
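Both fixes touch the two places attention code most often goes wrong: the score scale and the mapping of query heads onto shared KV heads. Below is a pure-Python reference sketch (not the project's tch or MLX code) under the usual conventions: scores are divided by sqrt(d), and with G = n_q_heads / n_kv_heads, query head h reads KV head h // G, matching a repeat-interleave expansion of the KV heads. The notes do not say which direction the scale bug went; this only shows the correct convention.

```python
import math


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def sdpa_gqa(q, k, v):
    """Reference scaled dot-product attention with grouped-query attention.

    q: [n_q_heads][seq_q][d]; k: [n_kv_heads][seq_k][d]; v: [n_kv_heads][seq_k][d_v].
    No causal mask is applied (illustration only).
    """
    n_q, n_kv = len(q), len(k)
    if n_q % n_kv != 0:
        raise ValueError("query heads must be a multiple of KV heads")
    group = n_q // n_kv  # number of query heads sharing each KV head
    d = len(q[0][0])
    d_v = len(v[0][0])
    scale = 1.0 / math.sqrt(d)  # scores = (q . k) / sqrt(d)
    out = []
    for h in range(n_q):
        kv = h // group  # GQA expansion: query head h reads KV head h // group
        head_out = []
        for qi in q[h]:
            scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k[kv]]
            weights = softmax(scores)
            head_out.append(
                [sum(w * vj[c] for w, vj in zip(weights, v[kv])) for c in range(d_v)]
            )
        out.append(head_out)
    return out
```

A scale bug shows up as attention that is far too peaked (scaling by sqrt(d)) or too flat; a head-mapping bug shows up as query heads reading the wrong KV head, which the `h // group` line makes explicit.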
Performance (Apple M4 Mac mini, 16 GB)
| Model | Audio | CLI | API Server |
|---|---|---|---|
| 0.6B | 8.0s English | 2.35s | 2.10s |
| 0.6B | 3.5s English | 1.30s | 1.05s |
| 1.7B | 8.0s English | 6.26s | 5.80s |
| 1.7B | 3.5s English | 3.40s | 3.06s |
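Dividing each time by the clip length gives a real-time factor (RTF), which makes rows with different clip lengths comparable. A small sketch, assuming the CLI and API Server columns are end-to-end processing times in seconds (the table does not state what each timing includes). For example, the 0.6B model handles the 8.0 s clip at an RTF of about 0.29 via the CLI, while the 1.7B model on the 3.5 s clip is close to real time (3.40 / 3.5 ≈ 0.97).

```python
# Timings copied from the table above: (model, audio s, CLI s, API server s)
BENCH = [
    ("0.6B", 8.0, 2.35, 2.10),
    ("0.6B", 3.5, 1.30, 1.05),
    ("1.7B", 8.0, 6.26, 5.80),
    ("1.7B", 3.5, 3.40, 3.06),
]


def rtf(processing_s: float, audio_s: float) -> float:
    """Real-time factor: processing time / audio length (< 1.0 is faster than real time)."""
    return processing_s / audio_s


for model, audio_s, cli_s, api_s in BENCH:
    print(
        f"{model} on {audio_s:.1f}s audio: "
        f"CLI RTF {rtf(cli_s, audio_s):.2f}, API RTF {rtf(api_s, audio_s):.2f}, "
        f"API saves {cli_s - api_s:.2f}s"
    )
```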
v0.1.9
v0.1.8
- Add `install.sh` one-step installer (detects the platform and downloads the binary, model, and sample audio)
- Add Linux x86_64 CUDA release build
- Pre-build tokenizers for 0.6B and 1.7B models in release assets
- Remove Python dependency from installer
- Update README Quick Start to use install script
v0.1.7
What's New
- Self-contained release zips: each zip includes everything needed, so no separate downloads are required
- Embedded rpath: Linux binaries find the bundled `libtorch/lib` automatically, so `LD_LIBRARY_PATH` is not needed
- MLX Metal GPU: the macOS binary uses Apple MLX for native Metal acceleration
Release Artifacts
| File | Platform | Contents |
|---|---|---|
| asr-linux-x86_64.zip | Linux x86_64 | asr + libtorch/ (CPU) |
| asr-linux-aarch64.zip | Linux ARM64 | asr + libtorch/ (CPU) |
| asr-macos-aarch64.zip | macOS Apple Silicon | asr + mlx.metallib (Metal GPU) |
For CUDA GPU acceleration, download CUDA libtorch and build from source. See README.
Quick Start
# Download and extract (macOS example)
curl -LO https://github.com/second-state/qwen3_asr_rs/releases/download/v0.1.7/asr-macos-aarch64.zip
unzip asr-macos-aarch64.zip
# Download model
pip install huggingface_hub transformers
huggingface-cli download Qwen/Qwen3-ASR-0.6B --local-dir Qwen3-ASR-0.6B
python -c "
from transformers import AutoTokenizer
tok = AutoTokenizer.from_pretrained('Qwen3-ASR-0.6B', trust_remote_code=True)
tok.backend_tokenizer.save('Qwen3-ASR-0.6B/tokenizer.json')
"
# Transcribe
./asr-macos-aarch64/asr Qwen3-ASR-0.6B input.wav
v0.1.6
What's New
- Self-contained release zips: each platform zip now includes everything needed to run, so no separate downloads are required
  - Linux: `asr` binary + bundled `libtorch/`
  - macOS: `asr` binary + `mlx.metallib`
- Embedded rpath: Linux binaries find `libtorch/lib` relative to themselves, so `LD_LIBRARY_PATH` is not needed
- CUDA support: Linux x86_64 CUDA 12.8 build for NVIDIA GPU acceleration
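On Linux, "relative to themselves" is normally expressed with the `$ORIGIN` token in the binary's RPATH/RUNPATH. The sketch below assumes an entry of `$ORIGIN/libtorch/lib` (the exact value these releases embed is not stated) and shows which directory the loader ends up searching; `expand_origin` is a hypothetical helper for illustration only.

```python
import os


def expand_origin(rpath_entry: str, binary_path: str) -> str:
    """Resolve an rpath entry the way the Linux dynamic loader treats $ORIGIN.

    $ORIGIN (or ${ORIGIN}) expands to the directory containing the binary,
    so the library path follows the binary wherever the zip is extracted.
    """
    origin = os.path.dirname(os.path.abspath(binary_path))
    expanded = rpath_entry.replace("${ORIGIN}", origin).replace("$ORIGIN", origin)
    return os.path.normpath(expanded)
```

To see the entry actually embedded in a downloaded binary, run `readelf -d asr | grep -iE 'rpath|runpath'`.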
Release Artifacts
| File | Platform | Contents |
|---|---|---|
| asr-linux-x86_64.zip | Linux x86_64 | asr + libtorch/ (CPU) |
| asr-linux-x86_64-cuda.zip | Linux x86_64 | asr + libtorch/ (CUDA 12.8) |
| asr-linux-aarch64.zip | Linux ARM64 | asr + libtorch/ (CPU) |
| asr-macos-aarch64.zip | macOS Apple Silicon | asr + mlx.metallib (Metal GPU) |
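Choosing a zip from this table is a lookup on OS, CPU architecture, and whether you want CUDA. A hypothetical helper (`pick_artifact` and `download_url` are illustrative names, not part of the project, and this is not how `install.sh` is implemented) that maps `platform.system()`/`platform.machine()` onto the artifact names above and builds a URL in the same form as the Quick Start `curl` command:

```python
import platform

RELEASE_BASE = "https://github.com/second-state/qwen3_asr_rs/releases/download"

# Artifact names from the release table, keyed by (OS, arch, CUDA?).
ARTIFACTS = {
    ("Linux", "x86_64", False): "asr-linux-x86_64.zip",
    ("Linux", "x86_64", True): "asr-linux-x86_64-cuda.zip",
    ("Linux", "aarch64", False): "asr-linux-aarch64.zip",
    ("Darwin", "aarch64", False): "asr-macos-aarch64.zip",
}

# platform.machine() spellings normalized to the arch names used in the zips.
ARCH_ALIASES = {"x86_64": "x86_64", "amd64": "x86_64", "aarch64": "aarch64", "arm64": "aarch64"}


def pick_artifact(system=None, machine=None, cuda=False):
    """Return the release zip name for this (or the given) platform."""
    system = system or platform.system()
    raw = machine or platform.machine()
    arch = ARCH_ALIASES.get(raw.lower())
    name = ARTIFACTS.get((system, arch, cuda))
    if name is None:
        raise ValueError(f"no prebuilt asr zip for {system}/{raw} (cuda={cuda})")
    return name


def download_url(tag, name):
    """Build a release download URL, e.g. for use with curl -LO."""
    return f"{RELEASE_BASE}/{tag}/{name}"
```

Note that macOS reports Apple Silicon as `arm64` while Linux reports `aarch64`, which is why the alias table normalizes both.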
Quick Start
# Download and extract
curl -LO https://github.com/second-state/qwen3_asr_rs/releases/download/v0.1.6/asr-macos-aarch64.zip
unzip asr-macos-aarch64.zip
# Download model
pip install huggingface_hub transformers
huggingface-cli download Qwen/Qwen3-ASR-0.6B --local-dir Qwen3-ASR-0.6B
python -c "
from transformers import AutoTokenizer
tok = AutoTokenizer.from_pretrained('Qwen3-ASR-0.6B', trust_remote_code=True)
tok.backend_tokenizer.save('Qwen3-ASR-0.6B/tokenizer.json')
"
# Transcribe
./asr-macos-aarch64/asr Qwen3-ASR-0.6B input.wav
v0.1.5 release
What's Changed
Full Changelog: v0.1.4...v0.1.5
v0.1.4
What's New
- Apple MLX backend: Native Metal GPU acceleration on macOS Apple Silicon — no libtorch dependency needed
- CUDA release binary: Linux x86_64 CUDA 12.8 build for NVIDIA GPU acceleration
- 1.25x–1.80x faster on macOS: MLX Metal GPU vs libtorch CPU on Apple M4
- Fix: ship `mlx.metallib` alongside the macOS binary so Metal GPU kernels are found at runtime
Release Artifacts
| File | Platform | Backend |
|---|---|---|
| asr-linux-x86_64.zip | Linux x86_64 | libtorch (CPU) |
| asr-linux-x86_64-cuda.zip | Linux x86_64 | libtorch (CUDA 12.8) |
| asr-linux-aarch64.zip | Linux ARM64 | libtorch (CPU) |
| asr-macos-aarch64.zip | macOS Apple Silicon | MLX (Metal GPU) |
Each zip extracts into a named directory containing the asr binary (and mlx.metallib for macOS).
Build from Source
# libtorch backend (default)
cargo build --release --features build-ffmpeg
# MLX backend (macOS Apple Silicon)
git submodule update --init --recursive
cargo build --release --no-default-features --features mlx,build-ffmpeg
v0.1.3
What's New
- Apple MLX backend: Native Metal GPU acceleration on macOS Apple Silicon — no libtorch dependency needed
- Dual backend architecture: unified `Tensor` abstraction supporting both `tch-backend` (default, cross-platform) and `mlx` (macOS)
- CUDA release binary: Linux x86_64 CUDA build for NVIDIA GPU acceleration
- 1.25x–1.80x faster on macOS: MLX Metal GPU vs libtorch CPU on Apple M4
Release Artifacts
| File | Platform | Backend |
|---|---|---|
| asr-linux-x86_64.zip | Linux x86_64 | libtorch (CPU) |
| asr-linux-x86_64-cuda.zip | Linux x86_64 | libtorch (CUDA 12.8) |
| asr-linux-aarch64.zip | Linux ARM64 | libtorch (CPU) |
| asr-macos-aarch64.zip | macOS Apple Silicon | MLX (Metal GPU) |
Each zip extracts into a directory containing the asr binary.
Build from Source
# libtorch backend (default)
cargo build --release --features build-ffmpeg
# MLX backend (macOS Apple Silicon)
git submodule update --init --recursive
cargo build --release --no-default-features --features mlx,build-ffmpeg