Releases: second-state/qwen3_asr_rs

v0.2.0

28 Mar 01:21

What's New

OpenAI-Compatible API Server

  • New asr-server binary with HTTP API for audio transcription
  • POST /v1/audio/transcriptions — multipart file upload with json, text, and verbose_json response formats
  • GET /v1/models and GET /health endpoints
  • CLI options: --model-dir, --host, --port, --language, -v
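A minimal client sketch for the transcription endpoint, using only the Python standard library. The multipart field names `file` and `response_format` follow the OpenAI transcription API convention and are assumptions here; the release notes only state that the endpoint accepts a multipart file upload. The helper names `build_multipart` and `transcribe` are hypothetical.

```python
import os
import urllib.request
import uuid


def build_multipart(fields, file_field, filename, file_bytes,
                    content_type="application/octet-stream"):
    """Encode text fields plus one file as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode("utf-8")
        )
    parts.append(
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="{file_field}"; '
         f'filename="{filename}"\r\n'
         f"Content-Type: {content_type}\r\n\r\n").encode("utf-8")
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(base_url, audio_path, response_format="json"):
    """POST an audio file to /v1/audio/transcriptions; return the response body as text."""
    with open(audio_path, "rb") as f:
        audio = f.read()
    body, ctype = build_multipart(
        {"response_format": response_format},  # assumed field name (OpenAI convention)
        "file",                                # assumed field name (OpenAI convention)
        os.path.basename(audio_path),
        audio,
    )
    req = urllib.request.Request(
        f"{base_url}/v1/audio/transcriptions",
        data=body,
        headers={"Content-Type": ctype},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

With a server running, a call would look like `transcribe("http://HOST:PORT", "input.wav", "text")`, where HOST and PORT are whatever was passed to --host and --port.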

Pure Rust Audio Decoding

  • Replaced FFmpeg (C dependency) with Symphonia (pure Rust)
  • Supports MP3, FLAC, AAC, OGG, and WAV without any system dependencies
  • No more brew install ffmpeg or build-ffmpeg feature flag needed

MLX Performance Optimizations

  • Fused RmsNorm (mlx_fast_rms_norm)
  • Fused scaled dot-product attention (mlx_fast_sdpa) with native GQA support
  • Strategic eval() placement to bound lazy computation graphs
  • Pre-transposed weights and precomputed MRoPE cos/sin table
  • ~8% inference speedup on Apple Silicon (M4)
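For reference, the operation that a fused RmsNorm kernel computes in one pass can be written out step by step. This is the standard RMSNorm definition in plain Python, not the project's code; the `eps` default is an assumption.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Reference RMSNorm: scale x by the inverse of its root mean square, then by weight."""
    mean_sq = sum(v * v for v in x) / len(x)   # mean of squared elements
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)   # eps guards against division by zero
    return [v * inv_rms * w for v, w in zip(x, weight)]
```

An unfused implementation runs these steps as separate tensor operations; a fused kernel produces the same result without materializing the intermediates.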

Bug Fixes

  • Fix attention scale in tch SDPA (multiply vs divide)
  • Fix GQA head expansion for tch backend

Performance (Apple M4 Mac Mini, 16GB)

Model   Audio          CLI     API Server
0.6B    8.0s English   2.35s   2.10s
0.6B    3.5s English   1.30s   1.05s
1.7B    8.0s English   6.26s   5.80s
1.7B    3.5s English   3.40s   3.06s
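One way to compare the rows is the real-time factor (processing time divided by audio duration), where values below 1.0 mean faster than real time. The sketch assumes the CLI and API Server columns are wall-clock transcription times in seconds, which the table does not state explicitly.

```python
# (model, audio seconds, CLI seconds, API server seconds), copied from the table above
ROWS = [
    ("0.6B", 8.0, 2.35, 2.10),
    ("0.6B", 3.5, 1.30, 1.05),
    ("1.7B", 8.0, 6.26, 5.80),
    ("1.7B", 3.5, 3.40, 3.06),
]


def real_time_factor(processing_s, audio_s):
    """Processing time per second of audio; below 1.0 is faster than real time."""
    return processing_s / audio_s


for model, audio_s, cli_s, api_s in ROWS:
    print(f"{model} {audio_s}s audio: "
          f"CLI RTF {real_time_factor(cli_s, audio_s):.2f}, "
          f"API RTF {real_time_factor(api_s, audio_s):.2f}")
```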

v0.1.9

06 Mar 03:30

All Linux builds now bundle libtorch from libtorch-releases. Added ARM64 CUDA (Jetson) build.

v0.1.8

02 Mar 06:52

  • Add install.sh one-step installer (detects platform, downloads binary + model + sample audio)
  • Add Linux x86_64 CUDA release build
  • Pre-build tokenizers for 0.6B and 1.7B models in release assets
  • Remove Python dependency from installer
  • Update README Quick Start to use install script
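The platform detection step can be pictured as a lookup from OS and CPU architecture to a release asset. This is a hypothetical sketch, not the contents of install.sh: the asset names are the ones listed in the v0.1.6 and v0.1.7 release tables, and whether the installer uses the same mapping is an assumption.

```python
import platform

# Asset names as listed in earlier release tables; the mapping itself is assumed.
ASSETS = {
    ("Linux", "x86_64"): "asr-linux-x86_64.zip",
    ("Linux", "aarch64"): "asr-linux-aarch64.zip",
    ("Darwin", "arm64"): "asr-macos-aarch64.zip",
}


def asset_for(system=None, machine=None):
    """Return the release zip name for a platform, defaulting to the current one."""
    system = system or platform.system()     # e.g. "Linux", "Darwin"
    machine = machine or platform.machine()  # e.g. "x86_64", "aarch64", "arm64"
    try:
        return ASSETS[(system, machine)]
    except KeyError:
        raise ValueError(f"no prebuilt binary for {system}/{machine}") from None
```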

v0.1.7

01 Mar 07:52

What's New

  • Self-contained release zips: Each zip includes everything needed — no separate downloads
  • Embedded rpath: Linux binaries find bundled libtorch/lib automatically — no LD_LIBRARY_PATH needed
  • MLX Metal GPU: macOS binary uses Apple MLX for native Metal acceleration

Release Artifacts

File                    Platform              Contents
asr-linux-x86_64.zip    Linux x86_64          asr + libtorch/ (CPU)
asr-linux-aarch64.zip   Linux ARM64           asr + libtorch/ (CPU)
asr-macos-aarch64.zip   macOS Apple Silicon   asr + mlx.metallib (Metal GPU)

For CUDA GPU acceleration, download CUDA libtorch and build from source. See README.

Quick Start

# Download and extract (macOS example)
curl -LO https://github.com/second-state/qwen3_asr_rs/releases/download/v0.1.7/asr-macos-aarch64.zip
unzip asr-macos-aarch64.zip

# Download model
pip install huggingface_hub transformers
huggingface-cli download Qwen/Qwen3-ASR-0.6B --local-dir Qwen3-ASR-0.6B
python -c "
from transformers import AutoTokenizer
tok = AutoTokenizer.from_pretrained('Qwen3-ASR-0.6B', trust_remote_code=True)
tok.backend_tokenizer.save('Qwen3-ASR-0.6B/tokenizer.json')
"

# Transcribe
./asr-macos-aarch64/asr Qwen3-ASR-0.6B input.wav

v0.1.6

01 Mar 07:19

What's New

  • Self-contained release zips: Each platform zip now includes everything needed to run — no separate downloads required
    • Linux: asr binary + bundled libtorch/
    • macOS: asr binary + mlx.metallib
  • Embedded rpath: Linux binaries find libtorch/lib relative to themselves — no LD_LIBRARY_PATH needed
  • CUDA support: Linux x86_64 CUDA 12.8 build for NVIDIA GPU acceleration

Release Artifacts

File                        Platform              Contents
asr-linux-x86_64.zip        Linux x86_64          asr + libtorch/ (CPU)
asr-linux-x86_64-cuda.zip   Linux x86_64          asr + libtorch/ (CUDA 12.8)
asr-linux-aarch64.zip       Linux ARM64           asr + libtorch/ (CPU)
asr-macos-aarch64.zip       macOS Apple Silicon   asr + mlx.metallib (Metal GPU)

Quick Start

# Download and extract
curl -LO https://github.com/second-state/qwen3_asr_rs/releases/download/v0.1.6/asr-macos-aarch64.zip
unzip asr-macos-aarch64.zip

# Download model
pip install huggingface_hub transformers
huggingface-cli download Qwen/Qwen3-ASR-0.6B --local-dir Qwen3-ASR-0.6B
python -c "
from transformers import AutoTokenizer
tok = AutoTokenizer.from_pretrained('Qwen3-ASR-0.6B', trust_remote_code=True)
tok.backend_tokenizer.save('Qwen3-ASR-0.6B/tokenizer.json')
"

# Transcribe
./asr-macos-aarch64/asr Qwen3-ASR-0.6B input.wav

v0.1.5 release

01 Mar 06:29
75e4795

What's Changed

  • Fix where_self argument order in attention mask construction by @juntao in #6

Full Changelog: v0.1.4...v0.1.5

v0.1.4

22 Feb 06:20

What's New

  • Apple MLX backend: Native Metal GPU acceleration on macOS Apple Silicon — no libtorch dependency needed
  • CUDA release binary: Linux x86_64 CUDA 12.8 build for NVIDIA GPU acceleration
  • 1.25x–1.80x faster on macOS: MLX Metal GPU vs libtorch CPU on Apple M4
  • Fix: Ship mlx.metallib alongside macOS binary so Metal GPU kernels are found at runtime

Release Artifacts

File                        Platform              Backend
asr-linux-x86_64.zip        Linux x86_64          libtorch (CPU)
asr-linux-x86_64-cuda.zip   Linux x86_64          libtorch (CUDA 12.8)
asr-linux-aarch64.zip       Linux ARM64           libtorch (CPU)
asr-macos-aarch64.zip       macOS Apple Silicon   MLX (Metal GPU)

Each zip extracts into a named directory containing the asr binary (and mlx.metallib for macOS).

Build from Source

# libtorch backend (default)
cargo build --release --features build-ffmpeg

# MLX backend (macOS Apple Silicon)
git submodule update --init --recursive
cargo build --release --no-default-features --features mlx,build-ffmpeg

v0.1.3

22 Feb 04:04
e64d0f1

What's New

  • Apple MLX backend: Native Metal GPU acceleration on macOS Apple Silicon — no libtorch dependency needed
  • Dual backend architecture: Unified Tensor abstraction supporting both tch-backend (default, cross-platform) and mlx (macOS)
  • CUDA release binary: Linux x86_64 CUDA build for NVIDIA GPU acceleration
  • 1.25x–1.80x faster on macOS: MLX Metal GPU vs libtorch CPU on Apple M4

Release Artifacts

File                        Platform              Backend
asr-linux-x86_64.zip        Linux x86_64          libtorch (CPU)
asr-linux-x86_64-cuda.zip   Linux x86_64          libtorch (CUDA 12.8)
asr-linux-aarch64.zip       Linux ARM64           libtorch (CPU)
asr-macos-aarch64.zip       macOS Apple Silicon   MLX (Metal GPU)

Each zip extracts into a directory containing the asr binary.

Build from Source

# libtorch backend (default)
cargo build --release --features build-ffmpeg

# MLX backend (macOS Apple Silicon)
git submodule update --init --recursive
cargo build --release --no-default-features --features mlx,build-ffmpeg

v0.1.2

22 Feb 02:04

  • Fix SIGILL crash on x86_64 by removing target/ from cargo cache (prevents cross-runner CPU feature mismatch with build-ffmpeg)
  • Cap libtorch CPU ISA to AVX2 in CI

v0.1.1

21 Feb 20:53

  • Fix outdated documentation about audio preprocessing pipeline
  • Package release artifacts as .zip files