Releases · second-state/qwen3_asr_rs

28 Mar 01:21

juntao

v0.2.0

0226270

v0.2.0 Latest

What's New

OpenAI-Compatible API Server

New asr-server binary with HTTP API for audio transcription
POST /v1/audio/transcriptions — multipart file upload with json, text, and verbose_json response formats
GET /v1/models and GET /health endpoints
CLI options: --model-dir, --host, --port, --language, -v
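As a rough illustration of the transcription endpoint, the sketch below uploads a file with Python's standard library only. The form field names (`file`, `response_format`) are assumed from OpenAI's API convention, and the base URL is a placeholder: the release notes do not state a default port. OpenAI clients usually also send a `model` field; whether asr-server requires it is not stated here.

```python
import json
import urllib.request
import uuid


def build_multipart(fields, file_field, filename, file_bytes):
    """Encode form fields plus one file as multipart/form-data.

    Returns (body_bytes, content_type_header).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append((
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'
        ).encode("utf-8"))
    file_header = (
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'
    ).encode("utf-8")
    parts.append(file_header + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path, base_url="http://127.0.0.1:8080", response_format="json"):
    """POST an audio file to /v1/audio/transcriptions and return the reply.

    base_url is a placeholder; use the --host/--port the server was started with.
    """
    with open(path, "rb") as f:
        audio = f.read()
    body, content_type = build_multipart(
        {"response_format": response_format},  # field name assumed (OpenAI convention)
        "file",                                # field name assumed (OpenAI convention)
        path.rsplit("/", 1)[-1],
        audio,
    )
    req = urllib.request.Request(
        base_url + "/v1/audio/transcriptions",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        text = resp.read().decode("utf-8")
    # The "text" format is returned as a plain string; JSON formats are parsed.
    return text if response_format == "text" else json.loads(text)
```

With a server running, `transcribe("input.wav", response_format="text")` would return the transcript as a string, assuming the field names above match what the server expects.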

Pure Rust Audio Decoding

Replaced FFmpeg (C dependency) with Symphonia (pure Rust)
Supports MP3, FLAC, AAC, OGG, and WAV without any system dependencies
brew install ffmpeg and the build-ffmpeg feature flag are no longer needed

MLX Performance Optimizations

Fused RmsNorm (mlx_fast_rms_norm)
Fused scaled dot-product attention (mlx_fast_sdpa) with native GQA support
Strategic eval() placement to bound lazy computation graphs
Pre-transposed weights and precomputed MRoPE cos/sin table
~8% inference speedup on Apple Silicon (M4)
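For readers unfamiliar with the operations being fused or precomputed, here is an unfused pure-Python reference for RMS normalization and a precomputed rotary cos/sin table. It is only a sketch of the math, not the project's MLX code: the `eps` and `base` defaults are assumed values, and the table is plain one-dimensional RoPE rather than the multi-axis MRoPE variant the release mentions.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Unfused reference: scale x by the reciprocal of its root mean square."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def rope_table(max_pos, head_dim, base=10000.0):
    """Precompute cos/sin once for every (position, frequency) pair."""
    half = head_dim // 2
    inv_freq = [base ** (-2.0 * i / head_dim) for i in range(half)]
    cos = [[math.cos(p * f) for f in inv_freq] for p in range(max_pos)]
    sin = [[math.sin(p * f) for f in inv_freq] for p in range(max_pos)]
    return cos, sin
```

A fused kernel computes the same normalization in one pass on the GPU, and a precomputed table avoids recomputing the trigonometric values at every decoding step.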

Bug Fixes

Fix attention scale in tch SDPA (multiply vs divide)
Fix GQA head expansion for tch backend
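The release notes do not spell out the details of either fix, so the following is only a sketch of the conventional definitions involved: attention scores are multiplied by 1/sqrt(head_dim), and under grouped-query attention each key/value head is shared by a contiguous group of query heads. The function names are illustrative, not taken from the codebase.

```python
import math


def attention_scale(head_dim):
    """Factor that raw q·k scores are multiplied by: 1 / sqrt(head_dim)."""
    return 1.0 / math.sqrt(head_dim)


def kv_head_for(query_head, num_q_heads, num_kv_heads):
    """Map a query head to the key/value head it shares under GQA."""
    assert num_q_heads % num_kv_heads == 0
    group_size = num_q_heads // num_kv_heads
    return query_head // group_size


def expand_kv_heads(kv, num_q_heads):
    """Repeat each KV head so there is one entry per query head."""
    num_kv_heads = len(kv)
    return [kv[kv_head_for(h, num_q_heads, num_kv_heads)] for h in range(num_q_heads)]
```

Multiplying the scores by sqrt(head_dim) instead, or tiling the KV heads in the wrong order, still yields tensors of the expected shape, which is why mistakes of this kind tend to show up as degraded output rather than as a crash.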

Performance (Apple M4 Mac Mini, 16GB)

| Model | Audio | CLI | API Server |
|-------|-------|-----|------------|
| 0.6B | 8.0s English | 2.35s | 2.10s |
| 0.6B | 3.5s English | 1.30s | 1.05s |
| 1.7B | 8.0s English | 6.26s | 5.80s |
| 1.7B | 3.5s English | 3.40s | 3.06s |
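One way to read these timings is as a real-time factor (processing time divided by audio duration), where values below 1.0 mean faster than real time. The short sketch below computes it from the figures in the table; the metric itself is an added convenience and is not reported in the release.

```python
# (model, audio seconds, CLI seconds, API server seconds), copied from the table
ROWS = [
    ("0.6B", 8.0, 2.35, 2.10),
    ("0.6B", 3.5, 1.30, 1.05),
    ("1.7B", 8.0, 6.26, 5.80),
    ("1.7B", 3.5, 3.40, 3.06),
]


def real_time_factor(processing_s, audio_s):
    """Processing time per second of audio; below 1.0 is faster than real time."""
    return processing_s / audio_s


if __name__ == "__main__":
    for model, audio_s, cli_s, api_s in ROWS:
        cli_rtf = real_time_factor(cli_s, audio_s)
        api_rtf = real_time_factor(api_s, audio_s)
        print(f"{model} {audio_s}s clip: CLI RTF {cli_rtf:.2f}, API RTF {api_rtf:.2f}")
```

For the 0.6B model on the 8.0s clip, for instance, the CLI figure works out to 2.35 / 8.0, roughly 0.29.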

Assets 7

06 Mar 03:30

juntao

v0.1.9

c818a50

v0.1.9

All Linux builds now bundle libtorch from libtorch-releases. Added ARM64 CUDA (Jetson) build.

Assets 7

02 Mar 06:52

juntao

v0.1.8

86ff275

v0.1.8

Add install.sh one-step installer (detects platform, downloads binary + model + sample audio)
Add Linux x86_64 CUDA release build
Pre-build tokenizers for 0.6B and 1.7B models in release assets
Remove Python dependency from installer
Update README Quick Start to use install script
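The platform detection step can be pictured roughly as follows. This is a Python sketch of the idea, not the contents of install.sh: the asset names are the ones listed for the earlier zip releases and are assumed to carry over, and CUDA detection is left out.

```python
import platform

# Asset names as listed in earlier release notes; assumed unchanged here.
ASSETS = {
    ("Linux", "x86_64"): "asr-linux-x86_64.zip",
    ("Linux", "aarch64"): "asr-linux-aarch64.zip",
    ("Darwin", "arm64"): "asr-macos-aarch64.zip",
}


def asset_for(system=None, machine=None):
    """Pick the release zip for an OS/architecture pair.

    Defaults to the current machine when no arguments are given.
    """
    system = system or platform.system()
    machine = machine or platform.machine()
    try:
        return ASSETS[(system, machine)]
    except KeyError:
        raise RuntimeError(f"no prebuilt binary for {system}/{machine}") from None
```

Calling `asset_for()` with no arguments uses the values reported by the running interpreter; unsupported combinations raise an error rather than guessing.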

Assets 6

01 Mar 07:52

juntao

v0.1.7

8f35456

v0.1.7

What's New

Self-contained release zips: Each zip includes everything needed — no separate downloads
Embedded rpath: Linux binaries find bundled libtorch/lib automatically — no LD_LIBRARY_PATH needed
MLX Metal GPU: macOS binary uses Apple MLX for native Metal acceleration

Release Artifacts

| File | Platform | Contents |
|------|----------|----------|
| `asr-linux-x86_64.zip` | Linux x86_64 | `asr` + `libtorch/` (CPU) |
| `asr-linux-aarch64.zip` | Linux ARM64 | `asr` + `libtorch/` (CPU) |
| `asr-macos-aarch64.zip` | macOS Apple Silicon | `asr` + `mlx.metallib` (Metal GPU) |

For CUDA GPU acceleration, download CUDA libtorch and build from source. See README.

Quick Start

# Download and extract (macOS example)
curl -LO https://github.com/second-state/qwen3_asr_rs/releases/download/v0.1.7/asr-macos-aarch64.zip
unzip asr-macos-aarch64.zip

# Download model
pip install huggingface_hub transformers
huggingface-cli download Qwen/Qwen3-ASR-0.6B --local-dir Qwen3-ASR-0.6B
python -c "
from transformers import AutoTokenizer
tok = AutoTokenizer.from_pretrained('Qwen3-ASR-0.6B', trust_remote_code=True)
tok.backend_tokenizer.save('Qwen3-ASR-0.6B/tokenizer.json')
"

# Transcribe
./asr-macos-aarch64/asr Qwen3-ASR-0.6B input.wav

Assets 5

01 Mar 07:19

juntao

v0.1.6

e497334

v0.1.6

What's New

Self-contained release zips: Each platform zip now includes everything needed to run — no separate downloads required
- Linux: asr binary + bundled libtorch/
- macOS: asr binary + mlx.metallib
Embedded rpath: Linux binaries find libtorch/lib relative to themselves — no LD_LIBRARY_PATH needed
CUDA support: Linux x86_64 CUDA 12.8 build for NVIDIA GPU acceleration

Release Artifacts

| File | Platform | Contents |
|------|----------|----------|
| `asr-linux-x86_64.zip` | Linux x86_64 | `asr` + `libtorch/` (CPU) |
| `asr-linux-x86_64-cuda.zip` | Linux x86_64 | `asr` + `libtorch/` (CUDA 12.8) |
| `asr-linux-aarch64.zip` | Linux ARM64 | `asr` + `libtorch/` (CPU) |
| `asr-macos-aarch64.zip` | macOS Apple Silicon | `asr` + `mlx.metallib` (Metal GPU) |

Quick Start

# Download and extract
curl -LO https://github.com/second-state/qwen3_asr_rs/releases/download/v0.1.6/asr-macos-aarch64.zip
unzip asr-macos-aarch64.zip

# Download model
pip install huggingface_hub transformers
huggingface-cli download Qwen/Qwen3-ASR-0.6B --local-dir Qwen3-ASR-0.6B
python -c "
from transformers import AutoTokenizer
tok = AutoTokenizer.from_pretrained('Qwen3-ASR-0.6B', trust_remote_code=True)
tok.backend_tokenizer.save('Qwen3-ASR-0.6B/tokenizer.json')
"

# Transcribe
./asr-macos-aarch64/asr Qwen3-ASR-0.6B input.wav

Assets 5

01 Mar 06:29

juntao

v0.1.5

75e4795

v0.1.5 release

What's Changed

Fix where_self argument order in attention mask construction by @juntao in #6
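To show why argument order matters in a select operation like this, the sketch below builds an additive attention mask with a pure-Python `where` that follows the (condition, value-if-true, value-if-false) convention. It illustrates the general pitfall only; the helper names are hypothetical and the actual code change in #6 is not reproduced here.

```python
NEG_INF = float("-inf")


def where(cond, if_true, if_false):
    """Elementwise select: take if_true where cond holds, else if_false."""
    return [t if c else f for c, t, f in zip(cond, if_true, if_false)]


def additive_mask(keep):
    """0.0 where attention is allowed, -inf where it is blocked."""
    n = len(keep)
    return where(keep, [0.0] * n, [NEG_INF] * n)


def swapped_mask(keep):
    """Same call with the two value arguments swapped: the mask is inverted."""
    n = len(keep)
    return where(keep, [NEG_INF] * n, [0.0] * n)
```

Swapping the two value arguments produces a mask of the same shape and dtype that blocks exactly the positions that should be visible, so the error does not surface as an exception.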

Full Changelog: v0.1.4...v0.1.5

Contributors

juntao

Assets 6

22 Feb 06:20

juntao

v0.1.4

14dbd5f

v0.1.4

What's New

Apple MLX backend: Native Metal GPU acceleration on macOS Apple Silicon — no libtorch dependency needed
CUDA release binary: Linux x86_64 CUDA 12.8 build for NVIDIA GPU acceleration
1.25x–1.80x faster on macOS: MLX Metal GPU vs libtorch CPU on Apple M4
Fix: Ship mlx.metallib alongside macOS binary so Metal GPU kernels are found at runtime

Release Artifacts

| File | Platform | Backend |
|------|----------|---------|
| `asr-linux-x86_64.zip` | Linux x86_64 | libtorch (CPU) |
| `asr-linux-x86_64-cuda.zip` | Linux x86_64 | libtorch (CUDA 12.8) |
| `asr-linux-aarch64.zip` | Linux ARM64 | libtorch (CPU) |
| `asr-macos-aarch64.zip` | macOS Apple Silicon | MLX (Metal GPU) |

Each zip extracts into a named directory containing the asr binary (and mlx.metallib for macOS).

Build from Source

# libtorch backend (default)
cargo build --release --features build-ffmpeg

# MLX backend (macOS Apple Silicon)
git submodule update --init --recursive
cargo build --release --no-default-features --features mlx,build-ffmpeg

Assets 6

22 Feb 04:04

juntao

v0.1.3

e64d0f1

v0.1.3

What's New

Apple MLX backend: Native Metal GPU acceleration on macOS Apple Silicon — no libtorch dependency needed
Dual backend architecture: Unified Tensor abstraction supporting both tch-backend (default, cross-platform) and mlx (macOS)
CUDA release binary: Linux x86_64 CUDA build for NVIDIA GPU acceleration
1.25x–1.80x faster on macOS: MLX Metal GPU vs libtorch CPU on Apple M4

Release Artifacts

| File | Platform | Backend |
|------|----------|---------|
| `asr-linux-x86_64.zip` | Linux x86_64 | libtorch (CPU) |
| `asr-linux-x86_64-cuda.zip` | Linux x86_64 | libtorch (CUDA 12.8) |
| `asr-linux-aarch64.zip` | Linux ARM64 | libtorch (CPU) |
| `asr-macos-aarch64.zip` | macOS Apple Silicon | MLX (Metal GPU) |

Each zip extracts into a directory containing the asr binary.

Build from Source

# libtorch backend (default)
cargo build --release --features build-ffmpeg

# MLX backend (macOS Apple Silicon)
git submodule update --init --recursive
cargo build --release --no-default-features --features mlx,build-ffmpeg

Assets 6

22 Feb 02:04

juntao

v0.1.2

a52e59d

v0.1.2

Fix SIGILL crash on x86_64 by removing target/ from cargo cache (prevents cross-runner CPU feature mismatch with build-ffmpeg)
Cap libtorch CPU ISA to AVX2 in CI

Assets 5

21 Feb 20:53

juntao

v0.1.1

c04e141

v0.1.1

Fix outdated documentation about audio preprocessing pipeline
Package release artifacts as .zip files

Assets 5
