Free, offline, open-source voice-to-text tool for Mac. Press a hotkey, speak, and the text appears at your cursor — in any app, in any of 30+ languages. No cloud, no API key, no subscription. Everything runs locally on your Mac.
Powered by Qwen3-ASR-0.6B. Originally built for the Sichuan dialect (四川话), it also handles Mandarin, English, Japanese, Korean, and more — 50+ languages and dialects in total.
- Press Right Option key to start recording
- Speak naturally in any supported language
- Release the key — text appears at your cursor within a second
The app runs a pure C inference engine locally. Your audio never leaves your machine. No API keys, no internet connection needed (after initial model download).
- Fully local speech recognition — Qwen3-ASR-0.6B (~1.7GB) runs natively via a custom C inference engine with Apple Accelerate BLAS optimization
- Instant text injection — transcribed text is typed directly at your cursor position in any app (via clipboard + simulated Cmd+V, with automatic clipboard restoration)
- Two input modes:
- Hold mode (default): hold Right Option to talk, release to transcribe
- Toggle mode: tap Right Option to start, tap again to stop
- Menu bar app — lives quietly in your status bar, always one hotkey away
- Bilingual UI — switch between Chinese and English interface in Settings
- Auto model download — downloads the model (~1.7GB) from HuggingFace on first launch
- Smart audio handling — automatically mutes system audio during recording to avoid interference, restores when done
- Auto-start — registers as a login item so it's always available
- Remote access (optional) — Cloudflare Tunnel + Bonjour for accessing from other devices on the network
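The two input modes boil down to a small state machine over Right Option key-down/key-up events. Here is an illustrative sketch in C — not the app's actual HotkeyService code, just the logic the feature list describes:

```c
#include <stdbool.h>

typedef enum { IDLE, RECORDING } RecState;
typedef enum { MODE_HOLD, MODE_TOGGLE } InputMode;

/* Feed a Right Option key transition into the state machine.
 * Returns true when a transcription should be triggered. */
static bool handle_key_event(RecState *state, InputMode mode, bool key_down) {
    if (mode == MODE_HOLD) {
        /* Hold mode: record while held, transcribe on release. */
        if (key_down && *state == IDLE)       { *state = RECORDING; return false; }
        if (!key_down && *state == RECORDING) { *state = IDLE;      return true;  }
    } else {
        /* Toggle mode: only key-down events matter; releases are ignored. */
        if (key_down) {
            if (*state == IDLE) { *state = RECORDING; return false; }
            *state = IDLE;
            return true;
        }
    }
    return false;
}
```

In hold mode a release both stops recording and triggers transcription; in toggle mode the release is a no-op, so a second tap is what stops and transcribes.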
30 languages: Chinese, English, Japanese, Korean, French, German, Spanish, Portuguese, Italian, Russian, Arabic, Thai, Vietnamese, Indonesian, Turkish, Hindi, Malay, Dutch, Swedish, Danish, Finnish, Polish, Czech, Filipino, Persian, Greek, Hungarian, Macedonian, Romanian, Cantonese
22 Chinese dialects: Sichuan, Dongbei, Anhui, Fujian, Gansu, Guizhou, Hebei, Henan, Hubei, Hunan, Jiangxi, Ningxia, Shandong, Shaanxi, Shanxi, Tianjin, Yunnan, Zhejiang, Cantonese (Hong Kong), Cantonese (Guangdong), Wu, Minnan
| Requirement | Detail |
|---|---|
| macOS | 14.0 (Sonoma) or later |
| Processor | Apple Silicon (M1+) or Intel (universal binary) |
| RAM | ~3GB free (model loads into memory) |
| Disk | ~2GB (model ~1.7GB + app) |
| Permissions | Microphone + Accessibility |
- Download `Dialect.Voice.Input.dmg` from Releases
- Open the DMG and drag the app to Applications
- Important — first launch requires one extra step (the app is not signed with an Apple Developer certificate):
  - Open Terminal and run: `xattr -cr "/Applications/Dialect Voice Input.app"`
  - Then double-click the app to open it normally
  - You only need to do this once
- Grant Microphone and Accessibility permissions when prompted
- The model (~1.7GB) downloads automatically on first launch — wait for the menu bar icon to show a solid microphone
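The downloaded model is cached under `~/Library/Application Support/SichuanVoiceHost/qwen3-asr-0.6b/` (see "Model storage" below), so the ~1.7GB download only happens once. A minimal sketch of how that per-user path can be built from `$HOME` — the helper name is illustrative, not the app's actual API:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Build the model cache directory from $HOME. The suffix matches the
 * "Model storage" path documented in this README. Returns 0 on success. */
static int model_dir(char *buf, size_t n) {
    const char *home = getenv("HOME");
    if (!home) return -1;
    int written = snprintf(buf, n,
        "%s/Library/Application Support/SichuanVoiceHost/qwen3-asr-0.6b",
        home);
    return (written > 0 && (size_t)written < n) ? 0 : -1;
}
```

If the directory already contains the weights, the download step is skipped on subsequent launches.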
Requires Xcode Command Line Tools (`xcode-select --install`).
git clone https://github.com/liusqu/LocalSpeechtoText_Dialectinput.git
cd LocalSpeechtoText_Dialectinput
# Build the C inference library (universal binary)
./build-lib.sh
# Build the Swift app
swift build
# Run
.build/debug/SichuanVoiceInput

To build a distributable DMG:
./make-dmg.sh
# Output: dist/Dialect.Voice.Input.dmg

The app needs two macOS permissions to function:
- Microphone — to capture your speech
- Accessibility — to type text at your cursor (simulates Cmd+V)
Go to System Settings → Privacy & Security to manage these. The app will prompt you on first use, but you may need to manually add it in Accessibility settings.
Note: If you move the app to a different location, macOS treats it as a new app and you'll need to re-grant Accessibility permission.
- Launch the app — a microphone icon appears in the menu bar
- Hold Right Option and speak
- Release — text is typed at your cursor
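Recordings between 0.3 and 60 seconds are accepted (see "Technical details" below); releasing the key after a shorter or longer capture yields nothing to transcribe. A sketch of that bounds check at the app's 16 kHz mono sample rate:

```c
#include <stdbool.h>
#include <stddef.h>

#define SAMPLE_RATE 16000   /* 16 kHz mono PCM, per Technical details */
#define MIN_SECONDS 0.3
#define MAX_SECONDS 60.0

/* Accept a recording only if its duration falls in the supported range. */
static bool duration_ok(size_t n_samples) {
    double seconds = (double)n_samples / SAMPLE_RATE;
    return seconds >= MIN_SECONDS && seconds <= MAX_SECONDS;
}
```

So a one-second utterance (16,000 samples) passes, while an accidental 0.1-second tap is rejected.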
Click the menu bar icon to access:
| Item | Description |
|---|---|
| Status indicator | Shows current state (Ready / Recording / Transcribing / Loading) |
| Recent transcriptions | Click any to copy to clipboard |
| Hold / Toggle mode | Switch between input modes |
| Reload Model | Re-initialize the ASR engine |
| Remote Tunnel | Enable Cloudflare Tunnel for remote access |
| Settings → Language | Switch UI between Chinese and English |
| Settings → Accessibility / Microphone | Quick links to system permission settings |
- System audio is muted during recording to prevent notification sounds from being captured. It's restored automatically when you stop.
- Clipboard is preserved — the app temporarily uses the clipboard for text injection, but restores your previous clipboard content afterward.
- The app auto-starts at login as a menu bar item.
├── Package.swift # Swift Package Manager configuration
├── build-lib.sh # Builds libqwen_asr.a (universal static library)
├── make-dmg.sh # Packages into distributable DMG
├── generate-icon.swift # Generates app icon
├── Sources/
│ ├── App/
│ │ ├── SichuanVoiceInputApp.swift # Entry point
│ │ └── AppDelegate.swift # Menu bar UI, recording lifecycle
│ ├── Services/
│ │ ├── QwenASREngine.swift # Qwen3-ASR C library Swift wrapper
│ │ ├── ModelManager.swift # Model download & caching
│ │ ├── AudioCaptureService.swift # 16kHz mono PCM recording
│ │ ├── HotkeyService.swift # Right Option key detection (30ms polling)
│ │ ├── TextInjectionService.swift # Clipboard + Cmd+V injection
│ │ ├── CloudflaredManager.swift # Optional Cloudflare Tunnel
│ │ └── BonjourBroadcaster.swift # mDNS service discovery
│ ├── Localization/
│ │ └── AppStrings.swift # Bilingual string management
│ ├── Views/ # SwiftUI views
│ ├── Design/ # Design tokens
│ └── CQwenASR/ # C bridge module
└── vendor/
└── qwen-asr/ # Pure C Qwen3-ASR inference engine
├── *.c, *.h # Source files
├── Makefile # Standalone build
├── LICENSE # Apache 2.0
└── include/ # Public headers
- ASR Engine: Pure C implementation of Qwen3-ASR transformer, no Python/PyTorch dependency
- Acceleration: Apple Accelerate framework (BLAS/LAPACK) for matrix operations, ARM NEON SIMD kernels
- Audio: 16kHz mono PCM via AudioQueue API, 0.3–60s recording duration
- Inference speed: ~8x realtime on Apple Silicon (e.g., 1.4s to transcribe 11s of audio on M3 Max)
- Memory: ~3GB RSS when model is loaded
- Model storage:
~/Library/Application Support/SichuanVoiceHost/qwen3-asr-0.6b/
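The heavy lifting in a transformer forward pass is matrix multiplication, which Accelerate's BLAS routine `cblas_sgemm` performs (computing C = αAB + βC with vectorized kernels). A portable reference implementation of the core product — not the app's actual kernels — shows what that call computes:

```c
#include <stddef.h>

/* Reference single-precision matrix multiply: C = A * B.
 * A is m x k, B is k x n, C is m x n, all row-major.
 * cblas_sgemm does this (plus alpha/beta scaling) far faster
 * via SIMD-optimized BLAS kernels. */
static void matmul_ref(const float *A, const float *B, float *C,
                       size_t m, size_t k, size_t n) {
    for (size_t i = 0; i < m; i++)
        for (size_t j = 0; j < n; j++) {
            float acc = 0.0f;
            for (size_t p = 0; p < k; p++)
                acc += A[i * k + p] * B[p * n + j];
            C[i * n + j] = acc;
        }
}
```

Swapping a naive loop like this for BLAS is what makes the ~8x-realtime figure achievable on Apple Silicon.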
MIT
The bundled vendor/qwen-asr/ inference engine is licensed under Apache 2.0.
The Qwen3-ASR model weights (downloaded at runtime) are subject to the Qwen License.
- Qwen3-ASR by Alibaba Qwen team — the speech recognition model
- Built with Swift, C, and Apple Accelerate framework