Free, offline, open-source voice-to-text tool for Mac. Press a hotkey, speak, and the text appears at your cursor — in any app, in any of 30+ languages. No cloud, no API key, no subscription. Everything runs locally on your Mac.
Powered by Qwen3-ASR-0.6B. Originally built for the Sichuan dialect (四川话), it also handles Mandarin, English, Japanese, Korean, and more — 50+ languages and dialects in total.
- Press Right Option key to start recording
- Speak naturally in any supported language
- Release the key — text appears at your cursor within a second
The app runs a pure C inference engine locally. Your audio never leaves your machine. No API keys, no internet connection needed (after initial model download).
- Fully local speech recognition — Qwen3-ASR-0.6B (~1.7GB) runs natively via a custom C inference engine with Apple Accelerate BLAS optimization
- Instant text injection — transcribed text is typed directly at your cursor position in any app (via clipboard + simulated Cmd+V, with automatic clipboard restoration)
- Two input modes:
- Hold mode (default): hold Right Option to talk, release to transcribe
- Toggle mode: tap Right Option to start, tap again to stop
- Menu bar app — lives quietly in your status bar, always one hotkey away
- Bilingual UI — switch between Chinese and English interface in Settings
- Auto model download — downloads the model (~1.7GB) from HuggingFace on first launch
- Smart audio handling — automatically mutes system audio during recording to avoid interference, restores when done
- Auto-start — registers as a login item so it's always available
- Remote access (optional) — Cloudflare Tunnel + Bonjour for accessing from other devices on the network
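The two input modes boil down to a small state machine over Right Option key-down/key-up events. Here is an illustrative sketch in C — not the app's actual HotkeyService code, just the logic the feature list describes:

```c
#include <stdbool.h>

typedef enum { IDLE, RECORDING } RecState;
typedef enum { MODE_HOLD, MODE_TOGGLE } InputMode;

/* Feed a Right Option key transition into the state machine.
 * Returns true when a transcription should be triggered. */
static bool handle_key_event(RecState *state, InputMode mode, bool key_down) {
    if (mode == MODE_HOLD) {
        /* Hold mode: record while held, transcribe on release. */
        if (key_down && *state == IDLE)       { *state = RECORDING; return false; }
        if (!key_down && *state == RECORDING) { *state = IDLE;      return true;  }
    } else {
        /* Toggle mode: only key-down events matter; releases are ignored. */
        if (key_down) {
            if (*state == IDLE) { *state = RECORDING; return false; }
            *state = IDLE;
            return true;
        }
    }
    return false;
}
```

In hold mode a release both stops recording and triggers transcription; in toggle mode the release is a no-op, so a second tap is what stops and transcribes.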
30 languages: Chinese, English, Japanese, Korean, French, German, Spanish, Portuguese, Italian, Russian, Arabic, Thai, Vietnamese, Indonesian, Turkish, Hindi, Malay, Dutch, Swedish, Danish, Finnish, Polish, Czech, Filipino, Persian, Greek, Hungarian, Macedonian, Romanian, Cantonese
22 Chinese dialects: Sichuan, Dongbei, Anhui, Fujian, Gansu, Guizhou, Hebei, Henan, Hubei, Hunan, Jiangxi, Ningxia, Shandong, Shaanxi, Shanxi, Tianjin, Yunnan, Zhejiang, Cantonese (Hong Kong), Cantonese (Guangdong), Wu, Minnan
| Requirement | Detail |
|---|---|
| macOS | 14.0 (Sonoma) or later |
| Processor | Apple Silicon (M1+) or Intel (universal binary) |
| RAM | ~3GB free (model loads into memory) |
| Disk | ~2GB (model ~1.7GB + app) |
| Permissions | Microphone + Accessibility |
- Download `Dialect.Voice.Input.dmg` from Releases
- Open the DMG and drag the app to Applications
- Important — first launch requires one extra step (the app is not signed with an Apple Developer certificate):
  - Open Terminal and run: `xattr -cr "/Applications/Dialect Voice Input.app"`
  - Then double-click the app to open it normally
  - You only need to do this once
- Grant Microphone and Accessibility permissions when prompted
- The model (~1.7GB) downloads automatically on first launch — wait for the menu bar icon to show a solid microphone
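The downloaded model is cached under `~/Library/Application Support/SichuanVoiceHost/qwen3-asr-0.6b/` (see "Model storage" below), so the ~1.7GB download only happens once. A minimal sketch of how that per-user path can be built from `$HOME` — the helper name is illustrative, not the app's actual API:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Build the model cache directory from $HOME. The suffix matches the
 * "Model storage" path documented in this README. Returns 0 on success. */
static int model_dir(char *buf, size_t n) {
    const char *home = getenv("HOME");
    if (!home) return -1;
    int written = snprintf(buf, n,
        "%s/Library/Application Support/SichuanVoiceHost/qwen3-asr-0.6b",
        home);
    return (written > 0 && (size_t)written < n) ? 0 : -1;
}
```

If the directory already contains the weights, the download step is skipped on subsequent launches.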
Requires Xcode Command Line Tools (`xcode-select --install`).
git clone https://github.com/liusqu/LocalSpeechtoText_Dialectinput.git
cd LocalSpeechtoText_Dialectinput
# Build the C inference library (universal binary)
./build-lib.sh
# Build the Swift app
swift build
# Run
.build/debug/SichuanVoiceInput

To build a distributable DMG:
./make-dmg.sh
# Output: dist/Dialect.Voice.Input.dmg

The app needs two macOS permissions to function:
- Microphone — to capture your speech
- Accessibility — to type text at your cursor (simulates Cmd+V)
Go to System Settings → Privacy & Security to manage these. The app will prompt you on first use, but you may need to manually add it in Accessibility settings.
Note: If you move the app to a different location, macOS treats it as a new app and you'll need to re-grant Accessibility permission.
- Launch the app — a microphone icon appears in the menu bar
- Hold Right Option and speak
- Release — text is typed at your cursor
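Recordings between 0.3 and 60 seconds are accepted (see "Technical details" below); releasing the key after a shorter or longer capture yields nothing to transcribe. A sketch of that bounds check at the app's 16 kHz mono sample rate:

```c
#include <stdbool.h>
#include <stddef.h>

#define SAMPLE_RATE 16000   /* 16 kHz mono PCM, per Technical details */
#define MIN_SECONDS 0.3
#define MAX_SECONDS 60.0

/* Accept a recording only if its duration falls in the supported range. */
static bool duration_ok(size_t n_samples) {
    double seconds = (double)n_samples / SAMPLE_RATE;
    return seconds >= MIN_SECONDS && seconds <= MAX_SECONDS;
}
```

So a one-second utterance (16,000 samples) passes, while an accidental 0.1-second tap is rejected.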
Click the menu bar icon to access:
| Item | Description |
|---|---|
| Status indicator | Shows current state (Ready / Recording / Transcribing / Loading) |
| Recent transcriptions | Click any to copy to clipboard |
| Hold / Toggle mode | Switch between input modes |
| Reload Model | Re-initialize the ASR engine |
| Remote Tunnel | Enable Cloudflare Tunnel for remote access |
| Settings → Language | Switch UI between Chinese and English |
| Settings → Accessibility / Microphone | Quick links to system permission settings |
- System audio is muted during recording to prevent notification sounds from being captured. It's restored automatically when you stop.
- Clipboard is preserved — the app temporarily uses the clipboard for text injection, but restores your previous clipboard content afterward.
- The app auto-starts at login as a menu bar item.
├── Package.swift # Swift Package Manager configuration
├── build-lib.sh # Builds libqwen_asr.a (universal static library)
├── make-dmg.sh # Packages into distributable DMG
├── generate-icon.swift # Generates app icon
├── Sources/
│ ├── App/
│ │ ├── SichuanVoiceInputApp.swift # Entry point
│ │ └── AppDelegate.swift # Menu bar UI, recording lifecycle
│ ├── Services/
│ │ ├── QwenASREngine.swift # Qwen3-ASR C library Swift wrapper
│ │ ├── ModelManager.swift # Model download & caching
│ │ ├── AudioCaptureService.swift # 16kHz mono PCM recording
│ │ ├── HotkeyService.swift # Right Option key detection (30ms polling)
│ │ ├── TextInjectionService.swift # Clipboard + Cmd+V injection
│ │ ├── CloudflaredManager.swift # Optional Cloudflare Tunnel
│ │ └── BonjourBroadcaster.swift # mDNS service discovery
│ ├── Localization/
│ │ └── AppStrings.swift # Bilingual string management
│ ├── Views/ # SwiftUI views
│ ├── Design/ # Design tokens
│ └── CQwenASR/ # C bridge module
└── vendor/
└── qwen-asr/ # Pure C Qwen3-ASR inference engine
├── *.c, *.h # Source files
├── Makefile # Standalone build
├── LICENSE # Apache 2.0
└── include/ # Public headers
- ASR Engine: Pure C implementation of Qwen3-ASR transformer, no Python/PyTorch dependency
- Acceleration: Apple Accelerate framework (BLAS/LAPACK) for matrix operations, ARM NEON SIMD kernels
- Audio: 16kHz mono PCM via AudioQueue API, 0.3–60s recording duration
- Inference speed: ~8x realtime on Apple Silicon (e.g., 1.4s to transcribe 11s of audio on M3 Max)
- Memory: ~3GB RSS when model is loaded
- Model storage:
~/Library/Application Support/SichuanVoiceHost/qwen3-asr-0.6b/
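The heavy lifting in a transformer forward pass is matrix multiplication, which Accelerate's BLAS routine `cblas_sgemm` performs (computing C = αAB + βC with vectorized kernels). A portable reference implementation of the core product — not the app's actual kernels — shows what that call computes:

```c
#include <stddef.h>

/* Reference single-precision matrix multiply: C = A * B.
 * A is m x k, B is k x n, C is m x n, all row-major.
 * cblas_sgemm does this (plus alpha/beta scaling) far faster
 * via SIMD-optimized BLAS kernels. */
static void matmul_ref(const float *A, const float *B, float *C,
                       size_t m, size_t k, size_t n) {
    for (size_t i = 0; i < m; i++)
        for (size_t j = 0; j < n; j++) {
            float acc = 0.0f;
            for (size_t p = 0; p < k; p++)
                acc += A[i * k + p] * B[p * n + j];
            C[i * n + j] = acc;
        }
}
```

Swapping a naive loop like this for BLAS is what makes the ~8x-realtime figure achievable on Apple Silicon.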
MIT
The bundled vendor/qwen-asr/ inference engine is licensed under Apache 2.0.
The Qwen3-ASR model weights (downloaded at runtime) are subject to the Qwen License.
- Qwen3-ASR by Alibaba Qwen team — the speech recognition model
- Built with Swift, C, and Apple Accelerate framework