
Dialect Voice Input — Local Speech-to-Text for macOS

Free, offline, open-source voice-to-text tool for Mac. Press a hotkey, speak, and the text appears at your cursor — in any app, in any of 30+ languages. No cloud, no API key, no subscription. Everything runs locally on your Mac.

Powered by Qwen3-ASR-0.6B. Originally built for Sichuan dialect (四川话), but works with Mandarin, English, Japanese, Korean, and 50+ languages & dialects total.

How It Works

  1. Press Right Option key to start recording
  2. Speak naturally in any supported language
  3. Release the key — text appears at your cursor within a second

The app runs a pure C inference engine locally. Your audio never leaves your machine. No API keys, no internet connection needed (after initial model download).

Features

  • Fully local speech recognition — Qwen3-ASR-0.6B (~1.7GB) runs natively via a custom C inference engine with Apple Accelerate BLAS optimization
  • Instant text injection — transcribed text is typed directly at your cursor position in any app (via clipboard + simulated Cmd+V, with automatic clipboard restoration)
  • Two input modes:
    • Hold mode (default): hold Right Option to talk, release to transcribe
    • Toggle mode: tap Right Option to start, tap again to stop
  • Menu bar app — lives quietly in your status bar, always one hotkey away
  • Bilingual UI — switch between Chinese and English interface in Settings
  • Auto model download — downloads the model (~1.7GB) from HuggingFace on first launch
  • Smart audio handling — automatically mutes system audio during recording to avoid interference, restores when done
  • Auto-start — registers as a login item so it's always available
  • Remote access (optional) — Cloudflare Tunnel + Bonjour for accessing from other devices on the network

Supported Languages

30 languages: Chinese, English, Japanese, Korean, French, German, Spanish, Portuguese, Italian, Russian, Arabic, Thai, Vietnamese, Indonesian, Turkish, Hindi, Malay, Dutch, Swedish, Danish, Finnish, Polish, Czech, Filipino, Persian, Greek, Hungarian, Macedonian, Romanian, Cantonese

22 Chinese dialects: Sichuan, Dongbei, Anhui, Fujian, Gansu, Guizhou, Hebei, Henan, Hubei, Hunan, Jiangxi, Ningxia, Shandong, Shaanxi, Shanxi, Tianjin, Yunnan, Zhejiang, Cantonese (Hong Kong), Cantonese (Guangdong), Wu, Minnan

System Requirements

  • macOS: 14.0 (Sonoma) or later
  • Processor: Apple Silicon (M1+) or Intel (universal binary)
  • RAM: ~3GB free (model loads into memory)
  • Disk: ~2GB (model ~1.7GB + app)
  • Permissions: Microphone + Accessibility

Installation

Option 1: Download DMG (Recommended)

  1. Download Dialect.Voice.Input.dmg from Releases
  2. Open the DMG and drag the app to Applications
  3. Important — first launch requires one extra step (the app is not signed with an Apple Developer certificate):
    • Open Terminal and run:
      xattr -cr "/Applications/Dialect Voice Input.app"
    • Then double-click the app to open it normally
    • You only need to do this once
  4. Grant Microphone and Accessibility permissions when prompted
  5. The model (~1.7GB) downloads automatically on first launch — wait for the menu bar icon to show a solid microphone

Option 2: Build from Source

Requires Xcode Command Line Tools (xcode-select --install).

git clone https://github.com/liusqu/LocalSpeechtoText_Dialectinput.git
cd LocalSpeechtoText_Dialectinput

# Build the C inference library (universal binary)
./build-lib.sh

# Build the Swift app
swift build

# Run
.build/debug/SichuanVoiceInput

To build a distributable DMG:

./make-dmg.sh
# Output: dist/Dialect.Voice.Input.dmg

Granting Permissions

The app needs two macOS permissions to function:

  • Microphone — to capture your speech
  • Accessibility — to type text at your cursor (simulates Cmd+V)

Go to System Settings → Privacy & Security to manage these. The app will prompt you on first use, but you may need to manually add it in Accessibility settings.

Note: If you move the app to a different location, macOS treats it as a new app and you'll need to re-grant Accessibility permission.

Usage

Basic

  1. Launch the app — a microphone icon appears in the menu bar
  2. Hold Right Option and speak
  3. Release — text is typed at your cursor

Menu Bar

Click the menu bar icon to access:

  • Status indicator — shows the current state (Ready / Recording / Transcribing / Loading)
  • Recent transcriptions — click any entry to copy it to the clipboard
  • Hold / Toggle mode — switch between input modes
  • Reload Model — re-initialize the ASR engine
  • Remote Tunnel — enable Cloudflare Tunnel for remote access
  • Settings → Language — switch the UI between Chinese and English
  • Settings → Accessibility / Microphone — quick links to the system permission settings

Tips

  • System audio is muted during recording to prevent notification sounds from being captured. It's restored automatically when you stop.
  • Clipboard is preserved — the app temporarily uses the clipboard for text injection, but restores your previous clipboard content afterward.
  • The app auto-starts at login as a menu bar item.

Project Structure

├── Package.swift                 # Swift Package Manager configuration
├── build-lib.sh                  # Builds libqwen_asr.a (universal static library)
├── make-dmg.sh                   # Packages into distributable DMG
├── generate-icon.swift           # Generates app icon
├── Sources/
│   ├── App/
│   │   ├── SichuanVoiceInputApp.swift   # Entry point
│   │   └── AppDelegate.swift            # Menu bar UI, recording lifecycle
│   ├── Services/
│   │   ├── QwenASREngine.swift          # Qwen3-ASR C library Swift wrapper
│   │   ├── ModelManager.swift           # Model download & caching
│   │   ├── AudioCaptureService.swift    # 16kHz mono PCM recording
│   │   ├── HotkeyService.swift          # Right Option key detection (30ms polling)
│   │   ├── TextInjectionService.swift   # Clipboard + Cmd+V injection
│   │   ├── CloudflaredManager.swift     # Optional Cloudflare Tunnel
│   │   └── BonjourBroadcaster.swift     # mDNS service discovery
│   ├── Localization/
│   │   └── AppStrings.swift             # Bilingual string management
│   ├── Views/                           # SwiftUI views
│   ├── Design/                          # Design tokens
│   └── CQwenASR/                        # C bridge module
└── vendor/
    └── qwen-asr/                        # Pure C Qwen3-ASR inference engine
        ├── *.c, *.h                     # Source files
        ├── Makefile                     # Standalone build
        ├── LICENSE                      # Apache 2.0
        └── include/                     # Public headers

Technical Details

  • ASR Engine: Pure C implementation of Qwen3-ASR transformer, no Python/PyTorch dependency
  • Acceleration: Apple Accelerate framework (BLAS/LAPACK) for matrix operations, ARM NEON SIMD kernels
  • Audio: 16kHz mono PCM via AudioQueue API, 0.3–60s recording duration
  • Inference speed: ~8x realtime on Apple Silicon (e.g., 1.4s to transcribe 11s of audio on M3 Max)
  • Memory: ~3GB RSS when model is loaded
  • Model storage: ~/Library/Application Support/SichuanVoiceHost/qwen3-asr-0.6b/

License

MIT

The bundled vendor/qwen-asr/ inference engine is licensed under Apache 2.0. The Qwen3-ASR model weights (downloaded at runtime) are subject to the Qwen License.

Acknowledgments

  • Qwen3-ASR by Alibaba Qwen team — the speech recognition model
  • Built with Swift, C, and Apple Accelerate framework
