Voice dictation tool that converts speech to text using AI models running locally on your machine. It is designed to work fully offline: audio is processed on-device, without sending data to external services.
This tool uses OpenVINO GenAI Whisper and takes advantage of modern hardware acceleration such as NPUs or GPUs when available, while remaining usable on standard CPUs.
The focus is on privacy, autonomy, and predictable behaviour: your audio never leaves the machine, models are locally managed, and the software can be integrated into scripts, editors, or command line workflows.
Use Python 3.11. The Makefile defaults to py -3.11, and the audio
dependencies in this repo are expected to be installed against that
interpreter.
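Since the Makefile and audio dependencies target Python 3.11 specifically, a script that embeds this tool may want to check the interpreter version up front. A minimal, illustrative sketch (the function name is hypothetical, not part of the tool):

```python
import sys

def interpreter_ok(version_info=sys.version_info):
    """True when running on the Python 3.11 series this repo targets."""
    return tuple(version_info[:2]) == (3, 11)
```

A launcher script could call `interpreter_ok()` and refuse to continue otherwise.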
Browser streaming requires:
- `websockets` for uvicorn WebSocket support
- `av` for decoding browser Opus chunks on the server
```
py -3.11 -m venv .venv
.venv\Scripts\activate
make install
```

Open the desktop app wrapper around the local web UI:

```
py -3.11 ./local-ai-voice.py
```

The desktop window runs the same local browser UI through pywebview and starts
the local transcription server automatically.
Make target:

```
make run
```

Run the raw web server without the desktop wrapper:

```
py -3.11 ./local-ai-voice.py --server
```

Make target:

```
make run-server
```

Force CLI mode for file or live microphone transcription:
Transcribe a WAV file:

```
py -3.11 ./local-ai-voice.py --cli input.wav --model ./whisper-tiny-fp16-ov
```

Run live microphone transcription:

```
py -3.11 ./local-ai-voice.py --cli --model ./whisper-tiny-fp16-ov --chunk-seconds 1.0
```

Noise reduction and WebRTC VAD speech gating are enabled by default. Disable them with:

```
py -3.11 ./local-ai-voice.py --cli --no-silence-detect --model ./whisper-tiny-fp16-ov input.wav
```

Open the desktop web UI:
```
make run
```

Equivalent direct command:

```
py -3.11 ./local-ai-voice.py --model ./whisper-tiny-fp16-ov
```

The browser UI captures microphone audio and streams it as Opus over a
WebSocket connection to the local server. The server decodes Opus, runs noise
reduction and VAD at the decoded sample rate, and only resamples to 16 kHz
immediately before Whisper inference.
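The server-side stages after Opus decoding can be pictured with a small sketch. This is an illustration only, operating on plain Python lists of float samples: the tool itself uses noise reduction plus WebRTC VAD, whereas this sketch substitutes a naive energy gate and a linear-interpolation resampler to show the ordering (gate at the decoded rate, resample to 16 kHz last):

```python
import math

def rms(frame):
    """Root-mean-square amplitude of a frame of float samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def gate_speech(frames, threshold=0.01):
    """Drop near-silent frames. A stand-in for the real WebRTC VAD."""
    return [f for f in frames if rms(f) > threshold]

def resample_linear(samples, src_rate, dst_rate=16000):
    """Naive linear-interpolation resample, done only as the
    last step before Whisper inference."""
    if src_rate == dst_rate:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out

# Gate at the decoded rate (e.g. 48 kHz), then resample what survives.
decoded_rate = 48000
frames = [[0.5, -0.5, 0.5, -0.5], [0.001, -0.001, 0.001, -0.001]]
speech = gate_speech(frames)
audio_16k = resample_linear([s for f in speech for s in f], decoded_rate)
```

Gating before resampling keeps silent frames from ever reaching the (comparatively expensive) inference path.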
Browser defaults:
- browser DSP is off by default:
  - Echo cancellation
  - Noise suppression
  - Auto gain control
- Voice enhance is on by default
- VAD default is 3
- overlap default is 0.00 s
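Chunked live transcription with overlap (the `--chunk-seconds` flag and the overlap setting above) amounts to a sliding window over the sample stream. A minimal sketch with hypothetical parameter names, not the tool's internal API:

```python
def split_chunks(samples, rate, chunk_seconds=1.0, overlap_seconds=0.0):
    """Split samples into windows of chunk_seconds, each overlapping
    the previous one by overlap_seconds (0.0 means back-to-back)."""
    size = int(rate * chunk_seconds)
    step = int(rate * (chunk_seconds - overlap_seconds))
    if step <= 0:
        raise ValueError("overlap must be shorter than the chunk")
    return [samples[i:i + size]
            for i in range(0, len(samples), step)
            if samples[i:i + size]]
```

With the 0.00 s default, consecutive chunks share no samples; a small overlap trades extra inference work for fewer words cut at chunk boundaries.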
Notes:
- Chromium commonly uses `audio/webm;codecs=opus`
- Save WAV capture records the exact 16 kHz mono audio sent to Whisper
- Client debug controls browser-side and server-side debug messages shown in the page
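The "Save WAV capture" format (16 kHz, mono, 16-bit PCM) can be produced with nothing but the standard library. An illustrative sketch, assuming float samples in [-1, 1]; the function name is hypothetical and not part of the tool:

```python
import struct
import wave

def write_wav_16k_mono(path, samples):
    """Write float samples in [-1, 1] as a 16 kHz mono 16-bit WAV,
    the same format the server feeds to Whisper."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)      # mono
        w.setsampwidth(2)      # 16-bit PCM
        w.setframerate(16000)  # 16 kHz
        frames = b"".join(
            struct.pack("<h", max(-32768, min(32767, int(s * 32767))))
            for s in samples
        )
        w.writeframes(frames)
```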
Build the unified executable:

```
make build
```

The resulting binary supports both modes:

```
.\dist\local-ai-voice.exe
.\dist\local-ai-voice.exe --cli input.wav --model .\whisper-tiny-fp16-ov
.\dist\local-ai-voice.exe --server --model .\whisper-tiny-fp16-ov
```

Local-AI-voice is Copyright (C) 2026 by the Dyne.org Foundation.
It is distributed under the GNU Affero General Public License v3.