Voice dictation tool that converts speech to text using AI models running locally on your machine. It is designed to work fully offline: audio is processed on-device, without sending data to external services.
This tool uses OpenVINO GenAI Whisper and takes advantage of modern hardware acceleration such as NPUs or GPUs when available, while remaining usable on standard CPUs.
The focus is on privacy, autonomy, and predictable behaviour: your audio never leaves the machine, models are locally managed, and the software can be integrated into scripts, editors, or command line workflows.
Use Python 3.11. The Makefile defaults to py -3.11, and the audio
dependencies in this repo are expected to be installed against that
interpreter.
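Since the Makefile and audio dependencies target Python 3.11 specifically, a script that embeds this tool may want to check the interpreter version up front. A minimal, illustrative sketch (the function name is hypothetical, not part of the tool):

```python
import sys

def interpreter_ok(version_info=sys.version_info):
    """True when running on the Python 3.11 series this repo targets."""
    return tuple(version_info[:2]) == (3, 11)
```

A launcher script could call `interpreter_ok()` and refuse to continue otherwise.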
Browser streaming requires:
- `websockets` for uvicorn WebSocket support
- `av` for decoding browser Opus chunks on the server
```
py -3.11 -m venv .venv
.venv\Scripts\activate
make install
```

Open the desktop app wrapper around the local web UI:

```
py -3.11 ./local-ai-voice.py
```

The desktop window runs the same local browser UI through pywebview and starts
the local transcription server automatically.
Make target:

```
make run
```

Run the raw web server without the desktop wrapper:

```
py -3.11 ./local-ai-voice.py --server
```

Make target:

```
make run-server
```

Force CLI mode for file or live microphone transcription:
Transcribe a WAV file:

```
py -3.11 ./local-ai-voice.py --cli input.wav --model ./whisper-tiny-fp16-ov
```

Run live microphone transcription:

```
py -3.11 ./local-ai-voice.py --cli --model ./whisper-tiny-fp16-ov --chunk-seconds 1.0
```

Noise reduction and WebRTC VAD speech gating are enabled by default. Disable them with:

```
py -3.11 ./local-ai-voice.py --cli --no-silence-detect --model ./whisper-tiny-fp16-ov input.wav
```

Open the desktop web UI:
```
make run
```

Equivalent direct command:

```
py -3.11 ./local-ai-voice.py --model ./whisper-tiny-fp16-ov
```

The browser UI captures microphone audio and streams it as Opus over a
WebSocket connection to the local server. The server decodes Opus, runs noise
reduction and VAD at the decoded sample rate, and only resamples to 16 kHz
immediately before Whisper inference.
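The server-side stages after Opus decoding can be pictured with a small sketch. This is an illustration only, operating on plain Python lists of float samples: the tool itself uses noise reduction plus WebRTC VAD, whereas this sketch substitutes a naive energy gate and a linear-interpolation resampler to show the ordering (gate at the decoded rate, resample to 16 kHz last):

```python
import math

def rms(frame):
    """Root-mean-square amplitude of a frame of float samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def gate_speech(frames, threshold=0.01):
    """Drop near-silent frames. A stand-in for the real WebRTC VAD."""
    return [f for f in frames if rms(f) > threshold]

def resample_linear(samples, src_rate, dst_rate=16000):
    """Naive linear-interpolation resample, done only as the
    last step before Whisper inference."""
    if src_rate == dst_rate:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out

# Gate at the decoded rate (e.g. 48 kHz), then resample what survives.
decoded_rate = 48000
frames = [[0.5, -0.5, 0.5, -0.5], [0.001, -0.001, 0.001, -0.001]]
speech = gate_speech(frames)
audio_16k = resample_linear([s for f in speech for s in f], decoded_rate)
```

Gating before resampling keeps silent frames from ever reaching the (comparatively expensive) inference path.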
Browser defaults:
- browser DSP is off by default:
  - Echo cancellation
  - Noise suppression
  - Auto gain control
- Voice enhance is on by default
- VAD default is 3
- overlap default is 0.00 s
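Chunked live transcription with overlap (the `--chunk-seconds` flag and the overlap setting above) amounts to a sliding window over the sample stream. A minimal sketch with hypothetical parameter names, not the tool's internal API:

```python
def split_chunks(samples, rate, chunk_seconds=1.0, overlap_seconds=0.0):
    """Split samples into windows of chunk_seconds, each overlapping
    the previous one by overlap_seconds (0.0 means back-to-back)."""
    size = int(rate * chunk_seconds)
    step = int(rate * (chunk_seconds - overlap_seconds))
    if step <= 0:
        raise ValueError("overlap must be shorter than the chunk")
    return [samples[i:i + size]
            for i in range(0, len(samples), step)
            if samples[i:i + size]]
```

With the 0.00 s default, consecutive chunks share no samples; a small overlap trades extra inference work for fewer words cut at chunk boundaries.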
Notes:
- Chromium commonly uses `audio/webm;codecs=opus`
- Save WAV capture records the exact 16 kHz mono audio sent to Whisper
- Client debug controls browser-side and server-side debug messages shown in the page
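The "Save WAV capture" format (16 kHz, mono, 16-bit PCM) can be produced with nothing but the standard library. An illustrative sketch, assuming float samples in [-1, 1]; the function name is hypothetical and not part of the tool:

```python
import struct
import wave

def write_wav_16k_mono(path, samples):
    """Write float samples in [-1, 1] as a 16 kHz mono 16-bit WAV,
    the same format the server feeds to Whisper."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)      # mono
        w.setsampwidth(2)      # 16-bit PCM
        w.setframerate(16000)  # 16 kHz
        frames = b"".join(
            struct.pack("<h", max(-32768, min(32767, int(s * 32767))))
            for s in samples
        )
        w.writeframes(frames)
```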
Build the unified executable:

```
make build
```

The resulting binary supports both modes:

```
.\dist\local-ai-voice.exe
.\dist\local-ai-voice.exe --cli input.wav --model .\whisper-tiny-fp16-ov
.\dist\local-ai-voice.exe --server --model .\whisper-tiny-fp16-ov
```

Local-AI-voice is Copyright (C) 2026 by the Dyne.org Foundation.
It is distributed under the GNU Affero General Public License v3.