
feat: Add OpenAI API compatible STT provider#5268

Open
angelplusultra wants to merge 6 commits into master from stt-provider-expansion-openai-api-compatible

Contributor

@angelplusultra angelplusultra commented Mar 25, 2026

Pull Request Type

  • ✨ feat (New feature)
  • 🐛 fix (Bug fix)
  • ♻️ refactor (Code refactoring without changing behavior)
  • 💄 style (UI style changes)
  • 🔨 chore (Build, CI, maintenance)
  • 📝 docs (Documentation updates)

Relevant Issues

resolves #3812

Description

Adds a "Generic OpenAI Compatible" STT provider option, allowing users to use any OpenAI-compatible speech-to-text service (OpenAI Whisper, Groq, Deepgram, self-hosted faster-whisper, etc.) for voice-to-text transcription in the chat prompt input.
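For reviewers unfamiliar with the upstream API, here is a minimal sketch of the request that OpenAI-compatible transcription services share: a multipart POST to `<base URL>/audio/transcriptions` with `file` and `model` fields and a Bearer token. The helper names (`buildTranscriptionRequest`, `transcribe`) and the option names are hypothetical and are not taken from this PR's diff; it assumes Node 18+ for the global `fetch`, `FormData`, and `Blob`.

```javascript
// Sketch only: function and option names are hypothetical, not from this PR.
// Requires Node 18+ (global fetch, FormData and Blob).

/**
 * Build the URL and fetch options for an OpenAI-style transcription call.
 */
function buildTranscriptionRequest({ baseUrl, apiKey, model, audio, filename = "audio.webm" }) {
  // Tolerate a trailing slash on the configured base URL.
  const url = `${baseUrl.replace(/\/+$/, "")}/audio/transcriptions`;

  const form = new FormData();
  form.append("file", audio, filename); // multipart field names used by the OpenAI API
  form.append("model", model);

  const headers = {};
  // A self-hosted server may not need a key, so only send the header when one is set.
  if (apiKey) headers.Authorization = `Bearer ${apiKey}`;

  return { url, options: { method: "POST", headers, body: form } };
}

/**
 * Send the request and return the transcript text.
 */
async function transcribe(config) {
  const { url, options } = buildTranscriptionRequest(config);
  const res = await fetch(url, options);
  if (!res.ok) throw new Error(`Transcription failed with HTTP ${res.status}`);
  const data = await res.json();
  return data.text; // the default JSON response carries the transcript in `text`
}
```

Keeping request construction separate from the network call makes the Base URL, API Key, and Model handling easy to unit test without a live provider.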

What changed:

  • New STT provider selection: Users can now choose between "System native" (browser Web Speech API) and "OpenAI Compatible" in Settings > Audio Preference > Speech-to-text
  • Server-backed transcription: When using the OpenAI Compatible provider, audio is recorded via the browser's MediaRecorder API and sent to a new POST /system/stt endpoint, which proxies it to the configured OpenAI-compatible transcription API (/audio/transcriptions)
  • Configurable settings: Base URL, API Key, and Model are configurable via the settings UI
  • Silence detection: Uses the Web Audio API's AnalyserNode to detect silence and auto-stop recording after 3.2 s of continuous silence (matching native provider behavior)
  • Auto-submit support: Works with the existing "Auto Submit Speech Input" setting
  • CTRL+M shortcut: Works with both providers
  • Decoupled ChatContainer from react-speech-recognition: ChatContainer now uses a custom STOP_STT_EVENT to signal STT stop, making it provider-agnostic
  • Loading spinner on mic button: When using the server-backed provider, clicking the mic shows a spinner while the browser acquires the microphone via getUserMedia. Unlike the native Web Speech API (which manages the mic internally and returns synchronously), the server provider must explicitly request the raw audio stream from the OS — this hardware initialization takes 1-3 seconds depending on the device and browser. The spinner provides visual feedback during this unavoidable delay so users know the app is responding.
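The silence-detection item above can be sketched as a small, browser-independent state machine. This is an illustration under assumptions, not the code in this PR: the function names and the 0.01 RMS threshold are hypothetical, and the PR may measure loudness differently (for example from frequency data). Only the 3.2 s window comes from the description.

```javascript
// Sketch only: names and the RMS threshold are assumptions, not values from this PR.
const SILENCE_TIMEOUT_MS = 3200; // the 3.2 s window from the description
const SILENCE_RMS_THRESHOLD = 0.01; // hypothetical; would need tuning per microphone

/**
 * RMS level of one frame as filled by AnalyserNode.getByteTimeDomainData().
 * Samples are unsigned bytes where 128 means zero amplitude.
 */
function rmsLevel(samples) {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (const s of samples) {
    const centered = (s - 128) / 128;
    sumSquares += centered * centered;
  }
  return Math.sqrt(sumSquares / samples.length);
}

/**
 * Returns a function that is fed one frame at a time and reports true once
 * the input has stayed below the threshold for at least `timeoutMs`.
 */
function createSilenceDetector({
  timeoutMs = SILENCE_TIMEOUT_MS,
  threshold = SILENCE_RMS_THRESHOLD,
} = {}) {
  let silentSince = null; // timestamp of the first frame in the current silent run
  return function shouldStop(samples, nowMs) {
    if (rmsLevel(samples) >= threshold) {
      silentSince = null; // any speech resets the timer
      return false;
    }
    if (silentSince === null) silentSince = nowMs;
    return nowMs - silentSince >= timeoutMs;
  };
}
```

In the browser, one would call `analyser.getByteTimeDomainData(buffer)` on each animation frame, pass `buffer` and `performance.now()` to the detector, and call `mediaRecorder.stop()` once it returns true.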

Visuals (if applicable)

image

Additional Information

  • The native browser STT provider is completely unchanged — this is purely additive
  • Tested with OpenAI (whisper-1) and Groq (whisper-large-v3) successfully
  • Audio is recorded as audio/webm (browser default) and sent through multer memory storage — no files written to disk
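To illustrate the no-disk path in the last item, here is a minimal sketch of turning an in-memory upload into a `Blob` that can be forwarded upstream. `uploadToBlob` is a hypothetical helper, not code from this PR; the `buffer`, `mimetype`, and `originalname` fields mirror what multer's memory storage places on `req.file`, and the `audio/webm` fallback follows the recording format mentioned above. It assumes Node 18+ for the global `Blob`.

```javascript
// Sketch only: uploadToBlob is a hypothetical helper, not code from this PR.
// `file` mirrors the object multer's memory storage puts on req.file.
function uploadToBlob(file) {
  const type = file.mimetype || "audio/webm";
  // Keep a filename with an extension, since a transcription server may use it
  // to detect the container format.
  const filename = file.originalname || "audio.webm";
  // The bytes stay in memory: the Blob wraps the buffer without touching disk.
  return { blob: new Blob([file.buffer], { type }), filename };
}
```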

Developer Validations

  • I ran yarn lint from the root of the repo & committed changes
  • Relevant documentation has been updated (if applicable)
  • I have tested my code functionality
  • Docker build succeeds locally

@angelplusultra angelplusultra marked this pull request as draft March 25, 2026 21:58
@angelplusultra angelplusultra changed the title from "feat: Add Generic OpenAI compatible STT provider" to "feat: Add OpenAI API compatible STT provider" Mar 25, 2026
@angelplusultra angelplusultra marked this pull request as ready for review March 25, 2026 22:11
@angelplusultra angelplusultra added the "PR:needs review" label (Needs review by core team) Mar 25, 2026
Collaborator

@shatfield4 shatfield4 left a comment


LGTM


@codeCraft-Ritik codeCraft-Ritik left a comment


Nice work on this PR. The code is well-structured and easy to follow.


Labels

PR:needs review Needs review by core team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEAT]: Extend STT provider selection

4 participants