OST — On-Screen Translator

Real-time speech recognition and translation overlay for macOS.

Captures system audio, transcribes speech using Apple's Speech framework, and displays translated subtitles in a floating overlay window. Works with any audio source — YouTube, podcasts, Zoom/Teams meetings, and more.

Screenshots

[Screenshots: translation overlay on a YouTube video; split mode with separate recognition and translation windows; menu bar; Display, Languages, and Setup settings tabs; session history]

Disclaimer

This project was created and maintained through AI-assisted development. The code, build scripts, documentation, and CI/CD configuration should be reviewed and tested carefully before production use.

Features

  • Real-time system audio capture via ScreenCaptureKit (16 kHz mono PCM)
  • Speech-to-text using SFSpeechRecognizer (on-device or server-based)
  • Live translation via Apple Translation framework — translates text as it's being recognized, not just after finalization
  • Dual display modes:
    • Combined — single overlay with both recognized and translated text
    • Split — separate recognition and translation windows, independently positionable
  • Floating overlay — resizable, movable, always-on-top window with customizable appearance
  • Lock/Unlock: when locked, the overlay is click-through; when unlocked, it can be moved, resized, and scrolled
  • Scrollable subtitle history with auto-scroll
  • Customizable appearance — separate font size/color for original and translated text, background color/opacity
  • Automatic language detection (English, Korean, Japanese, Chinese)
  • Smart text processing — sentence-based segmentation, pause detection, duplicate filtering, punctuation cleanup
  • Session history recording with export
  • Menu bar app — no Dock icon, minimal footprint

Requirements

  • macOS 15.0 (Sequoia) or later
  • Apple Silicon (arm64)
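
Both requirements can be checked from Terminal before installing. The sketch below is a minimal illustration: `sw_vers -productVersion` and `uname -m` are standard macOS commands, while the helper name `meets_ost_requirements` is hypothetical and not part of this repository.

```shell
# meets_ost_requirements VERSION ARCH
# Succeeds when VERSION has a major version of 15 or later and ARCH is arm64.
# (Hypothetical helper for illustration; not part of the OST repository.)
meets_ost_requirements() {
  major=${1%%.*}                 # "15.3.1" -> "15"
  case $major in
    ''|*[!0-9]*) return 1 ;;     # reject empty or non-numeric input
  esac
  [ "$major" -ge 15 ] && [ "$2" = "arm64" ]
}

# On a Mac, pass in the live values:
#   meets_ost_requirements "$(sw_vers -productVersion)" "$(uname -m)" \
#     && echo "supported" || echo "not supported"
```

Note that `uname -m` reports `x86_64` inside a Rosetta-translated shell, so run the check from a native Terminal session.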

Installation

Option A: Download Pre-built Binary (Recommended)

  1. Download OST.zip from the latest release
  2. Unzip and move OST.app to your Applications folder
  3. If macOS blocks the app on first run:
    xattr -dr com.apple.quarantine /Applications/OST.app
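
If you would rather check whether the quarantine attribute is set before removing it, something like the following could be used. It is only a sketch: `xattr -p` and `xattr -dr` are standard macOS commands, while the function name `clear_quarantine_if_set` is hypothetical.

```shell
# clear_quarantine_if_set APP_PATH
# Removes com.apple.quarantine only when it is present on the bundle.
# (Hypothetical helper for illustration; not part of the OST repository.)
clear_quarantine_if_set() {
  if [ ! -e "$1" ]; then
    echo "not found: $1" >&2
    return 1
  fi
  # xattr -p exits non-zero when the attribute is absent
  if xattr -p com.apple.quarantine "$1" >/dev/null 2>&1; then
    xattr -dr com.apple.quarantine "$1"
    echo "quarantine attribute removed from $1"
  else
    echo "no quarantine attribute on $1"
  fi
}

# Example:
#   clear_quarantine_if_set /Applications/OST.app
```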

Option B: Build from Source

Requires Xcode Command Line Tools:

xcode-select --install

See the Build section below for full instructions.

Setup Guide

Step 1: Grant Required Permissions

On first launch, macOS may prompt for the following permissions. If not prompted, enable them manually:

| Permission | Purpose | How to Enable |
| --- | --- | --- |
| Screen Recording | System audio capture via ScreenCaptureKit | System Settings > Privacy & Security > Screen & System Audio Recording > Enable OST |
| System Audio Recording | System audio capture permission on macOS 15+ | System Settings > Privacy & Security > Screen & System Audio Recording > Enable OST |
| Speech Recognition | SFSpeechRecognizer access | System Settings > Privacy & Security > Speech Recognition > Enable OST |

If you enable permissions manually in System Settings, restart OST for changes to take effect.
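
The relevant panes can also be opened directly from Terminal. The sketch below assumes the `x-apple.systempreferences:` URL scheme with the `Privacy_ScreenCapture` and `Privacy_SpeechRecognition` anchors; these are widely used but, as far as we know, not formally documented by Apple, so they may change between macOS releases. The helper name `privacy_pane_url` is hypothetical.

```shell
# privacy_pane_url ANCHOR
# Builds a System Settings deep link for a Privacy & Security sub-pane.
# (Hypothetical helper; the anchor names are assumptions, see above.)
privacy_pane_url() {
  printf 'x-apple.systempreferences:com.apple.preference.security?%s\n' "$1"
}

# On a Mac:
#   open "$(privacy_pane_url Privacy_ScreenCapture)"       # Screen & System Audio Recording
#   open "$(privacy_pane_url Privacy_SpeechRecognition)"   # Speech Recognition
```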

Step 2: Enable Siri & Dictation

Speech recognition (especially server-based) requires Siri & Dictation to be enabled:

  1. Open System Settings > Siri & Spotlight (on macOS 15 this pane may be labeled "Siri" or "Apple Intelligence & Siri")
  2. Turn on Siri (or "Listen for...")

If you use on-device recognition only, Siri does not need to be active, but the speech model must be downloaded (see Step 3).

Step 3: Download On-Device Speech Model (Recommended)

For faster, offline, and more reliable recognition:

  1. Open System Settings > Keyboard > Dictation
  2. Under Languages, download the speech model for your source language (e.g., English, Korean, Japanese)
  3. After download, confirm "On-device recognition" remains enabled in OST Settings > Languages tab

Without the on-device model, server-based recognition is used. This requires internet and may have higher latency.

Step 4: Download Translation Language Pack (Recommended)

For offline translation using Apple Translation framework:

  1. Open System Settings > General > Language & Region > Translation Languages
  2. Download the language pair you need (e.g., English ↔ Korean)

Without the translation pack, translation will not work offline.

Build

# Clone the repository
git clone https://github.com/9bow/OST.git
cd OST

# Full build → produces build/OST.app
./build.sh

# Type-check only (no binary)
./build.sh --typecheck

# Run project checks
./test.sh

# Clean build
./build.sh --clean

# Run
open build/OST.app

No Xcode project is required: the build script compiles all Swift sources via xcrun swiftc. ./test.sh relies only on system command-line tools and runs the documentation, workflow, regression, behavioral, and type-check gates. For release checks that need real macOS permissions, audio capture, Apple Translation language packs, or network access for the online fallback, follow docs/manual-qa.md.
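
To run the checks and the build in one step, a small wrapper along these lines may help. It is a sketch that uses only the `./test.sh` and `./build.sh` entry points and the `build/OST.app` output path listed above; the function name `check_and_build` is hypothetical.

```shell
# check_and_build REPO_DIR
# Runs the project checks, then the full build, and confirms the app bundle exists.
# Stops at the first step that fails.
# (Hypothetical helper for illustration; not part of the OST repository.)
check_and_build() {
  (
    cd "$1" || exit 1
    ./test.sh || exit 1      # stop before building if any gate fails
    ./build.sh || exit 1     # full build
    [ -d build/OST.app ]     # the bundle the full build is expected to produce
  )
}

# Example, from the directory that contains the clone:
#   check_and_build OST && open OST/build/OST.app
```

The subshell keeps the `cd` from changing the caller's working directory.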

If macOS blocks the app on first run, execute:

xattr -dr com.apple.quarantine build/OST.app

Usage

Starting a Session

  1. Click the captions bubble icon in the menu bar
  2. Select source and target languages (or use "Auto" for automatic detection)
  3. Click Start Capture to begin capturing system audio
  4. The overlay window(s) will appear with live transcription and translation

Overlay Controls

| Action | How |
| --- | --- |
| Lock/Unlock | Menu bar > Lock Overlay, or Settings > Display > Overlay Window |
| Move | Unlock, then drag the overlay window |
| Resize | Unlock, then drag the window edges |
| Scroll | Unlock, then scroll through subtitle history |
| Reset position | Settings > Display > "Reset All Overlay Windows" |

  • Locked mode: The overlay is click-through — interact with windows behind it normally
  • Unlocked mode: Drag to move, resize edges, scroll through subtitle history. Auto-scrolls to the latest text

Display Modes

Configure in Settings > Display > Mode:

  • Combined: Single window showing both original and translated text
  • Split: Default mode with two separate windows — recognition (original text) and translation. Each window can be independently positioned and resized. Menu bar Lock/Unlock applies to both windows simultaneously; Settings can lock each window independently

Tips

  • Speech Pause: Adjust in Settings > Display > "Speech Pause" slider (default 3s). Shorter values finalize text faster; longer values wait for natural sentence endings
  • Subtitle Expiry: Old subtitles automatically fade after the configured time (default 20s)
  • Max Lines: Control how many subtitle entries are visible at once (default 3)
  • Session History: Enabled by default. View past transcription sessions via menu bar > Session History, export them for reference, or disable saving in Settings > Debug
  • On-device recognition: Enabled by default. If the selected language model is unavailable or you prefer server-based recognition, disable it in Settings > Languages
  • Online fallback translation: Disabled by default. Enable it in Settings > Languages only if you want OST to send text to Google Translate when Apple Translation is unavailable

Architecture

ScreenCaptureKit (16kHz mono) → SpeechRecognizer → AppState → TranslationService → Overlay Views
     SystemAudioCapture              SFSpeech          entries      Translation.framework     NSPanel

Source Layout

OST/Sources/
├── App/             AppState, OSTApp, WindowManager, Logger, SessionRecorder
├── Audio/           SystemAudioCapture (ScreenCaptureKit)
├── Speech/          SpeechRecognizer, SupportedLanguages
├── Translation/     TranslationService, TranslationConfig
├── Settings/        UserSettings
├── UI/              SubtitleView, RecognitionOverlayView, TranslationOverlayView,
│                    OverlayWindow, MenuBarView, SettingsView, FontSettingsView, etc.
└── Accessibility/   AccessibilityManager
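
To get a quick overview of how the Swift sources are spread across these directories, a helper like the one below can be run from a clone. It is a sketch built on standard `find` and `wc`; the function name `count_swift_sources` is hypothetical, and the output depends on the current state of the repository.

```shell
# count_swift_sources SOURCES_DIR
# Prints "<count> <subdirectory>" for each immediate subdirectory,
# counting .swift files at any depth below it.
# (Hypothetical helper for illustration; not part of the OST repository.)
count_swift_sources() {
  for dir in "$1"/*/; do
    [ -d "$dir" ] || continue
    count=$(find "$dir" -type f -name '*.swift' | wc -l | tr -d ' ')
    printf '%s %s\n' "$count" "$(basename "$dir")"
  done
}

# Example, from the repository root:
#   count_swift_sources OST/Sources
```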

Troubleshooting

| Problem | Solution |
| --- | --- |
| No audio captured | Grant Screen Recording and System Audio Recording permissions. If you changed them in System Settings, restart OST |
| Speech recognition not working | Grant Speech Recognition permission; ensure Siri & Dictation is enabled |
| Translation not appearing | Download the translation language pack, or enable online fallback in Settings > Languages if sending text to Google Translate is acceptable |
| Overlay invisible but blocking clicks | Use Settings > Display > "Reset All Overlay Windows" to restore default position |
| macOS blocks the app | Run xattr -dr com.apple.quarantine /Applications/OST.app for an installed app, or xattr -dr com.apple.quarantine build/OST.app for a local build |
| On-device recognition produces no results | Download the speech model for your language in System Settings > Keyboard > Dictation |

Known Issues

  • Endpoint detection (EPD) — Speech segmentation uses a pause timer combined with sentence boundary detection, not proper endpoint detection. Subtitle boundaries may sometimes split mid-sentence or merge unrelated phrases.
  • Automatic language detection — Auto-detect uses NLLanguageRecognizer on the first ~15 characters, which may misidentify the language from short or ambiguous input. Detection only runs once per session.
  • Translation consistency — Translation is triggered per speech segment. Short or fragmented segments may produce less coherent translations.
  • Speech recognition restart gap — SFSpeechRecognizer's recognition task expires after ~60 seconds and auto-restarts. Overlap detection minimizes duplicate text, but a brief gap in recognition may still occur.
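
To build intuition for the overlap handling mentioned in the last item, the idea can be illustrated with a word-level trim: drop from the new transcript whatever leading words repeat the tail of the previous one. This is a hypothetical shell sketch for illustration only. OST is written in Swift and its actual overlap logic is not shown in this README; the function name `trim_overlap` and the single-space word splitting are assumptions.

```shell
# trim_overlap PREVIOUS NEXT
# Prints NEXT without the leading words that repeat the end of PREVIOUS.
# Words are assumed to be separated by single spaces.
# (Hypothetical illustration; not the app's implementation.)
trim_overlap() {
  prev=$1
  next=$2
  suffix=$prev
  while [ -n "$suffix" ]; do
    case $next in
      "$suffix")   printf '\n'; return ;;                          # NEXT is entirely a repeat
      "$suffix "*) printf '%s\n' "${next#"$suffix "}"; return ;;   # NEXT starts with the repeat
    esac
    case $suffix in
      *" "*) suffix=${suffix#* } ;;   # drop the first word and try a shorter suffix
      *)     suffix= ;;               # no words left to try
    esac
  done
  printf '%s\n' "$next"               # no overlap found, keep NEXT unchanged
}

# Example:
#   trim_overlap "the quick brown fox" "brown fox jumps over"
```

Matching on whole words rather than characters avoids trimming accidental one-letter overlaps between unrelated segments.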

License

MIT
