OST — On-Screen Translator

Real-time speech recognition and translation overlay for macOS.

Captures system audio, transcribes speech using Apple's Speech framework, and displays translated subtitles in a floating overlay window. Works with any audio played on the Mac, such as YouTube videos, podcasts, and Zoom/Teams meetings.

Screenshots

Screenshots cover the menu bar, the Settings tabs (Display, Languages, Setup), and Session History.

Disclaimer

This project was created and maintained through AI-assisted development. The code, build scripts, documentation, and CI/CD configuration should be reviewed and tested carefully before production use.

Features

Real-time system audio capture via ScreenCaptureKit (16kHz mono PCM)
Speech-to-text using SFSpeechRecognizer (on-device or server-based)
Live translation via Apple Translation framework — translates text as it's being recognized, not just after finalization
Dual display modes:
- Combined — single overlay with both recognized and translated text
- Split — separate recognition and translation windows, independently positionable
Floating overlay — resizable, movable, always-on-top window with customizable appearance
Lock/Unlock — locked = click-through, unlocked = move/resize/scroll
Scrollable subtitle history with auto-scroll
Customizable appearance — separate font size/color for original and translated text, background color/opacity
Automatic language detection (English, Korean, Japanese, Chinese)
Smart text processing — sentence-based segmentation, pause detection, duplicate filtering, punctuation cleanup
Session history recording with export
Menu bar app — no Dock icon, minimal footprint
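The "duplicate filtering" listed above, together with the overlap detection mentioned under Known Issues, can be pictured as trimming the words at the start of a new transcript that repeat the end of text already shown. The following is a minimal Swift sketch under stated assumptions, not OST's actual implementation: the function names are hypothetical, words are split on spaces (which suits English better than Japanese or Chinese), and matching ignores case and punctuation.

```swift
// Hypothetical helpers for illustration; not OST's actual implementation.

/// Lowercases a word and strips punctuation so "Meeting," matches "meeting".
func normalizedWord(_ word: Substring) -> String {
    String(word.lowercased().filter { $0.isLetter || $0.isNumber })
}

/// Removes the words at the start of `incoming` that repeat the tail of
/// `committed`, trying the longest overlap (up to `maxWords`) first.
/// Returns "" when `incoming` is entirely a repeat.
func trimOverlap(committed: String, incoming: String, maxWords: Int = 8) -> String {
    let oldWords = committed.split(separator: " ").map(normalizedWord)
    let newWords = incoming.split(separator: " ")
    let newNormalized = newWords.map(normalizedWord)
    var k = min(maxWords, oldWords.count, newWords.count)
    while k > 0 {
        if Array(oldWords.suffix(k)) == Array(newNormalized.prefix(k)) {
            // Keep the original spelling and punctuation of the new words.
            return newWords.dropFirst(k).joined(separator: " ")
        }
        k -= 1
    }
    return incoming
}
```

This sketch accepts a single-word overlap, so a common word such as "the" can be dropped by mistake; requiring a longer minimum overlap is one way to reduce such false matches.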

Requirements

macOS 15.0 (Sequoia) or later
Apple Silicon (arm64)

Installation

Option A: Download Pre-built Binary (Recommended)

Download OST.zip from the latest release
Unzip and move OST.app to your Applications folder

If macOS blocks the app on first run:

xattr -dr com.apple.quarantine /Applications/OST.app

Option B: Build from Source

Requires Xcode Command Line Tools:

xcode-select --install

See the Build section below for full instructions.

Setup Guide

Step 1: Grant Required Permissions

On first launch, macOS may prompt for the following permissions. If not prompted, enable them manually:

Permission	Purpose	How to Enable
Screen Recording	System audio capture via ScreenCaptureKit	System Settings > Privacy & Security > Screen & System Audio Recording > Enable OST
System Audio Recording	System audio capture permission on macOS 15+	System Settings > Privacy & Security > Screen & System Audio Recording > Enable OST
Speech Recognition	SFSpeechRecognizer access	System Settings > Privacy & Security > Speech Recognition > Enable OST

If you enable permissions manually in System Settings, restart OST for changes to take effect.

Step 2: Enable Siri & Dictation

Speech recognition (especially server-based) requires Siri & Dictation to be enabled:

Open System Settings > Siri & Spotlight
Turn on Siri (or "Listen for...")
If using on-device recognition only, Siri does not need to be active — but the speech model must be downloaded (see Step 3)

Step 3: Download On-Device Speech Model (Recommended)

For faster, offline, and more reliable recognition:

Open System Settings > Keyboard > Dictation
Under Languages, download the speech model for your source language (e.g., English, Korean, Japanese)
After the download finishes, confirm that "On-device recognition" is still enabled in OST's Settings > Languages tab

Without the on-device model, server-based recognition is used. This requires internet and may have higher latency.

Step 4: Download Translation Language Pack (Recommended)

For offline translation using Apple Translation framework:

Open System Settings > General > Language & Region > Translation Languages
Download the language pair you need (e.g., English ↔ Korean)

Without the translation pack, translation will not work offline.
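Combined with the online fallback setting described under Tips, translation availability reduces to a small decision. A minimal sketch, assuming hypothetical type names (TranslationInputs, TranslationRoute) rather than OST's real TranslationService API, and assuming Apple Translation is usable only when the language pack is installed. On macOS 15 the installed state of a language pair can be queried through the Translation framework's LanguageAvailability type; this sketch replaces that check with a plain Bool.

```swift
// Hypothetical routing model; OST's TranslationService may be organized differently.
enum TranslationRoute: Equatable {
    case appleTranslation  // on-device Apple Translation with the language pack installed
    case onlineFallback    // opt-in only: text is sent to Google Translate
    case unavailable       // no translation is shown
}

struct TranslationInputs {
    var languagePackInstalled: Bool  // stand-in for a LanguageAvailability check
    var onlineFallbackEnabled: Bool  // Settings > Languages, off by default
}

/// Prefers Apple Translation; uses the online fallback only when the user opted in.
func chooseRoute(_ inputs: TranslationInputs) -> TranslationRoute {
    if inputs.languagePackInstalled { return .appleTranslation }
    if inputs.onlineFallbackEnabled { return .onlineFallback }
    return .unavailable
}
```

Checking the language pack first means text leaves the Mac only when the user has enabled the fallback and no on-device path exists.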

Build

# Clone the repository
git clone https://github.com/9bow/OST.git
cd OST

# Full build → produces build/OST.app
./build.sh

# Type-check only (no binary)
./build.sh --typecheck

# Run project checks
./test.sh

# Clean build
./build.sh --clean

# Run
open build/OST.app

No Xcode project is required. The build script compiles all Swift sources via xcrun swiftc. ./test.sh uses system command-line tools only and runs documentation, workflow, regression, behavioral, and type-check gates. For release checks that require real macOS permissions, audio capture, Apple Translation language packs, or online fallback network behavior, use docs/manual-qa.md.

If macOS blocks the app on first run, execute:
xattr -dr com.apple.quarantine build/OST.app

Usage

Starting a Session

Click the captions bubble icon in the menu bar
Select source and target languages (or use "Auto" for automatic detection)
Click Start Capture to begin capturing system audio
The overlay window(s) will appear with live transcription and translation
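When the source language is set to "Auto", the Known Issues section notes that detection runs once per session on roughly the first 15 characters. A minimal sketch of that gating, assuming a hypothetical AutoLanguageGate type; the injected detect closure stands in for NLLanguageRecognizer so the example runs without the NaturalLanguage framework.

```swift
// Hypothetical sketch; OST's real auto-detection code may differ.
struct AutoLanguageGate {
    let minCharacters: Int
    private let detect: (String) -> String?  // stand-in for NLLanguageRecognizer
    private var buffer = ""
    private var attempted = false
    private(set) var detected: String?

    init(minCharacters: Int = 15, detect: @escaping (String) -> String?) {
        self.minCharacters = minCharacters
        self.detect = detect
    }

    /// Feeds newly recognized text. Detection runs exactly once, on the first
    /// `minCharacters` characters; later calls return the cached result.
    mutating func feed(_ text: String) -> String? {
        guard !attempted else { return detected }
        buffer += text
        guard buffer.count >= minCharacters else { return nil }
        attempted = true
        detected = detect(String(buffer.prefix(minCharacters)))
        return detected
    }
}
```

Because the decision is cached after one attempt, a short or ambiguous opening phrase fixes the language for the rest of the session, which matches the limitation described under Known Issues.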

Overlay Controls

Action	How
Lock/Unlock	Menu bar > Lock Overlay, or Settings > Display > Overlay Window
Move	Unlock, then drag the overlay window
Resize	Unlock, then drag the window edges
Scroll	Unlock, then scroll through subtitle history
Reset position	Settings > Display > "Reset All Overlay Windows"

Locked mode: The overlay is click-through — interact with windows behind it normally
Unlocked mode: Drag to move, resize edges, scroll through subtitle history. Auto-scrolls to the latest text

Display Modes

Configure in Settings > Display > Mode:

Combined: Single window showing both original and translated text
Split: Default mode with two separate windows — recognition (original text) and translation. Each window can be independently positioned and resized. Menu bar Lock/Unlock applies to both windows simultaneously; Settings can lock each window independently
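The lock rules above (the menu bar toggles both windows, Settings toggles each one) can be modeled as a small state type. This is a hypothetical sketch, not OST's WindowManager. On a real NSPanel, a locked (click-through) overlay would typically correspond to setting NSWindow's ignoresMouseEvents to true.

```swift
// Hypothetical model of Split-mode lock state; not OST's actual types.
enum OverlayWindowKind: CaseIterable {
    case recognition
    case translation
}

struct OverlayLocks {
    private var locked: [OverlayWindowKind: Bool] = [:]

    init(allLocked: Bool) {
        setAll(locked: allLocked)
    }

    /// Menu bar Lock/Unlock: applies to both windows at once.
    mutating func setAll(locked value: Bool) {
        for kind in OverlayWindowKind.allCases { locked[kind] = value }
    }

    /// Settings: locks or unlocks a single window.
    mutating func set(_ kind: OverlayWindowKind, locked value: Bool) {
        locked[kind] = value
    }

    /// A locked overlay is click-through (ignoresMouseEvents = true on an NSWindow).
    func isClickThrough(_ kind: OverlayWindowKind) -> Bool {
        locked[kind] ?? false
    }
}
```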

Tips

Speech Pause: Adjust in Settings > Display > "Speech Pause" slider (default 3s). Shorter values finalize text faster; longer values wait for natural sentence endings
Subtitle Expiry: Old subtitles automatically fade after the configured time (default 20s)
Max Lines: Control how many subtitle entries are visible at once (default 3)
Session History: Enabled by default. View past transcription sessions via menu bar > Session History, export them for reference, or disable saving in Settings > Debug
On-device recognition: Enabled by default. If the selected language model is unavailable or you prefer server-based recognition, disable it in Settings > Languages
Online fallback translation: Disabled by default. Enable it in Settings > Languages only if you want OST to send text to Google Translate when Apple Translation is unavailable
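Subtitle Expiry and Max Lines interact: an entry leaves the overlay either when it is older than the expiry or when newer entries push it past the line limit. A minimal sketch, assuming hypothetical SubtitleBuffer and SubtitleEntry types, timestamps as plain seconds so the logic runs without a clock, and removal in place of the fade OST shows. Defaults mirror the documented values (20 s, 3 lines).

```swift
// Hypothetical subtitle buffer; not OST's actual implementation.
struct SubtitleEntry: Equatable {
    let text: String
    let createdAt: Double  // seconds since session start
}

struct SubtitleBuffer {
    let expiry: Double
    let maxLines: Int
    private(set) var entries: [SubtitleEntry] = []

    init(expiry: Double = 20, maxLines: Int = 3) {
        self.expiry = expiry
        self.maxLines = maxLines
    }

    mutating func append(_ text: String, at time: Double) {
        entries.append(SubtitleEntry(text: text, createdAt: time))
        prune(now: time)
    }

    /// Drops expired entries, then keeps only the newest `maxLines`.
    mutating func prune(now: Double) {
        let maxAge = expiry  // local copy avoids overlapping access to self
        entries.removeAll { now - $0.createdAt >= maxAge }
        if entries.count > maxLines {
            entries.removeFirst(entries.count - maxLines)
        }
    }

    var visibleLines: [String] { entries.map(\.text) }
}
```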

Architecture

ScreenCaptureKit (16 kHz mono, SystemAudioCapture)
→ SpeechRecognizer (SFSpeechRecognizer)
→ AppState (entries)
→ TranslationService (Translation.framework)
→ Overlay Views (NSPanel)
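The live-translation step in this pipeline (translating while text is still being recognized) can be sketched as a state holder that replaces the in-progress line on each partial result and starts a new line after a final one. Assumptions: the types are hypothetical stand-ins for AppState entries, and translation is a synchronous closure, whereas the Apple Translation framework is asynchronous.

```swift
// Hypothetical model of the pipeline's state stage; not OST's AppState.
enum RecognitionUpdate {
    case partial(String)    // text that may still change
    case finalized(String)  // text the recognizer has committed
}

struct SubtitleLine: Equatable {
    var original: String
    var translated: String
    var isFinal: Bool
}

final class PipelineState {
    private(set) var lines: [SubtitleLine] = []
    private let translate: (String) -> String  // stand-in for TranslationService

    init(translate: @escaping (String) -> String) {
        self.translate = translate
    }

    func handle(_ update: RecognitionUpdate) {
        switch update {
        case .partial(let text):
            upsertOpenLine(text, isFinal: false)
        case .finalized(let text):
            upsertOpenLine(text, isFinal: true)
        }
    }

    /// Every update re-translates the line still being recognized;
    /// a new line starts only after the previous one was finalized.
    private func upsertOpenLine(_ text: String, isFinal: Bool) {
        let line = SubtitleLine(original: text, translated: translate(text), isFinal: isFinal)
        if let last = lines.last, !last.isFinal {
            lines[lines.count - 1] = line
        } else {
            lines.append(line)
        }
    }
}
```

A real app would likely throttle translations of partial results; this sketch translates every update for clarity.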

Source Layout

OST/Sources/
├── App/             AppState, OSTApp, WindowManager, Logger, SessionRecorder
├── Audio/           SystemAudioCapture (ScreenCaptureKit)
├── Speech/          SpeechRecognizer, SupportedLanguages
├── Translation/     TranslationService, TranslationConfig
├── Settings/        UserSettings
├── UI/              SubtitleView, RecognitionOverlayView, TranslationOverlayView,
│                    OverlayWindow, MenuBarView, SettingsView, FontSettingsView, etc.
└── Accessibility/   AccessibilityManager

Troubleshooting

Problem	Solution
No audio captured	Grant Screen Recording and System Audio Recording permissions. If you changed them in System Settings, restart OST
Speech recognition not working	Grant Speech Recognition permission; ensure Siri & Dictation is enabled
Translation not appearing	Download the translation language pack, or enable online fallback in Settings > Languages if sending text to Google Translate is acceptable
Overlay invisible but blocking clicks	Use Settings > Display > "Reset All Overlay Windows" to restore default position
macOS blocks the app	Run `xattr -dr com.apple.quarantine /Applications/OST.app` for an installed app, or `xattr -dr com.apple.quarantine build/OST.app` for a local build
On-device recognition produces no results	Download the speech model for your language in System Settings > Keyboard > Dictation

Known Issues

Endpoint detection (EPD) — Speech segmentation uses a pause timer combined with sentence boundary detection, not proper endpoint detection. Subtitle boundaries may sometimes split mid-sentence or merge unrelated phrases.
Automatic language detection — Auto-detect uses NLLanguageRecognizer on the first ~15 characters, which may misidentify the language from short or ambiguous input. Detection only runs once per session.
Translation consistency — Translation is triggered per speech segment. Short or fragmented segments may produce less coherent translations.
Speech recognition restart gap — SFSpeechRecognizer's recognition task expires after ~60 seconds and auto-restarts. Overlap detection minimizes duplicate text, but a brief gap in recognition may still occur.
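The segmentation described in the first item (a pause timer plus sentence-boundary detection) can be sketched as follows, assuming a hypothetical Segmenter that receives newly recognized words rather than cumulative transcripts. The pause default mirrors the documented 3 s Speech Pause. Treating every "." as a boundary also splits text such as "3.5", one way subtitles end up broken mid-sentence.

```swift
// Hypothetical segmenter; not OST's actual SpeechRecognizer logic.

/// Trims leading and trailing whitespace without Foundation.
func trimmedWhitespace(_ s: String) -> String {
    var sub = Substring(s)
    while sub.first?.isWhitespace == true { sub.removeFirst() }
    while sub.last?.isWhitespace == true { sub.removeLast() }
    return String(sub)
}

struct Segmenter {
    /// Sentence-ending marks for Latin and CJK punctuation.
    static let terminators: Set<Character> = [".", "?", "!", "。", "？", "！"]

    let pause: Double
    private var pending = ""
    private var lastUpdate = 0.0

    init(pause: Double = 3) { self.pause = pause }

    /// Appends newly recognized words and returns every complete sentence.
    mutating func receive(_ chunk: String, at time: Double) -> [String] {
        pending = pending.isEmpty ? chunk : pending + " " + chunk
        lastUpdate = time
        var sentences: [String] = []
        var current = ""
        for ch in pending {
            current.append(ch)
            if Segmenter.terminators.contains(ch) {
                sentences.append(trimmedWhitespace(current))
                current = ""
            }
        }
        pending = trimmedWhitespace(current)
        return sentences
    }

    /// Commits pending text once no new words have arrived for `pause` seconds.
    mutating func tick(now: Double) -> String? {
        guard !pending.isEmpty, now - lastUpdate >= pause else { return nil }
        let segment = pending
        pending = ""
        return segment
    }
}
```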

License

MIT
