feat: add word highlighting, manual glossary terms, and speech audio uploads #315

Merged
elasticsounds merged 12 commits into main from
feature/word-highlight-glossary-audio-upload
Apr 29, 2026

@ravi-adt

Summary

  • add word-level read-aloud highlighting with timestamp support, plus toggles in Speech and Accessibility settings
  • add manual glossary term creation so users can add entries not generated by AI
  • add per-row speech audio upload/replace and allow uploaded audio to use AI-generated timestamps
  • align preview/export/runtime behavior with the new speech highlight and timestamp flow

What Changed

Speech

  • added word-by-word read-aloud highlighting instead of only sentence/block highlighting
  • added a Speech-stage Word Highlight toggle for book creators
  • added per-text manual audio upload/replace in the Speech list
  • made uploaded manual audio compatible with AI timestamp generation

Accessibility

  • added a reader-facing Word highlight toggle in Accessibility Settings
  • persisted the setting in runtime state and wired it into playback behavior

Glossary

  • added a manual Add Term flow in the Glossary stage
  • persisted manual glossary items so they survive reruns and flow into downstream catalog/package output

Preview and Packaging

  • updated preview/export/runtime behavior so highlight mode and timestamps stay consistent across environments
  • fixed punctuation loss, fallback timestamp handling, and old blue-box flicker during rollout

Testing

  • pnpm exec vitest run apps/api/src/routes/adt-preview.test.ts apps/api/src/routes/glossary.test.ts apps/api/src/routes/tts.test.ts apps/api/src/services/stage-runner.test.ts packages/pipeline/src/__tests__/glossary.test.ts packages/pipeline/src/__tests__/text-catalog.test.ts packages/pipeline/src/__tests__/package-web.test.ts assets/adt/modules/tts_highlighter_utils.test.js
  • pnpm exec tsc --noEmit -p apps/api/tsconfig.json
  • pnpm exec tsc --noEmit -p apps/studio/tsconfig.json

Notes

  • manual uploaded audio now works with the existing timestamp flow
  • a full Speech rerun still regenerates stage audio and can overwrite manually uploaded files; that behavior was not changed in this PR

ravi-adt requested a review from elasticsounds April 22, 2026 13:21
ravi-adt and others added 3 commits April 22, 2026 19:07
* feat(glossary): autofill manual glossary entries from book text

Adds an autocomplete-powered Add Glossary dialog that suggests words
from the book and auto-generates definition, variations, and emoji via
a new single-term LLM route.

* fix(tts): scope audio element discovery to #content and gate word highlight toggle

- Narrow gatherAudioElements selector from .container to #content so UI chrome
  elements in interface.html don't pollute the TTS queue
- Conditionally show the word-highlight toggle based on the highlight feature flag
- Drop approximate evenly-divided timings when word timecodes are missing; fall
  back to simpler block highlight to avoid drift
- Force yellow-on-black styling for highlighted word spans regardless of parent
  text color
- Reposition collapsed accessibility card button
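
The fallback described in this commit (no guessed timings; use block highlight when word timecodes are missing) could look roughly like the sketch below. The `WordTiming` and `HighlightPlan` shapes and the `planHighlight` name are assumptions for illustration and are not taken from the PR's code.

```typescript
// Hypothetical shapes; the PR does not show the real types.
interface WordTiming {
  word: string;
  startMs: number;
  endMs: number;
}

type HighlightPlan =
  | { mode: "word"; timings: WordTiming[] }
  | { mode: "block" };

// Use word-level highlighting only when every word has a usable timecode.
// Otherwise fall back to block highlighting instead of inventing evenly
// divided timings, which drift against the real audio.
function planHighlight(
  words: string[],
  timings: WordTiming[] | undefined,
  wordHighlightEnabled: boolean,
): HighlightPlan {
  if (!wordHighlightEnabled || !timings || timings.length !== words.length) {
    return { mode: "block" };
  }
  const usable = timings.every(
    (t) =>
      Number.isFinite(t.startMs) &&
      Number.isFinite(t.endMs) &&
      t.endMs >= t.startMs,
  );
  return usable ? { mode: "word", timings } : { mode: "block" };
}
```

Returning a tagged plan keeps the decision in one place, so the playback code only has to switch on `mode`.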

* chore(tts): remove debug console logs

* fix(tts): bust browser cache when replacing audio files

Uploaded audio reuses the same on-disk filename as the generated TTS
(`{textId}.{format}`), so the audio URL didn't change after replace. With
Cache-Control: public, max-age=86400 on the audio route, the browser kept
serving the previous bytes and the waveform never refreshed.

Include a per-entry cacheKey (audio file mtime) in the TTS GET response and
upload-one response, and append it as a `?v=` query param on getAudioUrl.
Also fold cacheKey into the WaveformPlayer React key so the player remounts
on replace.

* fix(tts): use URLSearchParams for cache-bust query to satisfy lint
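
A minimal sketch of the cache-busting approach from the two commits above, assuming a `getAudioUrl` helper that takes the `cacheKey` (the audio file mtime) as an optional argument. The base path and argument order are hypothetical; only the `?v=` parameter and the use of `URLSearchParams` come from the commit messages.

```typescript
// Hypothetical signature; the real getAudioUrl may take different arguments.
function getAudioUrl(
  basePath: string,
  textId: string,
  format: string,
  cacheKey?: string | number,
): string {
  const path = `${basePath}/${encodeURIComponent(textId)}.${format}`;
  // Without a cache key, keep the plain URL so existing callers are unaffected.
  if (cacheKey === undefined || cacheKey === "") {
    return path;
  }
  // The on-disk filename is unchanged after a replace, so the query string
  // is the only part of the URL that differs and forces a fresh fetch.
  const params = new URLSearchParams({ v: String(cacheKey) });
  return `${path}?${params.toString()}`;
}
```

Because the URL only changes when the file's mtime changes, the long `max-age` on the audio route can stay in place for unmodified files.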
@elasticsounds

Hey Ravi - I've made a lot of changes to make this functional. Here is the list.

  • Word-level highlighting in the ADT preview would break because it used the container class to gather audio elements; switched this to the content id, which is more reliable.
  • The word highlighter option is now in the text-to-speech options area.
  • Manual glossary term creation uses the book text to suggest glossary terms. AI generates the definition and selects variations and an emoji.
  • Per-row speech upload was not working because of caching; fixed so it functions. Replaced audio now shows in the waveform and plays.
  • Seems to work better.

Resolve conflicts in Lingui catalogs by accepting the feature branch,
re-running extract, and backfilling translations for 35 new strings
from main across es/fr/pt-BR.

Adapt package-web tests to main's section schema rename (parts -> nodes).
@nicpottier

Very cool set of features, having this broken out into separate PRs would have been good. Nice work on all of these though!

On highlighting, awesome feature to add, works great!

  • Settings such as whether to enable highlighting belong in the settings menu; the speech tab would be fine. That's also where rebuilds should be triggered. I take it the "word highlighting" toggle currently controls whether we calculate word boundaries when rebuilding, so that belongs in settings.
  • The "calculate word boundaries for all entries" action feels like a weird thing (and also not very intuitive based on the icon). When is this used? I take it maybe when someone uploads an audio file? Seems like that should just be automatic based on the settings value? Or an option visible when you upload the item?
  • Deleting all speech is maybe a useful thing to have, but we should chat about a way to unify that kind of global destructive behavior. (I can imagine the same might be wanted for quizzes, for example.) The confirmation dialog uses a browser alert, which we don't use anywhere else; that should be a normal dialog explaining what's going to happen, etc.
  • The UI for shifting word boundaries allows me to set overlapping boundaries. Is that by design? Will that work? Feels like a UI with sliders could be more user-friendly, but I realize that's a lot of work. At the very least we should validate that boundaries aren't overlapping unless that's by design.
  • The replace button is kind of giant, maybe it is just an icon with a tooltip?

On Glossary:

  • The dialog says "Start typing to see words from the book" but nothing happens when I start typing? Is that by design?
  • After I add an item it is at the bottom of the list instead of alphabetical.
  • I like how manual entries are preserved on rebuilds by using a manual source field on the versioned list and merging those back in. That's a pattern we should try elsewhere.
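
The pattern praised in the last point (tag manual entries with a source field and merge them back in after a rebuild) might be sketched as follows. The `GlossaryItem` shape, the `source` values, and the rule that a manual entry wins over a regenerated entry for the same word are all assumptions, not details confirmed by the PR.

```typescript
// Hypothetical item shape; only the idea of a "manual" source tag is from the PR.
interface GlossaryItem {
  word: string;
  definition: string;
  source?: "ai" | "manual";
}

// Merge manual entries from the previous version into a freshly generated list.
// Assumption: when both lists contain the same word (case-insensitive),
// the manual entry is kept and the regenerated one is dropped.
function mergeManualItems(
  previous: GlossaryItem[],
  regenerated: GlossaryItem[],
): GlossaryItem[] {
  const manual = previous.filter((item) => item.source === "manual");
  const manualWords = new Set(manual.map((item) => item.word.toLowerCase()));
  const kept = regenerated.filter(
    (item) => !manualWords.has(item.word.toLowerCase()),
  );
  // Append manual entries without reordering the regenerated ones.
  return kept.concat(manual);
}
```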

@elasticsounds

List of changes:

  • Moved timestamp generation from the header to settings (still able to trigger individual timestamps per item).
  • Added a step in Speech for calculating timestamps.
  • Added a timestamp toggle to the landing page, disabled by default (it takes a long time to calculate these).
  • Left delete all speech - we can explore a global update for this across the application.
  • Updated the UI for setting manual boundaries; it will not allow you to enter a value before the preceding entry or after the following entry. You have to update values manually for now. Slider UI for sure in a future version.
  • Made the replace button just an icon with a tooltip.
  • Start typing bug fixed for glossary.
  • Moved manually added items to be alphabetically sorted.
  • Agree on the manual tag - will follow up with another pull request on manually edited fields.
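
As an illustration of the boundary validation described above, here is a small clamping sketch. It assumes boundaries are stored as start/end times in seconds and that "before the preceding entry" means before the previous word's end; the real data shape and rule in TranslationsView may differ.

```typescript
// Hypothetical shape: one entry per word, times in seconds.
interface WordBoundary {
  word: string;
  start: number;
  end: number;
}

// Clamp an edited boundary so it cannot start before the preceding word ends
// or end after the following word starts, which prevents overlaps.
function clampBoundary(
  boundaries: WordBoundary[],
  index: number,
  proposedStart: number,
  proposedEnd: number,
): WordBoundary {
  if (index < 0 || index >= boundaries.length) {
    throw new RangeError(`index ${index} is out of range`);
  }
  const prev = index > 0 ? boundaries[index - 1] : undefined;
  const next = index < boundaries.length - 1 ? boundaries[index + 1] : undefined;
  const min = prev ? prev.end : 0;
  const max = next ? next.start : Number.POSITIVE_INFINITY;
  const start = Math.min(Math.max(proposedStart, min), max);
  const end = Math.min(Math.max(proposedEnd, start), max);
  return { word: boundaries[index].word, start, end };
}
```

Clamping silently is one option; rejecting the input with a validation message would be an equally reasonable design.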

@elasticsounds

  • Fixed no-cache support when regenerating the word highlighting step.

@elasticsounds

  • After adding a manual glossary item, translation and speech are now marked to be re-run. Added a banner at the top of Translation and Speech as a shortcut for the user.

@elasticsounds

  • Timestamps are no longer overwritten if highlighting is disabled. Previously they were being overwritten by empty values.

#332)

- Make word_highlighting opt-in (default false) across runtime, packaging, and preview routes
- Split tts and word-timestamps into separate pipeline steps with their own progress
- Cache Whisper word timestamps on audio+language+prompt for fast re-runs
- Compute stage-missing counts from configured output languages so never-run langs surface
- Stop reordering glossary list when adding manual entries; add inline emoji editor
- Localize the word-highlight preview sentence and add ES/FR/PT-BR translations
- Add re-run banner, Switch-based highlight toggle, and timestamp clamping in TranslationsView
- Quick-toggle TTS button now starts playback immediately when enabling read-aloud
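
The timestamp caching item above (keyed on audio, language, and prompt) could be approximated by the in-memory sketch below. The `audioHash` field stands in for whatever identifies the audio bytes, and the class and method names are hypothetical; the PR does not show how the real cache is stored.

```typescript
// Hypothetical shapes for illustration only.
interface WordTimestamp {
  word: string;
  start: number;
  end: number;
}

interface TimestampKeyParts {
  audioHash: string; // assumed: a content hash of the audio file
  language: string;
  prompt: string;
}

// JSON-encoding the parts avoids collisions that a plain delimiter could cause.
function timestampCacheKey(parts: TimestampKeyParts): string {
  return JSON.stringify([parts.audioHash, parts.language, parts.prompt]);
}

class WordTimestampCache {
  private store = new Map<string, Promise<WordTimestamp[]>>();

  // Return the cached result for identical inputs; otherwise run `compute`
  // (for example, a transcription call) once and remember its promise.
  getOrCompute(
    parts: TimestampKeyParts,
    compute: () => Promise<WordTimestamp[]>,
  ): Promise<WordTimestamp[]> {
    const key = timestampCacheKey(parts);
    const cached = this.store.get(key);
    if (cached) {
      return cached;
    }
    const pending = compute().catch((err) => {
      this.store.delete(key); // do not keep failed attempts in the cache
      throw err;
    });
    this.store.set(key, pending);
    return pending;
  }
}
```

Changing any one of the three key parts produces a new entry, so replacing the audio or editing the prompt triggers a fresh computation while plain re-runs stay fast.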
#334)

The autofill suggestions in AddGlossaryDialog read from the text-catalog
node, which is only built during the translate stage. Books that have
only run through glossary/quizzes had no catalog, so suggestions were
empty. The text-catalog GET endpoint now lazily builds the catalog from
existing pipeline outputs when missing.

Also disables Add Term and the version-picker Save button while the
glossary stage is running, to prevent a pending edit from clobbering a
freshly generated glossary on save.
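
The lazy build described in this commit might be sketched as below. The `CatalogStore` interface, the entry shape, and the `buildFromOutputs` callback are hypothetical stand-ins; the actual endpoint and storage layer are not shown in the PR.

```typescript
// Hypothetical types; the real text-catalog node has its own schema.
interface TextCatalogEntry {
  id: string;
  text: string;
}

interface CatalogStore {
  read(bookId: string): TextCatalogEntry[] | undefined;
  write(bookId: string, catalog: TextCatalogEntry[]): void;
}

// Return the stored catalog if the translate stage already built it;
// otherwise build it from existing pipeline outputs and persist it.
function getOrBuildCatalog(
  bookId: string,
  store: CatalogStore,
  buildFromOutputs: (bookId: string) => TextCatalogEntry[],
): TextCatalogEntry[] {
  const existing = store.read(bookId);
  if (existing) {
    return existing;
  }
  const built = buildFromOutputs(bookId);
  store.write(bookId, built);
  return built;
}
```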
@elasticsounds

  • Auto-builds the text-catalog when opening the modal to add manual glossary terms. Consider generating the text-catalog progressively after each step in a later update. Keeping the translate stage limited to translations.

@elasticsounds

@nicpottier ready for your review before merge

Auto-generate now also fires on word input blur, but clicking the
regenerate icon button blurs the input first and then runs its onClick,
which caused two LLM generations per click. Skip the blur-triggered
auto-generate when focus is moving to the regenerate button.
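
One way to express the guard from this commit is to inspect where focus is going when the input blurs. The sketch below uses small structural stand-ins for the DOM event so it can run outside a browser; the button id and function name are hypothetical, and the real dialog may identify the regenerate button differently (for example, with a ref).

```typescript
// Minimal stand-ins for the parts of a blur event this check needs.
interface FocusTargetLike {
  id?: string;
}

interface BlurEventLike {
  // On blur, relatedTarget is the element receiving focus, or null.
  relatedTarget: FocusTargetLike | null;
}

const REGENERATE_BUTTON_ID = "glossary-regenerate"; // hypothetical id

// Decide whether a blur on the word input should trigger auto-generation.
// If focus is moving to the regenerate button, its onClick will run the
// generation, so the blur handler skips it to avoid a second LLM call.
function shouldAutoGenerateOnBlur(event: BlurEventLike, word: string): boolean {
  if (word.trim() === "") {
    return false;
  }
  const movingToRegenerate = event.relatedTarget?.id === REGENERATE_BUTTON_ID;
  return !movingToRegenerate;
}
```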
@elasticsounds

@copilot resolve the merge conflicts in this pull request

Resolved conflicts in:
- packages/pipeline/src/package-web.ts: kept both speechConfig and features options
- apps/api/src/services/export-service.ts: kept both speechConfig and features in opts
- apps/studio/src/components/pipeline/components/StageSidebar.tsx: merged imports, used main's iconFilled logic
- apps/studio/src/locales/*.po: merged all new strings from both branches

Co-authored-by: elasticsounds <5905047+elasticsounds@users.noreply.github.com>

Copilot AI commented Apr 29, 2026

@copilot resolve the merge conflicts in this pull request

Merge conflicts resolved in commit ff4126e. The 7 conflicting files were:

  • packages/pipeline/src/package-web.ts — kept both speechConfig (word highlighting) and features (glossary/readAloud/quizzes/signLanguage) in the options interface; updated the single hasTTS definition to include the features.readAloud gate to avoid a duplicate declaration
  • apps/api/src/services/export-service.ts — kept both speechConfig: config.speech and features in the package opts
  • apps/studio/src/components/pipeline/components/StageSidebar.tsx — merged all three imports (useStageMissingCounts, usePackageAdtStatus, useSignLanguageVideos); used main's simpler iconFilled logic which correctly uses stageCompleted (already accounts for completionOverrides)
  • en.po, es.po, fr.po, pt-BR.po — merged all new locale strings from both branches

TypeScript checks and tests all pass.

elasticsounds merged commit 70e67f7 into main Apr 29, 2026
3 checks passed

4 participants