feat: add word highlighting, manual glossary terms, and speech audio uploads by ravi-adt · Pull Request #315 · unicef/adt-studio

ravi-adt · 2026-04-22T13:17:44Z

Summary

add word-level read-aloud highlighting with timestamp support, plus toggles in Speech and Accessibility settings
add manual glossary term creation so users can add entries not generated by AI
add per-row speech audio upload/replace and allow uploaded audio to use AI-generated timestamps
align preview/export/runtime behavior with the new speech highlight and timestamp flow

What Changed

Speech

added word-by-word read-aloud highlighting instead of only sentence/block highlighting
added a Speech-stage Word Highlight toggle for book creators
added per-text manual audio upload/replace in the Speech list
made uploaded manual audio compatible with AI timestamp generation

Accessibility

added a reader-facing Word highlight toggle in Accessibility Settings
persisted the setting in runtime state and wired it into playback behavior

Glossary

added a manual Add Term flow in the Glossary stage
persisted manual glossary items so they survive reruns and flow into downstream catalog/package output

Preview and Packaging

updated preview/export/runtime behavior so highlight mode and timestamps stay consistent across environments
fixed punctuation loss, fallback timestamp handling, and old blue-box flicker during rollout

Testing

pnpm exec vitest run apps/api/src/routes/adt-preview.test.ts apps/api/src/routes/glossary.test.ts apps/api/src/routes/tts.test.ts apps/api/src/services/stage-runner.test.ts packages/pipeline/src/__tests__/glossary.test.ts packages/pipeline/src/__tests__/text-catalog.test.ts packages/pipeline/src/__tests__/package-web.test.ts assets/adt/modules/tts_highlighter_utils.test.js
pnpm exec tsc --noEmit -p apps/api/tsconfig.json
pnpm exec tsc --noEmit -p apps/studio/tsconfig.json

Notes

manual uploaded audio now works with the existing timestamp flow
a full Speech rerun still regenerates stage audio and can overwrite manually uploaded files; that behavior was not changed in this PR

* feat(glossary): autofill manual glossary entries from book text
  Adds an autocomplete-powered Add Glossary dialog that suggests words from the book and auto-generates definition, variations, and emoji via a new single-term LLM route.
* fix(tts): scope audio element discovery to #content and gate word highlight toggle
  - Narrow gatherAudioElements selector from .container to #content so UI chrome elements in interface.html don't pollute the TTS queue
  - Conditionally show the word-highlight toggle based on the highlight feature flag
  - Drop approximate evenly-divided timings when word timecodes are missing; fall back to simpler block highlight to avoid drift
  - Force yellow-on-black styling for highlighted word spans regardless of parent text color
  - Reposition collapsed accessibility card button
* chore(tts): remove debug console logs
* fix(tts): bust browser cache when replacing audio files
  Uploaded audio reuses the same on-disk filename as the generated TTS (`{textId}.{format}`), so the audio URL didn't change after replace. With Cache-Control: public, max-age=86400 on the audio route, the browser kept serving the previous bytes and the waveform never refreshed.
  Include a per-entry cacheKey (audio file mtime) in the TTS GET response and upload-one response, and append it as a `?v=` query param on getAudioUrl. Also fold cacheKey into the WaveformPlayer React key so the player remounts on replace.
* fix(tts): use URLSearchParams for cache-bust query to satisfy lint
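
The cache-bust fix described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the `TtsEntry` shape, the `/audio/` path segment, and the two-argument signature of `getAudioUrl` are assumptions. Only the `cacheKey` field, the `?v=` query param, and the use of `URLSearchParams` come from the commit messages.

```typescript
// Hypothetical entry shape; only `cacheKey` is named in the commit message.
interface TtsEntry {
  textId: string;
  format: string;
  // Assumed to be the audio file's mtime serialized as a string.
  cacheKey?: string;
}

// Builds the audio URL and appends `?v=<cacheKey>` when a key is present,
// so a replaced file gets a new URL even though its filename is unchanged.
// The `/audio/` route layout is an assumption for illustration.
function getAudioUrl(baseUrl: string, entry: TtsEntry): string {
  const path = `${baseUrl}/audio/${encodeURIComponent(entry.textId)}.${entry.format}`;
  if (!entry.cacheKey) {
    return path;
  }
  const params = new URLSearchParams({ v: entry.cacheKey });
  return `${path}?${params.toString()}`;
}
```

Because the URL changes whenever the mtime changes, the long `max-age` on the audio route can stay in place: the browser treats the replaced file as a different resource.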

elasticsounds · 2026-04-22T19:56:53Z

Hey Ravi - I've made a lot of changes to make this functional. Here is the list.

Word-level highlighting in the ADT preview would break because it used the container class to gather audio elements; switched this to the content id, which is more reliable.
The word highlighter option is now in the text-to-speech options area.
Manual glossary term creation uses the book text to suggest glossary terms. AI generates the definition and selects variations and emoji.
Per-row speech upload was not working because of caching; fixed it so it functions. The uploaded audio now shows in the waveform and plays.
Seems to work better.

Resolve conflicts in Lingui catalogs by accepting the feature branch, re-running extract, and backfilling translations for 35 new strings from main across es/fr/pt-BR. Adapt package-web tests to main's section schema rename (parts -> nodes).

nicpottier · 2026-04-23T15:56:07Z

Very cool set of features, though having this broken out into separate PRs would have been good. Nice work on all of these!

On highlighting: awesome feature to add, works great!

Settings such as whether to enable highlighting belong in the settings menu; the speech tab would be fine. That's also where rebuilds should be triggered. I take it the "word highlighting" toggle currently controls whether we calculate word boundaries when rebuilding, so that belongs in settings.
The "calculate word boundaries for all entries" action feels like a weird thing (and also not very intuitive based on the icon). When is this used? I take it maybe when someone uploads an audio file? Seems like that should just be automatic based on the settings value, or an option visible when you upload the item.
Deleting all speech is maybe a useful thing to have, but we should chat about a way to unify that kind of global destructive behavior (I can imagine the same might be wanted for quizzes, for example). The confirmation dialog uses a browser alert, which we don't use anywhere else; that should be a normal dialog explaining what's going to happen.
The UI for shifting word boundaries allows me to set overlapping boundaries. Is that by design? Will that work? A UI with sliders could maybe be more user-friendly, but I realize that's a lot of work. At the very least we should validate that boundaries aren't overlapping, unless that's by design.
The replace button is kind of giant; maybe it could be just an icon with a tooltip?

On Glossary:

The dialog says "Start typing to see words from the book" but nothing happens when I start typing? Is that by design?
After I add an item it is at the bottom of the list instead of alphabetical.
I like how manual entries are preserved on rebuilds by using a manual source field on the versioned list and merging those back in. That's a pattern we should try elsewhere.
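
The merge-back pattern mentioned in the last point might look roughly like this. It is a sketch under assumptions: the `GlossaryItem` shape, the `"ai"` source value, matching on the lowercased word, and letting a manual entry win over a regenerated one for the same word are all illustrative choices, not confirmed details of the PR.

```typescript
// Hypothetical item shape; the review only confirms a "manual" source marker.
interface GlossaryItem {
  word: string;
  definition: string;
  source?: "manual" | "ai";
}

// Rebuilds the glossary from freshly generated items, then merges the
// manual items from the previous version back in so they survive a rerun.
function mergeManualItems(
  previous: GlossaryItem[],
  regenerated: GlossaryItem[],
): GlossaryItem[] {
  const keyOf = (item: GlossaryItem): string => item.word.trim().toLowerCase();
  const merged = new Map<string, GlossaryItem>();

  for (const item of regenerated) {
    merged.set(keyOf(item), item);
  }
  // Assumption: a manual entry overrides a regenerated entry for the same word.
  for (const item of previous) {
    if (item.source === "manual") {
      merged.set(keyOf(item), item);
    }
  }
  return Array.from(merged.values());
}
```

Non-manual items from the previous version are deliberately dropped, since the rerun is expected to replace them.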

elasticsounds · 2026-04-25T19:02:38Z

List of changes:

Moved timestamp generation from the header to settings (timestamps can still be triggered individually per item).
Added a step in Speech to calculate timestamps.
Added a timestamp toggle to the landing page, disabled by default (it takes a long time to calculate these).
Left delete all speech in place; we can explore a global update for this across the application.
Updated the UI for setting manual boundaries: it will not allow you to enter a value before the preceding entry or after the following entry. You have to update values manually. A slider UI for sure in a future version.
Made the replace button just an icon with a tooltip.
Start typing bug fixed for glossary.
Moved manually added items to be alphabetically sorted.
Agree on the manual tag - will follow up with another pull request on manually edited fields.
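
The boundary rule from the list above (no value before the preceding entry or after the following entry) can be sketched as a simple clamp. This is an illustration only: it assumes each word is represented by a single start time in seconds, and `clampWordStart` is a hypothetical helper name, not a function from the PR.

```typescript
// Clamps a requested start time for the word at `index` so it never falls
// before the preceding word's start or after the following word's start.
// Assumption: `starts` holds one start time (in seconds) per word, in order.
function clampWordStart(starts: number[], index: number, requested: number): number {
  const lower = index > 0 ? starts[index - 1] : 0;
  const upper = index < starts.length - 1 ? starts[index + 1] : Number.POSITIVE_INFINITY;
  return Math.min(Math.max(requested, lower), upper);
}
```

For example, with starts of `[0, 0.5, 1.2]`, a request to move the middle word to `2` would be clamped to `1.2`, while the last word has no upper bound in this sketch.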

elasticsounds · 2026-04-26T17:19:33Z

Fixed no-cache support for regenerating the word highlighting step.

elasticsounds · 2026-04-26T18:37:52Z

After adding a manual glossary item, translation and speech are now marked to be re-run. Added a banner at the top of Translation and Speech as a shortcut for the user.

elasticsounds · 2026-04-26T19:23:41Z

Timestamps are no longer overwritten if highlighting is disabled. Previously they were being overwritten by empty values.
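
A guard of the following kind would give the behavior described here. It is only a sketch: `WordTimestamp`, `resolveTimestamps`, and the parameter names are hypothetical, and the actual fix in the PR may be structured differently.

```typescript
// Hypothetical timestamp shape for illustration.
interface WordTimestamp {
  word: string;
  start: number;
  end: number;
}

// Decides which timestamps to persist after a speech run. When highlighting
// is disabled, or the run produced no timestamps, the existing values are
// kept instead of being replaced by an empty list.
function resolveTimestamps(
  existing: WordTimestamp[],
  incoming: WordTimestamp[] | undefined,
  highlightingEnabled: boolean,
): WordTimestamp[] {
  if (!highlightingEnabled || incoming === undefined || incoming.length === 0) {
    return existing;
  }
  return incoming;
}
```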

#332)
- Make word_highlighting opt-in (default false) across runtime, packaging, and preview routes
- Split tts and word-timestamps into separate pipeline steps with their own progress
- Cache Whisper word timestamps on audio+language+prompt for fast re-runs
- Compute stage-missing counts from configured output languages so never-run langs surface
- Stop reordering glossary list when adding manual entries; add inline emoji editor
- Localize the word-highlight preview sentence and add ES/FR/PT-BR translations
- Add re-run banner, Switch-based highlight toggle, and timestamp clamping in TranslationsView
- Quick-toggle TTS button now starts playback immediately when enabling read-aloud
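
The timestamp caching on audio+language+prompt could be sketched as below. Everything here is an assumption made for illustration: the `audioDigest` field (standing in for a hash of the audio bytes), the in-memory `Map`, and the synchronous `compute` callback. The real pipeline presumably calls Whisper asynchronously and persists its cache, which this sketch does not attempt to reproduce.

```typescript
// Hypothetical request and result shapes.
interface TimestampRequest {
  audioDigest: string; // assumed: a content hash of the audio file
  language: string;
  prompt: string;
}

interface CachedWord {
  word: string;
  start: number;
  end: number;
}

// Serializing the three parts as a JSON array avoids ambiguous keys that
// plain string concatenation could produce.
function timestampCacheKey(req: TimestampRequest): string {
  return JSON.stringify([req.audioDigest, req.language, req.prompt]);
}

class TimestampCache {
  private store = new Map<string, CachedWord[]>();

  // Returns cached timestamps when the same audio, language, and prompt
  // were seen before; otherwise runs `compute` once and stores the result.
  getOrCompute(req: TimestampRequest, compute: () => CachedWord[]): CachedWord[] {
    const key = timestampCacheKey(req);
    const hit = this.store.get(key);
    if (hit !== undefined) {
      return hit;
    }
    const value = compute();
    this.store.set(key, value);
    return value;
  }
}
```

Changing any one of the three inputs produces a different key, so replaced audio or an edited prompt triggers a fresh transcription while unchanged entries are served from the cache.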

#334) The autofill suggestions in AddGlossaryDialog read from the text-catalog node, which is only built during the translate stage. Books that have only run through glossary/quizzes had no catalog, so suggestions were empty. The text-catalog GET endpoint now lazily builds the catalog from existing pipeline outputs when missing. Also disables Add Term and the version-picker Save button while the glossary stage is running, to prevent a pending edit from clobbering a freshly generated glossary on save.

elasticsounds · 2026-04-26T21:48:54Z

Auto-builds the text-catalog when opening the modal to add manual glossary terms. Consider generating the text-catalog progressively after each step in a later update. Keeping translate limited to translations only.

elasticsounds · 2026-04-26T21:49:19Z

@nicpottier ready for your review before merge

Auto-generate now also fires on word input blur, but clicking the regenerate icon button blurs the input first and then runs its onClick, which caused two LLM generations per click. Skip the blur-triggered auto-generate when focus is moving to the regenerate button.
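
The blur guard described in this commit message can be sketched as a small predicate. The names are hypothetical (`BlurLike`, `REGENERATE_BUTTON_ID`, `shouldAutoGenerateOnBlur`), and identifying the button by element id is an assumption. The sketch relies only on the standard `relatedTarget` property of a focus event, which refers to the element receiving focus when an input blurs.

```typescript
// Minimal structural stand-in for a blur FocusEvent, so the logic can be
// exercised outside a browser.
interface BlurLike {
  relatedTarget: { id?: string } | null;
}

// Hypothetical id of the regenerate icon button.
const REGENERATE_BUTTON_ID = "glossary-regenerate";

// Returns true when a blur on the word input should trigger auto-generate.
// It is skipped when focus is moving to the regenerate button, because that
// button's onClick will start its own generation.
function shouldAutoGenerateOnBlur(event: BlurLike, word: string): boolean {
  if (word.trim() === "") {
    return false;
  }
  return event.relatedTarget?.id !== REGENERATE_BUTTON_ID;
}
```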

elasticsounds · 2026-04-29T04:31:20Z

@copilot resolve the merge conflicts in this pull request

Resolved conflicts in:
- packages/pipeline/src/package-web.ts: kept both speechConfig and features options
- apps/api/src/services/export-service.ts: kept both speechConfig and features in opts
- apps/studio/src/components/pipeline/components/StageSidebar.tsx: merged imports, used main's iconFilled logic
- apps/studio/src/locales/*.po: merged all new strings from both branches

Co-authored-by: elasticsounds <5905047+elasticsounds@users.noreply.github.com>

Copilot · 2026-04-29T04:38:28Z

@copilot resolve the merge conflicts in this pull request

Merge conflicts resolved in commit ff4126e. The 7 conflicting files were:

packages/pipeline/src/package-web.ts — kept both speechConfig (word highlighting) and features (glossary/readAloud/quizzes/signLanguage) in the options interface; updated the single hasTTS definition to include the features.readAloud gate to avoid a duplicate declaration
apps/api/src/services/export-service.ts — kept both speechConfig: config.speech and features in the package opts
apps/studio/src/components/pipeline/components/StageSidebar.tsx — merged all three imports (useStageMissingCounts, usePackageAdtStatus, useSignLanguageVideos); used main's simpler iconFilled logic which correctly uses stageCompleted (already accounts for completionOverrides)
en.po, es.po, fr.po, pt-BR.po — merged all new locale strings from both branches

TypeScript checks and tests all pass.

feat: add word highlighting, glossary terms, and speech uploads

3036e88

ravi-adt requested a review from elasticsounds April 22, 2026 13:21

ravi-adt and others added 3 commits April 22, 2026 19:07

chore(i18n): update extracted locale catalogs

2a774cb

fix(i18n): add missing studio locale strings

843d6af

elasticsounds requested a review from nicpottier April 23, 2026 12:17

elasticsounds assigned ravi-adt Apr 23, 2026

elasticsounds requested a review from Eliezir April 23, 2026 14:29

fix: scope llm credentials to requests

28f0f25

elasticsounds added 2 commits April 26, 2026 15:49

elasticsounds added 2 commits April 29, 2026 00:14

chore: trigger CI

3998b0f

Copilot started work on behalf of elasticsounds April 29, 2026 04:31

Copilot finished work on behalf of elasticsounds April 29, 2026 04:39

chore(i18n): re-run extract after main merge (#362)

8f8a360

elasticsounds merged commit 70e67f7 into main Apr 29, 2026
3 checks passed

elasticsounds merged 12 commits into main from feature/word-highlight-glossary-audio-upload

4 participants
