Qwen3TTS: cache reference audio embeddings across voice clone calls #113

Closed
Oliver2213 wants to merge 2 commits into Blaizzy:main from Oliver2213:ref-audio-cache

@Oliver2213

When generating multiple outputs with the same cloned voice, Qwen3TTSModel recomputes identical work on every call: speaker embedding extraction, codec encoding, ref text tokenization, TTS special token embeddings, and codec embedding construction.

This adds instance-level caching for all five on Qwen3TTSModel. Results are computed on first use and reused on subsequent calls with the same reference audio. The cache is keyed on refAudio.shape and is invalidated automatically when that key changes, which is how a change of reference audio is detected. A public clearRefCache() method is provided for explicit cleanup.

Co-written with Claude while working on something else, but this looks fine to me. Happy to fix other models too if caching like this can benefit them.
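
For readers without the diff open, the caching pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `VoiceCloner`, `RefFeatures`, `computeFeatures`, and `computeCount` are hypothetical names, plain `[Float]` samples stand in for the reference audio array, and `[samples.count]` stands in for `refAudio.shape`. Only `clearRefCache()` is a name taken from the description.

```swift
import Foundation

// Hypothetical stand-in for the values derived from the reference audio.
struct RefFeatures {
    let speakerEmbedding: [Float]
}

final class VoiceCloner {
    // Instance-level cache: the key of the last reference audio and its features.
    private var cachedShape: [Int]?
    private var cachedFeatures: RefFeatures?
    // Counts how often the expensive path runs (for demonstration only).
    private(set) var computeCount = 0

    // Placeholder for the expensive work (speaker embedding, codec encoding, ...).
    private func computeFeatures(_ samples: [Float]) -> RefFeatures {
        computeCount += 1
        let mean = samples.isEmpty ? 0 : samples.reduce(0, +) / Float(samples.count)
        return RefFeatures(speakerEmbedding: [mean])
    }

    // Returns cached features when the shape key matches, otherwise recomputes.
    func features(for samples: [Float]) -> RefFeatures {
        let shape = [samples.count]  // assumption: stands in for refAudio.shape
        if shape == cachedShape, let hit = cachedFeatures {
            return hit
        }
        let fresh = computeFeatures(samples)
        cachedShape = shape
        cachedFeatures = fresh
        return fresh
    }

    // Explicit cleanup, mirroring the public method named in the description.
    func clearRefCache() {
        cachedShape = nil
        cachedFeatures = nil
    }
}
```

In this sketch, calling `features(for:)` twice with the same samples runs `computeFeatures` once, and calling `clearRefCache()` forces the next call to recompute.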

@lucasnewman
Collaborator

@Oliver2213 Thanks! This patch is going to have issues with concurrent access to the cache and the shape-based cache key isn't reliable -- I put up a modified version of it in #125 that should be safer and works for what you're trying to do.
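
To make the two concerns concrete: two different reference clips of the same length share a shape, so a shape key would return stale features, and an unsynchronized cache can be read and written from several threads at once. The sketch below shows one way to address both, assuming a content fingerprint for the key and an `NSLock` around the cache state. It is not the implementation in #125; `RefKey`, `LockedRefCache`, and `get(for:compute:)` are hypothetical names, and plain `[Float]` samples stand in for the reference audio.

```swift
import Foundation

// Hypothetical cache key: sample count plus an FNV-1a fingerprint of the content.
// A 64-bit fingerprint can collide in principle; comparing the full samples
// would be the stricter (and more expensive) alternative.
struct RefKey: Equatable {
    let count: Int
    let fingerprint: UInt64

    init(_ samples: [Float]) {
        var hash: UInt64 = 0xcbf29ce484222325  // FNV-1a 64-bit offset basis
        for sample in samples {
            var bits = sample.bitPattern
            for _ in 0..<4 {
                hash ^= UInt64(bits & 0xff)
                hash = hash &* 0x100000001b3  // FNV-1a 64-bit prime, wrapping multiply
                bits >>= 8
            }
        }
        count = samples.count
        fingerprint = hash
    }
}

// Single-entry cache whose state is only touched while holding the lock.
final class LockedRefCache<Value> {
    private let lock = NSLock()
    private var cachedKey: RefKey?
    private var cachedValue: Value?

    func get(for samples: [Float], compute: ([Float]) -> Value) -> Value {
        let key = RefKey(samples)
        lock.lock()
        defer { lock.unlock() }
        if cachedKey == key, let hit = cachedValue {
            return hit
        }
        // Computing under the lock serializes callers; simple, but not the fastest option.
        let fresh = compute(samples)
        cachedKey = key
        cachedValue = fresh
        return fresh
    }

    func clear() {
        lock.lock()
        defer { lock.unlock() }
        cachedKey = nil
        cachedValue = nil
    }
}
```

With this key, two clips of equal length but different content miss the cache and trigger a recompute, which a shape-only key would not do.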

@Oliver2213
Author

@lucasnewman, thanks a bunch for fixing this up and refiling. I definitely missed those when reading the diff.

Oliver2213 closed this on Mar 27, 2026