Qwen3TTS: cache reference audio embeddings across voice clone calls by Oliver2213 · Pull Request #113 · Blaizzy/mlx-audio-swift

Oliver2213 · 2026-03-21T21:41:08Z

When generating multiple outputs with the same cloned voice, Qwen3TTSModel recomputes identical work on every call: speaker embedding extraction, codec encoding, ref text tokenization, TTS special token embeddings, and codec embedding construction.

This adds instance-level caching on Qwen3TTSModel for all five. Results are computed on first use and reused on later calls with the same reference audio. The cache is keyed on refAudio.shape and is invalidated automatically when the reference audio changes. A public clearRefCache() method is provided for explicit cleanup.

Co-written with Claude while working on something else, but this looks fine to me. Happy to add similar caching to other models if it would benefit them.
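A minimal sketch of the caching pattern described above, to make the flow concrete. It assumes reference audio is a plain `[Float]` sample buffer rather than the model's real array type, and `RefAudioArtifacts`, `ShapeKeyedRefCache`, and the `compute` closure are hypothetical names, not the actual Qwen3TTSModel API. To mirror the PR, the cache key is only the audio's shape (here, its sample count).

```swift
// Hypothetical stand-in for the cached values. The PR caches five items
// (speaker embedding, codec codes, ref text tokens, TTS special-token
// embeddings, codec embeddings); three are modeled here for brevity.
struct RefAudioArtifacts {
    let speakerEmbedding: [Float]
    let codecCodes: [Int]
    let refTextTokens: [Int]
}

// Hypothetical instance-level cache keyed only on the reference audio's
// shape, mirroring the approach described in this PR. Not thread-safe.
final class ShapeKeyedRefCache {
    private var cachedShape: [Int]?
    private var cached: RefAudioArtifacts?
    private(set) var computeCount = 0  // how many times the expensive path ran

    func artifacts(for refAudio: [Float],
                   compute: ([Float]) -> RefAudioArtifacts) -> RefAudioArtifacts {
        // A 1-D sample buffer stands in for the model's audio array,
        // so its "shape" is just the sample count.
        let shape = [refAudio.count]
        if let hit = cached, cachedShape == shape {
            return hit  // cache hit: reuse the earlier results
        }
        // Miss, or the shape changed: recompute and replace the entry.
        let fresh = compute(refAudio)
        computeCount += 1
        cachedShape = shape
        cached = fresh
        return fresh
    }

    // Explicit cleanup, analogous to the PR's clearRefCache().
    func clearRefCache() {
        cachedShape = nil
        cached = nil
    }
}
```

Because only the shape is compared, a different clip with the same number of samples gets the previous clip's results back, which is the weakness raised in review.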


lucasnewman · 2026-03-25T20:31:38Z

@Oliver2213 Thanks! This patch is going to have issues with concurrent access to the cache and the shape-based cache key isn't reliable -- I put up a modified version of it in #125 that should be safer and works for what you're trying to do.
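The two problems named in this review can be illustrated with a small sketch: a shape key cannot tell apart two different clips of the same length, and unsynchronized mutable state can race when the model is called from several threads. The version below compares the full sample content and guards the cache with an `NSLock`. It is only one possible approach under stated assumptions (a `[Float]` sample buffer, hypothetical type names); it is not the code in #125.

```swift
import Foundation

// Hypothetical stand-in for the cached reference-audio results.
struct RefAudioArtifacts {
    let speakerEmbedding: [Float]
    let codecCodes: [Int]
}

// Hypothetical thread-safe cache keyed on the audio content, not its shape.
final class ContentKeyedRefCache: @unchecked Sendable {
    private let lock = NSLock()
    private var cachedAudio: [Float]?
    private var cached: RefAudioArtifacts?
    private var computeCountStorage = 0

    var computeCount: Int {
        lock.lock(); defer { lock.unlock() }
        return computeCountStorage
    }

    // Bitwise comparison so NaN samples still match themselves.
    private static func sameSamples(_ a: [Float], _ b: [Float]) -> Bool {
        a.count == b.count && zip(a, b).allSatisfy { $0.bitPattern == $1.bitPattern }
    }

    func artifacts(for refAudio: [Float],
                   compute: ([Float]) -> RefAudioArtifacts) -> RefAudioArtifacts {
        lock.lock()
        defer { lock.unlock() }
        if let hit = cached, let audio = cachedAudio, Self.sameSamples(audio, refAudio) {
            return hit
        }
        // Compute while holding the lock so concurrent callers wait for one
        // result instead of duplicating work. `compute` must not call back
        // into this cache, because NSLock is not reentrant.
        let fresh = compute(refAudio)
        computeCountStorage += 1
        cachedAudio = refAudio
        cached = fresh
        return fresh
    }

    func clearRefCache() {
        lock.lock(); defer { lock.unlock() }
        cachedAudio = nil
        cached = nil
    }
}
```

Comparing the whole sample buffer is O(n) per call, which should be cheap next to speaker-embedding extraction and codec encoding. Holding the lock during `compute` serializes generations that share the cache; a finer-grained design would be needed if that becomes a bottleneck.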

Oliver2213 · 2026-03-27T05:26:28Z

@lucasnewman, thanks a bunch for fixing this up and refiling. I definitely missed those when reading the diff.

Oliver2213 added 2 commits March 21, 2026 15:38

Merge branch 'main' into ref-audio-cache

83ba008

lucasnewman mentioned this pull request Mar 25, 2026

[Qwen3 TTS] Cache reference audio between generations #125

Open

Oliver2213 closed this Mar 27, 2026

Oliver2213 wants to merge 2 commits into Blaizzy:main from Oliver2213:ref-audio-cache