IMPORTANT: These special tokens are present in the v2 vocabulary but may not work as expected!
Based on community testing and the lack of official documentation from ResembleAI, these emotion/sound tokens appear to be an incomplete implementation:
- ✅ Tokens exist in the v2 tokenizer vocabulary
- ⚠️ Limited effect - tokens may produce minimal or no audible changes
- ❌ No official documentation - ResembleAI has not published usage guidelines
- 🔬 Experimental - model may not be fully trained to respond to these tokens
What works (partially):
- Some users report <laughter> hahaha produces slight effects
- Results are inconsistent and unreliable
Community Discussion:
- See ResembleAI/chatterbox Issue #186 - "Use of Emotional Tags Like [laughter] During Generation"
- No official response from ResembleAI team on proper usage
Our implementation is ready for when/if ResembleAI improves this feature. Feel free to experiment, but don't expect production-ready results.
ChatterBox v2 vocabulary includes special tokens for emotions, sounds, and vocal effects. While the tokenizer supports these tags, the model's response to them is limited and undocumented.
To avoid conflicts with the character switching system [CharacterName], use angle brackets <> for v2 special tokens:
- ✅ Correct: <giggle>, <sigh>, <whisper>
- ❌ Wrong: [giggle], [sigh], [whisper] (conflicts with character names)
The system will automatically convert <emotion> → [emotion] internally for ChatterBox v2.
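The conversion described above can be pictured as a single regular-expression substitution. This is only a minimal sketch: the function name `angle_to_bracket` and the exact tag pattern are assumptions for illustration, not the suite's actual code.

```python
import re

# Assumed tag shape: a letter followed by letters, digits, or underscores.
# The real implementation may accept a different character set.
ANGLE_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9_]*)>")

def angle_to_bracket(text: str) -> str:
    """Rewrite every <tag> as [tag]; all other text is left untouched."""
    return ANGLE_TAG.sub(r"[\1]", text)

print(angle_to_bracket("Hello there! <giggle> I'm so happy to see you."))
# Hello there! [giggle] I'm so happy to see you.
```

Text that is already in square brackets, such as [Alice] or [pause:0.5], passes through this function unchanged.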
- <giggle> - Light laughter
- <laughter> - Full laughter
- <guffaw> - Loud, boisterous laugh
- <sigh> - Sighing sound
- <cry> - Crying sound
- <gasp> - Gasping sound
- <groan> - Groaning sound

- <inhale> - Inhaling/breath in
- <exhale> - Exhaling/breath out
- <whisper> - Whispered speech
- <mumble> - Mumbled speech
- <UH> - Hesitation sound (uh)
- <UM> - Hesitation sound (um)

- <singing> - Singing voice
- <music> - Musical sounds
- <humming> - Humming sound
- <whistle> - Whistling sound

- <cough> - Coughing sound
- <sneeze> - Sneezing sound
- <sniff> - Sniffing sound
- <snore> - Snoring sound
- <clear_throat> - Throat clearing
- <chew> - Chewing sound
- <sip> - Sipping/drinking sound
- <kiss> - Kissing sound

- <bark> - Dog barking
- <howl> - Howling sound
- <meow> - Cat meowing

- <shhh> - Shushing sound
- <gibberish> - Nonsensical speech
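Since a misspelled tag would not match any token, it can help to check a script against the tags listed above before generating. The following is a hypothetical helper, not part of the suite. It covers only the tags documented here, and treating tag names as case-sensitive (so <UM> is known but <um> is not) is an assumption.

```python
import re

# The tags documented in the list above (a subset of the v2 special tokens).
DOCUMENTED_TAGS = {
    "giggle", "laughter", "guffaw", "sigh", "cry", "gasp", "groan",
    "inhale", "exhale", "whisper", "mumble", "UH", "UM",
    "singing", "music", "humming", "whistle",
    "cough", "sneeze", "sniff", "snore", "clear_throat", "chew", "sip", "kiss",
    "bark", "howl", "meow",
    "shhh", "gibberish",
}

def unknown_tags(text: str) -> list[str]:
    """Return the <tag> names in text that are not in DOCUMENTED_TAGS, in order."""
    found = re.findall(r"<([A-Za-z_]+)>", text)
    return [tag for tag in found if tag not in DOCUMENTED_TAGS]

print(unknown_tags("Look at that! <gasp> It's amazing! <yawn>"))
# ['yawn']
```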
Hello there! <giggle> I'm so happy to see you.
Wait... <UM> I think I forgot something. <sigh>
<whisper> This is a secret message.
Look at that! <gasp> It's amazing!
<singing> La la la la la!
The angle bracket syntax <emotion> is specifically designed to avoid conflicts:
- Character switching uses [CharacterName] - no conflict
- Pause tags use [pause:2], [wait:1.5] - no conflict
- v2 special tokens use <giggle>, <sigh> - no conflict
You can freely mix all three systems:
[Alice] Hello! <giggle> Nice to meet you. [pause:0.5] How are you? <whisper> I have a secret.
[Bob] <gasp> Really? Tell me more! [wait:1]
Processing order:
- Character tags [CharacterName] are extracted first
- Pause tags [pause:XX] are processed second
- v2 special tags <emotion> are converted last (to [emotion] for the engine)
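Converting last is what keeps the three systems from colliding: by the time <emotion> becomes [emotion], character and pause tags have already been consumed, so the bracketed form cannot be mistaken for a character name.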
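The three-stage order can be sketched as follows. This is an illustrative approximation under stated assumptions, not the suite's parser: the regular expressions, the `process` function, the segment tuples, and the default speaker name "narrator" are all hypothetical.

```python
import re

# Assumed tag shapes; the real parser may be stricter or more permissive.
CHARACTER = re.compile(r"\[([A-Za-z][A-Za-z0-9_ ]*)\]")        # [Alice]
PAUSE = re.compile(r"\[(?:pause|wait):(\d+(?:\.\d+)?)\]")      # [pause:0.5], [wait:1]
ANGLE_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9_]*)>")           # <giggle>

def process(script: str):
    """Return [(character, segments)], where each segment is
    ("text", str) or ("pause", seconds)."""
    result = []
    # Step 1: character tags. Splitting on a pattern with one capture group
    # yields [leading_text, name1, text1, name2, text2, ...].
    parts = CHARACTER.split(script)
    chunks = [("narrator", parts[0])] + list(zip(parts[1::2], parts[2::2]))
    for name, chunk in chunks:
        if not chunk.strip():
            continue
        segments = []
        # Step 2: pause tags. Odd indices hold the captured durations.
        for i, piece in enumerate(PAUSE.split(chunk)):
            if i % 2 == 1:
                segments.append(("pause", float(piece)))
            elif piece.strip():
                # Step 3: only now convert <tag> to [tag] for the engine.
                segments.append(("text", ANGLE_TAG.sub(r"[\1]", piece.strip())))
        result.append((name, segments))
    return result

script = (
    "[Alice] Hello! <giggle> Nice to meet you. [pause:0.5] How are you?\n"
    "[Bob] <gasp> Really? Tell me more! [wait:1]"
)
for character, segments in process(script):
    print(character, segments)
```

If step 3 ran first in this sketch, [giggle] would match the character pattern and be treated as a speaker named "giggle".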
To use these special tokens:
- Set model_version to v2 in the ChatterBox Official 23-Lang Engine node
- The v2 model will automatically download the enhanced tokenizer files
- Special tokens will be processed during generation
- t3_mtl23ls_v2.safetensors - Enhanced T3 model with improved tokenization
- grapheme_mtl_merged_expanded_v1.json - Enhanced grapheme/phoneme mappings with 118 special tokens

- Improved Russian stress handling
- Enhanced multilingual tokenizer fixes
The TTS Audio Suite automatically manages cache keys to prevent conflicts:
- v1 and v2 generations are cached separately
- Switching between versions will regenerate audio with the correct model
- Cache keys include model_version to ensure proper invalidation
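To see why including model_version in the key separates v1 and v2 results, consider this minimal sketch. The `cache_key` function, its parameters, and the choice of SHA-256 over a JSON payload are assumptions for illustration; the suite's actual key format is not documented here.

```python
import hashlib
import json

def cache_key(text: str, model_version: str, **params) -> str:
    """Build a deterministic key from the text, model version, and any other settings."""
    payload = json.dumps(
        {"text": text, "model_version": model_version, "params": params},
        sort_keys=True,  # stable ordering so equal inputs always hash equally
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

key_v1 = cache_key("Hello <giggle>", "v1", language="en")
key_v2 = cache_key("Hello <giggle>", "v2", language="en")
print(key_v1 != key_v2)
# True
```

Because the version is part of the hashed payload, switching versions produces a different key, so a cache lookup misses and the audio is regenerated with the selected model.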