Describe the issue as clearly as possible:
Use case: I want to constrain my output with a CFG, and I want some arbitrary thinking to happen beforehand.
How I am solving this: pass a CFG with an explicit `<think>...</think>` section at the beginning, followed by my grammar.
What I have found: when the model's tokenizer includes `<think>` as a single special token, this breaks; when the tokenizer doesn't have it in its vocabulary, everything is fine.
Potential workaround: run inference once unconstrained, extract the thinking section, then run inference again with the CFG, with the thinking section pre-stuffed into the assistant's response.
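The extraction step of that workaround can be sketched with a plain regex. This is a hypothetical helper (`split_thinking` is not part of outlines), shown only to illustrate the two-pass idea:

```python
import re

def split_thinking(response: str) -> tuple[str, str]:
    """Split a model response into its <think>...</think> section and the rest.

    Returns (thinking_text, remainder); thinking_text is "" if no section is found.
    """
    match = re.match(r"<think>(.*?)</think>\s*", response, flags=re.DOTALL)
    if match is None:
        return "", response
    return match.group(1), response[match.end():]

thinking, rest = split_thinking("<think>The sky scatters blue light.</think>\nyes")
print(repr(thinking))  # → 'The sky scatters blue light.'
print(repr(rest))      # → 'yes'
```

The remainder would then be regenerated under the CFG with the thinking section already placed in the assistant turn.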
This is distinct from #1627.
The attached code breaks on Qwen3-4B-Thinking but works fine on SmolLM2. Crucially, the ParserTooComplex error appears only when the tokenizer vocabulary includes `<think>` as a single token; when the vocabulary doesn't contain it, there is no error.
Steps/code to reproduce the bug:
"""
Minimal reproducible example for outlines CFG bug with <think> special tokens.
This demonstrates that when a model has special tokens for <think> and </think>,
outlines CFG grammar fails to parse them correctly.
Expected behavior: Grammar should constrain output to have <think>...</think> followed by yes|no
Actual behavior: Parser error when trying to match special tokens against literal strings
Model: Qwen/Qwen3-4B-Thinking-2507 (has <think> token ID 151667, </think> token ID 151668)
"""
import transformers
from outlines import Transformers
from outlines.types import CFG
def main():
print("=== Outlines CFG Bug: Special Tokens in Grammar ===\n")
print(f"Loading model...")
pipe = transformers.pipeline(
"text-generation",
# "HuggingFaceTB/SmolLM2-1.7B-Instruct",
"Qwen/Qwen3-4B-Thinking-2507",
)
# Show that <think> and </think> are special tokens
print("\n--- Tokenizer Analysis ---")
vocab = pipe.tokenizer.get_vocab()
think_start_id = vocab.get('<think>')
think_end_id = vocab.get('</think>')
print(f"<think> token ID: {think_start_id}")
print(f"</think> token ID: {think_end_id}")
# Show how they encode
encoded_start = pipe.tokenizer.encode('<think>', add_special_tokens=False)
encoded_end = pipe.tokenizer.encode('</think>', add_special_tokens=False)
print(f"<think> encodes to: {encoded_start} (single token)")
print(f"</think> encodes to: {encoded_end} (single token)")
# Create outlines model
print("\n--- Setting up Outlines ---")
model = Transformers(pipe.model, pipe.tokenizer)
# Define a grammar that includes <think> tags
# This SHOULD work but DOESN'T due to special token handling
grammar_with_thinking = '''
?start: thinking_section answer
thinking_section: "<think>" /[^<]*/ "</think>" /[\\r\\n\\t ]*/
answer: "yes" | "no"
'''
print("Grammar:")
print(grammar_with_thinking)
cfg_type = CFG(grammar_with_thinking)
prompt = "Is the sky blue?"
print(f"\n--- Attempting Generation ---")
print(f"Prompt: {prompt}")
print("Expected: <think>reasoning here</think>\\nyes")
print("\nGenerating...")
try:
response = model(prompt, cfg_type, max_new_tokens=10000)
print(f"\nSuccess! Response: {response}")
except Exception as e:
print(f"\n❌ ERROR: {type(e).__name__}: {e}")
print("\nThis demonstrates the bug: outlines cannot match special tokens")
print("in the grammar against the tokenizer's single-token representation.")
# Show that a grammar without <think> tags works fine
print("\n\n--- Testing Grammar Without Special Tokens ---")
grammar_without_thinking = '''
?start: answer
answer: "yes" | "no"
'''
print("Grammar (no special tokens):")
print(grammar_without_thinking)
cfg_type_simple = CFG(grammar_without_thinking)
try:
response = model(prompt, cfg_type_simple, max_new_tokens=10)
print(f"\n✓ Success! Response: {response}")
print("\nThis works because there are no special tokens in the grammar.")
except Exception as e:
print(f"\n❌ ERROR: {type(e).__name__}: {e}")
if __name__ == "__main__":
main()
Expected result:
By uncommenting the SmolLM2 model specification and commenting out the Qwen3 one, the script runs to completion with two successes, for the grammar with the `<think>` section and for the grammar without it.
Error message:
```
.venv/lib/python3.13/site-packages/outlines/backends/llguidance.py:175: UserWarning: Error in LLMatcher: Parser Error: token "�[151667]" doesn't satisfy the grammar; forced bytes: got '<'; applying 'ÿ'
<state>
Tokens: ⟦<think>⟧
1 tokens, 0 bytes; grm_prefix: ""
Flags:
Parser: {
  "compute_time_us": 0,
  "rows": 2,
  "cached_rows": 0,
  "all_items": 4,
  "lexer_cost": 3271,
  "slices_applied": 0,
  "trie_nodes_walked": 0,
  "definitive_bytes": 7,
  "lexer_ops": 0,
  "num_lex_errors": 0,
  "num_lexemes": 0
}
Stop: ParserTooComplex
Error: Parser Error: token "�[151667]" doesn't satisfy the grammar; forced bytes: got '<'; applying 'ÿ'
</state><grammar>
?start: thinking_section answer
thinking_section: "<think>" /[^<]*/ "</think>" /[\r\n\t ]*/
answer: "yes" | "no"
</grammar>
```
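For reference, the language this grammar describes is simple; a hand-translated regex approximation (plain `re`, not outlines' machinery) accepts the expected output shape, which suggests the grammar itself is fine and the failure is in matching the literal `"<think>"` against the single special token:

```python
import re

# Hand-translated approximation of the Lark grammar above:
# thinking_section answer  ->  "<think>" [^<]* "</think>" [\r\n\t ]* ("yes"|"no")
pattern = re.compile(r"<think>[^<]*</think>[\r\n\t ]*(?:yes|no)\Z")

print(bool(pattern.match("<think>reasoning here</think>\nyes")))  # → True
print(bool(pattern.match("yes")))  # → False (the thinking section is required)
```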
Outlines/Python version information:
```
% python -c "from outlines import _version; print(_version.version)"; python -c "import sys; print('Python', sys.version)"; uv pip freeze;
1.2.7
Python 3.13.3 (main, Apr 8 2025, 13:54:08) [Clang 17.0.0 (clang-1700.0.13.3)]
accelerate==1.10.1
aiofiles==24.1.0
aiohappyeyeballs==2.6.1
aiohttp==3.13.0
aiosignal==1.4.0
annotated-types==0.7.0
anyio==4.11.0
attrs==25.4.0
audioop-lts==0.2.2
brotli==1.1.0
certifi==2025.10.5
charset-normalizer==3.4.4
click==8.3.0
cloudpickle==3.1.1
datasets==4.2.0
dill==0.4.0
diskcache==5.6.3
fastapi==0.119.0
ffmpy==0.6.3
filelock==3.20.0
frozenlist==1.8.0
fsspec==2025.9.0
genson==1.3.0
gradio==5.49.1
gradio-client==1.13.3
groovy==0.1.2
h11==0.16.0
hf-xet==1.1.10
httpcore==1.0.9
httpx==0.28.1
huggingface-hub==0.35.3
idna==3.11
iniconfig==2.1.0
jinja2==3.1.6
joblib==1.5.2
jsonpath-ng==1.7.0
jsonschema==4.25.1
jsonschema-specifications==2025.9.1
llguidance==1.2.0
markdown-it-py==4.0.0
markupsafe==3.0.3
mdurl==0.1.2
mpmath==1.3.0
multidict==6.7.0
multiprocess==0.70.16
networkx==3.5
ninja==1.13.0
numpy==2.3.4
optimum-quanto==0.2.7
orjson==3.11.3
outlines==1.2.7
outlines-core==0.2.11
packaging==25.0
pandas==2.3.3
pillow==11.3.0
pluggy==1.6.0
ply==3.11
propcache==0.4.1
psutil==7.1.0
pyarrow==21.0.0
pydantic==2.11.10
pydantic-core==2.33.2
pydub==0.25.1
pygments==2.19.2
pytest==8.4.2
python-dateutil==2.9.0.post0
python-multipart==0.0.20
pytz==2025.2
pyyaml==6.0.3
referencing==0.37.0
regex==2025.9.18
requests==2.32.5
rich==14.2.0
rpds-py==0.27.1
ruff==0.14.0
safehttpx==0.1.6
safetensors==0.6.2
scikit-learn==1.7.2
scipy==1.16.2
semantic-version==2.10.0
sentence-transformers==5.1.1
sentencepiece==0.2.1
setuptools==80.9.0
shellingham==1.5.4
six==1.17.0
sniffio==1.3.1
starlette==0.48.0
sympy==1.14.0
threadpoolctl==3.6.0
tokenizers==0.22.1
tomlkit==0.13.3
torch==2.9.0
tqdm==4.67.1
transformers==4.57.1
typer==0.19.2
typing-extensions==4.15.0
typing-inspection==0.4.2
tzdata==2025.2
urllib3==2.5.0
uvicorn==0.37.0
websockets==15.0.1
xxhash==3.6.0
yarl==1.22.0
```
Context for the issue:
No response