OllamaBackend._embed_batch causes 400 errors due to per-text context overflow with dense-tokenizing content #87

@carterwickstrom-mapache

Description

What happened?

cce init fails with:

httpx.HTTPStatusError: Client error '400 Bad Request' for url 'http://localhost:11434/api/embed'
{"error":"the input length exceeds the context length"}

Root cause

Two compounding issues:

  • No per-text size guard before embedding. Certain file types (GitHub Actions YAML, Python files with long # ===...=== separator blocks) tokenize at close to 1 char/token because special characters (${{, }}, =, :) each become their own token. An 8,745-char YAML file can therefore exceed nomic-embed-text's 8,192-token context limit even though it looks small in bytes.
  • truncate: true is unreliable. Passing "truncate": true in the /api/embed request does not reliably prevent the 400; Ollama 0.24.0 appears to ignore the parameter in some cases.
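
The arithmetic behind the first point can be sketched as below. This is only an illustration: the helper name max_safe_chars and the 0.9 safety margin are hypothetical and not part of CCE, and the ~4 chars/token figure for ordinary prose is a rough rule of thumb, not a measurement. The 8,192-token limit and the ~1 char/token density are the figures quoted above.

```python
def max_safe_chars(context_tokens: int, chars_per_token: float, margin: float = 0.9) -> int:
    """Worst-case character budget that should stay within a token limit.

    chars_per_token should be the *lowest* density you expect to see
    (dense YAML is close to 1.0; ordinary prose is assumed to be around 4.0).
    margin leaves headroom for special tokens the model may add.
    """
    if context_tokens <= 0 or chars_per_token <= 0 or not 0 < margin <= 1:
        raise ValueError("context_tokens and chars_per_token must be positive, margin in (0, 1]")
    return int(context_tokens * chars_per_token * margin)


# Budget when the content is assumed to be ordinary prose.
prose_budget = max_safe_chars(8192, 4.0)
# Budget for token-dense content such as ${{ }}-heavy workflow YAML.
dense_budget = max_safe_chars(8192, 1.0)

# The 8,745-char file fits the prose budget but not the dense one.
print(8745 <= prose_budget, 8745 <= dense_budget)
```

A byte- or character-based size check tuned for prose would let the 8,745-char file through, which is why the guard needs to assume the worst-case density.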

What did you expect?

Indexing should run to completion and produce a summary. CCE should guard against oversized inputs before sending them to Ollama.

Steps to reproduce

Run cce init on any project containing GitHub Actions workflows or Python analysis files with dense separator comments. Model: nomic-embed-text, Ollama 0.24.0.

Proposed fix

In OllamaBackend._embed_batch, truncate each text to a safe limit before batching, with a one-at-a-time halving fallback for any batch that still fails:

import httpx

def _embed_batch(self, texts):
    # Truncate every text up front so even ~1 char/token content fits.
    safe = [t[:3_000] for t in texts]
    resp = httpx.post(..., json={"model": ..., "input": safe})
    if resp.status_code != 400:
        resp.raise_for_status()
        return resp.json().get("embeddings", [])
    # Fallback: one at a time, halving on context errors
    out = []
    for text in safe:
        while True:
            r = httpx.post(..., json={"model": ..., "input": [text]})
            too_long = r.status_code == 400 and "context length" in r.text
            if too_long and len(text) > 1:
                text = text[:len(text) // 2]
                continue
            # Any other error, or a text that cannot shrink further, raises
            # here instead of silently dropping an embedding.
            r.raise_for_status()
            break
        out.extend(r.json().get("embeddings", []))
    return out

The 3,000-char limit was empirically determined to be safe for even the most token-dense content encountered (YAML with ${{ }} expressions at ~1 char/token).
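
To exercise the truncate-then-halve logic without a running Ollama server, the same idea can be sketched against an injected embed callable. Everything named here (embed_with_halving, ContextOverflow, fake_embed, and the 1,000-char limit of the fake backend) is hypothetical and exists only for illustration; it is not CCE's or Ollama's API.

```python
from typing import Callable, List, Sequence


class ContextOverflow(Exception):
    """Hypothetical error: the embed callable rejected an over-long input."""


def embed_with_halving(
    texts: Sequence[str],
    embed: Callable[[List[str]], List[List[float]]],
    max_chars: int = 3_000,
) -> List[List[float]]:
    # Step 1: cap every text before batching.
    safe = [t[:max_chars] for t in texts]
    try:
        return embed(safe)
    except ContextOverflow:
        pass  # fall through to the per-text path
    # Step 2: retry one text at a time, halving until the backend accepts it.
    out: List[List[float]] = []
    for text in safe:
        while True:
            try:
                out.extend(embed([text]))
                break
            except ContextOverflow:
                if len(text) <= 1:
                    raise  # cannot shrink further; surface the error
                text = text[: len(text) // 2]
    return out


def fake_embed(batch: List[str]) -> List[List[float]]:
    # Stand-in backend: rejects any text over 1,000 chars and
    # returns a one-dimensional "embedding" holding the text length.
    if any(len(t) > 1000 for t in batch):
        raise ContextOverflow
    return [[float(len(t))] for t in batch]


# 5,000 chars is capped to 3,000, then halved to 1,500 and 750 before it fits.
result = embed_with_halving(["a" * 5000, "b" * 10], fake_embed)
print(result)  # [[750.0], [10.0]]
```

Keeping one output per input, and raising when a text cannot be shrunk any further, keeps the returned embeddings aligned with the input order.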

Relevant logs or error output

Python version

3.14.1

OS

macOS 26.4.1

CCE version

0.4.21

Metadata

Labels

bug (Something isn't working)
