fix: centralize embedding model config to prevent query/ingest mismatch #912

Open
OmkarKirpan wants to merge 6 commits into MemPalace:develop from OmkarKirpan:fix/embedding-model-mismatch

@OmkarKirpan OmkarKirpan commented Apr 15, 2026

Summary

Closes #903

Adds centralized embedding model configuration so the MCP server, CLI search, and all ingest paths use the same model — fixing silent query failures when models mismatch.

Design Decisions

  1. Storage approach: Embedding model name stored in ChromaDB collection metadata (not a separate file). Atomic with the collection, can't desync. Absence of the key = legacy palace.

  2. Default for new palaces: all-mpnet-base-v2 (768-dim) — better search quality (+3.5pp on LoCoMo R@10 benchmarks over MiniLM).

  3. Default for existing palaces: all-MiniLM-L6-v2 (384-dim) — backwards compatible, no re-mining required. Detected by absence of embedding_model key in collection metadata.

  4. Resolution chain: Collection metadata (authoritative) > config file / env var (new palaces only) > built-in default. This means once a palace is created, its model is locked in and self-describing.

  5. No migration tool in this PR: Re-embedding existing palaces from MiniLM to mpnet is a separate concern. This PR prevents the mismatch; migrating existing palaces is a follow-up.

  6. All create paths stamp the model: Repair, rebuild, and migrate operations preserve the original model through the delete/recreate cycle.

Resolution chain

1. Collection metadata "embedding_model" key (authoritative, stamped at build)
2. If absent → legacy palace → all-MiniLM-L6-v2
3. config.json "embedding_model" or MEMPALACE_EMBEDDING_MODEL env var → new palace creation only
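
The chain above can be sketched in a few lines of Python. This is a minimal illustration, not the code in mempalace/embedding.py: the function names resolve_existing and resolve_new, the dict-shaped arguments, and the env-over-config precedence are assumptions. Only the metadata key, the env var name, and the two model names come from the description above.

```python
import os

LEGACY_MODEL = "all-MiniLM-L6-v2"       # 384-dim, used when the metadata key is absent
NEW_PALACE_MODEL = "all-mpnet-base-v2"  # 768-dim, built-in default for new palaces


def resolve_existing(metadata):
    """Model for an existing collection: the stamped key wins, absence means legacy."""
    return (metadata or {}).get("embedding_model") or LEGACY_MODEL


def resolve_new(config, env=None):
    """Model for a palace about to be created.

    Hypothetical precedence: env var, then config file value, then default.
    """
    env = os.environ if env is None else env
    return (
        env.get("MEMPALACE_EMBEDDING_MODEL")
        or config.get("embedding_model")
        or NEW_PALACE_MODEL
    )
```

In this sketch resolve_new is never consulted for a collection that already exists, which is what keeps a created palace locked to its model.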

Files changed

  • New: mempalace/embedding.py — model registry, resolution, embedding function factory
  • Modified: mempalace/config.py - embedding_model property on MempalaceConfig
  • Modified: mempalace/backends/chroma.py - get_collection(), get_or_create_collection(), create_collection() accept embedding_function + embedding_model_name
  • Modified: mempalace/palace.py — resolves model from metadata on read, stamps on create
  • Modified: mempalace/mcp_server.py - _get_collection() uses correct embedding function, tool_status() reports active model
  • Modified: mempalace/cli.py, mempalace/repair.py, mempalace/migrate.py — all collection-create paths now stamp embedding model

Known follow-ups (not in scope)

Test plan

  • 936/936 tests pass locally
  • Legacy palace (no metadata key) resolves to MiniLM
  • New palace creation stamps mpnet in collection metadata
  • Config override and env var override work
  • Repair/migrate preserve embedding model through rebuild
  • mempalace_status reports active embedding model
  • ruff check and ruff format clean

Single source of truth for embedding model resolution.
Resolves from collection metadata, falls back to MiniLM for legacy palaces.
New palaces default to all-mpnet-base-v2 (768-dim).

Part of MemPalace#903

Resolves from config.json or MEMPALACE_EMBEDDING_MODEL env var.
Used for new palace creation only; existing palaces read from
collection metadata.

Part of MemPalace#903

get_collection() and get_or_create_collection() now accept optional
embedding_function and embedding_model_name params. Model name is
stamped into collection metadata on create. Fully backwards compatible.

Part of MemPalace#903

On create: stamps new_palace_model() (mpnet) into collection metadata.
On read: resolves model from metadata, falls back to MiniLM for legacy.
All collection access now uses the correct embedding function.

Also fixes tests that opened bare PersistentClient instances without
the correct embedding function, causing dimension mismatches (768 vs 384).
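
To illustrate the 768 vs 384 mismatch: a query vector produced by one model cannot be compared against vectors stored by the other. A minimal sketch of a dimension guard follows. MODEL_DIMS and check_query_dim are hypothetical names used only for illustration; the dimensions are the ones quoted in this PR.

```python
# Hypothetical registry mapping model name to embedding dimension.
MODEL_DIMS = {
    "all-MiniLM-L6-v2": 384,
    "all-mpnet-base-v2": 768,
}


def check_query_dim(collection_model, query_vector):
    """Raise if the query vector does not match the collection's model dimension."""
    expected = MODEL_DIMS[collection_model]
    actual = len(query_vector)
    if actual != expected:
        raise ValueError(
            f"collection built with {collection_model} expects "
            f"{expected}-dim vectors, got {actual}"
        )
    return actual
```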

Part of MemPalace#903

_get_collection() resolves the model from collection metadata and
passes the correct embedding_function to ChromaDB. tool_status()
reports the active embedding_model.

Closes MemPalace#903

- ChromaBackend.create_collection() now accepts embedding_function
  and embedding_model_name params
- cli.py repair, repair.py rebuild_index: read embedding model from
  existing collection before delete/recreate, preserve it
- migrate.py: stamp new_palace_model() on migrated palaces
- palace.get_collection(): accept optional config param so CLI mining
  respects config.json embedding_model setting
- Update test_rebuild_index_success to verify new embedding args

Addresses code review findings MemPalace#4, MemPalace#5, MemPalace#7 for MemPalace#903
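
The "read before delete/recreate" ordering described in this commit could look roughly like the sketch below. InMemoryStore is a stand-in written for this example and is not ChromaDB's API; rebuild_collection is a hypothetical name. Stamping MiniLM explicitly onto a rebuilt legacy palace is also an assumption.

```python
LEGACY_MODEL = "all-MiniLM-L6-v2"


class InMemoryStore:
    """Tiny stand-in for a vector store client; maps collection name to metadata."""

    def __init__(self):
        self.collections = {}

    def create(self, name, metadata):
        self.collections[name] = dict(metadata)

    def metadata(self, name):
        return dict(self.collections[name])

    def delete(self, name):
        del self.collections[name]


def rebuild_collection(store, name):
    """Recreate a collection while keeping the model it was built with."""
    # Read the model first: once the collection is deleted, its metadata is gone.
    model = store.metadata(name).get("embedding_model", LEGACY_MODEL)
    store.delete(name)
    store.create(name, {"embedding_model": model})
    return model
```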
@OmkarKirpan OmkarKirpan marked this pull request as ready for review April 15, 2026 12:15
@igorls added the bug, area/search, and area/mcp labels Apr 15, 2026
rosschurchill added a commit to rosschurchill/mempalace that referenced this pull request Apr 18, 2026
Three Phase 2 fixes:

1. Embedding model guard — palace_meta.json (MemPalace#903/MemPalace#912):
   - Added embedding_model property to MempalaceConfig (default:
     all-MiniLM-L6-v2; env MEMPALACE_EMBEDDING_MODEL overrides).
   - write_palace_meta() in palace.py writes model name + timestamp to
     <palace>/palace_meta.json at the end of every mine run.
   - read_palace_meta() in palace.py reads it back at search time.
   - search_memories() in searcher.py compares the stored ingest model
     against the current config model; if they differ, a "warning" key
     is added to the search result and logged to stderr.
   - Non-fatal: old palaces without palace_meta.json get no warning.
   - Prevents silent garbage results when users switch embedding models.

2. silent_save config respected in hooks_cli.py (MemPalace#854):
   - hook_stop() now gates on MEMPAL_VERBOSE env var, matching the
     existing behavior in hooks/mempal_save_hook.sh.
   - Default (MEMPAL_VERBOSE unset): mine transcript in background,
     never block the AI or interrupt the conversation.
   - MEMPAL_VERBOSE=true: block with diary reason (developer mode).
   - Updated tests to reflect new default behavior.

3. sanitize_name() Unicode — confirmed already working; skipped (MemPalace#637).
   For str patterns, Python 3's default Unicode matching makes [^\W_] match all Unicode letters and digits.
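
The regex behavior noted in item 3 can be checked with the standard library alone. The sketch below is not sanitize_name() itself; is_plain_name is a hypothetical helper that only exercises the character class [^\W_] with and without re.ASCII.

```python
import re

# Word characters minus underscore; str patterns are Unicode-aware by default.
NAME_CHARS = re.compile(r"[^\W_]+")
# Same class restricted to ASCII, for comparison.
ASCII_NAME_CHARS = re.compile(r"[^\W_]+", re.ASCII)


def is_plain_name(text, pattern=NAME_CHARS):
    """True when every character is a letter or digit under the given pattern."""
    return pattern.fullmatch(text) is not None


print(is_plain_name("caf\u00e9"))                    # True
print(is_plain_name("caf\u00e9", ASCII_NAME_CHARS))  # False
print(is_plain_name("a_b"))                          # False
```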

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
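
The palace_meta.json guard from item 1 of this commit message might be sketched as follows. The function names write_palace_meta and read_palace_meta come from the message; their signatures, the JSON field names, and the model_mismatch_warning helper are assumptions made for illustration.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

META_FILE = "palace_meta.json"


def write_palace_meta(palace_dir, model_name):
    """Record the ingest model and a UTC timestamp next to the palace data."""
    meta = {
        "embedding_model": model_name,
        "updated_at": datetime.now(timezone.utc).isoformat(),
    }
    Path(palace_dir, META_FILE).write_text(json.dumps(meta), encoding="utf-8")


def read_palace_meta(palace_dir):
    """Return the stored metadata dict, or None for palaces that predate the file."""
    path = Path(palace_dir, META_FILE)
    if not path.exists():
        return None
    return json.loads(path.read_text(encoding="utf-8"))


def model_mismatch_warning(palace_dir, current_model):
    """Return a warning string when ingest and query models differ, else None."""
    meta = read_palace_meta(palace_dir)
    if meta is None:
        return None  # non-fatal: old palaces without the file get no warning
    stored = meta.get("embedding_model")
    if stored and stored != current_model:
        return (
            f"palace was mined with {stored!r} but search is "
            f"configured for {current_model!r}"
        )
    return None
```

Unlike the collection-metadata approach in #912, this variant only warns on mismatch; it does not change which model the search uses.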
@NickShtefan

@OmkarKirpan — heads-up that #442 just got rebased onto develop (mergeStateStatus moved from DIRTY → UNSTABLE; CI running). My fixes overlap on the _get_collection mismatch bug (#903) but the scopes differ:

| | #912 | #442 |
|---|---|---|
| MCP _get_collection reads model from collection metadata | | |
| EmbeddingModelMismatchError | | ✓ (raised + propagated through handle_request) |
| mempalace init --model <name> | | ✓ (model stamped at palace creation) |
| mempalace re-mine --model <new> | | ✓ (safe migration) |
| Auto-detect device (cuda > mps > cpu) | | |
| [multilingual] extra | | |
| Default for new palaces | mpnet-base-v2 (768) | configurable via --model |
| Backwards compat for legacy palaces | ✓ (MiniLM fallback) | ✓ (auto-stamped as chromadb-default) |

#442 effectively contains your fix as a subset. If maintainers prefer the broader scope, #912 would be superseded. If they prefer the narrower minimal fix first, I can wait for #912 to merge and rebase #442 on top — happy to coordinate either way.

@igorls @bensig — would appreciate guidance on which sequencing you want.

Development

Successfully merging this pull request may close these issues.

bug: embedding model mismatch — MCP server uses MiniLM (384-dim) while ingest can use mpnet (768-dim)
