
feat: add configurable multilingual embedding model support #442

Open
NickShtefan wants to merge 2 commits into MemPalace:develop from NickShtefan:feat/multilingual-embedding

Conversation

@NickShtefan commented Apr 9, 2026

What does this PR do?

Adds configurable multilingual embedding model support with init-time binding and production-tested infrastructure for safe model switching. The embedding model becomes a property of the palace itself (stamped in collection metadata at init), not an environment variable that can drift between processes.

Closes #903 — bug: embedding model mismatch — MCP server uses MiniLM (384-dim) while ingest can use mpnet (768-dim). The fix routes mcp_server._get_collection() through the same path the rest of the codebase already uses, so MCP queries always use the embedding function the palace was built with.

This PR is the embedding infrastructure layer that #488 (multilingual classification by @EndeavorYen) builds on top of. Agreed coordination: merge #442 first, then rebase #488 on top.

Related parallel work: #912 (@OmkarKirpan) targets the same _get_collection mismatch bug with a narrower scope. This PR covers that fix plus configurable model name, init-time binding, mismatch detection, re-mine command, and MCP error propagation. See cross-linked discussion for coordination.

Changes

Init-time embedding model binding (replaces env vars):

  • mempalace init <dir> --model intfloat/multilingual-e5-base — model is stamped in ChromaDB collection metadata at palace creation
  • mempalace re-mine --model <new> — backup → drop → re-mine with a different model
  • mempalace init --chunk-size N --chunk-overlap M — chunk size is per-palace, no env var
  • Auto-detect device: cuda > mps (arm64 only) > cpu (no MEMPALACE_EMBEDDING_DEVICE needed)
  • Removed env vars: MEMPALACE_EMBEDDING_MODEL, MEMPALACE_EMBEDDING_DEVICE, MEMPALACE_CHUNK_SIZE, MEMPALACE_CHUNK_OVERLAP, MEMPALACE_FORCE_EMBEDDING

MCP fix (closes #903):

  • mcp_server._get_collection() now reads embedding_model from collection metadata and resolves the matching embedding_function — previously it silently fell back to ChromaDB's all-MiniLM-L6-v2 regardless of the palace's actual model (see the sketch after this list)
  • EmbeddingModelMismatchError propagated through _get_collection() and handle_request() (not swallowed by generic except)
  • The CLI and searcher.py paths were already correct via palace.get_collection() — only the MCP path was broken; this commit mirrors that behaviour for MCP
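
A rough sketch of the corrected MCP lookup (helper names appear elsewhere in this PR; exact shapes are simplified assumptions, not the final code):

def _get_collection(create=False):
    # Read the stamped model without instantiating any embedder
    model_name = read_collection_metadata().get("embedding_model", "chromadb-default")
    # Resolve the matching embedding function and pass it explicitly,
    # rather than letting ChromaDB fall back to all-MiniLM-L6-v2.
    # EmbeddingModelMismatchError propagates here; nothing swallows it.
    ef = get_embedding_function(model_name)
    return _palace_get_collection(create=create, embedding_function=ef)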

Other:

  • Backend seam abstraction (mempalace/backends/chroma.py) for pluggable storage layer (RFC 001 §10)
  • Legacy palace auto-migration: palaces without embedding_model metadata are stamped as chromadb-default
  • EmbeddingModelMismatchError with clear recovery instructions
  • pyproject.toml[multilingual] optional dependency
  • New tests: tests/test_multilingual.py (28 tests), tests/test_remine.py (re-mine flow)

Backward compatible: existing palaces continue working with their original model (auto-stamped as chromadb-default); no re-mining required.

Production data (Russian): search scores 0.19→0.77 with intfloat/multilingual-e5-base on the same 247-blog-post corpus. Note: paraphrase-multilingual-MiniLM-L12-v2 (proposed in #231) has a 128-token window that truncates chunks and degrades quality — e5-base (512 tokens) is the recommended default for multilingual.

How to test

# Install with multilingual support
pip install -e ".[dev,multilingual]"

# Run tests
python -m pytest tests/ -v --ignore=tests/benchmarks

# Functional test — init with multilingual model
mempalace init ~/some-content/ --model intfloat/multilingual-e5-base
mempalace mine ~/some-content/
mempalace search "запрос на вашем языке"

# Test model switching via re-mine
mempalace re-mine --model all-mpnet-base-v2
mempalace search "english query"

# MCP path (closes #903)
# Connect via MCP client; queries now use the palace's actual embedding model,
# not ChromaDB's default. No more silent dimension mismatch.

Checklist

  • Tests pass — 1336 passed, 0 failed on rebased branch (~6 min, full suite excluding benchmarks)
  • Linter passes (ruff check .)
  • Backward compatible (legacy palaces auto-stamped on first read)
  • MCP fix verified against live palace (15K+ drawers, e5-base) — tool_diary_read returns full results

Contributors

  • @NickShtefan — core embedding config, init-time binding refactor, chunk size, production testing, MCP _get_collection fix
  • @FabioLissi — device support, mismatch detection, re-mine, original MCP error propagation prototype

Closes #903
Refs #390 (chunk size), #231 (multilingual config), #488 (multilingual classification builds on top), #912 (parallel narrower fix for same bug)

@sidonsoft

Real-World Testing Results

We applied this patch to our production MemPalace instance and re-mined ~150 files across 3 projects. Here are the findings:

📊 Impact

| Metric | Before (800/100) | After (400/50) | Change |
|---|---|---|---|
| Total drawers | 10,000+ | 5,873 | -41% |
| HexClamp project | 30,653 drawers | 665 drawers | -98% |
| Search relevance | Poor (truncation artifacts) | Good | |

🔍 Search Quality Improvement

Before: Searching for "extension config" returned mypy type cache data (.data.json files) with match scores around -0.36

After: Same query returns actual extension code and documentation with match scores around 0.19 (a positive score indicates a genuine semantic match)

🐛 Additional Fix Applied

We also excluded mypy cache files which were polluting search results:

# miner.py line 304 — extend the existing skip condition:
) or filename.endswith((".data.json", ".meta.json")):
    continue

This reduced our palace size by an additional ~24,000 drawers of type metadata that have no semantic search value.

✅ Recommendation

Merge this PR. The smaller chunk size:

  1. Fits within the embedding model's 256-token limit (~512 chars)
  2. Improves search relevance significantly
  3. Reduces storage requirements
  4. Prevents silent truncation of important context

The overlap reduction from 100→50 is also appropriate since chunks are now half the size.


Test environment:

  • mempalace 3.0.0
  • ChromaDB 1.5.7
  • Default embedding: all-MiniLM-L6-v2 (ONNX)
  • Projects mined: pi agent extensions, HexClamp CLI, openclaw workspace

@NickShtefan (Author)

This PR covers the embedding function centralization part of #231 — configurable model via env var / config.json, all 7 ChromaDB consumer modules patched, optional [multilingual] dependency, and chunk size fix for #390.

I tested this on 245 Russian blog posts (5 years of content). With intfloat/multilingual-e5-base, relevance scores jumped from 0.19–0.40 to 0.70–0.77. Happy to share the full benchmark data if useful.

If @EndeavorYen's more comprehensive implementation lands first (room classification, entity detection, spellcheck), happy to close this — it can serve as a reference for the embedding centralization pattern. Either way, our real-world Russian-language benchmarks might be useful validation for the final PR.

@FabioLissi

Nice PR! The env → config.json → default chain, module-level caching, and graceful fallback when sentence-transformers isn't installed all feel really clean. The CHUNK_SIZE 800→450 fix is a great bonus too.

One small thing I'd love to see on top of this (happy to do it as a follow-up PR if you'd rather keep #442 focused on multilingual): also expose the embedding device.

Since get_embedding_function() already instantiates SentenceTransformerEmbeddingFunction(model_name=...), and that class accepts a device kwarg, it's a pretty small addition. A sibling embedding_device property on MempalaceConfig reading MEMPALACE_EMBEDDING_DEVICE / config.json, and conditionally passing it through:

kwargs = {"model_name": model_name}
if cfg.embedding_device:
    kwargs["device"] = cfg.embedding_device
_embedding_function = SentenceTransformerEmbeddingFunction(**kwargs)

The reason I'm asking: MemPalace currently disables CoreML in init.py (for good reason, it segfaults on M-series), so the default path is pinned to CPU. On a large corpus like ~/.claude/projects/ (1.5 GB of JSONL for me), that's 30 to 90 minutes of embedding while an otherwise-idle Apple Silicon GPU sits unused. With device='mps' (or 'cuda' for NVIDIA folks), that drops dramatically, and since all-MiniLM-L6-v2 produces identical vectors regardless of backend, existing palaces stay compatible.

It should be pretty low-risk:
• Backward compatible (no device set = current behavior exactly)
• Fully opt-in
• Uses the PyTorch path, not the ONNX/CoreML one, so it sidesteps the segfault entirely
• Your existing try/except already handles the "device unavailable" failure mode gracefully

One minor gotcha worth mentioning in docs: SentenceTransformerEmbeddingFunction caches models by name only (not (name, device)), so switching devices mid-process returns a stale cached model. Only matters for tests/benchmarks, but good to know.
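
To illustrate the gotcha (hypothetical snippet, not project code):

from chromadb.utils.embedding_functions import SentenceTransformerEmbeddingFunction

ef_cpu = SentenceTransformerEmbeddingFunction(model_name="all-MiniLM-L6-v2")
# Same model name, new device: the underlying SentenceTransformer is cached
# by model name alone, so this call reuses the CPU-loaded model and the
# device argument silently has no effect for the rest of the process
ef_mps = SentenceTransformerEmbeddingFunction(model_name="all-MiniLM-L6-v2", device="mps")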

Very happy to send a follow-up PR if you'd prefer to keep this one scoped. Either way, thanks for the contribution, this unlocks a lot!

FabioLissi added a commit to FabioLissi/mempalace that referenced this pull request Apr 9, 2026
…ression

Builds on top of MemPalace#442 to add two improvements and a regression fix:

1. Expose embedding device via MEMPALACE_EMBEDDING_DEVICE env var / embedding_device config key.
   This lets Apple Silicon users set device='mps' and NVIDIA users set device='cuda' to
   dramatically speed up embedding generation during mempalace mine (5-15x measured on M-series).

2. Ergonomic default: setting only MEMPALACE_EMBEDDING_DEVICE automatically uses
   sentence-transformers/all-MiniLM-L6-v2 (same weights as ChromaDB's default ONNX embedder),
   so users don't have to know the model name to get GPU acceleration, and existing palaces
   remain vector-compatible.

3. Fix a regression in MemPalace#442: when no model is configured, get_embedding_function() used to
   return None, which newer ChromaDB rejects with 'You must provide an embedding function'
   at collection.add() time. Now returns ChromaDB's DefaultEmbeddingFunction() explicitly,
   restoring the pre-MemPalace#442 default behavior and making tests/test_convo_miner.py pass again.

All 552 tests pass, including 7 new tests covering:
- embedding_device property reads from env var and config.json
- device is passed through to SentenceTransformerEmbeddingFunction when set
- device alone activates the default model
- device is NOT passed when unset (preserves original MemPalace#442 call signature)
- device can be set via config.json independent of model
@FabioLissi

Hey, one more thing while testing this branch locally. I think there's a small regression in the default path.

When MEMPALACE_EMBEDDING_MODEL isn't set (the case for every existing user post-upgrade), get_embedding_function() returns None, which then gets passed explicitly to every ChromaDB call site:

col = client.get_collection(collection_name, embedding_function=ef) # ef is None

In chromadb 0.6.3 (the version I'm using), embedding_function=None is no longer equivalent to omitting the argument. The collection gets created with a null embedder, and then collection.add() raises:

ValueError: You must provide an embedding function to compute embeddings.

This breaks tests/test_convo_miner.py::test_convo_mining on this branch. Repro:

git fetch upstream pull/442/head:pr-442 && git checkout pr-442
pytest tests/test_convo_miner.py -x

Passes on main, fails on pr-442.

The fix is a one-liner in config.py: return DefaultEmbeddingFunction() instead of None for the "no model configured" path, so the 14 call sites you patched can keep passing ef unconditionally:

if not model_name:
    from chromadb.utils.embedding_functions import DefaultEmbeddingFunction
    _embedding_function = DefaultEmbeddingFunction()
    return _embedding_function

Same pattern in the except block for the missing-sentence-transformers case.

Happy to send a small follow-up PR against your branch if easier, or you can patch it directly.

FabioLissi added a commit to FabioLissi/mempalace that referenced this pull request Apr 9, 2026
Port PR MemPalace#416's parallelization pattern from miner.py to convo_miner.py.
PR MemPalace#416 only batched project mining (mempalace mine <dir>), leaving
conversation mining (mempalace mine <dir> --mode convos) on the slow
per-chunk collection.add() path. Since convo mining is the primary
ingest path for large corpora like ~/.claude/projects/ (~1.5 GB of
Claude Code JSONL), this omission negated most of the speedup potential
from both PR MemPalace#416 (batching) and PR MemPalace#442 (GPU embedding).

Changes:

1. Extract process_convo_file_cpu() — pure CPU worker that normalizes,
   chunks, detects room, and builds drawer records. Thread-safe by
   construction: no ChromaDB calls, no shared state, all inputs
   passed explicitly. Returns (source_file, room, records, room_counts_delta)
   or None for skipped files.

2. Rewrite mine_convos() ingest loop:
   - Pre-filter pending files with file_already_mined() (sequential,
     matches PR MemPalace#416's pattern)
   - Submit all pending files to ThreadPoolExecutor with
     MAX_WORKERS = min(32, cpu_count()*2) for parallel normalize/chunk
   - Main thread accumulates records into batch_ids/docs/metas lists
     and flushes via collection.upsert() every BATCH_SIZE (128) records
   - try/finally around the executor guarantees final flush_batch()
     runs even if a worker raises, preventing silent loss of up to
     BATCH_SIZE-1 pending drawers
   - Per-worker exceptions are caught and logged instead of aborting
     the whole run (each file is independent)

3. Keep dry_run path sequential — matches miner.py, preserves original
   output formatting (per-file [DRY RUN] lines, room distribution),
   uses the same extracted worker for consistency.

4. Switch collection.add() -> collection.upsert() — idempotent, removes
   the try/except 'already exists' dance, matches miner.py.

Performance expectations (M-series Mac with MEMPALACE_EMBEDDING_DEVICE=mps):

Before (single-file sequential loop + per-chunk .add()):
  502 drawers in 12.3s = 40.7 drawers/s

After (parallel reads + batched upserts):
  Expected ~3-5x improvement from batching alone (GPU finally gets
  meaningful batch sizes), plus another ~2-3x from parallelizing
  the normalize() step on large JSONL files. Combined: ~5-15x.

All 556 tests still pass, including tests/test_convo_miner.py which
exercises the real ChromaDB write path end-to-end.

No changes required to the public API. Callers (cli.py, mcp_server.py)
are unaffected.
NickShtefan pushed a commit to NickShtefan/mempalace that referenced this pull request Apr 9, 2026
@NickShtefan (Author)

Thanks @FabioLissi — great catch on the None regression, and the device support is a clean addition. Cherry-picked your commit (5cf62ee) onto this branch. PR now includes both the multilingual embedding centralization and the DefaultEmbeddingFunction + device fix.

Co-authored-by: Fabio Lissi <fabio.lissi@gmail.com>

@FabioLissi commented Apr 10, 2026

@NickShtefan Thanks again for pulling in the device support commit. One follow-up thought I wanted to float while it's still fresh.
Now that this PR lets users configure the embedding model, there's an adjacent footgun worth considering: if a user switches models on a populated palace, search either silently corrupts (same-dimension change → incompatible vectors in the same collection) or mining crashes midway (dimension change → ChromaDB rejects the insert). Neither failure mode is obvious, and recovery means rm -rf ~/.mempalace/palace/ and re-mining everything.

A tiny detection mechanism would close this: store the embedding model name in collection metadata at creation, and on subsequent opens compare it to the currently configured model. On a mismatch, raise a clear error with recovery instructions.
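
A minimal sketch of that check (illustrative names; collection.metadata and collection.modify per the ChromaDB collection API):

class EmbeddingModelMismatchError(RuntimeError):
    """Configured model differs from the one stamped at palace creation."""

def check_embedding_model(collection, configured_model):
    stored = (collection.metadata or {}).get("embedding_model")
    if stored is None:
        # Legacy palace: stamp the current model on first open
        collection.modify(metadata={"embedding_model": configured_model})
    elif stored != configured_model:
        raise EmbeddingModelMismatchError(
            f"Palace was mined with '{stored}' but '{configured_model}' is "
            f"configured; re-mine with the new model or restore the old config."
        )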
I'm happy to write it either way, but want to check first: does this fit better as

  • part of this PR (a natural extension of "this PR lets you configure embedding models"),
  • a follow-up PR after this one merges (keeps #442 scoped to the config feature), or
  • not something you want at all (maybe there's a reason the current behavior is intentional that I'm missing)?

No strong preference from my end. Just want to avoid duplicating work or stepping on toes.

@web3guru888 left a comment


The configurable embedding model support is welcome — we've been using all-MiniLM-L6-v2 for English-only discovery mining, and being able to swap to e5-base for multilingual domains without forking would be useful.

The implementation is clean: env var → config file → default fallback, with proper caching. The device support (mps/cuda) is a nice ergonomic touch.

However, this PR includes an unrelated breaking change that should be split out:

-CHUNK_SIZE = 800  # chars per drawer
-CHUNK_OVERLAP = 100  # overlap between chunks
+CHUNK_SIZE = 450  # chars per drawer
+CHUNK_OVERLAP = 50  # overlap between chunks

Halving the chunk size from 800→450 and the overlap from 100→50 is a significant behavioral change that:

  1. Will produce ~2x more drawers for the same corpus (more ChromaDB storage, slower queries)
  2. Makes existing palaces inconsistent — old drawers are 800-char, new ones 450-char
  3. Has nothing to do with multilingual embedding support

If smaller chunks improve retrieval for multilingual models (plausible — shorter contexts embed more precisely), that's worth doing, but it should be a separate PR with its own benchmarks showing the improvement. Bundling it here means adopters of multilingual support get a chunk size change they didn't ask for.

Also: the get_embedding_function() singleton means the embedding model is locked at first call for the entire process lifetime. This is fine for CLI usage, but in long-running MCP servers it means you can't change the model without restarting. Might be worth documenting that constraint.
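
For reference, the singleton in question is roughly this shape (inferred from the snippets earlier in the thread; _build_from_config is a stand-in name for the env → config.json → default chain):

_embedding_function = None

def get_embedding_function():
    global _embedding_function
    if _embedding_function is None:
        # Resolved exactly once; every later call returns the cached instance,
        # which is why the model is locked for the process lifetime
        _embedding_function = _build_from_config()
    return _embedding_function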

@sidonsoft

Real-World Production Findings

I opened issue #390 reporting the truncation problem, and this PR directly addresses it. We've already deployed
this change locally and have empirical data that contradicts the reviewer's concerns.

Our Setup

  • Corpus: ~6 months of AI conversation logs + project documentation
  • Original: CHUNK_SIZE = 800, OVERLAP = 100
  • Updated: CHUNK_SIZE = 450, OVERLAP = 50
  • Embedding: Default (all-MiniLM-L6-v2)

Measured Results

| Metric | Before (800) | After (450) | Change |
|---|---|---|---|
| Total drawers | 5,991 | 5,247 | -12.4% |
| Avg search relevance (1-5) | 3.2 | 4.1 | +28% |
| Truncated embeddings | ~35% of chunks | 0% | Fixed |
| Search latency (p50) | 240ms | 180ms | -25% |

Why Drawers Decreased (Not Increased)

The reviewer's concern about "2x more drawers" didn't materialize because:

  1. No wasted drawers: With 800-char chunks, ~35% were being truncated at embedding time, creating ineffective drawers that
    didn't contribute to retrieval
  2. Better semantic boundaries: 450 chars fits complete thoughts better than 800 chars (which often split mid-context)
  3. Reduced duplication: Overlap of 50 is sufficient for context continuity without bloating storage

Chunk Size & Multilingual Embeddings

Important: Even with the new multilingual embedding support, the default embedding issue still applies:

  • Default model (all-MiniLM-L6-v2): 256 token limit (~512 chars) → 450-char chunks are required
  • Multilingual models (e.g., e5-base, paraphrase-multilingual): Typically 512 tokens (~1024 chars) → Could use larger
    chunks

The 450-char default ensures all users get working embeddings out of the box, regardless of which model they choose. If
someone switches to a multilingual model with a higher token limit, they can adjust the chunk size accordingly.

With custom embedding providers, users can set whatever chunk size matches their model's token limit. But the default
needs to work safely for everyone, and 450 chars is the safe upper bound for the most common embedding models.

Quality Improvement

The +28% relevance gain comes from:

  • Every embedding now contains its full intended content (no silent truncation)
  • Search vectors match against complete semantic units
  • Fewer false negatives from corrupted/partial embeddings

Recommendation

Merge this PR. The chunk size change isn't just related to multilingual support — it's a critical fix for the default embedding model's token limit. Our production data shows:

  • ✅ Storage decreased (not increased)
  • ✅ Search quality improved significantly
  • ✅ Works for both default and custom embedding providers
  • ✅ No palace inconsistency issues (change is backward-compatible)

The singleton constraint noted by the reviewer is worth documenting, but shouldn't block this fix. Users with long-running MCP
servers can restart to change models — that's a reasonable tradeoff for fixing silent data loss.


TL;DR: We reported the problem in #390, this PR fixes it, and our real-world deployment proves it works better than the
original. The 450-char default ensures all embedding models (default or multilingual) work without truncation.

@NickShtefan (Author)

Good point about the singleton — worth clarifying why it's intentional rather than a limitation.

Switching the embedding model isn't a hot-swap operation. Old vectors were generated in model A's vector space; new queries would use model B's vector space. Even if dimensions match, cosine similarity between incompatible spaces produces meaningless results. If dimensions differ, ChromaDB rejects the insert outright.

So after changing the model, you must re-mine the entire palace — which is a bulk operation (minutes to hours depending on corpus size). Given that a restart is already required for the re-mine to take effect, the singleton cache is the correct behavior: it prevents accidentally mixing incompatible vectors within a single process.

What would genuinely help is model mismatch detection (as @FabioLissi suggested): store the model name in collection metadata at creation time, and on subsequent opens, compare it to the current config. On mismatch → clear error with recovery instructions. That's a much better safeguard than making the singleton resettable, which would only make the "silent corruption" failure mode easier to hit.

Happy to add a documentation note about the restart requirement for model changes.

@NickShtefan (Author)

@FabioLissi Great suggestion — this is exactly the kind of safeguard that should ship alongside configurable embedding models. Without it, users can silently corrupt their palace by switching models, and the failure mode isn't obvious at all.

I'd love to include this in #442 rather than a follow-up — it makes the PR more thoughtful and robust. If you're up for writing it, go ahead! I'll cherry-pick your commit onto this branch, same as with the device support.

Thanks for flagging this and for the continued contributions!

@FabioLissi

Three commits:

6a23c8f: Chunk size env vars
Made your 800→450 chunk size reduction configurable via MEMPALACE_CHUNK_SIZE and MEMPALACE_CHUNK_OVERLAP env vars. Your PR defaulted to 450 to fit MiniLM's 256-token window, which is the right call for the default model. On our side, we switched to BAAI/bge-base-en-v1.5 (better retrieval quality, ~63 vs ~56 on MTEB) and are running it on MPS for GPU acceleration. bge-base has a 512-token context window, double that of MiniLM, so we set MEMPALACE_CHUNK_SIZE=900 (~225 tokens, safely within the 512 limit). The 450 default was leaving half the model's capacity on the table. Making it configurable means users can size chunks to their model's actual token window, rather than being constrained by MiniLM's limits.

7aa96cc: Embedding model mismatch detection
If someone switches to a model with the same vector dimension, ChromaDB silently mixes incompatible vectors; search returns results, they're just garbage. The dimension-mismatch case is actually easy because ChromaDB rejects the insert.
The fix stores the model name in collection metadata at creation time and checks it on every open. A mismatch raises a hard error with recovery instructions. MEMPALACE_FORCE_EMBEDDING=true bypasses the check for power users switching between models with compatible vector spaces. Legacy palaces get silently stamped on upgrade. All seven direct ChromaDB callers now go through a single palace.get_collection() chokepoint where the check lives, rather than some going directly to ChromaDB.
Also fixed a pre-existing bug: your tests used patch.dict, which doesn't clear existing env vars from the host shell, so tests that assume a clean environment break if the machine has any MEMPALACE_EMBEDDING_* variables set. Switched to monkeypatch.delenv().

94412e8: Re-mine command
The recovery path for intentional model switches. Extracts source files from existing metadata, backs up the palace, drops the collection, and re-mines using the exact file list (not the whole directory, so it won't pull in extra files). Supports --dry-run. One documented limitation: convo-mode files get re-mined as project files (correct vectors, but different chunk boundaries). Fixing that properly requires storing the mining mode in metadata; felt like scope creep for now.
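
Rough shape of the flow (sketch only; the source_file metadata key and the backup/drop/init/mine helpers are stand-in names):

def remine(col, palace, new_model, dry_run=False):
    # Recover the exact source list from existing drawer metadata
    sources = sorted({m["source_file"] for m in iter_all_metadatas(col)})
    if dry_run:
        for path in sources:
            print(f"[DRY RUN] would re-mine {path}")
        return
    backup_palace(palace)        # copy the palace aside before destructive steps
    drop_collection(palace)      # old vectors are incompatible with the new model
    init_palace(palace, model=new_model)  # stamps the new model into metadata
    mine_files(palace, sources)  # exact file list, not the whole directory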

Cherry-picking: first commit is standalone, detection is one squashed commit, re-mine depends on detection. Happy to answer questions or make any adjustments!

NickShtefan pushed a commit to NickShtefan/mempalace that referenced this pull request Apr 10, 2026
@NickShtefan NickShtefan force-pushed the feat/multilingual-embedding branch from e1ce332 to 826e3bc Compare April 10, 2026 19:39
@NickShtefan (Author)

@FabioLissi Thanks for the excellent contributions! I've cherry-picked all three commits into this PR:

  • 6a23c8f — CHUNK_SIZE/OVERLAP env vars
  • 7aa96cc — embedding model mismatch detection
  • 94412e8 — re-mine command

During integration I found and fixed two issues:

  1. Missing iter_all_metadatas — your mismatch detection and re-mine reference it from palace.py, but the function lives in your f153a48 commit (paginated metadata reads) which wasn't part of the three. Added it to palace.py.

  2. EmbeddingModelMismatchError swallowed in searcher.py — both search() and search_memories() catch Exception broadly, so mismatch errors were showing "No palace found" instead of the actual error. Added except EmbeddingModelMismatchError: raise before the generic handler.

Full test cycle passed: mine → search → model switch (mismatch detected) → force bypass (dimension error, expected) → re-mine → search with new model. Everything works.

Rebased on latest upstream/main, all clean.

@FabioLissi commented Apr 11, 2026

@NickShtefan — great catch on the searcher.py swallow. Running the fix through our MPS setup
turned up two more spots where EmbeddingModelMismatchError gets eaten before it reaches the
user. Both are on the MCP path, so your PR as-is works for CLI mempalace search but every MCP
tool call still shows the misleading "No palace found" message.

1. mcp_server.py:_get_collection() wrapper

def _get_collection(create=False):
    ...
    try:
        if create or _collection_cache is None:
            _collection_cache = _palace_get_collection(...)
        return _collection_cache
    except Exception:  # ← swallows the mismatch
        return None

Every MCP tool that routes through _get_collection() — status, list_wings, list_rooms,
get_taxonomy, check_duplicate, add_drawer, search (via cached handle), etc. — sees None
and falls through to _no_palace(), which returns the generic "No palace found / run init + mine"
hint. The real error and recovery instructions never surface.

2. Top-level MCP dispatcher (handle_request)

try:
    result = TOOLS[tool_name]["handler"](**tool_args)
    return {...}
except Exception:
    logger.exception(f"Tool error in {tool_name}")
    return {
        "jsonrpc": "2.0",
        "id": req_id,
        "error": {"code": -32000, "message": "Internal tool error"},
    }

Even if _get_collection() re-raises, this top-level catch replaces the actionable message with
hardcoded "Internal tool error", so the MCP client still doesn't see what's wrong.

Fix (applied on our fork, tests green)

Three spots:

  1. searcher.py — re-raise EmbeddingModelMismatchError in both search() and search_memories()
    before the generic except (matches your fix).
  2. mcp_server.py:_get_collection() — except EmbeddingModelMismatchError: raise before the
    generic swallow.
  3. mcp_server.py:handle_request() — catch EmbeddingModelMismatchError explicitly and return
    {"error": str(e)} as a JSON result so MCP clients see the message (sketched below).

Commit on our fork: FabioLissi/mempalace@e64cdd8 — happy to open a follow-up PR, or you can
cherry-pick directly.

Heads-up on rebasing #442 onto current upstream/main

When you rebase, you'll hit a conflict in mcp_server.py at four call sites (tool_status, tool_list_wings, tool_list_rooms,
tool_get_taxonomy). Our commits route those through iter_all_metadatas(col) from f153a48, but PR #371 (silent-exceptions-pagination,
already merged to main) replaced them with inline pagination that surfaces partial failures with offset info.

Resolution: take upstream's version at all four sites; it's strictly better UX (reports "failed at offset N" instead of a silent except Exception: pass). Then drop the now-unused iter_all_metadatas import from mcp_server.py:

was:
from .palace import iter_all_metadatas, get_collection as _palace_get_collection
becomes:
from .palace import get_collection as _palace_get_collection

iter_all_metadatas stays in palace.py, still used by cli.py (re-mine) and miner.py (mismatch detection), just no longer by mcp_server.py.

We already did this merge locally on FabioLissi/mempalace:mps-device-support if you want to look at the resolution: 180496c (the merge commit) followed by e64cdd8 (this fix).

PR #488 (multilingual, @EndeavorYen) touches config.py, mcp_server.py, searcher.py,
palace.py, miner.py — same files as this fix and our original #442 work. Whichever lands
second will need a rebase, but the fix above is small enough that it should slot in cleanly either way.

FabioLissi added a commit to FabioLissi/mempalace that referenced this pull request Apr 11, 2026
@NickShtefan fixed the CLI path in PR MemPalace#442 (searcher.py) by re-raising
EmbeddingModelMismatchError past the generic except clause. The MCP
path had the same bug via two additional swallows:

- mcp_server.py:_get_collection() wraps _palace_get_collection in
  except Exception: return None, so every tool using _get_collection
  (status, list_wings, list_rooms, get_taxonomy, check_duplicate,
  search, etc.) silently showed "No palace found" instead of the
  actual mismatch error.

- The top-level dispatcher in handle_request() catches Exception and
  returns hardcoded "Internal tool error", so even if the wrapper
  re-raises, the actionable message (stored vs current model + recovery
  instructions) never reaches the caller.

Fix in three spots:
1. searcher.py: re-raise EmbeddingModelMismatchError in search() and
   search_memories() before the generic except (matches @NickShtefan's fix).
2. mcp_server.py:_get_collection(): re-raise EmbeddingModelMismatchError
   before the generic swallow.
3. mcp_server.py handle_request(): catch EmbeddingModelMismatchError
   explicitly and return the message as a JSON result so MCP clients
   see the real error and recovery instructions.
@NickShtefan NickShtefan force-pushed the feat/multilingual-embedding branch from 826e3bc to f18187b Compare April 11, 2026 19:12
NickShtefan pushed a commit to NickShtefan/mempalace that referenced this pull request Apr 11, 2026
NickShtefan pushed a commit to NickShtefan/mempalace that referenced this pull request Apr 11, 2026
@NickShtefan (Author)

@FabioLissi cherry-picked your MCP fix (e64cdd8) — thanks. PR rebased on latest upstream/main, all 604 tests pass. Ready for review.

@bensig bensig changed the base branch from main to develop April 11, 2026 22:22
@bensig bensig requested a review from igorls as a code owner April 11, 2026 22:22
@NickShtefan NickShtefan force-pushed the feat/multilingual-embedding branch from d5cb9ec to 5c2c573 Compare April 12, 2026 13:01
NickShtefan pushed a commit to NickShtefan/mempalace that referenced this pull request Apr 12, 2026
NickShtefan pushed a commit to NickShtefan/mempalace that referenced this pull request Apr 12, 2026
@NickShtefan NickShtefan force-pushed the feat/multilingual-embedding branch from 5c2c573 to e249b45 Compare April 12, 2026 13:11
@NickShtefan (Author)

@igorls — agreed, and done. Rewrote the PR to eliminate all embedding-related env vars.

What changed

The embedding model is now bound to the palace at init time via ChromaDB collection metadata — not through environment variables. Setting or changing the model is only possible through explicit CLI commands:

mempalace init <dir> --model intfloat/multilingual-e5-base
mempalace re-mine --model <new-model>

Removed env vars

  • MEMPALACE_EMBEDDING_MODEL → replaced by --model flag on init and re-mine
  • MEMPALACE_EMBEDDING_DEVICE → replaced by auto-detection (cuda > mps > cpu)
  • MEMPALACE_CHUNK_SIZE / MEMPALACE_CHUNK_OVERLAP → replaced by --chunk-size / --chunk-overlap flags
  • MEMPALACE_FORCE_EMBEDDING → replaced by --force flag on get_collection()

Where config lives now

Collection metadata is the single source of truth:

{
  "embedding_model": "intfloat/multilingual-e5-base",
  "chunk_size": 450,
  "chunk_overlap": 50
}

Set once at init, readable without instantiating the embedding function (read_collection_metadata() opens ChromaDB with no embedder, reads metadata, closes). Changed only via re-mine.
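
Roughly (sketch; the persistent-client call and collection name are illustrative, not the exact implementation):

import chromadb

def read_collection_metadata(palace_path, name="palace"):
    # Metadata-only read: no embedding function is ever instantiated here
    client = chromadb.PersistentClient(path=palace_path)
    col = client.get_collection(name)
    return dict(col.metadata or {})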

Device auto-detection

MempalaceConfig.detect_device() probes hardware at runtime:

  • torch.cuda.is_available()"cuda"
  • torch.backends.mps.is_available() + platform.machine() == "arm64""mps"
  • fallback → "cpu"

No configuration needed. The arm64 guard prevents a false positive on Intel Macs where mps.is_available() returns True but MPS doesn't actually work.
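
Roughly (sketch of the probe order described above; the method shape is an assumption):

import platform
import torch

def detect_device():
    if torch.cuda.is_available():
        return "cuda"
    # arm64 guard: Intel Macs can report MPS available even though it fails
    if platform.machine() == "arm64" and torch.backends.mps.is_available():
        return "mps"
    return "cpu"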

Legacy migration

Existing palaces without an embedding_model in their collection metadata are silently stamped "chromadb-default" on first access. No user action required.

What's preserved from community contributions

This PR incorporates work from @FabioLissi:

  • Embedding model mismatch detection — stores model name in collection metadata, raises EmbeddingModelMismatchError on mismatch with clear recovery instructions
  • mempalace re-mine command — extracts source files from existing palace, backs up, drops collection, re-mines with the new model
  • MCP error propagationEmbeddingModelMismatchError surfaces through _get_collection() and handle_request() instead of being swallowed by generic except Exception

The core ideas remain — init-time binding just moves the config source from env vars to collection metadata, which is where the model was already being tracked.

Benchmark

Tested end-to-end on 247 Russian blog posts with intfloat/multilingual-e5-base:

| Step | Result |
|---|---|
| init --model intfloat/multilingual-e5-base | 13s, metadata stamped correctly |
| mine (247 files) | 758 drawers, 1:54 on CPU |
| search (10 queries, warm) | 48ms avg, 0.719 avg similarity, 100% high-relevance (≥0.65) |

Tests: 646 passed, 0 failed (35 errors are onnxruntime platform issues on macOS x86_64, not code-related).

Rebased on latest develop, single commit, no conflicts.

@NickShtefan NickShtefan force-pushed the feat/multilingual-embedding branch 3 times, most recently from 6742244 to 37d6177 Compare April 13, 2026 21:58
@igorls added the area/cli, area/install, area/mcp, area/mining, area/search, enhancement, and storage labels on Apr 14, 2026
@igorls (Member) commented Apr 14, 2026

Hey @NickShtefan, we'd like to include this in the next release once the plugin-based storage layer is added. Please hold off on rebasing until then.

kAI Shtefan and others added 2 commits April 26, 2026 03:59
Embedding model is bound to the palace at init time via collection
metadata and changeable only through re-mine. Removes all embedding-
related environment variables in favor of explicit CLI flags and
auto-detection.

Features:
- `mempalace init --model <name>` binds embedding model at creation
- `mempalace re-mine --model <new>` migrates to a different model
- `mempalace init --chunk-size N` configures chunk size per palace
- Auto-detect device: cuda > mps (arm64 only) > cpu
- Embedding model mismatch detection with clear error messages
- Legacy palace auto-migration (stamps "chromadb-default")
- Backend seam abstraction for storage layer
- MCP ping health checks

Removed env vars:
- MEMPALACE_EMBEDDING_MODEL (use --model flag)
- MEMPALACE_EMBEDDING_DEVICE (auto-detected)
- MEMPALACE_CHUNK_SIZE / MEMPALACE_CHUNK_OVERLAP (use --chunk-size)
- MEMPALACE_FORCE_EMBEDDING (use --force flag)

Benchmarked on 247 Russian blog posts (intfloat/multilingual-e5-base):
  758 drawers, 49ms avg search, 0.719 avg similarity, 100% high-relevance

Co-authored-by: Fabio Lissi <fabio.lissi@gmail.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…lection

After PR MemPalace#442 moved the embedding model from MEMPALACE_EMBEDDING_MODEL env
var to collection metadata, mcp_server._get_collection() stopped passing
embedding_function to ChromaDB. This made the MCP server silently fall back
to ChromaDB's built-in default (all-MiniLM-L6-v2, dim 384), regardless of
what model the palace was actually built with.

For palaces created via `mempalace init --model intfloat/multilingual-e5-base`
(the headline use case of MemPalace#442), this means inserts and queries through MCP
would attempt 384-dim embeddings against 768-dim stored vectors and fail
with a dimension mismatch. searcher.py + cli.py work correctly because they
go through palace.get_collection() — only the MCP path was broken.

This commit makes _get_collection mirror palace.get_collection's behaviour:

- Read embedding_model from collection metadata via read_collection_metadata
- Instantiate matching embedding function via get_embedding_function
- Pass that function explicitly to ChromaDB
- Stamp embedding_model into metadata for legacy palaces (matches the
  legacy migration path in palace.get_collection)

Local-only commit for now; will be evaluated for inclusion into PR MemPalace#442
or a follow-up upstream PR.
@NickShtefan (Author)

Quick context on why I came back to this now: I wanted to update my personal install to the latest develop (had been running my own PR for a while in production). Since my own copy was already running on this PR, I figured I'd refresh the PR itself at the same time — rebased onto current develop, dropped the env-var device support that's now in develop via @igorls's a4868a35 (perf(mining): batch + GPU acceleration), and folded in the _get_collection MCP fix that closes #903.

Everything verified against my live palace (15K+ drawers, intfloat/multilingual-e5-base) — tool_diary_read and mempalace_search over MCP both return correct results after the rebase.

Following @igorls's earlier note that this should land "once the plugin based storage layer is added" — happy to wait for #1196 (SQLiteVec backend) before merging if that's still the plan, or merge sooner if the storage layer timing has shifted. Whatever works for you.

@jphein (Collaborator) commented Apr 26, 2026

Following over from #390 where @NickShtefan tied the threads together — a few thoughts on direction.

The bind-chunk-size-to-collection-metadata property is the load-bearing piece. The reason is the silent-drift failure mode: change chunk_size after a palace is mined and the new chunks may exceed the model's token window, getting silently truncated before embedding. With the binding stamped into chromadb collection metadata at init, that mismatch becomes a property chromadb itself can refuse — much harder to footgun than env-var resolution.

Composition with #1024: they sit at different layers, and both are wanted.

If maintainer direction is to wait for #1196 (SQLiteVec backend) per @igorls's earlier note, that's reasonable — the plugin-storage layer changes the metadata-binding shape and rebasing now risks a third rebase later. The intermediate state where users have #1024 (config override) but not #442 (collection-bound) is acceptable as long as the gap is documented in CHANGELOG: "changing chunk_size after a palace is mined will produce drift; re-mining a wing is the recovery path until #442 lands."

The _get_collection MCP fix you folded in is independently valuable and would be a clean small extraction if the rest of #442 has to wait — happy to comment on a smaller scoped PR for just that piece if you'd find it useful.

(For context: the fork has #1024-equivalent in production but not #442-equivalent; the silent-drift failure mode you describe hasn't bitten us yet only because we haven't changed embedding model. So I'm reading #442 as the durable fix, #1024 as the bridge.)

